A short-form video pipeline that doesn't make every creator sound the same.

RescueViral Creator OS is a subject-agnostic production line for short-form video. Concept to story to scenes to finished video, all generated by AI, with the creator approving at each gate. Fully implemented end-to-end with one anonymous design partner. Currently shelved on cost grounds — the heavy video-generation step still needs the cloud, and the cloud is expensive per video. It comes back when local video-gen models reach the threshold to run on hardware we own.

Built. Proven end-to-end. Held until local video-gen catches up. The pattern is the artefact; the timing is what we're patient about.
Status: Concept implemented; on the back burner until local video-gen models become viable on consumer hardware. SaaS productisation when the trigger lands.
Editorial illustration of a creator's editing surface: a long-form video timeline on the left, three short clips stacked on the right, with a thin ember-orange line connecting them — voice preservation visualized as continuity.
A continuity line preserves the creator's voice across formats.

The treadmill, and the slop

A creator who runs a short-form channel is on a cadence treadmill. The platform algorithms reward consistency — three videos a week, every week, beats one video done perfectly — and the manual workflow doesn't survive that schedule. Script the voiceover. Source the B-roll. Hand-burn the captions. Mix the music. Format for vertical. Two hours of mechanical labour per video, plus the editorial labour, plus the platform-specific export. The mechanical tail is what breaks the cadence.

The generic AI clipping tools that exist today don't fix this in a way the creator can stand to use. They optimise for "engagement signals" — whatever pattern a recommendation algorithm rewards on average — and they flatten every creator who runs through them into the same algorithm-shaped paste. Same hook patterns. Same caption fonts. Same energy. The audience can detect it within the first two seconds, and the platforms have started detecting it too. The creator who survives the next platform rule change is the creator who hasn't traded their voice for a slightly higher retention curve.

The pain, in order: time, then cadence, then creative dilution. The system is built for all three, with the third taken seriously enough to constrain the first two.

Concept to story to scenes to video

A five-stage pipeline: concept, story, script, scenes, finished video, with three creator approval gates along the way.
Five stages. Three creator gates. One finished video.

Open the app. Pick a category. Click Generate Stories and the language model drafts a batch of story concepts with hook-first openings; the creator reviews, dismisses, or approves. Approve a story and the language model writes the script around it — opener hook, narrative tension, payoff, in the duration band the platform rewards.

Approve the script and the production line takes over. A local TTS engine reads the voiceover. A footage layer assembles the scenes. A captioner transcribes and burns styled captions onto the frames. A small audio mixer drops a royalty-free background track behind the narration with automatic ducking. The assembly step renders the final 9:16 video with a hardcoded disclosure overlay during the first five seconds. A thumbnail generator produces three options. A publish kit drafts a title, a description (with the disclosure text included), and hashtag suggestions. The creator exports the file and uploads manually. The system does not auto-publish.
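
In sketch form, the production line is an ordered set of steps over shared project state. The function names, file paths, and state keys below are illustrative, not the project's actual module layout; the point is the ordering and what each step produces.

```python
# A minimal sketch of the production line, assuming an orchestrator that
# runs ordered steps over a shared state dict. All names are illustrative.

def synthesize_voiceover(state):
    # Local TTS engine reads the approved script to a voiceover track.
    state["voiceover_wav"] = "out/voiceover.wav"
    return state

def assemble_scenes(state):
    # Footage layer turns each scene description into a clip. Today this is
    # the one step that calls a cloud video-gen model; the rest stays local.
    state["scene_clips"] = [f"out/scene_{i}.mp4" for i, _ in enumerate(state["scenes"])]
    return state

def burn_captions(state):
    # Captioner transcribes the voiceover and burns styled captions onto the frames.
    state["captioned_video"] = "out/captioned.mp4"
    return state

def mix_audio(state):
    # Royalty-free background track under the narration, ducked automatically
    # whenever the voiceover is speaking.
    state["mixed_audio"] = "out/mix.wav"
    return state

def render_final(state):
    # 9:16 render with the disclosure overlay hardcoded into the first five seconds,
    # plus thumbnail options and a publish kit alongside the file.
    state["final_video"] = "out/final_9x16.mp4"
    return state

PIPELINE = [synthesize_voiceover, assemble_scenes, burn_captions, mix_audio, render_final]

def run(state):
    for step in PIPELINE:
        state = step(state)
    return state
```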

The pipeline pauses three times for human review — at the story queue, at the script, at the export. The creator's brand is the creator's face. The system does the mechanical work. The creator keeps the editorial control.
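
The gates are worth making concrete. A minimal sketch, assuming a simple per-project status record; the gate names and fields are hypothetical, but the property they encode is the real one: nothing advances without an explicit creator action, and nothing auto-publishes.

```python
# Sketch of the three creator gates. The pipeline halts at each one and
# only an explicit approval moves the project forward.

from enum import Enum, auto

class Gate(Enum):
    STORY_QUEUE = auto()    # creator approves or dismisses drafted stories
    SCRIPT_REVIEW = auto()  # creator edits and approves the generated script
    EXPORT = auto()         # creator exports the file and uploads manually

def record_decision(project, gate: Gate, approved: bool):
    """Record a creator decision; a rejection stops the project at that gate."""
    if not approved:
        project["status"] = f"stopped_at_{gate.name.lower()}"
        return project
    project.setdefault("passed_gates", []).append(gate)
    return project
```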

Subject-agnostic by design

The pipeline doesn't care what the channel is about. The categories on the dashboard today are illustrative, not load-bearing — the workflow underneath them (concept → story → scenes → video) applies to any topic an LLM can write a hook for and a footage source can supply imagery for. History channels. Fitness channels. Devotional content. Science explainers. Comedy. Business. Anything whose format is "voiceover over relevant imagery with captions" is a channel the system can drive.

Channel identity is configurable. A consistent narrator voice across every video. A channel-style configuration for fonts, colours, lower-third templates. Category-specific prompt templates the creator edits to taste. The defaults are a starting point; the channel they end up with is the one the creator built.
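
A minimal sketch of what that per-channel configuration could look like, assuming one config object threaded through every step. The field names and values here are hypothetical; the point is that voice, styling, and prompt templates are set once per channel rather than re-decided per video.

```python
# Hypothetical per-channel configuration. Voice, caption styling, and
# category prompt templates belong to the channel, not to the tool.

CHANNEL_CONFIG = {
    "narrator_voice": "en-GB-warm-01",          # the same TTS voice on every video
    "caption_style": {
        "font": "Archivo Black",                # the creator's pick, not a tool default
        "primary_colour": "#F5F5F5",
        "highlight_colour": "#FF6A00",
    },
    "lower_third_template": "templates/lower_third_minimal.svg",
    "prompt_templates": {
        # Category-specific templates the creator edits to taste.
        "history": "Write a hook-first 45-second story about {topic}, opening on ...",
        "fitness": "Open with a common mistake people make with {topic}, then ...",
    },
}
```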

The first user is one anonymous design partner from the creator-economy space. Their channel is one shape; the system's user base, when it returns, will not be one shape. Anyone whose morning includes the same mechanical tail every video is a candidate.

The local-first compromise — and the trigger to come back

The marginal-cost picture: per-video cost on the local fleet versus the cloud video-gen step that keeps the project shelved.
Marginal cost is the lever. The pipeline waits.

The studio's standing position is local-first. Open-weight models, hardware we own, no per-token meter, no vendor whose pricing decides our margin. For most of the pipeline, that's the architecture: the language model that writes the story and the script, the TTS engine that reads the voiceover, the captioner that transcribes the audio, the assembly step that renders the final video — all of it runs on the studio's rig.

Today, one step does not. The video-generation pass — the part that turns scene descriptions into the actual moving frames — runs on a state-of-the-art cloud model, because no model that runs on consumer hardware produces output good enough to ship. We are honest about this. It is a deliberate compromise on the studio's local-first principle, and it is the reason the project is currently shelved: at cloud-per-video pricing, the unit economics of a SaaS subscription don't close the way they do for the rest of the pipeline.
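
One way to picture the compromise, assuming a single routing table that maps pipeline steps to backends (the names are illustrative): every step resolves to the local rig except video generation, and the comeback trigger is a one-line flip.

```python
# Hypothetical routing table isolating the one cloud dependency. When an
# open-weight video-gen model clears the quality bar on hardware the studio
# owns, only this table changes; the rest of the pipeline is untouched.

LOCAL = "local_rig"
CLOUD = "cloud_videogen_api"

STEP_BACKEND = {
    "story_generation": LOCAL,    # open-weight LLM
    "script_generation": LOCAL,
    "voiceover_tts": LOCAL,
    "captioning": LOCAL,
    "video_generation": CLOUD,    # the deliberate compromise, and the cost
    "final_render": LOCAL,
}

def backend_for(step: str, local_videogen_ready: bool = False) -> str:
    # The comeback trigger: route video generation to the rig once a local
    # model produces shippable output.
    if step == "video_generation" and local_videogen_ready:
        return LOCAL
    return STEP_BACKEND[step]
```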

The path back is a curve we're patient about. The open-weight video-gen ecosystem is moving fast. When a model good enough to ship runs on hardware in the studio's price range, the cloud step migrates onto the rig and the project comes back. Until then, the system sits ready, the build proven, the pipeline waiting for its bottleneck to fix itself.

Where the project is today

The creator dashboard: a stories queue, a script-review screen, a videos page, and a render meter showing where in the pipeline each video sits.
The creator dashboard with the render meter.

The system is built. The frontend ships six screens against a live backend — a stories dashboard, a script-review screen, a videos page, an analytics dashboard, a content calendar, and a settings page. The database has real run history. The first user has driven the pipeline end-to-end and produced finished videos through it.

The shelf is honest. We're not pretending it's a concept — the build is real, the proof is on disk — and we're not pretending it's live, either. It's paused on cost grounds, which is a different shape of pause from "we couldn't make it work." When the local video-gen model lands, the pipeline reactivates against a SaaS productisation plan with subscription tiers. The competitive value is straightforward: do what the existing creator-AI subscriptions do, at a small fraction of their monthly cost, with the creator's voice intact.


For the broader thinking on why local-first wins on long horizons — and on the workloads where it doesn't, yet — the local-first piece, "Don't trust the cloud", is the one to read.