An agent skill for Codex and Claude Code that turns any product input — a doc, a webpage, a GitHub repo, or just an idea — into a launch-quality explainer video. One prompt in, polished MP4 out.
A 30-second demo of the skill in action, captured live as it produces a launch video. The user types a prompt; the skill drafts a storyboard, asks for approval, then renders a 1:1 square video with synced music in about 3 minutes.
https://github.com/encircleacity2/bobyte-explainer/blob/main/assets/demos/demo.mp4
In your Codex or Claude Code session, paste one of:
| You have | What you say |
|---|---|
| A product page / docs URL | "Make an explainer video for https://example.com" |
| A Lark / Feishu doc | "Turn this Lark doc into a launch video: [doc URL]" |
| A GitHub repo | "Produce a Shorts video for this GitHub repo: [repo URL]" |
| PDFs + screenshots | "Make an explainer from these files" (attach them) |
| Just an idea | "Make a 30s video about [describe the product]" |
The skill handles every stage: intake → storyboard → render → music → delivery. A typical 30–45s video lands in ~/Downloads/ in about 5 minutes at $0–$0.20 in API cost.
Pure-broll mode (pure-broll-product-demo) is the default for product launches. It combines polished motion, UI, typography, and music, with no talking head and no portrait photo needed at onboarding. It saves ~80% of cost vs hybrid mode and produces the OpenAI/Apple-style aesthetic that modern product videos use.
Phase 1 asks where the video will live and chooses the right aspect ratio + duration sweet spot automatically:
| Channel | Aspect | Sweet-spot duration |
|---|---|---|
| X / LinkedIn / IG feed | 1:1 (1440×1440) | 30–60s |
| TikTok / Reels / Shorts | 9:16 (1080×1920) | 21–34s |
| YouTube / website hero | 16:9 (1920×1080) | 60–180s |
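The channel table above can be read as a small lookup. The following is a minimal sketch, not the skill's actual code: the `ChannelSpec` type, the dictionary keys, and `clamp_duration` are hypothetical names, and only the numbers come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelSpec:
    aspect: str
    width: int
    height: int
    min_seconds: int
    max_seconds: int


# Values copied from the table above; the keys are illustrative names.
CHANNEL_SPECS = {
    "feed": ChannelSpec("1:1", 1440, 1440, 30, 60),
    "shorts": ChannelSpec("9:16", 1080, 1920, 21, 34),
    "landscape": ChannelSpec("16:9", 1920, 1080, 60, 180),
}


def clamp_duration(channel: str, requested_seconds: int) -> int:
    """Pull a requested duration into the channel's sweet-spot window."""
    spec = CHANNEL_SPECS[channel]
    return max(spec.min_seconds, min(spec.max_seconds, requested_seconds))
```

For example, a 45-second request for a Shorts video would be clamped to 34 seconds under this sketch.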
- openai-clean: geometric bold sans + lavender liquid + minimal
- anthropic-warm: warm earth tones + serif italic + editorial
- linear-minimal: dark mode + neon accents + technical
- apple-keynote: deep black + hero typography + cinematic
- brand-bold: high-contrast + oversized type + color-block
Each preset ships a real design.md with full tokens (palette, typography, motion easings, scene recipe). Or paste your own design.md and the skill renders against your brand.
The skill enforces an 8-pattern narrative discipline before any render runs — protagonist with a real name (not "the user"), a small canon of specific entities preserved across frames, a central artifact that echoes across multiple beats, etc. The built-in auditor blocks "screen catalogue" storyboards (a sequence of UI shots with no story) — the #1 reason finished videos don't communicate.
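To make two of the narrative rules above concrete, here is a rough sketch of what such checks could look like. It is not the built-in auditor: the storyboard keys (`protagonist`, `central_artifact`, `beats`, `entities`) are assumed for illustration, and "multiple beats" is interpreted as at least two.

```python
def audit_storyboard(storyboard: dict) -> list[str]:
    """Return human-readable findings; an empty list means both checks pass."""
    findings = []

    # Rule: the protagonist needs a real name, not a generic placeholder.
    protagonist = storyboard.get("protagonist", "").strip()
    if not protagonist or protagonist.lower() in {"the user", "user"}:
        findings.append("protagonist needs a real name")

    # Rule: the central artifact should echo across multiple beats.
    artifact = storyboard.get("central_artifact", "")
    beats = storyboard.get("beats", [])
    echoes = sum(
        1 for beat in beats if artifact and artifact in beat.get("entities", [])
    )
    if echoes < 2:
        findings.append("central artifact must echo across at least two beats")

    return findings
```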
After drafting the storyboard, the skill shows it inline and asks you in your conversation language: Approve / Suggest changes / Stop. Approval must be explicit: an informal reply such as "looks good 👍" does not start a render. If you suggest changes, the model revises and re-presents, looping until you choose Approve or Stop.
Before render and after, a unified verify.py runs validators for storyboard craft, overlap, assets, launch-grade motion, layout guardrails, zoom logic, camera overflow, render spec, audio levels, and pixel-edge bleed. Auto-fix mechanically repairs the safe subset (cap camera scales, re-encode keyframes, re-mix audio gain, deconflict tracks) and re-verifies up to N iterations.
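The verify, fix, re-verify cycle described above might look roughly like the loop below. This is a sketch under assumptions, not the contents of verify.py: the finding shape, the callback signatures, and the default of three iterations are all hypothetical.

```python
from typing import Callable

# Hypothetical finding shape: {"id": str, "severity": str, "fixable": bool}
Finding = dict


def verify_with_autofix(
    run_validators: Callable[[], list[Finding]],
    apply_fix: Callable[[Finding], None],
    max_iterations: int = 3,
) -> list[Finding]:
    """Run validators, repair the fixable findings, and re-verify.

    Stops early once nothing fixable remains; returns the last findings.
    """
    findings = run_validators()
    for _ in range(max_iterations):
        fixable = [f for f in findings if f.get("fixable")]
        if not fixable:
            break
        for finding in fixable:
            apply_fix(finding)
        findings = run_validators()
    return findings
```

Bounding the loop keeps a fix that never converges from blocking the run indefinitely; whatever is left after the last pass is reported as-is.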
Every video renders at 60fps --quality high — visibly smoother than the 30fps default most tools settle for. Motion follows a "house style" reference that bans linear easings and codifies entrance / exit / camera curves, layout safety, and target-led zooms so motion craft stays consistent across compositions.
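One way to picture the linear-easing ban is a simple source scan. The sketch below assumes that compositions set eases as string literals and that `"none"` and `"linear"` are the linear ease names in GSAP; the function name is hypothetical and this is not the skill's validator.

```python
import re

# Matches ease: "none" or ease: 'linear' with optional whitespace.
LINEAR_EASE = re.compile(r"""ease\s*:\s*["'](none|linear)["']""")


def find_linear_easings(source: str) -> list[int]:
    """Return 1-based line numbers that declare a linear ease."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if LINEAR_EASE.search(line)
    ]
```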
If enabled at onboarding, the skill generates a custom instrumental via Volcengine's music API per video (matching the storyboard's mood), then sidechain-ducks against any voice. Cost: ~$0.20 per track. Disable entirely if you prefer to add music yourself.
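Sidechain ducking can be done with ffmpeg's `sidechaincompress` filter, which lowers the music whenever the voice track is active. The sketch below only builds the command; the threshold, ratio, attack, and release values are illustrative guesses, not the skill's settings, and `build_duck_command` is a hypothetical helper.

```python
def build_duck_command(music: str, voice: str, output: str) -> list[str]:
    """Build an ffmpeg command that ducks music under voice, then mixes both."""
    filter_graph = (
        # The voice is needed twice: once as the sidechain key, once in the mix.
        "[1:a]asplit=2[voice_mix][voice_key];"
        # Compress the music (input 0) whenever the key signal is loud.
        "[0:a][voice_key]sidechaincompress="
        "threshold=0.05:ratio=8:attack=20:release=300[ducked];"
        # Mix the ducked music with the untouched voice.
        "[ducked][voice_mix]amix=inputs=2:duration=first[out]"
    )
    return [
        "ffmpeg", "-y",
        "-i", music,
        "-i", voice,
        "-filter_complex", filter_graph,
        "-map", "[out]",
        output,
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)` on a machine where ffmpeg is installed.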
For customer overviews, benchmark walkthroughs, and B-roll-first product videos, generate narration independently from Seedance A-roll. This makes timing easier to control, lets you regenerate voice without re-spending video credits, and keeps the final mix clean by rendering voice tracks before adding a continuous music bed.
Provider keys and base URLs can be routed through local api_profiles or an internal proxy. This lets teams switch between BytePlus, Volcengine, ElevenLabs, or a gateway without hardcoding secrets in project files.
For personal-brand videos where you want to appear on camera: hybrid mode generates a Seedance 2.0 AI talking-head from your portrait photo + reference voice clip, then composes it with B-roll. Skip this mode for product launches — it's not needed and adds cost.
Codex:
git clone --depth 1 https://github.com/encircleacity2/bobyte-explainer.git \
  ~/.codex/skills/explainer-video

Claude Code:

git clone --depth 1 https://github.com/encircleacity2/bobyte-explainer.git \
  ~/.claude/skills/explainer-video

Restart your agent session. The skill is auto-discovered from SKILL.md.

If you already installed the skill, update it:

git -C ~/.codex/skills/explainer-video pull --ff-only

Then restart Codex so it reloads the updated SKILL.md. For a Claude Code install, run the same command against ~/.claude/skills/explainer-video and restart Claude Code.
After installation, a Codex run should not produce silent 30fps slides or static placeholder scenes. A healthy run will:
- Ask/confirm mode, visual identity, and distribution channel.
- Draft a storyboard and wait for explicit approval before rendering.
- Generate real TTS when narration is enabled; BytePlus streaming JSON is decoded into audio before normalization.
- Author real HyperFrames scene HTML/GSAP. The default renderer refuses TODO/stub B-roll scenes instead of quietly rendering a bad video.
- Render at 60fps high quality, run render/audio checks, and inspect sampled frames for overlap, overflow, cramped spacing, and broken typography.
If Codex skips those steps, the skill is stale or the workflow was bypassed. Pull the latest version, restart the agent, and retry from the storyboard stage.
The first time you trigger the skill, it walks you through a one-time setup that writes ~/.explainer-video/config.json (mode 600). Two paths:
- Credential file (fast): fill in a copy of credentials.template.md and paste its path. The template is designed to be emailed to teammates so each person onboards in one step.
- Step-by-step: the skill prompts for each item interactively, in your conversation language.
It collects:
1. Output folder: where finished videos land (default ~/Downloads)
2. AI background music: yes / no; if yes, Volcengine music AK / SK
3. BytePlus ModelArk + IAM keys: only if you plan to use hybrid mode (avatar A-roll)
4. Personal portrait photo + reference video: only if you plan to use hybrid mode
5. Standalone TTS provider: optional; useful for B-roll-only narration
6. API proxy / provider profiles: optional; useful for team key management
Pure-broll users only need steps 1–2 to start producing videos unless they want standalone TTS.
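Because the config file holds API keys, it is written with mode 600 (owner read/write only). A minimal sketch of writing such a file on a POSIX system follows; the config keys in the usage example are hypothetical and not the skill's actual schema.

```python
import json
import os
from pathlib import Path


def write_config(path: Path, config: dict) -> None:
    """Write config as JSON, readable and writable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with 0o600 from the start so it is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=2)
    # If the file already existed, os.open keeps its old mode; tighten it.
    os.chmod(path, 0o600)
```

A call might look like `write_config(Path.home() / ".explainer-video" / "config.json", {"output_dir": "~/Downloads", "music_enabled": False})`.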
Once onboarded, each new video walks through 5 phases:
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Phase 1 │ → │ Phase 2 │ → │ Phase 3 │ → │ Phase 4 │ → │ Phase 5 │
│ Intake + │ │ Restyle │ │Storyboard│ │Production│ │ Deliver │
│Preflight │ │(skipped) │ │+ Approval│ │ + Verify │ │ MP4 │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
Phase 1: Intake + Preflight (~30s)
The skill parses your input (URL, file, description) into a brief, then asks 3 questions in your conversation language:
- Mode: pure-broll-product-demo (default) / hybrid / aroll-only
- Visual identity: your own design.md, or one of the 5 built-in presets
- Distribution channel: X / TikTok / YouTube / multi-channel
Phase 2: Portrait restyle (skipped in pure-broll)
Only runs in hybrid mode. Seedream 4.5 generates 4 portrait variants (new outfit / setting / lighting); you pick one or skip.
Phase 3: Storyboard + approval (~1–2 min of model work + your review)
The skill drafts the storyboard in 9 mandatory steps (Cast → Canon → Echo → Narrative answers → Frame names → Narration cues → Click chain → arc_map → then segments). It then runs the pre-render auditor to catch structural issues and presents the storyboard inline in your conversation language with 3 options: Approve / Suggest changes / Stop.
Phase 4: Production (~3–5 min)
- Auto-installs any HyperFrames registry blocks the storyboard references
- Generates the composition HTML from storyboard + chosen style preset
- Renders at 60fps --quality high
- Generates Volcengine music if enabled
- Mixes (sidechain ducking under any voice)
- Runs post-render verify.py, with audio level adjustments auto-applied
Phase 5: Deliver
Saves the MP4 to your configured output folder. Reports duration, size, path. Optionally uploads to Lark with explicit confirmation.
| Time | What's happening |
|---|---|
| 0:00 | You: "Make an explainer video for [URL]" |
| 0:30 | Skill presents preflight questions; you pick mode / style / channel |
| 1:00 | Skill drafts storyboard, runs auditor, presents inline |
| 2:00 | You hit Approve |
| 3:30 | HyperFrames render complete (1440×1440 @ 60fps) |
| 4:30 | Music generated + mixed |
| 5:00 | Final MP4 in ~/Downloads/ |
~/.codex/skills/explainer-video/ or ~/.claude/skills/explainer-video/
├── SKILL.md # 5-phase orchestration spec
├── references/ # reference docs (load on demand)
│ ├── narrative-arc.md # 8 storyline patterns + 5-beat arc (READ FIRST)
│ ├── motion-house-style.md # easings, fps, duration windows
│ ├── storyboard-format.md # full storyboard.json schema
│ ├── style-presets.md # when-to-use guide for the 5 presets
│ ├── channel-aspect-ratios.md # platform × aspect × duration matrix
│ ├── hyperframes-catalog.md # curated registry block subset
│ ├── caption-components.md # registry caption picker
│ ├── screen-script-format.md # in-device screen HTML scripting
│ ├── meta-output-beat.md # opt-in pattern for video products
│ ├── agent-list.md # known AI coding agent brand info
│ ├── seedance-api.md # A-roll API (hybrid mode only)
│ ├── seedream-api.md # Portrait restyle API
│ ├── tts-api.md # Standalone narration providers + timing
│ ├── api-proxy.md # Provider profile / gateway routing
│ ├── duration-planning.md # Content-density duration planning
│ ├── taste-guide.md # OpenAI / Anthropic-inspired restraint
│ ├── volcengine-music-api.md # Music generation API
│ ├── production-techniques.md # Compose / slice / concat / mix
│ └── ...
├── scripts/ # 16 helpers
│ ├── verify.py # unified validator + auto-fix loop
│ ├── audit_storyboard.py # storyboard auditor
│ ├── compose_and_render.py # Phase 4 orchestrator
│ ├── generate_tts.py # standalone TTS generation + normalization
│ ├── plan_duration.py # content-density duration recommendation
│ ├── synthesize_screen_ui.py # LLM-synth in-device screens
│ ├── fetch_registry.py # HyperFrames registry cache
│ └── ...
├── templates/ # reusable recipe scaffolds
│ ├── openai-product-demo.json # canonical pure-broll recipe
│ └── agent-chip-row.html # named-agent opening pattern
├── assets/
│ ├── style-presets/ # 5 built-in design.md preset files
│ ├── hyperframes-template.html # composition scaffold
│ └── macos-window-chrome.html # reusable macOS window UI
└── credentials.template.md # onboarding template (fill + email)
- HyperFrames — the renderer. HTML compositions with GSAP timelines compile to deterministic MP4 via headless Chrome. Free, local, registry of 80+ blocks.
- Seedance 2.0 (BytePlus ModelArk) — A-roll digital human generation (hybrid mode only)
- Seedream 4.5 (BytePlus) — Portrait restyle (hybrid mode only)
- Volcengine music API — AI background music
- ffmpeg — mixing, level adjustment, slicing, post-render verification
- Node 18+ and npm (HyperFrames renderer)
- Python 3.11+ (audit + verify scripts)
- ffmpeg (compose, mux, audio level adjustments)
- Chrome / Chromium (headless render — managed by HyperFrames automatically)
Optional (only for specific features):
- BytePlus ModelArk + IAM keys: hybrid / aroll-only modes
- Volcengine music keys: AI background music
- lark-cli: Lark/Feishu doc ingestion or upload
- Anthropic API key: synthesize_screen_ui.py for in-device UI synthesis (has no-LLM fallback)
Python packages installed as needed:
pip install --user requests Pillow volcengine anthropic librosa

The skill runs scripts/verify.py automatically at two points:
- Pre-render — after composition generation, before the (slow) render. Catches storyboard issues + asset issues + lint errors. Auto-fix repairs safe issues (cap camera scales, deconflict tracks, etc.) and re-runs.
- Post-render — after the MP4 is produced. Pixel-based overflow detection, render-spec match (resolution/fps/duration vs declared), audio levels in mode-target range, no clipping. Severe findings mean the MP4 is not delivery-ready unless explicitly forced as a draft.
Severe issues block delivery; warnings surface in the report but don't auto-block.
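The render-spec match can be illustrated by comparing the declared spec against parsed `ffprobe -show_streams -of json` output. This is a sketch, not verify.py itself: the `declared` keys and the 0.5s tolerance are assumptions, and it assumes the video stream reports its own `duration` field.

```python
from fractions import Fraction


def check_render_spec(
    probe: dict, declared: dict, duration_tolerance: float = 0.5
) -> list[str]:
    """Compare resolution, fps, and duration against the declared spec."""
    video = next(s for s in probe["streams"] if s.get("codec_type") == "video")
    findings = []

    actual_size = (video["width"], video["height"])
    declared_size = (declared["width"], declared["height"])
    if actual_size != declared_size:
        findings.append(f"resolution {actual_size} != declared {declared_size}")

    # ffprobe reports frame rate as a ratio string such as "60/1".
    fps = Fraction(video["r_frame_rate"])
    if fps != declared["fps"]:
        findings.append(f"fps {float(fps):g} != declared {declared['fps']}")

    duration = float(video["duration"])
    if abs(duration - declared["duration"]) > duration_tolerance:
        findings.append(
            f"duration {duration:.2f}s != declared {declared['duration']}s"
        )

    return findings
```

Any non-empty result would count as a finding for the report; how each one maps to "severe" versus "warning" is left to the real validator.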
| Layer | How |
|---|---|
| Brand | Pick a built-in preset, or paste your own design.md |
| Aspect ratio | Phase 1 question — picks defaults per channel |
| Mode | pure-broll-product-demo / aroll-broll-hybrid / aroll-only |
| Storyboard | Drafted by the model; you approve / change / stop |
| Length | Recommended per content profile + channel sweet spot |
| Recipe | Drop a new templates/<name>.json to add a recipe; auditor enforces the schema |
| Design preset | Add assets/style-presets/<name>/design.md |
| Known agent | Append to references/agent-list.md (brand color + logo hint) |
- CHANGELOG.md: full per-PR history of changes
- SMOKE_TEST.md: per-feature test commands
MIT
