# OpenMontage - Agent Guide Start here. This is the complete operating guide and agent contract for OpenMontage. For architecture, key files, and conventions see [`PROJECT_CONTEXT.md`](PROJECT_CONTEXT.md). ## First Interaction — Onboarding When the user's first message is vague, exploratory, or asks what you can do ("make me a video", "what can you do?", "help me create something", "I want to make content"), read the onboarding skill **before** doing anything else: **Read:** `skills/meta/onboarding.md` This skill teaches you to run discovery, classify the user's setup, present capabilities in plain language, and offer starter prompts tailored to their available tools. The goal: get the user from "curious" to "making a video" in under 60 seconds. **Skip onboarding** when the user arrives with a specific, actionable request (e.g., "Make a 60-second explainer about black holes"). Go directly to Rule Zero. ## Reference Video Entry Point When the user provides a **video URL or local video file as inspiration** — for example: - "Can you make a video like this?" - "I love this YouTube Short. Make me something similar." - "Use this Reel as a reference." — do **not** treat this as a generic web-search or prompt-writing request. This is a first-class workflow in OpenMontage. ### Required behavior 1. **Read:** `skills/meta/video-reference-analyst.md` 2. **Run the reference analysis workflow** using the local analysis tools (`video_analyzer`, transcript extraction, scene detection, frame sampling) 3. **Produce a grounded summary** of what the reference is doing: - content - pacing - structure - style - what makes it work 4. **Then** run normal capability audit and pipeline selection 5. Present **2-3 differentiated concepts** for the user's version — not a carbon copy ### Important distinction - **Reference-driven request:** "make me something like this" -> use `video-reference-analyst.md` - **Source-footage request:** "edit this footage" / "cut this into clips" -> use `source_media_review` and the appropriate footage-led pipeline If a model misses this distinction, it will often fall back to plain search + guesswork. That is incorrect for OpenMontage. ## Rule Zero — All Production Goes Through a Pipeline **Every video production request MUST go through the pipeline system. No exceptions.** When the user asks to make, create, produce, or generate any video content — a trailer, explainer, clip, animation, or any other video — the agent must: 1. **Identify the pipeline.** Match the request to one of the pipelines in `pipeline_defs/`. If unclear, ask the user. 2. **Read the pipeline manifest.** `pipeline_defs/.yaml` — know the stages, tools, and quality gates. 3. **Run preflight.** Discover available tools via the registry. Present the capability menu. 4. **Execute stage by stage.** For EACH stage, read the stage director skill (`skills/pipelines//-director.md`) BEFORE doing any work in that stage. 5. **Read Layer 3 skills before calling tools.** Before using any tool with an `agent_skills` field, read the referenced skill in `.agents/skills/`. These contain provider-specific prompting guidance, parameter optimization, and quality techniques that dramatically improve output. **Do NOT:** - Write ad-hoc Python scripts to call tools directly - Skip the pipeline and go straight to API calls - Generate assets without reading the stage director skill first - Use a tool without checking its Layer 3 skill for prompting guidance - Bypass preflight, checkpoints, or review The intelligence is in the skills, not in improvised code. An agent that reads the director skills and Layer 3 knowledge will produce significantly better output than one that calls tools directly with generic prompts. ## What OpenMontage Is OpenMontage is an instruction-driven video production system. The AI agent IS the intelligence — it reads instructions (pipeline manifests + stage director skills + meta skills) and drives the pipeline using tools. ``` Agent reads pipeline manifest (YAML) -> reads stage director skill (MD) -> uses tools (Python BaseTool subclasses) -> self-reviews (meta skill) -> checkpoints (Python utility) -> presents to human for approval ``` **Python = tools + persistence.** No orchestration logic, creative decisions, review logic, or checkpoint policy in Python code. The agent makes those decisions guided by instructions. Core loop: 1. Select a pipeline. 2. Run preflight. 3. Discover real tools from the registry. 4. Present the user with concepts, tool plan, production plan, and cost. 5. Execute stage by stage with checkpoints. ## Decision Communication Contract For any meaningful production decision, the agent must communicate the decision before acting. The user should never have to infer which provider, model, or render path was chosen after the fact. ### Announce Before Execution Before any paid or consequential generation call, state: - the exact tool name, - the provider, - the model or provider variant, - the reason it was chosen, - whether it is a sample or a batch run. ### Ask Before Major Changes The agent must ask the user before changing any major production choice, including: - switching provider, - switching model family or provider variant, - switching from video-led to still-led treatment, - switching composition engine when that changes the output character, - dropping narration, music, or other approved creative elements, - changing from sample mode to batch mode. Minor prompt refinements inside an already approved provider/model path do not require separate approval unless they materially change the creative direction. ### Present Both Composition Runtimes (HARD RULE) When both Remotion and HyperFrames are available on the machine (check `video_compose.get_info()["render_engines"]`), the agent **MUST present both options to the user** before locking `render_runtime` at the proposal stage. The agent MAY recommend one with rationale — but silently picking a "default" is forbidden even when the pipeline manifest or a director skill suggests one. The presentation MUST include, for each runtime: 1. A one-sentence plain-language description of what it is best at for **this specific brief**. 2. A one-sentence honest tradeoff (why it might not be the right pick here). 3. The agent's recommendation and the reason, tied to the brief's delivery_promise and visual approach. Then wait for explicit user approval before advancing. Record the full shortlist — BOTH runtimes plus any "ffmpeg" option that applies — as `options_considered` in the `render_runtime_selection` decision logged in `decision_log`. A decision log entry with only one runtime considered when both were available is a CRITICAL reviewer finding. Exception: if only one runtime is available on the machine, the agent proceeds with it but MUST say so explicitly ("HyperFrames isn't installed on this machine; I'm proceeding with Remotion. Install HyperFrames if you want the alternative."). The `render_runtime_selection` decision still records the unavailable option as `rejected_because: "runtime not available on this machine"`. This rule applies to every pipeline that invokes `video_compose` — not just Wave 1. A pipeline's director skill may recommend a runtime, but that recommendation is input to the conversation with the user, not a decision. ### Escalate Blockers Explicitly When a blocker occurs, the agent must surface it immediately using this structure: 1. What was attempted 2. What failed 3. Whether the issue is auth, provider access, tool bug, or prompt/design quality 4. What options exist next 5. Which option the agent recommends, with reasoning Do not continue with a substitute path until the user approves. ### Recommendation Style When asking the user to choose, do not just list options. The agent should: - provide the shortlist, - explain the tradeoffs briefly, - recommend one option, - wait for approval before proceeding. ### No Unilateral Substitutions If the approved path is blocked, the agent may investigate and prepare alternatives, but may not execute those alternatives without user approval. This applies especially to: - provider swaps, - model swaps, - fallback tools, - prompt-only substitutes for reference-driven generation, - still-image animatics in place of true motion. ## Orchestrator The agent itself orchestrates the production state machine: `research -> proposal -> script -> scene_plan -> assets -> edit -> compose` The agent: 1. Reads the pipeline manifest (`pipeline_defs/*.yaml`) to know the process 2. Calls `checkpoint.get_next_stage()` to find where to resume 3. Reads the stage's director skill (`skills/pipelines//-director.md`) to know HOW 4. Uses tools (`tools/`) for concrete capabilities 5. Self-reviews using the reviewer meta skill (`skills/meta/reviewer.md`) 6. Checkpoints via the checkpoint protocol (`skills/meta/checkpoint-protocol.md`) 7. Presents to human for approval when `human_approval_default: true` Infrastructure files: - `lib/checkpoint.py` — read/write checkpoints, stage validation - `tools/cost_tracker.py` — budget governance - `lib/pipeline_loader.py` — manifest loading and helpers ## Project Directory Convention Every production run creates a project workspace under `projects/`. This directory is gitignored — all generated assets are regenerable. ``` projects// ├── artifacts/ # JSON artifacts from each stage (research_brief, script, scene_plan, etc.) ├── assets/ │ ├── images/ # Generated images (PNG) │ ├── video/ # Generated video clips (MP4) │ ├── audio/ # Narration segments + final mix (MP3/WAV) │ ├── music/ # Background music track (MP3) │ └── subtitles.srt # Generated subtitles └── renders/ └── final.mp4 # Final rendered video (the deliverable) ``` **Naming convention**: Use kebab-case derived from the video title (e.g., `hidden-math-of-nature`, `how-music-rewires-brain`). Create the project directory at pipeline initialization, before any stage runs. All tools and agents should write outputs to these paths — never to the repo root or ad-hoc locations. ## Music Library Users can place royalty-free music tracks in `music_library/` (gitignored). The asset director will check this folder before falling back to API-based music generation. ``` music_library/ ├── ambient_track.mp3 ├── cinematic_epic.mp3 └── ... ``` If the folder has tracks, the proposal and asset stages should present them as options alongside generated music. See the proposal-director and asset-director skills for details. ## Available Pipelines | Pipeline | Best For | Stability | |----------|----------|-----------| | `animated-explainer` | Topic to fully generated explainer | production | | `talking-head` | Footage-led speaker videos | beta | | `screen-demo` | Screen recordings and walkthroughs | production | | `clip-factory` | Many clips from one long source | beta | | `podcast-repurpose` | Podcast highlights and derivatives | beta | | `cinematic` | Trailer, teaser, and mood-led edits | production | | `animation` | Motion-graphics and animation-first videos | production | | `character-animation` | Local rigged cartoon characters and reusable character acting | beta | | `hybrid` | Source footage plus support visuals | production | | `avatar-spokesperson` | Presenter-led avatar or lip-sync videos | production | | `localization-dub` | Subtitle, dub, and translated variants | beta | | `framework-smoke` | Test: minimal 2-stage smoke test | test | > **Beta pipelines** have not been fully audited. They work, but expect rough edges. Mention this when the user selects one. ## Mandatory Preflight Do this before any creative work. **Use `provider_menu_summary()` first — it's the human-ready rollup.** The raw `support_envelope()` dump is a firehose (megabytes of JSON on a well-configured machine); pasting it into chat will bury the user. ```bash python -c " from tools.tool_registry import registry import json registry.discover() print(json.dumps(registry.provider_menu_summary(), indent=2)) " ``` The summary returns four fields the agent should translate into plain language: - `composition_runtimes` — booleans for `ffmpeg`, `remotion`, `hyperframes`. This is the source of truth for the "Present Both Composition Runtimes (HARD RULE)" check. - `capabilities[]` — one entry per capability family with `configured / total` counts and provider lists. Ready-made for the "N of M configured" menu. - `setup_offers[]` — unavailable tools whose install is a 1-minute env-var fix. Lead with these when offering upgrades. - `runtime_warnings[]` — specific signals like "hyperframes: npm package not resolvable". Surface these to the user verbatim — they're the kind of silent-failure bugs that break the governance contract. Then, for deeper inspection (only when the summary isn't enough): ```bash # Full menu — grouped available/unavailable per capability. python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_menu(), indent=2))" # Raw envelope — every tool's full contract. Slow/firehose; use for debugging only. python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.support_envelope(), indent=2))" ``` Then: 1. Read the selected manifest in `pipeline_defs/`. 2. Check every `required_tools` entry against the registry. 3. Check `fallback_tools` for unavailable tools. 4. Report one of: `passed`, `degraded`, or `blocked`. 5. Do not start production until the user understands the real capability envelope. ### Provider Menu (Mandatory at Preflight) Already fetched via `provider_menu_summary()` above. Read that output and **present it to the user as a capability menu**, not as a flat tool list. Use `provider_menu()` directly only when you need the per-tool detail the summary collapses. **How to present:** ``` YOUR CAPABILITIES Video Generation: 0/13 configured Image Generation: 1/7 configured Text-to-Speech: 1/3 configured Music Generation: 1/1 configured Composition: 3/3 configured (FFmpeg, video_stitch, video_trimmer) You can produce videos now with images + TTS + FFmpeg. Quick upgrades available — see below. ``` For EACH capability with unavailable providers, read the `install_instructions` field from the menu output and present setup options grouped by effort: ``` QUICK SETUP OPTIONS (1-minute each — set an env var in .env) Video Generation (0/13 -> unlock the biggest upgrade): Each unavailable provider lists its own install_instructions. Read them from the provider_menu output and present grouped by env var. Example: if 3 tools need FAL_KEY, group them: "FAL_KEY unlocks 3 providers" Image Generation (1/7 -> more style options): Same pattern — read install_instructions from each unavailable tool. Text-to-Speech (1/3): Same pattern. LOCAL OPTIONS (free, needs hardware): Tools with runtime=LOCAL or runtime=LOCAL_GPU — read from the menu. Already Available: List what's working. The user should feel good about what they have. ``` **Rules:** - Do NOT hardcode provider names, API key names, or setup URLs in your prompts. Read them from the registry's `install_instructions` field on each tool. - Always show the ratio: "X of Y configured" — this makes breadth visible. - Group by capability, not by individual tool. - Show what they CAN do now, then what they COULD unlock. - If the user declines setup, proceed with the best available path — no nagging. - If a tool shares an env var with others, group them (read from `dependencies` field). ### Setup Offer Protocol When tools are `UNAVAILABLE` but can be fixed with simple configuration, **offer the user setup help instead of silently working around the limitation.** Many tools are one env var away from working. | Fix Complexity | Action | |----------------|--------| | **1-minute fix** (env var) | Offer to help configure now — read `install_instructions` from the tool | | **5-minute fix** (install) | Explain what to install and why — read `install_instructions` from the tool | | **Complex fix** (GPU, model download) | Note the limitation, explain what it would unlock, move on | **Rules:** - Always tell the user what they're missing AND what they'd gain - Show the cost difference (free local vs. paid API) - If the user declines setup, proceed with the best available path — no nagging - Group related fixes (tools sharing the same env var dependency) ### Composition Runtimes (Inside video_compose) `video_compose` has **three** render engines / runtimes. They are parallel, not ranked — the choice is made at proposal and locked in `edit_decisions.render_runtime`. Check which are available: ```bash python -c " from tools.tool_registry import registry registry.discover() info = registry._tools['video_compose'].get_info() print('Render engines:', info.get('render_engines')) print('Remotion note:', info.get('remotion_note')) print('HyperFrames note:', info.get('hyperframes_note')) " ``` | Engine | Used For | Requires | |--------|----------|----------| | **FFmpeg** | Video-only cuts, concat, trim, subtitle burn | `ffmpeg` binary (always available) | | **Remotion** | React-based composition: still images → animated video, text cards, stat cards, charts, callouts, comparisons, transitions with spring physics, word-level caption burn, TalkingHead avatar | Node.js (`npx`) + `remotion-composer/` + `node_modules` | | **HyperFrames** | HTML/CSS/GSAP composition: kinetic typography, product promos, launch reels, website-to-video, registry-block-driven scenes, SVG character rigs | Node.js ≥ 22 + FFmpeg + `npx` (consumed via `npx hyperframes`) | `render_runtime` is **locked at proposal** (`proposal_packet.production_plan.render_runtime`) and **carried through edit_decisions unchanged**. `video_compose` routes based on this field; silent runtime swaps are forbidden. If the chosen runtime becomes unavailable at compose time, surface a structured blocker per "Escalate Blockers Explicitly" above. See `skills/core/hyperframes.md` for the Remotion-vs-HyperFrames decision matrix. ### Critical Rule: Motion-Required Requests For any request where the deliverable inherently depends on motion rather than static coverage, treat motion as a hard requirement. Examples: - sci-fi trailers, - cinematic teasers built from generated clips, - hype edits, - avatar or agent videos, - any brief whose promise depends on moving shots rather than still frames. For these requests: - The `render_runtime` chosen at proposal (Remotion, HyperFrames, or FFmpeg) must be confirmed available up front if the planned visual treatment depends on it. - Still-image fallback is forbidden. Do not quietly convert the job into a Ken Burns teaser, animatic, or slide-based video. - FFmpeg-only fallback is forbidden when it changes the approved deliverable from motion-led video to still-led video. - **Silent runtime swap is forbidden.** If `render_runtime="hyperframes"` was locked and HyperFrames is unavailable, do NOT route to Remotion instead. Surface the blocker, propose options, get user approval, log a `render_runtime_selection` decision — then proceed. - Bubble critical issues immediately. If the chosen runtime is unavailable, fails to render, or provider clip generation fails in a way that blocks the approved treatment, stop and tell the user before proceeding. - Do not spend more tokens or time on downgraded output unless the user explicitly approves the downgrade as an animatic or proof-of-concept. **When Remotion is available**, the agent should design production plans around it: - Explainer videos with `flat-motion-graphics` playbook -> Remotion animated scenes, not Ken Burns - Data-driven videos -> Remotion stat cards and charts, not static image screenshots - Any pipeline using still images -> Remotion spring animations, not FFmpeg pan-and-zoom - **Screen demos of a CLI/terminal/install flow -> `TerminalScene` (synthetic screen recording), not OS-level capture.** See `.agents/skills/synthetic-screen-recording/SKILL.md`. Faster, deterministic, privacy-safe. Use real capture (`screen_recorder`, `cap_recorder`, `playwright-recording`) only when the demo is a real app UI or requires unpredictable live behavior. ### Remotion scene types available in `remotion-composer/` See `remotion-composer/SCENE_TYPES.md` for the authoritative list and their cut schemas. Current scene types usable via `cut.type`: `text_card`, `stat_card`, `callout`, `comparison`, `hero_title`, `terminal_scene`, `anime_scene`, `bar_chart`, `line_chart`, `pie_chart`, `kpi_grid`, `progress_bar`. Overlay types include `section_title`, `stat_reveal`, `hero_title`, `provider_chip`. **When Remotion is NOT available** and `render_runtime="remotion"` was NOT locked, `video_compose` may use FFmpeg Ken Burns motion on still images. This still works but produces less engaging visuals. Mention this tradeoff in the proposal. When `render_runtime="remotion"` IS locked and Remotion is unavailable, that's a blocker — escalate, don't silently swap. When `render_runtime="hyperframes"` is locked and HyperFrames is unavailable (Node < 22, missing `ffmpeg`/`npx`, or `hyperframes doctor` reports issues), that's also a blocker. Do not substitute Remotion or FFmpeg without user approval + a logged `render_runtime_selection` decision. Routing is automatic — `video_compose` reads `edit_decisions.render_runtime` and dispatches to the matching engine (`_render_via_hyperframes`, `_remotion_render`, or `_render_via_ffmpeg`). But the **agent must know both Remotion and HyperFrames exist at proposal time** so it can design the visual approach intentionally. Don't default to Remotion for motion-graphics-heavy concepts that HTML/GSAP would express more naturally, and don't default to HyperFrames for briefs that reuse the existing React scene stack. ## Capability Discovery OpenMontage uses two layers for capability choice: - selector tools: capability-level routing such as `tts_selector` and `video_selector` - provider tools: concrete tools discovered via the registry that call a specific backend Always inspect the registry first: ```bash python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.capability_catalog(), indent=2))" python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_catalog(), indent=2))" ``` For finalist tools inspect: - `capability` - `provider` - `usage_location` - `supports` - `fallback_tools` - `related_skills` Do not rely on memory or old docs when the registry can answer it. ## Tool Families **Do not maintain hardcoded tool lists.** Always query the registry at runtime: ```bash # See all tools grouped by capability (TTS, video_generation, image_generation, etc.) python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.capability_catalog(), indent=2))" # See all tools grouped by provider (elevenlabs, openai, ffmpeg, etc.) python -c "from tools.tool_registry import registry; import json; registry.discover(); print(json.dumps(registry.provider_catalog(), indent=2))" ``` Key capability families to look for in the output: - **tts** — Text-to-speech providers. Route via `tts_selector`. - **video_generation** — Video generation providers (cloud, local GPU, stock). Route via `video_selector`. - **image_generation** — Image generation providers (cloud, local GPU, stock). Route via `image_selector`. - **music_generation** — Music and sound effect generation. - **video_post** — Composition, stitching, trimming (FFmpeg-based, always local). - **audio_processing** — Mixing, enhancement (FFmpeg-based, always local). - **analysis** — Transcription, scene detection, frame sampling. - **avatar** — Talking head and lip sync generation. - **character_animation** — Local character specs, SVG rigs, pose libraries, action timelines, previews, and QA. - **enhancement** — Upscale, background removal, face enhance, color grading. Each tool in the registry declares `best_for`, `install_instructions`, `runtime` (LOCAL, API, LOCAL_GPU, HYBRID), and `status`. Read these fields — do not assume tool strengths from memory. ### Tool Class Naming Convention All tool classes use **PascalCase without a "Tool" suffix**. When importing tools in Python: | Module | Class Name | NOT | |--------|-----------|-----| | `tools.audio.music_gen` | `MusicGen` | ~~MusicGenTool~~ | | `tools.video.video_compose` | `VideoCompose` | ~~VideoComposeTool~~ | | `tools.audio.audio_mixer` | `AudioMixer` | ~~AudioMixerTool~~ | | `tools.tts.elevenlabs_tts` | `ElevenLabsTTS` | ~~ElevenLabsTTSTool~~ | | `tools.analysis.transcriber` | `Transcriber` | ~~TranscriberTool~~ | | `tools.subtitle.subtitle_gen` | `SubtitleGen` | ~~SubtitleGenTool~~ | When in doubt, check: `grep "^class " tools/.py` All tools call via `.execute(params_dict)` (returns `ToolResult` with `.success`, `.data`, `.error`), NOT `.run()`. ### Selector Pattern Three selector tools abstract multi-provider capabilities. **Selectors auto-discover providers from the registry.** Adding a new provider tool automatically makes it available through the selector — no selector code changes needed. | Selector | Routes to | How it discovers | |----------|-----------|-----------------| | `tts_selector` | All tools with `capability="tts"` (ElevenLabs, Google TTS, OpenAI, Piper) | `registry.get_by_capability("tts")` | | `image_selector` | All tools with `capability="image_generation"` (FLUX, Google Imagen, DALL-E, Recraft, etc.) | `registry.get_by_capability("image_generation")` | | `video_selector` | All tools with `capability="video_generation"` | `registry.get_by_capability("video_generation")` | Selectors route based on: user preference > availability > discovery order. They adapt input schemas between providers transparently. ## User-Facing Planning Protocol Before committing to execution, present: 1. `4-5` concept directions when the brief is still open. 2. Recommended pipeline. 3. Recommended tool path. 4. Alternative tool paths that are actually available. 5. Cost estimate and quality tradeoffs. 6. **Music plan** — mandatory for every pipeline that has audio. See below. 7. Production plan by stage. 8. Approval gate before asset generation. If a user prefers a specific vendor and that tool is available, surface it directly. Do not hide provider choice. ### Music Plan (Mandatory) Music is a critical part of any video. **Surface the music situation to the user at proposal/idea time** — do not silently defer it to the asset stage where a failure becomes expensive. Check music availability in this order and present the options: 1. **User music library (`music_library/`):** Check if this folder exists and contains tracks. If so, list available tracks with durations and let the user pick one. 2. **Music generation APIs:** Check which music tools are available via the registry (`registry.get_by_capability("music_generation")`). Report their status honestly — include quota status if known. 3. **Royalty-free sources:** Note if the user can provide their own track (e.g., from YouTube Audio Library, Jamendo, or other free sources). Offer the `music_library/` drop path. **Always present the user with explicit choices:** - Use a track from their library (which one?) - Provide a different track (drop it in `music_library/`) - Generate one via API (if available — name the provider and cost) - Proceed without music **If no music source is available:** Tell the user explicitly. Do NOT let this surface as a surprise at the asset stage. Record the music decision in the proposal/brief artifact so the asset director knows what to do. ## Pipeline Asset Expectations Each pipeline manifest's `tools_available` field declares what tools a stage can use. Use selectors for multi-provider capabilities — the selector handles routing to whatever is available. Read the pipeline manifest for the authoritative list per stage. ## Stage Agents Each stage produces one canonical artifact that becomes the contract for the next stage. The stage director skill teaches the agent HOW to produce it. | Stage | Director Skill | Canonical output | Core quality bar | |------|---------------|------------------|------------------| | `idea` | `*-director.md` | `brief` | Clear hook, target platform, duration, tone, and user intent | | `script` | `*-director.md` | `script` | Structured sections, valid timing, coherent narration | | `scene_plan` | `*-director.md` | `scene_plan` | Ordered scenes, timings, asset requirements | | `assets` | `*-director.md` | `asset_manifest` | Provenance, paths, model/tool metadata, scene linkage | | `edit` | `*-director.md` | `edit_decisions` | Concrete cuts, overlays, subtitle/music decisions | | `compose` | `*-director.md` | `render_report` | Output paths, encoding profile, verification notes | Stage contract rules: - A completed or awaiting-human checkpoint must include the stage's canonical artifact. - Canonical artifacts must validate against the JSON schema in `schemas/artifacts/`. - Non-canonical outputs such as media files belong in stage-specific directories. - Tools should record seeds/model versions for reproducibility. ## Reviewer Protocol The reviewer is a meta skill (`skills/meta/reviewer.md`) — advisory, never directly blocks progression. - Self-review after every stage execution, before checkpointing. - Load `review_focus` items from the pipeline manifest for the current stage. - Maximum two review rounds. After that, pass with warnings and move on. - Findings categorized: critical (must fix), suggestion (should fix), nitpick (nice-to-have). - Critical findings -> fix and re-review. Suggestions -> note and proceed. - Check playbook `quality_rules` as constraints, not suggestions. ## Human Checkpoint Protocol The checkpoint protocol meta skill (`skills/meta/checkpoint-protocol.md`) teaches the agent when to pause: - Read `human_approval_default` from the pipeline manifest per stage - Creative stages (`idea`, `script`, `scene_plan`) typically require approval - Technical stages (`assets`, `edit`, `compose`) typically auto-proceed - When approval is required: present artifact summary, review findings, and cost snapshot - Wait for human to approve, request revision, or abort ## Communication Protocol Agents coordinate through canonical JSON artifacts, checkpoints, pipeline manifests, and the tool registry. Primary files: - Artifact schemas: `schemas/artifacts/` - Checkpoint schema: `schemas/checkpoints/checkpoint.schema.json` - Pipeline manifest schema: `schemas/pipelines/pipeline_manifest.schema.json` - Pipeline manifests: `pipeline_defs/` - Style playbooks: `styles/*.yaml` (validated by `schemas/styles/playbook.schema.json`) - Tool contract: `tools/base_tool.py` - Tool registry: `tools/tool_registry.py` - Stage director skills: `skills/pipelines//-director.md` - Meta skills: `skills/meta/*.md` Checkpoint rules: - Checkpoints live at `pipelines//checkpoint_.json`. - `status` may be `completed`, `failed`, `awaiting_human`, or `in_progress`. - `completed` and `awaiting_human` checkpoints must include the canonical artifact. - Invalid checkpoints or invalid canonical artifacts are contract violations and should fail fast. Pipeline manifest rules: - Pipelines are declarative YAML manifests in `pipeline_defs/`. - Stages declare: `skill` (director skill path), `produces`, `tools_available`, `review_focus`, `success_criteria`, `human_approval_default`. - Adding a new pipeline requires a manifest + stage director skills. Tool rules: - Every production tool must inherit from `BaseTool`. - Tool discovery flows through the registry, not ad hoc imports. - Support-envelope reporting is the source of truth for capability, status, and resource requirements. ## Style Playbooks | Playbook | Best For | |----------|----------| | `clean-professional` | Corporate, educational, SaaS | | `flat-motion-graphics` | Social media, TikTok, startups | | `minimalist-diagram` | Technical deep-dives, architecture | ## Layer Map OpenMontage has three instruction layers: 1. `tools/` What exists, what is available, cost, runtime, fallback, related skills. 2. `skills/` How OpenMontage wants those tools used in pipelines. 3. `.agents/skills/` Raw vendor or technology knowledge. Reading order: 1. registry / tool contract — discover what's available 2. relevant pipeline or creative skill (Layer 2) — know HOW to use it in this context 3. underlying vendor skill (Layer 3) — **mandatory before calling any generation tool** **Prefer skills over source code for tool usage.** Skills exist precisely so you don't need implementation details in the common case. Layer 2 tells you *what* and *when*. Layer 3 tells you *how*. For authoring prompts, choosing parameters, or understanding usage patterns, you should be reading skills — not `.py` files. **Exception: debugging, audits, and verifying the governance contract.** When a skill and a tool disagree, or when something behaves differently than the skill claims, reading the tool source is fair game — that's often the only way to catch a silent-availability bug or a stale doc string. An audit that refuses to look at the implementation will miss exactly the bugs that matter most. If you do read source to debug, consider whether the finding belongs in a skill update afterward so the next agent doesn't need to repeat the dive. **Layer 3 is not optional.** Every generation tool (video, image, TTS, music) has an `agent_skills` field listing its Layer 3 skills. These skills contain provider-specific prompt engineering, parameter tuning, and quality techniques. Read them before writing prompts. The difference between a generic prompt and a skill-informed prompt is the difference between "usable" and "cinematic." Example: Before calling `kling_video`, read its `agent_skills` → `ai-video-gen` → get Kling-specific prompt structure, camera direction syntax, and quality keywords that the model responds to best. ### Layer 3 skills, by category The `.agents/skills/` directory is large. When you're not coming in through a tool's `agent_skills` pointer, use this table to find the right file by *what you're trying to do*: | Category | Skills | |---|---| | **Composition runtime** | `remotion`, `remotion-best-practices`, `synthetic-screen-recording` (fake terminal/UI demos via Remotion TerminalScene) | | **Animation knowledge (generic)** | `gsap-core`, `gsap-timeline`, `gsap-plugins` (SplitText / MorphSVG / DrawSVG / MotionPath / Flip / CustomEase), `gsap-utils`, `gsap-react`, `gsap-performance`, `gsap-scrolltrigger`, `gsap-frameworks`, `framer-motion` (Disney 12 principles), `lottie-bodymovin` (Lottie export) | | **Character animation** | `character-rigging`, `svg-character-animation`, `pose-library-design`, `canvas-procedural-animation`, `character-animation-qa` | | **Image generation** | `bfl-api`, `flux-best-practices` | | **Video generation** | `seedance-2-0` (preferred premium default — cinematic, trailer, multi-shot, synced audio, lip-sync), `ai-video-gen`, `ltx2` | | **Audio** | `elevenlabs`, `music`, `sound-effects`, `acestep`, `text-to-speech`, `setup-api-key` | | **Avatar / lip-sync** | `avatar-video`, `heygen`, `create-video`, `faceswap`, `video-translate`, `speech-to-text`, `agents` | | **Capture** | `playwright-recording` (browser flows), `ffmpeg` (post) | | **Visualization** | `beautiful-mermaid`, `d3-viz`, `manim-composer`, `manimce-best-practices`, `manimgl-best-practices` | | **Media editing** | `video-edit`, `video-download`, `video-understand`, `video_toolkit`, `visual-style` | **When in doubt, read the category's meta routing file first:** - Picking an animation runtime? → `skills/meta/animation-runtime-selector.md` routes between Remotion primitives, GSAP plugins, framer-motion, Lottie, Manim, D3. - Picking a screen-recording mode (real capture vs synthetic terminal)? → `pipeline_defs/screen-demo.yaml` + `skills/pipelines/screen-demo/idea-director.md`. ## Quick Lookup | Question | Where to look | |----------|---------------| | What tools exist? | `tools/tool_registry.py` and `registry.support_envelope()` | | What providers are available for a capability? | `registry.capability_catalog()` | | What tools exist for a vendor? | `registry.provider_catalog()` | | How does a tool actually work? | the tool's `usage_location` from the registry | | How should this pipeline stage behave? | `skills/pipelines//...` | | What is the checkpoint/review policy? | `skills/meta/` | ## What Not To Do - **Do not bypass the pipeline.** Never write ad-hoc scripts to call tools directly. All production goes through pipeline stages with director skills. See Rule Zero. - **Do not call generation tools without reading their Layer 3 skill.** Check the tool's `agent_skills` field, read the referenced skill, then craft your prompts using that guidance. - **Do not skip stage director skills.** Before executing any pipeline stage, read its director skill. The skill contains the quality bar, the workflow, and the review criteria. - Do not use deleted legacy names such as `tts_cloud`, `tts_engine`, or `video_gen`. - Do not hardcode provider names, API key names, or setup URLs. Read them from the registry's `install_instructions` and `dependencies` fields. - Do not begin asset generation before user approval on the production plan. - Do not hide degraded paths. Record substitutions and blocked options explicitly. - Do not present a single unavailable tool in isolation. Always show the full capability picture: "X of Y providers configured for this capability." - Do not skip the Provider Menu at preflight. The user must see what they have AND what they could unlock. - Do not change provider, model, or render path without telling the user first and getting approval when the change is material.