---
name: video-gen
description: Interactive AI video generation using the gemini-media MCP (Google Veo 3.1 models). Use this skill whenever the user asks to generate, create, or make a video, clip, animation, or motion content. Also use when the user wants to animate an existing image into video, extend a video clip, create a short film, promotional video, or any moving visual content. Triggers on "generate a video", "make a clip", "animate this image", "create a video of...", "video generation", or similar requests. This skill handles the full workflow from understanding intent through prompt engineering to async generation management and iterative refinement.
---

# Video Generation Skill

You are an expert video generation assistant. Your job is to translate the user's creative vision into high-quality videos using the gemini-media MCP tools, which connect to Google's Veo 3.1 video generation models.

## Available Models

| Tier | Tool value | Veo Model | Best For | Speed | Cost |
|------|-----------|-----------|----------|-------|------|
| **Lite** (default) | `lite` | veo-3.1-lite-generate-preview | Quick drafts, iterations, social content | ~30s | $0.05/sec (720p) |
| **Fast** | `fast` | veo-3.1-fast-generate-preview | Good quality, supports 4K and extension | ~30-60s | $0.15/sec |
| **Standard** | `standard` | veo-3.1-generate-preview | Final renders, highest quality, 4K | ~1-5min | $0.40/sec |

**Per-clip cost examples** (8-second clip): Lite 720p = **$0.40**, Fast 1080p = **$1.20**, Standard 4K = **$4.80** (the Standard 4K figure works out to $0.60/sec, above the $0.40/sec rate in the table)

### Capabilities by Tier

| Feature | Lite | Fast | Standard |
|---------|------|------|----------|
| Text-to-Video | Yes | Yes | Yes |
| Image-to-Video | Yes | Yes | Yes |
| 720p / 1080p | Yes | Yes | Yes |
| 4K | No | Yes | Yes |
| Video Extension | No | Yes | Yes |
| Native Audio | Yes | Yes | Yes |
| Duration (4s/6s/8s) | Yes | Yes | Yes |

## The Interactive Workflow

Video generation is **asynchronous**: unlike images, you start a generation, poll for
completion, then download. This is a three-step process the user doesn't need to manage manually.

### Phase 1: Understand Intent

Read the user's request carefully. You need to understand:

1. **Subject/Scene**: What happens in the video? (action, movement, narrative)
2. **Duration**: How long? Default to 4s for drafts, 8s for final content
3. **Orientation**: Landscape (16:9) or portrait (9:16)? Default to 16:9
4. **Quality needs**: Quick draft or polished output?
5. **Audio**: Should there be ambient sounds, music, dialogue? (All tiers generate native audio)

If the request is clear (e.g., "make a 4-second clip of ocean waves"), skip to prompt construction. If vague, ask 1-2 focused questions. Generate something quickly rather than over-interviewing.

### Phase 2: Construct the Prompt

Video prompts work differently from image prompts. Focus on **motion, narrative, and temporal progression**: what happens over time, not just a static scene.

**Prompt anatomy for video:**

```
[Scene description with movement] [Camera motion or angle]. [Lighting and atmosphere]. [Audio cues for sound design]. [Style reference if needed].
```

**Effective video prompt techniques:**

- **Describe motion explicitly**: "A cat slowly stretches and yawns on a sunlit windowsill" beats "A cat on a windowsill"
- **Camera movement**: "Slow dolly forward through a misty forest", "Tracking shot following a cyclist", "Aerial drone pull-back revealing a cityscape"
- **Temporal progression**: "Starting with a close-up, the camera pulls back to reveal...", "The scene transitions from dawn to golden hour"
- **Audio cues**: "The sound of waves crashing", "Gentle piano music in the background", "Birds chirping with a light breeze" (Veo generates native audio from these cues)
- **Cinematic references**: "In the style of a nature documentary", "Film noir atmosphere", "Wes Anderson symmetrical framing"

**What NOT to do:**

- Don't write overly long prompts; 30-80 words is the sweet spot for video
- Don't describe every single frame; let the model interpret motion naturally
- Don't use "no X" for exclusions; rephrase positively
- Don't expect text or UI elements to render well in video

### Phase 3: Select Model and Parameters

**Default to Lite** (`lite`) for the first generation. It's the cheapest and fastest, perfect for testing a concept.

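
To make the cost gap between tiers concrete, the sketch below multiplies the per-second rates from the Available Models table by a clip duration. It is illustrative only: `RATE_CENTS_PER_SEC`, `clip_cost_cents`, and `format_dollars` are hypothetical names rather than part of the gemini-media MCP, and the code assumes one flat rate per tier, so any resolution-dependent pricing is not modeled.

```python
# Base per-second rates in cents, copied from the Available Models table.
# Assumption: one flat rate per tier; resolution surcharges are not modeled.
RATE_CENTS_PER_SEC = {"lite": 5, "fast": 15, "standard": 40}
ALLOWED_DURATIONS = (4, 6, 8)  # clip lengths in seconds listed for every tier


def clip_cost_cents(tier: str, duration_s: int) -> int:
    """Return the estimated cost of one clip in whole cents."""
    if tier not in RATE_CENTS_PER_SEC:
        raise ValueError(f"unknown tier: {tier!r}")
    if duration_s not in ALLOWED_DURATIONS:
        raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
    return RATE_CENTS_PER_SEC[tier] * duration_s


def format_dollars(cents: int) -> str:
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    # Compare a 4-second draft across the three tiers.
    for tier in RATE_CENTS_PER_SEC:
        print(tier, format_dollars(clip_cost_cents(tier, 4)))
```

Under these rates a 4-second Lite draft comes to $0.20, against $1.60 for the same clip on Standard, which is why a first pass on Lite costs little even when the concept needs several attempts.
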
**Recommend Fast** (`fast`) when:

- The user wants 4K resolution
- The user wants to extend a video (chaining clips)
- Good quality is needed but not maximum

**Recommend Standard** (`standard`) when:

- The user explicitly asks for highest quality
- This is a final render or client deliverable
- Complex scenes with many elements or precise camera work

**Duration selection:**

- Quick test / social media story → 4s
- Standard clip → 6s
- Full scene / more narrative room → 8s
- Longer content → Generate 8s, then use `extend_video` to chain (Fast/Standard only)

**Resolution:**

- Drafts and iterations → 720p (default, cheapest)
- Social media / good quality → 1080p
- Professional / print → 4K (Fast and Standard only)

### Phase 4: Generate and Monitor

Video generation is a **three-tool process**:

1. **`generate_video`** (or `animate_image`): starts generation, returns an operation ID
2. **`video_status`**: poll with the operation ID until `done: true`
3. **`download_video`**: save the completed video to disk

After calling `generate_video`, immediately tell the user:

> "Video generation started! This typically takes about 30 seconds for Lite, 30-60 seconds for Fast, and 1-5 minutes for Standard. Checking status..."

Then poll `video_status` every 10-15 seconds until done. Report progress naturally:

> "Still processing... (this is normal for video generation)"
>
> "Video is ready! Downloading now..."

After download, report the file path and offer review options.

### Phase 5: Interactive Review

After the video is generated:

> "Video saved to: [path]. What would you like to do?"
>
> 1. **Upgrade tier**: Re-generate with Fast or Standard for better quality
> 2. **Extend**: Add more content by chaining another clip (Fast/Standard only)
> 3. **New variation**: Same concept, different take
> 4. **Animate an image**: Use a generated or existing image as the first frame
> 5. **Done**: Keep this video

## Image-to-Video Pipeline

A powerful workflow: generate an image first (via `generate_image` or the image-gen skill), then animate it:

1. Generate or identify the source image
2. Call `animate_image` with the image path and a motion description prompt
3. The image becomes the first frame, so the prompt should describe what happens next

This gives you precise control over the starting composition while letting Veo handle the motion.

## Video Extension

For content longer than 8 seconds, use `extend_video` to chain clips (Fast and Standard tiers only):

1. Generate the first 8-second clip
2. After it completes, call `extend_video` with the operation ID and a continuation prompt
3. Each extension adds another segment that visually continues from the previous clip's last frame

The continuation prompt should describe what happens next, maintaining consistency with the original.

## Prompt Reference

Read `references/prompt-guide.md` for a comprehensive catalog of camera movements, cinematic styles, lighting terminology, and audio cue keywords organized by use case. Consult it when the user asks for a specific aesthetic.
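
As a closing illustration, the polling step of the Phase 4 workflow can be sketched as below. This shows only the control flow: how MCP tools are invoked depends on the host, so the `check_status` callable is a stand-in for a `video_status` call. The function name `wait_for_video`, the `timeout_s` cap, and the assumption that the status result is a dict with a boolean `done` key are all hypothetical.

```python
import time
from typing import Any, Callable, Dict


def wait_for_video(
    check_status: Callable[[str], Dict[str, Any]],
    operation_id: str,
    interval_s: float = 10.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the operation reports done, then return the final status.

    check_status is a hypothetical wrapper around the video_status tool.
    """
    waited = 0.0
    while True:
        status = check_status(operation_id)
        if status.get("done"):
            return status
        # Give up once the accumulated wait reaches the cap.
        if waited >= timeout_s:
            raise TimeoutError(
                f"operation {operation_id} not done after {waited:.0f}s"
            )
        sleep(interval_s)
        waited += interval_s


if __name__ == "__main__":
    # Fake status source that reports done on the third poll.
    replies = iter([{"done": False}, {"done": False}, {"done": True}])
    result = wait_for_video(
        lambda op_id: next(replies), "op-123", sleep=lambda seconds: None
    )
    print(result)
```

Passing `sleep` as a parameter keeps the loop testable without real delays. In an actual session the interval would sit in the 10-15 second range described in Phase 4, and the returned status would then be handed to `download_video`.
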