---
name: ffmpeg-analyse-video
description: Analyse video content by extracting frames with ffmpeg and using AI vision to generate timestamped step-by-step summaries. Use when the user provides a video file and wants to understand its visual content (screen recordings, tutorials, presentations, footage, or animations). Triggers on "analyse this video", "what happens in this video", "summarise this recording", or any request involving understanding video file contents.
---

# FFmpeg Video Analysis

Extract frames from video files with ffmpeg. Delegate frame reading to sub-agents to preserve the main context window. Synthesise a structured timestamped summary from text-only sub-agent reports.

## Architecture: Context-Efficient Sub-Agent Pipeline

**Problem**: Reading dozens of images into the main conversation context consumes most of the context window, leaving little room for synthesis and follow-up.

**Solution**: A 3-phase pipeline (extract, delegate, synthesise):

```
Main Agent                          Sub-Agents (disposable context)
──────────                          ──────────────────────────────
1. ffprobe metadata ───►
2. ffmpeg frame extraction ───►
3. Split frames into batches ──►    4. Read images (vision)
                                       Write text descriptions
                                       to batch_N_analysis.md
5. Read text files only ◄───           (context discarded)
6. Synthesise final output
```

Images only ever exist inside sub-agent contexts. The main agent only reads lightweight text files. This cuts context usage by ~90%.

## 1. Prerequisites

```bash
which ffmpeg && which ffprobe
```

If either is missing, show platform-specific install instructions and STOP:

- **macOS**: `brew install ffmpeg`
- **Ubuntu/Debian**: `sudo apt install ffmpeg`
- **Windows**: `choco install ffmpeg` or `winget install ffmpeg`

## 2. Setup Temp Directory

```bash
# macOS/Linux
TMPDIR="/tmp/video-analysis-$(date +%s)"
mkdir -p "$TMPDIR"

# Windows (PowerShell)
# $TMPDIR = "$env:TEMP\video-analysis-$(Get-Date -UFormat %s)"
# New-Item -ItemType Directory -Path $TMPDIR
```

## 3. Extract Video Metadata

```bash
ffprobe -v quiet -print_format json -show_format -show_streams "VIDEO_PATH"
```

Extract and report: duration, resolution (width x height), fps, codec, file size, and whether audio is present.

If no video stream is found, report "audio-only file" and STOP.

If file size > 2GB, warn the user and suggest analysing a time range with `-ss START -to END`.

## 4. Extract Frames

Choose a strategy based on duration:

| Duration | Strategy | Command |
|----------|----------|---------|
| 0-60s | 1 frame every 2s | `ffmpeg -hide_banner -y -i INPUT -vf "fps=1/2,scale='min(1280,iw)':-2" -q:v 5 DIR/frame_%04d.jpg` |
| 1-10min | Scene detection (threshold 0.3) | `ffmpeg -hide_banner -y -i INPUT -vf "select='gt(scene,0.3)',scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/scene_%04d.jpg` |
| 10-30min | Keyframe extraction | `ffmpeg -hide_banner -y -skip_frame nokey -i INPUT -vf "scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/key_%04d.jpg` |
| 30min+ | Thumbnail filter | `ffmpeg -hide_banner -y -i INPUT -vf "thumbnail=SEGMENT_FRAMES,scale='min(1280,iw)':-2" -vsync vfr -q:v 5 DIR/thumb_%04d.jpg` |

For the thumbnail filter, calculate `SEGMENT_FRAMES = total_frames / 60` to cap output at ~60 frames.

`-vsync` has been deprecated since ffmpeg 5.1 in favour of `-fps_mode`; if your build warns about or rejects `-vsync vfr`, substitute `-fps_mode vfr`.

**Fallbacks:**

- Scene detection yields 0 frames → retry with interval at 1 frame/5s
- More than 100 frames extracted → subsample evenly to 80
- Frame extraction fails → try the next simpler strategy (scene → interval, keyframe → interval)

**Time range analysis:** When the user specifies a range, prepend `-ss START -to END` before `-i`.

**Higher detail mode:** If requested, double the fps rate and lower the scene threshold to 0.2.

After extraction, list all frame files and calculate each frame's timestamp. For the fixed-interval strategy, derive it from the sequence number and the extraction rate: at 1 frame every 2s, frame N sits at roughly (N - 1) × 2 seconds. Scene, keyframe, and thumbnail frames are unevenly spaced, so the sequence number alone does not give their position; one option is to append `showinfo` to the filter chain and read each frame's `pts_time` from ffmpeg's log output. When a time range is used, add START to every timestamp.

## 5. Delegate Frame Analysis to Sub-Agents

**This is the critical context-saving step.** Do NOT read frame images in the main conversation. Instead, split frames into batches and delegate each batch to a sub-agent.

### 5a. Prepare Batch Manifest

Split the extracted frame file list into batches of 8-10 frames each. For each batch, record:

- Batch number (1, 2, 3, ...)
- Frame file paths (absolute)
- Frame timestamps (as calculated in step 4)
- Output file path: `TMPDIR/batch_N_analysis.md`

### 5b. Spawn Sub-Agents

For each batch, spawn a sub-agent with the prompt below. **Launch all batches in parallel** where the tool supports it: they are fully independent, with no shared state.

#### Sub-Agent Prompt Template

Use this prompt verbatim, substituting the placeholders:

```
You are analysing frames extracted from a video file.

VIDEO: {filename}
DURATION: {duration}
BATCH: {batch_number} of {total_batches}

Read each frame image listed below using the Read tool (or equivalent file reading tool that supports images). For each frame, write a structured description.

FRAMES:
{for each frame in batch}
- {absolute_path_to_frame} (timestamp: {MM:SS})
{end for}

For each frame, describe:
1. SCENE: What is visible (layout, UI elements, environment)
2. CONTENT: Text, code, labels, menus, or dialogue visible on screen
3. ACTION: What is happening or has changed since the likely previous frame
4. DETAILS: Any notable specifics (error messages, URLs, file names, button states)

After describing all frames, add a BATCH SUMMARY section with:
- Content type (one of: Screencast, Presentation, Tutorial, Footage, Animation)
- Key events in this batch's time range
- Any text/prompts/commands the user typed (quote exactly)

Write the complete analysis to: {TMPDIR}/batch_{N}_analysis.md

Format the output file as:

# Batch {N} Analysis ({start_timestamp} - {end_timestamp})

## Frame-by-Frame

### Frame {sequence} ({timestamp})
- **Scene**: ...
- **Content**: ...
- **Action**: ...
- **Details**: ...

(repeat for each frame)

## Batch Summary
- **Content Type**: ...
- **Key Events**: ...
- **Quoted Text/Prompts**: ...
```

#### How to Spawn

Use whatever sub-agent, background task, or independent agent mechanism your tool provides. Each sub-agent needs only two capabilities:

1. **Read image files** (the frame JPEGs)
2. **Write a text file** (the batch analysis markdown)

**If your tool has no sub-agent mechanism**, fall back to reading frames directly in the main context, but limit this to **20 frames maximum** and warn the user about context usage.

### 5c. Collect Results

After all sub-agents complete, list the text analysis files:

```bash
ls "$TMPDIR"/batch_*_analysis.md
```

Read each `batch_N_analysis.md` file **in batch-number order** (note that `ls` sorts `batch_10` before `batch_2`). These files contain only text descriptions, so no images enter the main context and the cost is minimal compared to reading the original frames.

## 6. Synthesise Output

Using only the text from the batch analysis files, perform synthesis in the main context:

1. Merge all frame descriptions into a single chronological timeline
2. Group frames into natural segments (same scene, slide, or screen)
3. Detect the dominant content type across all batches
4. Identify 3-7 key moments
5. Extract all quoted text, prompts, or commands the user typed
6. Write a 2-5 sentence narrative summary

Format the output as:

```markdown
# Video Analysis: [filename]

## Metadata

| Property | Value |
|----------|-------|
| Duration | M:SS |
| Resolution | WxH |
| FPS | N |
| Content Type | [detected] |
| Frames Analysed | N |

## Timeline

### [Segment Title] (M:SS - M:SS)
Description of what happens in this segment.

### [Segment Title] (M:SS - M:SS)
Description of what happens in this segment.

## Key Moments

1. **[M:SS] Title**: Description
2. **[M:SS] Title**: Description
3. **[M:SS] Title**: Description

## Summary

[2-5 sentence narrative paragraph summarising the entire video]
```

## 7. Cleanup

Remove the temp directory after output is complete:

```bash
# macOS/Linux
rm -rf "$TMPDIR"

# Windows (PowerShell)
# Remove-Item -Recurse -Force $TMPDIR
```

Skip cleanup if the user asks to keep frames.

## Advanced Options

- **Time range**: "Analyse 2:00 to 5:00 of video.mp4" → use `-ss 120 -to 300`
- **Higher detail**: "Analyse in high detail" → double frame rate, lower scene threshold to 0.2
- **Focus area**: "Focus on the code shown" → prioritise text/code extraction in sub-agent prompts
- **Sprite sheet**: For a visual overview, generate a contact sheet:

  ```bash
  ffmpeg -hide_banner -y -i INPUT -vf "select='not(mod(n,EVERY_N))',scale='min(320,iw)':-2,tile=5xROWS" -frames:v 1 DIR/sprite.jpg
  ```

## Error Handling

- ffmpeg not found → install instructions per platform, STOP
- No video stream → report audio-only, STOP
- Scene detection yields 0 frames → fall back to interval
- Too many frames (>100) → subsample to 80
- Large files (>2GB) → warn, suggest time range
- Sub-agent fails or times out → read that batch's frames directly as fallback, warn about context usage
- Frame read failure in sub-agent → skip frame, note gap in batch analysis file
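
The "too many frames (>100) → subsample to 80" rule can be sketched as a small shell helper. This is a minimal illustration, not a prescribed implementation: `subsample_frames` is a hypothetical name, and it assumes all frames are `.jpg` files in one directory with zero-padded names (as produced by the `%04d` patterns), so that glob order matches capture order.

```bash
# Hypothetical helper: thin a frame directory down to evenly spaced frames.
# Usage: subsample_frames DIR [MAX] [TARGET]   (defaults: MAX=100, TARGET=80)
subsample_frames() {
  local dir=$1 max=${2:-100} target=${3:-80}
  local frames=( "$dir"/*.jpg )      # sorted by name, i.e. by sequence number
  local n=${#frames[@]} i k=0

  [[ -e ${frames[0]} ]] || return 0  # glob matched nothing: no frames to thin
  (( n > max )) || return 0          # at or under the limit: keep everything

  for (( i = 0; i < n; i++ )); do
    # Keep frame i when it is the k-th evenly spaced pick, floor(k * n / target).
    if (( k < target && i == k * n / target )); then
      k=$(( k + 1 ))
    else
      rm -- "${frames[i]}"
    fi
  done
}
```

Calling `subsample_frames "$TMPDIR"` after extraction leaves at most 80 frames when more than 100 were produced. Surviving files keep their original names, so any timestamp derived from the sequence number stays valid.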