---
name: video-subtitle-cutter
description: Transcribe video, analyze subtitles with AI, and cut video by removing filler words, pauses, and mistakes
license: MIT
compatibility: opencode
metadata:
  service: video-editing
  category: media-automation
---

## What I Do

Automate video editing by:

1. Transcribing video to timestamped subtitles (Whisper)
2. Analyzing transcript with AI to identify cuts (filler words, pauses, mistakes)
3. Generating FFmpeg commands to cut and concatenate clean segments
4. Generating subtitles (SRT) for the final video

## CRITICAL: Always Re-encode (Never Use `-c copy`)

**The #1 mistake is using `-c copy` for cutting.** This causes:

- Frozen frames at cut points (1-8 seconds of freeze)
- Audio/video sync issues
- Glitchy playback

**Why?** H.264 video uses keyframes (I-frames) every 2-10 seconds. `-c copy` can only cut at keyframes, so FFmpeg includes extra frames that display as frozen.

**Solution:** Always re-encode segments with quality settings:

```bash
# WRONG - causes freeze frames
ffmpeg -ss 10 -i video.mp4 -t 5 -c copy segment.mp4

# CORRECT - smooth cuts at any timestamp
ffmpeg -ss 10 -i video.mp4 -t 5 \
  -c:v libx264 -preset fast -crf 18 \
  -c:a aac -b:a 192k \
  -avoid_negative_ts make_zero \
  segment.mp4
```

**Quality presets (CRF = Constant Rate Factor):**

- `crf 15-17` = Near lossless (large files)
- `crf 18-20` = High quality (recommended)
- `crf 21-23` = Good quality (smaller files)
- `crf 24-28` = Medium quality (much smaller)

## Prerequisites

```bash
# Install Whisper (choose one)
pip install openai-whisper          # Local (requires Python 3.9+)
# OR use OpenAI API (no local install needed)

# Install FFmpeg
brew install ffmpeg                  # macOS
sudo apt install ffmpeg              # Linux
```

## Quick Start

### Step 1: Transcribe Video

**Option A: Local Whisper (free, slower)**

```bash
whisper video.mp4 --model medium --output_format json --output_dir ./
```

**Option B: OpenAI Whisper API (fast, paid)**

```bash
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file="@video.mp4" \
  -F model="whisper-1" \
  -F response_format="verbose_json" \
  -F timestamp_granularities[]="segment" \
  > transcript.json
```

**Option C: Use ffmpeg to extract audio first (for large files)**

```bash
# Extract audio (much smaller file to upload)
ffmpeg -i video.mp4 -vn -acodec libmp3lame -q:a 2 audio.mp3

# Then transcribe the audio
whisper audio.mp3 --model medium --output_format json
```

### Step 2: Analyze Transcript for Cuts

Feed the transcript to the AI with this prompt:

```
Analyze this video transcript and identify segments to CUT (remove).

TRANSCRIPT:
{paste transcript.json segments here}

Identify these issues:
1. FILLER WORDS: "um", "uh", "like", "you know", "basically", "actually", "so", "right"
2. FALSE STARTS: Incomplete sentences that restart ("I think— actually, let me...")
3. LONG PAUSES: Gaps > 1.5 seconds between segments
4. REPETITIONS: Same word/phrase repeated ("really really really")
5. CORRECTIONS: "Wait, I meant...", "Sorry, let me rephrase..."
6. TANGENTS: Off-topic rambling (use judgment)

Return a JSON array of segments to KEEP (not cut):
[
  {"start": 0.0, "end": 2.5, "text": "Welcome to this video"},
  {"start": 3.1, "end": 8.4, "text": "Today we're going to cover..."},
  ...
]

Rules:
- Merge adjacent keep segments if gap < 0.3s
- Ensure cuts don't happen mid-word (check word boundaries)
- Preserve natural speech rhythm (don't over-cut)
- When in doubt, keep the segment
```

### Step 3: Generate FFmpeg Commands (High Quality)

Once you have the keep segments, use this Python script for smooth cuts:

```python
import json
import subprocess
import os

VIDEO_INPUT = "video.mp4"
VIDEO_OUTPUT = "video_clean.mp4"
SEGMENTS_FILE = "keep_segments.json"

with open(SEGMENTS_FILE) as f:
    segments = json.load(f)

segment_files = []
for i, seg in enumerate(segments):
    outfile = f"temp_seg_{i:04d}.mp4"
    segment_files.append(outfile)

    # MUST re-encode for smooth cuts (no -c copy!)
    cmd = [
        'ffmpeg', '-y',
        '-ss', str(seg['start']),      # Seek BEFORE input (fast)
        '-i', VIDEO_INPUT,
        '-t', str(seg['end'] - seg['start']),  # Duration
        '-c:v', 'libx264',
        '-preset', 'fast',              # fast/medium/slow
        '-crf', '18',                   # Quality (lower = better, 15-23 recommended)
        '-c:a', 'aac',
        '-b:a', '192k',
        '-avoid_negative_ts', 'make_zero',  # Fix timestamp issues
        '-async', '1',                  # Sync audio
        outfile
    ]
    subprocess.run(cmd, capture_output=True)
    print(f"✓ Segment {i+1}/{len(segments)}")

# Create concat file
with open('temp_concat.txt', 'w') as f:
    for sf in segment_files:
        f.write(f"file '{sf}'\n")

# Concatenate (can use -c copy here since all segments match)
subprocess.run([
    'ffmpeg', '-y', '-f', 'concat', '-safe', '0',
    '-i', 'temp_concat.txt',
    '-c', 'copy',
    VIDEO_OUTPUT
])

# Cleanup
for sf in segment_files:
    os.remove(sf)
os.remove('temp_concat.txt')
print(f"✓ Created: {VIDEO_OUTPUT}")
```

**Key flags explained:**

- `-ss` before `-i`: Fast seek (doesn't decode entire video)
- `-t`: Duration of segment (not end time)
- `-crf 18`: High quality encoding
- `-avoid_negative_ts make_zero`: Fixes concat timestamp issues
- `-async 1`: Keeps audio in sync

### Step 4: Generate Subtitles

After creating the final video, generate fresh subtitles with Whisper:

```bash
# Generate SRT subtitles for the cleaned video
whisper video_clean.mp4 --model medium --output_format srt --output_dir ./

# For higher accuracy (slower):
whisper video_clean.mp4 --model large --output_format srt --language en

# Output: video_clean.srt
```

**Burn subtitles into video (optional):**

```bash
# Embed subtitles permanently
ffmpeg -i video_clean.mp4 -vf "subtitles=video_clean.srt:force_style='FontSize=24,FontName=Arial,PrimaryColour=&HFFFFFF,OutlineColour=&H000000,Outline=2'" -c:a copy video_with_subs.mp4
```

**Subtitle styling options:**

- `FontSize=24` - Text size
- `FontName=Arial` - Font face
- `PrimaryColour=&HFFFFFF` - White text (BGR format)
- `OutlineColour=&H000000` - Black outline
- `Outline=2` - Outline thickness
- `MarginV=50` - Distance from bottom

---

## Complete Workflow Script (High Quality)

```python
#!/usr/bin/env python3
"""
video_clean.py - Clean up video by removing filler words/pauses
Uses re-encoding for smooth cuts (no freeze frames)
"""

import json
import subprocess
import os
import sys

def get_duration(filepath):
    """Get video duration in seconds"""
    result = subprocess.run([
        'ffprobe', '-v', 'quiet', '-print_format', 'json', '-show_format', filepath
    ], capture_output=True, text=True)
    return float(json.loads(result.stdout)['format']['duration'])

def extract_segment(input_file, start, end, output_file, crf=18, preset='fast'):
    """Extract a segment with re-encoding for smooth cuts"""
    cmd = [
        'ffmpeg', '-y',
        '-ss', str(start),
        '-i', input_file,
        '-t', str(end - start),
        '-c:v', 'libx264',
        '-preset', preset,
        '-crf', str(crf),
        '-c:a', 'aac',
        '-b:a', '192k',
        '-avoid_negative_ts', 'make_zero',
        '-async', '1',
        output_file
    ]
    return subprocess.run(cmd, capture_output=True, text=True)

def concatenate_segments(segment_files, output_file):
    """Concatenate segments into final video"""
    with open('temp_concat.txt', 'w') as f:
        for sf in segment_files:
            f.write(f"file '{sf}'\n")

    subprocess.run([
        'ffmpeg', '-y', '-f', 'concat', '-safe', '0',
        '-i', 'temp_concat.txt',
        '-c', 'copy',
        output_file
    ], capture_output=True)

    os.remove('temp_concat.txt')

def generate_subtitles(video_file, model='medium'):
    """Generate SRT subtitles using Whisper"""
    subprocess.run([
        'whisper', video_file,
        '--model', model,
        '--output_format', 'srt',
        '--output_dir', './'
    ])

def main(video_input, segments, output_name, crf=18):
    """Main workflow"""
    segment_files = []

    print(f"\n{'='*50}")
    print(f"Processing: {video_input}")
    print(f"Quality: CRF {crf} (lower=better, 15-23 recommended)")
    print(f"{'='*50}\n")

    # Extract segments with re-encoding
    for i, seg in enumerate(segments):
        outfile = f"temp_seg_{i:04d}.mp4"
        segment_files.append(outfile)

        result = extract_segment(video_input, seg['start'], seg['end'], outfile, crf)
        if result.returncode == 0:
            duration = seg['end'] - seg['start']
            print(f"✓ Segment {i+1}/{len(segments)}: {duration:.1f}s")
        else:
            print(f"✗ Error on segment {i+1}")
            print(result.stderr[-500:])

    # Concatenate
    print("\nConcatenating segments...")
    concatenate_segments(segment_files, output_name)

    # Cleanup temp segments
    for sf in segment_files:
        os.remove(sf)

    # Generate subtitles
    print("\nGenerating subtitles...")
    generate_subtitles(output_name)

    # Stats
    orig_duration = get_duration(video_input)
    new_duration = get_duration(output_name)
    orig_size = os.path.getsize(video_input) / (1024*1024)
    new_size = os.path.getsize(output_name) / (1024*1024)

    print(f"\n{'='*50}")
    print(f"COMPLETE")
    print(f"{'='*50}")
    print(f"Original:  {orig_duration:.0f}s | {orig_size:.1f} MB")
    print(f"Output:    {new_duration:.0f}s | {new_size:.1f} MB")
    print(f"Removed:   {orig_duration - new_duration:.0f}s ({((orig_duration - new_duration)/orig_duration)*100:.0f}%)")
    print(f"Video:     {output_name}")
    print(f"Subtitles: {output_name.replace('.mp4', '.srt')}")

if __name__ == '__main__':
    # Example usage
    VIDEO = "input.mp4"
    SEGMENTS = [
        {"start": 0.0, "end": 10.5},
        {"start": 12.3, "end": 25.0},
        # ... add your segments
    ]
    main(VIDEO, SEGMENTS, "output_clean.mp4", crf=18)
```

---

## AI Analysis Prompt Templates

### Basic Cleanup (Filler Words Only)

```
Remove filler words from this transcript. Return segments to KEEP.

Filler words to remove: um, uh, like, you know, basically, actually, so, right, I mean

TRANSCRIPT SEGMENTS:
{segments}

Return JSON: [{"start": float, "end": float, "text": "cleaned text"}, ...]
```

### Aggressive Cleanup (Podcast/Interview)

```
Clean this podcast transcript for a tight, professional edit.

REMOVE:
- All filler words (um, uh, like, you know, basically, so, right)
- False starts and restarts
- Pauses longer than 1 second
- Repetitions
- Off-topic tangents
- "That's a great question" type filler responses
- Excessive laughter/reactions (keep some for naturalness)

KEEP:
- Core content and insights
- Natural transitions
- Important reactions that add context

TRANSCRIPT:
{segments}

Return JSON array of segments to KEEP with cleaned text.
```

### Light Cleanup (Preserve Natural Feel)

```
Lightly clean this transcript while preserving natural speech patterns.

ONLY REMOVE:
- "Um" and "uh" when standalone (not part of thinking pause)
- Obvious mistakes followed by corrections
- Technical issues (coughs, phone rings, etc.)

PRESERVE:
- Natural "like" and "you know" that add personality
- Thinking pauses that feel authentic
- Personality quirks

TRANSCRIPT:
{segments}

Return JSON array of segments to KEEP.
```

---

## Transcript Format Reference

### Whisper JSON Output

```json
{
  "text": "Full transcript text...",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": " Welcome to this video.",
      "tokens": [50364, 5765, ...],
      "temperature": 0.0,
      "avg_logprob": -0.25,
      "compression_ratio": 1.2,
      "no_speech_prob": 0.01
    },
    {
      "id": 1,
      "start": 2.5,
      "end": 5.8,
      "text": " Um, so today we're going to...",
      ...
    }
  ],
  "language": "en"
}
```

### Keep Segments Format (for FFmpeg)

```json
[
  { "start": 0.0, "end": 2.5, "text": "Welcome to this video." },
  { "start": 3.2, "end": 5.8, "text": "Today we're going to..." }
]
```

---

## Advanced: Word-Level Timestamps

For precise filler word removal, use word-level timestamps:

```bash
# Whisper with word timestamps
whisper video.mp4 --model medium --word_timestamps True --output_format json
```

This gives you:

```json
{
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Um welcome to this video",
      "words": [
        { "word": "Um", "start": 0.0, "end": 0.3 },
        { "word": "welcome", "start": 0.5, "end": 0.9 },
        { "word": "to", "start": 0.9, "end": 1.0 },
        { "word": "this", "start": 1.0, "end": 1.2 },
        { "word": "video", "start": 1.2, "end": 1.6 }
      ]
    }
  ]
}
```

Now you can cut precisely around "Um" (0.0-0.3) and keep "welcome to this video" (0.5-1.6).

---

## Troubleshooting

### Frozen Frames at Cut Points (MOST COMMON)

**Cause:** Using `-c copy` which can only cut at keyframes.

**Solution:** Always re-encode with `-c:v libx264 -crf 18` (see examples above).

### Audio/Video Sync Issues

Add these flags when extracting segments:

```bash
ffmpeg -ss 10 -i video.mp4 -t 5 \
  -c:v libx264 -crf 18 \
  -c:a aac -b:a 192k \
  -avoid_negative_ts make_zero \  # Fix negative timestamps
  -async 1 \                       # Sync audio to video
  segment.mp4
```

### Cuts Sound Abrupt

Add audio fade in/out to each segment:

```bash
ffmpeg -ss 10 -i video.mp4 -t 5 \
  -c:v libx264 -crf 18 \
  -af "afade=t=in:st=0:d=0.05,afade=t=out:st=4.95:d=0.05" \
  -c:a aac segment.mp4
```

### Large Files Take Forever

1. Use `-preset fast` or `-preset veryfast` (trades quality for speed)
2. Extract audio first for transcription (much smaller)
3. Use Whisper API instead of local model
4. Process in parallel (multiple segments at once)

```bash
# Faster encoding (slightly lower quality)
ffmpeg ... -preset veryfast -crf 20 ...

# Even faster for previews
ffmpeg ... -preset ultrafast -crf 23 ...
```

### Whisper Misses Words

- Use `--model large` for better accuracy
- Use `--language en` to force English
- Normalize audio first:

```bash
ffmpeg -i video.mp4 -af "loudnorm=I=-16:TP=-1.5:LRA=11" -c:v copy normalized.mp4
```

### File Size Too Large After Re-encoding

Increase CRF value (higher = smaller file, lower quality):

```bash
# Original quality (large)
-crf 18

# Good quality (medium)
-crf 22

# Acceptable quality (small)
-crf 26
```

---

## Integration with OpenCode

When using this skill in OpenCode:

1. **Extract audio** (faster transcription):

   ```bash
   ffmpeg -i video.mp4 -vn -acodec libmp3lame -q:a 2 temp_audio.mp3 -y
   ```

2. **Transcribe with Whisper**:

   ```bash
   whisper temp_audio.mp3 --model medium --output_format json --output_dir ./
   ```

3. **Read transcript.json** and analyze segments

4. **Identify segments to KEEP** based on:
   - Removing filler words (um, uh, like, you know)
   - Removing long pauses (>1.5s gaps)
   - Removing false starts and repetitions
   - For "shorts style": Keep only hook + key points + CTA

5. **Re-encode and concatenate** (MUST re-encode, never -c copy):

   ```python
   # Use the Python script above with crf=18 for quality
   ```

6. **Generate subtitles** for final video:

   ```bash
   whisper output.mp4 --model medium --output_format srt
   ```

7. **Report results** with before/after stats

### Quality Settings Reference

| Use Case       | CRF   | Preset   | Notes                      |
| -------------- | ----- | -------- | -------------------------- |
| Archive/Master | 15-17 | slow     | Near lossless, large files |
| YouTube/Vimeo  | 18-20 | medium   | High quality, recommended  |
| Social Media   | 21-23 | fast     | Good quality, smaller      |
| Preview/Draft  | 24-28 | veryfast | Quick renders              |

### Anti-Patterns (DO NOT DO)

```bash
# WRONG: -c copy causes freeze frames
ffmpeg -ss 10 -i video.mp4 -t 5 -c copy segment.mp4

# WRONG: -to instead of -t with -ss before -i
ffmpeg -ss 10 -i video.mp4 -to 15 ...  # -to is absolute, not relative

# WRONG: Missing timestamp fix flags
ffmpeg ... -c:v libx264 ...  # Missing -avoid_negative_ts
```