---
name: extract-moves-from-video
description: Guidance for extracting text-based game commands, moves, or inputs from video recordings using OCR and frame analysis. This skill applies when extracting user inputs from screen recordings of text-based games (Zork, interactive fiction), terminal sessions, or any video where typed commands need to be recovered. It covers OCR preprocessing, region-of-interest extraction, domain-aware validation, and deduplication strategies.
---

# Extract Moves from Video

This skill provides structured guidance for extracting text-based commands or moves from video recordings, particularly for text-based games, terminal sessions, or any screen recording where typed inputs need to be recovered.

## When to Use This Skill

This skill applies when:
- Extracting game commands from recordings of text-based games (Zork, interactive fiction, MUDs)
- Recovering typed inputs from terminal session recordings
- Extracting user commands from any screen recording where text input is visible
- OCR-based extraction of sequential text entries from video

## Strategic Approach

### Phase 1: Environment Assessment

Before processing, assess the available tools and estimate resource requirements:

1. **Check available tools in one comprehensive sweep:**
   - FFmpeg for frame extraction
   - OCR engine (Tesseract) and its Python wrapper (pytesseract)
   - Image processing libraries (opencv-python, Pillow)
   - Python environment and package managers (pip, uv, conda)

2. **Analyze video characteristics:**
   - Duration and frame rate
   - Resolution and text clarity
   - Location of command input area (typically fixed position)
   - Text color contrast against background

3. **Estimate processing time:**
   - Benchmark OCR on 3-5 sample frames before full processing
   - Multiply the per-frame time by the number of frames to be processed (video duration × extraction rate) to estimate the total run time

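A minimal sketch of this assessment using only the Python standard library. The tool list, the function names (`check_environment`, `parse_probe`, `probe_video`, `benchmark_ocr`, `estimate_total_s`), and the use of `ffprobe` (bundled with FFmpeg) to read the video's metadata are illustrative assumptions; `ocr_fn` stands in for whichever OCR call you want to benchmark.

```python
import importlib.util
import json
import math
import shutil
import subprocess
import time
from fractions import Fraction

# Command-line tools and package managers to look for on PATH.
CLI_TOOLS = ["ffmpeg", "ffprobe", "tesseract", "pip", "uv", "conda"]
# Import names differ from pip names: opencv-python -> cv2, Pillow -> PIL.
PY_MODULES = ["cv2", "PIL", "pytesseract"]


def check_environment():
    """Map each CLI tool / Python module name to True if it is available."""
    report = {tool: shutil.which(tool) is not None for tool in CLI_TOOLS}
    # find_spec locates a module without importing it; None means missing.
    report.update({m: importlib.util.find_spec(m) is not None for m in PY_MODULES})
    return report


def parse_probe(probe_json):
    """Pull duration, frame rate, and resolution out of ffprobe's JSON output."""
    data = json.loads(probe_json)
    stream = data["streams"][0]
    return {
        "duration_s": float(data["format"]["duration"]),
        # r_frame_rate is a rational string such as "30000/1001".
        "fps": float(Fraction(stream["r_frame_rate"])),
        "width": int(stream["width"]),
        "height": int(stream["height"]),
    }


def probe_video(path):
    """Run ffprobe on the first video stream (requires FFmpeg on PATH)."""
    cmd = ["ffprobe", "-v", "error", "-select_streams", "v:0",
           "-show_entries", "stream=width,height,r_frame_rate:format=duration",
           "-of", "json", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_probe(out)


def benchmark_ocr(ocr_fn, sample_frames):
    """Average wall-clock seconds per frame for ocr_fn over a few samples."""
    start = time.perf_counter()
    for frame in sample_frames:
        ocr_fn(frame)
    return (time.perf_counter() - start) / len(sample_frames)


def estimate_total_s(per_frame_s, duration_s, extract_fps):
    """Total time ~= (duration x extraction rate) frames x per-frame cost."""
    return math.ceil(duration_s * extract_fps) * per_frame_s
```

For example, a 10-minute video sampled at 1 frame per second with a measured 0.5 s of OCR per frame gives `estimate_total_s(0.5, 600, 1)`, or about 300 s.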
### Phase 2: Region of Interest (ROI) Identification

**Critical optimization**: Identify and crop to the command input region before OCR.

For text-based games like Zork:
- Command input typically appears at a fixed screen location (often the bottom of the screen)
- Command prompts have consistent markers (e.g., `>` prefix)
- Cropping to the ROI improves both accuracy (less surrounding game text for OCR to misread) and speed (fewer pixels per frame)

To identify the ROI:
1. Extract a few sample frames where commands are visible
2. Manually or programmatically identify the bounding box of the input area
3. Apply consistent cropping to all frames before OCR

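Once the bounding box is known, it can be expressed as fractions of the frame and converted into FFmpeg's `crop=w:h:x:y` filter so that cropping happens during extraction. This is a minimal sketch; the function names and the fractional-coordinate convention are assumptions. With Pillow, the same box would be applied as `img.crop((x, y, x + w, y + h))`.

```python
def roi_to_crop(frame_w, frame_h, left, top, right, bottom):
    """Convert a fractional ROI (0.0-1.0 of the frame) to pixel crop values.

    Returns (w, h, x, y), the same order as ffmpeg's crop=w:h:x:y filter.
    """
    if not (0 <= left < right <= 1 and 0 <= top < bottom <= 1):
        raise ValueError("need 0 <= left < right <= 1 and 0 <= top < bottom <= 1")
    x = round(left * frame_w)
    y = round(top * frame_h)
    w = round(right * frame_w) - x
    h = round(bottom * frame_h) - y
    return w, h, x, y


def crop_filter(frame_w, frame_h, left, top, right, bottom):
    """Format the ROI as an ffmpeg crop filter string."""
    w, h, x, y = roi_to_crop(frame_w, frame_h, left, top, right, bottom)
    return f"crop={w}:{h}:{x}:{y}"
```

For a 1280×720 recording whose prompt occupies the bottom 10% of the screen, `crop_filter(1280, 720, 0.0, 0.9, 1.0, 1.0)` returns `"crop=1280:72:0:648"`.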
### Phase 3: Frame Extraction Strategy

Select frame extraction rate based on content characteristics:

- **Text-based games**: Commands persist on screen for seconds; 1-3 second intervals typically suffice
- **Fast-paced inputs**: May require higher frequency (0.5 second intervals)
- **Start conservative**: Begin with lower frequency, increase only if commands are missed

```bash
# Example: extract 1 frame per second (ffmpeg does not create the output directory)
mkdir -p frames
ffmpeg -i video.mp4 -vf "fps=1" frames/frame_%04d.png
```

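To vary the interval without editing the command by hand, the invocation can be built programmatically. A minimal sketch, assuming the hypothetical helper names below; it relies on FFmpeg's `fps` filter accepting a rational rate (so a 2-second interval becomes `fps=1/2`) and optionally chains a crop filter after it.

```python
import subprocess
from fractions import Fraction
from pathlib import Path


def build_extract_cmd(video, out_dir, interval_s, crop=None):
    """Build an ffmpeg argv that keeps one frame every `interval_s` seconds.

    `crop` is an optional filter string such as "crop=1280:72:0:648".
    """
    # The fps filter accepts a rational rate: a 2 s interval becomes fps=1/2.
    rate = 1 / Fraction(str(interval_s))
    filters = [f"fps={rate}"]
    if crop:
        filters.append(crop)  # applied after fps, so only kept frames are cropped
    # Five digits keep lexical sort order correct up to 99,999 frames.
    pattern = str(Path(out_dir) / "frame_%05d.png")
    return ["ffmpeg", "-i", str(video), "-vf", ",".join(filters), pattern]


def extract_frames(video, out_dir, interval_s, crop=None):
    """Create the output directory (ffmpeg will not) and run the extraction."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_cmd(video, out_dir, interval_s, crop), check=True)
```

Starting conservatively means calling `extract_frames("video.mp4", "frames", 2)` first and dropping to `0.5` only if commands are missed.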
### Phase 4: OCR Preprocessing

Apply preprocessing to improve OCR accuracy:

1. **Convert to grayscale**
2. **Apply thresholding** (binary threshold for high contrast text)
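3. **Consider additional techniques if needed:**
   - Contrast enhancement
   - Noise reduction
   - Upscaling (for example 2×) when the on-screen font is small
   - Dilation/erosion for text clarity
   - Inversion if text is light on a dark background (Tesseract 4 and later generally works best with dark text on a light background)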

See `references/ocr_video_processing.md` for detailed preprocessing techniques.

### Phase 5: Domain-Aware Extraction and Validation

**Key insight**: Use domain knowledge to validate and correct OCR results.

For text-based games:
1. Obtain or construct a list of valid commands for the game
2. Use command vocabulary for spell-checking OCR output
3. Identify command syntax patterns (e.g., `VERB NOUN`, `DIRECTION`)
4. Flag entries that don't match known patterns for manual review

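Steps 1 and 2 can be sketched with the standard library's `difflib`. The starter vocabulary, the small table of OCR character confusions, and the 0.75 similarity cutoff are assumptions to replace with the game's real word list and tuned values; tokens that cannot be matched are returned for manual review (step 4) rather than guessed.

```python
import difflib

# Hypothetical starter vocabulary; replace with the game's actual word list.
VOCAB = ["n", "s", "e", "w", "ne", "nw", "se", "sw", "up", "down",
         "get", "take", "drop", "put", "open", "close", "read", "examine",
         "look", "inventory", "lamp", "sword", "case", "mailbox", "in"]

# Assumed digit/symbol-for-letter confusions; only safe if the game rarely uses digits.
OCR_CONFUSIONS = str.maketrans({"1": "l", "0": "o", "|": "l"})


def correct_word(word, vocab=VOCAB, cutoff=0.75):
    """Return the closest vocabulary word to an OCR token, or None."""
    word = word.lower().translate(OCR_CONFUSIONS)
    if word in vocab:
        return word
    match = difflib.get_close_matches(word, vocab, n=1, cutoff=cutoff)
    return match[0] if match else None


def correct_command(text, vocab=VOCAB):
    """Correct each token; return (corrected_text, unresolved_tokens)."""
    corrected, unresolved = [], []
    for token in text.split():
        fixed = correct_word(token, vocab)
        if fixed is None:
            unresolved.append(token)  # flag for manual review
            corrected.append(token.lower())
        else:
            corrected.append(fixed)
    return " ".join(corrected), unresolved
```

With this vocabulary, `correct_command("get 1amp")` returns `("get lamp", [])`, while `correct_command("get xyzzy")` returns `("get xyzzy", ["xyzzy"])` because the starter list lacks that word.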
Common Zork-style commands include:
- Directions: n, s, e, w, ne, nw, se, sw, up, down
- Actions: get, take, drop, put, open, close, read, examine, look, inventory
- Combinations: `get lamp`, `put sword in case`, `open mailbox`

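The syntax patterns above can be checked with a small classifier. This sketch encodes only the example vocabulary listed here; Zork also accepts full direction words such as `north`, so the sets should be extended with the game's real vocabulary. The category names and the split between standalone verbs and verbs that need an object are assumptions.

```python
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw", "up", "down"}
VERBS = {"get", "take", "drop", "put", "open", "close", "read",
         "examine", "look", "inventory"}
# Verbs assumed to be complete on their own, without an object.
STANDALONE_VERBS = {"look", "inventory"}


def classify_command(command):
    """Label a cleaned command as 'direction', 'verb', 'verb_object', or None."""
    words = command.lower().split()
    if not words:
        return None
    if len(words) == 1:
        if words[0] in DIRECTIONS:
            return "direction"
        if words[0] in STANDALONE_VERBS:
            return "verb"
        return None  # unknown word or a verb missing its object
    if words[0] in VERBS:
        return "verb_object"  # e.g. "get lamp", "put sword in case"
    return None
```

Anything that returns `None` is a candidate for the manual-review list.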
### Phase 6: Deduplication and Cleaning

Handle duplicates arising from:
- Same command captured across multiple frames
- OCR variations of the same command (e.g., "get lamp" vs "get 1amp")

Deduplication strategy:
1. Normalize whitespace and case
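2. Use fuzzy matching to group similar entries from consecutive frames (grouping across the whole video would also merge commands that were legitimately repeated later)
3. When OCR variations exist, prefer the version matching known vocabulary
4. Remove incomplete or partial commands (for example, single letters that aren't valid directions, or half-typed commands captured mid-keystroke)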

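The strategy above can be sketched as a single pass over per-frame OCR readings in time order, using `difflib.SequenceMatcher` for fuzzy grouping. The function names and the 0.8 similarity threshold are assumptions. Because it compares only neighboring frames, a command typed twice in a row (for example `n`, `n`) collapses to one entry; if that matters, a signal beyond the text itself is needed, such as the prompt line moving up as the screen scrolls.

```python
import difflib


def normalize(text):
    """Lowercase and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def dedupe_consecutive(readings, vocab, threshold=0.8):
    """Collapse runs of near-identical readings from consecutive frames.

    Within each run, readings whose words are all in `vocab` are preferred;
    among those, the most frequent wins (earliest on a tie).
    """
    runs = []
    for raw in readings:
        text = normalize(raw)
        if not text:
            continue  # blank prompt or failed OCR
        similar = bool(runs) and difflib.SequenceMatcher(
            None, runs[-1][-1], text).ratio() >= threshold
        if similar:
            runs[-1].append(text)  # same command seen in another frame
        else:
            runs.append([text])  # a new command starts

    commands = []
    for run in runs:
        valid = [t for t in run if all(w in vocab for w in t.split())]
        pool = valid or run
        best = max(pool, key=pool.count)
        # Drop single characters that are not known words (e.g. directions).
        if len(best) == 1 and best not in vocab:
            continue
        commands.append(best)
    return commands
```

For instance, the readings `"get l"`, `"get la"`, `"get lamp"`, `"get 1amp"` form one run (the command being typed and then misread), which resolves to `"get lamp"`.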
### Phase 7: Validation

Before finalizing, validate the extracted command list:

1. **Syntax validation**: Verify commands match expected patterns
2. **Sequence plausibility**: Check that command order makes logical sense
3. **Coverage check**: Check whether the number of extracted commands is plausible for the length of the video
4. **Interpreter testing** (if available): Run commands through a game interpreter to verify validity

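Checks 1 and 3 can be rolled into a small report. In this sketch, `is_valid` is any predicate on a command string (for example, a syntax classifier returning non-`None`), and the 1 to 20 commands-per-minute range is an illustrative assumption, not a measured value; a speedrun will type far faster than a casual playthrough, so tune the bounds to the recording.

```python
def validation_report(commands, duration_s, is_valid,
                      min_per_min=1.0, max_per_min=20.0):
    """Summarize syntax failures and whether the command rate looks plausible."""
    # Keep indices so failures can be traced back to their place in the sequence.
    invalid = [(i, c) for i, c in enumerate(commands) if not is_valid(c)]
    per_min = len(commands) / (duration_s / 60) if duration_s > 0 else 0.0
    return {
        "count": len(commands),
        "invalid": invalid,
        "commands_per_min": round(per_min, 2),
        # Assumed bounds; adjust for the pace of the recording.
        "rate_plausible": min_per_min <= per_min <= max_per_min,
    }
```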
## Common Pitfalls

1. **Skipping ROI extraction**: Processing full frames wastes time and reduces accuracy
2. **Inadequate preprocessing**: Raw frames often need contrast/threshold adjustments
3. **Ignoring domain knowledge**: Valid command vocabulary enables validation and correction
4. **Ad-hoc cleaning scripts**: Design one robust cleaning pipeline up front rather than patching the output through repeated one-off fixes
5. **No early validation**: Test on sample frames before processing entire video
6. **Timeout misestimation**: Benchmark before committing to full processing
7. **Capturing game output as commands**: Filter to only lines with command prompt markers

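Pitfall 7 can be avoided by keeping only OCR lines that begin with the prompt marker. A minimal sketch, assuming the game echoes commands after a `>` prompt; accepting `»` and `)` as OCR look-alikes for `>` is an assumption, so check what your OCR engine actually produces for the prompt glyph before relying on it.

```python
import re

# ">" plus assumed OCR look-alikes; the command must contain a non-space character.
PROMPT_RE = re.compile(r"^\s*[>»)]\s*(\S.*?)\s*$")


def extract_prompt_lines(ocr_text):
    """Return the text after the prompt on each OCR line that starts with one."""
    commands = []
    for line in ocr_text.splitlines():
        match = PROMPT_RE.match(line)
        if match:
            commands.append(match.group(1))
    return commands
```

Game output such as room descriptions has no prompt prefix and is skipped, and a bare `>` with nothing typed yet yields no command.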
## Verification Checklist

- [ ] ROI identified and applied to frame extraction
- [ ] Preprocessing parameters tested on sample frames
- [ ] OCR benchmarked for time estimation
- [ ] Domain vocabulary used for validation
- [ ] Duplicates and near-duplicates removed
- [ ] Output validated against expected command syntax
- [ ] Command count reasonable for video duration