--- name: ingest-youtube description: > Extract transcripts from YouTube videos. Use when user says "get transcript", "transcribe this video", "what does this YouTube video say", "extract captions", "youtube transcript", or provides a YouTube URL and wants the text content. allowed-tools: Bash, Read triggers: - ingest youtube - youtube ingest - get transcript - transcribe video - youtube transcript - extract captions - what does this video say - get subtitles from - youtube url text metadata: short-description: YouTube transcript extraction --- # YouTube Transcripts Skill Extract transcripts from YouTube videos with **three-tier fallback**: 1. **Direct** - youtube-transcript-api (fastest) 2. **Proxy** - IPRoyal residential proxy rotation (handles rate limits) 3. **Whisper** - yt-dlp audio download → faster-whisper local transcription (free, GPU-accelerated) ## Quick Start ```bash cd .pi/skills/ingest-youtube # Get transcript (auto-fallback through all tiers) uv run python youtube_transcript.py get -i dQw4w9WgXcQ # Skip proxy tier uv run python youtube_transcript.py get -i VIDEO_ID --no-proxy # Skip whisper tier uv run python youtube_transcript.py get -i VIDEO_ID --no-whisper # List available transcript languages uv run python youtube_transcript.py list-languages -i VIDEO_ID # Check proxy configuration uv run python youtube_transcript.py check-proxy ``` ## Commands ### Get Transcript ```bash uv run python youtube_transcript.py get \ --url "https://youtube.com/watch?v=dQw4w9WgXcQ" \ --lang en ``` **Options:** | Option | Short | Description | |--------|-------|-------------| | `--url` | `-u` | YouTube video URL | | `--video-id` | `-i` | Video ID directly | | `--lang` | `-l` | Language code (default: en) | | `--no-proxy` | | Skip proxy tier | | `--no-whisper` | | Skip Whisper fallback tier | | `--retries` | `-r` | Max retries per tier (default: 3) | **Output:** JSON with transcript segments (text, start time, duration) ### List Available Languages ```bash uv run python youtube_transcript.py list-languages -i VIDEO_ID ``` **Output:** JSON with available transcript languages ### Check Proxy ```bash uv run python youtube_transcript.py check-proxy uv run python youtube_transcript.py check-proxy --test-rotation ``` Tests IPRoyal proxy connectivity and IP rotation. ## Output Format ```json { "meta": { "video_id": "dQw4w9WgXcQ", "language": "en", "took_ms": 3029, "method": "direct" }, "transcript": [ {"text": "Hello world", "start": 0.0, "duration": 2.5}, {"text": "This is a test", "start": 2.5, "duration": 3.0} ], "full_text": "Hello world This is a test...", "errors": [] } ``` **Method values:** `direct`, `proxy`, `whisper-local`, `whisper-api`, or `null` (if all failed) ## Three-Tier Fallback ### Tier 1: Direct - Uses youtube-transcript-api without proxy - Fastest, no additional cost - May fail with rate limits on repeated requests ### Tier 2: IPRoyal Proxy - Uses IPRoyal residential proxy (auto-rotates IPs) - Handles rate limiting (429) and blocking (403) - Requires proxy credentials in `.env` file **Environment variables (in .env):** | Variable | Description | |----------|-------------| | `IPROYAL_HOST` | Proxy host (e.g., geo.iproyal.com) | | `IPROYAL_PORT` | Proxy port (e.g., 12321) | | `IPROYAL_USER` | Proxy username | | `IPROYAL_PASSWORD` | Proxy password | ### Tier 3: Whisper Fallback - Downloads audio with yt-dlp - Transcribes with **faster-whisper** (local, free, GPU-accelerated) - Falls back to OpenAI Whisper API if local fails - Works for videos with disabled/unavailable captions **Note:** Some channels (like TheRemembrancer) have NO native captions, so 100% of videos require Whisper fallback. This is significantly slower but fully automatic. ## Dependencies ```bash pip install youtube-transcript-api requests yt-dlp openai faster-whisper rich ``` - `youtube-transcript-api` - Tier 1 & 2 - `requests` - Proxy support - `yt-dlp` - Tier 3 audio download - `faster-whisper` - Tier 3 local transcription (CTranslate2, 4-8x faster than openai-whisper) - `openai` - Tier 3 API transcription (fallback if local fails) - `rich` - Progress display ## Batch Processing Download transcripts from entire channels with intelligent rate limiting: ### Get Video IDs from Channel ```bash # Get all video IDs from a YouTube channel yt-dlp --flat-playlist --print id "https://www.youtube.com/@ChannelName/videos" > videos.txt ``` ### Batch Command ```bash uv run python youtube_transcript.py batch \ --input videos.txt \ --output ./transcripts \ --delay-min 5 --delay-max 15 \ --whisper \ --resume ``` **Options:** | Option | Short | Description | |--------|-------|-------------| | `--input` | `-f` | File with video IDs (one per line) | | `--output` | `-o` | Output directory for transcripts | | `--delay-min` | | Minimum delay between requests (default: 30) | | `--delay-max` | | Maximum delay between requests (default: 60) | | `--lang` | `-l` | Language code (default: en) | | `--no-proxy` | | Skip proxy tier | | `--whisper/--no-whisper` | | Enable/disable Whisper fallback | | `--resume/--no-resume` | | Resume from last position (default: True) | | `--max` | `-n` | Max videos to process (0 = all) | ### Smart Delay Logic The batch processor uses **adaptive delays** based on fetch method: - **Direct fetch success:** 2-5 seconds (low risk with rotating IPs) - **Proxy fetch success:** 5-15 seconds - **Whisper/failed:** Uses configured delay-min/delay-max This dramatically speeds up channels with native captions while remaining cautious after failures. ### Resume Support Batch processing saves state after each video to `.batch_state.json` in the output directory. State includes: - Completed video IDs - Stats (success, failed, skipped, rate_limited, whisper) - Current video being processed - Current method (fetching, whisper) If interrupted, simply re-run the same command to resume from where it left off. ## Supervisor (Auto-Restart, Hung Detection & Rich Progress) The supervisor provides: - **nvtop-style progress display** with GPU monitoring - **Auto-restart** if batch processes crash - **Hung detection** - automatically restarts stalled processes - **Multi-job management** for parallel batch processing - **JSON status** for agent integration ### Run with Supervisor ```bash cd .pi/skills/ingest-youtube # Single batch with supervisor uv run python supervisor.py run \ --input videos.txt \ --output ./transcripts \ --delay-min 5 --delay-max 15 # Multiple batches from config uv run python supervisor.py multi --config batches.json ``` ### Config File Format (batches.json) ```json { "jobs": [ { "name": "Luetin09", "input": "/path/to/luetin09_videos.txt", "output": "/path/to/luetin09", "delay_min": 5, "delay_max": 15, "max_restarts": 20, "hung_timeout": 600 }, { "name": "Remembrancer", "input": "/path/to/remembrancer_videos.txt", "output": "/path/to/remembrancer", "delay_min": 5, "delay_max": 15, "max_restarts": 20, "hung_timeout": 2700 } ] } ``` **Config options:** | Option | Description | Default | |--------|-------------|---------| | `name` | Job display name | filename | | `input` | Path to video IDs file | required | | `output` | Output directory | required | | `delay_min` | Min delay seconds | 30 | | `delay_max` | Max delay seconds | 60 | | `max_restarts` | Max automatic restarts | 10 | | `hung_timeout` | Seconds without progress before restart | 1800 | **Hung timeout recommendations:** - Channels with native captions: 600 seconds (10 min) - Channels requiring Whisper (no captions): 2700 seconds (45 min) - allows time for long video transcription ### Check Status (Agent-Friendly) ```bash # Human-readable output uv run python supervisor.py status --output ./transcripts # JSON output for agents uv run python supervisor.py status --output ./transcripts --json ``` **JSON output:** ```json { "output_dir": "./transcripts", "completed": 150, "stats": {"success": 145, "failed": 3, "skipped": 2, "rate_limited": 0, "whisper": 50}, "current_video": "dQw4w9WgXcQ", "current_method": "whisper", "last_updated": "2026-01-21 15:30:00", "consecutive_failures": 0 } ``` ## Local Whisper (Free & Fast) Tier 3 uses **faster-whisper** (CTranslate2 optimized) by default: - **Free** - no API key or costs - **4-8x faster** than openai-whisper - **GPU accelerated** - uses CUDA float16 when available - **VAD filtering** - skips silence for additional speed - **Automatic fallback** - falls back to OpenAI API if local fails Model size: `base` (good balance of speed/quality) **Performance benchmarks:** | Video Length | GPU Time | CPU Time | |--------------|----------|----------| | 10 min | ~30 sec | ~2 min | | 1 hour | ~3 min | ~15 min | | 10 hours | ~15 min | ~90 min | ## Rate Estimates | Channel Type | Native Captions | Rate | ETA (1000 videos) | |--------------|-----------------|------|-------------------| | Most channels | Yes | ~200-400/hr | 3-5 hours | | No captions (e.g., Remembrancer) | No (100% Whisper) | ~4-12/hr | 80-250 hours | ## Cross-Agent Monitoring ### Option 1: HTTP API (Recommended) Start the status API server: ```bash cd .pi/skills/ingest-youtube uv run python status_api.py --port 8765 ``` Then from any project/agent: ```bash # List all jobs curl http://localhost:8765/ # Get all job status (aggregated) curl http://localhost:8765/all # Get specific job status curl http://localhost:8765/status/luetin09 curl http://localhost:8765/status/remembrancer ``` **Response format:** ```json { "jobs": { "luetin09": {"completed": 535, "stats": {...}, "current_video": "..."}, "remembrancer": {"completed": 183, "stats": {...}, "current_video": "..."} }, "totals": {"completed": 718, "success": 88, "failed": 2, "whisper": 24} } ``` ### Option 2: CLI Script ```bash # Simple status check python .pi/skills/ingest-youtube/status.py /path/to/transcripts # Watch mode (updates every 5s) python .pi/skills/ingest-youtube/status.py /path/to/transcripts --watch ``` ### Option 3: Direct File Read ```bash # Read state file directly (no dependencies) cat /path/to/transcripts/.batch_state.json | jq '{completed: .completed | length, stats, current_video, last_updated}' ``` **Active batch locations:** - Luetin09: `/home/graham/workspace/experiments/pi-mono/run/youtube-transcripts/luetin09` - Remembrancer: `/home/graham/workspace/experiments/pi-mono/run/youtube-transcripts/remembrancer` ## Troubleshooting ### Process appears hung The supervisor monitors progress and auto-restarts hung processes. If a process shows no progress for `hung_timeout` seconds, it will be killed and restarted. **Symptoms:** - `current_method: "fetching"` for >10 minutes → likely hung - `current_method: "whisper"` for >45 minutes on non-huge video → may be hung **Manual restart:** ```bash # Kill specific batch process kill -9 # Supervisor will auto-restart it ``` ### All videos using Whisper fallback Some channels have NO native captions. Check: ```bash uv run python youtube_transcript.py list-languages -i VIDEO_ID ``` If empty, the video has no captions and requires Whisper. ### Rate limiting despite proxy - Check proxy is configured: `uv run python youtube_transcript.py check-proxy` - Verify .env file has correct credentials - Try `--test-rotation` to verify IP rotation ## Limitations - Tier 1-2 require captions (auto-generated or manual) - Tier 3 (Whisper) works for any video but takes longer - Private/unlisted videos may not be accessible - Very long videos (>10 hours) may take 15-45 minutes with Whisper - Some videos may fail due to regional restrictions or removal