---
name: transcribe-video
description: Generate subtitles (SRT/VTT) and plain text transcripts from video or audio files using AWS Transcribe. Use when creating captions, extracting spoken content, generating transcripts for notes, or making video content searchable.
argument-hint: "[file] [language-code]"
allowed-tools: Bash(ffmpeg:*), Bash(aws:*), Bash(ls:*), Bash(rm:*), Bash(which:*)
---

# Video Transcription Skill

Generate subtitles and transcripts from `$ARGUMENTS` (a video or audio file path, optionally followed by a language code like `en-US` or `es-ES`) using AWS Transcribe. Outputs `.srt`, `.vtt`, and `.txt` files next to the source file.

## Process

1. **Verify prerequisites** - check `ffmpeg` and `aws` CLI are installed and configured
2. **Extract audio** from the video as MP3 using ffmpeg
3. **Create temporary S3 bucket**, upload audio
4. **Run AWS Transcribe** job with SRT and VTT subtitle output
5. **Download results** and generate plain text transcript
6. **Clean up all AWS resources** - delete S3 bucket, Transcribe job, and temp files. No recurring costs.

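
The prerequisite check in step 1 could be sketched roughly as follows. The function name `check_prereqs` is hypothetical, not part of the skill itself, and it uses the POSIX `command -v` builtin rather than `which`; treat it as one possible approach under those assumptions.

```shell
# Hypothetical helper: succeed only if every named command is on PATH.
# Prints the missing command names to stderr and returns 1 otherwise.
check_prereqs() {
  missing=""
  for cmd in "$@"; do
    # command -v exits non-zero when the command cannot be found
    command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
  done
  if [ -n "$missing" ]; then
    echo "Missing required tools:$missing" >&2
    return 1
  fi
  return 0
}
```

A typical call would be `check_prereqs ffmpeg aws`, followed by `aws sts get-caller-identity` to confirm that the configured credentials are actually valid before any resources are created.
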
## Prerequisites

- `ffmpeg` installed (`brew install ffmpeg` on macOS)
- `aws` CLI installed and configured with valid credentials (`brew install awscli && aws configure`)
- AWS credentials need permissions for: `s3:*` (create/delete buckets, upload/download objects), `transcribe:*` (start/get/delete jobs)

## Step-by-Step

### Step 1: Extract audio

```bash
# -y overwrites a previous temp file, -vn drops the video stream,
# -q:a 2 selects a high-quality VBR MP3 encode
ffmpeg -y -i "input.mp4" -vn -acodec mp3 -q:a 2 "/tmp/transcribe-audio.mp3"
```

### Step 2: Create temp S3 bucket and upload

```bash
# Timestamp suffix keeps the bucket name unique per run
BUCKET="tmp-transcribe-$(date +%s)"
aws s3 mb "s3://$BUCKET" --region us-east-1
aws s3 cp "/tmp/transcribe-audio.mp3" "s3://$BUCKET/audio.mp3"
```

### Step 3: Start transcription job

```bash
JOB_NAME="tmp-job-$(date +%s)"
# Replace en-US with the requested language code, if one was given
aws transcribe start-transcription-job \
  --transcription-job-name "$JOB_NAME" \
  --language-code en-US \
  --media-format mp3 \
  --media "MediaFileUri=s3://$BUCKET/audio.mp3" \
  --subtitles "Formats=srt,vtt" \
  --output-bucket-name "$BUCKET" \
  --region us-east-1
```

**Language codes:** `en-US`, `es-ES`, `fr-FR`, `de-DE`, `pt-BR`, `ja-JP`, `zh-CN`, `it-IT`, `ko-KR`, etc. Default to `en-US` if not specified.

### Step 4: Poll until complete

```bash
# Check the job status every 5 seconds until it reaches a terminal state
while true; do
  STATUS=$(aws transcribe get-transcription-job \
    --transcription-job-name "$JOB_NAME" \
    --region us-east-1 \
    --query 'TranscriptionJob.TranscriptionJobStatus' \
    --output text)
  if [ "$STATUS" = "COMPLETED" ] || [ "$STATUS" = "FAILED" ]; then break; fi
  sleep 5
done
echo "Job finished with status: $STATUS"
```

If the status is `FAILED`, there is nothing to download: skip Steps 5 and 6 and go straight to cleanup (Step 7).

### Step 5: Download subtitle files

Save `.srt` and `.vtt` next to the original file:

```bash
aws s3 cp "s3://$BUCKET/$JOB_NAME.srt" "/path/to/input.srt"
aws s3 cp "s3://$BUCKET/$JOB_NAME.vtt" "/path/to/input.vtt"
```

### Step 6: Generate plain text transcript

Download the JSON result and extract the full transcript text:

```bash
aws s3 cp "s3://$BUCKET/$JOB_NAME.json" "/tmp/transcribe-result.json"
```

Then use a JSON tool to extract the `.results.transcripts[0].transcript` field and save it as a `.txt` file next to the original.

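
One way to do the extraction described in Step 6 is sketched below. The function name `extract_transcript` is hypothetical, and the sketch assumes that either `jq` or `python3` is available; neither appears in this skill's `allowed-tools` list, so one of them would need to be permitted first.

```shell
# Hypothetical helper: print the transcript text from a Transcribe result file.
# $1 is the path to the downloaded JSON (e.g. /tmp/transcribe-result.json).
extract_transcript() {
  if command -v jq >/dev/null 2>&1; then
    # -r emits the raw string instead of a JSON-quoted one
    jq -r '.results.transcripts[0].transcript' "$1"
  else
    # Fallback (assumes python3 is installed): read the same field with the json module
    python3 -c 'import json, sys
with open(sys.argv[1], encoding="utf-8") as f:
    print(json.load(f)["results"]["transcripts"][0]["transcript"])' "$1"
  fi
}
```

For example, `extract_transcript /tmp/transcribe-result.json > "/path/to/input.txt"` would write the plain text transcript next to the source file.
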
### Step 7: Clean up everything

**IMPORTANT:** Always clean up, even when the job failed, to avoid recurring S3 storage costs.

```bash
# Delete S3 bucket and all contents
aws s3 rb "s3://$BUCKET" --force --region us-east-1

# Delete the transcription job
aws transcribe delete-transcription-job --transcription-job-name "$JOB_NAME" --region us-east-1

# Delete local temp files (extracted audio and downloaded JSON)
rm -f "/tmp/transcribe-audio.mp3" "/tmp/transcribe-result.json"
```

## Real-World Results (Reference)

From actual transcription runs:

| Video | Duration | Audio Size | Transcribe Time | Subtitle Segments |
|-------|----------|------------|-----------------|-------------------|
| X/Twitter clip | 2:40 | 2.5 MB | ~20 seconds | 83 |
| Screen recording | 18:45 | 11.4 MB | ~60 seconds | 500+ |

### Key Insights

1. **AWS Transcribe is fast** - even a 19-minute video completes in about a minute
2. **Short-form content** (tweets, reels) transcribes in about 20 seconds
3. **Cost is negligible** - AWS Transcribe charges ~$0.024/min, so a 19-min video costs ~$0.46
4. **Cleanup is critical** - always delete the S3 bucket to avoid storage charges
5. **SRT is most compatible** - works with most video players and editors; VTT is better for web

## Output Files

```
original-video.mp4
original-video.srt    # Subtitles with timestamps (most compatible)
original-video.vtt    # Web-optimized subtitles (for HTML5 video)
original-video.txt    # Plain text transcript (no timestamps)
```

## After Transcription

1. **Verify all output files exist**: `ls -lh /path/to/original-video.{srt,vtt,txt}`
2. Report the number of subtitle segments and total duration
3. Confirm all AWS resources have been cleaned up (no S3 buckets, no Transcribe jobs remaining)
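
The segment count and duration in the report above can be read from the `.srt` file itself. The sketch below uses a hypothetical helper named `srt_summary`; it assumes standard SRT cue lines of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm` and treats the end timestamp of the last cue as an approximation of the total duration.

```shell
# Hypothetical helper: print "<segment count> <end timestamp of last cue>".
# $1 is the path to an .srt file.
srt_summary() {
  cue='^[0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3} --> '
  # Each subtitle segment has exactly one timing line, so count those
  count=$(grep -E -c "$cue" "$1" || true)
  # Third whitespace-separated field of a timing line is the cue's end time;
  # tr strips a carriage return in case the file uses CRLF line endings
  last_end=$(grep -E "$cue" "$1" | tail -n 1 | tr -d '\r' | awk '{print $3}')
  echo "$count $last_end"
}
```

Calling `srt_summary /path/to/original-video.srt` prints both values on one line, which can be quoted directly in the final report.
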