---
name: autocut-shorts
description: "Main orchestration skill for automatic creation of short-form content (TikTok, YouTube Shorts, Instagram Reels) from long videos. Fully automated workflow: download video, transcribe, detect highlights (transcript + laughter + sentiment + scenes), trim segments, resize to 9:16 portrait, and add subtitles. Finds viral-worthy moments, similar to OpusClip and Vizard.ai."
allowed-tools: Bash(ffmpeg:*) Bash(yt-dlp:*) Bash(python:*)
compatibility: Requires all trimer-clip dependencies and FFmpeg
metadata:
  version: "1.0"
  platforms: "TikTok, YouTube Shorts, Instagram Reels, Facebook Reels"
---

# Autocut Shorts

This is the main orchestration skill. It combines all other skills to automatically create short-form content from long videos.

## What It Does

This skill automates the entire workflow:

1. **Download** video from YouTube URL (if provided)
2. **Transcribe** audio using Whisper or Gemini API
3. **Diarize** speakers (pyannote or Gemini) to identify who speaks when
4. **Detect highlights** using combined analysis:
   - Transcript analysis (hooks, viral phrases)
   - Speaker dynamics (debates, interactions, overlapping speech)
   - Laughter detection (humorous moments)
   - Sentiment analysis (emotional peaks)
   - Scene detection (cut points)
5. **Select** best segments (15-60 seconds each)
6. **Trim** video to highlight segments
7. **Resize** to 9:16 portrait format (1080x1920)
8. **Add** burned-in subtitles with speaker labels
9. **Export** multiple clips ready for upload

## When to Use

- User wants to create TikTok clips from a YouTube video
- Converting podcasts to short-form content
- Finding viral moments in vlogs or tutorials
- Repurposing gaming content for Shorts/Reels
- Batch processing multiple videos

## Available Scripts

### `scripts/autocut.py`

Main autocut workflow script.

**Usage:**

```bash
python skills/autocut-shorts/scripts/autocut.py <input> [options]
```

`<input>` is a local video file or a YouTube URL, as in the examples below.

**Options:**

- `--source`: Source type (file, youtube) - auto-detected
- `--num-clips`: Number of clips to generate (default: 5)
- `--min-duration`: Minimum clip duration in seconds (default: 15)
- `--max-duration`: Maximum clip duration in seconds (default: 60)
- `--platform`: Target platform (tiktok, shorts, reels, facebook) - default: tiktok
- `--output-dir`: Output directory (default: `./shorts/`)
- `--transcription-model`: Transcription model (auto, whisper, gemini) - default: auto
- `--diarization-model`: Speaker diarization (auto, pyannote, gemini, none) - default: auto
- `--huggingface-token`: HuggingFace token for pyannote (or use env var)
- `--focus-speaker`: Extract clips only for a specific speaker (SPEAKER_00, etc.)
- `--gemini-api-key`: Gemini API key (or use env var)
- `--skip-transcribe`: Skip transcription if a transcript already exists
- `--skip-diarization`: Skip speaker diarization
- `--skip-scenes`: Skip scene detection
- `--skip-laughter`: Skip laughter detection
- `--skip-sentiment`: Skip sentiment analysis
- `--transcript-path`: Use existing transcript file
- `--style`: Subtitle style (tiktok, shorts, reels) - default: tiktok

**Examples:**

Basic autocut from file:

```bash
python skills/autocut-shorts/scripts/autocut.py video.mp4
```

Autocut from YouTube URL:

```bash
python skills/autocut-shorts/scripts/autocut.py "https://www.youtube.com/watch?v=VIDEO_ID"
```

Generate 10 clips for Instagram Reels:

```bash
python skills/autocut-shorts/scripts/autocut.py video.mp4 --num-clips 10 --platform reels --style reels
```

Use Gemini for transcription:

```bash
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcription-model gemini
```

Custom duration range:

```bash
python skills/autocut-shorts/scripts/autocut.py video.mp4 --min-duration 20 --max-duration 45
```

Use existing transcript:

```bash
python skills/autocut-shorts/scripts/autocut.py video.mp4 --transcript-path video.srt --skip-transcribe
```

### `scripts/quick_cut.py`

Quick cut without full analysis (faster).

**Usage:**

```bash
python skills/autocut-shorts/scripts/quick_cut.py <input> [options]
```

**Options:**

- `--timestamps`: JSON file with timestamps to cut
- `--output-dir`: Output directory
- `--platform`: Target platform

**Example:**

```bash
python skills/autocut-shorts/scripts/quick_cut.py video.mp4 --timestamps cuts.json
```

## Workflow Steps

### Step 1: Download (Optional)

If a URL is provided:

- Downloads from YouTube using yt-dlp
- Best quality MP4
- Saves to temp directory

### Step 2: Transcribe

Extracts audio and transcribes:

- **Auto mode**: Chooses based on requirements
- **Whisper**: Local processing, good for privacy
- **Gemini**: Cloud processing, better quality + features

### Step 3: Detect Highlights

Runs detection modules:

- **Transcript analysis**: Viral phrases, hooks, questions
- **Laughter detection**: Funny moments (if enabled)
- **Sentiment analysis**: Emotional peaks (if enabled)
- **Scene detection**: Visual cut points (if enabled)

### Step 4: Score and Rank

Combines all signals:

```
Virality Score =
  35% Transcript (hooks, viral content) +
  25% Laughter (humor) +
  25% Sentiment (emotion) +
  15% Scenes (visual transitions)
```

Ranks all segments and selects the top N.

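The weighted combination and top-N selection described in Step 4 can be sketched as follows. This is a minimal illustration, not the code in `autocut.py`: the `Segment` class, the `virality_score` and `select_top` names, and the choice to treat a missing signal (for example, a skipped detector) as 0 are all assumptions made for the example. Only the four weights come from the formula above.

```python
from dataclasses import dataclass, field

# Weights taken from the Virality Score formula in Step 4.
WEIGHTS = {"transcript": 0.35, "laughter": 0.25, "sentiment": 0.25, "scenes": 0.15}


@dataclass
class Segment:
    """Hypothetical candidate highlight with per-signal scores in [0, 1]."""
    start: float
    end: float
    scores: dict = field(default_factory=dict)


def virality_score(scores: dict) -> float:
    # Assumption: a signal that was not computed contributes 0.
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


def select_top(segments: list, num_clips: int = 5) -> list:
    # Rank by combined score, highest first, and keep the top N.
    ranked = sorted(segments, key=lambda s: virality_score(s.scores), reverse=True)
    return ranked[:num_clips]


if __name__ == "__main__":
    candidates = [
        Segment(0.0, 20.0, {"transcript": 1.0}),
        Segment(30.0, 60.0, {"transcript": 0.5, "laughter": 1.0,
                             "sentiment": 1.0, "scenes": 1.0}),
        Segment(70.0, 90.0),
    ]
    for seg in select_top(candidates, num_clips=2):
        print(f"{seg.start:.1f}-{seg.end:.1f}s score={virality_score(seg.scores):.3f}")
```

Because the weights sum to 1.0, a segment with every signal at 1.0 scores exactly 1.0, which keeps the combined score on the same 0-1 scale as the individual signals.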
### Step 5: Trim

For each highlight:

- Extends 2-3 seconds before/after for context
- Trims using FFmpeg (stream copy for speed)
- Validates duration constraints

### Step 6: Resize to Portrait

Converts to 9:16:

- Smart crop (focus on subjects)
- 1080x1920 resolution
- Maintains quality

### Step 7: Add Subtitles

Burns in captions:

- Platform-specific styling
- White text with black outline
- Bottom position
- Readable size (24-28px)

### Step 8: Export

Saves final clips:

- Named: `{original}_short_{index}.mp4`
- Organized in output directory
- JSON report with metadata

## Output Format

### Directory Structure

```
shorts/
  video_short_001.mp4
  video_short_002.mp4
  video_short_003.mp4
  report.json
```

### JSON Report

```json
{
  "success": true,
  "source": {
    "type": "youtube",
    "url": "https://youtube.com/watch?v=...",
    "title": "Video Title",
    "duration": 1200.5
  },
  "processing": {
    "transcription_model": "gemini-flash-lite-latest",
    "detection_methods": ["transcript", "laughter", "sentiment", "scenes"],
    "platform": "tiktok"
  },
  "results": {
    "total_clips": 5,
    "clips": [
      {
        "rank": 1,
        "filename": "video_short_001.mp4",
        "start_time": 45.2,
        "end_time": 72.5,
        "duration": 27.3,
        "virality_score": 0.92,
        "text": "This is the key moment...",
        "output_path": "shorts/video_short_001.mp4"
      }
    ],
    "total_duration": 135.5,
    "avg_virality_score": 0.78
  },
  "performance": {
    "total_time": 180.5,
    "transcription_time": 45.2,
    "analysis_time": 67.3,
    "processing_time": 68.0
  }
}
```

## Platform Presets

### TikTok

- Resolution: 1080x1920
- Duration: 15-60 seconds
- Subtitle style: TikTok
- Output naming: `_tiktok_{index}.mp4`

### YouTube Shorts

- Resolution: 1080x1920
- Duration: 15-60 seconds
- Subtitle style: Shorts
- Output naming: `_shorts_{index}.mp4`

### Instagram Reels

- Resolution: 1080x1920
- Duration: 15-90 seconds
- Subtitle style: Reels
- Output naming: `_reels_{index}.mp4`

### Facebook Reels

- Resolution: 1080x1920
- Duration: 15-90 seconds
- Subtitle style: Default
- Output naming: `_facebook_{index}.mp4`

## Viral Detection Algorithm

### High-Value Signals

**Transcript (35% weight):**

- Viral phrases ("you won't believe", "this changes everything")
- Hooks ("let me tell you", "here's the secret")
- Questions and answers
- Story beats

**Laughter (25% weight):**

- Explicit laughter markers
- High-confidence laughter detection
- Audience reactions

**Sentiment (25% weight):**

- Positive emotions (excitement, joy)
- Surprise moments
- Negative emotions (controversy, drama)
- Emotional intensity > 0.7

**Scenes (15% weight):**

- Scene transitions
- Visual changes
- Topic shifts

### Scoring

```python
virality_score = (
    transcript_score * 0.35 +
    laughter_score * 0.25 +
    sentiment_score * 0.25 +
    scene_score * 0.15
)
```

- **Premium Clips (0.8-1.0):** Must include
- **Excellent Clips (0.6-0.8):** High priority
- **Good Clips (0.4-0.6):** Consider including

## AI Agent Decision Guide

### When to Use pyannote vs Gemini Diarization

**Default Behavior (`--diarization-model auto`):**

The AI agent automatically selects a model based on context:

```python
def choose_diarization_model(user_request: str, likely_multi_speaker: bool) -> str:
    """Illustrative selection logic for `--diarization-model auto`.

    `likely_multi_speaker` stands in for a speaker-count heuristic on the video.
    """
    request = user_request.lower()

    def mentions(*phrases: str) -> bool:
        return any(phrase in request for phrase in phrases)

    # Use pyannote when:
    if mentions("podcast", "interview"):
        return "pyannote"  # Multi-speaker, needs accuracy
    if mentions("accurate", "precise"):
        return "pyannote"  # User explicitly wants accuracy
    if mentions("panel", "debate"):
        return "pyannote"  # Complex multi-speaker scenarios
    if mentions("overlapping", "talk over"):
        return "pyannote"  # Overlapping speech detection
    if mentions("privacy", "offline"):
        return "pyannote"  # Local processing needed

    # Use Gemini when:
    if mentions("quick", "fast"):
        return "gemini"  # Speed priority
    if mentions("single speaker", "monologue"):
        return "gemini"  # Simple scenario

    # Skip diarization when:
    if mentions("no diarization", "skip speakers"):
        return "none"  # User doesn't want speaker detection

    # Default for ambiguous cases:
    return "pyannote" if likely_multi_speaker else "gemini"
```

**Decision Matrix:**

| Scenario | Recommended | Reason |
|----------|-------------|--------|
| Podcast with 2-3 hosts | **pyannote** | High accuracy for multi-speaker |
| Interview (host + guest) | **pyannote** | Precise speaker separation |
| Panel discussion | **pyannote** | Handles 4+ speakers well |
| Single speaker vlog | **gemini** | Faster, good enough |
| Gaming commentary | **gemini** | Usually 1-2 speakers |
| Tutorial video | **gemini** | Single speaker, speed matters |
| Debate/competitive | **pyannote** | Overlapping speech detection |
| Privacy-sensitive | **pyannote** | Local processing |

**Examples by Use Case:**

```bash
# Podcast - use pyannote automatically
python skills/autocut-shorts/scripts/autocut.py podcast.mp4

# Interview - use pyannote for accuracy
python skills/autocut-shorts/scripts/autocut.py interview.mp4

# Vlog - use gemini (single speaker, faster)
python skills/autocut-shorts/scripts/autocut.py vlog.mp4

# Force pyannote explicitly
python skills/autocut-shorts/scripts/autocut.py video.mp4 --diarization-model pyannote

# Skip diarization for simple content
python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --diarization-model none

# Extract only host's segments
python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --focus-speaker SPEAKER_00
```

### Smart Defaults

**The agent automatically detects:**

1. Content type (podcast, vlog, tutorial, gaming, etc.)
2. Likely speaker count based on audio patterns
3. User priority (speed vs accuracy vs privacy)
4. Available resources (GPU, internet, API keys)

**Override any time:** Users can always override the automatic choice with the `--diarization-model` flag.

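When an agent applies an explicit override, it helps to validate the value before invoking the script. The sketch below builds the argument list for `autocut.py` using only flags documented in the Options list; the `build_autocut_command` helper and its validation behavior are hypothetical and are not part of the skill's scripts.

```python
import shlex

# Allowed values copied from the documented options.
DIARIZATION_MODELS = {"auto", "pyannote", "gemini", "none"}
PLATFORMS = {"tiktok", "shorts", "reels", "facebook"}

SCRIPT = "skills/autocut-shorts/scripts/autocut.py"


def build_autocut_command(source, num_clips=5, platform="tiktok",
                          diarization_model="auto", focus_speaker=None):
    """Hypothetical helper: return an argv list for autocut.py."""
    if diarization_model not in DIARIZATION_MODELS:
        raise ValueError(f"unknown diarization model: {diarization_model!r}")
    if platform not in PLATFORMS:
        raise ValueError(f"unknown platform: {platform!r}")

    cmd = ["python", SCRIPT, source,
           "--num-clips", str(num_clips),
           "--platform", platform,
           "--diarization-model", diarization_model]
    if focus_speaker:
        cmd += ["--focus-speaker", focus_speaker]
    return cmd


if __name__ == "__main__":
    # Force pyannote regardless of what the automatic selection would choose.
    cmd = build_autocut_command("podcast.mp4", diarization_model="pyannote")
    print(shlex.join(cmd))
```

Returning a list rather than a single string lets the caller pass it directly to `subprocess.run` without shell quoting concerns, which matters when `source` is a URL containing `?` or `&`.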
## Integration

This skill uses all other skills:

- `youtube-downloader`: Download from URL
- `video-transcriber`: Transcribe audio
- `scene-detector`: Find visual cut points
- `laughter-detector`: Find funny moments
- `sentiment-analyzer`: Find emotional peaks
- `highlight-scanner`: Combine all signals
- `video-trimmer`: Cut segments
- `portrait-resizer`: Convert to 9:16
- `subtitle-overlay`: Add captions

## Common Use Cases

### Podcast to Shorts

```bash
python skills/autocut-shorts/scripts/autocut.py podcast.mp4 --num-clips 10 --platform shorts
```

### Vlog Highlights

```bash
python skills/autocut-shorts/scripts/autocut.py vlog.mp4 --num-clips 5 --platform tiktok
```

### YouTube to TikTok

```bash
python skills/autocut-shorts/scripts/autocut.py "https://youtube.com/watch?v=..." --platform tiktok
```

### Tutorial Clips

```bash
python skills/autocut-shorts/scripts/autocut.py tutorial.mp4 --min-duration 30 --max-duration 60
```

## Performance

**Processing Time (approximate):**

- 1-minute video: ~30-60 seconds
- 10-minute video: ~3-5 minutes
- 30-minute video: ~8-12 minutes
- 1-hour video: ~15-25 minutes

**Breakdown:**

- Download: 5-30 seconds (depends on video)
- Transcription: 20-60 seconds
- Detection: 10-30 seconds per method
- Trimming: 1-5 seconds per clip
- Resizing: 5-10 seconds per clip
- Subtitles: 5-10 seconds per clip

## Error Handling

- **Download failure**: Retries up to 3 times
- **Transcription failure**: Falls back to alternative model
- **No highlights found**: Returns error with suggestions
- **Processing failure**: Reports which step failed
- **Partial success**: Reports which clips succeeded and which failed

## Tips

- Use Gemini transcription for best highlight detection
- Request more clips than you need, then filter by score
- 15-30 second clips perform best on TikTok
- 30-60 second clips work well for Shorts/Reels
- Keep a 2-3 second buffer around highlights
- Test different platforms for best engagement
- Skip the other detectors (`--skip-scenes`, `--skip-laughter`, `--skip-sentiment`) for faster, transcript-only processing
- Batch process multiple videos for efficiency

## References

- OpusClip: https://www.opus.pro/
- Vizard.ai: https://vizard.ai/
- TikTok specs: https://www.tiktok.com/business/en-US/solutions/tiktok-specs
- YouTube Shorts specs: https://support.google.com/youtube/answer/10059066
- Instagram Reels specs: https://help.instagram.com/609412256345459