---
name: elevenlabs-narration
description: ElevenLabs TTS integration for video narration. Use when generating voiceover audio, selecting voices, or building script-to-audio pipelines
tags: [video, audio, narration, tts, elevenlabs, voice, speech]
context: fork
agent: demo-producer
user-invocable: false
version: 1.0.0
complexity: low
---

# ElevenLabs Narration for Video Production

Complete integration guide for using ElevenLabs text-to-speech in video production pipelines. Covers voice selection, timing calculations, API patterns, and cost optimization for professional narration.

## Overview

- Generating narration audio for video segments
- Selecting appropriate voices for content type
- Calculating segment timing from frames to milliseconds
- Building script-to-audio pipelines
- Optimizing API usage and costs
- Handling rate limits and errors

## ElevenLabs API Overview

### Model Comparison (2026)

| Model | Latency | Quality | Cost | Best For |
|-------|---------|---------|------|----------|
| **eleven_multilingual_v2** | Medium | Best | $0.30/1K chars | Production, multilingual |
| **eleven_turbo_v2_5** | Low | Excellent | $0.18/1K chars | Real-time, drafts |
| **eleven_flash_v2_5** | Lowest | Good | $0.08/1K chars | Previews, testing |
| **eleven_english_sts_v2** | Medium | Best | $0.30/1K chars | Speech-to-speech |

### API Endpoints

```
Base URL: https://api.elevenlabs.io/v1

POST /text-to-speech/{voice_id}          # Generate audio
POST /text-to-speech/{voice_id}/stream   # Stream audio
GET  /voices                             # List voices
GET  /voices/{voice_id}                  # Voice details
GET  /user                               # Usage/quota
POST /speech-to-speech/{voice_id}        # Voice conversion
```

## Core Integration Pattern

### Basic Text-to-Speech

```typescript
import { ElevenLabsClient } from 'elevenlabs';

const client = new ElevenLabsClient({
  apiKey: process.env.ELEVENLABS_API_KEY
});

async function generateNarration(
  text: string,
  voiceId: string = 'Rachel'
): Promise<Buffer> {
  const audio = await client.generate({
    voice: voiceId,
    text: text,
    model_id: 'eleven_multilingual_v2',
    voice_settings: {
      stability: 0.5,
      similarity_boost: 0.8,
      style: 0.0,
      use_speaker_boost: true
    }
  });

  // Convert stream to buffer
  const chunks: Buffer[] = [];
  for await (const chunk of audio) {
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}
```

## Voice Selection Quick Reference

### Pre-Built Voices for Video Narration

| Voice | ID | Characteristics | Use Case |
|-------|-----|-----------------|----------|
| **Rachel** | 21m00Tcm4TlvDq8ikWAM | Warm, conversational | General narration |
| **Adam** | pNInz6obpgDQGcFmaJgB | Deep, authoritative | Tech explainers |
| **Antoni** | ErXwobaYiN019PkySvjV | Energetic, youthful | Product demos |
| **Bella** | EXAVITQu4vr4xnSDxMaL | Friendly, engaging | Tutorials |
| **Josh** | TxGEqnHWrfWFTfGW9XjX | Deep, narrative | Documentaries |

### Voice Settings Explained

```typescript
interface VoiceSettings {
  stability: number;          // 0.0-1.0 (lower = more expressive)
  similarity_boost: number;   // 0.0-1.0 (higher = closer to original)
  style: number;              // 0.0-1.0 (v2 models only)
  use_speaker_boost: boolean; // Clarity enhancement
}

// Recommended settings by content type
const VOICE_PRESETS = {
  narration:      { stability: 0.65, similarity_boost: 0.8,  style: 0.0 },
  conversational: { stability: 0.4,  similarity_boost: 0.75, style: 0.2 },
  dramatic:       { stability: 0.3,  similarity_boost: 0.9,  style: 0.5 },
  professional:   { stability: 0.8,  similarity_boost: 0.85, style: 0.0 },
  energetic:      { stability: 0.35, similarity_boost: 0.85, style: 0.4 }
};
```

## Segment Timing Calculations

### Frame-to-Milliseconds Conversion

```typescript
function framesToMs(frames: number, fps: number = 30): number {
  return Math.round((frames / fps) * 1000);
}

function msToFrames(ms: number, fps: number = 30): number {
  return Math.round((ms / 1000) * fps);
}

// Examples
framesToMs(90, 30);   // 3000ms (3 seconds at 30fps)
framesToMs(150, 30);  // 5000ms (5 seconds at 30fps)
msToFrames(2500, 30); // 75 frames
```

### Words Per Minute Reference
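
As a rough sketch of how a speaking rate translates into a word budget for a segment, the helpers below combine the frame-to-millisecond conversion with a target WPM. The names `estimateDurationMs` and `wordBudgetForSegment` are hypothetical and not part of the ElevenLabs SDK or Remotion, and the 140 WPM default is an assumption. Actual audio length depends on the voice and its settings, so treat the results as planning estimates only.

```typescript
// Hypothetical planning helpers; not part of any library.

// Same conversion as the frame-to-milliseconds helper, repeated so this sketch is self-contained.
function framesToMs(frames: number, fps: number = 30): number {
  return Math.round((frames / fps) * 1000);
}

// Estimate how long a script takes to speak at a given words-per-minute rate.
function estimateDurationMs(script: string, wpm: number = 140): number {
  const words = script.trim().split(/\s+/).filter(Boolean).length;
  return Math.round((words / wpm) * 60000);
}

// Largest whole number of words that fits in a segment of `frames` frames.
function wordBudgetForSegment(
  frames: number,
  fps: number = 30,
  wpm: number = 140
): number {
  return Math.floor((framesToMs(frames, fps) / 60000) * wpm);
}

wordBudgetForSegment(900, 30, 150);                 // 75 words in a 30-second segment
estimateDurationMs('one two three four five', 150); // 2000ms
```

Typical WPM values to pass in are listed in the reference table that follows.
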
```
Speaking Speed      WPM       Words/30s   Use Case
----------------------------------------------------------
Slow (dramatic)     100       50          Hooks, reveals
Normal narration    130-150   65-75       Standard content
Conversational      150-170   75-85       Tutorials, demos
Fast (excited)      170-190   85-95       Features, energy
Very fast           200+      100+        Avoid (unclear)
```

## Remotion Integration

### Audio Component for Remotion

```typescript
import { Audio, Sequence, useVideoConfig } from 'remotion';

interface NarrationProps {
  audioUrl: string;
  startFrame: number;
  volume?: number;
}

export const Narration: React.FC<NarrationProps> = ({
  audioUrl,
  startFrame,
  volume = 1
}) => {
  return (