--- name: elevenlabs-speech description: Text-to-Speech and Speech-to-Text using ElevenLabs AI. Use when the user wants to convert text to speech, transcribe voice messages, or work with voice in multiple languages. Supports high-quality AI voices and accurate transcription. --- # ElevenLabs Speech Complete voice solution — both TTS and STT using one API: - **TTS**: Text-to-Speech (high-quality voices) - **STT**: Speech-to-Text via Scribe (accurate transcription) ## Quick Start ### Environment Setup Set your API key: ```bash export ELEVENLABS_API_KEY="sk_..." ``` Or create `.env` file in workspace root. ### Text-to-Speech (TTS) Convert text to natural-sounding speech: ```bash python scripts/elevenlabs_speech.py tts -t "Hello world" -o greeting.mp3 ``` With custom voice: ```bash python scripts/elevenlabs_speech.py tts -t "Hello" -v "voice_id_here" -o output.mp3 ``` ### List Available Voices ```bash python scripts/elevenlabs_speech.py voices ``` ## Using in Code ```python from scripts.elevenlabs_speech import ElevenLabsClient client = ElevenLabsClient(api_key="sk_...") # Basic TTS result = client.text_to_speech( text="Hello from zerox", output_path="greeting.mp3" ) # With custom settings result = client.text_to_speech( text="Your text here", voice_id="21m00Tcm4TlvDq8ikWAM", # Rachel stability=0.5, similarity_boost=0.75, output_path="output.mp3" ) # Get available voices voices = client.get_voices() for voice in voices['voices']: print(f"{voice['name']}: {voice['voice_id']}") ``` ## Popular Voices | Voice ID | Name | Description | |----------|------|-------------| | `21m00Tcm4TlvDq8ikWAM` | Rachel | Natural, versatile (default) | | `AZnzlk1XvdvUeBnXmlld` | Domi | Strong, energetic | | `EXAVITQu4vr4xnSDxMaL` | Bella | Soft, soothing | | `ErXwobaYiN019PkySvjV` | Antoni | Well-rounded | | `MF3mGyEYCl7XYWbV9V6O` | Elli | Warm, friendly | | `TxGEqnHWrfWFTfGW9XjX` | Josh | Deep, calm | | `VR6AewLTigWG4xSOukaG` | Arnold | Authoritative | ## Voice Settings - **stability** (0-1): Lower = more emotional, Higher = more stable - **similarity_boost** (0-1): Higher = closer to original voice Default: stability=0.5, similarity_boost=0.75 ## Models - `eleven_turbo_v2_5` - Fast, high quality (default) - `eleven_multilingual_v2` - Best for non-English - `eleven_monolingual_v1` - English only ## Integration with Telegram When user sends text and wants voice reply: ```python # Generate speech result = client.text_to_speech(text=user_text, output_path="reply.mp3") # Send via Telegram message tool with media path message(action="send", media="path/to/reply.mp3", as_voice=True) ``` ## Pricing Check https://elevenlabs.io/pricing for current rates. Free tier available! ## Speech-to-Text (STT) with ElevenLabs Scribe Transcribe voice messages using ElevenLabs Scribe: ### Transcribe Audio ```bash python scripts/elevenlabs_scribe.py voice_message.ogg ``` With specific language: ```bash python scripts/elevenlabs_scribe.py voice_message.ogg --language ara ``` With speaker diarization (multiple speakers): ```bash python scripts/elevenlabs_scribe.py voice_message.ogg --speakers 2 ``` ### Using in Code ```python from scripts.elevenlabs_scribe import ElevenLabsScribe client = ElevenLabsScribe(api_key="sk-...") # Basic transcription result = client.transcribe("voice_message.ogg") print(result['text']) # With language hint (improves accuracy) result = client.transcribe("voice_message.ogg", language_code="ara") # With speaker detection result = client.transcribe("voice_message.ogg", num_speakers=2) ``` ### Supported Formats - mp3, mp4, mpeg, mpga, m4a, wav, webm - Max file size: 100 MB - Works great with Telegram voice messages (`.ogg`) ### Language Support Scribe supports 99 languages including: - Arabic (`ara`) - English (`eng`) - Spanish (`spa`) - French (`fra`) - And many more... Without language hint, it auto-detects. ## Complete Workflow Example **User sends voice message → You reply with voice:** ```python from scripts.elevenlabs_scribe import ElevenLabsScribe from scripts.elevenlabs_speech import ElevenLabsClient # 1. Transcribe user's voice message stt = ElevenLabsScribe() transcription = stt.transcribe("user_voice.ogg") user_text = transcription['text'] # 2. Process/understand the text # ... your logic here ... # 3. Generate response text response_text = "Your response here" # 4. Convert to speech tts = ElevenLabsClient() tts.text_to_speech(response_text, output_path="reply.mp3") # 5. Send voice reply message(action="send", media="reply.mp3", as_voice=True) ``` ## Pricing Check https://elevenlabs.io/pricing for current rates: **TTS (Text-to-Speech):** - Free tier: 10,000 characters/month - Paid plans available **STT (Speech-to-Text) - Scribe:** - Free tier available - Check website for current pricing