--- title: Text-to-Speech (TTS) Integration Guide description: Complete guide to NeuroLink's TTS capabilities for generating high-quality audio from text and AI responses keywords: tts, text-to-speech, audio, voice, speech synthesis, google cloud tts, audio generation --- # Text-to-Speech (TTS) Integration Guide NeuroLink provides integrated Text-to-Speech (TTS) capabilities, allowing you to generate high-quality audio from text prompts or AI-generated responses. This feature is perfect for voice assistants, accessibility features, narration, podcasts, and more. ## Overview **Key Features:** - **Multiple providers** - Google Cloud TTS, OpenAI TTS, ElevenLabs, Azure TTS, Fish Audio, and Cartesia - **High-quality voices** - Neural, Wavenet, Standard, and multilingual voice types - **Multiple languages** - 50+ voices across 10+ languages - **Flexible audio formats** - MP3, WAV, OGG/Opus - **Voice customization** - Adjust speed, pitch, and volume - **Two synthesis modes** - Direct text-to-speech OR AI response synthesis - **Production-ready** - Works with Google Cloud, OpenAI, ElevenLabs, Azure, Fish Audio, and Cartesia --- ## Quick Start ### Installation TTS support is built into NeuroLink. No additional installation required. ### Environment Setup Set the appropriate environment variables for your chosen TTS provider: ```bash # Google AI Studio (google-ai TTS uses the Google Cloud TTS SDK; ADC / service # account is required — there is no API-key auth for the TTS handler itself) export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json" # (GOOGLE_AI_API_KEY is used by the LLM/STT side of `google-ai`, not TTS.) # Google Vertex AI (vertex) — service account recommended for production export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json" # OpenAI TTS (openai-tts) export OPENAI_API_KEY="your-openai-api-key" # ElevenLabs (elevenlabs) export ELEVENLABS_API_KEY="your-elevenlabs-api-key" # Azure TTS (azure-tts) export AZURE_SPEECH_KEY="your-azure-speech-key" export AZURE_SPEECH_REGION="eastus" # or your Azure region # Fish Audio TTS (fish-audio) — low-cost, voice cloning, 14 languages export FISH_AUDIO_API_KEY="your-fish-audio-api-key" # Cartesia TTS (cartesia) — low-latency Sonic models, voice cloning export CARTESIA_API_KEY="sk_car_..." ``` **Google API Key Configuration:** If using API key authentication for Google, enable both APIs in Google Cloud Console: 1. Navigate to "APIs & Services" > "Credentials" 2. Create or select your API key 3. Under "API restrictions", enable: - **Generative Language API** (for Gemini) - **Cloud Text-to-Speech API** (for TTS) ### Basic Usage **CLI:** ```bash # Generate and play audio automatically neurolink generate "Hello, world!" \ --provider google-ai \ --tts-voice en-US-Neural2-C # Save to file neurolink generate "Welcome to our application" \ --provider google-ai \ --tts-voice en-US-Neural2-C \ --tts-output welcome.mp3 ``` **SDK:** ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); const result = await neurolink.generate({ input: { text: "Hello, world!" }, provider: "google-ai", tts: { enabled: true, voice: "en-US-Neural2-C", format: "mp3", play: true, // Auto-play in CLI, manual in SDK }, }); // Access generated audio console.log("Audio size:", result.audio?.size, "bytes"); console.log("Audio format:", result.audio?.format); ``` --- ## Supported Providers TTS is available through the following providers: | Provider | Authentication | Voices / Models | Notes | | -------------- | ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **google-ai** | Service Account (`GOOGLE_APPLICATION_CREDENTIALS`) | 50+ voices (Neural2, Wavenet, Standard) | Same auth as `vertex` (TTS uses Google Cloud Text-to-Speech client) | | **vertex** | Service Account (`GOOGLE_APPLICATION_CREDENTIALS`) | 50+ voices (Neural2, Wavenet, Standard) | Recommended for production | | **openai-tts** | API Key (`OPENAI_API_KEY`) | 6 voices: alloy, echo, fable, onyx, nova, shimmer; models: tts-1, tts-1-hd | Good default quality | | **elevenlabs** | API Key (`ELEVENLABS_API_KEY`) | Multilingual voices; model: eleven_multilingual_v2 | High-quality multilingual synthesis | | **azure-tts** | API Key (`AZURE_SPEECH_KEY` + region `AZURE_SPEECH_REGION`) | Neural voices with SSML support | Enterprise-grade Azure Speech | | **fish-audio** | API Key (`FISH_AUDIO_API_KEY`) | 14 languages, voice cloning (15 s reference); models: `s1` (default), `speech-1.6`, `speech-1.5` | Low-cost, ~80% cheaper than ElevenLabs — see [provider guide](../getting-started/providers/fish-audio.md) | | **cartesia** | API Key (`CARTESIA_API_KEY`) | Cartesia voice library, English-first; models: `sonic-2` (default), `sonic` | Low-latency Sonic models — see [provider guide](../getting-started/providers/cartesia.md). Synchronous `/tts/bytes`; the WebSocket streaming flow is exposed separately via `CartesiaStream` in the voice server. | **Planned for future releases:** - AWS Polly --- ## Voice Selection ### Available Voice Types Google Cloud TTS offers three voice quality tiers: | Voice Type | Quality | Cost | Use Case | Example Voice | | ------------ | ------- | ------ | --------------------------------------- | ------------------ | | **Neural2** | Highest | High | Natural conversations, voice assistants | `en-US-Neural2-C` | | **Wavenet** | High | Medium | Professional narration, podcasts | `en-US-Wavenet-D` | | **Standard** | Good | Low | Cost optimization, bulk generation | `en-US-Standard-B` | ### Voice Discovery Voice identifiers follow Google Cloud TTS naming conventions: `---` (e.g., `en-US-Neural2-C`, `en-GB-Wavenet-D`). Refer to the [Google Cloud TTS voice list](https://cloud.google.com/text-to-speech/docs/voices) for all available voices. ### Supported Languages **English Variants:** - `en-US` - United States English - `en-GB` - British English - `en-AU` - Australian English - `en-IN` - Indian English **Other Languages:** - `es-ES`, `es-US` - Spanish (Spain, Latin America) - `fr-FR`, `fr-CA` - French (France, Canada) - `de-DE` - German - `ja-JP` - Japanese - `hi-IN` - Hindi - `zh-CN`, `zh-TW` - Chinese (Simplified, Traditional) - `pt-BR`, `pt-PT` - Portuguese (Brazil, Portugal) - `it-IT` - Italian - `ko-KR` - Korean - `ru-RU` - Russian ### Voice Selection Guidelines **For Natural Conversations:** ```typescript tts: { voice: "en-US-Neural2-C", // Female, natural // OR voice: "en-US-Neural2-A", // Male, natural } ``` **For Professional Narration:** ```typescript tts: { voice: "en-US-Wavenet-D", // Male, professional // OR voice: "en-GB-Wavenet-A", // British, professional } ``` **For Cost Optimization:** ```typescript tts: { voice: "en-US-Standard-B", // Lower cost } ``` --- ## TTS Synthesis Modes NeuroLink supports two TTS synthesis modes: ### Mode 1: Direct Text-to-Speech (Default) Converts input text directly to speech **without** AI generation. ```typescript const result = await neurolink.generate({ input: { text: "Welcome to our service!" }, provider: "google-ai", tts: { enabled: true, useAiResponse: false, // Default: synthesize input text voice: "en-US-Neural2-C", }, }); // Audio contains: "Welcome to our service!" // No AI generation occurs ``` **Use cases:** - Pre-written scripts - System notifications - Fixed announcements - Voice confirmations ### Mode 2: AI Response Synthesis Generates AI response first, then converts the response to speech. ```typescript const result = await neurolink.generate({ input: { text: "Tell me a joke" }, provider: "google-ai", tts: { enabled: true, useAiResponse: true, // Synthesize AI's response voice: "en-US-Neural2-C", }, }); // AI generates joke text // TTS synthesizes the joke audio // Both text and audio available in result ``` > **Note:** when `useAiResponse: true`, NeuroLink synthesizes the chat > provider's text response. If your chat provider has no TTS counterpart > (e.g. `anthropic`, `bedrock`), set `tts.provider` explicitly — otherwise > stream Mode 2 will throw `No TTS provider resolved for stream Mode 2` > after text generation completes. Chat providers that double as TTS > handlers (e.g. `google-ai`, `vertex`) auto-resolve when `tts.provider` > is omitted. **Use cases:** - Voice assistants - Interactive AI conversations - Dynamic content narration - AI-powered podcasts --- ## Audio Format Options ### Supported Formats | Format | Quality | File Size | Platform Support | Use Case | | ------------ | ------- | -------------------- | ---------------- | ------------------------------ | | **MP3** | Good | Small (~100 KB/min) | All platforms | Default, balanced quality/size | | **WAV** | Best | Large (~1 MB/min) | All platforms | Highest quality, editing | | **OGG/Opus** | Good | Medium (~150 KB/min) | macOS, Linux | Web streaming | ### Format Selection ```typescript // Default: MP3 (balanced quality and size) tts: { voice: "en-US-Neural2-C", format: "mp3" // Default } // Best quality: WAV tts: { voice: "en-US-Neural2-C", format: "wav" } // Web streaming: OGG tts: { voice: "en-US-Neural2-C", format: "ogg" } ``` ### Platform-Specific Considerations **Windows:** - Built-in playback only supports WAV format - Auto-converts to WAV when `play: true` on Windows - Use MP3 for file output, WAV for immediate playback **macOS/Linux:** - All formats supported - `afplay` (macOS) and `ffplay` (Linux) handle all formats - Use MP3 for general purpose --- ## Voice Customization ### Speaking Rate Control speech speed (0.25 to 4.0): ```typescript // Slower (half speed) tts: { voice: "en-US-Neural2-C", speed: 0.5 } // Normal speed (default) tts: { voice: "en-US-Neural2-C", speed: 1.0 // Default } // Faster (double speed) tts: { voice: "en-US-Neural2-C", speed: 2.0 } ``` **CLI:** ```bash neurolink generate "This is faster speech" \ --provider google-ai \ --tts-voice en-US-Neural2-C \ --tts-speed 1.5 ``` ### Pitch Adjustment Adjust voice pitch (-20.0 to 20.0 semitones): ```typescript // Lower pitch (deeper voice) tts: { voice: "en-US-Neural2-C", pitch: -5.0 } // Normal pitch (default) tts: { voice: "en-US-Neural2-C", pitch: 0.0 // Default } // Higher pitch tts: { voice: "en-US-Neural2-C", pitch: 5.0 } ``` **CLI:** ```bash neurolink generate "Higher pitch test" \ --provider google-ai \ --tts-voice en-US-Neural2-C \ --tts-pitch 3.0 ``` ### Volume Adjustment Control output volume (-96.0 to 16.0 dB): ```typescript tts: { voice: "en-US-Neural2-C", volumeGainDb: 0.0 // Default (no change) } ``` --- ## Complete Configuration Reference ### SDK Configuration ```typescript import { NeuroLink } from "@juspay/neurolink"; const neurolink = new NeuroLink(); const result = await neurolink.generate({ input: { text: "Your text here" }, provider: "google-ai", // or "vertex" tts: { enabled: true, // Enable TTS output useAiResponse: false, // false = input text, true = AI response voice: "en-US-Neural2-C", // Voice identifier format: "mp3", // Audio format: "mp3" | "wav" | "ogg" speed: 1.0, // Speaking rate: 0.25-4.0 pitch: 0.0, // Pitch adjustment: -20.0 to 20.0 volumeGainDb: 0.0, // Volume: -96.0 to 16.0 quality: "standard", // Quality: "standard" | "hd" output: "./audio.mp3", // Optional file path play: false, // Auto-play (CLI only) }, }); // Access results console.log("Text:", result.content); console.log("Audio size:", result.audio?.size, "bytes"); console.log("Audio format:", result.audio?.format); console.log("Voice used:", result.audio?.voice); // Save audio to file if (result.audio?.buffer) { import { writeFileSync } from "fs"; writeFileSync("output.mp3", result.audio.buffer); } ``` ### CLI Flags ```bash neurolink generate "Your text" \ --provider google-ai \ --tts \ --tts-provider \ --tts-voice \ --tts-format \ --tts-speed \ --tts-pitch \ --tts-output \ --tts-use-ai-response # --tts : enable TTS (boolean flag, required) # --tts-provider : google-ai|vertex|openai-tts|elevenlabs|azure-tts|fish-audio|cartesia # --tts-voice : voice id (optional — provider default applies) # --tts-format : mp3|wav|ogg (default: mp3) # --tts-speed : 0.25-4.0 (default: 1.0) # --tts-pitch : -20.0 to 20.0 (default: 0.0) # --tts-output : save to file # --tts-use-ai-response : synthesize AI response instead of input text ``` **Selecting a specific TTS provider:** ```bash # Use OpenAI TTS neurolink generate "Hello" --tts --tts-provider openai-tts # Use ElevenLabs neurolink generate "Hello" --tts --tts-provider elevenlabs # Use Azure TTS neurolink generate "Hello" --tts --tts-provider azure-tts # Use Fish Audio neurolink generate "Hello" --tts --tts-provider fish-audio # Use Cartesia neurolink generate "Hello" --tts --tts-provider cartesia ``` --- ## Use Cases & Examples ### 1. Voice Assistant Create a voice assistant that speaks responses: ```typescript const assistant = new NeuroLink(); const response = await assistant.generate({ input: { text: "What's the weather like today?" }, provider: "google-ai", tts: { enabled: true, useAiResponse: true, // Speak AI's weather response voice: "en-US-Neural2-C", play: true, }, }); // AI generates weather info and speaks it ``` ### 2. Accessibility Features Screen reader-style narration for visually impaired users: ```typescript const narration = await neurolink.generate({ input: { text: "Button clicked. Navigation menu opened." }, provider: "google-ai", tts: { enabled: true, voice: "en-US-Neural2-C", speed: 1.2, // Slightly faster for efficiency play: true, }, }); ``` ### 3. Podcast Generation Generate professional podcast intros: ```bash neurolink generate "Welcome to Tech Insights Podcast, episode 42. Today we're discussing the future of AI development." \ --provider google-ai \ --tts-voice en-US-Wavenet-D \ --tts-speed 0.95 \ --tts-format mp3 \ --tts-output podcast-intro.mp3 ``` ### 4. Language Learning Slow pronunciation for language learners: ```bash # Slow French pronunciation neurolink generate "Je m'appelle Claude. Comment allez-vous?" \ --provider google-ai \ --tts-voice fr-FR-Neural2-A \ --tts-speed 0.7 \ --tts-output french-slow.mp3 # Normal speed for comparison neurolink generate "Je m'appelle Claude. Comment allez-vous?" \ --provider google-ai \ --tts-voice fr-FR-Neural2-A \ --tts-speed 1.0 \ --tts-output french-normal.mp3 ``` ### 5. Multilingual Support Generate audio in multiple languages: ```typescript const translations = { english: { text: "Hello, welcome to our application.", voice: "en-US-Neural2-C", }, french: { text: "Bonjour, bienvenue dans notre application.", voice: "fr-FR-Wavenet-A", }, spanish: { text: "Hola, bienvenido a nuestra aplicación.", voice: "es-ES-Neural2-A", }, hindi: { text: "नमस्ते, हमारे एप्लिकेशन में आपका स्वागत है।", voice: "hi-IN-Wavenet-A", }, }; for (const [lang, config] of Object.entries(translations)) { const result = await neurolink.generate({ input: { text: config.text }, provider: "google-ai", tts: { enabled: true, voice: config.voice, format: "mp3", output: `welcome-${lang}.mp3`, }, }); console.log(`Generated ${lang} audio (${result.audio?.size} bytes)`); } ``` ### 6. Batch Audio Generation Generate multiple audio files efficiently: ```typescript async function generateBatchAudio( texts: string[], voice: string = "en-US-Neural2-C", ) { const results = []; for (const text of texts) { const result = await neurolink.generate({ input: { text }, provider: "google-ai", tts: { enabled: true, voice, format: "mp3", }, }); results.push({ text, audioBuffer: result.audio?.buffer, audioSize: result.audio?.size, }); } return results; } // Usage const audioFiles = await generateBatchAudio([ "Welcome to our application.", "Please enter your username and password.", "Login successful. Redirecting to dashboard.", ]); // Save all files audioFiles.forEach((item, index) => { if (item.audioBuffer) { writeFileSync(`audio-${index}.mp3`, item.audioBuffer); } }); ``` ### 7. Streaming Text + Audio Stream AI-generated text and convert to audio: ```typescript async function streamAndSpeak(prompt: string, voice: string) { // Step 1: Stream AI response const streamResult = await neurolink.stream({ input: { text: prompt }, provider: "google-ai", model: "gemini-2.0-flash-exp", }); let fullText = ""; for await (const chunk of streamResult.stream) { fullText += chunk.content; process.stdout.write(chunk.content); } console.log("\n\nConverting to audio..."); // Step 2: Convert complete text to audio const ttsResult = await neurolink.generate({ input: { text: fullText }, provider: "google-ai", tts: { enabled: true, voice, play: true, }, }); return { text: fullText, audio: ttsResult.audio, }; } // Usage const result = await streamAndSpeak( "Explain quantum computing in simple terms", "en-US-Neural2-C", ); ``` --- ## Error Handling ### Common Error Patterns ```typescript async function generateTTSWithRetry( text: string, voice: string, maxRetries: number = 3, ) { let lastError: Error | undefined; for (let attempt = 1; attempt <= maxRetries; attempt++) { try { const result = await neurolink.generate({ input: { text }, provider: "google-ai", tts: { enabled: true, voice, format: "mp3", }, }); // Validate audio buffer if (!result.audio || result.audio.size === 0) { throw new Error("Empty audio buffer received"); } return { success: true, audio: result.audio, attempt, }; } catch (error) { lastError = error as Error; console.error(`Attempt ${attempt} failed:`, error.message); // Don't retry on invalid input if ( error.message.includes("invalid voice") || error.message.includes("text too long") ) { break; } // Exponential backoff if (attempt < maxRetries) { await new Promise((resolve) => setTimeout(resolve, 1000 * attempt)); } } } return { success: false, error: lastError?.message || "Unknown error occurred", attempts: maxRetries, }; } // Usage const result = await generateTTSWithRetry( "Generate this with retry logic", "en-US-Neural2-C", ); if (result.success && result.audio) { console.log("Success!"); writeFileSync("output.mp3", result.audio.buffer); } else { console.error("Failed:", result.error); } ``` --- ## Troubleshooting ### Common Issues | Issue | Cause | Solution | | -------------------------------- | ------------------------ | -------------------------------------------------------------------------------------------- | | **"TTS client not initialized"** | Missing credentials | Set `GOOGLE_APPLICATION_CREDENTIALS` or `GOOGLE_AI_API_KEY` | | **"Invalid voice name"** | Voice ID not found | Check the [Google Cloud TTS voice list](https://cloud.google.com/text-to-speech/docs/voices) | | **"Text too long"** | Input exceeds 5000 bytes | Split text into smaller chunks | | **"Synthesis failed"** | Network/API error | Check network connection and credentials | | **Audio doesn't play** | Missing audio player | Install `afplay` (macOS), `ffplay` (Linux), or use WAV on Windows | | **Empty audio buffer** | API returned no content | Check API quota and retry | ### Authentication Issues **Service Account:** ```bash # Verify credentials file exists ls -la $GOOGLE_APPLICATION_CREDENTIALS # Test authentication gcloud auth application-default login ``` **API Key:** ```bash # Verify API key is set echo $GOOGLE_AI_API_KEY ``` ### Audio Playback Issues **macOS:** - `afplay` is pre-installed, supports all formats - If playback fails, check system volume settings **Linux:** - Install `ffmpeg` for full format support: `sudo apt install ffmpeg` - Alternative: Use `aplay` for WAV files only **Windows:** - Built-in playback only supports WAV - Install VLC or Windows Media Player for other formats - SDK auto-converts to WAV when `play: true` on Windows --- ## Best Practices ### Performance Optimization 1. **Cache voices** - Voice list is cached for 5 minutes 2. **Batch processing** - Group multiple TTS requests when possible 3. **Use appropriate quality** - Standard voices are faster and cheaper 4. **Optimize text length** - Keep under 5000 bytes per request ### Production Deployment 1. **Use service accounts** - More secure than API keys 2. **Implement retry logic** - Handle transient network failures 3. **Monitor quota usage** - Track Google Cloud TTS API usage 4. **Set appropriate timeouts** - Default is 30 seconds 5. **Handle errors gracefully** - Provide fallback behavior ### Voice Selection 1. **Test before deploying** - Different voices suit different use cases 2. **Match gender to persona** - Choose appropriate gender for your application 3. **Consider language variants** - `en-US` vs `en-GB` vs `en-IN` 4. **Use Neural2 for quality** - Best natural-sounding voices ### Cost Management 1. **Use Standard voices** - For high-volume, non-critical use cases 2. **Cache generated audio** - Avoid regenerating the same content 3. **Monitor API usage** - Set budget alerts in Google Cloud Console --- ## Pricing Google Cloud TTS pricing (as of 2026): | Voice Type | Price per 1M characters | | ------------ | ----------------------- | | **Neural2** | $16.00 | | **Wavenet** | $16.00 | | **Standard** | $4.00 | **Monthly free tier:** 1 million characters (Standard voices) or 1 million characters (Wavenet/Neural2 voices) For detailed pricing, see [Google Cloud TTS Pricing](https://cloud.google.com/text-to-speech/pricing). --- ## Related Features **Multimodal Capabilities:** - [Multimodal Guide](multimodal.md) - Images, PDFs, CSV inputs - [PDF Support](pdf-support.md) - Document processing - [Video Generation](video-generation.md) - AI-powered video creation - [PPT Generation](ppt-generation.md) - AI-powered PowerPoint presentations **Advanced Features:** - [Streaming](../advanced/streaming.md) - Stream AI responses in real-time - [Provider Orchestration](provider-orchestration.md) - Multi-provider failover **Documentation:** - [CLI Commands](../cli/commands.md) - Complete CLI reference - [SDK API Reference](../sdk/api-reference.md) - Full API documentation - [Troubleshooting](../troubleshooting.md) - Extended error catalog --- ## Summary NeuroLink's TTS integration provides: - **Multiple TTS providers** - Google Cloud TTS, OpenAI TTS, ElevenLabs, Azure TTS, Fish Audio, Cartesia - **High-quality voices** - Neural2, Wavenet, Standard, and multilingual options - **Multiple languages** - 50+ voices across 10+ languages - **Flexible synthesis modes** - Direct text or AI response - **Voice customization** - Speed, pitch, volume control - **Easy integration** - Works seamlessly with CLI and SDK via `--tts-provider` flag **Next Steps:** 1. Set up [Google Cloud credentials](#environment-setup) 2. Discover available [voices](#voice-discovery) 3. Try the [quick start examples](#quick-start) 4. Explore [use cases](#use-cases--examples) for your application 5. Check [troubleshooting](#troubleshooting) if needed