---
name: text-to-speech
description: Convert text to natural speech using Sarvam AI's Bulbul model. Use when the user needs to generate audio from text, create voiceovers, build voice interfaces, or synthesize Indian language speech. Supports 11 Indian languages with multiple voices, controllable pitch/pace/loudness, and real-time streaming. Returns base64-encoded audio.
license: Apache-2.0
metadata:
  author: sarvam-ai
  version: "1.0"
  model: bulbul:v2
---

# Text-to-Speech with Bulbul

Bulbul is Sarvam AI's text-to-speech model that generates natural-sounding speech in Indian languages with support for voice customization and streaming.

## Installation

```bash
pip install sarvamai
```

## Quick Start

```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI()

response = client.text_to_speech.convert(
    text="नमस्ते, आप कैसे हैं?",
    target_language_code="hi-IN",
    model="bulbul:v2",
    speaker="anushka"
)

# Response contains base64-encoded audio
save(response, "output.wav")
```

## Base64 Audio Response

The API returns audio as **base64-encoded strings** in the `audios` array:

```json
{
    "request_id": "abc123",
    "audios": [
        "UklGRiQAAABXQVZFZm10IBAAAAABAAEA..."
    ]
}
```

### Decode Manually

```python
import base64

response = client.text_to_speech.convert(
    text="Hello world",
    target_language_code="en-IN",
    model="bulbul:v2",
    speaker="anushka"
)

# Decode base64 to bytes
audio_bytes = base64.b64decode(response.audios[0])

# Save to file
with open("output.wav", "wb") as f:
    f.write(audio_bytes)
```
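If you are working with the raw JSON rather than the SDK response object, the decoding step can be sketched as below. The helper names `decode_audios` and `looks_like_wav` are hypothetical, not part of the `sarvamai` package, and the sketch assumes the payload has the `audios` array shown above. The sample string is the truncated header from the example response, so it decodes to a WAV header only, not playable audio.

```python
import base64
import binascii


def decode_audios(payload: dict) -> list[bytes]:
    """Decode every base64 string in payload["audios"] to raw bytes."""
    decoded = []
    for index, encoded in enumerate(payload.get("audios", [])):
        try:
            decoded.append(base64.b64decode(encoded, validate=True))
        except binascii.Error as exc:
            raise ValueError(f"audios[{index}] is not valid base64") from exc
    return decoded


def looks_like_wav(data: bytes) -> bool:
    """Return True if the bytes start with a RIFF/WAVE container header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


# Header-only sample with the same shape as the JSON response (hypothetical data).
sample = {"request_id": "abc123", "audios": ["UklGRiQAAABXQVZFZm10IBAAAAABAAEA"]}
clips = decode_audios(sample)
print(len(clips), looks_like_wav(clips[0]))  # 1 True
```

Checking the header before writing a `.wav` file is a cheap way to catch a mismatch between the requested codec and the file extension.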

## Supported Languages

| Code | Language | Code | Language |
|------|----------|------|----------|
| `hi-IN` | Hindi | `ta-IN` | Tamil |
| `bn-IN` | Bengali | `te-IN` | Telugu |
| `kn-IN` | Kannada | `ml-IN` | Malayalam |
| `mr-IN` | Marathi | `gu-IN` | Gujarati |
| `pa-IN` | Punjabi | `od-IN` | Odia |
| `en-IN` | English (Indian) | | |

## Available Voices

| Voice | Type | Best For |
|-------|------|----------|
| `anushka` | Female | General, warm tone |
| `manisha` | Female | Professional, clear |
| `vidya` | Female | Friendly, conversational |
| `arjun` | Male | Authoritative, news |
| `amol` | Male | Casual, storytelling |
| `amartya` | Male | Deep, formal |

## Voice Control

Customize pitch, pace, and loudness:

```python
response = client.text_to_speech.convert(
    text="यह एक परीक्षण है।",
    target_language_code="hi-IN",
    model="bulbul:v2",
    speaker="anushka",
    pitch=0.2,          # -1.0 to 1.0 (higher = higher pitch)
    pace=1.2,           # 0.5 to 2.0 (higher = faster)
    loudness=1.5        # 0.5 to 2.0 (higher = louder)
)
```
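Values outside these ranges may be rejected, so it can help to clamp user-supplied settings before calling the API. The following is a minimal sketch: `VOICE_RANGES` and `clamp_voice_params` are hypothetical names, and the limits are copied from the comments in the example above rather than queried from the service.

```python
# Ranges taken from the Voice Control example (assumed to match the API).
VOICE_RANGES = {
    "pitch": (-1.0, 1.0),
    "pace": (0.5, 2.0),
    "loudness": (0.5, 2.0),
}


def clamp_voice_params(**params: float) -> dict[str, float]:
    """Clamp pitch/pace/loudness into their documented ranges."""
    clamped = {}
    for name, value in params.items():
        if name not in VOICE_RANGES:
            raise KeyError(f"unknown voice parameter: {name}")
        low, high = VOICE_RANGES[name]
        clamped[name] = min(max(float(value), low), high)
    return clamped


settings = clamp_voice_params(pitch=0.2, pace=2.5, loudness=0.1)
print(settings)  # {'pitch': 0.2, 'pace': 2.0, 'loudness': 0.5}
```

The result can then be passed along as keyword arguments, for example `client.text_to_speech.convert(..., **settings)`.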

## Audio Formats

Set output format with `output_audio_codec`:

| Format | Description |
|--------|-------------|
| `wav` | Uncompressed (default) |
| `mp3` | MPEG Layer-3 |
| `aac` | Advanced Audio Coding |
| `opus` | Optimized for speech |
| `flac` | Lossless |
| `linear16` | Raw PCM |
| `mulaw` | G.711 μ-law, 8-bit telephony |
| `alaw` | G.711 A-law, 8-bit telephony |

```python
response = client.text_to_speech.convert(
    text="Hello",
    target_language_code="en-IN",
    model="bulbul:v2",
    speaker="anushka",
    output_audio_codec="mp3"
)
```
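When the codec changes, the saved file's extension should change with it. A small lookup like the one below keeps the two in sync. This is a sketch under assumptions: `CODEC_EXTENSIONS` and `output_path` are hypothetical helpers, and the extensions chosen for `linear16`, `mulaw`, and `alaw` are common conventions for raw audio, not something the API prescribes.

```python
from pathlib import Path

# Codec names come from the table above; the extensions are assumptions.
CODEC_EXTENSIONS = {
    "wav": ".wav",
    "mp3": ".mp3",
    "aac": ".aac",
    "opus": ".opus",
    "flac": ".flac",
    "linear16": ".pcm",
    "mulaw": ".ulaw",
    "alaw": ".alaw",
}


def output_path(stem: str, codec: str = "wav") -> Path:
    """Return a file path whose extension matches the requested codec."""
    try:
        suffix = CODEC_EXTENSIONS[codec]
    except KeyError:
        raise ValueError(f"unsupported output_audio_codec: {codec!r}") from None
    return Path(f"{stem}{suffix}")


print(output_path("greeting", "mp3"))  # greeting.mp3
```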

## Sample Rates

| Rate (Hz) | Use Case |
|------|----------|
| `8000` | Telephony |
| `16000` | Voice assistants |
| `22050` | Standard audio |
| `24000` | High quality (default) |

```python
response = client.text_to_speech.convert(
    text="Hello",
    target_language_code="en-IN",
    model="bulbul:v2",
    speaker="anushka",
    sample_rate=8000  # For phone systems
)
```
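To confirm that decoded audio actually has the sample rate you asked for, the standard-library `wave` module can read the header. The sketch below assumes the output is PCM WAV (the default `wav` codec); `wav_info` and `make_silence` are hypothetical helpers, and the silent clip stands in for real API output so the example runs offline.

```python
import io
import wave


def wav_info(audio_bytes: bytes) -> dict:
    """Read sample rate, channel count, and duration from WAV bytes."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_seconds": frames / rate,
        }


def make_silence(sample_rate: int, seconds: float) -> bytes:
    """Build a mono 16-bit WAV of silence, used here only as stand-in input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()


info = wav_info(make_silence(8000, 0.5))
print(info)  # {'sample_rate': 8000, 'channels': 1, 'duration_seconds': 0.5}
```

With a real response, pass the bytes from `base64.b64decode(response.audios[0])` to `wav_info` instead of the generated silence.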

## JavaScript

```javascript
import { SarvamAI } from "sarvamai";
import fs from "fs";

const client = new SarvamAI();

const response = await client.textToSpeech.convert({
  text: "नमस्ते",
  targetLanguageCode: "hi-IN",
  model: "bulbul:v2",
  speaker: "anushka"
});

// Decode base64 and save
const audioBuffer = Buffer.from(response.audios[0], "base64");
fs.writeFileSync("output.wav", audioBuffer);
```

## cURL

```bash
curl -X POST "https://api.sarvam.ai/text-to-speech" \
  -H "api-subscription-key: $SARVAM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
        "नमस्ते, कैसे हो?"
    ],
    "target_language_code": "hi-IN",
    "model": "bulbul:v2",
    "speaker": "anushka"
}'
```
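The same request can be assembled in Python without the SDK, using only the standard library. This is a minimal sketch that mirrors the cURL call above (same URL, headers, and JSON fields). `build_tts_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import os
import urllib.request

TTS_URL = "https://api.sarvam.ai/text-to-speech"


def build_tts_request(text: str, language: str, speaker: str, api_key: str,
                      model: str = "bulbul:v2") -> urllib.request.Request:
    """Build (but do not send) a POST request matching the cURL example."""
    body = {
        "inputs": [text],
        "target_language_code": language,
        "model": model,
        "speaker": speaker,
    }
    return urllib.request.Request(
        TTS_URL,
        # ensure_ascii=False keeps Devanagari text readable in the body.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            "api-subscription-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_tts_request(
    "नमस्ते, कैसे हो?", "hi-IN", "anushka",
    api_key=os.environ.get("SARVAM_API_KEY", ""),
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Separating construction from sending makes the payload easy to inspect or log before any network call is made.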

## Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `text` / `inputs` | string/array | Yes | Text to synthesize (`text` string in the SDK examples, `inputs` array in the cURL example) |
| `target_language_code` | string | Yes | BCP-47 language code |
| `model` | string | Yes | `bulbul:v2` or `bulbul:v1` |
| `speaker` | string | Yes | Voice name |
| `pitch` | float | No | -1.0 to 1.0 |
| `pace` | float | No | 0.5 to 2.0 |
| `loudness` | float | No | 0.5 to 2.0 |
| `output_audio_codec` | string | No | Output format (see Audio Formats) |
| `sample_rate` | int | No | Output sample rate in Hz (see Sample Rates) |

## Response

```json
{
    "request_id": "20241115_abc123",
    "audios": [
        "UklGRiQAAABXQVZFZm10IBAAAAABAAEA..."
    ]
}
```

See [references/voices.md](references/voices.md) for voice samples and recommendations.