---
title: OpenAI TTS cost
date: "2025-11-02T05:12:32Z"
lastmod: "2025-11-02T05:12:34Z"
categories:
  - experiments
  - llms
wp_id: 4245
description: I compared OpenAI’s TTS models by measuring real-world API billing for specific inputs. I discovered that GPT-4o mini produces about six audio tokens per text token and found TTS-1 offers the best balance of natural quality and cost.
keywords: [openai, text-to-speech, gpt-4o-mini, tts-1, api pricing, benchmarking]
---

![OpenAI TTS cost](/blog/assets/doodle.webp)

The OpenAI text-to-speech cost documentation is confusing.

As of 2 Nov 2025:

I wanted to find the approximate total cost of a typical text input, measured per character and per token.

I converted this podcast, which has 4,096 ASCII characters and 877 tokens under the o200k_base encoding, by running:

```bash
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg text "$(cat podcast.txt)" '{ model: "tts-1", voice: "coral", input: $text }')" \
  --output tts-1.mp3
```

This took 46 seconds to generate and produced a 5.1 MB MP3 file (256 seconds).

To measure the cost, I ran:

```bash
curl "https://api.openai.com/v1/organization/costs?start_time=$(date -d '1 day ago' +%s)&project_ids=$PROJECT_ID&group_by=line_item" \
  -H "Authorization: Bearer $OPENAI_ADMIN_KEY" \
  -H "Content-Type: application/json"
```

This cost USD 0.061425. Then I ran the same request with GPT-4o mini TTS:

```bash
curl https://api.openai.com/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg text "$(cat podcast.txt)" '{ model: "gpt-4o-mini-tts", voice: "coral", input: $text }')" \
  --output gpt-4o-mini-tts.mp3
```

This took 44 seconds to generate and produced a 4.3 MB MP3 file (268 seconds).

When I ran the admin API call again, the costs did not show up for 5 minutes, so I ran the GPT-4o mini TTS call again with the same input. This took 44 seconds and again produced a 4.3 MB MP3 file. When I ran the admin API call once more, the total cost for the 2 requests was USD 0.12942 for audio output and USD 0.0010524 for input.
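
To read per-line-item totals out of the costs response without eyeballing the JSON, a jq filter along these lines may help. This is a sketch, not a documented recipe: it assumes the response is a page of buckets under `data`, each with a `results` array whose entries carry `amount.value` and `line_item`. The function name `sum_costs` and the line-item labels in the sample are placeholders of mine.

```bash
# ASSUMPTION: the costs response looks like
#   { "data": [ { "results": [ { "amount": { "value": N }, "line_item": "..." } ] } ] }
# sum_costs reads that JSON on stdin and prints one "line_item<TAB>total" row per item.
sum_costs() {
  jq -r '
    [.data[].results[]]
    | group_by(.line_item)
    | map({item: .[0].line_item, usd: (map(.amount.value | tonumber) | add)})
    | .[]
    | "\(.item)\t\(.usd)"
  '
}

# Hypothetical sample using the two totals measured above.
# The line_item labels are placeholders, not real API labels.
sample='{"data":[{"results":[
  {"amount":{"value":0.12942},"line_item":"audio-output-placeholder"},
  {"amount":{"value":0.0010524},"line_item":"text-input-placeholder"}
]}]}'
```

Try it offline with `printf '%s' "$sample" | sum_costs`, or pipe the output of the admin `curl` call into `sum_costs`.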

I also checked for TTS-1 HD. Here are the costs in USD:

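| Model           | $ / MTok | $ / MChars | $ / hour | Time (s) | Audio (s) | Cost $ |
| --------------- | -------: | ---------: | -------: | -------: | --------: | -----: |
| GPT-4o mini TTS |     74.4 |       15.9 |    0.876 |       46 |       268 | 0.0652 |
| TTS-1           |     70.0 |       15.0 |    0.864 |       44 |       256 | 0.0614 |
| TTS-1 HD        |    140.0 |       30.0 |    1.728 |       62 |       257 | 0.1228 |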
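
The per-unit columns in the table can be reproduced from a single measured request. The helper below is a minimal sketch of that arithmetic using awk. The name `tts_rates` and its argument order are my own choices, and the example call uses the TTS-1 measurements reported above.

```bash
# Derive per-unit rates from one measured request.
# Arguments: cost in USD, text tokens, characters, audio length in seconds.
tts_rates() {
  awk -v cost="$1" -v tokens="$2" -v chars="$3" -v secs="$4" 'BEGIN {
    printf "%.1f USD/MTok, %.1f USD/MChars, %.3f USD/hour\n",
      cost / tokens * 1000000, cost / chars * 1000000, cost / secs * 3600
  }'
}

# TTS-1: USD 0.061425 for 877 tokens, 4,096 characters, 256 seconds of audio.
tts_rates 0.061425 877 4096 256
# prints: 70.0 USD/MTok, 15.0 USD/MChars, 0.864 USD/hour
```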

The GPT-4o mini TTS audio output cost was USD 0.06471 per request (half of USD 0.12942) for the input of 877 tokens, i.e. $73.8 / MTok of input text. Since the listed price for audio output is $12 / MTok, this is a 6.15x multiplier. My guess is that 1 input text token produces ~6 output audio tokens.
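
The multiplier can be checked by working backwards from the bill. This sketch assumes the audio output charge is simply the number of audio tokens times the $12 / MTok price. The function name `audio_token_ratio` is hypothetical.

```bash
# Estimate audio tokens per text token from a billed audio-output cost.
# Arguments: billed audio cost in USD, price in USD per million audio tokens,
# number of input text tokens.
audio_token_ratio() {
  awk -v cost="$1" -v price="$2" -v text_tokens="$3" 'BEGIN {
    audio_tokens = cost / price * 1000000   # implied number of audio tokens billed
    printf "%.1f audio tokens, %.2f per text token\n", audio_tokens, audio_tokens / text_tokens
  }'
}

# One GPT-4o mini TTS request: USD 0.06471 audio output for 877 text tokens.
audio_token_ratio 0.06471 12 877
# prints: 5392.5 audio tokens, 6.15 per text token
```

The ratio comes from a single input, so it may differ for other texts or voices.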

In terms of quality:

I will likely use TTS-1 for now given the cost difference is small and the quality is good enough.


Incidentally, the usage API did not show any GPT-4o mini TTS line items even after 20 minutes.

```bash
curl "https://api.openai.com/v1/organization/usage/audio_speeches?start_time=$(date -d '1 day ago' +%s)&project_ids=$PROJECT_ID&group_by=model" \
  -H "Authorization: Bearer $OPENAI_ADMIN_KEY" \
  -H "Content-Type: application/json"
```