{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Convert text to speech\n", "\n", "Generate natural-sounding audio from text using OpenAI's text-to-speech models." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem\n", "\n", "You need to convert text content into spoken audio—for accessibility, content repurposing, or voice applications.\n", "\n", "| Use case | Input | Output |\n", "|----------|-------|--------|\n", "| Accessibility | Blog posts | Audio articles |\n", "| Learning | Documentation | Audio guides |\n", "| Content | Newsletters | Podcast episodes |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution\n", "\n", "**What's in this recipe:**\n", "\n", "- Generate speech with OpenAI TTS\n", "- Choose from multiple voice options\n", "- Store text and audio together\n", "\n", "You add a computed column that converts text to audio. The audio is cached and only regenerated when the source text changes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable openai" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import getpass\n", "import os\n", "\n", "if 'OPENAI_API_KEY' not in os.environ:\n", " os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata\n", "Created directory 'tts_demo'.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pixeltable as pxt\n", "from pixeltable.functions.openai import speech\n", "\n", "# Create a fresh directory\n", 
"pxt.drop_dir('tts_demo', force=True)\n", "pxt.create_dir('tts_demo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create text-to-speech pipeline" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'articles'.\n" ] } ], "source": [ "# Create table for articles\n", "articles = pxt.create_table(\n", " 'tts_demo/articles', {'title': pxt.String, 'content': pxt.String}\n", ")" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 0 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "No rows affected." ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add audio generation column\n", "articles.add_computed_column(\n", " audio=speech(articles.content, model='tts-1', voice='alloy')\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate audio" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `articles`: 2 rows [00:00, 423.90 rows/s]\n", "Inserted 2 rows with 0 errors.\n" ] }, { "data": { "text/plain": [ "2 rows inserted, 6 values computed." ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Insert sample articles\n", "sample_articles = [\n", " {\n", " 'title': 'Welcome to AI',\n", " 'content': 'Artificial intelligence is transforming how we work and live. 
From smart assistants to autonomous vehicles, AI is becoming part of our daily lives.',\n", " },\n", " {\n", " 'title': 'Getting Started',\n", " 'content': 'To begin your journey with machine learning, start by understanding the basics of data preparation and model training.',\n", " },\n", "]\n", "\n", "articles.insert(sample_articles)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
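<table>\n",
"  <thead>\n",
"    <tr><th>title</th><th>content</th><th>audio</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>Welcome to AI</td><td>Artificial intelligence is transforming how we work and live. From smart assistants to autonomous vehicles, AI is becoming part of our daily lives.</td><td></td></tr>\n",
"    <tr><td>Getting Started</td><td>To begin your journey with machine learning, start by understanding the basics of data preparation and model training.</td><td></td></tr>\n",
"  </tbody>\n",
"</table>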
" ], "text/plain": [ " title content \\\n", "0 Welcome to AI Artificial intelligence is transforming how we... \n", "1 Getting Started To begin your journey with machine learning, s... \n", "\n", " audio \n", "0 /Users/pjlb/.pixeltable/media/b448d43a91a64b15... \n", "1 /Users/pjlb/.pixeltable/media/b448d43a91a64b15... " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View articles with generated audio\n", "articles.select(\n", " articles.title, articles.content, articles.audio\n", ").collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explanation\n", "\n", "**OpenAI TTS models:**\n", "\n", "| Model | Speed | Quality | Use case |\n", "|-------|-------|---------|----------|\n", "| `tts-1` | Fast | Good | Real-time, drafts |\n", "| `tts-1-hd` | Slower | Higher | Production audio |\n", "\n", "**Voice options:**\n", "\n", "| Voice | Style |\n", "|-------|-------|\n", "| `alloy` | Neutral, balanced |\n", "| `echo` | Warm, conversational |\n", "| `fable` | Expressive, storytelling |\n", "| `onyx` | Deep, authoritative |\n", "| `nova` | Friendly, upbeat |\n", "| `shimmer` | Clear, professional |\n", "\n", "**Tips:**\n", "\n", "- Use `tts-1` for drafts and real-time applications\n", "- Use `tts-1-hd` for final production audio\n", "- Audio is cached—no regeneration on queries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See also\n", "\n", "- [Transcribe audio](https://docs.pixeltable.com/howto/cookbooks/audio/audio-transcribe) - Convert audio to text\n", "- [Summarize podcasts](https://docs.pixeltable.com/howto/cookbooks/audio/audio-summarize-podcast) - Transcribe and summarize audio" ] } ], "metadata": { "kernelspec": { "display_name": "pixeltable", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": 
"ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 2 }