{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Summarize podcasts and audio\n", "\n", "Transcribe audio files and generate summaries automatically using Whisper and LLMs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem\n", "\n", "You have podcast episodes, meeting recordings, or interviews that need both transcription and summarization. Doing this manually is time-consuming and doesn't scale.\n", "\n", "| Content | Duration | Need |\n", "|---------|----------|------|\n", "| Podcast episodes | 60 min | Episode summary + key points |\n", "| Meeting recordings | 30 min | Action items + decisions |\n", "| Interviews | 45 min | Main topics + quotes |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution\n", "\n", "**What's in this recipe:**\n", "\n", "- Transcribe audio with Whisper (runs locally)\n", "- Generate summaries with an LLM\n", "- Chain transcription → summarization automatically\n", "\n", "You create a pipeline where audio is transcribed first, then the transcript is summarized. Both steps run automatically when you insert new audio files." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable openai-whisper openai" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import getpass\n", "import os\n", "\n", "if 'OPENAI_API_KEY' not in os.environ:\n", " os.environ['OPENAI_API_KEY'] = getpass.getpass('OpenAI API Key: ')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created directory 'podcast_demo'.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pixeltable as pxt\n", "from pixeltable.functions import openai, whisper\n", "\n", "# Create a fresh directory\n", "pxt.drop_dir('podcast_demo', force=True)\n", "pxt.create_dir('podcast_demo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create the pipeline\n", "\n", "Create a table with audio input, then add computed columns for transcription and summarization:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'episodes'.\n" ] } ], "source": [ "# Create table for audio files\n", "podcasts = pxt.create_table(\n", " 'podcast_demo/episodes', {'title': pxt.String, 'audio': pxt.Audio}\n", ")" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 0 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "No rows affected." 
] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 1: Transcribe with local Whisper (uses GPU if available)\n", "podcasts.add_computed_column(\n", " transcription=whisper.transcribe(podcasts.audio, model='base.en')\n", ")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 0 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "No rows affected." ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract the text from the transcription result (cast to String so it can be concatenated with the prompt)\n", "podcasts.add_computed_column(\n", " transcript_text=podcasts.transcription.text.astype(pxt.String)\n", ")" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 0 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "No rows affected." ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Step 2: Summarize the transcript with OpenAI\n", "summary_prompt = (\n", " \"\"\"Summarize this transcript in 2-3 sentences, then list 3 key points.\n", "\n", "Transcript:\n", "\"\"\"\n", " + podcasts.transcript_text\n", ")\n", "\n", "podcasts.add_computed_column(\n", " summary_response=openai.chat_completions(\n", " messages=[{'role': 'user', 'content': summary_prompt}],\n", " model='gpt-4o-mini',\n", " )\n", ")" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 0 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "No rows affected."
] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract summary text from response\n", "podcasts.add_computed_column(\n", " summary=podcasts.summary_response.choices[0].message.content\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Process audio files\n", "\n", "Insert audio files and watch the pipeline run automatically:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/miniconda3/envs/pixeltable/lib/python3.11/site-packages/whisper/transcribe.py:132: UserWarning: FP16 is not supported on CPU; using FP32 instead\n", " warnings.warn(\"FP16 is not supported on CPU; using FP32 instead\")\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `episodes`: 1 rows [00:00, 185.18 rows/s]\n", "Inserted 1 row with 0 errors.\n" ] }, { "data": { "text/plain": [ "1 row inserted, 8 values computed." ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Insert sample audio\n", "audio_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/10-minute%20tour%20of%20Pixeltable.mp3'\n", "\n", "podcasts.insert([{'title': 'Pixeltable Tour', 'audio': audio_url}])" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
title | transcript_text
Pixeltable Tour | This conversation is powered by Google Illuminate. Check out illuminate.google.com for more. Welcome to this discussion on Pixel Table, a powerful tool for managing and manipulating data, especially image data, within a database framework. We'll be exploring how it simplifies, working with machine learning tasks, particularly object detection. What's the core concept behind Pixel Table that makes it so unique? Pixel Table's core strength lies in its combination of a database system with the ...... What kind of users would benefit most from using Pixel Table? Data scientists, machine learning engineers, and anyone working with large data sets and complex ML pipelines would find Pixel Table extremely beneficial. Its ability to manage data, transformations, and model applications in a unified and persistent environment makes it a powerful tool for streamlining workflows. This has been a very informative discussion on Pixel Table. Thank you for explaining its capabilities and advantages.
" ], "text/plain": [ " title transcript_text\n", "0 Pixeltable Tour This conversation is powered by Google Illumi..." ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View transcript\n", "podcasts.select(podcasts.title, podcasts.transcript_text).collect()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
title | summary
Pixeltable Tour | The conversation discusses Pixel Table, a tool designed for managing and manipulating image data within a database system, especially useful for machine learning tasks like object detection. It highlights Pixel Table's unique feature of computed columns that streamline data transformations and model applications, making workflows more efficient by automating tasks like data updates and API calls. The tool’s integration with ML models and the ability to define user-defined functions (UDFs) pr ...... lity with computed columns, allowing automatic data transformations and model executions to streamline workflows.\n", "2. It enables easy integration of various machine learning models, such as DETR and OpenAI's GPT-4-0, managing processes like image analysis and result storage efficiently.\n", "3. While providing significant advantages in scalability and workflow management, Pixel Table requires some technical expertise for database setup and may face performance limitations based on data complexity.
" ], "text/plain": [ " title summary\n", "0 Pixeltable Tour The conversation discusses Pixel Table, a tool..." ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View summary\n", "podcasts.select(podcasts.title, podcasts.summary).collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explanation\n", "\n", "**Pipeline architecture:**\n", "\n", "```\n", "Audio → Whisper transcription → Transcript text → LLM summarization → Summary\n", "```\n", "\n", "Each step is a computed column that reads from the one before it (`audio` → `transcription` → `transcript_text` → `summary_response` → `summary`). When you insert a new audio file, Pixeltable computes all of these columns automatically, in dependency order.\n", "\n", "**Whisper model options:**\n", "\n", "| Model | Parameters | Speed | Accuracy |\n", "|-------|------------|-------|----------|\n", "| `tiny.en` | 39M | Fastest | Good for clear speech |\n", "| `base.en` | 74M | Fast | Balanced |\n", "| `small.en` | 244M | Medium | Better accuracy |\n", "| `medium.en` | 769M | Slow | High accuracy |\n", "\n", "For production with varied audio quality, use `small.en` or larger." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See also\n", "\n", "- [Transcribe audio](https://docs.pixeltable.com/howto/cookbooks/audio/audio-transcribe) - Basic audio transcription\n", "- [Summarize text](https://docs.pixeltable.com/howto/cookbooks/text/text-summarize) - Text summarization patterns" ] } ], "metadata": { "kernelspec": { "display_name": "pixeltable", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 2 }