{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Extract audio from video\n", "\n", "Extract the audio track from video files for transcription, analysis, or processing." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Problem\n", "\n", "You have video files but need to work with just the audio track—for transcription, speaker analysis, or audio processing. Extracting audio manually with ffmpeg is tedious and doesn't integrate with your data pipeline.\n", "\n", "| Source | Goal |\n", "|--------|------|\n", "| Lecture recordings | Transcribe for notes |\n", "| Meeting videos | Extract for speaker ID |\n", "| Video podcasts | Create audio-only version |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Solution\n", "\n", "**What's in this recipe:**\n", "\n", "- Extract audio from video as a computed column\n", "- Choose audio format (mp3, wav, flac)\n", "- Chain with transcription for automatic video-to-text\n", "\n", "You use the `extract_audio` function to create an audio column from video. This integrates seamlessly with transcription and other audio processing." 
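, "\n", "\n", "As a preview, the whole pipeline fits in a few lines. This is a sketch rather than a cell that runs here: the table name `demo/videos` is illustrative, and the calls mirror the steps worked through below:\n", "\n", "```python\n", "import pixeltable as pxt\n", "from pixeltable.functions.video import extract_audio\n", "from pixeltable.functions import whisper\n", "\n", "# Audio and transcript are computed columns: filled in automatically on insert\n", "t = pxt.create_table('demo/videos', {'video': pxt.Video})\n", "t.add_computed_column(audio=extract_audio(t.video, format='mp3'))\n", "t.add_computed_column(transcription=whisper.transcribe(t.audio, model='base.en'))\n", "```"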
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable boto3 'numpy<2.4'" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata\n", "Created directory 'audio_extract_demo'.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pixeltable as pxt\n", "from pixeltable.functions.video import extract_audio\n", "\n", "# Create a fresh directory\n", "pxt.drop_dir('audio_extract_demo', force=True)\n", "pxt.create_dir('audio_extract_demo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Extract audio from video" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'videos'.\n" ] } ], "source": [ "# Create table for videos\n", "videos = pxt.create_table(\n", " 'audio_extract_demo/videos', {'title': pxt.String, 'video': pxt.Video}\n", ")" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 0 column values with 0 errors.\n" ] }, { "data": { "text/plain": [ "No rows affected." 
] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add computed column to extract audio as MP3\n", "videos.add_computed_column(\n", " audio=extract_audio(videos.video, format='mp3')\n", ")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserting rows into `videos`: 1 rows [00:00, 207.52 rows/s]\n", "Inserted 1 row with 0 errors.\n" ] }, { "data": { "text/plain": [ "1 row inserted, 4 values computed." ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Insert a sample video (from multimedia-commons with audio)\n", "video_url = 's3://multimedia-commons/data/videos/mp4/ffe/ffb/ffeffbef41bbc269810b2a1a888de.mp4'\n", "\n", "videos.insert([{'title': 'Sample Video', 'video': video_url}])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
titleaudio
Sample Video
\n", " \n", "
" ], "text/plain": [ " title audio\n", "0 Sample Video /Users/pjlb/.pixeltable/media/0441741fa9664272..." ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View results\n", "videos.select(videos.title, videos.audio).collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Chain with transcription\n", "\n", "Add transcription as a follow-up computed column:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install whisper for transcription\n", "%pip install -qU openai-whisper" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/miniconda3/envs/pixeltable/lib/python3.11/site-packages/whisper/transcribe.py:132: UserWarning: FP16 is not supported on CPU; using FP32 instead\n", " warnings.warn(\"FP16 is not supported on CPU; using FP32 instead\")\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Added 1 column value with 0 errors.\n" ] }, { "data": { "text/plain": [ "1 row updated, 1 value computed." ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pixeltable.functions import whisper\n", "\n", "# Add transcription of the extracted audio\n", "videos.add_computed_column(\n", " transcription=whisper.transcribe(videos.audio, model='base.en')\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 1 column value with 0 errors.\n" ] }, { "data": { "text/plain": [ "1 row updated, 1 value computed." 
] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Extract the transcript text\n", "videos.add_computed_column(transcript=videos.transcription.text)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
titletranscript
Sample Videovaporized one .
" ], "text/plain": [ " title transcript\n", "0 Sample Video vaporized one ." ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View the full pipeline results\n", "videos.select(videos.title, videos.transcript).collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explanation\n", "\n", "**Audio format options:**\n", "\n", "| Format | Use case |\n", "|--------|----------|\n", "| `mp3` | Compressed, widely compatible |\n", "| `wav` | Uncompressed, for processing |\n", "| `flac` | Lossless compression |\n", "\n", "**Pipeline flow:**\n", "\n", "```\n", "Video → extract_audio → Audio → whisper.transcribe → Transcript\n", "```\n", "\n", "Each step is a computed column. When you insert a new video:\n", "\n", "1. Audio is extracted automatically\n", "1. Whisper transcribes the audio\n", "1. All results are cached for future queries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## See also\n", "\n", "- [Transcribe audio](https://docs.pixeltable.com/howto/cookbooks/audio/audio-transcribe) - Audio-only transcription\n", "- [Summarize podcasts](https://docs.pixeltable.com/howto/cookbooks/audio/audio-summarize-podcast) - Transcribe and summarize\n", "- [Extract video frames](https://docs.pixeltable.com/howto/cookbooks/video/video-extract-frames) - Work with video frames" ] } ], "metadata": { "kernelspec": { "display_name": "pixeltable", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.11" } }, "nbformat": 4, "nbformat_minor": 2 }