{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with Twelve Labs in Pixeltable\n", "\n", "Twelve Labs provides multimodal embeddings that project text, images, audio, and video into the **same semantic space**. This enables true **cross-modal search** - the most powerful feature of this integration.\n", "\n", "**What makes this special?** You can search a video index using *any* modality:\n", "\n", "| Query Type | Use Case |\n", "|------------|----------|\n", "| **Text to Video** | \"Find clips of a man giving a speech\" |\n", "| **Image to Video** | Find videos visually similar to a photo |\n", "| **Audio to Video** | Find videos with similar speech/sounds |\n", "| **Video to Video** | Find videos similar to a clip |\n", "\n", "This notebook demonstrates this cross-modal capability with video, then shows how to apply the same embeddings to other modalities.\n", "\n", "### Prerequisites\n", "\n", "- A Twelve Labs account with an API key ([playground.twelvelabs.io](https://playground.twelvelabs.io/))\n", "- Audio and video must be at least 4 seconds long" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%pip install -qU pixeltable twelvelabs" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import getpass\n", "import os\n", "\n", "if 'TWELVELABS_API_KEY' not in os.environ:\n", " os.environ['TWELVELABS_API_KEY'] = getpass.getpass(\n", " 'Enter your Twelve Labs API key: '\n", " )" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/pierre/pixeltable/pixeltable/env.py:501: UserWarning: Progress reporting is disabled because ipywidgets is not installed. 
To fix this, run: `pip install ipywidgets`\n", " warnings.warn(\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Connected to Pixeltable database at: postgresql+psycopg://postgres:@/pixeltable?host=/Users/pjlb/.pixeltable/pgdata\n", "Created directory 'twelvelabs_demo'.\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pixeltable as pxt\n", "from pixeltable.functions.twelvelabs import embed\n", "\n", "# Create a fresh directory for our demo\n", "pxt.drop_dir('twelvelabs_demo', force=True)\n", "pxt.create_dir('twelvelabs_demo')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross-Modal Video Search\n", "\n", "Let's index a video and search it using text, images, audio, and other videos - all against the same index." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create Video Table and Index" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'videos'.\n", "Inserted 1 row with 0 errors in 2.35 s (0.42 rows/s)\n" ] }, { "data": { "text/plain": [ "1 row inserted." 
] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pixeltable.functions.video import video_splitter\n", "\n", "# Create a table for videos\n", "video_t = pxt.create_table('twelvelabs_demo/videos', {'video': pxt.Video})\n", "\n", "# Insert a sample video\n", "video_url = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness.mp4'\n", "video_t.insert([{'video': video_url}])" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Create a view that segments the video into searchable chunks\n", "# Twelve Labs requires minimum 4 second segments\n", "video_chunks = pxt.create_view(\n", " 'twelvelabs_demo/video_chunks',\n", " video_t,\n", " iterator=video_splitter(\n", " video=video_t.video, duration=5.0, min_segment_duration=4.0\n", " ),\n", ")\n", "\n", "# Add embedding index for cross-modal search\n", "video_chunks.add_embedding_index(\n", " 'video_segment', embedding=embed.using(model_name='marengo3.0')\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Text to Video Search\n", "\n", "Find video segments matching a text description." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
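<table>\n", "<thead><tr><th>video_segment</th><th>score</th></tr></thead>\n", "<tbody>\n", "<tr><td></td><td>0.576</td></tr>\n", "<tr><td></td><td>0.435</td></tr>\n", "<tr><td></td><td>0.298</td></tr>\n", "</tbody>\n", "</table>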
" ], "text/plain": [ " video_segment score\n", "0 /Users/pjlb/.pixeltable/media/7b7be784c3174523... 0.575589\n", "1 /Users/pjlb/.pixeltable/media/7b7be784c3174523... 0.435466\n", "2 /Users/pjlb/.pixeltable/media/7b7be784c3174523... 0.297902" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sim = video_chunks.video_segment.similarity(string='pink')\n", "\n", "video_chunks.order_by(sim, asc=False).limit(3).select(\n", " video_chunks.video_segment, score=sim\n", ").collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Image to Video Search\n", "\n", "Find video segments similar to an image." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
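<table>\n", "<thead><tr><th>video_segment</th><th>score</th></tr></thead>\n", "<tbody>\n", "<tr><td></td><td>0.691</td></tr>\n", "<tr><td></td><td>0.675</td></tr>\n", "</tbody>\n", "</table>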
" ], "text/plain": [ " video_segment score\n", "0 /Users/pjlb/.pixeltable/media/7b7be784c3174523... 0.690817\n", "1 /Users/pjlb/.pixeltable/media/7b7be784c3174523... 0.674969" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "image_query = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness-Screenshot.png'\n", "\n", "sim = video_chunks.video_segment.similarity(image=image_query)\n", "\n", "video_chunks.order_by(sim, asc=False).limit(2).select(\n", " video_chunks.video_segment, score=sim\n", ").collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video to Video Search\n", "\n", "Find video segments similar to another video clip." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
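<table>\n", "<thead><tr><th>video_segment</th><th>score</th></tr></thead>\n", "<tbody>\n", "<tr><td></td><td>0.875</td></tr>\n", "<tr><td></td><td>0.836</td></tr>\n", "</tbody>\n", "</table>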
" ], "text/plain": [ " video_segment score\n", "0 /Users/pjlb/.pixeltable/media/7b7be784c3174523... 0.875341\n", "1 /Users/pjlb/.pixeltable/media/7b7be784c3174523... 0.836088" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "video_query = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness-Video-Extract.mp4'\n", "\n", "sim = video_chunks.video_segment.similarity(video=video_query)\n", "\n", "video_chunks.order_by(sim, asc=False).limit(2).select(\n", " video_chunks.video_segment, score=sim\n", ").collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Audio to Video Search\n", "\n", "Find video segments with similar audio/speech content." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
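<table>\n", "<thead><tr><th>video_segment</th><th>score</th></tr></thead>\n", "<tbody>\n", "<tr><td></td><td>0.866</td></tr>\n", "<tr><td></td><td>0.723</td></tr>\n", "</tbody>\n", "</table>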
" ], "text/plain": [ " video_segment score\n", "0 /Users/pjlb/.pixeltable/media/7b7be784c3174523... 0.865856\n", "1 /Users/pjlb/.pixeltable/media/7b7be784c3174523... 0.723026" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "audio_query = 'https://github.com/pixeltable/pixeltable/raw/main/docs/resources/The-Pursuit-of-Happiness-Audio-Extract.m4a'\n", "\n", "sim = video_chunks.video_segment.similarity(audio=audio_query)\n", "\n", "video_chunks.order_by(sim, asc=False).limit(2).select(\n", " video_chunks.video_segment, score=sim\n", ").collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Embedding Options\n", "\n", "For video embeddings, you can focus on specific aspects:\n", "\n", "- `'visual'` - Focus on what you see\n", "- `'audio'` - Focus on what you hear\n", "- `'transcription'` - Focus on what is said" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added 51 column values with 0 errors in 17.13 s (2.98 rows/s)\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
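<table>\n", "<thead><tr><th>video_segment</th><th>visual_embedding</th></tr></thead>\n", "<tbody>\n", "<tr><td></td><td>[ 0.034 0.071 -0.038 0.062 -0.01 0.061 ... 0.047 -0.069 -0.009 -0.021 0.036 0.002]</td></tr>\n", "<tr><td></td><td>[ 0.037 0.036 0.002 0.084 -0.008 0.013 ... 0.024 -0.02 -0.013 0.012 0.037 0.056]</td></tr>\n", "</tbody>\n", "</table>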
" ], "text/plain": [ " video_segment \\\n", "0 /Users/pjlb/.pixeltable/media/7b7be784c3174523... \n", "1 /Users/pjlb/.pixeltable/media/7b7be784c3174523... \n", "\n", " visual_embedding \n", "0 [0.034423828, 0.07080078, -0.037841797, 0.0620... \n", "1 [0.036865234, 0.03564453, 0.0022125244, 0.0844... " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Add a visual-only embedding column\n", "video_chunks.add_computed_column(\n", " visual_embedding=embed(\n", " video_chunks.video_segment,\n", " model_name='marengo3.0',\n", " embedding_option=['visual'],\n", " )\n", ")\n", "\n", "video_chunks.select(\n", " video_chunks.video_segment, video_chunks.visual_embedding\n", ").limit(2).collect()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other Modalities: Text, Images, and Documents\n", "\n", "Twelve Labs embeddings also work for text, images, and documents. Here's a compact example showing **multiple embedding indexes on a single table**." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Created table 'content'.\n" ] } ], "source": [ "# Create a multimodal content table\n", "content_t = pxt.create_table(\n", " 'twelvelabs_demo/content',\n", " {\n", " 'title': pxt.String,\n", " 'description': pxt.String,\n", " 'thumbnail': pxt.Image,\n", " },\n", ")\n", "\n", "# Add embedding index on text column\n", "content_t.add_embedding_index(\n", " 'description', embedding=embed.using(model_name='marengo3.0')\n", ")\n", "\n", "# Add embedding index on image column\n", "content_t.add_embedding_index(\n", " 'thumbnail', embedding=embed.using(model_name='marengo3.0')\n", ")" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Inserted 4 rows with 0 errors in 1.15 s (3.47 rows/s)\n" ] }, { "data": { "text/plain": [ "4 rows inserted." 
] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Insert sample content\n", "content_t.insert(\n", " [\n", " {\n", " 'title': 'Beach Sunset',\n", " 'description': 'A beautiful sunset over the ocean with palm trees.',\n", " 'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000025.jpg',\n", " },\n", " {\n", " 'title': 'Mountain Hiking',\n", " 'description': 'Hikers climbing a steep mountain trail with scenic views.',\n", " 'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000139.jpg',\n", " },\n", " {\n", " 'title': 'City Street',\n", " 'description': 'Busy urban street with cars and pedestrians.',\n", " 'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000042.jpg',\n", " },\n", " {\n", " 'title': 'Wildlife Safari',\n", " 'description': 'Elephants and zebras on the African savanna.',\n", " 'thumbnail': 'https://raw.githubusercontent.com/pixeltable/pixeltable/main/docs/resources/images/000000000061.jpg',\n", " },\n", " ]\n", ")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
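<table>\n", "<thead><tr><th>title</th><th>description</th><th>score</th></tr></thead>\n", "<tbody>\n", "<tr><td>Mountain Hiking</td><td>Hikers climbing a steep mountain trail with scenic views.</td><td>0.519</td></tr>\n", "<tr><td>Beach Sunset</td><td>A beautiful sunset over the ocean with palm trees.</td><td>0.452</td></tr>\n", "</tbody>\n", "</table>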
" ], "text/plain": [ " title description \\\n", "0 Mountain Hiking Hikers climbing a steep mountain trail with sc... \n", "1 Beach Sunset A beautiful sunset over the ocean with palm tr... \n", "\n", " score \n", "0 0.518636 \n", "1 0.451873 " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Search by text description\n", "sim = content_t.description.similarity(string='outdoor nature adventure')\n", "\n", "content_t.order_by(sim, asc=False).limit(2).select(\n", " content_t.title, content_t.description, score=sim\n", ").collect()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
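<table>\n", "<thead><tr><th>title</th><th>thumbnail</th><th>score</th></tr></thead>\n", "<tbody>\n", "<tr><td>Beach Sunset</td><td></td><td>0.758</td></tr>\n", "<tr><td>Mountain Hiking</td><td></td><td>0.741</td></tr>\n", "</tbody>\n", "</table>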
" ], "text/plain": [ " title thumbnail \\\n", "0 Beach Sunset \n", " \n", " \n", " title\n", " thumbnail\n", " score\n", " \n", " \n", " \n", " \n", " City Street\n", "
\n", " \n", "
\n", " 0.135\n", " \n", " \n", " Mountain Hiking\n", "
\n", " \n", "
\n", " -0.003\n", " \n", " \n", "" ], "text/plain": [ " title thumbnail \\\n", "0 City Street