{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Create a Table from video frames\n", "\n", "Create a 3LC Table by extracting individual frames from video files in the UCF11 action recognition dataset.\n", "\n", "![img](../images/create-video-thumbnail-table.jpg)\n", "\n", "\n", "\n", "Video analysis often requires working with individual frames rather than entire video files. Working at the frame level enables per-frame analysis, data augmentation, and easier integration with image-based machine learning pipelines.\n", "\n", "This notebook processes video files from the UCF11 dataset, extracting up to `NUM_FRAMES` frames from each video as PIL Images and creating a table where each row represents a single frame. Each frame is linked to its source video and frame number, enabling both individual frame analysis and sequence reconstruction. The dataset follows a structured format with categorized video clips:\n", "\n", "```\n", "UCF11/\n", "├─ basketball/\n", "│  ├─ v_shooting_01\n", "│  │  ├─ v_shooting_01_01.mpg\n", "│  │  ├─ v_shooting_01_02.mpg\n", "│  │  ├─ ...\n", "│  ├─ v_shooting_02\n", "│  │  ├─ ...\n", "├─ biking/\n", "│  ├─ ...\n", "├─ ...\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Project setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "parameters" ] }, "outputs": [], "source": [ "DATA_PATH = \"../../data\"\n", "NUM_FRAMES = 10\n", "PROJECT_NAME = \"3LC Tutorials - Create Tables\"\n", "TABLE_NAME = \"UCF YouTube Actions - Frames\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import cv2\n", "import tlc\n", "from PIL import Image" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create Table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The class names are read from the names of the dataset's top-level directories."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATASET_LOCATION = Path(DATA_PATH) / \"ucf11\"\n", "\n", "assert DATASET_LOCATION.exists(), f\"Dataset not found at {DATASET_LOCATION}\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Sort so that class indices are assigned in the same order on every run\n", "class_directories = sorted(path for path in DATASET_LOCATION.glob(\"*\") if path.is_dir())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classes = [c.name for c in class_directories]\n", "classes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now define a schema for the `Table`. Each row will contain a sequence ID (video path), frame ID (frame number), the actual frame as a PIL Image, and a categorical label for the video class." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "column_schemas = {\n", "    \"sequence_id\": tlc.StringSchema(),\n", "    \"frame_id\": tlc.Int32Schema(),\n", "    \"frame\": tlc.ImageUrlSchema(sample_type=\"PILImage\"),\n", "    \"label\": tlc.CategoricalLabelSchema(classes),\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then iterate over the videos, extract up to `NUM_FRAMES` frames from each as PIL Images, and write the `Table` with a `TableWriter`."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def extract_frames_from_video(video_path, max_frames=None):\n", "    \"\"\"Extract frames from a video (all of them, or at most max_frames) as PIL Images.\"\"\"\n", "    cap = cv2.VideoCapture(str(video_path))\n", "    frames = []\n", "\n", "    while True:\n", "        ret, frame = cap.read()\n", "        if not ret:\n", "            break\n", "\n", "        # Convert BGR to RGB (OpenCV uses BGR, PIL expects RGB)\n", "        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n", "\n", "        # Convert to PIL Image\n", "        pil_image = Image.fromarray(frame_rgb)\n", "        frames.append(pil_image)\n", "\n", "        # Stop early once the requested number of frames has been collected\n", "        if max_frames is not None and len(frames) >= max_frames:\n", "            break\n", "\n", "    cap.release()\n", "    return frames\n", "\n", "\n", "table_writer = tlc.TableWriter(\n", "    project_name=PROJECT_NAME,\n", "    dataset_name=\"UCF YouTube Actions\",\n", "    table_name=TABLE_NAME,\n", "    column_schemas=column_schemas,\n", "    description=\"Table with frames from UCF YouTube Actions\",\n", ")\n", "\n", "for class_idx, class_directory in enumerate(class_directories):\n", "    for video_path in class_directory.rglob(\"*.mpg\"):\n", "        video_path = video_path.absolute()\n", "\n", "        # Extract up to NUM_FRAMES frames from the video\n", "        frames = extract_frames_from_video(video_path, max_frames=NUM_FRAMES)\n", "\n", "        # Get the sequence_id (relative path to video)\n", "        sequence_id = tlc.Url(video_path).to_relative().to_str()\n", "\n", "        # Write a row for each frame\n", "        for frame_id, frame in enumerate(frames):\n", "            row = {\n", "                \"sequence_id\": sequence_id,\n", "                \"frame_id\": frame_id,\n", "                \"frame\": frame,\n", "                \"label\": class_idx,\n", "            }\n", "            table_writer.add_row(row)\n", "\n", "table = table_writer.finalize()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "table[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "table.table_rows[0]" ] }, { "cell_type": "code",
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "table.url" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.9" } }, "nbformat": 4, "nbformat_minor": 2 }