{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Create a Table from video frames\n",
    "\n",
    "Create a 3LC Table by extracting individual frames from video files in the UCF11 action recognition dataset.\n",
    "\n",
    "![img](../images/create-video-thumbnail-table.jpg)\n",
    "\n",
    "<!-- Tags: [\"video\"] -->\n",
    "\n",
    "Video analysis often requires working with individual frames rather than entire video files. This approach allows for frame-level analysis, data augmentation, and easier integration with image-based machine learning pipelines.\n",
    "\n",
    "This notebook processes video files from the UCF11 dataset, extracting all frames as PIL Images and creating a table where each row represents a single frame. Each frame is linked to its source video and frame number, enabling both individual frame analysis and sequence reconstruction. The dataset follows a structured format with categorized video clips:\n",
    "\n",
    "```\n",
    "UCF11/\n",
    "├─ basketball/\n",
    "│  ├─ v_shooting_01\n",
    "|  │  ├─ v_shooting_01_01.mpg\n",
    "|  │  ├─ v_shooting_01_02.mpg\n",
    "|  │  ├─ ...\n",
    "│  ├─ v_shooting_02\n",
    "|  │  ├─ ...\n",
    "├─ biking/\n",
    "│  ├─ ...\n",
    "├─ ...\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Project setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": [
     "parameters"
    ]
   },
   "outputs": [],
   "source": [
    "DATA_PATH = \"../../data\"\n",
    "NUM_FRAMES = 10\n",
    "PROJECT_NAME = \"3LC Tutorials - Create Tables\"\n",
    "TABLE_NAME = \"UCF YouTube Actions - Frames\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "import cv2\n",
    "import tlc\n",
    "from PIL import Image"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Table"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The class names are read from the directory names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DATASET_LOCATION = Path(DATA_PATH) / \"ucf11\"\n",
    "\n",
    "assert DATASET_LOCATION.exists(), f\"Dataset not found at {DATASET_LOCATION}\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class_directories = [path for path in DATASET_LOCATION.glob(\"*\") if path.is_dir()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "classes = [c.name for c in class_directories]\n",
    "classes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We now define a schema for the `Table`. Each row will contain a sequence ID (video path), frame ID (frame number), the actual frame as a PIL Image, and a categorical label for the video class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "column_schemas = {\n",
    "    \"sequence_id\": tlc.StringSchema(),\n",
    "    \"frame_id\": tlc.Int32Schema(),\n",
    "    \"frame\": tlc.ImageUrlSchema(sample_type=\"PILImage\"),\n",
    "    \"label\": tlc.CategoricalLabelSchema(classes),\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then iterate over the videos, extract all frames as PIL Images, and write the `Table` with a `TableWriter`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract_frames_from_video(video_path, max_frames=None):\n",
    "    \"\"\"Extract all frames from a video and return them as PIL Images.\"\"\"\n",
    "    cap = cv2.VideoCapture(str(video_path))\n",
    "    frames = []\n",
    "\n",
    "    while True:\n",
    "        ret, frame = cap.read()\n",
    "        if not ret:\n",
    "            break\n",
    "\n",
    "        # Convert BGR to RGB (OpenCV uses BGR, PIL expects RGB)\n",
    "        frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)\n",
    "\n",
    "        # Convert to PIL Image\n",
    "        pil_image = Image.fromarray(frame_rgb)\n",
    "        frames.append(pil_image)\n",
    "\n",
    "        if max_frames is not None and len(frames) >= max_frames:\n",
    "            break\n",
    "\n",
    "    cap.release()\n",
    "    return frames\n",
    "\n",
    "\n",
    "table_writer = tlc.TableWriter(\n",
    "    project_name=PROJECT_NAME,\n",
    "    dataset_name=\"UCF YouTube Actions\",\n",
    "    table_name=TABLE_NAME,\n",
    "    column_schemas=column_schemas,\n",
    "    description=\"Table with frames from UCF YouTube Actions\",\n",
    ")\n",
    "\n",
    "for class_idx, class_directory in enumerate(class_directories):\n",
    "    for video_path in class_directory.rglob(\"*mpg\"):\n",
    "        video_path = video_path.absolute()\n",
    "\n",
    "        # Extract all frames from the video\n",
    "        frames = extract_frames_from_video(video_path, max_frames=NUM_FRAMES)\n",
    "\n",
    "        # Get the sequence_id (relative path to video)\n",
    "        sequence_id = tlc.Url(video_path).to_relative().to_str()\n",
    "\n",
    "        # Write a row for each frame\n",
    "        for frame_id, frame in enumerate(frames):\n",
    "            row = {\n",
    "                \"sequence_id\": sequence_id,\n",
    "                \"frame_id\": frame_id,\n",
    "                \"frame\": frame,\n",
    "                \"label\": class_idx,\n",
    "            }\n",
    "            table_writer.add_row(row)\n",
    "\n",
    "table = table_writer.finalize()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table.table_rows[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "table.url"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}