{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Ingest PandaSet autonomous driving dataset\n", "\n", "This notebook shows how to load 3D point clouds, 3D oriented bounding boxes, and semantic segmentations from the PandaSet dataset into a 3LC Table.\n", "\n", "Tables with large 3D geometries use the [bulk data pattern](https://docs.3lc.ai/3lc/latest/tutorials/geometry/bulk_data.html#bulk-data-tutorial) for storing data. For details on the ingestion process, see the [loading script](./load_pandaset.py).\n", "\n", "![](../../../images/pandaset-light.png)\n", "\n", "\n", "\n", "Running this notebook requires the [PandaSet DevKit](https://github.com/scaleapi/pandaset-devkit/blob/master/README.md).\n", "\n", "The dataset can be downloaded from [HuggingFace](https://huggingface.co/datasets/georghess/pandaset). If you have already downloaded `pandaset.zip`, ensure the dataset root below points to the unzipped `pandaset` directory.\n", "\n", "If not, the notebook will download `pandaset.zip` and unzip it into the dataset root directory. This requires authentication with HuggingFace, for example by setting the `HF_TOKEN` environment variable.\n", "\n", "> ⚠️ Storage requirements\n", ">\n", "> The unzipped dataset is ~42 GB, and ingesting all sequences into 3LC will\n", "> require another 50 GB of disk space. 
Ensure you have enough free space before\n", "> running the notebook.\n" ] }, { "cell_type": "markdown", "id": "1", "metadata": {}, "source": [ "## Project Setup" ] }, { "cell_type": "code", "execution_count": null, "id": "2", "metadata": { "tags": [ "parameters" ] }, "outputs": [], "source": [ "PROJECT_NAME = \"3LC Tutorials - Pandaset\"\n", "DATASET_NAME = \"pandaset\"\n", "TABLE_NAME = \"pandaset\"\n", "DATA_PATH = \"../../../../data\"\n", "DOWNLOAD_PATH = \"../../../../transient_data\"\n", "MAX_FRAMES = None\n", "MAX_SEQUENCES = None" ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [], "source": [ "%pip install -q \"pandaset @ git+https://github.com/scaleapi/pandaset-devkit.git@master#subdirectory=python\"\n", "%pip install -q 3lc\n", "%pip install -q huggingface-hub" ] }, { "cell_type": "markdown", "id": "4", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "id": "5", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "from load_pandaset import load_pandaset" ] }, { "cell_type": "markdown", "id": "6", "metadata": {}, "source": [ "## Prepare Dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "7", "metadata": {}, "outputs": [], "source": [ "DATASET_ROOT = Path(DOWNLOAD_PATH) / \"pandaset\"\n", "\n", "if not DATASET_ROOT.exists():\n", " import zipfile\n", "\n", " from huggingface_hub import hf_hub_download\n", "\n", " print(\"Downloading dataset from HuggingFace\")\n", " hf_hub_download(\n", " repo_id=\"georghess/pandaset\",\n", " repo_type=\"dataset\",\n", " filename=\"pandaset.zip\",\n", " local_dir=DATASET_ROOT.parent.absolute().as_posix(),\n", " )\n", "\n", " with zipfile.ZipFile(f\"{DATASET_ROOT.parent}/pandaset.zip\", \"r\") as zip_ref:\n", " zip_ref.extractall(DATASET_ROOT.parent)\n", "\n", " # Remove the pandaset.zip file after extraction\n", " (DATASET_ROOT.parent / \"pandaset.zip\").unlink(missing_ok=True)\n", "else:\n", " 
print(f\"Dataset root {DATASET_ROOT} already exists\")" ] }, { "cell_type": "markdown", "id": "8", "metadata": {}, "source": [ "## Create Table" ] }, { "cell_type": "code", "execution_count": null, "id": "9", "metadata": {}, "outputs": [], "source": [ "table = load_pandaset(\n", " dataset_root=DATASET_ROOT,\n", " table_name=TABLE_NAME,\n", " dataset_name=DATASET_NAME,\n", " project_name=PROJECT_NAME,\n", " data_path=DATA_PATH,\n", " max_frames=MAX_FRAMES,\n", " max_sequences=MAX_SEQUENCES,\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": ".venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "test_marks": [ "slow", "dependent" ] }, "nbformat": 4, "nbformat_minor": 5 }