{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " Try in Google Colab\n", " \n", " \n", " \n", " \n", " Share via nbviewer\n", " \n", " \n", " \n", " \n", " View on GitHub\n", " \n", " \n", " \n", " \n", " Download notebook\n", " \n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# People in Public 175K Dataset\n", "\n", "In this example, we'll load the [People in Public 175K Dataset](https://visym.github.io/collector/pip_175k) from [Visym Labs](https://www.visym.com/site_0820/index.html) into FiftyOne.\n", "\n", "Per the dataset homepage, PIP-175K contains 184,379 video clips of 68 classes of activities performed by people in public places. The activity labels are subsets of the 37 activities in the [Multiview Extended Video with Activities (MEVA) dataset](https://mevadata.org) and is consistent with the [Activities in Extended Video (ActEV) challenge](https://actev.nist.gov)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "If you haven't already, install FiftyOne:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download dataset\n", "\n", "The dataset can be downloaded from [this page](https://visym.github.io/collector/pip_175k) via [this link (55.3GB)](https://dl.dropboxusercontent.com/s/xwiacwo9y5uci9v/pip_175k.tar.gz)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download and unpack the dataset\n", "!wget https://dl.dropboxusercontent.com/s/xwiacwo9y5uci9v/pip_175k.tar.gz\n", "!tar -xvzf pip_175k.tar.gz\n", "!rm pip_175k.tar.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After downloading, you'll have a `pip_175k/` directory with the following contents:\n", "\n", "```\n", "pip_175k/\n", " videos/\n", " car_starts/\n", " .mp4\n", " ...\n", " person_transfers_object_to_person/\n", " .mp4\n", " ...\n", " ...\n", " trainset.pkl\n", " testset.pkl\n", " valset.pkl\n", " ...\n", "```\n", "\n", "The `videos/` subdirectory contains the videos files organized as a directory tree that encodes the primary activity in each video.\n", "\n", "The `.pkl` files contain dense 2D bounding annotations + additional activity labels for each video, stored in [VIPY format](https://github.com/visym/vipy).\n", "\n", "We'll need to install the YIPY package in order to load the dense annotations:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install VIPY package\n", "!pip install vipy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Installing FiftyOne\n", "\n", "You can install FiftyOne and the necessary dependencies, if necessary, as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Install FiftyOne\n", "!pip install --index https://pypi.voxel51.com fiftyone\n", "\n", "# We'll need ffmpeg to work with video datasets\n", "!sudo apt-get install -y ffmpeg\n", "#!brew install ffmpeg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quick preview\n", "\n", "FiftyOne provides native support for visualizing datasets stored as [video classification directory trees](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/datasets.html#videoclassificationdirectorytree) on disk, like the `pip_175k/videos/` sudirectory of the PIP-175K dataset.\n", "\n", "Therefore, you can preview a random subset of the dataset as follows:" ] }, { 
"cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Path to your copy of PIP-175K\n", "PIP_175K_DIR = \"/path/to/pip_175k\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import fiftyone as fo\n", "\n", "# Load 100 random videos\n", "dataset = fo.Dataset.from_dir(\n", " os.path.join(PIP_175K_DIR, \"videos\"),\n", " fo.types.VideoClassificationDirectoryTree,\n", " name=\"PIP-175K-sample\",\n", " shuffle=True,\n", " max_samples=100,\n", ")\n", "\n", "# Visualize in the FiftyOne App\n", "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![pip-175k-sample-gif](https://user-images.githubusercontent.com/25985824/97036100-50eb9b00-1535-11eb-8e9b-a939aba87b5b.gif)\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading the full annotations\n", "\n", "We can load the complete annotations from the VIPY `.pkl` files by [writing a custom DatasetImporter](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/datasets.html#writing-a-custom-datasetimporter):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import defaultdict\n", "import logging\n", "import os\n", "\n", "import vipy\n", "\n", "import eta.core.utils as etau\n", "import eta.core.video as etav\n", "\n", "import fiftyone as fo\n", "import fiftyone.utils.data as foud\n", "\n", "\n", "logger = logging.getLogger(__name__)\n", "\n", "\n", "class VIPYDatasetImporter(foud.LabeledVideoDatasetImporter):\n", " \"\"\"Importer for labeled video datasets stored in\n", " `VIPY format `_.\n", "\n", " Args:\n", " dataset_dir: the dataset directory\n", " pkl_file (None): the name of the ``.pkl`` file within ``dataset_dir``\n", " from which to load samples + annotations\n", " shuffle (False): whether to randomly shuffle the order in which the\n", " samples are imported\n", " seed (None): a random seed to use when shuffling\n", " max_samples (None): a maximum number of samples to import. 
By default,\n", " all samples are imported\n", " \"\"\"\n", "\n", " def __init__(\n", " self,\n", " dataset_dir,\n", " pkl_file=None,\n", " shuffle=False,\n", " seed=None,\n", " max_samples=None,\n", " ):\n", " if pkl_file is None:\n", " pkl_paths = etau.get_glob_matches(\n", " os.path.join(dataset_dir, \"*.pkl\")\n", " )\n", " pkl_file = os.path.basename(pkl_paths[0])\n", "\n", " super().__init__(\n", " dataset_dir, shuffle=shuffle, seed=seed, max_samples=max_samples\n", " )\n", "\n", " self.pkl_file = pkl_file\n", " self._pkl_path = os.path.join(dataset_dir, pkl_file)\n", " self._samples = None\n", " self._iter_samples = None\n", " self._num_samples = None\n", "\n", " def __iter__(self):\n", " self._iter_samples = iter(self._samples)\n", " return self\n", "\n", " def __len__(self):\n", " return self._num_samples\n", "\n", " def __next__(self):\n", " v = next(self._iter_samples)\n", " return _parse_vipy_video(v)\n", "\n", " @property\n", " def has_dataset_info(self):\n", " return False\n", "\n", " @property\n", " def has_video_metadata(self):\n", " return False\n", "\n", " @property\n", " def label_cls(self):\n", " return fo.Classifications\n", "\n", " @property\n", " def frame_labels_cls(self):\n", " return fo.Detections\n", "\n", " def setup(self):\n", " logger.info(\"Loading VIPY pkl '%s'...\", self._pkl_path)\n", " pip = vipy.util.load(self._pkl_path)\n", " logger.info(\"Loading complete\")\n", "\n", " self._samples = self._preprocess_list(pip)\n", " self._num_samples = len(self._samples)\n", "\n", "\n", "def _parse_vipy_video(v):\n", " video_path = v.filename()\n", "\n", " video_metadata = fo.VideoMetadata.build_for(video_path)\n", " width = video_metadata.frame_width\n", " height = video_metadata.frame_height\n", "\n", " # Activities\n", " activities = fo.Classifications(\n", " classifications=[\n", " fo.Classification(label=a.category())\n", " for a in v.activities().values()\n", " ]\n", " )\n", "\n", " # Detections\n", " frames = defaultdict(lambda: defaultdict(fo.Detections))\n", " for track in v.tracks().values():\n", " label = track.category()\n", " for frame_number in range(track.startframe(), track.endframe() + 1):\n", " x, y, w, h = track[frame_number].to_xywh()\n", " bounding_box = [x / width, y / height, w / width, h / height]\n", " detection = fo.Detection(label=label, bounding_box=bounding_box)\n", " frames[frame_number + 1][\"objects\"].detections.append(detection)\n", "\n", " return video_path, None, activities, frames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can then use the `VIPYDatasetImporter` to load samples with their full annotations into FiftyOne:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Make an importer that will load 100 random samples\n", "importer = VIPYDatasetImporter(\n", " PIP_175K_DIR,\n", " pkl_file=\"valset.pkl\",\n", " shuffle=True,\n", " max_samples=100,\n", ")\n", "\n", "# Load samples into FiftyOne dataset\n", "dataset = fo.Dataset.from_importer(\n", " importer,\n", " label_field=\"gt\",\n", " name=\"PIP-175K-sample-with-detections\",\n", ")\n", "\n", "# Visualize samples in the App\n", "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![pip-175k-sample-with-detections-gif](https://user-images.githubusercontent.com/25985824/97036117-5517b880-1535-11eb-90fd-10fcf95446a6.gif)\n", "\n", "\n", "\n", 
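"cell_type": "markdown", "metadata": {}, "source": [ "To spot-check the conversion, you can load the same `.pkl` file directly with VIPY and compare a video's source labels to what you see in the App. This is a minimal sketch that uses only accessors already relied upon by the importer above (`vipy.util.load()`, `activities()`, `tracks()`, and `category()`):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "import vipy\n", "\n", "# Load the same VIPY annotations that the importer consumed\n", "pip = vipy.util.load(os.path.join(PIP_175K_DIR, \"valset.pkl\"))\n", "\n", "# Inspect the first video's source labels\n", "v = pip[0]\n", "print(v.filename())\n", "print(\"Activities:\", [a.category() for a in v.activities().values()])\n", "print(\"Tracks:\", [t.category() for t in v.tracks().values()])" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Expanding a sample in the App shows its per-frame detections:\n", "\n",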
"![pip-175k-sample-with-detections-expanded](https://user-images.githubusercontent.com/25985824/97035121-e0904a00-1533-11eb-9e4b-f6c961a8ff1e.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring the dataset\n", "\n", "With the data in FiftyOne, we can now explore the dataset using [dataset views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html).\n", "\n", "For example, we can filter the dataset to only show videos with label `person_exits_car`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from fiftyone import ViewField as F\n", "\n", "# Create a view that contains only videos with label `person_exits_car`\n", "view = dataset.filter_labels(\n", " \"gt\", F(\"label\") == \"person_exits_car\", only_matches=True\n", ")\n", "\n", "# Show view in App\n", "session.view = view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![person_exits_car](https://user-images.githubusercontent.com/25985824/97035137-e554fe00-1533-11eb-8805-6a8abcdf54fe.png)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" }, "nbsphinx": { "execute": "never" } }, "nbformat": 4, "nbformat_minor": 4 }