{ "cells": [ { "cell_type": "markdown", "id": "2ecf2e51", "metadata": {}, "source": [ "# Exploring the Dataset Zoo\n", "\n", "This experience introduces you to the core components of the FiftyOne Zoo:\n", "- The **Dataset Zoo** for accessing and exploring public datasets\n", "- The **Model Zoo** for running pre-trained models on your data\n", "- Creating your **own remotely-sourced datasets** for reuse and collaboration\n", "\n", "Whether you're a researcher, engineer, or educator, these tools help streamline your computer vision workflows in FiftyOne.\n", "\n", "> 💡 Make sure to run `pip install fiftyone torch torchvision` before starting." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "bat" } }, "outputs": [], "source": [ "!pip install fiftyone\n", "!pip install torch torchvision" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## FiftyOne Zoo: A Hub for Datasets and Models\n", "\n", "FiftyOne Zoo provides easy access to a vast collection of pre-built datasets and pre-trained models. This notebook will guide you through exploring and using these resources.\n", "\n", "### Key Components:\n", "\n", "* **Dataset Zoo:** Offers a wide range of computer vision datasets, ready for immediate use.\n", "* **Model Zoo:** Provides pre-trained models for various tasks, enabling quick experimentation and deployment.\n", "\n", "Let's dive in!" 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset Zoo\n", "\n", "### Exploring the Dataset Zoo\n", "\n", "The Dataset Zoo simplifies the process of loading and working with popular datasets.\n", "\n", "#### Listing Available Datasets" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Available Datasets:\n", "- activitynet-100\n", "- activitynet-200\n", "- bdd100k\n", "- caltech101\n", "- caltech256\n", "- cifar10\n", "- cifar100\n", "- cityscapes\n", "- coco-2014\n", "- coco-2017\n", "- fashion-mnist\n", "- fiw\n", "- hmdb51\n", "- imagenet-2012\n", "- imagenet-sample\n", "- kinetics-400\n", "- kinetics-600\n", "- kinetics-700\n", "- kinetics-700-2020\n", "- kitti\n", "- kitti-multiview\n", "- lfw\n", "- mnist\n", "- open-images-v6\n", "- open-images-v7\n", "- places\n", "- quickstart\n", "- quickstart-3d\n", "- quickstart-geo\n", "- quickstart-groups\n", "- quickstart-video\n", "- sama-coco\n", "- ucf101\n", "- voc-2007\n", "- voc-2012\n" ] } ], "source": [ "print(\"Available Datasets:\")\n", "for dataset_name in foz.list_zoo_datasets():\n", " print(f\"- {dataset_name}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading a Dataset (Example: MNIST)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset = foz.load_zoo_dataset(\"mnist\")\n", "print(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing the Dataset " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Session launched. 
Run `session.show()` to open the App in a cell output.\n", "\n", "Welcome to\n", "\n", "███████╗██╗███████╗████████╗██╗ ██╗ ██████╗ ███╗ ██╗███████╗\n", "██╔════╝██║██╔════╝╚══██╔══╝╚██╗ ██╔╝██╔═══██╗████╗ ██║██╔════╝\n", "█████╗ ██║█████╗ ██║ ╚████╔╝ ██║ ██║██╔██╗ ██║█████╗\n", "██╔══╝ ██║██╔══╝ ██║ ╚██╔╝ ██║ ██║██║╚██╗██║██╔══╝\n", "██║ ██║██║ ██║ ██║ ╚██████╔╝██║ ╚████║███████╗\n", "╚═╝ ╚═╝╚═╝ ╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═══╝╚══════╝ v1.3.1\n", "\n", "If you're finding FiftyOne helpful, here's how you can get involved:\n", "\n", "|\n", "| ⭐⭐⭐ Give the project a star on GitHub ⭐⭐⭐\n", "| https://github.com/voxel51/fiftyone\n", "|\n", "| 🚀🚀🚀 Join the FiftyOne Discord community 🚀🚀🚀\n", "| https://community.voxel51.com/\n", "|\n", "\n" ] } ], "source": [ "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Visualizing the dataset in the FiftyOne App](https://cdn.voxel51.com/getting_started_model_dataset_zoo/notebook1/visualizate_dataset.webp)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading a Specific Split (Example: COCO)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", "    # Downloads the split on first use, then loads it from disk\n", "    coco_train = foz.load_zoo_dataset(\"coco-2017\", split=\"train\")\n", "    print(coco_train)\n", "except Exception as e:\n", "    # Report the actual failure instead of hiding it\n", "    print(f\"Could not load coco-2017: {e}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Downloading and Loading a Dataset with Specific Splits and Downsampling (Example: open-images-v6)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", "    dataset = foz.load_zoo_dataset(\n", "        \"open-images-v6\",\n", "        splits=[\"train\", \"validation\"],\n", "        label_types=[\"detections\", \"segmentations\"],\n", "        classes=[\"Car\", \"Person\"],\n", "        max_samples=50,\n", "    )\n", "    print(dataset)\n", "except Exception as e:\n", "    # Report the actual failure instead of hiding it\n", "    print(f\"Could not load open-images-v6: {e}\")" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "### Working with Dataset Metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "try:\n", " metadata = foz.get_zoo_dataset_info(\"coco-2017\")\n", " print(metadata)\n", "except:\n", " print(\"coco-2017 metadata is not available, please install it if needed.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Loading a Remote Image Dataset\n", "\n", "With fiftyOne you can work/create zoo datasets whose download/preparation methods are hosted via GitHub repositories or public URLs\n", "\n" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading https://github.com/voxel51/coco-2017...\n", " 33.7Kb [1.4ms elapsed, ? remaining, 350.4Mb/s] \n", "Downloading split 'validation' to '/home/paula/fiftyone/voxel51/coco-2017/validation' if necessary\n", "Downloading annotations to '/home/paula/fiftyone/voxel51/coco-2017/tmp-download/annotations_trainval2017.zip'\n", " 100% |██████| 1.9Gb/1.9Gb [1.4m elapsed, 0s remaining, 20.7Mb/s] \n", "Extracting annotations to '/home/paula/fiftyone/voxel51/coco-2017/raw/instances_val2017.json'\n", "Downloading images to '/home/paula/fiftyone/voxel51/coco-2017/tmp-download/val2017.zip'\n", " 20% |█-----| 1.2Gb/6.1Gb [1.0m elapsed, 4.0m remaining, 22.8Mb/s] " ] } ], "source": [ "dataset = foz.load_zoo_dataset(\n", " \"https://github.com/voxel51/coco-2017\",\n", " split=\"validation\",\n", ")\n", "\n", "session = fo.launch_app(dataset, port=5152, auto=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Other loading examples with remote datasets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load 50 random samples from the validation split\n", "\n", "Only the required images will be downloaded (if necessary).\n", "By default, only detections are loaded" ] }, { "cell_type": "code", 
"execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset = foz.load_zoo_dataset(\n", " \"https://github.com/voxel51/coco-2017\",\n", " split=\"validation\",\n", " max_samples=50,\n", " shuffle=True,\n", ")\n", "\n", "session = fo.launch_app(dataset, port=5152, auto=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load segmentations for 25 samples from the validation split that contain cats and dogs\n", "\n", "Images that contain all `classes` will be prioritized first, followed by images that contain at least one of the required `classes`. If there are not enough images matching `classes` in the split to meet `max_samples`, only the available images will be loaded. Images will only be downloaded if necessary" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset = foz.load_zoo_dataset(\n", " \"https://github.com/voxel51/coco-2017\",\n", " split=\"validation\",\n", " label_types=[\"segmentations\"],\n", " classes=[\"cat\", \"dog\"],\n", " max_samples=25,\n", ")\n", "\n", "session = fo.launch_app(dataset, port=5152, auto=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Download the entire validation split and load both detections and segmentations. 
\n", "\n", "Subsequent partial loads of the validation split will never require downloading any images.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset = foz.load_zoo_dataset(\n", "    \"https://github.com/voxel51/coco-2017\",\n", "    split=\"validation\",\n", "    label_types=[\"detections\", \"segmentations\"],\n", ")\n", "\n", "session = fo.launch_app(dataset, port=5152, auto=False)" ] } ], "metadata": { "kernelspec": { "display_name": "env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.13" } }, "nbformat": 4, "nbformat_minor": 2 }