{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Adding Object Detections to a Dataset\n", "\n", "This recipe provides a glimpse into the possibilities for integrating FiftyOne into your ML workflows. Specifically, it covers:\n", "\n", "- Loading an object detection dataset from the [Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/index.html)\n", "- [Adding predictions](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html#object-detection) from an object detector to the dataset\n", "- Launching the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html) and visualizing/exploring your data\n", "- Integrating the App into your data analysis workflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "If you haven't already, install FiftyOne:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we'll use an off-the-shelf [Faster R-CNN detection model](https://pytorch.org/docs/stable/torchvision/models.html#faster-r-cnn) provided by PyTorch. To use it, you'll need to install `torch` and `torchvision`, if necessary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install torch torchvision" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading a detection dataset\n", "\n", "In this recipe, we'll work with the validation split of the [COCO dataset](https://cocodataset.org/#home), which is conveniently available for download via the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html#coco-2017).\n", "\n", "The snippet below will download the validation split and load it into FiftyOne." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Split 'validation' already downloaded\n", "Loading 'coco-2017' split 'validation'\n", " 100% |████████████████████| 5000/5000 [43.3s elapsed, 0s remaining, 114.9 samples/s] \n", "Dataset 'detector-recipe' created\n" ] } ], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "dataset = foz.load_zoo_dataset(\n", "    \"coco-2017\",\n", "    split=\"validation\",\n", "    dataset_name=\"detector-recipe\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect the dataset to see what we downloaded:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name: detector-recipe\n", "Media type: image\n", "Num samples: 5000\n", "Persistent: False\n", "Info: {'classes': ['0', 'person', 'bicycle', ...]}\n", "Tags: ['validation']\n", "Sample fields:\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n" ] } ], "source": [ "# Print some information about the dataset\n", "print(dataset)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Print a ground truth detection\n", "sample = dataset.first()\n", "print(sample.ground_truth.detections[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the ground truth detections are stored in the `ground_truth` field of the samples.\n", "\n", "Before we go further, let's launch the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html) and use the GUI to explore the dataset visually:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Adding model predictions\n", "\n", "Now let's add some predictions from an object detector to the dataset.\n", "\n", "We'll use an off-the-shelf [Faster R-CNN detection model](https://pytorch.org/docs/stable/torchvision/models.html#faster-r-cnn) provided by PyTorch. The following cell downloads the model and loads it:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model ready\n" ] } ], "source": [ "import torch\n", "import torchvision\n", "\n", "# Run the model on GPU if it is available\n", "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", "\n", "# Load a pre-trained Faster R-CNN model\n", "model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)\n", "model.to(device)\n", "model.eval()\n", "\n", "print(\"Model ready\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code below performs inference with the model on a randomly chosen subset of 100 samples from the dataset and [stores the predictions](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html#object-detection) in a `predictions` field of the samples." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Choose a random subset of 100 samples to add predictions to\n", "predictions_view = dataset.take(100, seed=51)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |██████████████████████| 100/100 [12.7m elapsed, 0s remaining, 0.1 samples/s] \n" ] } ], "source": [ "from PIL import Image\n", "from torchvision.transforms import functional as func\n", "\n", "import fiftyone as fo\n", "\n", "# Get class list\n", "classes = dataset.default_classes\n", "\n", "# Add predictions to samples\n", "with fo.ProgressBar() as pb:\n", "    for sample in pb(predictions_view):\n", "        # Load image\n", "        image = Image.open(sample.filepath)\n", "        image = func.to_tensor(image).to(device)\n", "        c, h, w = image.shape\n", "\n", "        # Perform inference\n", "        preds = model([image])[0]\n", "        labels = preds[\"labels\"].cpu().detach().numpy()\n", "        scores = preds[\"scores\"].cpu().detach().numpy()\n", "        boxes = preds[\"boxes\"].cpu().detach().numpy()\n", "\n", "        # Convert detections to FiftyOne format\n", "        detections = []\n", "        for label, score, box in zip(labels, scores, boxes):\n", "            # Convert to [top-left-x, top-left-y, width, height]\n", "            # in relative coordinates in [0, 1] x [0, 1]\n", "            x1, y1, x2, y2 = box\n", "            rel_box = [x1 / w, y1 / h, (x2 - x1) / w, (y2 - y1) / h]\n", "\n", "            detections.append(\n", "                fo.Detection(\n", "                    label=classes[label],\n", "                    bounding_box=rel_box,\n", "                    confidence=score\n", "                )\n", "            )\n", "\n", "        # Save predictions to dataset\n", "        sample[\"predictions\"] = fo.Detections(detections=detections)\n", "        sample.save()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load `predictions_view` in the App to visualize the predictions that we added:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", 
"\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.view = predictions_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the FiftyOne App\n", "\n", "Now let's use the App to analyze the predictions we've added to our dataset in more detail." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing bounding boxes\n", "\n", "Each field of the samples is shown as a togglable checkbox in the left sidebar, which can be used to control whether ground truth or predicted boxes are rendered on the images.\n", "\n", "You can also double-click on an image to view that sample in more detail:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing object patches\n", "\n", "It can be beneficial to view every object as an individual sample, especially when there are multiple overlapping detections like in the image above. \n", "\n", "In FiftyOne this is called a [patches view](https://voxel51.com/docs/fiftyone/user_guide/app.html#viewing-object-patches) and can be created through Python or directly in the App." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: detector-recipe\n", "Media type: image\n", "Num patches: 849\n", "Tags: ['validation']\n", "Patch fields:\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " sample_id: fiftyone.core.fields.StringField\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detection)\n", "View stages:\n", " 1. Exists(field='predictions', bool=True)\n", " 2. ToPatches(field='ground_truth')\n" ] } ], "source": [ "patches_view = predictions_view.to_patches(\"ground_truth\")\n", "print(patches_view)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use the App to create the same view as above. To do so, we just need to click the [patches button](https://voxel51.com/docs/fiftyone/user_guide/app.html#viewing-object-patches) in the App and select `ground_truth`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session = fo.launch_app(view=predictions_view)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session = fo.launch_app(view=predictions_view)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Confidence thresholding in the App\n", "\n", "From the App instance above, it looks like our detector is generating some spurious low-quality detections. Let's use the App to interactively filter the predictions by `confidence` to identify a reasonable confidence threshold for our model:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Click the down caret on the `predictions` field of the Fields Sidebar\n", "# and apply a confidence threshold\n", "session.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Confidence thresholding in Python\n", "\n", "FiftyOne also provides the ability to [write expressions](https://voxel51.com/docs/fiftyone/user_guide/using_views.html#filtering) that match, filter, and sort detections based on their attributes. See [using DatasetViews](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) for full details.\n", "\n", "For example, we can programmatically generate a view that contains only detections whose `confidence` is greater than `0.75` as follows:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "from fiftyone import ViewField as F\n", "\n", "# Only contains detections with confidence > 0.75\n", "high_conf_view = predictions_view.filter_labels(\"predictions\", F(\"confidence\") > 0.75)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: detector-recipe\n", "Media type: image\n", "Num samples: 100\n", "Tags: ['validation']\n", "Sample fields:\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " predictions: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", "View stages:\n", " 1. Take(size=100, seed=51)\n", " 2. FilterLabels(field='predictions', filter={'$gt': ['$$this.confidence', 0.75]}, only_matches=True)\n" ] } ], "source": [ "# Print some information about the view\n", "print(high_conf_view)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Print a prediction from the view to verify that its confidence is > 0.75\n", "sample = high_conf_view.first()\n", "print(sample.predictions.detections[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's load our view in the App to view the predictions that we programmatically selected:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Load high confidence view in the App\n", "session.view = high_conf_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting samples of interest\n", "\n", "You can select images in the App by clicking on them. Then, you can create a view that contains only those samples by opening the selected samples dropdown in the top left corner of the image grid and clicking `Only show selected`." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.show()" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "session.freeze() # screenshot the active App for sharing" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 4 }