{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluating Object Detections with FiftyOne\n", "\n", "This walkthrough demonstrates how to use FiftyOne to perform hands-on evaluation of your detection model.\n", "\n", "It covers the following concepts:\n", "\n", "- Loading a dataset with ground truth labels [into FiftyOne](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html)\n", "- [Adding model predictions](https://voxel51.com/docs/fiftyone/recipes/adding_detections.html) to your dataset\n", "- [Evaluating your model](https://voxel51.com/docs/fiftyone/user_guide/evaluation.html#detections) using FiftyOne's evaluation API\n", "- Viewing the best and worst performing samples in your dataset\n", "\n", "**So, what's the takeaway?**\n", "\n", "Aggregate measures of performance like mAP don't give you the full picture of your detection model. In practice, the limiting factor on your model's performance is often data quality issues that you need to **see** to address. 
FiftyOne is designed to make it easy to do just that.\n", "\n", "Running the workflow presented here on your ML projects will help you to understand the current failure modes (edge cases) of your model and how to fix them, including:\n", "\n", "- Identifying scenarios that require additional training samples in order to boost your model's performance\n", "- Deciding whether your ground truth annotations have errors/weaknesses that need to be corrected before any subsequent model training will be profitable" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "If you haven't already, install FiftyOne:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we'll use an off-the-shelf [Faster R-CNN detection model](https://pytorch.org/docs/stable/torchvision/models.html#faster-r-cnn) provided by PyTorch. To use it, you'll need to install `torch` and `torchvision`, if necessary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install torch torchvision" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you wanted to, you could download the pretrained model from the web and load it with `torchvision`. However, this model is also available via the [FiftyOne Model Zoo](https://docs.voxel51.com/user_guide/model_zoo/index.html), which makes our lives much easier!" 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import fiftyone.zoo as foz\n", "model = foz.load_zoo_model('faster-rcnn-resnet50-fpn-coco-torch')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll perform our analysis on the validation split of the [COCO dataset](https://cocodataset.org/#home), which is conveniently available for download via the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html#coco-2017).\n", "\n", "The snippet below will download the validation split and load it into FiftyOne." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading split 'validation' to '/Users/jacobmarks/fiftyone/coco-2017/validation' if necessary\n", "Found annotations at '/Users/jacobmarks/fiftyone/coco-2017/raw/instances_val2017.json'\n", "Images already downloaded\n", "Existing download of split 'validation' is sufficient\n", "Loading 'coco-2017' split 'validation'\n", " 100% |███████████████| 5000/5000 [14.7s elapsed, 0s remaining, 360.0 samples/s] \n", "Dataset 'evaluate-detections-tutorial' created\n" ] } ], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "dataset = foz.load_zoo_dataset(\n", " \"coco-2017\",\n", " split=\"validation\",\n", " dataset_name=\"evaluate-detections-tutorial\",\n", ")\n", "dataset.persistent = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's inspect the dataset to see what we downloaded:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name: evaluate-detections-tutorial\n", "Media type: image\n", "Num samples: 5000\n", "Persistent: True\n", "Tags: []\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: 
fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n" ] } ], "source": [ "# Print some information about the dataset\n", "print(dataset)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Print a ground truth detection\n", "sample = dataset.first()\n", "print(sample.ground_truth.detections[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that the ground truth detections are stored in the `ground_truth` field of the samples.\n", "\n", "Before we go further, let's launch the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html) and use the GUI to explore the dataset visually:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Evaluate Detections Dataset](images/evaluate_detections_dataset.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add predictions to dataset\n", "\n", "Now let's generate some predictions to analyze.\n", "\n", "Because we loaded the model from the FiftyOne Model Zoo, it is a FiftyOne model object, which means we can apply it directly \n", "to our dataset (or any subset thereof) for inference using the sample collection's `apply_model()` method.\n", "\n", "The code below performs inference with the Faster R-CNN model on a randomly chosen subset of 100 samples from the dataset and stores the resulting predictions in a `faster_rcnn` field of the samples. 
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "# Choose a random subset of 100 samples to add predictions to\n", "predictions_view = dataset.take(100, seed=51)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |█████████████████| 100/100 [1.4m elapsed, 0s remaining, 1.3 samples/s] \n" ] } ], "source": [ "predictions_view.apply_model(model, label_field=\"faster_rcnn\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's load `predictions_view` in the App to visualize the predictions that we added:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "session.view = predictions_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Predictions View](images/evaluate_detections_prediction_view.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyzing detections\n", "\n", "Let's analyze the raw predictions we've added to our dataset in more detail." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing bounding boxes\n", "\n", "Let's start by loading the full dataset in the App:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Resets the session; the entire dataset will now be shown\n", "session.view = None" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![All boxes](images/evaluate_detections_all_boxes.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Only the 100 samples in `predictions_view` have predictions in their `faster_rcnn` field, so most of the samples we see above do not have predicted boxes.\n", "\n", "If we want to recover our predictions view, we can do this programmatically via `session.view = predictions_view`, or we can use the [view bar in the App](https://voxel51.com/docs/fiftyone/user_guide/app.html#using-the-view-bar) to accomplish the same thing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use the view bar to create an `Exists(faster_rcnn, True)` stage\n", "# Now your view contains only the 100 samples with predictions in the `faster_rcnn` field\n", "session.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Exists Filter](images/evaluate_detections_exists_filter.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each field of the samples is shown as a togglable checkbox in the left sidebar, which you can use to control whether ground truth detections or predictions are rendered on the images.\n", "\n", "You can also click on an image to view the sample in more detail:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Click Image](images/evaluate_detections_click_image.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting samples of interest\n", "\n", "You can select images in the App by clicking on the checkbox when hovering over an image. 
Then, you can create a view that contains only those samples by clicking the orange checkmark with the number of selected samples in the top left corner of the sample grid and clicking `Only show selected samples`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Only show selected](images/evaluate_detections_only_show_selected.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's reset our session to show our `predictions_view`:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "session.view = predictions_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Confidence thresholding in the App\n", "\n", "From the App instance above, it looks like our detector is generating some spurious low-quality detections. Let's use the App to interactively filter the predictions by `confidence` to identify a reasonable confidence threshold for our model:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Click the down caret on the `faster_rcnn` field of Fields Sidebar\n", "# and apply a confidence threshold\n", "session.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Confidence Threshold](images/evaluate_detections_confidence_filter.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It looks like a confidence threshold of 0.75 is a good choice for our model, but we'll confirm that quantitatively later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Confidence thresholding in Python\n", "\n", "FiftyOne also provides the ability to [write expressions](https://voxel51.com/docs/fiftyone/user_guide/using_views.html#filtering) that match, filter, and sort detections based on their attributes. 
See [using DatasetViews](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) for full details.\n", "\n", "For example, we can programmatically generate a view that contains only detections whose `confidence` is greater than `0.75` as follows:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "from fiftyone import ViewField as F\n", "\n", "# Only contains detections with confidence > 0.75\n", "high_conf_view = predictions_view.filter_labels(\"faster_rcnn\", F(\"confidence\") > 0.75, only_matches=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note the `only_matches=False` argument. When filtering labels, any samples that no longer contain labels would normally be removed from the view. However, this is not desired when performing evaluations: dropping those samples would also drop their ground truth objects, which should count as false negatives, and it would skew your results between views. We set `only_matches=False` so that all samples will be retained, even if some no longer contain labels." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: evaluate-detections-tutorial\n", "Media type: image\n", "Num samples: 100\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " faster_rcnn: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", "View stages:\n", " 1. Take(size=100, seed=51)\n", " 2. 
FilterLabels(field='faster_rcnn', filter={'$gt': ['$$this.confidence', 0.75]}, only_matches=False, trajectories=False)\n" ] } ], "source": [ "# Print some information about the view\n", "print(high_conf_view)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Print a prediction from the view to verify that its confidence is > 0.75\n", "sample = high_conf_view.first()\n", "print(sample.faster_rcnn.detections[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's load our view in the App to view the predictions that we programmatically selected:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Load high confidence view in the App\n", "session.view = high_conf_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![High Confidence](images/evaluate_detections_high_conf_view.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Viewing object patches\n", "\n", "There are multiple situations where it can be useful to visualize each object separately. For example, a sample may contain dozens of objects overlapping one another, or you may want to look specifically for instances of one class of objects.\n", "\n", "In any case, the FiftyOne App provides a [patches view button](https://voxel51.com/docs/fiftyone/user_guide/app.html#viewing-object-patches) that allows you to take any `Detections` field in your dataset and visualize each object as an individual patch in the image grid. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "session.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Patches View](images/evaluate_detections_gt_patches.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate detections\n", "\n", "Now that we have samples with ground truth and predicted objects, let's use FiftyOne to evaluate the quality of the detections.\n", "\n", "FiftyOne provides a powerful [evaluation API](https://voxel51.com/docs/fiftyone/user_guide/evaluation.html) that contains a collection of methods for performing evaluation of model predictions. Since we're working with object detections here, we'll use [detection evaluation](https://voxel51.com/docs/fiftyone/user_guide/evaluation.html#detections)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Running evaluation\n", "\n", "We can run evaluation on our samples via [evaluate_detections()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.evaluate_detections). Note that this method is available on both the `Dataset` and `DatasetView` classes, which means that we can run evaluation on our `high_conf_view` to assess the quality of only the high confidence predictions in our dataset.\n", "\n", "By default, this method will use the [COCO evaluation protocol](https://cocodataset.org/#detection-eval), plus some extra goodies that we will use later." 
] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating detections...\n", " 100% |█████████████████| 100/100 [1.8s elapsed, 0s remaining, 53.7 samples/s] \n", "Performing IoU sweep...\n", " 100% |█████████████████| 100/100 [882.6ms elapsed, 0s remaining, 113.3 samples/s] \n" ] } ], "source": [ "# Evaluate the predictions in the `faster_rcnn` field of our `high_conf_view`\n", "# with respect to the objects in the `ground_truth` field\n", "results = high_conf_view.evaluate_detections(\n", " \"faster_rcnn\",\n", " gt_field=\"ground_truth\",\n", " eval_key=\"eval\",\n", " compute_mAP=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Aggregate results\n", "\n", "The `results` object returned by the evaluation routine provides a number of convenient methods for analyzing our predictions.\n", "\n", "For example, let's print a classification report for the top-10 most common classes in the dataset:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " person 0.89 0.80 0.84 263\n", " car 0.72 0.56 0.63 55\n", " chair 0.53 0.23 0.32 35\n", " book 1.00 0.30 0.47 33\n", " bottle 0.60 0.67 0.63 9\n", " cup 0.93 0.81 0.87 16\n", " dining table 0.50 0.62 0.55 13\n", "traffic light 0.50 0.46 0.48 13\n", " bowl 0.71 0.38 0.50 13\n", " handbag 0.50 0.18 0.26 17\n", "\n", " micro avg 0.81 0.64 0.72 467\n", " macro avg 0.69 0.50 0.56 467\n", " weighted avg 0.81 0.64 0.70 467\n", "\n" ] } ], "source": [ "# Get the 10 most common classes in the dataset\n", "counts = dataset.count_values(\"ground_truth.detections.label\")\n", "classes_top10 = sorted(counts, key=counts.get, reverse=True)[:10]\n", "\n", "# Print a classification report for the top-10 classes\n", "results.print_report(classes=classes_top10)" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "We can also compute the mean average-precision (mAP) of our detector:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.3519380509318074\n" ] } ], "source": [ "print(results.mAP())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since [evaluate_detections()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.evaluate_detections) uses the official [COCO evaluation protocol](https://cocodataset.org/#detection-eval), this mAP value will match what `pycocotools` would report.\n", "\n", "We can also view some precision-recall (PR) curves for specific classes of our model:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install ipywidgets to view the PR curves:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install 'ipywidgets>=8,<9'" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": {}, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b2bf1a19f4b748d39ec822b60fb9a5b3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "FigureWidget: precision-recall curves for 'person' (AP = 0.509) and 'car' (AP = 0.344)\n" ] } ], "source": [ "# Our detections have helpful evaluation data on them\n", "sample = high_conf_view.first()\n", "print(sample.faster_rcnn.detections[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These extra fields were added because we provided the ``eval_key`` parameter to [evaluate_detections()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.evaluate_detections). 
If we had omitted this parameter, then no information would have been recorded on our samples.\n", "\n", "Don't worry, if you forget what evaluations you've run, you can retrieve information about the evaluation later:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['eval']\n" ] } ], "source": [ "print(dataset.list_evaluations())" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " \"key\": \"eval\",\n", " \"version\": \"0.24.0\",\n", " \"timestamp\": \"2024-03-27T22:10:32.599000\",\n", " \"config\": {\n", " \"cls\": \"fiftyone.utils.eval.coco.COCOEvaluationConfig\",\n", " \"type\": \"detection\",\n", " \"method\": \"coco\",\n", " \"pred_field\": \"faster_rcnn\",\n", " \"gt_field\": \"ground_truth\",\n", " \"iou\": 0.5,\n", " \"classwise\": true,\n", " \"iscrowd\": \"iscrowd\",\n", " \"use_masks\": false,\n", " \"use_boxes\": false,\n", " \"tolerance\": null,\n", " \"compute_mAP\": true,\n", " \"iou_threshs\": [\n", " 0.5,\n", " 0.55,\n", " 0.6,\n", " 0.65,\n", " 0.7,\n", " 0.75,\n", " 0.8,\n", " 0.85,\n", " 0.9,\n", " 0.95\n", " ],\n", " \"max_preds\": 100,\n", " \"error_level\": 1\n", " }\n", "}\n" ] } ], "source": [ "print(dataset.get_evaluation_info(\"eval\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can even load the view on which you ran an evaluation by calling the [load_evaluation_view()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.collections.html?highlight=load_evaluation_view#fiftyone.core.collections.SampleCollection.load_evaluation_view) method on the parent dataset:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: evaluate-detections-tutorial\n", "Media type: image\n", "Num samples: 100\n", "Sample fields:\n", " id: 
fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " faster_rcnn: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " eval_tp: fiftyone.core.fields.IntField\n", " eval_fp: fiftyone.core.fields.IntField\n", " eval_fn: fiftyone.core.fields.IntField\n", "View stages:\n", " 1. Take(size=100, seed=51)\n", " 2. FilterLabels(field='faster_rcnn', filter={'$gt': ['$$this.confidence', 0.75]}, only_matches=False, trajectories=False)\n" ] } ], "source": [ "# Load the view on which we ran the `eval` evaluation\n", "eval_view = dataset.load_evaluation_view(\"eval\")\n", "print(eval_view)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, you can delete an evaluation from a dataset, including any information that was added to your samples, by calling [delete_evaluation()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.delete_evaluation)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Evaluation views\n", "\n", "Now that we have a sense for the aggregate performance of our model, let's dive into sample-level analysis by creating an [evaluation view](https://voxel51.com/docs/fiftyone/user_guide/app.html#viewing-evaluation-patches).\n", "\n", "Any evaluation that you stored on your dataset can be used to generate an [evaluation view](https://voxel51.com/docs/fiftyone/user_guide/app.html#viewing-evaluation-patches), which is a patches view that contains one sample for every true positive, false positive, and false negative in your dataset.\n", "Through this view, you can quickly filter and sort evaluated detections by their type (TP/FP/FN), their evaluated IoU, and whether they are matched to a crowd object.\n", "\n", "These evaluation views can be created through Python or directly in the App as shown below." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: evaluate-detections-tutorial\n", "Media type: image\n", "Num patches: 37747\n", "Patch fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " sample_id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " faster_rcnn: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " crowd: fiftyone.core.fields.BooleanField\n", " type: fiftyone.core.fields.StringField\n", " iou: fiftyone.core.fields.FloatField\n", "View stages:\n", " 1. 
ToEvaluationPatches(eval_key='eval', config=None)\n" ] } ], "source": [ "eval_patches = dataset.to_evaluation_patches(\"eval\")\n", "print(eval_patches)" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "session.view = high_conf_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Evaluation Patches](images/evaluate_detections_eval_patches.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's use this evaluation view to find individual false positive detections with a confidence of 0.85 or greater." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![FP Confidence Filter](images/evaluate_detections_high_conf_fp.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the best-performing samples\n", "\n", "To dig in further, let's create a view that sorts by `eval_tp` so we can see the best-performing cases of our model (i.e., the samples with the most correct predictions):" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# Show samples with most true positives\n", "session.view = high_conf_view.sort_by(\"eval_tp\", reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Best Samples](images/evaluate_detections_eval_tp.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the worst-performing samples\n", "\n", "Similarly, we can sort by the `eval_fp` field to see the worst-performing cases of our model (i.e., the samples with the most false positive predictions):" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "# Show samples with most false positives\n", "session.view = high_conf_view.sort_by(\"eval_fp\", reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Worst Samples](images/evaluate_detections_eval_fp.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 
Filtering by bounding box area\n", "\n", "[Dataset views](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) are extremely powerful. For example, let's look at how our model performed on small objects by creating a view that contains only predictions whose bounding box area is less than `32^2` pixels:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "# Compute metadata so we can reference image height/width in our view\n", "dataset.compute_metadata()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "#\n", "# Create an expression that will match objects whose bounding boxes have\n", "# area less than 32^2 pixels\n", "#\n", "# Bounding box format is [top-left-x, top-left-y, width, height]\n", "# with relative coordinates in [0, 1], so we multiply by image\n", "# dimensions to get pixel area\n", "#\n", "bbox_area = (\n", " F(\"$metadata.width\") * F(\"bounding_box\")[2] *\n", " F(\"$metadata.height\") * F(\"bounding_box\")[3]\n", ")\n", "small_boxes = bbox_area < 32 ** 2\n", "\n", "# Create a view that contains only small (and high confidence) predictions\n", "small_boxes_view = high_conf_view.filter_labels(\"faster_rcnn\", small_boxes)\n", "\n", "session.view = small_boxes_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Small Bboxes](images/evaluate_detections_small_bboxes.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can always re-run evaluation to see how our detector fares on only small boxes:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating detections...\n", " 100% |███████████████████| 34/34 [339.1ms elapsed, 0s remaining, 100.3 samples/s] \n" ] } ], "source": [ "# Create a view that contains only small GT and predicted boxes\n", "small_boxes_eval_view = (\n", " high_conf_view\n", " .filter_labels(\"ground_truth\", 
small_boxes, only_matches=False)\n", " .filter_labels(\"faster_rcnn\", small_boxes, only_matches=False)\n", ")\n", "\n", "# Run evaluation\n", "small_boxes_results = small_boxes_eval_view.evaluate_detections(\n", " \"faster_rcnn\",\n", " gt_field=\"ground_truth\",\n", ")" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " person 0.66 0.44 0.53 80\n", " car 0.69 0.43 0.53 21\n", " chair 0.00 0.00 0.00 4\n", " book 0.00 0.00 0.00 20\n", " bottle 0.25 1.00 0.40 1\n", " cup 0.00 0.00 0.00 1\n", " dining table 0.00 0.00 0.00 2\n", "traffic light 0.56 0.33 0.42 15\n", " handbag 0.00 0.00 0.00 7\n", " boat 0.00 0.00 0.00 1\n", "\n", " micro avg 0.61 0.33 0.43 152\n", " macro avg 0.22 0.22 0.19 152\n", " weighted avg 0.50 0.33 0.39 152\n", "\n" ] } ], "source": [ "# Get the 10 most common small object classes\n", "small_counts = small_boxes_eval_view.count_values(\"ground_truth.detections.label\")\n", "classes_top10_small = sorted(small_counts, key=small_counts.get, reverse=True)[:10]\n", "\n", "# Print a classification report for the top-10 small object classes\n", "small_boxes_results.print_report(classes=classes_top10_small)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Viewing detections in a crowd\n", "\n", "If you're familiar with the [COCO data format](https://cocodataset.org/#format-data), you'll know that the ground truth annotations have an `iscrowd = 0/1` attribute that indicates whether a box contains multiple instances of the same object."
] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# View the `iscrowd` attribute on a ground truth object\n", "sample = dataset.first()\n", "print(sample.ground_truth.detections[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a view that contains only samples with at least one detection for which `iscrowd` is 1:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [], "source": [ "# Create a view that contains only samples for which at least one detection has \n", "# its iscrowd attribute set to 1\n", "crowded_images_view = high_conf_view.match(\n", " F(\"ground_truth.detections\").filter(F(\"iscrowd\") == 1).length() > 0\n", ")\n", "\n", "session.view = crowded_images_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Crowd Boxes](images/evaluate_detections_crowded.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### More complex insights\n", "\n", "Let's combine our previous operations to form more complex queries that provide deeper insight into the quality of our detections.\n", "\n", "For example, let's sort our view of crowded images from the previous section in decreasing order of false positive counts, so that we can see samples that have many (allegedly) spurious predictions in images that are known to contain crowds of objects:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "session.view = crowded_images_view.sort_by(\"eval_fp\", reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Crowd FPs](images/evaluate_detections_crowded_fp.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's compare the above view to another view that just sorts by false positive count, regardless of whether the image is crowded:" ] }, { "cell_type": "code", "execution_count": 43, 
"metadata": { "scrolled": false }, "outputs": [], "source": [ "session.view = high_conf_view.sort_by(\"eval_fp\", reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![High Conf Eval FPs Sort](images/evaluate_detections_high_conf_eval_fp.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This was one of the first images in the view. As we can see, while the evaluation reports 7 false positives, all of the model's predictions seem accurate. The ground truth labels simply lumped a bunch of orange slices together into one box." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**See anything interesting?**\n", "\n", "What you find will likely be different because a random subset of samples was chosen. In our case, we find missing ground truth boxes for two of the laptop keyboards, a bottle, and perhaps even a cell phone. The model did not confidently predict many of the boxes in this image, but at a high level, an example like this makes us consider the consequences of including complex or dense images in datasets. It will likely mean incorrect or incomplete ground truth annotations if the annotators are not diligent! And that ultimately leads to confused models and misinformed evaluations.\n", "\n", "This conclusion would have been nearly impossible to reach without visually inspecting the individual samples in the dataset according to the variety of criteria that we considered in this tutorial." ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "session.freeze() # screenshot the active App for sharing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tagging and next steps\n", "\n", "In practice, the next step is to take action on the issues that we identified above. A natural first step is to *tag* the issues so they can be retrieved and dealt with later. 
FiftyOne provides support for [tagging samples and labels](https://voxel51.com/docs/fiftyone/user_guide/app.html#tags-and-tagging), both programmatically and via the App.\n", "\n", "In your App instance, try tagging the predictions with missing ground truth detections. You can do this by clicking on the boxes of the predictions of interest and using the tagging element in the top-right corner to assign a `possibly-missing` tag.\n", "\n", "Alternatively, we can programmatically tag a batch of labels by creating a view that contains the objects of interest and then applying [tag_labels()](https://voxel51.com/docs/fiftyone/user_guide/fiftyone.core.collections.html?highlight=tag_labels#fiftyone.core.collections.SampleCollection.tag_labels):" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "# Tag all highly confident false positives as \"possibly-missing\"\n", "(\n", " high_conf_view\n", " .filter_labels(\"faster_rcnn\", F(\"eval\") == \"fp\")\n", " .select_fields(\"faster_rcnn\")\n", " .tag_labels(\"possibly-missing\")\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These tagged labels could then be sent off to our annotation provider of choice for review and addition to the ground truth labels. 
FiftyOne currently offers integrations for [Scale AI](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.scale.html), [Labelbox](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.labelbox.html), and [CVAT](https://voxel51.com/docs/fiftyone/api/fiftyone.types.dataset_types.html?highlight=cvat#fiftyone.types.dataset_types.CVATImageDataset).\n", "\n", "For example, the snippet below exports the tagged labels and their source media to disk in CVAT format:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export all labels with the `possibly-missing` tag in CVAT format\n", "(\n", " dataset\n", " .select_labels(tags=[\"possibly-missing\"])\n", " .export(\"/path/for/export\", fo.types.CVATImageDataset)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "In this tutorial, we covered loading a dataset into FiftyOne and analyzing the performance of an out-of-the-box object detection model on the dataset.\n", "\n", "**So, what's the takeaway?**\n", "\n", "Aggregate evaluation results for an object detector are important, but they alone don't tell the whole story of a model's performance. 
It's critical to study the failure modes of your model so you can take the right actions to address them.\n", "\n", "In this tutorial, we covered two types of analysis:\n", "\n", "- Analyzing the performance of your detector across different strata, such as high-confidence predictions, small objects, and crowded scenes\n", "- Inspecting the hardest samples in your dataset to diagnose the underlying issue, whether it be your detector or the ground truth annotations" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" } }, "nbformat": 4, "nbformat_minor": 4 }