{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Removing Duplicate Objects\n", "\n", "This recipe demonstrates a simple workflow for finding and removing duplicate objects in your FiftyOne datasets using [intersection over union (IoU)](https://en.wikipedia.org/wiki/Jaccard_index).\n", "\n", "Specifically, it covers:\n", "\n", "- Using the [compute_max_ious()](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.iou.html#fiftyone.utils.iou.compute_max_ious) utility to compute overlap between spatial objects\n", "- Using the [App's tagging UI](https://voxel51.com/docs/fiftyone/user_guide/app.html#tags-and-tagging) to review and delete duplicate labels\n", "- Using FiftyOne's [CVAT integration](https://voxel51.com/docs/fiftyone/integrations/cvat.html) to edit duplicate labels\n", "- Using the [find_duplicates()](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.iou.html#fiftyone.utils.iou.find_duplicates) utility to automatically detect duplicate objects\n", "\n", "Also, check out [our blog post](https://towardsdatascience.com/iou-a-better-detection-evaluation-metric-45a511185be1) for more information about using IoU to evaluate your object detection models." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "If you haven't already, install FiftyOne:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load a dataset\n", "\n", "In this recipe, we'll work with the validation split of the [COCO dataset](https://cocodataset.org/#home), which is conveniently available for download via the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html#coco-2017).\n", "\n", "The snippet below downloads and loads a subset of the validation split into FiftyOne:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading split 'validation' to '/Users/Brian/fiftyone/coco-2017/validation' if necessary\n", "Found annotations at '/Users/Brian/fiftyone/coco-2017/raw/instances_val2017.json'\n", "Sufficient images already downloaded\n", "Existing download of split 'validation' is sufficient\n", "Loading 'coco-2017' split 'validation'\n", " 100% |███████████████| 1000/1000 [4.9s elapsed, 0s remaining, 216.7 samples/s] \n", "Dataset 'coco-2017-validation-1000' created\n" ] } ], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "dataset = foz.load_zoo_dataset(\"coco-2017\", split=\"validation\", max_samples=1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's print the dataset to see what we downloaded:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name: coco-2017-validation-1000\n", "Media type: image\n", "Num samples: 1000\n", "Persistent: False\n", "Tags: ['validation']\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: 
fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n" ] } ], "source": [ "print(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Finding duplicate objects\n", "\n", "Now let's use the [compute_max_ious()](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.iou.html#fiftyone.utils.iou.compute_max_ious) utility to compute the maximum IoU between each object in the `ground_truth` field and any other object of the same class (`classwise=True`) within the same image.\n", "\n", "The max IoU will be stored in a `max_iou` attribute of each object. The idea is that a duplicate object will necessarily have a high [IoU](https://en.wikipedia.org/wiki/Jaccard_index) with another object." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |███████████████| 1000/1000 [3.2s elapsed, 0s remaining, 348.2 samples/s] \n", "Max IoU range: (0.000000, 0.951640)\n" ] } ], "source": [ "import fiftyone.utils.iou as foui\n", "\n", "foui.compute_max_ious(dataset, \"ground_truth\", iou_attr=\"max_iou\", classwise=True)\n", "print(\"Max IoU range: (%f, %f)\" % dataset.bounds(\"ground_truth.detections.max_iou\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that [compute_max_ious()](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.iou.html#fiftyone.utils.iou.compute_max_ious) provides an optional `other_field` parameter if you would like to compute IoUs between objects in different fields instead.\n", "\n", "In any case, let's [create a view](https://voxel51.com/docs/fiftyone/user_guide/using_views.html#filtering-sample-contents) that contains only labels with a max IoU > 0.75:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, 
"outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: coco-2017-validation-1000\n", "Media type: image\n", "Num samples: 7\n", "Tags: ['validation']\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", "View stages:\n", " 1. FilterLabels(field='ground_truth', filter={'$gt': ['$$this.max_iou', 0.75]}, only_matches=True, trajectories=False)\n" ] } ], "source": [ "from fiftyone import ViewField as F\n", "\n", "# Retrieve detections that overlap above a chosen threshold\n", "dups_view = dataset.filter_labels(\"ground_truth\", F(\"max_iou\") > 0.75)\n", "print(dups_view)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and load it in the App:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "