{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Finding Classification Mistakes with FiftyOne\n", "\n", "Annotation mistakes create an artificial ceiling on the performance of your models. However, finding these mistakes by hand is at least as arduous as the original annotation work! Enter FiftyOne.\n", "\n", "In this tutorial, we explore how FiftyOne can be used to help you find mistakes in your classification annotations. To find mistakes in detection datasets, check out [this tutorial](https://voxel51.com/docs/fiftyone/tutorials/detection_mistakes.html).\n", "\n", "We'll cover the following concepts:\n", "\n", "- Loading your existing dataset [into FiftyOne](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html)\n", "- [Adding model predictions](https://voxel51.com/docs/fiftyone/recipes/adding_classifications.html) to your dataset\n", "- Computing insights into your dataset relating to [possible label mistakes](https://voxel51.com/docs/fiftyone/user_guide/brain.html#label-mistakes)\n", "- Visualizing mistakes in the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html)\n", "\n", "**So, what's the takeaway?**\n", "\n", "FiftyOne can help you find and correct label mistakes in your datasets, enabling you to curate higher-quality datasets and, ultimately, train better models!" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "If you haven't already, install FiftyOne:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll also need `torch` and `torchvision` installed:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "!pip install torch torchvision" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we'll use a pretrained CIFAR-10 PyTorch model (a ResNet-50) from the web:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download the model code\n", "!git clone --depth 1 --branch v2.1 https://github.com/huyvnphan/PyTorch_CIFAR10.git\n", "\n", "# Download the pretrained model weights (90MB)\n", "!eta gdrive download --public \\\n", "    1dGfpeFK_QG0kV-U6QDHMX2EOGXPqaNzu \\\n", "    PyTorch_CIFAR10/cifar10_models/state_dicts/resnet50.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Manipulating the data\n", "\n", "For this walkthrough, we will artificially introduce mistakes into the labels of an existing dataset. 
Of course, in your normal workflow, you would not add labeling mistakes; this is only for the sake of the walkthrough.\n", "\n", "The code block below loads the test split of the [CIFAR-10 dataset](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html#cifar-10) into FiftyOne and randomly breaks 10% (1000 samples) of the labels:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import random\n", "\n", "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "# Load the CIFAR-10 test split\n", "# Downloads the dataset from the web if necessary\n", "dataset = foz.load_zoo_dataset(\"cifar10\", split=\"test\")\n", "\n", "# Get the CIFAR-10 classes list\n", "classes = dataset.default_classes\n", "\n", "# Artificially corrupt 10% of the labels\n", "_num_mistakes = int(0.1 * len(dataset))\n", "for sample in dataset.take(_num_mistakes):\n", "    mistake = random.randint(0, 9)\n", "    while classes[mistake] == sample.ground_truth.label:\n", "        mistake = random.randint(0, 9)\n", "\n", "    sample.tags.append(\"mistake\")\n", "    sample.ground_truth = fo.Classification(label=classes[mistake])\n", "    sample.save()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's print some information about the dataset to verify the operation that we\n", "performed:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Name: cifar10-test\n", "Media type: image\n", "Num samples: 10000\n", "Persistent: False\n", "Tags: ['mistake', 'test']\n", "Sample fields:\n", "    filepath: fiftyone.core.fields.StringField\n", "    tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", "    metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", "    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n" ] } ], "source": [ "# Verify that the `mistake` tag is now in the 
dataset's schema\n", "print(dataset)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1000 ground truth labels are now mistakes\n" ] } ], "source": [ "# Count the number of samples with the `mistake` tag\n", "num_mistakes = len(dataset.match_tags(\"mistake\"))\n", "print(\"%d ground truth labels are now mistakes\" % num_mistakes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add predictions to the dataset\n", "\n", "Using an off-the-shelf model, let's now add predictions to the dataset.\n", "FiftyOne compares these predictions against the ground truth labels to\n", "estimate which labels are likely mistakes.\n", "\n", "The code block below adds model predictions to another randomly chosen 10%\n", "(1000 samples) of the dataset:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |███████████████████| 50/50 [11.0s elapsed, 0s remaining, 4.7 samples/s] \n" ] } ], "source": [ "import sys\n", "\n", "import numpy as np\n", "import torch\n", "import torchvision\n", "from torch.utils.data import DataLoader\n", "\n", "import fiftyone.utils.torch as fout\n", "\n", "sys.path.insert(1, \"PyTorch_CIFAR10\")\n", "from cifar10_models import resnet50\n", "\n", "\n", "def make_cifar10_data_loader(image_paths, sample_ids, batch_size):\n", "    mean = [0.4914, 0.4822, 0.4465]\n", "    std = [0.2023, 0.1994, 0.2010]\n", "    transforms = torchvision.transforms.Compose(\n", "        [\n", "            torchvision.transforms.ToTensor(),\n", "            torchvision.transforms.Normalize(mean, std),\n", "        ]\n", "    )\n", "    dataset = fout.TorchImageDataset(\n", "        image_paths, sample_ids=sample_ids, transform=transforms\n", "    )\n", "    return DataLoader(dataset, batch_size=batch_size, num_workers=4)\n", "\n", "\n", "def predict(model, imgs):\n", "    logits = model(imgs).detach().cpu().numpy()\n", "    predictions = np.argmax(logits, axis=1)\n", "    odds = 
np.exp(logits)\n", "    confidences = np.max(odds, axis=1) / np.sum(odds, axis=1)\n", "    return predictions, confidences, logits\n", "\n", "\n", "#\n", "# Load a model\n", "#\n", "# Model performance numbers are available at:\n", "# https://github.com/huyvnphan/PyTorch_CIFAR10\n", "#\n", "\n", "model = resnet50(pretrained=True)\n", "model.eval()  # inference mode: use the stored BatchNorm statistics\n", "model_name = \"resnet50\"\n", "\n", "#\n", "# Extract a few images to process\n", "# (some of these will have been manipulated above)\n", "#\n", "\n", "num_samples = 1000\n", "batch_size = 20\n", "view = dataset.take(num_samples)\n", "image_paths, sample_ids = zip(*[(s.filepath, s.id) for s in view.iter_samples()])\n", "data_loader = make_cifar10_data_loader(image_paths, sample_ids, batch_size)\n", "\n", "#\n", "# Perform prediction and store results in dataset\n", "#\n", "\n", "with fo.ProgressBar() as pb:\n", "    for imgs, sample_ids in pb(data_loader):\n", "        predictions, _, logits_ = predict(model, imgs)\n", "\n", "        # Add predictions to your FiftyOne dataset\n", "        for sample_id, prediction, logits in zip(sample_ids, predictions, logits_):\n", "            sample = dataset[sample_id]\n", "            sample.tags.append(\"processed\")\n", "            sample[model_name] = fo.Classification(\n", "                label=classes[prediction], logits=logits,\n", "            )\n", "            sample.save()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's print some information about the predictions that were generated and how\n", "many of them correspond to samples whose ground truth labels were corrupted:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Added predictions to 1000 samples\n", "86 of these samples have label mistakes\n" ] } ], "source": [ "# Count the number of samples with the `processed` tag\n", "num_processed = len(dataset.match_tags(\"processed\"))\n", "\n", "# Count the number of samples with both `processed` and `mistake` tags\n", "num_corrupted = 
len(dataset.match_tags(\"processed\").match_tags(\"mistake\"))\n", "\n", "print(\"Added predictions to %d samples\" % num_processed)\n", "print(\"%d of these samples have label mistakes\" % num_corrupted)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Find the mistakes\n", "\n", "Now we can run a method from FiftyOne that estimates the mistakenness of the\n", "ground truth labels of the samples for which we generated predictions:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Computing mistakenness...\n", " 100% |███████████████| 1000/1000 [2.4s elapsed, 0s remaining, 446.1 samples/s] \n", "Mistakenness computation complete\n" ] } ], "source": [ "import fiftyone.brain as fob\n", "\n", "# Get samples for which we added predictions\n", "h_view = dataset.match_tags(\"processed\")\n", "\n", "# Compute mistakenness\n", "fob.compute_mistakenness(h_view, model_name, label_field=\"ground_truth\", use_logits=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above method added a `mistakenness` field to all samples for which we added\n", "predictions. We can easily sort the samples by likelihood of mistake from code:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: cifar10-test\n", "Media type: image\n", "Num samples: 1000\n", "Tags: ['mistake', 'processed', 'test']\n", "Sample fields:\n", "    filepath: fiftyone.core.fields.StringField\n", "    tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", "    metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", "    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "    resnet50: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "    mistakenness: fiftyone.core.fields.FloatField\n", "View stages:\n", "    1. 
MatchTags(tags=['processed'])\n", "    2. SortBy(field_or_expr='mistakenness', reverse=True)\n" ] } ], "source": [ "# Sort by likelihood of mistake (most likely first)\n", "mistake_view = (dataset\n", "    .match_tags(\"processed\")\n", "    .sort_by(\"mistakenness\", reverse=True)\n", ")\n", "\n", "# Print some information about the view\n", "print(mistake_view)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[,\n", " 'resnet50': ,\n", " 'mistakenness': 0.9778614850560818,\n", "}>, ,\n", " 'resnet50': ,\n", " 'mistakenness': 0.967886808991774,\n", "}>, ,\n", " 'resnet50': ,\n", " 'mistakenness': 0.9653186284617471,\n", "}>]\n" ] } ], "source": [ "# Inspect the first few samples\n", "print(mistake_view.head())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's open the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html) to visually inspect the results:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session = fo.launch_app(dataset)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Show only the samples that were processed\n", "session.view = dataset.match_tags(\"processed\")" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Show only the samples for which we added label mistakes\n", "session.view = dataset.match_tags(\"mistake\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Show the samples we processed, ranked by mistakenness\n", "session.view = mistake_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In a real-world scenario, we would then take the ground truth classifications that are likely mistakes and send them to our annotation provider of choice for review. FiftyOne currently offers integrations for both [Labelbox](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.labelbox.html) and [Scale](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.scale.html)." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "session.freeze() # screenshot the active App for sharing" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 4 }