{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Exploring Image Hardness\n", "\n", "Say you have a repository of data with millions of unlabeled images in it. You have labeled a subset of the data and trained an image classification model on it, but it's not performing as well as you hoped. **How do you decide \n", "which new samples to annotate and add to your training set?**\n", "\n", "You could just randomly select new samples to annotate, but there is a better way. Hard sample mining is a tried-and-true method to distill a large amount of raw unlabeled data into smaller, high-quality labeled datasets.\n", "\n", "*A hard sample is one whose label is difficult for your machine learning (ML) model to predict correctly.* \n", "\n", "In an image classification dataset, a hard sample could be anything from a cat that looks like a dog to a blurry or low-resolution image. If you expect your model to perform well on these hard samples, then you may need to \"mine\" more examples of them to add to your training dataset. Exposing your model to more hard samples during training will allow it to perform better on those types of samples later on.\n", "\n", "\n", "Hard samples are useful for more than just training data; they are also necessary to include in your test set. If your test data is composed primarily of easy samples, then your [performance will soon reach an upper bound, causing progress to stagnate](https://www.sciencedirect.com/science/article/abs/pii/S0925231219316984). 
Adding hard samples to a test set will give you a better idea of how models perform in harder edge cases and can provide more insight into which models are more reliable.\n", "\n", "## Overview:\n", "\n", "In this walkthrough, we explore how [FiftyOne’s image hardness tool](https://voxel51.com/docs/fiftyone/user_guide/brain.html) can be used to analyze and improve datasets.\n", "\n", "We’ll cover the following concepts:\n", "\n", "* Loading a dataset from the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/index.html)\n", "\n", "* Applying [FiftyOne’s sample hardness algorithm](https://voxel51.com/docs/fiftyone/user_guide/brain.html) to your dataset\n", "\n", "* Launching the [FiftyOne App and visualizing/exploring your data](https://voxel51.com/docs/fiftyone/user_guide/app.html)\n", "\n", "* Identifying the hardest samples in your dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install torch torchvision" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load your data\n", "\n", "For this example, we will be using the test split of the image classification dataset, [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html). This dataset contains 10,000 test images labeled across 10 different classes. This is one of the dozens of datasets in the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/index.html), so we can easily load it up." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import fiftyone as fo" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import fiftyone.zoo as foz" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Split 'test' already downloaded\n", "Loading 'cifar10' split 'test'\n", " 100% |████████████████████████████████████████████████████████████| 10000/10000 [9.2s elapsed, 0s remaining, 1.1K samples/s] \n", "Dataset 'cifar10-test' created\n" ] } ], "source": [ "dataset = foz.load_zoo_dataset(\"cifar10\", split=\"test\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: You can also [load your own dataset into FiftyOne](https://voxel51.com/docs/fiftyone/user_guide/dataset_creation/index.html). It supports labels for many computer vision tasks including [classification, detection, segmentation, keypoints, and more](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html#labels)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Add logits\n", "In order to calculate hardness on images in FiftyOne, you first need to use a model to compute logits for those images. 
You can use any model you want, but ideally, it would be one trained on similar data and on the same task that you will be using these new images for.\n", "\n", "In this example, we will be using code from the [PyTorch CIFAR-10 repository](https://github.com/huyvnphan/PyTorch_CIFAR10/tree/v2.1), namely the pretrained ResNet50 classifier.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Download the software\n", "!git clone --depth 1 --branch v2.1 https://github.com/huyvnphan/PyTorch_CIFAR10.git\n", "\n", "# Download the pretrained model (90MB)\n", "!eta gdrive download --public \\\n", "    1dGfpeFK_QG0kV-U6QDHMX2EOGXPqaNzu \\\n", "    PyTorch_CIFAR10/cifar10_models/state_dicts/resnet50.pt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "You can easily add a classification field with [logits to your samples in a FiftyOne dataset](https://voxel51.com/docs/fiftyone/recipes/model_inference.html?highlight=logits)." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "import sys\n", "\n", "import numpy as np\n", "import torch\n", "import torchvision\n", "from torch.utils.data import DataLoader\n", "\n", "import fiftyone.utils.torch as fout\n", "\n", "sys.path.insert(1, \"PyTorch_CIFAR10\")\n", "from cifar10_models import resnet50\n", "\n", "\n", "# Set up a data loader in accordance with PyTorch CIFAR10\n", "def make_cifar10_data_loader(image_paths, sample_ids, batch_size):\n", "    mean = [0.4914, 0.4822, 0.4465]\n", "    std = [0.2023, 0.1994, 0.2010]\n", "    transforms = torchvision.transforms.Compose(\n", "        [\n", "            torchvision.transforms.ToTensor(),\n", "            torchvision.transforms.Normalize(mean, std),\n", "        ]\n", "    )\n", "    dataset = fout.TorchImageDataset(\n", "        image_paths, sample_ids=sample_ids, transform=transforms\n", "    )\n", "    return DataLoader(dataset, batch_size=batch_size, num_workers=4)\n", "\n", "\n", "# Run inference on the model to generate predictions and logits\n", 
"def predict(model, imgs):\n", "    logits = model(imgs).detach().cpu().numpy()\n", "    predictions = np.argmax(logits, axis=1)\n", "    # Softmax confidence; subtracting the row max avoids overflow in exp()\n", "    odds = np.exp(logits - np.max(logits, axis=1, keepdims=True))\n", "    confidences = np.max(odds, axis=1) / np.sum(odds, axis=1)\n", "    return predictions, confidences, logits" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#\n", "# Load a pretrained model\n", "#\n", "# Model performance numbers are available at:\n", "# https://github.com/huyvnphan/PyTorch_CIFAR10\n", "#\n", "\n", "model = resnet50(pretrained=True)\n", "model_name = \"resnet50\"\n", "\n", "view = dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: If you want this notebook to run faster, select a subset of samples from the dataset." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Uncomment the lines below to select a random subset of 1000 samples\n", "\n", "# num_samples = 1000\n", "# view = dataset.take(num_samples, seed=51)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "batch_size = 20\n", "\n", "# Get the list of classes from the dataset information\n", "classes = dataset.info[\"classes\"]\n", "\n", "image_paths, sample_ids = zip(\n", "    *[(s.filepath, s.id) for s in view.select_fields([\"filepath\", \"id\"])]\n", ")\n", "\n", "# Create a PyTorch data loader\n", "data_loader = make_cifar10_data_loader(image_paths, sample_ids, batch_size)\n", "\n", "#\n", "# Perform prediction and store results in dataset\n", "#\n", "\n", "for imgs, sample_ids in data_loader:\n", "    predictions, confidences, logits_ = predict(model, imgs)\n", "\n", "    # Add predictions to your FiftyOne dataset\n", "    for sample_id, prediction, confidence, logits in zip(sample_ids, predictions, confidences, logits_):\n", "        sample = dataset[sample_id]\n", "        sample.tags.append(\"processed\")\n", "        sample[model_name] = fo.Classification(\n", "            label=classes[prediction], logits=logits, confidence=confidence\n", "        )\n", "        sample.save()" ] }, 
{ "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "processed_view = dataset.match_tags([\"processed\"])" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Dataset: cifar10-test\n", "Media type: image\n", "Num samples: 10000\n", "Tags: ['processed', 'test']\n", "Sample fields:\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", " resnet50: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "View stages:\n", " 1. MatchTags(tag=['processed'])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "processed_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html) to take a look at this dataset." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session = fo.launch_app(view=processed_view)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compute hardness\n", "\n", "[The FiftyOne Brain](https://voxel51.com/docs/fiftyone/user_guide/brain.html) contains various useful methods that can provide insights into your data. You can compute the uniqueness of your data, identify the hardest samples, and find annotation mistakes. These are all different ways to generate scalar metrics on your dataset that let you better understand the quality of existing data and help you select high-quality new samples.\n", "\n", "Once you have loaded your dataset and added logits to your samples, you can calculate hardness in one line of code. The hardness algorithm is closed-source, but the basic idea is to leverage the relative uncertainty of the model's predictions to assign a scalar hardness value to each sample." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "import fiftyone.brain as fob" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Computing hardness...\n", " 100% |████████████████████████████████████████████████████████████| 10000/10000 [24.1s elapsed, 0s remaining, 409.4 samples/s] \n", "Hardness computation complete\n" ] } ], "source": [ "fob.compute_hardness(processed_view, label_field=model_name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explore and identify the hardest samples\n", "\n", "You can visualize your dataset and explore the samples with the highest and lowest hardness scores with the FiftyOne App." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.view = processed_view.sort_by(\"hardness\", reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While this example is using small images from CIFAR-10, FiftyOne also works with high-resolution images and videos.\n", "\n", "We can write some queries to dig a bit deeper into these hardness calculations and how they relate to other aspects of the data. For example, we can see the distribution of hardness on correct and incorrect predictions of the model separately." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "from fiftyone import ViewField as F" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: cifar10-test\n", "Media type: image\n", "Num samples: 8183\n", "Tags: ['processed', 'test']\n", "Sample fields:\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", " resnet50: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", " hardness: fiftyone.core.fields.FloatField\n", "View stages:\n", " 1. MatchTags(tag=['processed'])\n", " 2. 
Match(filter={'$expr': {'$eq': [...]}})\n", "Dataset: cifar10-test\n", "Media type: image\n", "Num samples: 1817\n", "Tags: ['processed', 'test']\n", "Sample fields:\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", " resnet50: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", " hardness: fiftyone.core.fields.FloatField\n", "View stages:\n", " 1. MatchTags(tag=['processed'])\n", " 2. Match(filter={'$expr': {'$ne': [...]}})\n" ] } ], "source": [ "# Correct Preds\n", "correct_view = processed_view.match(F(\"ground_truth.label\") == F(model_name+\".label\"))\n", "\n", "# Incorrect Preds\n", "incorrect_view = processed_view.match(F(\"ground_truth.label\") != F(model_name+\".label\"))\n", "\n", "print(correct_view)\n", "print(incorrect_view)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Incorrect predictions avg hardness: 0.898047\n", "Correct predictions avg hardness: 0.518320\n", "Total avg hardness: 0.587316\n" ] } ], "source": [ "print(\"Incorrect predictions avg hardness: %f\" % (incorrect_view.sum(\"hardness\")/incorrect_view.count()))\n", "print(\"Correct predictions avg hardness: %f\" % (correct_view.sum(\"hardness\")/correct_view.count()))\n", "print(\"Total avg hardness: %f\" % (processed_view.sum(\"hardness\")/processed_view.count()))" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "incorrect_view.clone_sample_field(\"hardness\", \"hardness_incorrect\")\n", "correct_view.clone_sample_field(\"hardness\", \"hardness_correct\")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "for 
sample in processed_view:\n", "    if sample.id in correct_view:\n", "        sample[\"prediction\"] = fo.Classification(label=\"correct\")\n", "    else:\n", "        sample[\"prediction\"] = fo.Classification(label=\"incorrect\")\n", "    sample.save()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.view = processed_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you might expect, the figure above shows that the distribution of hardness for correct predictions skews towards lower hardness values, while incorrect predictions are spread more evenly at high hardness values. This indicates that samples that the model predicts incorrectly tend to be harder samples, which suggests that adding harder samples to the training set is likely to improve model performance.\n", "\n", "We can also see how the hardness of samples is distributed across different classes." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Average classwise hardness\n", "\n", "cat: 0.694020\n", "dog: 0.624073\n", "deer: 0.591152\n", "bird: 0.584566\n", "truck: 0.581653\n", "frog: 0.581470\n", "airplane: 0.576446\n", "horse: 0.565170\n", "ship: 0.542408\n", "automobile: 0.532203\n" ] } ], "source": [ "cls_hardness = []\n", "\n", "for label in processed_view.distinct(\"ground_truth.label\"):\n", "    label_view = processed_view.match(F(\"ground_truth.label\")==label)\n", "    avg_hardness = label_view.sum(\"hardness\")/label_view.count()\n", "\n", "    num_correct = correct_view.match(F(\"ground_truth.label\")==label).count()\n", "    accuracy = num_correct/label_view.count()\n", "\n", "    cls_hardness.append([avg_hardness, label, accuracy])\n", "\n", "print(\"Average classwise hardness\\n\")\n", "for avg_hardness, label, _ in sorted(cls_hardness, reverse=True):\n", "    print(\"%s: %f\" % (label, avg_hardness))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems that cat and dog tend to be the hardest classes, so it would be worthwhile to add more examples of these before other classes." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "\n", "avg_hardness = [i[0] for i in cls_hardness]\n", "labels = [i[1] for i in cls_hardness]\n", "acc = [i[2] for i in cls_hardness]\n", "\n", "plt.scatter(avg_hardness, acc)\n", "plt.xlabel(\"Hardness\")\n", "plt.ylabel(\"Accuracy\")\n", "for i, label in enumerate(labels):\n", "    plt.annotate(label, (avg_hardness[i]+.002, acc[i]))\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that there is a fairly strong anti-correlation between the average hardness of the samples in a class and the accuracy of the model on that class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the incorrectly predicted samples of the hardest class, \"cat\"." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.view = incorrect_view.match(F(\"ground_truth.label\")==\"cat\").sort_by(\"hardness\", reverse=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's take a look at the correctly predicted images of cats with the lowest hardness." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.view = correct_view.match(F(\"ground_truth.label\")==\"cat\").sort_by(\"hardness\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the hardest incorrectly predicted cat images with the easiest correctly predicted cat images, we can see that the model has a much easier time classifying images of cats' faces looking directly at the camera. The cat images that the model struggles with the most are those with poor lighting, complex backgrounds, or poses in which the cat is not sitting and facing the camera. Now we have an idea of the types of cat images to look for to add to this dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What's next?\n", "\n", "This example was done on a previously annotated set of data in order to show how hardness relates to other aspects of a dataset. In a real-world application, you would now apply this method to new unlabeled data.\n", "\n", "Once you've identified the hardest samples you have available, it's time to update your dataset. You can select the N samples with the highest hardness values to send off for annotation and add them to your train or test set. Alternatively, you could select samples in proportion to the per-class hardness calculated above.\n", "\n", "Retraining your model on this new data should allow it to perform better on harder edge cases. Additionally, adding these samples to your test set means that good test performance gives you more confidence in your model's ability to perform well on new, unseen data.\n", "\n", "Now it's time to keep improving your dataset by [fixing annotation mistakes](https://voxel51.com/docs/fiftyone/recipes/detection_mistakenness.html) and [adding more unique samples](https://voxel51.com/docs/fiftyone/tutorials/uniqueness.html)." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 2 }