{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 1: Basic Evaluation\n", "In our first step, we will cover how to perform basic evaluation. One of the great parts of FiftyOne is that once your data and model predictions are in FiftyOne, evaluation becomes easy, no matter what format you are coming from. Gone are the days of converting your YOLO-style predictions to COCO-style evaluation; FiftyOne handles all the conversions for you so you can focus on the task at hand.\n", "\n", "Let's first take a look at loading a common dataset with predictions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Installation\n", "\n", "Here are some packages that are needed to help run some of our demo code:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone torch ultralytics pycocotools" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading a Zoo Dataset for Evaluation\n", "We will be loading the [quickstart](https://docs.voxel51.com/api/fiftyone.utils.quickstart.html) dataset from the [Dataset Zoo](https://docs.voxel51.com/dataset_zoo/index.html). This dataset is a slice of MSCOCO and contains some preloaded predictions. If you are unsure how to load your own detection dataset, be sure to check out our [Getting Started with Detections](<../object_detection/index.html>).\n", "\n", "Once our dataset is loaded, we can start getting ready for model evaluation!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset already downloaded\n", "Loading existing dataset 'quickstart'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use\n", "Name: quickstart\n", "Media type: image\n", "Num samples: 200\n", "Persistent: False\n", "Tags: []\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)\n", " created_at: fiftyone.core.fields.DateTimeField\n", " last_modified_at: fiftyone.core.fields.DateTimeField\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " uniqueness: fiftyone.core.fields.FloatField\n", " predictions: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " eval_tp: fiftyone.core.fields.IntField\n", " eval_fp: fiftyone.core.fields.IntField\n", " eval_fn: fiftyone.core.fields.IntField\n", " eval_high_conf_tp: fiftyone.core.fields.IntField\n", " eval_high_conf_fp: fiftyone.core.fields.IntField\n", " eval_high_conf_fn: fiftyone.core.fields.IntField\n" ] } ], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "dataset = foz.load_zoo_dataset(\"quickstart\")\n", "\n", "# View summary info about the dataset\n", "print(dataset)" ] },
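{ "cell_type": "markdown", "metadata": {}, "source": [ "Each prediction in this dataset is a `Detection` with a label, a bounding box, and a confidence score. As a quick optional sanity check, we can peek at one predicted object on the first sample:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Inspect one predicted object to see how detections are stored\n", "sample = dataset.first()\n", "print(sample.predictions.detections[0])" ] },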
{ "cell_type": "markdown", "metadata": {}, "source": [ "Before we go further, let’s launch the [FiftyOne App](https://docs.voxel51.com/user_guide/app.html) and use the GUI to explore the dataset visually:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Session launched. Run `session.show()` to open the App in a cell output.\n" ] } ], "source": [ "session = fo.launch_app(dataset, auto=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate Detections\n", "\n", "Now that we have samples with ground truth and predicted objects, let’s use FiftyOne to evaluate the quality of the detections.\n", "\n", "FiftyOne provides a powerful [evaluation API](https://docs.voxel51.com/user_guide/evaluation.html) that contains a collection of methods for performing evaluation of model predictions. Since we’re working with object detections here, we’ll use [detection evaluation](https://docs.voxel51.com/api/fiftyone.core.collections.html#fiftyone.core.collections.SampleCollection.evaluate_detections).\n", "\n", "We can run evaluation on our samples via `evaluate_detections()`. Note that this method is available on both the `Dataset` and `DatasetView` classes, which means that we can run evaluation on subsets of our dataset as well.\n", "\n", "By default, this method will use the COCO evaluation protocol, plus some extra goodies, like per-sample TP/FP/FN counts, that we will use later." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating detections...\n", " 100% |█████████████████| 200/200 [7.5s elapsed, 0s remaining, 18.3 samples/s] \n", "Performing IoU sweep...\n", " 100% |█████████████████| 200/200 [2.4s elapsed, 0s remaining, 76.0 samples/s] \n" ] } ], "source": [ "results = dataset.evaluate_detections(\n", "    \"predictions\",\n", "    gt_field=\"ground_truth\",\n", "    eval_key=\"eval\",\n", "    compute_mAP=True,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analyzing Results\n", "\n", "The `results` object returned by the evaluation routine provides a number of convenient methods for analyzing our predictions.\n", "\n", "For example, let’s print a classification report for the top-10 most common classes in the dataset:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " person 0.52 0.94 0.67 716\n", " kite 0.59 0.88 0.71 140\n", " car 0.18 0.80 0.29 61\n", " bird 0.65 0.78 0.71 110\n", " carrot 0.09 0.74 0.16 47\n", " boat 0.09 0.46 0.16 37\n", " surfboard 0.17 0.73 0.28 30\n", "traffic light 0.32 0.79 0.45 24\n", " airplane 0.36 0.83 0.50 24\n", " bench 0.17 0.52 0.26 23\n", "\n", " micro avg 0.38 0.87 0.53 1212\n", " macro avg 0.31 0.75 0.42 1212\n", " weighted avg 0.47 0.87 0.60 1212\n", "\n" ] } ], "source": [ "# Get the 10 most common classes in the dataset\n", "counts = dataset.count_values(\"ground_truth.detections.label\")\n", "classes_top10 = sorted(counts, key=counts.get, reverse=True)[:10]\n", "\n", "# Print a classification report for the top-10 classes\n", "results.print_report(classes=classes_top10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also grab the mean average precision (mAP) of our model:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.3957238101325776\n" ] } ], "source": [ "print(results.mAP())" ] },
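{ "cell_type": "markdown", "metadata": {}, "source": [ "Because we passed `eval_key=\"eval\"`, FiftyOne also recorded per-sample `eval_tp`, `eval_fp`, and `eval_fn` counts and tagged each individual prediction as a TP, FP, or FN. As a minimal sketch of how these fields can be used, let's surface the samples with the most false positives and show only the offending predictions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from fiftyone import ViewField as F\n", "\n", "# Samples with the most false positives first\n", "worst_fp_view = dataset.sort_by(\"eval_fp\", reverse=True)\n", "\n", "# Only keep the predictions that were marked as false positives\n", "worst_fp_view = worst_fp_view.filter_labels(\"predictions\", F(\"eval\") == \"fp\")\n", "\n", "session.view = worst_fp_view" ] },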
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate Subsets\n", "\n", "As mentioned before, we can evaluate `DatasetViews` as well! Let's evaluate only where our model is highly confident. First we will create a high-confidence view, then evaluate with `evaluate_detections()` again. See [Dataset Views](https://docs.voxel51.com/user_guide/using_views.html) for full details on matching, filtering, and sorting detections." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: quickstart\n", "Media type: image\n", "Num samples: 200\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)\n", " created_at: fiftyone.core.fields.DateTimeField\n", " last_modified_at: fiftyone.core.fields.DateTimeField\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " uniqueness: fiftyone.core.fields.FloatField\n", " predictions: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " eval_high_conf_tp: fiftyone.core.fields.IntField\n", " eval_high_conf_fp: fiftyone.core.fields.IntField\n", " eval_high_conf_fn: fiftyone.core.fields.IntField\n", " eval_tp: fiftyone.core.fields.IntField\n", " eval_fp: fiftyone.core.fields.IntField\n", " eval_fn: fiftyone.core.fields.IntField\n", "View stages:\n", " 1. FilterLabels(field='predictions', filter={'$gt': ['$$this.confidence', 0.75]}, only_matches=False, trajectories=False)\n" ] } ], "source": [ "from fiftyone import ViewField as F\n", "\n", "# Only contains detections with confidence > 0.75\n", "high_conf_view = dataset.filter_labels(\"predictions\", F(\"confidence\") > 0.75, only_matches=False)\n", "\n", "# Print some information about the view\n", "print(high_conf_view)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can check out our new view in the session before we run evaluation:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "session.view = high_conf_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just like before, let's run evaluation. Be sure to change the `eval_key` to a new name this time!" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating detections...\n", " 100% |█████████████████| 200/200 [1.4s elapsed, 0s remaining, 113.3 samples/s] \n", "Performing IoU sweep...\n", " 100% |█████████████████| 200/200 [920.9ms elapsed, 0s remaining, 217.2 samples/s] \n", " precision recall f1-score support\n", "\n", " person 0.85 0.72 0.78 412\n", " kite 0.84 0.68 0.75 91\n", " car 0.74 0.51 0.60 61\n", " bird 0.91 0.48 0.63 64\n", " carrot 0.58 0.40 0.47 47\n", " boat 0.62 0.35 0.45 37\n", " surfboard 0.63 0.40 0.49 30\n", "traffic light 0.88 0.62 0.73 24\n", " airplane 0.90 0.79 0.84 24\n", " bench 0.88 0.30 0.45 23\n", "\n", " micro avg 0.82 0.62 0.70 813\n", " macro avg 0.78 0.53 0.62 813\n", " weighted avg 0.81 0.62 0.70 813\n", "\n", "0.3395358471186352\n" ] } ], "source": [ "results = high_conf_view.evaluate_detections(\n", "    \"predictions\",\n", "    gt_field=\"ground_truth\",\n", "    eval_key=\"eval_high_conf\",\n", "    compute_mAP=True,\n", ")\n", "\n", "# Print the same report to see the difference\n", "results.print_report(classes=classes_top10)\n", "print(results.mAP())" ] },
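{ "cell_type": "markdown", "metadata": {}, "source": [ "Precision went up while recall and mAP dropped, which is expected once the low-confidence detections are removed. Both runs remain saved on the dataset under their `eval_key`s, so we can compare them at any time. As a small sketch (assuming only the two evaluations above have been run):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Evaluation runs are tracked on the dataset by their eval keys\n", "print(dataset.list_evaluations())  # e.g. ['eval', 'eval_high_conf']\n", "\n", "# Compare total TP/FP counts across all samples for the two runs\n", "print(dataset.sum(\"eval_tp\"), dataset.sum(\"eval_fp\"))\n", "print(dataset.sum(\"eval_high_conf_tp\"), dataset.sum(\"eval_high_conf_fp\"))" ] },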
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate for Classification\n", "\n", "Evaluation is just as easy for classification tasks. Once you have loaded your dataset and model predictions, you can start with `dataset.evaluate_classifications()`.\n", "\n", "If you need a refresher on how to work with classification datasets, head over to Getting Started with Classifications!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Split 'test' already downloaded\n", "Loading existing dataset 'cifar10-test'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use\n" ] },
{ "name": "stdout", "output_type": "stream", "text": [ "\n", "Welcome to\n", "\n", "███████╗██╗███████╗████████╗██╗ ██╗ ██████╗ ███╗ ██╗███████╗\n", "██╔════╝██║██╔════╝╚══██╔══╝╚██╗ ██╔╝██╔═══██╗████╗ ██║██╔════╝\n", "█████╗ ██║█████╗ ██║ ╚████╔╝ ██║ ██║██╔██╗ ██║█████╗\n", "██╔══╝ ██║██╔══╝ ██║ ╚██╔╝ ██║ ██║██║╚██╗██║██╔══╝\n", "██║ ██║██║ ██║ ██║ ╚██████╔╝██║ ╚████║███████╗\n", "╚═╝ ╚═╝╚═╝ ╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═══╝╚══════╝ v1.4.0\n", "\n", "If you're finding FiftyOne helpful, here's how you can get involved:\n", "\n", "|\n", "| ⭐⭐⭐ Give the project a star on GitHub ⭐⭐⭐\n", "| https://github.com/voxel51/fiftyone\n", "|\n", "| 🚀🚀🚀 Join the FiftyOne Discord community 🚀🚀🚀\n", "| https://community.voxel51.com/\n", "|\n", "\n" ] } ], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "dataset = foz.load_zoo_dataset(\"cifar10\", split=\"test\")\n", "\n", "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we apply a zero-shot CLIP model from the [Model Zoo](https://docs.voxel51.com/model_zoo/index.html) to a few samples and evaluate its predictions against the ground truth labels:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |█████████████████████| 5/5 [106.4ms elapsed, 0s remaining, 47.0 samples/s] \n", " precision recall f1-score support\n", "\n", " airplane 0.00 0.00 0.00 1\n", " bird 0.00 0.00 0.00 0\n", " cat 1.00 1.00 1.00 1\n", " frog 1.00 1.00 1.00 1\n", " ship 1.00 1.00 1.00 2\n", "\n", " accuracy 0.80 5\n", " macro avg 0.60 0.60 0.60 5\n", "weighted avg 0.80 0.80 0.80 5\n", "\n" ] } ], "source": [ "import fiftyone.zoo as foz\n", "\n", "classes = [\"horse\", \"truck\", \"deer\", \"automobile\", \"bird\", \"ship\", \"cat\", \"dog\", \"frog\", \"airplane\"]\n", "\n", "clip = foz.load_zoo_model(\n", "    \"clip-vit-base32-torch\",\n", "    classes=classes,\n", ")\n", "\n", "first_5_samples = dataset.limit(5)\n", "\n", "first_5_samples.apply_model(clip, label_field=\"clip\")\n", "\n", "results = first_5_samples.evaluate_classifications(\"clip\")\n", "\n", "# Print a classification report for the predictions\n", "results.print_report()" ] },
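{ "cell_type": "markdown", "metadata": {}, "source": [ "Classification results offer more than printed reports. As a small optional extra, `plot_confusion_matrix()` renders an interactive confusion matrix for the same predictions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot an interactive confusion matrix for the CLIP predictions\n", "plot = results.plot_confusion_matrix(classes=classes)\n", "plot.show()" ] },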
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate for Segmentation\n", "\n", "The last basic form of evaluation we will cover is evaluating segmentations!\n", "\n", "There are two popular forms of segmentation: instance segmentation and semantic segmentation. In FiftyOne, instance segmentations are stored as `fo.Detections` with instance masks, and semantic segmentations are stored as `fo.Segmentation` labels. We will cover how to evaluate both.\n", "\n", "If you need a refresher on how to work with segmentation datasets, head over to Getting Started with Segmentations!\n", "\n", "Once your dataset is prepped and ready with `ground_truth` and predicted segmentations, you can start evaluation!\n", "\n", "### Instance Segmentation Example" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading split 'validation' to '/Users/dangural/fiftyone/coco-2017/validation' if necessary\n", "Found annotations at '/Users/dangural/fiftyone/coco-2017/raw/instances_val2017.json'\n", "Sufficient images already downloaded\n", "Existing download of split 'validation' is sufficient\n", "Loading existing dataset 'coco-2017-validation-25'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use\n" ] }, { "data": { "text/html": [ "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dataset = foz.load_zoo_dataset(\n", " \"coco-2017\",\n", " split=\"validation\",\n", " label_types=[\"segmentations\"],\n", " classes=[\"cat\", \"dog\"],\n", " max_samples=25,\n", ")\n", "\n", "session.dataset = dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading https://github.com/ultralytics/assets/releases/download/v8.3.0/yolo11s-seg.pt to 'yolo11s-seg.pt'...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 19.7M/19.7M [00:02<00:00, 7.54MB/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ " 100% |███████████████████| 25/25 [3.2s elapsed, 0s remaining, 8.2 samples/s] \n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from ultralytics import YOLO\n", "\n", "# Use a YOLO segmentation model to add instance predictions to the dataset\n", "model = YOLO(\"yolo11s-seg.pt\")\n", "\n", "dataset.apply_model(model, label_field=\"instances\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With our model and predictions loaded, let's run evaluation with [`evaluate_detections()`](https://docs.voxel51.com/api/fiftyone.utils.eval.detection.html). Passing `use_masks=True` tells FiftyOne to compute IoUs using the instance masks rather than the bounding boxes." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating detections...\n", " 100% |███████████████████| 25/25 [240.5ms elapsed, 0s remaining, 104.0 samples/s] \n", "Performing IoU sweep...\n", " 100% |███████████████████| 25/25 [154.1ms elapsed, 0s remaining, 162.3 samples/s] \n", " precision recall f1-score support\n", "\n", " bed 1.00 0.50 0.67 4\n", " bench 0.50 1.00 0.67 1\n", " bicycle 0.67 0.67 0.67 3\n", " bird 0.00 0.00 0.00 1\n", " book 0.67 1.00 0.80 2\n", " bottle 1.00 1.00 1.00 2\n", " bowl 1.00 1.00 1.00 1\n", " car 0.50 1.00 0.67 4\n", " cat 1.00 1.00 1.00 18\n", " cell phone 0.00 0.00 0.00 0\n", " chair 0.00 0.00 0.00 1\n", " couch 0.50 0.50 0.50 2\n", " cup 1.00 1.00 1.00 1\n", "dining table 0.00 0.00 0.00 1\n", " dog 0.92 0.79 0.85 14\n", " frisbee 1.00 1.00 1.00 3\n", " handbag 0.00 0.00 0.00 1\n", " horse 1.00 1.00 1.00 1\n", " keyboard 0.00 0.00 0.00 1\n", " kite 1.00 0.50 0.67 2\n", " laptop 0.50 1.00 0.67 2\n", " motorcycle 1.00 1.00 1.00 1\n", " mouse 1.00 1.00 1.00 1\n", " orange 0.00 0.00 0.00 3\n", " person 0.71 0.45 0.56 22\n", "potted plant 1.00 1.00 1.00 1\n", "refrigerator 1.00 1.00 1.00 1\n", " remote 1.00 1.00 1.00 2\n", " sink 1.00 1.00 1.00 1\n", " truck 0.00 0.00 0.00 2\n", " tv 1.00 0.50 0.67 2\n", "\n", " micro avg 0.80 0.69 0.74 101\n", " macro avg 0.64 0.64 0.62 101\n", "weighted avg 0.77 0.69 0.71 101\n", "\n", "0.4298524090504289\n" ] } ], "source": [ "results = dataset.evaluate_detections(\n", "    \"instances\",\n", "    gt_field=\"ground_truth\",\n", "    eval_key=\"inst_eval\",\n", "    use_masks=True,\n", "    compute_mAP=True,\n", ")\n", "\n", "# Print a report and the mAP for the instance masks\n", "results.print_report()\n", "print(results.mAP())" ] },
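{ "cell_type": "markdown", "metadata": {}, "source": [ "Since we passed `eval_key=\"inst_eval\"`, each predicted instance was also tagged as a TP, FP, or FN. As a quick optional sketch, we can isolate the false-positive instances and inspect them in the App:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from fiftyone import ViewField as F\n", "\n", "# View only the predicted instances that did not match a ground truth mask\n", "fp_instances = dataset.filter_labels(\"instances\", F(\"inst_eval\") == \"fp\")\n", "\n", "session.view = fp_instances" ] },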
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Semantic Segmentation Evaluation\n", "\n", "Now let's look at an example of semantic segmentation. We can easily convert our instance segmentations to semantic masks using [to_segmentation()](https://docs.voxel51.com/api/fiftyone.core.labels.html#fiftyone.core.labels.Detection.to_segmentation). After we convert our `ground_truth` and `instances` fields, we can evaluate our new masks! Let's convert now:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Image dimensions are required to render pixel masks\n", "dataset.compute_metadata()\n", "\n", "# Convert both label fields to per-sample semantic masks\n", "for sample in dataset:\n", "    detections = sample[\"ground_truth\"]\n", "    segmentation = detections.to_segmentation(\n", "        frame_size=(sample.metadata.width, sample.metadata.height),\n", "        mask_targets={1: \"cat\", 2: \"dog\"},\n", "    )\n", "    sample[\"gt_semantic\"] = segmentation\n", "\n", "    detections = sample[\"instances\"]\n", "    segmentation = detections.to_segmentation(\n", "        frame_size=(sample.metadata.width, sample.metadata.height),\n", "        mask_targets={1: \"cat\", 2: \"dog\"},\n", "    )\n", "    sample[\"pred_semantic\"] = segmentation\n", "    sample.save()\n", "\n", "session.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we can evaluate our semantic segmentations with `dataset.evaluate_segmentations()`. Note that the report below is indexed by pixel value (here `1 = cat` and `2 = dog`, per our `mask_targets`):" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Computing possible mask values...\n", " 100% |███████████████████| 25/25 [274.0ms elapsed, 0s remaining, 91.2 samples/s] \n", "Evaluating segmentations...\n", " 100% |███████████████████| 25/25 [207.0ms elapsed, 0s remaining, 120.7 samples/s] \n", " precision recall f1-score support\n", "\n", " 1 0.90 0.96 0.93 1088297.0\n", " 2 0.91 0.79 0.85 330667.0\n", "\n", " micro avg 0.90 0.92 0.91 1418964.0\n", " macro avg 0.91 0.88 0.89 1418964.0\n", "weighted avg 0.90 0.92 0.91 1418964.0\n", "\n" ] } ], "source": [ "results = dataset.evaluate_segmentations(\"gt_semantic\", \"pred_semantic\", eval_key=\"seg_eval\")\n", "results.print_report()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "This covers the basics of model evaluation in FiftyOne. In the next step, we will learn how to dive even deeper with the [Model Evaluation Panel](https://docs.voxel51.com/plugins/api/plugins.panels.model_evaluation.html), an interactive tool that lets you find exactly where your model performs best and worst, all in the FiftyOne App. Learn more by continuing!" ] } ], "metadata": { "kernelspec": { "display_name": "env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.23" } }, "nbformat": 4, "nbformat_minor": 2 }