{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " Try in Google Colab\n", " \n", " \n", " \n", " \n", " Share via nbviewer\n", " \n", " \n", " \n", " \n", " View on GitHub\n", " \n", " \n", " \n", " \n", " Download notebook\n", " \n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Evaluating a Detection Model on the Open Images Dataset\n", "\n", "This tutorial demonstrates per-image evaluation of an object detection model on [the Open Images dataset](https://storage.googleapis.com/openimages/web/index.html)\n", "that generates:\n", "\n", "- true positives & false positives\n", "- per-class average precision (AP)\n", "- mean average precision (mAP)\n", "\n", "for each image and adds this information to each [Sample](https://voxel51.com/docs/fiftyone/api/fiftyone.core.sample.html#fiftyone.core.sample.Sample)\n", "in the [Dataset](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset).\n", "\n", "The steps are broken down as follows:\n", "\n", "1. [Requirements](#Requirements)\n", "2. [Download the test data and ground-truth labels](#Download-the-test-data-and-ground-truth-labels) (optional)\n", "3. [Generate predictions](#(optional)-Generate-predictions) (optional)\n", "4. [Load the data into FiftyOne](#Load-the-data-into-FiftyOne)\n", "5. [Prepare the ground-truth for evaluation](#Prepare-the-ground-truth-for-evaluation) (optional)\n", "6. [Evaluate on a per-image granularity](#Evaluate-on-a-per-image-granularity)\n", "7. [Explore](#Explore)\n", "\n", "Optional steps may not be necessary depending on if you have already downloaded the data or have your own model to evaluate.\n", "\n", "This tutorial evaluates a model on [Open Images V4](https://storage.googleapis.com/openimages/web/download_v4.html)\n", "however this code supports later versions of Open Images as well. If using a newer version just make sure to\n", "use the appropriate hierarchy file and class label map." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Quickstart: Interactive visualization in under 5 minutes\n", "\n", "The following steps demonstrate how to evaluate *your own model* on a per-image granularity using Tensorflow Object Detection API and then interactively visualize and explore true/false positive detections. If you would simply like to browse a subset of Open Images test set with evaluation on a pre-trained model, instead [download this dataset](https://voxel51.com/downloads/fiftyone/tutorials/open-images-v4-test-500.zip). 
You can get up and running with just 5 lines of code!\n", "\n", "Below is the Python code to load the downloaded dataset and visualize it:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import fiftyone as fo\n", "from fiftyone import ViewField as F\n", "\n", "# Path to the unzipped dataset you downloaded\n", "DATASET_DIR = \"/path/to/open-images-v4-test-500\"\n", "\n", "# Load the dataset\n", "dataset = fo.Dataset.from_dir(DATASET_DIR, fo.types.FiftyOneDataset)\n", "\n", "# Open the dataset in the App\n", "session = fo.launch_app(dataset)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter the visible detections by confidence and filter the samples\n", "# to only those with at least one false positive\n", "high_conf_view = (\n", "    dataset\n", "    .filter_labels(\"true_positives\", F(\"confidence\") > 0.4)\n", "    .filter_labels(\"false_positives\", F(\"confidence\") > 0.4)\n", "    .match(F(\"false_positives.detections\").length() > 0)\n", "    .sort_by(\"open_images_id\")\n", ")\n", "\n", "session.view = high_conf_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Requirements\n", "\n", "This workflow requires a few Python packages. First, if you haven't already, install FiftyOne:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then install the appropriate version of `tensorflow`, depending on whether you have a GPU:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "!pip install tensorflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and install the other requirements:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install numpy pandas google-api-python-client" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download supporting scripts\n", "\n", "This notebook uses a collection of helper scripts and modules. If you downloaded this notebook from the [fiftyone-examples](https://github.com/voxel51/fiftyone-examples) repository, you will also need to download the rest of the `examples/open_images_evaluation/` subdirectory." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Download the test dataset\n", "\n", "All of the data (images, metadata, and annotations) can be found on the\n", "[official Open Images website](https://storage.googleapis.com/openimages/web/download_v4.html).\n", "\n", "If you are using Open Images V4, you can use the following commands to download all the necessary files." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download the data\n", "\n", "**WARNING:** This is ~36 GB of data!"
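 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The download command below uses the [AWS CLI](https://aws.amazon.com/cli/). If you don't already have it, one way to install it (assuming a pip-managed environment) is:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install awscli" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then sync the test split:"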
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!aws s3 --no-sign-request sync s3://open-images-dataset/test open-images-dataset/test" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Download the labels and metadata" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "wget https://storage.googleapis.com/openimages/2018_04/test/test-annotations-bbox.csv\n", "wget https://storage.googleapis.com/openimages/2018_04/test/test-annotations-human-imagelabels-boxable.csv\n", "wget https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv\n", "wget https://storage.googleapis.com/openimages/2018_04/bbox_labels_600_hierarchy.json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## (optional) Generate predictions\n", "\n", "This section steps through generating predictions using a pre-trained model publically available\n", "on [Tensorflow Hub](https://www.tensorflow.org/hub).\n", "The exact model used can be modified simply by changing `MODEL_HANDLE` below.\n", "\n", "### Alternative 1: download pre-computed predictions\n", "\n", "If you would like to skip the step of generating predictions, simply download [this predictions file](https://voxel51.com/downloads/fiftyone/tutorials/google-faster_rcnn-openimages_v4-inception_resnet_v2_predictions_3081.csv).\n", "\n", "### Alternative 2: use your own model\n", "\n", "If you have your own model that you would like to evaluate, make sure the outputs are saved to `csv` in\n", "[Tensorflow Object Detection API](https://github.com/tensorflow/models/tree/master/research/object_detection) format.\n", "Output file structure must have a single header row followed by one row per detection as follows:\n", "\n", "```\n", "ImageID,LabelName,Score,XMin,XMax,YMin,YMax\n", "...,...,...,...,...,...,...\n", "...,...,...,...,...,...,...\n", "```\n", "\n", "Example output for two images with two detections each:\n", "\n", "```\n", "ImageID,LabelName,Score,XMin,XMax,YMin,YMax\n", "000026e7ee790996,/m/07j7r,0.1,0.071905,0.145346,0.206591,0.391306\n", "000026e7ee790996,/m/07j7r,0.2,0.439756,0.572466,0.264153,0.435122\n", "000062a39995e348,/m/015p6,0.4,0.205719,0.849912,0.154144,1.000000\n", "000062a39995e348,/m/05s2s,0.5,0.137133,0.377634,0.000000,0.884185\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generate predictions with a Tensorflow Hub pre-trained model\n", "\n", "To use a Tensorflow Hub model requires the following packages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install Pillow tensorflow-hub" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Populate the following environment variables and run the inference script.\n", "\n", "This script is resumable and saves after every 10 samples are processed by default. It does\n", "not process images in batches." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "IMAGES_DIR=/PATH/TO/IMAGES\n", "OUTPUT_DIR=/PATH/TO/PREDICTIONS\n", "\n", "MODEL_HANDLE=\"https://tfhub.dev/google/faster_rcnn/openimages_v4/inception_resnet_v2/1\"\n", "# MODEL_HANDLE=\"https://tfhub.dev/google/openimages_v4/ssd/mobilenet_v2/1\"\n", "\n", "python open_images_eval/scripts/inference.py \\\n", " --output_dir ${OUTPUT_DIR} \\\n", " --output_format tf_object_detection_api \\\n", " ${IMAGES_DIR} ${MODEL_HANDLE}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load the data into FiftyOne" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a persistent FiftyOne dataset\n", "\n", "The following script loads the data into a FiftyOne [Dataset](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset).\n", "This process copies all labels and metadata to a non-relational database for rapid access and powerful querying, but only paths to the images are stored\n", "in the database, not copies of the images themselves!\n", "\n", "The dataset is set to [persistent](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html#dataset-persistence)\n", "so that it remains in the database and can be loaded in a new python process." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "DATASET_NAME=\"open-images-v4-test\"\n", "IMAGES_DIR=/PATH/TO/IMAGES\n", "BOUNDING_BOXES_EXPANDED=/PATH/TO/test-annotations-bbox_expanded.csv\n", "IMAGE_LABELS_EXPANDED=/PATH/TO/test-annotations-human-imagelabels-boxable_expanded.csv\n", "PREDICTIONS_PATH=/PATH/TO/PREDICTIONS.csv\n", "CLASS_DESCRIPTIONS=/PATH/TO/class-descriptions-boxable.csv\n", "\n", "python open_images_eval/scripts/load_data.py \\\n", " --bounding_boxes_path ${BOUNDING_BOXES_EXPANDED} \\\n", " --image_labels_path ${IMAGE_LABELS_EXPANDED} \\\n", " --predictions_path ${PREDICTIONS_PATH} \\\n", " --prediction_field_name \"faster_rcnn\" \\\n", " --class_descriptions_path ${CLASS_DESCRIPTIONS} \\\n", " --load_images_with_preds \\\n", " --max_num_images 1000 \\\n", " ${DATASET_NAME} ${IMAGES_DIR}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To skip uploading predictions use the following code block. 
You can always add\n", "predictions later using the function\n", "`open_images_eval.error_analysis.load_data.add_open_images_predictions()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "DATASET_NAME=\"open-images-v4-test\"\n", "IMAGES_DIR=/PATH/TO/IMAGES\n", "BOUNDING_BOXES_EXPANDED=/PATH/TO/test-annotations-bbox_expanded.csv\n", "IMAGE_LABELS_EXPANDED=/PATH/TO/test-annotations-human-imagelabels-boxable_expanded.csv\n", "CLASS_DESCRIPTIONS=/PATH/TO/class-descriptions-boxable.csv\n", "\n", "python open_images_eval/scripts/load_data.py \\\n", "    --bounding_boxes_path ${BOUNDING_BOXES_EXPANDED} \\\n", "    --image_labels_path ${IMAGE_LABELS_EXPANDED} \\\n", "    --class_descriptions_path ${CLASS_DESCRIPTIONS} \\\n", "    --max_num_images 1000 \\\n", "    ${DATASET_NAME} ${IMAGES_DIR}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize the data\n", "\n", "Now that we have a FiftyOne `Dataset`, let's visualize the data before evaluating it:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import fiftyone as fo\n", "from fiftyone import ViewField as F\n", "\n", "dataset = fo.load_dataset(\"open-images-v4-test\")\n", "\n", "session = fo.launch_app(dataset)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter the visible detections by confidence\n", "session.view = dataset.filter_labels(\"faster_rcnn\", F(\"confidence\") > 0.4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Prepare the ground-truth for evaluation\n", "\n", "Open Images requires \"expanding the hierarchy\" of the ground-truth labels for\n", "evaluation. The labels you downloaded only contain leaf-node labels. So, for\n", "example, for a bounding box labeled `Jaguar`, the hierarchy expansion would add\n", "duplicate boxes with the labels `Carnivore`, `Mammal`, and `Animal`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Install the TF Object Detection API\n", "\n", "The first step is to install the TensorFlow Object Detection API. Instructions\n", "on how to do so can be found\n", "[here](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2.md)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create expanded hierarchy ground-truth labels\n", "\n", "The following commands are essentially copied from [this tutorial](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/challenge_evaluation.md)."
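 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For intuition, here is a rough sketch of what the expansion does. This is a conceptual illustration only (it assumes the nested `LabelName`/`Subcategory` structure of `bbox_labels_600_hierarchy.json` and ignores details like `Part` relationships); use the official scripts below to generate the actual files:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "def collect_ancestors(node, ancestors, out):\n", "    \"\"\"Recursively map each label MID to the set of its ancestor MIDs\"\"\"\n", "    label = node[\"LabelName\"]\n", "    out.setdefault(label, set()).update(ancestors)\n", "    for child in node.get(\"Subcategory\", []):\n", "        collect_ancestors(child, ancestors + [label], out)\n", "\n", "with open(\"bbox_labels_600_hierarchy.json\") as f:\n", "    hierarchy = json.load(f)  # the root node is a generic \"entity\" placeholder\n", "\n", "ancestors = {}\n", "for top_level in hierarchy.get(\"Subcategory\", []):\n", "    collect_ancestors(top_level, [], ancestors)\n", "\n", "# Expansion duplicates each ground-truth box once per ancestor label, e.g.,\n", "# a box labeled `Jaguar` also becomes `Carnivore`, `Mammal`, and `Animal` boxes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now run the official expansion scripts:"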
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "# TODO: modify these\n", "export TF_MODELS_RESEARCH=PATH/TO/TENSORFLOW/models/research/object_detection\n", "LABELS_DIR=PATH/TO/LABELS\n", "\n", "HIERARCHY_FILE=${LABELS_DIR}/bbox_labels_600_hierarchy.json\n", "BOUNDING_BOXES=${LABELS_DIR}/test-annotations-bbox\n", "IMAGE_LABELS=${LABELS_DIR}/test-annotations-human-imagelabels-boxable\n", "\n", "python ${TF_MODELS_RESEARCH}/object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \\\n", " --json_hierarchy_file=${HIERARCHY_FILE} \\\n", " --input_annotations=${BOUNDING_BOXES}.csv \\\n", " --output_annotations=${BOUNDING_BOXES}_expanded.csv \\\n", " --annotation_type=1\n", "\n", "python ${TF_MODELS_RESEARCH}/object_detection/dataset_tools/oid_hierarchical_labels_expansion.py \\\n", " --json_hierarchy_file=${HIERARCHY_FILE} \\\n", " --input_annotations=${IMAGE_LABELS}.csv \\\n", " --output_annotations=${IMAGE_LABELS}_expanded.csv \\\n", " --annotation_type=2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should now have two new files in `LABELS_DIR`:\n", "\n", "```\n", "test-annotations-bbox_expanded.csv\n", "test-annotations-human-imagelabels-boxable_expanded.csv\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluate on a per-image granularity\n", "\n", "This next script evaluates each image indivually using some wrapper code around the TF Object Detection API\n", "evaluation code.\n", "\n", "### Running evaluation\n", "\n", "If you skipped [\"Prepare the ground-truth for evaluation\"](#Prepare-the-ground-truth-for-evaluation) be sure to export the `TF_MODELS_RESEARCH` environment variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "\n", "CLASS_LABEL_MAP=${TF_MODELS_RESEARCH}/object_detection/data/oid_v4_label_map.pbtxt\n", "\n", "python open_images_eval/scripts/evaluate_model.py \\\n", " --prediction_field_name \"faster_rcnn\" \\\n", " --iou_threshold 0.5 \\\n", " ${DATASET_NAME} ${CLASS_LABEL_MAP}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explore\n", "\n", "At last! We can now visualize the data. Use this snippet to launch the GUI app and start browsing." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import fiftyone as fo\n", "from fiftyone import ViewField as F\n", "\n", "dataset = fo.load_dataset(\"open-images-v4-test\")\n", "\n", "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There so many possibilities as far as how to slice and dice this data. 
Let's start with high-confidence predictions (`detection.confidence > 0.4`):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter the visible detections for high confidence,\n", "# then filter the samples to only those with at least one false positive\n", "high_confidence_view = (\n", "    dataset\n", "    .filter_labels(\"faster_rcnn_TP\", F(\"confidence\") > 0.4)\n", "    .filter_labels(\"faster_rcnn_FP\", F(\"confidence\") > 0.4)\n", "    .match(F(\"faster_rcnn_FP.detections\").length() > 0)\n", "    .sort_by(\"open_images_id\")\n", ")\n", "\n", "session.view = high_confidence_view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "On the *very first image* we see a prediction that correctly boxes a bird but is mistakenly marked as a false positive!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are a few more ideas for how to view the data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter the visible detections for medium confidence,\n", "# then filter the samples to only those with at least one false positive\n", "medium_confidence_view = (\n", "    dataset\n", "    .filter_labels(\"faster_rcnn_TP\", (F(\"confidence\") > 0.2) & (F(\"confidence\") < 0.4))\n", "    .filter_labels(\"faster_rcnn_FP\", (F(\"confidence\") > 0.2) & (F(\"confidence\") < 0.4))\n", "    .match(F(\"faster_rcnn_FP.detections\").length() > 0)\n", "    .sort_by(\"open_images_id\")\n", ")\n", "\n", "session.view = medium_confidence_view" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Filter the visible detections for ground truth with `Bird` or `Human eye`,\n", "# then filter the samples to only those with at least one ground truth box\n", "bird_eye_view = (\n", "    dataset\n", "    .filter_labels(\"groundtruth_detections\", F(\"label\").is_in([\"Bird\", \"Human eye\"]))\n", "    .match(F(\"groundtruth_detections.detections\").length() > 0)\n", "    .sort_by(\"open_images_id\")\n", ")\n", "\n", "session.view = bird_eye_view" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Bounding boxes are in relative [top-left-x, top-left-y, width, height] format,\n", "# so box area is width * height\n", "bbox_area = F(\"bounding_box\")[2] * F(\"bounding_box\")[3]\n", "\n", "# Filter the visible detections for small bounding box area,\n", "# then filter the samples to only those with at least one false positive\n", "small_boxes_view = (\n", "    dataset\n", "    .filter_labels(\"faster_rcnn_FP\", bbox_area < 0.01)\n", "    .filter_labels(\"faster_rcnn_TP\", bbox_area < 0.01)\n", "    .match(F(\"faster_rcnn_FP.detections\").length() > 0)\n", "    .sort_by(\"open_images_id\")\n", ")\n", "\n", "session.view = small_boxes_view" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 4 }