{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "c2NwQQQ1aIJ5", "tags": [] }, "source": [ "# Using Image Embeddings\n", "\n", "FiftyOne provides a powerful [embeddings visualization](https://voxel51.com/docs/fiftyone/user_guide/brain.html#visualizing-embeddings) capability that you can use to generate low-dimensional representations of the samples and objects in your datasets.\n", "\n", "This notebook highlights several applications of visualizing image embeddings, with the goal of motivating some of the many possible workflows that you can perform.\n", "\n", "Specifically, we'll cover the following concepts:\n", "\n", "- Loading datasets from the [FiftyOne Dataset Zoo](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo_datasets.html)\n", "- Using [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization) to generate 2D representations of images\n", "- Providing custom embeddings to [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization)\n", "- Visualizing embeddings via [interactive plots](https://voxel51.com/docs/fiftyone/user_guide/plots.html) connected to the [FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html)\n", "\n", "And we'll demonstrate how to use embeddings to:\n", "\n", "- Identify anomolous/incorrect image labels\n", "- Find examples of scenarios of interest\n", "- Pre-annotate unlabeled data for training\n", "\n", "**So, what's the takeaway?**\n", "\n", "Combing through individual images in a dataset and staring at aggregate performance metrics trying to figure out how to improve the performance of a model is an ineffective and time-consuming process. Visualizing your dataset in a low-dimensional embedding space is a powerful workflow that can reveal patterns and clusters in your data that can answer important questions about the critical failure modes of your model and how to augment your dataset to address these failures.\n", "\n", "Using the FiftyOne Brain's [embeddings visualization](https://voxel51.com/docs/fiftyone/user_guide/brain.html#visualizing-embeddings) capability on your ML projects can help you uncover hidden patterns in your data and take action to improve the quality of your datasets and models." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![embeddings-sizzle](./images/image_embeddings_zero_cluster.gif)" ] }, { "cell_type": "markdown", "metadata": { "id": "LJgEqAq_NrkI" }, "source": [ "## Setup\n", "\n", "If you haven’t already, install FiftyOne:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this tutorial, we'll use some PyTorch models to generate embeddings, and we'll use the (default) [UMAP method](https://github.com/lmcinnes/umap) to generate embeddings, so we'll need to install the corresponding packages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 3244, "status": "ok", "timestamp": 1612369104036, "user": { "displayName": "Brian Moore", "photoUrl": "https://lh3.googleusercontent.com/a-/AOh14GjTqODKAkDmvfsMo0GDgqHOiMk5Dvk772GPdYuW=s64", "userId": "06264691545006004774" }, "user_tz": 300 }, "id": "7kob347uK5ET", "outputId": "fba5f2e4-e46a-473b-efda-37d0c6d2e7f8" }, "outputs": [], "source": [ "!pip install torch torchvision umap-learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial will demonstrate the powerful [interactive plotting](https://voxel51.com/docs/fiftyone/user_guide/plots.html) capabilities of FiftyOne. In the FiftyOne App, you can bidirectionally interact with plots to identify interesting subsets of your data and take action on them!" ] }, { "cell_type": "markdown", "metadata": { "id": "hzGW6D0-NwBH" }, "source": [ "## Part I: MNIST\n", "\n", "In this section, we'll be working with the [MNIST dataset](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html?highlight=mnist#dataset-zoo-mnist) from the FiftyOne Dataset Zoo." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "GnzA9md8NF59" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Split 'train' already downloaded\n", "Split 'test' already downloaded\n", "Loading existing dataset 'mnist'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use\n" ] } ], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "dataset = foz.load_zoo_dataset(\"mnist\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To start, we'll just use the $10,000$ test images from the dataset." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "test_split = dataset.match_tags(\"test\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: mnist\n", "Media type: image\n", "Num samples: 10000\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "View stages:\n", " 1. 
MatchTags(tags=['test'], bool=True, all=False)\n" ] } ], "source": [ "print(test_split)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing image embeddings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Typically when generating low-dimensional representations of image datasets, one will use a deep model to generate embeddings for each image, say 1024- or 2048-dimensional, that are then passed to a dimensionality reduction method like [UMAP](https://github.com/lmcinnes/umap) or [t-SNE](https://lvdmaaten.github.io/tsne) to generate the 2D or 3D representation that is visualized.\n", "\n", "Such an intermediate embedding step is necessary when the images in the dataset have different sizes, or are too large for [UMAP](https://github.com/lmcinnes/umap) or [t-SNE](https://lvdmaaten.github.io/tsne) to directly process, or the dataset is too complex and a model trained to recognize the concepts of interest is required in order to produce interpretable embeddings.\n", "\n", "However, for a relatively small, fixed-size dataset such as MNIST, we can pass the images themselves to the dimensionality reduction method.\n", "\n", "Let's use the [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization) method to generate our first representation:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Generating visualization...\n", "UMAP(random_state=51, verbose=True)\n", "Wed Mar 27 11:27:43 2024 Construct fuzzy simplicial set\n", "Wed Mar 27 11:27:43 2024 Finding Nearest Neighbors\n", "Wed Mar 27 11:27:43 2024 Building RP forest with 10 trees\n", "Wed Mar 27 11:27:43 2024 NN descent for 13 iterations\n", "\t 1 / 13\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/homebrew/Caskroom/miniforge/base/envs/fo-dev/lib/python3.9/site-packages/umap/umap_.py:1943: UserWarning: n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.\n", " warn(f\"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. 
Use no seed for parallelism.\")\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\t 2 / 13\n", "\t 3 / 13\n", "\t 4 / 13\n", "\tStopping threshold met -- exiting after 4 iterations\n", "Wed Mar 27 11:27:43 2024 Finished Nearest Neighbor Search\n", "Wed Mar 27 11:27:43 2024 Construct embedding\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "22824118c9514d918cb3811b311a7be1", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Epochs completed: 0%| 0/500 [00:00]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\tcompleted 0 / 500 epochs\n", "\tcompleted 50 / 500 epochs\n", "\tcompleted 100 / 500 epochs\n", "\tcompleted 150 / 500 epochs\n", "\tcompleted 200 / 500 epochs\n", "\tcompleted 250 / 500 epochs\n", "\tcompleted 300 / 500 epochs\n", "\tcompleted 350 / 500 epochs\n", "\tcompleted 400 / 500 epochs\n", "\tcompleted 450 / 500 epochs\n", "Wed Mar 27 11:27:56 2024 Finished embedding\n" ] } ], "source": [ "import cv2\n", "import numpy as np\n", "\n", "import fiftyone.brain as fob\n", "\n", "# Construct a ``num_samples x num_pixels`` array of images\n", "embeddings = np.array([\n", " cv2.imread(f, cv2.IMREAD_UNCHANGED).ravel()\n", " for f in test_split.values(\"filepath\")\n", "])\n", "\n", "# Compute 2D representation\n", "results = fob.compute_visualization(\n", " test_split,\n", " embeddings=embeddings,\n", " num_dims=2,\n", " method=\"umap\",\n", " brain_key=\"mnist_test\",\n", " verbose=True,\n", " seed=51,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The method returned a `results` object with a `points` attribute that contains a `10000 x 2` array of 2D embeddings for our samples that we'll visualize next." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "(10000, 2)\n" ] } ], "source": [ "print(type(results))\n", "print(results.points.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you ever want to access this results object again from the dataset, you can do so with the dataset's `load_brain_results()` method, passing the `brain_key` that you used to store the results:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.load_brain_results(\"mnist_test\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualization parameters\n", "\n", "There are two primary components to an embedding visualization: the method used to generate the embeddings, and the dimensionality reduction method used to compute a low-dimensional representation of the embeddings.\n", "\n", "The ``embeddings`` and ``model`` parameters of\n", "[compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization) support a variety of ways to generate embeddings for your data:\n", "\n", "- Provide nothing, in which case a default general-purpose model is used to embed your data\n", "- Provide a [Model instance](https://voxel51.com/docs/fiftyone/api/fiftyone.core.models.html#fiftyone.core.models.Model) or the name of any model from the [FiftyOne Model Zoo](https://voxel51.com/docs/fiftyone/user_guide/model_zoo/index.html) that supports embeddings\n", "- Compute your own embeddings and provide them as an array\n", "- Specify the name of a field 
of your dataset in which embeddings are stored\n", "\n", "The ``method`` parameter of [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization) allows you to specify the dimensionality reduction method to use. The supported methods are:\n", "\n", "- `\"umap\"` **(default)**: Uniform Manifold Approximation and Projection ([UMAP](https://github.com/lmcinnes/umap))\n", "- `\"tsne\"`: t-distributed Stochastic Neighbor Embedding ([t-SNE](https://lvdmaaten.github.io/tsne))\n", "- `\"pca\"`: Principal Component Analysis ([PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html))" ] }
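, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, the cell below (not executed here) sketches a couple of these options on our test split: embedding the images with a zoo model specified by name, and swapping in t-SNE or PCA for the dimensionality reduction. The model name, field name, and `brain_key` values are illustrative placeholders:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch of alternative ways to configure the visualization\n", "# (the zoo model and brain keys below are just examples)\n", "\n", "# Embed the images with a zoo model specified by name, then reduce with t-SNE\n", "results_tsne = fob.compute_visualization(\n", "    test_split,\n", "    model=\"resnet50-imagenet-torch\",\n", "    method=\"tsne\",\n", "    brain_key=\"mnist_test_tsne\",\n", "    seed=51,\n", ")\n", "\n", "# Or reuse embeddings stored in a field of the dataset, reduced with PCA\n", "# (assumes an `embeddings` field has previously been populated)\n", "results_pca = fob.compute_visualization(\n", "    test_split,\n", "    embeddings=\"embeddings\",\n", "    method=\"pca\",\n", "    brain_key=\"mnist_test_pca\",\n", ")" ] }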
, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing embeddings\n", "\n", "Now we're ready to use [results.visualize()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.visualization.html#fiftyone.brain.visualization.VisualizationResults.visualize) to visualize our representation.\n", "\n", "Although we could use this method in isolation, the real power of FiftyOne comes when you [attach plots to the FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/plots.html#attaching-plots), so that when points of interest are selected in the plot, the corresponding samples are automatically selected in the App, and vice versa.\n", "\n", "So, let's open the test split in the App:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Launch App instance\n", "session = fo.launch_app(test_split)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![MNIST test split](images/image_embeddings_test_split.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's visualize the results in the [Embeddings Panel](https://docs.voxel51.com/user_guide/app.html#embeddings-panel) in the FiftyOne App, and color by ground truth label:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![MNIST embeddings](images/image_embeddings_panel.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the points in the scatterplot are naturally clustered by color (ground truth label), which means that [UMAP](https://github.com/lmcinnes/umap) was able to capture the visual differences between the digits!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scatterplot controls\n", "\n", "Now it's time to explore the scatterplot interactively. At the top of the panel, you will see controls that allow you to do the following:\n", "- The `Zoom`, `Pan`, `Zoom In`, `Zoom Out`, `Autoscale`, and `Reset axes` modes can be used to manipulate your current view\n", "- The `Box Select` and `Lasso Select` modes enable you to select points in the plot\n", "\n", "In `Box Select` or `Lasso Select` mode, drawing on the plot will select the points within the region you define. In addition, note that you can:\n", "- Hold shift to add a region to your selection\n", "- Double-click anywhere on the plot to deselect all points\n", "\n", "Since we chose a categorical label for the points, each ground truth label is given its own trace in the legend on the right-hand side. You can interact with this legend to show/hide data:\n", "- Single-click on a trace to show/hide all points in that trace\n", "- Double-click on a trace to show/hide all *other* traces" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exploring the embeddings\n", "\n", "Here are a couple of suggestions for exploring the data:\n", "\n", "- Use `Lasso Select` to select the cluster of points corresponding to the `0` label. Note that the corresponding images loaded in the App are all images of zeros, as expected\n", "- Try selecting other clusters and viewing the images in the App\n", "- Click on the `0` label in the legend to hide the zeros. Now lasso the points from other traces that are in the `0` cluster. These are non-zero images in the dataset that are likely to be confused as zeros!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![likely zero confusions](images/image_embeddings_zero_cluster.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now try double-clicking on the `0` legend entry so that only the `0` points are visible in the plot. Then use the `Lasso Select` to select the `0` points that are far away from the main `0` cluster. These are zero images in the dataset that are likely to be confused as non-zeroes by a model!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![likely nonzero confusions](images/image_embeddings_nonzero_cluster.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cool, right? It is clear that [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization) has picked up on some interesting structure in the test split of the dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pre-annotation of samples\n", "\n", "Now let's put the insights from the previous section to work.\n", "\n", "Imagine that you are [Yann LeCun](http://yann.lecun.com) or [Corinna Cortes](https://research.google/people/author121/), the creators of the MNIST dataset, and you have just invested the time to annotate the test split of the dataset by hand. Now, you have collected an additional 60K images of unlabeled digits that you want to use as a training split and need to annotate.\n", "\n", "Let's see how [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization) can be used to efficiently pre-annotate the train split with minimal effort.\n", "\n", "First, let's regenerate embeddings for all 70,000 images in the combined test and train splits:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Generating visualization...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/opt/homebrew/Caskroom/miniforge/base/envs/fo-dev/lib/python3.9/site-packages/numba/cpython/hashing.py:482: UserWarning: FNV hashing is not implemented in Numba. See PEP 456 https://www.python.org/dev/peps/pep-0456/ for rationale over not using FNV. Numba will continue to work, but hashes for built in types will be computed using siphash24. This will permit e.g. 
dictionaries to continue to behave as expected, however anything relying on the value of the hash opposed to hash as a derived property is likely to not work as expected.\n", " warnings.warn(msg)\n", "/opt/homebrew/Caskroom/miniforge/base/envs/fo-dev/lib/python3.9/site-packages/umap/umap_.py:1943: UserWarning: n_jobs value -1 overridden to 1 by setting random_state. Use no seed for parallelism.\n", " warn(f\"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.\")\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "UMAP(random_state=51, verbose=True)\n", "Wed Mar 27 11:37:27 2024 Construct fuzzy simplicial set\n", "Wed Mar 27 11:37:27 2024 Finding Nearest Neighbors\n", "Wed Mar 27 11:37:27 2024 Building RP forest with 18 trees\n", "Wed Mar 27 11:37:30 2024 NN descent for 16 iterations\n", "\t 1 / 16\n", "\t 2 / 16\n", "\t 3 / 16\n", "\t 4 / 16\n", "\tStopping threshold met -- exiting after 4 iterations\n", "Wed Mar 27 11:37:38 2024 Finished Nearest Neighbor Search\n", "Wed Mar 27 11:37:40 2024 Construct embedding\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cef3d23476c84f6db42f2a572277b49c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Epochs completed: 0%| 0/200 [00:00]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\tcompleted 0 / 200 epochs\n", "\tcompleted 20 / 200 epochs\n", "\tcompleted 40 / 200 epochs\n", "\tcompleted 60 / 200 epochs\n", "\tcompleted 80 / 200 epochs\n", "\tcompleted 100 / 200 epochs\n", "\tcompleted 120 / 200 epochs\n", "\tcompleted 140 / 200 epochs\n", "\tcompleted 160 / 200 epochs\n", "\tcompleted 180 / 200 epochs\n", "Wed Mar 27 11:38:12 2024 Finished embedding\n" ] } ], "source": [ "# Construct a ``num_samples x num_pixels`` array of images\n", "embeddings = np.array([\n", " cv2.imread(f, cv2.IMREAD_UNCHANGED).ravel()\n", " for f in dataset.values(\"filepath\")\n", "])\n", "\n", "# Compute 2D representation\n", "results = fob.compute_visualization(\n", " dataset,\n", " embeddings=embeddings,\n", " num_dims=2,\n", " method=\"umap\",\n", " brain_key=\"mnist\",\n", " verbose=True,\n", " seed=51,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course, our dataset already has ground truth labels for the train split, but let's pretend that's not the case and generate an array of `labels` for our visualization that colors the test split by its ground truth labels and marks all images in the train split as `unlabeled`:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from fiftyone import ViewField as F\n", "\n", "# Label `test` split samples by their ground truth label\n", "# Mark all samples in `train` split as `unlabeled`\n", "expr = F(\"$tags\").contains(\"test\").if_else(F(\"label\"), \"unlabeled\")\n", "labels = dataset.values(\"ground_truth\", expr=expr)\n", "# Set the `ground_truth.label` field to the computed labels\n", "dataset.set_values(\"ground_truth.label\", labels)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "# Launch a new App instance\n", "session = fo.launch_app(dataset, auto=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Unlabeled Images](./images/image_embeddings_unlabeled.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's so great about this plot? 
Notice that the unlabeled points in our (to-be-labeled) train split fall neatly into the same clusters as the labeled points from the test split!\n", "\n", "Follow the simple workflow below to use FiftyOne's [in-App tagging feature](https://voxel51.com/docs/fiftyone/user_guide/app.html#tags-and-tagging) to efficiently assign proposed labels for all samples in the `train` split:\n", "1. Click on the `unlabeled` trace to hide it, and then use the predominant color of the labeled points in the cluster below to determine the identity of the cluster\n", "2. Now double-click on the `unlabeled` trace so that *only* the unlabeled points are visible in the plot\n", "3. Use the `Lasso Select` tool to select the `unlabeled` points in the cluster whose identity you just deduced\n", "4. Over in the App, click on the `tag` icon and assign a tag to the samples\n", "5. Repeat steps 1-4 for the other clusters\n", "\n", "Congratulations, you just pre-annotated ~60,000 training images!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![pre-annotation](images/image_embeddings_prelabel.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can easily print some statistics about the sample tags that you created:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'8': 5577, '6': 6006, '3': 6104, '7': 6361, '9': 6036, '5': 5389, 'train': 60000, '0': 5981, '4': 5737, '2': 5811, '1': 6937}\n" ] } ], "source": [ "# The train split that we pre-annotated\n", "train_split = dataset.match_tags(\"train\")\n", "\n", "# Print stats about the labels that were added\n", "print(train_split.count_sample_tags())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The snippet below converts the sample tags into [Classification labels](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html#classification) in a new ``hypothesis`` field of the dataset:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |█████████████| 60000/60000 [2.6m elapsed, 0s remaining, 353.0 samples/s] \n", "{'0': 5981, '9': 6036, '8': 5572, '7': 6361, '6': 6006, '5': 5388, None: 69, '3': 6104, '1': 6937, '4': 5735, '2': 5811}\n" ] } ], "source": [ "# Add a new Classification field called `hypothesis` to store our guesses\n", "with fo.ProgressBar() as pb:\n", "    for sample in pb(train_split):\n", "        labels = [t for t in sample.tags if t != \"train\"]\n", "        if labels:\n", "            sample[\"hypothesis\"] = fo.Classification(label=labels[0])\n", "            sample.save()\n", "\n", "# Print stats about the labels we created\n", "print(train_split.count_values(\"hypothesis.label\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the example above, notice that 69 samples in the train split were not given a hypothesis (i.e., they were not included in the cluster-based tagging procedure). 
If you would like to retrieve them and manually assign tags, that is easy:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: mnist\n", "Media type: image\n", "Num samples: 69\n", "Tags: ['train']\n", "Sample fields:\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", " hypothesis: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "View stages:\n", " 1. MatchTags(tags=['train'])\n", " 2. Exists(field='hypothesis.label', bool=False)\n" ] } ], "source": [ "no_hypothesis = train_split.exists(\"hypothesis.label\", False)\n", "print(no_hypothesis)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Depending on your application, you may be able to start training using the `hypothesis` labels directly by [exporting the dataset](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html) as, say, a classification directory tree structure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Export `hypothesis` labels as a classification directory tree format\n", "# `exists()` ensures that we only export samples with a hypothesis\n", "train_split.exists(\"hypothesis.label\").export(\n", " export_dir=\"/path/for/dataset\",\n", " dataset_type=fo.types.ImageClassificationDirectoryTree,\n", " label_field=\"hypothesis\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or perhaps you'll want to export the hypothesized labels to your annotation provider to fine-tune the labels for training.\n", "\n", "FiftyOne provides integrations for importing/exporting data to [Scale](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.scale.html), [Labelbox](https://voxel51.com/docs/fiftyone/api/fiftyone.utils.labelbox.html), [Label Studio](https://docs.voxel51.com/integrations/labelstudio.html), [V7](https://docs.voxel51.com/integrations/v7.html), and [CVAT](https://voxel51.com/docs/fiftyone/user_guide/export_datasets.html#cvatimagedataset). Check out [this blog post](https://towardsdatascience.com/managing-annotation-mistakes-with-fiftyone-and-labelbox-fc6e87b51102) for more information about FiftyOne's Labelbox integration." 
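] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To give a flavor of what that could look like, the cell below is a minimal sketch (not executed here) that uses [FiftyOne's annotation API](https://voxel51.com/docs/fiftyone/user_guide/annotation.html) to send the hypothesized labels to CVAT for review. It assumes that you have a CVAT backend configured; the `anno_key` is just a placeholder name:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch of sending the pre-annotations out for human review via\n", "# FiftyOne's annotation API (assumes a configured CVAT backend)\n", "anno_key = \"mnist_hypothesis_review\"\n", "\n", "review_view = train_split.exists(\"hypothesis.label\")\n", "review_view.annotate(\n", "    anno_key,\n", "    backend=\"cvat\",\n", "    label_field=\"hypothesis\",\n", "    launch_editor=True,\n", ")\n", "\n", "# ...review and correct the labels in CVAT...\n", "\n", "# Merge the reviewed labels back onto the dataset\n", "review_view.load_annotations(anno_key)"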
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading existing visualizations\n", "\n", "If you provide the `brain_key` argument to [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization), then the visualization results that you generate will be saved and you can recall them later.\n", "\n", "For example, we can recall the visualization results that we first computed on the test split:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['mnist', 'mnist_test']\n", "\n", "10000\n" ] } ], "source": [ "# List brain runs saved on the dataset\n", "print(dataset.list_brain_runs())\n", "\n", "# Load the results for a brain run\n", "results = dataset.load_brain_results(\"mnist_test\")\n", "print(type(results))\n", "\n", "# Load the dataset view on which the results were computed\n", "results_view = dataset.load_brain_view(\"mnist_test\")\n", "print(len(results_view))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See [this docs page](https://voxel51.com/docs/fiftyone/user_guide/brain.html#managing-brain-runs) for more information about managing brain runs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Part II: BDD100K\n", "\n", "In this section, we'll visualize embeddings from a deep model on the [BDD100K dataset](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html?highlight=bdd100k#dataset-zoo-bdd100k) from the FiftyOne Dataset Zoo and demonstrate some of the insights that you can glean from them.\n", "\n", "If you want to follow along youself, you will need to register at https://bdd-data.berkeley.edu in order to get links to download the data. See the [zoo docs](https://voxel51.com/docs/fiftyone/user_guide/dataset_zoo/datasets.html?highlight=bdd100k#dataset-zoo-bdd100k) for details on loading the dataset.\n", "\n", "We'll be working with the validation split, which contains 10,000 labeled images:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Split 'validation' already prepared\n", "Loading existing dataset 'bdd100k-validation'. To reload from disk, either delete the existing dataset or provide a custom `dataset_name` to use\n" ] } ], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "# The path to the source files that you manually downloaded\n", "source_dir = \"/path/to/dir-with-bdd100k-files\"\n", "\n", "dataset = foz.load_zoo_dataset(\"bdd100k\", split=\"validation\", source_dir=source_dir)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![BDD100K validation split](images/image_embeddings_bdd100k.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing image embeddings\n", "\n", "The BDD100K images are 1280 x 720 pixels, so we'll use a deep model to generate intermediate embeddings.\n", "\n", "Using a deep model not only enables us to visualize larger datasets or ones with different image sizes, it also enables us to key in on different features of the dataset, depending on the model that we use to generate embeddings. 
By default, [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization) will generate embeddings using a general-purpose model that works well for datasets of any kind.\n", "\n", "You can run the cell below to generate a 2D representation for the BDD100K validation split using FiftyOne's default model. You will likely want to run this on a machine with GPU, as this requires running inference on 10,000 images:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import fiftyone.brain as fob\n", "\n", "# Compute 2D representation\n", "# This will compute deep embeddings for all 10,000 images\n", "results = fob.compute_visualization(\n", " dataset,\n", " num_dims=2,\n", " brain_key=\"img_viz\",\n", " verbose=True,\n", " seed=51,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can provide your own custom embeddings via the `embeddings` parameter of [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization). The [FiftyOne Model Zoo](https://voxel51.com/docs/fiftyone/user_guide/model_zoo/index.html) provides dozens of models that expose embeddings that you can use for this purpose.\n", "\n", "For example, the cell below generates a visualization using pre-computed embeddings from a ResNet model from the zoo:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load a resnet from the model zoo\n", "model = foz.load_zoo_model(\"resnet50-imagenet-torch\")\n", "\n", "# Verify that the model exposes embeddings\n", "print(model.has_embeddings)\n", "# True\n", "\n", "# Compute embeddings for each image\n", "embeddings = dataset.compute_embeddings(model)\n", "print(embeddings.shape)\n", "# 10000 x 2048\n", "\n", "# Compute 2D representation using pre-computed embeddings\n", "results = fob.compute_visualization(\n", " dataset,\n", " embeddings=embeddings,\n", " num_dims=2,\n", " brain_key=\"image_embeddings\",\n", " verbose=True,\n", " seed=51,\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Either way, we're ready to visualize our representation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing embeddings\n", "\n", "As usual, we'll start by launching an App instance and opening the embeddings panel to interactively explore the results:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The BDD100K dataset contains a handful of image-level labels, including scene type, time of day, and weather.\n", "\n", "Let's use the time of day labels to color our scatterplot:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![BDD100K embeddings](images/image_embeddings_bdd100k_colorby.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the visualization has a dominant bimodal distribution corresponding to whether the time of day is `daytime` or `night`!\n", "\n", "You can investigate each cluster in more detail using the `Zoom` and `Pan` controls of the plot's modebar." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Investigating outliers\n", "\n", "Outlier clusters in these plots often correspond to images with distinctive properties. 
Let's `Lasso Select` some outlier regions to investigate:\n", "\n", "- The first cluster contains images where the dashboard of the vehicle heavily occludes the view\n", "- The second cluster contains images in rainy scenes where the windshield has water droplets on it\n", "- The third cluster contains images where a cell phone is in the field of view" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![bdd100k-outliers](images/image_embeddings_outliers.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tagging label mistakes\n", "\n", "We can also use the scatterplot to tag samples whose `timeofday` labels are incorrect by taking the following actions:\n", "\n", "- Deselect all labels except `night` in the legend\n", "- Lasso select the nighttime points (yellow) within the daytime cluster (red)\n", "- Click the tag icon in the upper-left corner of the sample grid in the App\n", "- Add an appropriate tag to the samples in the current view to mark the mistakes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![bdd100k-mistakes](images/image_embeddings_tag_mistakes.gif)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "In this tutorial, we showed how to use FiftyOne to compute embeddings for image datasets and demonstrated a few of their many uses, including:\n", "\n", "- Visualizing the hidden structure of image datasets\n", "- Pre-annotating unlabeled data for training\n", "- Identifying anomalous/incorrect image labels\n", "- Finding examples of scenarios of interest\n", "\n", "**So, what's the takeaway?**\n", "\n", "Combing through individual images in a dataset and staring at aggregate performance metrics trying to figure out how to improve the performance of a model is an ineffective and time-consuming process.\n", "\n", "Visualizing your dataset in a low-dimensional embedding space is a powerful workflow that can reveal patterns and clusters in your data that can answer important questions about the critical failure modes of your model. It can also help you automate workflows like finding label mistakes and efficiently pre-annotating data.\n", "\n", "FiftyOne also supports visualizing embeddings for object detections. Stay tuned for an upcoming tutorial on the topic!"
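] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you'd like to experiment before then, the cell below (not executed here) sketches the idea: [compute_visualization()](https://voxel51.com/docs/fiftyone/api/fiftyone.brain.html#fiftyone.brain.compute_visualization) also accepts a `patches_field` parameter that embeds each object patch rather than each image. The field name `detections` and the `brain_key` are illustrative placeholders:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A minimal sketch of visualizing object embeddings rather than image\n", "# embeddings (assumes detections are stored in a `detections` field)\n", "object_results = fob.compute_visualization(\n", "    dataset,\n", "    patches_field=\"detections\",\n", "    num_dims=2,\n", "    brain_key=\"object_viz\",\n", "    verbose=True,\n", "    seed=51,\n", ")"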
] } ], "metadata": { "accelerator": "GPU", "colab": { "authorship_tag": "ABX9TyMAfZvkbsTal90uy2ThxFem", "collapsed_sections": [], "name": "malaria-cell-images-analysis.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.0" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "307700e692704d128c9000235a6f17c5": { "model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "100%", "description_tooltip": null, "layout": "IPY_MODEL_411eb725c43c4adf9340a0a27ef061da", "max": 256198016, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_9e103a1d535a40fd8f417c8689884b8b", "value": 256198016 } }, "411eb725c43c4adf9340a0a27ef061da": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "451754fff86644b5912b33c58ac5981a": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "49ec2b75718b473fa00541d7fb50259a": { "model_module": "@jupyter-widgets/base", "model_name": 
"LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "53e8b0f53dbe492592b21c6c288ca914": { "model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_95c0356316274d999df93162d27da01d", "IPY_MODEL_e4499c59e6f8453699ba1921a82dc780" ], "layout": "IPY_MODEL_49ec2b75718b473fa00541d7fb50259a" } }, "702a1184a1314468ba935e167a863cf8": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "7281666eb2864344bdf7d8bf897cadaa": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, 
"min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "8547c80a5f2645fea1d61c9a6d8cec16": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "931041a5074542d192a1533f944ee789": { "model_module": "@jupyter-widgets/controls", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "95c0356316274d999df93162d27da01d": { "model_module": "@jupyter-widgets/controls", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "100%", "description_tooltip": null, "layout": "IPY_MODEL_7281666eb2864344bdf7d8bf897cadaa", "max": 256198016, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_ed07f0250541454cbbd39c83b01aae6d", "value": 256198016 } }, "9e103a1d535a40fd8f417c8689884b8b": { "model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "initial" } }, "c0b89280a59d4f31aec03b9361fcffdd": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_702a1184a1314468ba935e167a863cf8", "placeholder": "​", "style": "IPY_MODEL_8547c80a5f2645fea1d61c9a6d8cec16", "value": " 244M/244M [00:06<00:00, 40.7MB/s]" } }, "c6c98e77502045719bc02b0a116e01a5": { "model_module": "@jupyter-widgets/base", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, 
"margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "dfe11b89f27a43ef9db67ccb96d3bad2": { "model_module": "@jupyter-widgets/controls", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_307700e692704d128c9000235a6f17c5", "IPY_MODEL_c0b89280a59d4f31aec03b9361fcffdd" ], "layout": "IPY_MODEL_c6c98e77502045719bc02b0a116e01a5" } }, "e4499c59e6f8453699ba1921a82dc780": { "model_module": "@jupyter-widgets/controls", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_451754fff86644b5912b33c58ac5981a", "placeholder": "​", "style": "IPY_MODEL_931041a5074542d192a1533f944ee789", "value": " 244M/244M [00:06<00:00, 42.2MB/s]" } }, "ed07f0250541454cbbd39c83b01aae6d": { "model_module": "@jupyter-widgets/controls", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "initial" } } } } }, "nbformat": 4, "nbformat_minor": 4 }