<!-- Autogenerated by `scripts/make_examples.py` -->
<table align="left">
 <td>
 <a target="_blank" href="https://colab.research.google.com/github/voxel51/fiftyone-examples/blob/master/examples/zilliz_advent_of_code.ipynb">
 <img src="https://user-images.githubusercontent.com/25985824/104791629-6e618700-5769-11eb-857f-d176b37d2496.png" height="32" width="32">
 Try in Google Colab
 </a>
 </td>
 <td>
 <a target="_blank" href="https://nbviewer.jupyter.org/github/voxel51/fiftyone-examples/blob/master/examples/zilliz_advent_of_code.ipynb">
 <img src="https://user-images.githubusercontent.com/25985824/104791634-6efa1d80-5769-11eb-8a4c-71d6cb53ccf0.png" height="32" width="32">
 Share via nbviewer
 </a>
 </td>
 <td>
 <a target="_blank" href="https://github.com/voxel51/fiftyone-examples/blob/master/examples/zilliz_advent_of_code.ipynb">
 <img src="https://user-images.githubusercontent.com/25985824/104791633-6efa1d80-5769-11eb-8ee3-4b2123fe4b66.png" height="32" width="32">
 View on GitHub
 </a>
 </td>
 <td>
 <a href="https://github.com/voxel51/fiftyone-examples/raw/master/examples/zilliz_advent_of_code.ipynb" download>
 <img src="https://user-images.githubusercontent.com/25985824/104792428-60f9cc00-576c-11eb-95a4-5709d803023a.png" height="32" width="32">
 Download notebook
 </a>
 </td>
</table>


# FiftyOne <> Zilliz Advent of Open Source 2023!

Welcome to day 2 of Zilliz's [Advent of Code for Open Source](https://zilliz.com/blog/advent-of-code-for-open-source) 2023! Today we're going to be looking at the [FiftyOne](https://github.com/voxel51/fiftyone) library, which is a Python package for curation, visualization and analysis of machine learning datasets. It's a great tool for exploring datasets and debugging models, and it's also a great way to get started with machine learning if you're new to the field.

In this notebook, we'll show you how to load a dataset, visualize it, and augment the dataset with generative AI. Along the way, you'll get a whirlwind tour of some of FiftyOne's features, and you'll learn how to use it to explore your own datasets.

## Installation

For this walkthrough, we'll be using a few libraries in addition to FiftyOne:

- [torch](https://pytorch.org/) as our machine learning framework
- [diffusers](https://huggingface.co/docs/diffusers/index) from Hugging Face for generative AI
- [umap-learn](https://umap-learn.readthedocs.io/en/latest/) for dimensionality reduction

```python
!pip install fiftyone torch diffusers==0.24.0 umap-learn
```

Now we're ready to get started!

## Loading a dataset

Let's import FiftyOne and some of its modules:

```python
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.brain as fob
from fiftyone import ViewField as F
```

The FiftyOne Zoo contains a collection of common [datasets](https://docs.voxel51.com/user_guide/dataset_zoo/index.html) (MNIST, CIFAR, COCO, ...) and [models](https://docs.voxel51.com/user_guide/model_zoo/index.html) (YOLO, CLIP, SAM, DINO, ...) that you can load with a single line of code.

💡 You can list all available datasets and models with `foz.list_zoo_datasets()` and `foz.list_zoo_models()`.

The [FiftyOne Brain](https://docs.voxel51.com/user_guide/brain.html) contains machine learning methods that you can apply to better understand your data. 

And the `ViewField` will make it easy for us to programmatically filter our dataset.

For this walkthrough, we'll be using a subset of the Caltech-101 dataset, which contains 101 categories of objects, with 40 to 800 images per category. First, we'll load the dataset:

```python
caltech101 = foz.load_zoo_dataset("caltech101")
```

We can print out the dataset to see what it contains:

```python
caltech101
```

```
Name: caltech101
Media type: image
Num samples: 9145
Persistent: False
Tags: []
Sample fields:
 id: fiftyone.core.fields.ObjectIdField
 filepath: fiftyone.core.fields.StringField
 tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
 metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
 ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
```

We can launch the FiftyOne App to explore the dataset:

```python
# Launch the App, which you can manually open in a browser tab at http://localhost:5151
session = fo.launch_app(caltech101, auto=False)

# Alternatively, launch the App in the cell output
# session = fo.launch_app(caltech101)
```

## Filtering the dataset

While the App is running, you can click on any image to see its label and metadata. You can also filter the dataset by label, view images in a grid or list, and more. Here, we'll only be interested in a subset of the Caltech-101 dataset — namely the images falling into any of the following categories: `["ibis", "flamingo", "emu", "pigeon"]`. We can filter the dataset to only include these categories:

```python
# Create a `View` of the dataset containing only the four bird classes
classes = ["ibis", "flamingo", "emu", "pigeon"]
bird_view = caltech101.match_labels(filter=F("label").is_in(classes))
```

We can print out this `DatasetView` to see that it contains only the images we're interested in, and we can set the view of our session to see the subset:

```python
print(bird_view)
session.view = bird_view
```

```
Dataset: caltech101
Media type: image
Num samples: 245
Sample fields:
 id: fiftyone.core.fields.ObjectIdField
 filepath: fiftyone.core.fields.StringField
 tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
 metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
 ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
View stages:
 1. MatchLabels(labels=None, ids=None, tags=None, filter={'$in': ['$$FIELD.label', [...]]}, fields=None, bool=True)
```


Here, we can see that this subset contains 245 images. We can also see that the filter we have applied is represented internally as a `ViewStage`. FiftyOne has a powerful query language that allows you to filter your dataset in many different ways. You can learn more about it in the [User Guide](https://docs.voxel51.com/user_guide/using_datasets.html#datasetviews), and in the [Views](https://docs.voxel51.com/cheat_sheets/views_cheat_sheet.html) and [Filtering](https://docs.voxel51.com/cheat_sheets/filtering_cheat_sheet.html) cheat sheets. You can also get a quick list of the view stage methods available to you by calling a `Dataset` or `DatasetView`'s `list_view_stages()` method.
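Conceptually, the `match_labels(filter=F("label").is_in(classes))` stage keeps only the samples whose `ground_truth` label is one of the four bird classes. The plain-Python sketch below mimics that selection on a few hypothetical sample dicts (the file names, the non-bird labels, and the `match_label_in` helper are made up for illustration). It only shows the semantics; FiftyOne itself translates view stages into database queries, as the `'$in'` expression in the printed view suggests.

```python
# Hypothetical samples standing in for FiftyOne samples; only the
# "ground_truth" label matters for this filter.
samples = [
    {"filepath": "img_0001.jpg", "ground_truth": {"label": "ibis"}},
    {"filepath": "img_0002.jpg", "ground_truth": {"label": "camera"}},
    {"filepath": "img_0003.jpg", "ground_truth": {"label": "flamingo"}},
    {"filepath": "img_0004.jpg", "ground_truth": {"label": "emu"}},
    {"filepath": "img_0005.jpg", "ground_truth": {"label": "pigeon"}},
    {"filepath": "img_0006.jpg", "ground_truth": {"label": "chair"}},
]

classes = ["ibis", "flamingo", "emu", "pigeon"]


def match_label_in(samples, field, allowed):
    """Keep samples whose label in `field` is one of `allowed` (like `is_in`)."""
    allowed = set(allowed)  # set membership is O(1) per lookup
    return [s for s in samples if s[field]["label"] in allowed]


bird_samples = match_label_in(samples, "ground_truth", classes)
print([s["filepath"] for s in bird_samples])
```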

Now that we've isolated the subset of interest, let's create a new dataset containing only the images in this subset using the `clone()` method:

```python
dataset = bird_view.shuffle().clone(name="caltech-birds", persistent=True)
```

Here, we have cloned the view into a new dataset called `caltech-birds`, and we've made the dataset persistent so that it is not deleted when our Python session ends. We also threw in a `shuffle()` call to randomize the order of the samples, just for fun.

```python
print(dataset)
```

```
Name: caltech-birds
Media type: image
Num samples: 245
Persistent: True
Tags: []
Sample fields:
 id: fiftyone.core.fields.ObjectIdField
 filepath: fiftyone.core.fields.StringField
 tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
 metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)
 ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)
```


## Multimodal Magic

### Semantic Search

Now that we have our dataset, we're ready to bring on the machine learning. Given that it is unlikely that a model has been trained on precisely the four classes we have selected, we'll use a multimodal foundation model, which will bring open-world knowledge to the table. In particular, we'll use the [CLIP](https://openai.com/blog/clip/) model from OpenAI. CLIP is integrated into the FiftyOne Model Zoo, so we can use it without installing anything else.

The first thing we'll do is create a multimodal similarity index on the dataset with the FiftyOne Brain's `compute_similarity()` method, specifying the CLIP model by its zoo name. Because CLIP embeds images and text into a shared space, this index will let us search by text or by image!

💡 For datasets with a large number of samples, it is helpful to use a vector database. FiftyOne has native integrations with [Milvus](https://docs.voxel51.com/integrations/milvus.html), [Qdrant](https://docs.voxel51.com/integrations/qdrant.html), [Pinecone](https://docs.voxel51.com/integrations/pinecone.html), and [LanceDB](https://docs.voxel51.com/integrations/lancedb.html)!

```python
fob.compute_similarity(dataset, brain_key="clip_sim", model="clip-vit-base32-torch")
```

```
Computing embeddings...
 100% |█████████████████| 245/245 [11.8s elapsed, 0s remaining, 21.6 samples/s] 

<fiftyone.brain.internal.core.sklearn.SklearnSimilarityIndex at 0x2a3181eb0>
```

💡 Once you have the FiftyOne Core Plugins installed, you can also compute similarity in the FiftyOne App!

Now you can search the dataset by images or text, either in the App or programmatically. Here's an example of searching by text in Python:

```python
pink_flamingos = dataset.sort_by_similarity("pink flamingo", k=25)
session.view = pink_flamingos
```

You can also achieve the same effect in the App (after refreshing) by clicking on the magnifying glass icon in the menu bar and typing in a search query!

To search by image, select an image in the App and press the image icon in the menu bar.
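Under the hood, a text search like this one embeds the query with the same CLIP model and ranks the stored image embeddings by how similar they are to the query embedding. The sketch below shows that ranking step with cosine similarity (a common choice for CLIP embeddings) on tiny made-up 3-D vectors; real CLIP ViT-B/32 embeddings have 512 dimensions. The vectors and the `top_k` helper are hypothetical and not part of the FiftyOne API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, index, k):
    """Return the ids of the k index entries most similar to `query`."""
    scored = [(cosine_similarity(query, emb), sample_id) for sample_id, emb in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [sample_id for _, sample_id in scored[:k]]


# Made-up image embeddings keyed by sample id (stand-ins for CLIP vectors).
index = {
    "flamingo_1": [0.9, 0.1, 0.0],
    "flamingo_2": [0.8, 0.2, 0.1],
    "ibis_1": [0.4, 0.8, 0.1],
    "emu_1": [0.0, 0.1, 0.9],
}

# Made-up embedding of the text query "pink flamingo".
query = [1.0, 0.0, 0.0]
print(top_k(query, index, k=2))
```

A vector database such as Milvus performs the same kind of nearest-neighbor lookup, but with approximate indexes that scale far beyond a brute-force loop like this one.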

### Zero-Shot Classification

In a similar vein, we can run zero-shot classification on the dataset. For this, we'll use a [FiftyOne Plugin](https://voxel51.com/plugins/). Plugins are modular extensions to FiftyOne that provide additional functionality. There are already a ton of [existing plugins](https://github.com/voxel51/fiftyone-plugins) for a variety of use cases and workflows, and you can also write your own!

We'll use the FiftyOne command-line interface (CLI) to install the [Zero Shot Prediction plugin](https://github.com/jacobmarks/zero-shot-prediction-plugin):

```python
!fiftyone plugins download https://github.com/jacobmarks/zero-shot-prediction-plugin
```

While we're at it, we will install two more plugins that we'll use shortly — the [FiftyOne Core Plugins](https://github.com/voxel51/fiftyone-plugins#core-plugins) (which wrap SDK functionality in simple UIs), and a [Text-to-Image plugin](https://github.com/jacobmarks/text-to-image) (which we will use to add synthetic images to our dataset):

```python
!fiftyone plugins download https://github.com/voxel51/fiftyone-plugins
```

```python
!fiftyone plugins download https://github.com/jacobmarks/text-to-image
```

Refresh the App and hit the "~" button on your keyboard to bring up a long list of "operators". Select the `zero_shot_classify` option, and a dynamic form will appear with many options for configuring the zero-shot classification.

Depending on which Python packages you have installed, you'll see different options in the `Classification Model` dropdown. You can specify class names directly or via a text file. In our case, since there are only four classes, we'll paste them as a comma-separated list: `ibis, flamingo, emu, pigeon`.

There are only 245 images in the dataset, so this zero-shot classification runs in seconds. If you have a larger dataset, consider [delegating execution](https://docs.voxel51.com/plugins/using_plugins.html#delegated-operations).
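Zero-shot classification with CLIP follows the same idea as text search: each class name is turned into a text prompt, the prompts are embedded, and each image is assigned the class whose prompt embedding is most similar to the image embedding. The sketch below shows that decision rule on made-up 3-D vectors. The prompt template, the vectors, and the `classify_by_similarity` helper are illustrative assumptions, not the plugin's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


classes = ["ibis", "flamingo", "emu", "pigeon"]
# Assumed prompt template; these strings are what a text encoder would embed.
prompts = [f"a photo of a {c}" for c in classes]

# Made-up embeddings of the prompts above (real CLIP vectors are 512-D).
text_embeddings = {
    "ibis": [0.5, 0.8, 0.1],
    "flamingo": [0.9, 0.3, 0.0],
    "emu": [0.0, 0.2, 0.9],
    "pigeon": [0.1, 0.9, 0.4],
}


def classify_by_similarity(image_embedding, text_embeddings):
    """Return (label, score) for the class whose text embedding is closest."""
    scores = {
        label: cosine_similarity(image_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    label = max(scores, key=scores.get)
    return label, scores[label]


label, score = classify_by_similarity([0.85, 0.35, 0.05], text_embeddings)
print(label, round(score, 3))
```

No training on our four classes is involved: swapping in a different class list only changes which prompts get embedded.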

## Evaluating a model

Now that we have ground truth labels and (zero-shot) predictions, we can evaluate the quality of our model's predictions. FiftyOne provides an [Evaluation API](https://docs.voxel51.com/user_guide/evaluation.html) that covers classification, detection, and segmentation tasks. For classification tasks like ours, we would use the `evaluate_classifications()` method (see the [evaluation user guide](https://docs.voxel51.com/user_guide/evaluation.html)). However, now that we have installed the FiftyOne Evaluation plugin, which ships with the FiftyOne Core Plugins, we can do this directly from the App!

Press the "~" button on your keyboard again, and select `evaluate_model` from the list of operators. Then fill out the dynamic form, and hit `Execute`!

Now in the sidebar on the left, you'll see a new field with the name that you assigned to the "Evaluation key" in the form. Expanding this, you'll see `True` and `False` value counts, and clicking on these will show you the images that were correctly and incorrectly classified, respectively.

The CLIP model did very well on the whole, but the majority of the errors were between the `ibis` and `flamingo` classes, which is understandable given the similarity between the two. If we wanted to improve the model, we could collect more data for these classes... or we could use generative AI to augment our dataset!
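The evaluation boils down to comparing each prediction with its ground truth label. The sketch below reproduces, on a handful of hypothetical labels, the per-sample `True`/`False` correctness flags, the overall accuracy, and a count of which (ground truth, prediction) pairs get confused, which is how an `ibis`/`flamingo` mix-up would show up. The label lists are made up; in Python, FiftyOne's `evaluate_classifications()` computes these statistics for you, and its results object's `print_report()` prints a per-class report.

```python
from collections import Counter

# Hypothetical (ground truth, prediction) pairs for eight samples.
ground_truth = ["ibis", "flamingo", "emu", "pigeon", "ibis", "flamingo", "ibis", "emu"]
predictions = ["ibis", "flamingo", "emu", "pigeon", "flamingo", "ibis", "flamingo", "emu"]

# Per-sample correctness, like the True/False values under the evaluation key.
correct = [gt == pred for gt, pred in zip(ground_truth, predictions)]
accuracy = sum(correct) / len(correct)

# Count only the mistakes, keyed by (ground truth, prediction).
confusions = Counter(
    (gt, pred) for gt, pred in zip(ground_truth, predictions) if gt != pred
)

print(f"accuracy = {accuracy:.3f}")
for (gt, pred), n in confusions.most_common():
    print(f"{gt} -> {pred}: {n}")
```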

## Augmenting the dataset

Now that we know where the model is struggling, we can use generative AI to augment our dataset with synthetic images. If we wanted to generate very high quality images from text prompts, we could use a state-of-the-art model like Stable Diffusion XL. In the spirit of the Advent of Code, however, we'll use a much simpler model that generates images in a matter of seconds: a Latent Consistency Model called [LCM_Dreamshaper_v7](https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7), run with Hugging Face's [Diffusers](https://huggingface.co/docs/diffusers/index) library. This model is integrated into the FiftyOne Text-to-Image plugin that we installed earlier, so we can use it without installing anything else.

Here are a few prompts that you can try out:

- "Close-up of a standing flamingo in shallow water at sunset."
- "Group of flying ibises over a wetland against a blue sky."
- "Resting flamingo lying on the ground, head tucked in feathers."
- "Juvenile ibis on a branch, showing brown and white transitioning plumage."
- "Flamingo feeding in a lake, beak underwater with ripple effects."
- "Ibis foraging in an urban park with city buildings in the background."
- "Side view of flying flamingos at sunset, legs and necks extended."
- "Close-up of an ibis's head, focusing on eye and beak details."
- "Panoramic of flamingos in a coastal habitat during low tide."
- "Ibis in a rainy wetland, raindrops visible on feathers."

Go back to the App, and press the "~" button on your keyboard again. This time, select `txt2img` from the list of operators. Fill out the dynamic form, and hit `Execute`! The plugin will call the model, generate an image from the text prompt, and add it to the dataset. You can repeat this as many times as you like, with as many prompts as you like and with varied hyperparameters. If you have `replicate` or `openai` accounts and API keys, you can also use the models exposed by those APIs!

The images will be added directly to the dataset, at the bottom of the sample grid. You may also notice that there are some new fields in the sidebar: `prompt`, `model`, `date_created`, and model configuration fields. These are automatically added by the plugin, and you can use them to filter the dataset to only include synthetic images, or to only include images generated by a particular model.
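Because only the generated samples carry the plugin's `model` field, separating synthetic from real images comes down to checking whether that field is populated. In the SDK, a view such as `dataset.exists("model")` (or `dataset.exists("model", False)` for the real images) should do this. The plain-Python sketch below shows the same split on hypothetical sample dicts; the file names, prompts, and model name are made up.

```python
# Hypothetical samples: generated ones carry "prompt" and "model" fields,
# while real Caltech-101 images leave them unset (None).
samples = [
    {"filepath": "real_ibis.jpg", "model": None, "prompt": None},
    {"filepath": "real_flamingo.jpg", "model": None, "prompt": None},
    {"filepath": "gen_0.png", "model": "lcm-dreamshaper-v7", "prompt": "Ibis in a rainy wetland"},
    {"filepath": "gen_1.png", "model": "lcm-dreamshaper-v7", "prompt": "Flamingo feeding in a lake"},
]

synthetic = [s for s in samples if s.get("model") is not None]
real = [s for s in samples if s.get("model") is None]

# Group the synthetic samples by the (hypothetical) model that produced them.
by_model = {}
for s in synthetic:
    by_model.setdefault(s["model"], []).append(s["filepath"])

print(len(real), len(synthetic), by_model)
```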

💡 You can also call the `txt2img` operator [from Python](https://github.com/jacobmarks/text-to-image#python-sdk)!

## Comparing Real and Synthetic Images

The last stop on our whirlwind tour of FiftyOne's functionality takes us back to the FiftyOne Brain. We will use the `compute_visualization()` method to generate embeddings for our samples, use UMAP to reduce the dimensionality of the embeddings, and then visualize the embeddings in two dimensions. We can run this method from the App, or [programmatically](https://docs.voxel51.com/api/fiftyone.brain.html#fiftyone.brain.compute_visualization).
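`compute_visualization()` takes one embedding per sample and projects the embeddings down to 2D so they can be plotted. Programmatically, this should correspond to a call along the lines of `fob.compute_visualization(dataset, model="clip-vit-base32-torch", method="umap", brain_key="img_viz")`, where the brain key name is our own choice. UMAP itself is too involved for a short example, so the sketch below swaps in a much simpler technique, principal component analysis (PCA) computed by power iteration, to show the general shape of the step: N high-dimensional vectors in, N 2D points out. The toy vectors and the `pca_2d` helper are assumptions for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


def power_iteration(m, iters=500, seed=0):
    """Dominant eigenvector of a symmetric positive semi-definite matrix `m`."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in m]
    for _ in range(iters):
        w = matvec(m, v)
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project rows onto their first two principal components (PCA, not UMAP)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    pc1 = power_iteration(cov)
    lam1 = dot(pc1, matvec(cov, pc1))  # Rayleigh quotient = top eigenvalue
    # Deflate so that the next power iteration finds the second component.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)] for i in range(d)]
    pc2 = power_iteration(deflated, seed=1)
    return [(dot(r, pc1), dot(r, pc2)) for r in centered]


# Toy 4-D "embeddings": three real images and two synthetic ones.
real = [[1.0, 0.9, 0.1, 0.0], [0.9, 1.0, 0.0, 0.1], [1.1, 0.8, 0.2, 0.1]]
synthetic = [[0.1, 0.2, 1.0, 0.9], [0.0, 0.1, 0.9, 1.1]]
points = pca_2d(real + synthetic)
for label, (x, y) in zip(["real"] * 3 + ["synthetic"] * 2, points):
    print(f"{label:9s} x={x:+.2f} y={y:+.2f}")
```

PCA is linear and preserves global variance, whereas UMAP is non-linear and focuses on preserving local neighborhoods, which is why it tends to produce tighter, more separated clusters in the App's embeddings plot.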

Once we've run the method, we can visualize the embeddings by clicking the "+" button in the menu bar and selecting `Embeddings`. We can then select the name of the brain run that we just created to see a 2D visualization of the embeddings, and we can color the points by any field in the dataset.

If we color by the `model` field, we can see how the synthetic images compare to the real ones.

On the other hand, if we color by the ground truth label, we can see how the embedding model separates the different classes.
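Coloring the plot by a field amounts to grouping the 2D points by that field's value. To put a number on what the plot shows, one option (our own suggestion, not something the App computes) is to compare the centroid of each group. The sketch below does this on made-up 2D coordinates labeled with a hypothetical `"real"`/`"synthetic"` group; a large gap between centroids, relative to the spread within each group, means the synthetic images form their own cluster.

```python
import math
from collections import defaultdict

# Made-up 2D coordinates (as produced by the visualization step), each paired
# with the value of the field used for coloring.
points = [
    ((2.1, 0.3), "real"), ((1.8, -0.2), "real"), ((2.4, 0.1), "real"),
    ((-1.9, 0.4), "synthetic"), ((-2.2, -0.1), "synthetic"),
]


def centroids(points):
    """Mean 2D position of the points in each color group."""
    groups = defaultdict(list)
    for (x, y), group in points:
        groups[group].append((x, y))
    return {
        g: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
        for g, pts in groups.items()
    }


c = centroids(points)
gap = math.dist(c["real"], c["synthetic"])  # Euclidean distance (Python 3.8+)
print(c, round(gap, 2))
```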

## Conclusion

One takeaway from this exercise is that augmenting your dataset with synthetically generated images isn't always a quick fix. The new images may look realistic to the human eye, but there may be subtle differences that models can pick up on. In this case, we can see that the new images of ibises and flamingos are being clustered together, and are not part of the original clusters of ibises and flamingos, which is not what we want. We could try to fix this by generating more images, or by using a more powerful model, but we could also try to fix it by changing the prompts that we use to generate the images. For example, we could try to generate images of ibises and flamingos in different poses, or in different environments.
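One way to vary poses and environments systematically is to build prompts from a small grid of subjects, poses, and settings. The sketch below does this with `itertools.product`; the word lists, the template, and the `make_prompt` helper are illustrative choices, and each resulting string could then be passed to the `txt2img` operator, either from the App or from Python.

```python
from itertools import product

# Illustrative word lists; swap in your own subjects, poses, and settings.
subjects = ["flamingo", "ibis"]
poses = ["standing on one leg", "in flight", "wading"]
settings = ["in a salt lagoon at dawn", "in a city park"]


def make_prompt(subject, pose, setting):
    """Fill a simple template, choosing "a" or "an" for the subject."""
    article = "an" if subject[0] in "aeiou" else "a"
    return f"Photo of {article} {subject} {pose} {setting}."


# Every combination of subject x pose x setting (2 * 3 * 2 = 12 prompts).
prompts = [make_prompt(s, p, e) for s, p, e in product(subjects, poses, settings)]

print(len(prompts))
for prompt in prompts[:3]:
    print(prompt)
```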

The meta-takeaway is that FiftyOne streamlines the process of exploring your data and debugging your models, and it enables you to iterate quickly on your ideas. The FiftyOne query language makes it easy to filter your dataset, the FiftyOne Brain allows you to apply machine learning methods to your data, the FiftyOne App makes it easy to visualize your data and share your findings with others, and the FiftyOne Plugins make it easy to extend FiftyOne's functionality to suit your needs. We hope you've enjoyed this walkthrough, and we hope you'll give FiftyOne a try on your own datasets!

## 🚀 Next Steps

If you'd like to learn more about FiftyOne, here are some resources to get you started:

- [FiftyOne Documentation](https://voxel51.com/docs/fiftyone/)
- [FiftyOne Community Slack](https://slack.voxel51.com/)

We also have a complete [Getting Started with FiftyOne Workshop](https://voxel51.com/computer-vision-events/fiftyone-workshop-dec-6/) taking place on December 6th at 8AM PT / 11AM ET / 5PM CET. We hope to see you there!

For all things Advent of Code, join the [Advent of Code Discord](https://discord.com/invite/7hwQAHgKMS)!

- On deck: [Day 3: quivr](https://github.com/StanGirard/quivr)
- In the hole: [Day 4: haystack](https://github.com/deepset-ai/haystack)

And don't forget to check out [Milvus](https://github.com/milvus-io/milvus), the open-source vector database from Zilliz!