{ "cells": [ { "cell_type": "markdown", "id": "201bf295-3c31-4348-9429-893dcab6be94", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "id": "pN6wiKBax7Pa", "metadata": { "id": "pN6wiKBax7Pa", "tags": [] }, "source": [ "# Quickstart - Analyze Dataset for Potential Issues\n", "\n", "[![Open in Colab](https://img.shields.io/badge/Open%20in%20Colab-blue?style=for-the-badge&logo=google-colab&labelColor=gray)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/quickstart.ipynb)\n", "[![Open in Kaggle](https://img.shields.io/badge/Open%20in%20Kaggle-blue?style=for-the-badge&logo=kaggle&labelColor=gray)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/quickstart.ipynb)\n", "[![Explore the Docs](https://img.shields.io/badge/Explore%20the%20Docs-blue?style=for-the-badge&labelColor=gray&logo=read-the-docs)](https://visual-layer.readme.io/docs/quickstart)\n", "\n", "This notebook shows how to quickly analyze an image dataset for potential issues using [fastdup](https://github.com/visual-layer/fastdup). We'll take you on a high-level tour showcasing the core functions of fastdup in the shortest time.\n", "\n", "By the end of this notebook, you will learn how to find out if your dataset has issues such as:\n", "\n", "+ Broken images.\n", "+ Duplicates/near-duplicates.\n", "+ Outliers.\n", "+ Dark/bright/blurry images.\n", "\n", "We'll also visualize clusters of visually similar images to provide a bird's-eye view and help you understand the data's structure for further analysis." 
] }, { "cell_type": "markdown", "id": "c0727302-dbe5-46b3-a5ff-b039811a7e7e", "metadata": { "tags": [] }, "source": [ "## Installation\n", "First, let's start with the installation:\n", "\n", "> ✅ **Tip** - If you're new to fastdup, we encourage you to run the notebook in [Google Colab](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb) or [Kaggle](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/quick-dataset-analysis.ipynb) for the best experience. If you'd just like to skim through the notebook, we recommend viewing it with [nbviewer](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb). \n", "\n" ] }, { "cell_type": "code", "execution_count": null, "id": "8e6dd3e6-0f72-456b-9b16-2e53d5d5c099", "metadata": {}, "outputs": [], "source": [ "!pip install fastdup -Uq" ] }, { "cell_type": "markdown", "id": "488abfbf", "metadata": {}, "source": [ "Now, test the installation by printing the version. If there's no error message, we are ready to go!" ] }, { "cell_type": "code", "execution_count": 1, "id": "e301485f", "metadata": { "id": "e301485f", "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'2.0.17'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import fastdup\n", "fastdup.__version__" ] }, { "cell_type": "markdown", "id": "2d30a901-4ba8-48cf-9a2f-37e0f70fa1ae", "metadata": { "tags": [] }, "source": [ "## Download Dataset\n", "\n", "For demonstration, we will use the generally well-curated [Oxford IIIT Pet dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/). Feel free to swap this dataset with your own.\n", "\n", "The dataset consists of images and annotations for 37 pet categories, with roughly 200 images per class. \n", "\n", "> 🗒 **Note** - fastdup works on both unlabeled and labeled images. 
But for now, we are only interested in finding issues in the images and not the annotations. \n", "> If you're interested in finding annotation issues, head to:\n", "> + 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb)\n", "> + 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb).\n", "\n", "\n", "Let's download only the images from the dataset and extract them into the local directory:" ] }, { "cell_type": "code", "execution_count": null, "id": "d91abfc1", "metadata": {}, "outputs": [], "source": [ "!wget https://thor.robots.ox.ac.uk/~vgg/data/pets/images.tar.gz -O images.tar.gz\n", "!tar xf images.tar.gz" ] }, { "cell_type": "markdown", "id": "8cd8a7da-2e05-4c38-aa37-33fd466a61e2", "metadata": { "tags": [] }, "source": [ "## Run fastdup\n", "\n", "Once the extraction completes, we can run fastdup on the images.\n", "\n", "To do that, let's initialize fastdup and specify the input directory, which points to the folder of images." ] }, { "cell_type": "code", "execution_count": 2, "id": "fe4d8211-89b2-4a2f-91f4-8074d2314aef", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "fastdup By Visual Layer, Inc. 2024. All rights reserved.\n", "\n", "A fastdup dataset object was created!\n", "\n", "Input directory is set to \u001b[0;35m\"images\"\u001b[0m\n", "Work directory is set to \u001b[0;35m\"work_dir\"\u001b[0m\n", "\n", "The next steps are:\n", " 1. Analyze your dataset with the \u001b[0;35m.run()\u001b[0m function of the dataset object\n", " 2. 
Interactively explore your data on your local machine with the \u001b[0;35m.explore()\u001b[0m function of the dataset object\n", "\n", "For more information, use \u001b[0;35mhelp(fastdup)\u001b[0m or check our documentation [link].\n", "\n" ] } ], "source": [ "fd = fastdup.create(input_dir=\"images/\")" ] }, { "cell_type": "markdown", "id": "4acb64a1-ab06-4fa2-8111-65b5d4f2a335", "metadata": {}, "source": [ "> 🗒 **Note** - The `.create` method also has an optional `work_dir` parameter which specifies the directory to store artifacts from the run.\n", "\n", "In other words you can run `fastdup.create(input_dir=\"images/\", work_dir=\"my_work_dir/\")` if you'd like to store the artifacts in a `my_work_dir`.\n", "\n", "Now, let's run fastdup." ] }, { "cell_type": "code", "execution_count": 14, "id": "beac4c50-3084-47fe-9b22-b14c3d3cb139", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "fastdup By Visual Layer, Inc. 2024. All rights reserved.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Initializing data [■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 100% Estimated: 0 Minutes\r" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Done: 100%|██████████████████████████████████████████████| 3/3 [01:20<00:00, 26.86s/it]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Analysis complete. 
Use the \u001b[0;35m.explore()\u001b[0m function to interactively explore your data on your local machine.\n", "\n", "Alternatively, you can generate HTML-based galleries.\n", "For more information, use \u001b[0;35mhelp(fastdup)\u001b[0m or check our documentation [link].\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/plain": [ "0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.run()" ] }, { "cell_type": "markdown", "id": "24b9d94d-7458-42f0-bf77-1b33491279f2", "metadata": {}, "source": [ "## View Run Summary\n", "\n", "After the run is completed, you can optionally view the summary with:" ] }, { "cell_type": "code", "execution_count": 15, "id": "b546398f-e555-42b7-83ad-fd9ba9286d41", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " ########################################################################################\n", "\n", "Dataset Analysis Summary: \n", "\n", " Dataset contains 7390 images\n", " Valid images are 99.92% (7,384) of the data, invalid are 0.08% (6) of the data\n", " For a detailed analysis, use `.invalid_instances()`.\n", "\n", " Components: failed to find images clustered into components, try to run with lower cc_threshold.\n", " Outliers: 6.14% (454) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n", " For a detailed list of outliers, use `.outliers()`.\n", "\n" ] }, { "data": { "text/plain": [ "['Dataset contains 7390 images',\n", " 'Valid images are 99.92% (7,384) of the data, invalid are 0.08% (6) of the data',\n", " 'For a detailed analysis, use `.invalid_instances()`.\\n',\n", " 'Components: failed to find images clustered into components, try to run with lower cc_threshold.',\n", " 'Outliers: 6.14% (454) of images are possible outliers, and fall in the bottom 5.00% of similarity values.',\n", " 'For a detailed list of outliers, use 
`.outliers()`.\\n']" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.summary()" ] }, { "cell_type": "markdown", "id": "9cde5da4-960b-469e-bba2-32736c5131f8", "metadata": { "id": "67205fab", "tags": [] }, "source": [ "## Invalid Images\n", "From the summary above, we see there are a few invalid images. These are broken images that cannot be read.\n", "\n", "You can get a list of broken images with:" ] }, { "cell_type": "code", "execution_count": 5, "id": "883435db-3097-4449-ab1a-c522d48edbd9", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
|  | filename | index | error_code | is_valid | fd_index |\n",
"| --- | --- | --- | --- | --- | --- |\n",
"| 136 | images/Abyssinian_34.jpg | 136 | ERROR_CORRUPT_IMAGE | False | 136 |\n",
"| 1042 | images/Egyptian_Mau_139.jpg | 1042 | ERROR_CORRUPT_IMAGE | False | 1042 |\n",
"| 1049 | images/Egyptian_Mau_145.jpg | 1049 | ERROR_CORRUPT_IMAGE | False | 1049 |\n",
"| 1070 | images/Egyptian_Mau_167.jpg | 1070 | ERROR_CORRUPT_IMAGE | False | 1070 |\n",
"| 1079 | images/Egyptian_Mau_177.jpg | 1079 | ERROR_CORRUPT_IMAGE | False | 1079 |\n",
"| 1095 | images/Egyptian_Mau_191.jpg | 1095 | ERROR_CORRUPT_IMAGE | False | 1095 |
\n", "
" ], "text/plain": [ " filename index error_code is_valid fd_index\n", "136 images/Abyssinian_34.jpg 136 ERROR_CORRUPT_IMAGE False 136\n", "1042 images/Egyptian_Mau_139.jpg 1042 ERROR_CORRUPT_IMAGE False 1042\n", "1049 images/Egyptian_Mau_145.jpg 1049 ERROR_CORRUPT_IMAGE False 1049\n", "1070 images/Egyptian_Mau_167.jpg 1070 ERROR_CORRUPT_IMAGE False 1070\n", "1079 images/Egyptian_Mau_177.jpg 1079 ERROR_CORRUPT_IMAGE False 1079\n", "1095 images/Egyptian_Mau_191.jpg 1095 ERROR_CORRUPT_IMAGE False 1095" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.invalid_instances()" ] }, { "cell_type": "markdown", "id": "22e04b25-0fe7-409d-8bd9-3b92c2ec8c5b", "metadata": {}, "source": [ "## Duplicate/Near-duplicates\n", "\n", "One of the lowest hanging fruits in cleaning a dataset is finding and eliminating duplicates.\n", "\n", "fastdup provides a handy way of visualizing duplicates/near-duplicates using the `duplicates_gallery` method. The `Distance` value indicates how visually similar are the image pairs in the gallery. A `Distance` of `1.0` indicates an exact copy and vice-versa." ] }, { "cell_type": "code", "execution_count": 6, "id": "27b091e6-fffa-4701-8a9a-19b7b087314a", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f0b35ad31c454d779888a71a79cd0b4d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Duplicates Report\n", " \n", " \n", "\n", "\n", "\n", "
Duplicates Report\n",
"\n",
"| Distance | From | To |\n",
"| --- | --- | --- |\n",
"| 1.0 | /Bombay_11.jpg | /Bombay_100.jpg |\n",
"| 1.0 | /Bombay_220.jpg | /Bombay_126.jpg |\n",
"| 1.0 | /Bombay_189.jpg | /Bombay_164.jpg |\n",
"| 1.0 | /Bombay_99.jpg | /Bombay_202.jpg |\n",
"| 1.0 | /Bombay_185.jpg | /Bombay_190.jpg |\n",
"| 1.0 | /boxer_82.jpg | /boxer_114.jpg |\n",
"| 1.0 | /Bombay_198.jpg | /Bombay_69.jpg |\n",
"| 1.0 | /newfoundland_137.jpg | /newfoundland_153.jpg |\n",
"| 1.0 | /Egyptian_Mau_183.jpg | /Egyptian_Mau_10.jpg |\n",
"| 1.0 | /keeshond_59.jpg | /keeshond_54.jpg |\n",
"| 1.0 | /Bombay_194.jpg | /Bombay_32.jpg |\n",
"| 1.0 | /Bombay_193.jpg | /Bombay_22.jpg |\n",
"| 1.0 | /Bombay_109.jpg | /Bombay_206.jpg |\n",
"| 1.0 | /newfoundland_152.jpg | /newfoundland_147.jpg |\n",
"| 1.0 | /Egyptian_Mau_224.jpg | /Egyptian_Mau_71.jpg |\n",
"
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.duplicates_gallery()" ] }, { "cell_type": "markdown", "id": "530988f2-a98e-4516-90e1-0d94bcac9951", "metadata": {}, "source": [ "## Outliers\n", "\n", "Similar to duplicate pairs, you can visualize potential outliers in your dataset with:" ] }, { "cell_type": "code", "execution_count": 7, "id": "7d83835b-0223-445f-9700-052fc4ca58a1", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "43b67a7169da4bb2bd7af9a32aef1cac", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Outliers Report\n", " \n", " \n", "\n", "\n", "\n", "
Outliers Report\n",
"\n",
"Showing image outliers, one per row\n",
"\n",
"| Distance | Path |\n",
"| --- | --- |\n",
"| 0.597075 | /Bengal_105.jpg |\n",
"| 0.616418 | /Sphynx_128.jpg |\n",
"| 0.624279 | /beagle_142.jpg |\n",
"| 0.629087 | /staffordshire_bull_terrier_51.jpg |\n",
"| 0.629917 | /american_pit_bull_terrier_72.jpg |\n",
"| 0.633318 | /german_shorthaired_173.jpg |\n",
"| 0.633533 | /miniature_pinscher_76.jpg |\n",
"| 0.634925 | /Bengal_131.jpg |\n",
"| 0.639585 | /chihuahua_6.jpg |\n",
"| 0.642 | /basset_hound_197.jpg |\n",
"| 0.643355 | /boxer_149.jpg |\n",
"| 0.643534 | /beagle_147.jpg |\n",
"| 0.645831 | /Bombay_204.jpg |\n",
"| 0.653168 | /Bombay_36.jpg |\n",
"| 0.6535 | /Abyssinian_226.jpg |\n",
"| 0.654307 | /miniature_pinscher_191.jpg |\n",
"| 0.655955 | /staffordshire_bull_terrier_76.jpg |\n",
"| 0.660908 | /chihuahua_164.jpg |\n",
"| 0.661223 | /german_shorthaired_121.jpg |\n",
"| 0.667204 | /Bombay_188.jpg |\n",
"
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.outliers_gallery() " ] }, { "cell_type": "markdown", "id": "789da241-e9cd-4568-9d19-aa5c80567415", "metadata": {}, "source": [ "## Dark, Bright and Blurry Images\n", "\n", "fastdup also lets you visualize images from your dataset using statistical metrics.\n", "\n", "For example, with `metric='dark'` we can visualize the darkest images from the dataset." ] }, { "cell_type": "code", "execution_count": 8, "id": "292bdd75-5df0-4617-bd1e-8bcdd147e215", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "013b326375884f1cadff1df4134f87ad", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Dark Image Report\n", " \n", " \n", "\n", "\n", "\n", "
Dark Image Report\n",
"\n",
"Showing example images, sort by ascending order\n",
"\n",
"| mean | filename |\n",
"| --- | --- |\n",
"| 15.7118 | images/Abyssinian_4.jpg |\n",
"| 18.7883 | images/Abyssinian_114.jpg |\n",
"| 19.5741 | images/Abyssinian_18.jpg |\n",
"| 19.8396 | images/Bombay_191.jpg |\n",
"| 26.7209 | images/Bombay_108.jpg |\n",
"| 27.4072 | images/Abyssinian_62.jpg |\n",
"| 28.5051 | images/scottish_terrier_171.jpg |\n",
"| 29.4029 | images/Sphynx_119.jpg |\n",
"| 29.9286 | images/Maine_Coon_134.jpg |\n",
"| 31.4749 | images/shiba_inu_137.jpg |\n",
"| 31.599 | images/chihuahua_78.jpg |\n",
"| 32.7848 | images/shiba_inu_27.jpg |\n",
"| 33.2283 | images/Egyptian_Mau_59.jpg |\n",
"| 33.7525 | images/japanese_chin_175.jpg |\n",
"| 33.7692 | images/beagle_180.jpg |\n",
"| 33.9768 | images/Abyssinian_30.jpg |\n",
"| 34.0113 | images/american_bulldog_150.jpg |\n",
"| 34.3895 | images/Abyssinian_46.jpg |\n",
"| 34.8092 | images/Sphynx_46.jpg |\n",
"| 35.634 | images/japanese_chin_40.jpg |\n",
"
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.stats_gallery(metric='dark')" ] }, { "cell_type": "code", "execution_count": 9, "id": "6e4cd628-ee7b-4eb9-b2d0-bde4f0beb22d", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9889a1605f854cfa81f5cc173daceed4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Bright Image Report\n", " \n", " \n", "\n", "\n", "\n", "
Bright Image Report\n",
"\n",
"Showing example images, sort by descending order\n",
"\n",
"| mean | filename |\n",
"| --- | --- |\n",
"| 235.6992 | images/saint_bernard_183.jpg |\n",
"| 234.3785 | images/saint_bernard_188.jpg |\n",
"| 233.4722 | images/Egyptian_Mau_99.jpg |\n",
"| 232.2554 | images/saint_bernard_186.jpg |\n",
"| 230.1848 | images/Abyssinian_127.jpg |\n",
"| 226.9057 | images/saint_bernard_187.jpg |\n",
"| 226.3688 | images/British_Shorthair_274.jpg |\n",
"| 223.6878 | images/Egyptian_Mau_1.jpg |\n",
"| 223.2687 | images/great_pyrenees_88.jpg |\n",
"| 220.246 | images/Bengal_20.jpg |\n",
"| 218.5597 | images/pug_76.jpg |\n",
"| 217.9169 | images/Egyptian_Mau_39.jpg |\n",
"| 216.7688 | images/Maine_Coon_267.jpg |\n",
"| 214.4495 | images/staffordshire_bull_terrier_25.jpg |\n",
"| 213.1254 | images/Birman_136.jpg |\n",
"| 212.3259 | images/basset_hound_24.jpg |\n",
"| 211.3064 | images/boxer_172.jpg |\n",
"| 211.2815 | images/saint_bernard_14.jpg |\n",
"| 211.1101 | images/pug_96.jpg |\n",
"| 210.7337 | images/Egyptian_Mau_45.jpg |\n",
"
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.stats_gallery(metric='bright')" ] }, { "cell_type": "code", "execution_count": 10, "id": "aeb3a18e-1c2e-4ce4-94c0-61cdddae0619", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4f14b06bbd194f29a73ff5231a596550", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Blurry Image Report\n", " \n", " \n", "\n", "\n", "\n", "

Blurry Image Report

Showing example images, sort by ascending order

\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur65.1586
filenameimages/Persian_228.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur68.6347
filenameimages/Ragdoll_254.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur71.8926
filenameimages/pomeranian_170.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur76.9661
filenameimages/pomeranian_183.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur77.3129
filenameimages/pug_166.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur77.8375
filenameimages/Ragdoll_255.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur79.21
filenameimages/yorkshire_terrier_123.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur83.2725
filenameimages/pomeranian_166.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur88.556
filenameimages/pomeranian_123.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur91.0464
filenameimages/chihuahua_124.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur93.68
filenameimages/chihuahua_161.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur96.0024
filenameimages/pomeranian_117.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur99.3509
filenameimages/pomeranian_176.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur104.3721
filenameimages/chihuahua_187.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur105.5227
filenameimages/Siamese_250.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur108.3876
filenameimages/Persian_260.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur111.6988
filenameimages/pomeranian_173.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur113.5611
filenameimages/pomeranian_172.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur115.5061
filenameimages/Bombay_85.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur115.5061
filenameimages/Bombay_200.jpg
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.stats_gallery(metric='blur')" ] }, { "cell_type": "markdown", "id": "a6808750-d5d7-44bc-a6b0-aa985255407b", "metadata": { "tags": [] }, "source": [ "## Visualize Image Clusters\n", "\n", "One of fastdup's coolest features is visualizing image clusters. In the previous section, we saw how to visualize similar image pairs. In this section, we group similar-looking images (or even duplicates) as a cluster and visualize them in the gallery.\n", "\n", "To do so, run:\n", "\n" ] }, { "cell_type": "markdown", "id": "2bcd09c1", "metadata": {}, "source": [ "> **Note**: fastdup uses default parameter values when creating image clusters. Depending on your data and use case, the best value may vary. Read more [here](https://visual-layer.readme.io/docs/dataset-cleanup) on how to change parameter values to cluster images." ] }, { "cell_type": "code", "execution_count": 11, "id": "2cc2b317-e92e-4e40-9655-0a6b7c569dfa", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cc097b9e33aa43febbe2b6950de2ce9e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Components Report\n", " \n", " \n", "\n", "\n", "\n", "

Components Report

Showing groups of similar images

\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component599
num_images3
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component606
num_images3
mean_distance0.965826
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component12
num_images2
mean_distance0.968095
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4593
num_images2
mean_distance0.999985
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4656
num_images2
mean_distance0.999894
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4649
num_images2
mean_distance0.999973
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4648
num_images2
mean_distance0.999969
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4636
num_images2
mean_distance0.999947
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4616
num_images2
mean_distance0.999967
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4597
num_images2
mean_distance0.999982
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4158
num_images2
mean_distance0.96041
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4235
num_images2
mean_distance0.99994
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4749
num_images2
mean_distance0.999988
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4046
num_images2
mean_distance0.999951
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4044
num_images2
mean_distance0.999976
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4013
num_images2
mean_distance0.999985
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3642
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3621
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4747
num_images2
mean_distance0.999986
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component4790
num_images2
mean_distance0.999705
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.component_gallery()" ] }, { "cell_type": "markdown", "id": "98a0333c", "metadata": {}, "source": [ "## Interactive Exploration\n", "\n", "In addition to the static visualizations presented above, fastdup also offers interactive exploration of the dataset.\n", "\n", "To explore the dataset and its issues interactively in a browser, run:" ] }, { "cell_type": "code", "execution_count": null, "id": "1f1c8b89-cf96-4130-b09e-b257904445d1", "metadata": {}, "outputs": [], "source": [ "fd.explore()" ] }, { "cell_type": "markdown", "id": "609b7114-9bae-46f5-be4d-0b86c920770e", "metadata": {}, "source": [ "> 🗒 **Note** - The interactive exploration currently requires you to sign up (for free). Alternatively, you can explore the results non-interactively using fastdup's built-in galleries shown in the cells above.\n", "\n", "You'll be presented with a web interface that lets you conveniently view, filter, and curate your dataset.\n", "\n", "\n", "![image.png](https://vl-blog.s3.us-east-2.amazonaws.com/fastdup_assets/cloud_preview.gif)" ] }, { "cell_type": "markdown", "id": "6c3135e1", "metadata": {}, "source": [ "## Wrap Up\n", "\n", "That's a wrap! In this notebook, we showed how you can run fastdup on a dataset or any folder of images. 
\n", "\n", "We've seen how to use fastdup to find:\n", "\n", "+ Broken images.\n", "+ Duplicates/near-duplicates.\n", "+ Outliers.\n", "+ Dark, bright, and blurry images.\n", "+ Image clusters.\n", "\n", "Next, feel free to check out other tutorials:\n", "\n", "+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset, and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!\n", "+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze a folder of images, clean it of potential issues, and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.\n", "+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze it for potential issues. If you have a labeled dataset in an ImageNet-style folder structure, have a go!\n", "+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze them for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try. \n", "\n", "As usual, feedback is welcome! Questions? 
Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues).\n" ] }, { "cell_type": "markdown", "id": "6034a6ad-2aa2-454e-ad2d-bd320e7fe6bb", "metadata": {}, "source": [ "
\n", "
\n", "
\n", "
\n", " \"logo\"\n", "
Copyright © 2024 Visual Layer. All rights reserved.
\n", "
\n", "\n", "
" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.1.undefined" } }, "nbformat": 4, "nbformat_minor": 5 }