{ "cells": [ { "cell_type": "markdown", "id": "6769e2a1", "metadata": {}, "source": [ "[![image](https://raw.githubusercontent.com/visual-layer/visuallayer/main/imgs/vl_horizontal_logo.png)](https://www.visual-layer.com)" ] }, { "cell_type": "markdown", "id": "pN6wiKBax7Pa", "metadata": { "id": "pN6wiKBax7Pa", "tags": [] }, "source": [ "# Quick Dataset Analysis\n", "\n", "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb)\n", "[![Open in Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb)\n", "\n", "This notebook shows how to quickly analyze an image dataset for potential issues using fastdup. We'll take you on a high level tour showcasing the core functions of fastdup in the shortest time.\n", "\n", "By the end of this notebook you will learn how to find out if your dataset has issues such as:\n", "\n", "+ Broken images.\n", "+ Duplicates/near-duplicates.\n", "+ Outliers.\n", "+ Dark/bright/blurry images.\n", "\n", "We'll also visualize clusters of visually similar looking images to let you have a birds eye view on your dataset." ] }, { "cell_type": "markdown", "id": "c0727302-dbe5-46b3-a5ff-b039811a7e7e", "metadata": { "tags": [] }, "source": [ "## Installation\n", "\n", "If you're new, we encourage you to run the notebook in [Google Colab](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb) or [Kaggle](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/quick-dataset-analysis.ipynb) for the best experience. If you'd like to just view and skim through the notebook, we recommend viewing using [nbviewer](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb). \n", "\n", "Let's start with the installation:" ] }, { "cell_type": "code", "execution_count": 1, "id": "8e6dd3e6-0f72-456b-9b16-2e53d5d5c099", "metadata": {}, "outputs": [], "source": [ "!pip install fastdup -Uq" ] }, { "cell_type": "markdown", "id": "488abfbf", "metadata": {}, "source": [ "Now, test the installation by printing out the version. If there's no error message, we are ready to go!" ] }, { "cell_type": "code", "execution_count": 2, "id": "e301485f", "metadata": { "id": "e301485f", "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/usr/bin/dpkg\n" ] }, { "data": { "text/plain": [ "'1.23'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import fastdup\n", "fastdup.__version__" ] }, { "cell_type": "markdown", "id": "2d30a901-4ba8-48cf-9a2f-37e0f70fa1ae", "metadata": { "tags": [] }, "source": [ "## Download Dataset\n", "\n", "For demonstration, we will use a widely available and well curated dataset [Oxford IIIT Pet dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/). For that reason we might not find a lot of issues here but feel free to swap this dataset with your own.\n", "\n", "The dataset consists of images and annotations for 37 category pet with roughly 200 images for each class. For now, we are only interested in finding issues in the images and not the annotations. More on annotations in the upcoming [notebook](./analyzing-image-classification-dataset.ipynb).\n", "\n", "Let's download only from the dataset and extract them into the local directory:" ] }, { "cell_type": "code", "execution_count": null, "id": "d91abfc1", "metadata": {}, "outputs": [], "source": [ "!wget https://thor.robots.ox.ac.uk/~vgg/data/pets/images.tar.gz -O images.tar.gz\n", "!tar xf images.tar.gz" ] }, { "cell_type": "markdown", "id": "8cd8a7da-2e05-4c38-aa37-33fd466a61e2", "metadata": { "tags": [] }, "source": [ "## Run fastdup\n", "\n", "Once the extraction completes, we can run fastdup on the images.\n", "\n", "For that let's create a `fastdup` object and specify the input directory which points to the folder of images." ] }, { "cell_type": "code", "execution_count": 3, "id": "fe4d8211-89b2-4a2f-91f4-8074d2314aef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.\n" ] } ], "source": [ "fd = fastdup.create(input_dir=\"images/\")" ] }, { "cell_type": "markdown", "id": "4acb64a1-ab06-4fa2-8111-65b5d4f2a335", "metadata": {}, "source": [ "The `.create` method also has an optional `work_dir` parameter which specifies the directory to store artifacts from the run.\n", "\n", "You can optionally run `fastdup.create(work_dir=\"my_work_dir/\", input_dir=\"images/\")` if you'd like to store the artifacts in a specific working directory." ] }, { "cell_type": "code", "execution_count": 4, "id": "beac4c50-3084-47fe-9b22-b14c3d3cb139", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.\n", "2023-07-11 13:16:29 [INFO] Going to loop over dir images\n", "2023-07-11 13:16:29 [INFO] Found total 7390 images to run on, 7390 train, 0 test, name list 7390, counter 7390 \n", "2023-07-11 13:16:29 [ERROR] Failed to read image images/Abyssinian_34.jpgtes\n", "2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_139.jpgs\n", "2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_145.jpg\n", "2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_167.jpg\n", "2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_177.jpgs\n", "2023-07-11 13:16:34 [ERROR] Failed to read image images/Egyptian_Mau_191.jpg\n", "2023-07-11 13:16:45 [INFO] Found total 7390 images to run ontimated: 0 Minutes\n", "Finished histogram 1.707\n", "Finished bucket sort 1.726\n", "2023-07-11 13:16:45 [INFO] 138) Finished write_index() NN model\n", "2023-07-11 13:16:45 [INFO] Stored nn model index file work_dir/nnf.index\n", "2023-07-11 13:16:45 [INFO] Total time took 16247 ms\n", "2023-07-11 13:16:45 [INFO] Found a total of 120 fully identical images (d>0.990), which are 0.81 %\n", "2023-07-11 13:16:45 [INFO] Found a total of 8 nearly identical images(d>0.980), which are 0.05 %\n", "2023-07-11 13:16:45 [INFO] Found a total of 1006 above threshold images (d>0.900), which are 6.81 %\n", "2023-07-11 13:16:45 [INFO] Found a total of 739 outlier images (d<0.050), which are 5.00 %\n", "2023-07-11 13:16:45 [INFO] Min distance found 0.597 max distance 1.000\n", "2023-07-11 13:16:45 [INFO] Running connected components for ccthreshold 0.960000 \n", ".0\n", " ########################################################################################\n", "\n", "Dataset Analysis Summary: \n", "\n", " Dataset contains 7390 images\n", " Valid images are 99.92% (7,384) of the data, invalid are 0.08% (6) of the data\n", " For a detailed analysis, use `.invalid_instances()`.\n", "\n", " Similarity: 1.00% (74) belong to 3 similarity clusters (components).\n", " 99.00% (7,316) images do not belong to any similarity cluster.\n", " Largest cluster has 12 (0.16%) images.\n", " For a detailed analysis, use `.connected_components()`\n", "(similarity threshold used is 0.9, connected component threshold used is 0.96).\n", "\n", " Outliers: 6.14% (454) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n", " For a detailed list of outliers, use `.outliers()`.\n" ] }, { "data": { "text/plain": [ "0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.run()" ] }, { "cell_type": "markdown", "id": "24b9d94d-7458-42f0-bf77-1b33491279f2", "metadata": {}, "source": [ "## View Run Summary" ] }, { "cell_type": "code", "execution_count": 5, "id": "b546398f-e555-42b7-83ad-fd9ba9286d41", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " ########################################################################################\n", "\n", "Dataset Analysis Summary: \n", "\n", " Dataset contains 7390 images\n", " Valid images are 99.92% (7,384) of the data, invalid are 0.08% (6) of the data\n", " For a detailed analysis, use `.invalid_instances()`.\n", "\n", " Similarity: 1.00% (74) belong to 3 similarity clusters (components).\n", " 99.00% (7,316) images do not belong to any similarity cluster.\n", " Largest cluster has 12 (0.16%) images.\n", " For a detailed analysis, use `.connected_components()`\n", "(similarity threshold used is 0.9, connected component threshold used is 0.96).\n", "\n", " Outliers: 6.14% (454) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n", " For a detailed list of outliers, use `.outliers()`.\n" ] }, { "data": { "text/plain": [ "['Dataset contains 7390 images',\n", " 'Valid images are 99.92% (7,384) of the data, invalid are 0.08% (6) of the data',\n", " 'For a detailed analysis, use `.invalid_instances()`.\\n',\n", " 'Similarity: 1.00% (74) belong to 3 similarity clusters (components).',\n", " '99.00% (7,316) images do not belong to any similarity cluster.',\n", " 'Largest cluster has 12 (0.16%) images.',\n", " 'For a detailed analysis, use `.connected_components()`\\n(similarity threshold used is 0.9, connected component threshold used is 0.96).\\n',\n", " 'Outliers: 6.14% (454) of images are possible outliers, and fall in the bottom 5.00% of similarity values.',\n", " 'For a detailed list of outliers, use `.outliers()`.']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.summary()" ] }, { "cell_type": "markdown", "id": "9cde5da4-960b-469e-bba2-32736c5131f8", "metadata": { "id": "67205fab", "tags": [] }, "source": [ "## Invalid Images\n", "From the logs printed above, we see there are a few invalid images. These are broken images that cannot be read.\n", "\n", "You can get a list of broken images with:" ] }, { "cell_type": "code", "execution_count": 6, "id": "883435db-3097-4449-ab1a-c522d48edbd9", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
filenameindexerror_codeis_validfd_index
0images/Abyssinian_34.jpg135ERROR_CORRUPT_IMAGEFalse135
1images/Egyptian_Mau_139.jpg2240ERROR_CORRUPT_IMAGEFalse2240
2images/Egyptian_Mau_145.jpg2247ERROR_CORRUPT_IMAGEFalse2247
3images/Egyptian_Mau_167.jpg2268ERROR_CORRUPT_IMAGEFalse2268
4images/Egyptian_Mau_177.jpg2278ERROR_CORRUPT_IMAGEFalse2278
5images/Egyptian_Mau_191.jpg2293ERROR_CORRUPT_IMAGEFalse2293
\n", "
" ], "text/plain": [ " filename index error_code is_valid fd_index\n", "0 images/Abyssinian_34.jpg 135 ERROR_CORRUPT_IMAGE False 135\n", "1 images/Egyptian_Mau_139.jpg 2240 ERROR_CORRUPT_IMAGE False 2240\n", "2 images/Egyptian_Mau_145.jpg 2247 ERROR_CORRUPT_IMAGE False 2247\n", "3 images/Egyptian_Mau_167.jpg 2268 ERROR_CORRUPT_IMAGE False 2268\n", "4 images/Egyptian_Mau_177.jpg 2278 ERROR_CORRUPT_IMAGE False 2278\n", "5 images/Egyptian_Mau_191.jpg 2293 ERROR_CORRUPT_IMAGE False 2293" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.invalid_instances()" ] }, { "cell_type": "markdown", "id": "22e04b25-0fe7-409d-8bd9-3b92c2ec8c5b", "metadata": {}, "source": [ "## Duplicate/Near-duplicates\n", "\n", "One of the lowest hanging fruits in cleaning a dataset is finding and eliminating duplicates.\n", "\n", "fastdup provides a handy way of visualizing duplicates/near-duplicates using the `duplicates_gallery` method. The `Distance` value indicates how visually similar are the image pairs in the gallery. A `Distance` of `1.0` indicates an exact copy and vice-versa." ] }, { "cell_type": "code", "execution_count": 7, "id": "27b091e6-fffa-4701-8a9a-19b7b087314a", "metadata": { "scrolled": false, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 197.91it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Stored similarity visual view in work_dir/galleries/duplicates.html\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Duplicates Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Duplicates Report

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/Bombay_190.jpg
To/Bombay_185.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/newfoundland_154.jpg
To/newfoundland_138.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/Bombay_201.jpg
To/Bombay_92.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/Bombay_32.jpg
To/Bombay_194.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/Bombay_202.jpg
To/Bombay_99.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/Egyptian_Mau_210.jpg
To/Egyptian_Mau_41.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/Bombay_189.jpg
To/Bombay_164.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/english_cocker_spaniel_176.jpg
To/english_cocker_spaniel_179.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/Bombay_198.jpg
To/Bombay_69.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/english_cocker_spaniel_163.jpg
To/english_cocker_spaniel_152.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/english_cocker_spaniel_151.jpg
To/english_cocker_spaniel_162.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/english_cocker_spaniel_164.jpg
To/english_cocker_spaniel_154.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/Bombay_109.jpg
To/Bombay_206.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/Bombay_126.jpg
To/Bombay_220.jpg
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.duplicates_gallery()" ] }, { "cell_type": "markdown", "id": "530988f2-a98e-4516-90e1-0d94bcac9951", "metadata": {}, "source": [ "## Outliers\n", "\n", "Similar to duplicate pairs, you can visualize potential outliers in your dataset with:" ] }, { "cell_type": "code", "execution_count": 8, "id": "7d83835b-0223-445f-9700-052fc4ca58a1", "metadata": { "scrolled": false, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 24556.81it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Stored outliers visual view in work_dir/galleries/outliers.html\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Outliers Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Outliers Report

Showing image outliers, one per row

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.597075
Path/Bengal_105.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.624279
Path/beagle_142.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.629087
Path/staffordshire_bull_terrier_51.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.629917
Path/american_pit_bull_terrier_72.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.633318
Path/german_shorthaired_173.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.633533
Path/miniature_pinscher_76.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.634925
Path/Bengal_131.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.636669
Path/staffordshire_bull_terrier_76.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.639585
Path/chihuahua_6.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.642
Path/basset_hound_197.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.643355
Path/boxer_149.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.643534
Path/beagle_147.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.64548
Path/Bombay_188.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.645831
Path/Bombay_204.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.653168
Path/Bombay_36.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.6535
Path/Abyssinian_226.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.654307
Path/miniature_pinscher_191.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.660908
Path/chihuahua_164.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.661223
Path/german_shorthaired_121.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.668792
Path/Maine_Coon_194.jpg
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.outliers_gallery() " ] }, { "cell_type": "markdown", "id": "789da241-e9cd-4568-9d19-aa5c80567415", "metadata": {}, "source": [ "## Dark, Bright and Blurry Images\n", "\n", "fastdup also lets you visualize images from your dataset using statistical metrics.\n", "\n", "For example, with `metric='dark'` we can visualize the darkest images from the dataset." ] }, { "cell_type": "code", "execution_count": 9, "id": "292bdd75-5df0-4617-bd1e-8bcdd147e215", "metadata": { "scrolled": false, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 288.94it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Stored mean visual view in work_dir/galleries/mean.html\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Dark Image Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Dark Image Report

Showing example images, sort by ascending order

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean15.7118
filenameimages/Abyssinian_4.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean18.7883
filenameimages/Abyssinian_114.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean19.5741
filenameimages/Abyssinian_18.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean19.8396
filenameimages/Bombay_191.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean26.7209
filenameimages/Bombay_108.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean27.4072
filenameimages/Abyssinian_62.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean28.5051
filenameimages/scottish_terrier_171.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean29.4029
filenameimages/Sphynx_119.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean29.9286
filenameimages/Maine_Coon_134.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean31.4749
filenameimages/shiba_inu_137.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean31.599
filenameimages/chihuahua_78.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean32.7848
filenameimages/shiba_inu_27.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean33.2283
filenameimages/Egyptian_Mau_59.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean33.7525
filenameimages/japanese_chin_175.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean33.7692
filenameimages/beagle_180.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean33.9768
filenameimages/Abyssinian_30.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean34.0113
filenameimages/american_bulldog_150.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean34.3895
filenameimages/Abyssinian_46.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean34.8092
filenameimages/Sphynx_46.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean35.634
filenameimages/japanese_chin_40.jpg
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.stats_gallery(metric='dark')" ] }, { "cell_type": "code", "execution_count": 10, "id": "6e4cd628-ee7b-4eb9-b2d0-bde4f0beb22d", "metadata": { "scrolled": false, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 333.70it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Stored mean visual view in work_dir/galleries/mean.html\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Bright Image Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Bright Image Report

Showing example images, sort by descending order

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean235.6992
filenameimages/saint_bernard_183.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean234.3785
filenameimages/saint_bernard_188.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean233.4722
filenameimages/Egyptian_Mau_99.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean232.2554
filenameimages/saint_bernard_186.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean230.1848
filenameimages/Abyssinian_127.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean226.9057
filenameimages/saint_bernard_187.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean226.3688
filenameimages/British_Shorthair_274.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean223.6878
filenameimages/Egyptian_Mau_1.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean223.2687
filenameimages/great_pyrenees_88.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean220.246
filenameimages/Bengal_20.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean218.5597
filenameimages/pug_76.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean217.9169
filenameimages/Egyptian_Mau_39.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean216.7688
filenameimages/Maine_Coon_267.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean214.4495
filenameimages/staffordshire_bull_terrier_25.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean213.1254
filenameimages/Birman_136.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean212.3259
filenameimages/basset_hound_24.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean211.3064
filenameimages/boxer_172.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean211.2815
filenameimages/saint_bernard_14.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean211.1101
filenameimages/pug_96.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
mean210.7337
filenameimages/Egyptian_Mau_45.jpg
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.stats_gallery(metric='bright')" ] }, { "cell_type": "code", "execution_count": 11, "id": "aeb3a18e-1c2e-4ce4-94c0-61cdddae0619", "metadata": { "scrolled": false, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 805.61it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Stored blur visual view in work_dir/galleries/blur.html\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Blurry Image Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Blurry Image Report

Showing example images, sort by ascending order

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur65.1586
filenameimages/Persian_228.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur68.6347
filenameimages/Ragdoll_254.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur71.8926
filenameimages/pomeranian_170.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur76.9661
filenameimages/pomeranian_183.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur77.3129
filenameimages/pug_166.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur77.8375
filenameimages/Ragdoll_255.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur79.21
filenameimages/yorkshire_terrier_123.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur83.2725
filenameimages/pomeranian_166.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur88.556
filenameimages/pomeranian_123.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur91.0464
filenameimages/chihuahua_124.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur93.68
filenameimages/chihuahua_161.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur96.0024
filenameimages/pomeranian_117.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur99.3509
filenameimages/pomeranian_176.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur104.3721
filenameimages/chihuahua_187.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur105.5227
filenameimages/Siamese_250.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur108.3876
filenameimages/Persian_260.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur111.6988
filenameimages/pomeranian_173.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur115.5061
filenameimages/Bombay_85.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur115.5061
filenameimages/Bombay_200.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
blur115.8939
filenameimages/pomeranian_172.jpg
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.stats_gallery(metric='blur')" ] }, { "cell_type": "markdown", "id": "a6808750-d5d7-44bc-a6b0-aa985255407b", "metadata": { "tags": [] }, "source": [ "## Image Clusters\n", "\n", "One of fastdup's coolest feature is visualizing image clusters. In the previous section we saw how to visualize similar image pairs. In this section we group similar looking image (or even duplicates) as a cluster and visualize them in gallery.\n", "\n", "To do so, simply run:\n", "\n" ] }, { "cell_type": "markdown", "id": "2bcd09c1", "metadata": {}, "source": [ "> **Note**: fastdup uses default parameter values when creating image clusters. Depending on your data and use case, the best value may vary. Read more [here](https://visual-layer.readme.io/docs/dataset-cleanup) on how to change parameter values to cluster images." ] }, { "cell_type": "code", "execution_count": 12, "id": "2cc2b317-e92e-4e40-9655-0a6b7c569dfa", "metadata": { "scrolled": false, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 113.13it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Finished OK. Components are stored as image files work_dir/galleries/components_[index].jpg\n", "Stored components visual view in work_dir/galleries/components.html\n", "Execution time in seconds 0.7\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Components Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Components Report

Showing groups of similar images

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component1397
num_images3
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component1404
num_images3
mean_distance0.9658
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component21
num_images2
mean_distance0.9681
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3437
num_images2
mean_distance0.9999
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3551
num_images2
mean_distance0.9998
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3550
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3548
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3457
num_images2
mean_distance0.9999
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3450
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3449
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3398
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3417
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3592
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3394
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3036
num_images2
mean_distance0.9999
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component2959
num_images2
mean_distance0.9604
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component2847
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component2845
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3591
num_images2
mean_distance0.9997
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component3626
num_images2
mean_distance1.0
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.component_gallery()" ] }, { "cell_type": "markdown", "id": "6c3135e1", "metadata": {}, "source": [ "## Wrap Up\n", "\n", "That's a wrap! In this notebook we showed how you can run fastdup on a dataset or any folder of images. \n", "\n", "We've seen how to use fastdup to find:\n", "\n", "+ Broken images.\n", "+ Duplicate/near-duplicates.\n", "+ Outliers.\n", "+ Dark, bright and blurry images.\n", "+ Image clusters.\n", "\n", "Next, feel free to check out other tutorials -\n", "\n", "+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!\n", "+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.\n", "+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze for potential issues. If you have labeled ImageNet-style folder structure, have a go!\n", "+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try. \n" ] }, { "cell_type": "markdown", "id": "e24081cb", "metadata": {}, "source": [ "\n", "## VL Profiler\n", "If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com) - VL Profiler is our first no-code commercial product that lets you visualize and inspect your dataset in your browser. \n", "\n", "[Sign up](https://app.visual-layer.com) now, it's free.\n", "\n", "[![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/vl_profiler_promo.svg)](https://app.visual-layer.com)\n", "\n", "As usual, feedback is welcome! \n", "\n", "Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues)." ] }, { "cell_type": "markdown", "id": "47c317f9", "metadata": {}, "source": [] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 5 }