{ "cells": [ { "cell_type": "markdown", "id": "6fc7a410", "metadata": {}, "source": [ "[![image](https://raw.githubusercontent.com/visual-layer/visuallayer/main/imgs/vl_horizontal_logo.png)](https://www.visual-layer.com)" ] }, { "cell_type": "markdown", "id": "SwSYWR4vzk_e", "metadata": { "id": "SwSYWR4vzk_e", "tags": [] }, "source": [ "# Analyzing Image Classification Dataset\n", "\n", "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb)\n", "[![Open in Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb)\n", "\n", "This notebook shows how you can use [fastdup](https://github.com/visual-layer/fastdup) to analyze an image classification dataset for:\n", "\n", "+ Duplicates\n", "+ Outliers\n", "+ Wrong labels\n", "+ Image clusters\n", "\n", "\n", "> **Note** - No GPU needed! You can run this notebook on a CPU-only instance.\n", "\n" ] }, { "cell_type": "markdown", "id": "bbed0117-e8d1-4df6-b8b7-7bcce10b8655", "metadata": { "tags": [] }, "source": [ "## Installation\n", "\n", "First let's install [fastdup](https://github.com/visual-layer/fastdup) from PyPI with:" ] }, { "cell_type": "code", "execution_count": 1, "id": "506e82b4-a1c2-4262-a326-d0924bb018b6", "metadata": { "id": "506e82b4-a1c2-4262-a326-d0924bb018b6" }, "outputs": [], "source": [ "!pip install -Uq fastdup" ] }, { "cell_type": "markdown", "id": "a5c3a1ab", "metadata": {}, "source": [ "Now, test the installation. If there's no error message, we are ready to go." 
] }, { "cell_type": "code", "execution_count": 2, "id": "7f69d8b2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/usr/bin/dpkg\n" ] }, { "data": { "text/plain": [ "'1.26'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import fastdup\n", "fastdup.__version__" ] }, { "cell_type": "markdown", "id": "8a79fb1b-b089-4d4d-8fa8-3e2b2ef7f886", "metadata": { "id": "8a79fb1b-b089-4d4d-8fa8-3e2b2ef7f886", "tags": [] }, "source": [ "## Download Dataset\n", "\n", "We will analyze the [Imagenette](https://github.com/fastai/imagenette) dataset - a subset of 10 easily classified classes from Imagenet (tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute)." ] }, { "cell_type": "code", "execution_count": null, "id": "be5b7ca5-34f5-4a0f-b081-2e78be6a425a", "metadata": {}, "outputs": [], "source": [ "!wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz\n", "!tar -xf imagenette2-160.tgz" ] }, { "cell_type": "markdown", "id": "f01586fe-db75-4154-aa15-9ea2709c9461", "metadata": { "id": "f01586fe-db75-4154-aa15-9ea2709c9461" }, "source": [ "## Load and Format Annotations" ] }, { "cell_type": "code", "execution_count": 3, "id": "ff90fe31-7c39-46c5-8c58-3ae349fbcc91", "metadata": { "executionInfo": { "elapsed": 949, "status": "ok", "timestamp": 1677666765166, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "ff90fe31-7c39-46c5-8c58-3ae349fbcc91", "tags": [] }, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 4, "id": "21d2474d-3fa5-4148-a0f1-ea8d55d63b85", "metadata": { "executionInfo": { "elapsed": 2, "status": "ok", "timestamp": 1677666768281, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "21d2474d-3fa5-4148-a0f1-ea8d55d63b85", "tags": [] }, "outputs": [], "source": [ 
"data_dir = 'imagenette2-160/'\n", "csv_path = 'imagenette2-160/noisy_imagenette.csv'" ] }, { "cell_type": "code", "execution_count": 5, "id": "2cb91ccb-9cb6-42ba-9489-96182eccc583", "metadata": { "executionInfo": { "elapsed": 2, "status": "ok", "timestamp": 1677666769859, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "2cb91ccb-9cb6-42ba-9489-96182eccc583", "tags": [] }, "outputs": [], "source": [ "label_map = {\n", " 'n02979186': 'cassette_player', \n", " 'n03417042': 'garbage_truck', \n", " 'n01440764': 'tench', \n", " 'n02102040': 'English_springer', \n", " 'n03028079': 'church',\n", " 'n03888257': 'parachute', \n", " 'n03394916': 'French_horn', \n", " 'n03000684': 'chain_saw', \n", " 'n03445777': 'golf_ball', \n", " 'n03425413': 'gas_pump'\n", "}" ] }, { "cell_type": "markdown", "id": "8aba34e1", "metadata": {}, "source": [ "Load the annotations provided with the dataset." ] }, { "cell_type": "code", "execution_count": 6, "id": "e2e90600-b02d-4a2a-a348-7b67157f9129", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "executionInfo": { "elapsed": 2, "status": "ok", "timestamp": 1677666769859, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "e2e90600-b02d-4a2a-a348-7b67157f9129", "outputId": "f9f72c0d-f613-4aac-d29c-3646b2301dcb", "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " path noisy_labels_0 noisy_labels_1 noisy_labels_5 noisy_labels_25 noisy_labels_50 is_valid\n", "0 train/n02979186/n02979186_9036.JPEG n02979186 n02979186 n02979186 n02979186 n02979186 False\n", "1 train/n02979186/n02979186_11957.JPEG n02979186 n02979186 n02979186 n02979186 n03000684 False\n", "2 train/n02979186/n02979186_9715.JPEG n02979186 n02979186 n02979186 n03417042 n03000684 False" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_annot = pd.read_csv(csv_path)\n", "df_annot.head(3)" ] }, { "cell_type": "markdown", "id": "dfc957bf", "metadata": {}, "source": [ "Transform the annotations into the format fastdup supports.\n", "\n", "fastdup expects an annotation `DataFrame` that contains the following columns:\n", "\n", "+ filename - the path to the image file\n", "+ label - the label of the image\n", "+ split - whether the image belongs to the training, validation, or test subset" ] }, { "cell_type": "code", "execution_count": 7, "id": "473185d1-89f5-4746-b87b-f2b3ef7c445b", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 424 }, "executionInfo": { "elapsed": 1012, "status": "ok", "timestamp": 1677666771201, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "473185d1-89f5-4746-b87b-f2b3ef7c445b", "outputId": "c09c986d-bcef-4545-8ceb-ee5196b40ee6", "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " filename label split\n", "0 imagenette2-160/train/n02979186/n02979186_9036.JPEG cassette_player train\n", "1 imagenette2-160/train/n02979186/n02979186_11957.JPEG cassette_player train\n", "2 imagenette2-160/train/n02979186/n02979186_9715.JPEG cassette_player train\n", "3 imagenette2-160/train/n02979186/n02979186_21736.JPEG cassette_player train\n", "4 imagenette2-160/train/n02979186/ILSVRC2012_val_00046953.JPEG cassette_player train\n", "... ... ... ...\n", "13389 imagenette2-160/val/n03425413/n03425413_17521.JPEG gas_pump val\n", "13390 imagenette2-160/val/n03425413/n03425413_20711.JPEG gas_pump val\n", "13391 imagenette2-160/val/n03425413/n03425413_19050.JPEG gas_pump val\n", "13392 imagenette2-160/val/n03425413/n03425413_13831.JPEG gas_pump val\n", "13393 imagenette2-160/val/n03425413/n03425413_1242.JPEG gas_pump val\n", "\n", "[13394 rows x 3 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# keep only the relevant columns\n", "df_annot = df_annot[['path', 'noisy_labels_0']]\n", "\n", "# rename columns to fastdup's column names\n", "df_annot = df_annot.rename({'noisy_labels_0': 'label', 'path': 'filename'}, axis='columns')\n", "\n", "# prepend data_dir to each path\n", "df_annot['filename'] = df_annot['filename'].apply(lambda x: data_dir + x)\n", "\n", "# create split column from the directory that follows data_dir (train or val)\n", "df_annot['split'] = df_annot['filename'].apply(lambda x: x.split(\"/\")[1])\n", "\n", "# map label ids to human-readable labels\n", "df_annot['label'] = df_annot['label'].map(label_map)\n", "\n", "# show formatted annotations\n", "df_annot" ] }, { "cell_type": "markdown", "id": "0c648ed1-5016-4230-9873-546eb510b764", "metadata": { "id": "0c648ed1-5016-4230-9873-546eb510b764" }, "source": [ "## Run fastdup\n", "\n", "With the images and annotations ready, we can proceed with running an analysis on the data." 
] }, { "cell_type": "markdown", "id": "0a39243e", "metadata": {}, "source": [ "+ `input_dir` is the path to the downloaded images\n", "+ `work_dir` is the path to store the artifacts from the analysis (optional)" ] }, { "cell_type": "code", "execution_count": 8, "id": "92a6e2f9-e60c-44c0-b48a-f7413f7594ae", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.\n", "FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.\n", "2023-07-13 19:22:31 [INFO] Going to loop over dir /tmp/tmpqm6imqyr.csv\n", "2023-07-13 19:22:31 [INFO] Found total 13394 images to run on, 13394 train, 0 test, name list 13394, counter 13394 \n", "2023-07-13 19:23:04 [INFO] Found total 13394 images to run onimated: 0 Minutes\n", "Finished histogram 3.121\n", "Finished bucket sort 3.151\n", "2023-07-13 19:23:04 [INFO] 544) Finished write_index() NN model\n", "2023-07-13 19:23:04 [INFO] Stored nn model index file work_dir/nnf.index\n", "2023-07-13 19:23:05 [INFO] Total time took 34024 ms\n", "2023-07-13 19:23:05 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %\n", "2023-07-13 19:23:05 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %\n", "2023-07-13 19:23:05 [INFO] Found a total of 16764 above threshold images (d>0.800), which are 62.58 %\n", "2023-07-13 19:23:05 [INFO] Found a total of 1339 outlier images (d<0.050), which are 5.00 %\n", "2023-07-13 19:23:05 [INFO] Min distance found 0.519 max distance 0.969\n", "2023-07-13 19:23:05 [INFO] Running connected components for ccthreshold 0.900000 \n", ".0\n", " ########################################################################################\n", "\n", "Dataset Analysis Summary: \n", "\n", " Dataset contains 13394 images\n", " Valid images are 100.00% (13,394) of the data, invalid are 0.00% (0) of the data\n", " 
Similarity: 3.11% (416) belong to 18 similarity clusters (components).\n", " 96.89% (12,978) images do not belong to any similarity cluster.\n", " Largest cluster has 562 (4.20%) images.\n", " For a detailed analysis, use `.connected_components()`\n", "(similarity threshold used is 0.8, connected component threshold used is 0.9).\n", "\n", " Outliers: 6.24% (836) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n", " For a detailed list of outliers, use `.outliers()`.\n" ] }, { "data": { "text/plain": [ "0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = fastdup.create(input_dir=data_dir) \n", "fd.run(annotations=df_annot, ccthreshold=0.9, threshold=0.8)" ] }, { "cell_type": "markdown", "id": "62e35a12-fadd-4b3f-bcab-69e6e67862a4", "metadata": {}, "source": [ "## Outliers\n", "\n", "Visualize outliers from the dataset." ] }, { "cell_type": "code", "execution_count": 9, "id": "b39ec702-3ea1-4afe-a948-f026ba8fcb47", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "executionInfo": { "elapsed": 2658, "status": "ok", "timestamp": 1677667336302, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "b39ec702-3ea1-4afe-a948-f026ba8fcb47", "outputId": "caa992d2-5267-408c-b44a-3a4a66e1ab5f", "scrolled": false, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 26723.82it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Stored outliers visual view in work_dir/galleries/outliers.html\n" ] }, { "name": "stderr", "output_type": "stream", 
"text": [ "\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Outliers Report\n", " \n", " \n", "\n", "\n", "\n", "
Outliers Report\n", "Showing image outliers, one per row\n", "\n", "Distance Path label\n", "0.523752 /train/n03445777/n03445777_5218.JPEG golf_ball\n", "0.57066 /train/n03888257/n03888257_34639.JPEG parachute\n", "0.578252 /train/n03445777/n03445777_3254.JPEG golf_ball\n", "0.58389 /val/n03445777/n03445777_5932.JPEG golf_ball\n", "0.599957 /train/n03888257/n03888257_79145.JPEG parachute\n", "0.605961 /train/n01440764/n01440764_5638.JPEG tench\n", "0.608525 /train/n03394916/n03394916_33663.JPEG French_horn\n", "0.609527 /train/n03888257/n03888257_7793.JPEG parachute\n", "0.611143 /val/n01440764/n01440764_4962.JPEG tench\n", "0.61373 /train/n03445777/n03445777_6033.JPEG golf_ball\n", "0.61618 /train/n03394916/n03394916_37544.JPEG French_horn\n", "0.616704 /val/n03888257/n03888257_11450.JPEG parachute\n", "0.616785 /val/n03445777/n03445777_9292.JPEG golf_ball\n", "0.617952 /train/n03888257/n03888257_16223.JPEG parachute\n", "0.619739 /train/n03028079/n03028079_24708.JPEG church\n", "0.619787 /train/n01440764/ILSVRC2012_val_00037834.JPEG tench\n", "0.620815 /train/n03888257/n03888257_5703.JPEG parachute\n", "0.626412 /train/n03445777/n03445777_9199.JPEG golf_ball\n", "0.628011 /train/n03888257/n03888257_32518.JPEG parachute\n", "0.630812 /train/n02979186/n02979186_10289.JPEG cassette_player\n", "
 \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.outliers_gallery()" ] }, { "cell_type": "markdown", "id": "67378b58", "metadata": {}, "source": [ "Show the outlier data as a `DataFrame`. Each row pairs an outlier image with its nearest neighbor and the distance between them." ] }, { "cell_type": "code", "execution_count": 10, "id": "aa1c0e5d-6038-491b-8a91-1d76a87590d4", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 270 }, "executionInfo": { "elapsed": 429, "status": "ok", "timestamp": 1677667331251, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "aa1c0e5d-6038-491b-8a91-1d76a87590d4", "outputId": "b38332f8-7e4e-45de-f7d3-828a52757ec2", "tags": [] }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " outlier nearest distance filename_outlier label_outlier split_outlier index_x error_code_outlier is_valid_outlier fd_index_outlier filename_nearest label_nearest split_nearest index_y error_code_nearest is_valid_nearest fd_index_nearest\n", "0 8293 13217 0.519030 imagenette2-160/train/n03445777/n03445777_5218.JPEG golf_ball train 8293 VALID True 8293 imagenette2-160/val/n03425413/n03425413_11460.JPEG gas_pump val 13217 VALID True 13217\n", "1 5457 5500 0.544795 imagenette2-160/train/n03888257/n03888257_34639.JPEG parachute train 5457 VALID True 5457 imagenette2-160/train/n03888257/n03888257_12053.JPEG parachute train 5500 VALID True 5500\n", "2 8076 3016 0.555266 imagenette2-160/train/n03445777/n03445777_3254.JPEG golf_ball train 8076 VALID True 8076 imagenette2-160/train/n02102040/n02102040_585.JPEG English_springer train 3016 VALID True 3016\n", "3 2790 4510 0.568702 imagenette2-160/train/n01440764/n01440764_5638.JPEG tench train 2790 VALID True 2790 imagenette2-160/train/n03028079/n03028079_6607.JPEG church train 4510 VALID True 4510\n", "4 5478 11775 0.582118 imagenette2-160/train/n03888257/n03888257_79145.JPEG parachute train 5478 VALID True 5478 imagenette2-160/val/n03888257/n03888257_8080.JPEG parachute val 11775 VALID True 11775" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.outliers().head(5)" ] }, { "cell_type": "markdown", "id": "bc16596d-899a-45eb-87ca-1d2b96a6ad96", "metadata": {}, "source": [ "## Comparing Labels of Similar Images\n", "Find possible label errors by comparing the label of each query image with the labels of its most similar images in the dataset." 
] }, { "cell_type": "code", "execution_count": 11, "id": "4d7cf1b9-c6c0-4b90-b7bb-59ca7bdbdcd7", "metadata": { "scrolled": false, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 237.91it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Stored similar images visual view in work_dir/galleries/similarity.html\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Similarity Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Similarity Report

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
labelchurch
from/val/n03028079/n03028079_13002.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.800002/train/n03028079/n03028079_3839.JPEGchurch
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
labelFrench_horn
from/train/n03394916/n03394916_32478.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.800012/train/n03394916/n03394916_35573.JPEGFrench_horn
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
labelcassette_player
from/train/n02979186/n02979186_14524.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.806502/train/n02979186/n02979186_213.JPEGcassette_player
0.800015/val/n02979186/n02979186_11000.JPEGcassette_player
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
labelcassette_player
from/val/n02979186/n02979186_11000.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.820827/train/n02979186/n02979186_10095.JPEGcassette_player
0.800015/train/n02979186/n02979186_14524.JPEGcassette_player
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
labeltench
from/train/n01440764/n01440764_44.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.803563/train/n01440764/n01440764_14249.JPEGtench
0.800023/val/n01440764/n01440764_5490.JPEGtench
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
labelgarbage_truck
from/train/n03417042/n03417042_3236.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.800025/train/n03417042/n03417042_12297.JPEGgarbage_truck
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: parachute
from: /train/n03888257/n03888257_20704.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.804987 | /train/n03888257/n03888257_20473.JPEG | parachute
0.800034 | /train/n03888257/n03888257_8614.JPEG | parachute
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: gas_pump
from: /train/n03425413/n03425413_14249.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.810811 | /val/n03425413/n03425413_20360.JPEG | gas_pump
0.800035 | /train/n03425413/n03425413_719.JPEG | gas_pump
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: parachute
from: /val/n03888257/n03888257_31790.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.810816 | /train/n03888257/n03888257_17326.JPEG | parachute
0.800036 | /train/n03888257/n03888257_8199.JPEG | parachute
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: parachute
from: /train/n03888257/n03888257_8199.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.834109 | /train/n03888257/n03888257_17326.JPEG | parachute
0.800036 | /val/n03888257/n03888257_31790.JPEG | parachute
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: chain_saw
from: /val/n03000684/n03000684_24542.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.803641 | /val/n03000684/n03000684_2610.JPEG | chain_saw
0.80004 | /train/n03000684/n03000684_26357.JPEG | chain_saw
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: chain_saw
from: /val/n03000684/n03000684_17431.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.807598 | /train/n03000684/n03000684_1034.JPEG | chain_saw
0.800068 | /train/n03000684/n03000684_807.JPEG | chain_saw
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: chain_saw
from: /train/n03000684/n03000684_807.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.811944 | /val/n03000684/n03000684_18140.JPEG | chain_saw
0.800068 | /val/n03000684/n03000684_17431.JPEG | chain_saw
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: English_springer
from: /train/n02102040/n02102040_139.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.841122 | /train/n02102040/n02102040_2528.JPEG | English_springer
0.800071 | /val/n02102040/n02102040_1121.JPEG | English_springer
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: parachute
from: /train/n03888257/n03888257_38633.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.800073 | /train/n03888257/n03888257_12816.JPEG | parachute
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: parachute
from: /train/n03888257/n03888257_12816.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.800073 | /train/n03888257/n03888257_38633.JPEG | parachute
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: parachute
from: /val/n03888257/n03888257_66961.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.805559 | /val/n03888257/n03888257_13410.JPEG | parachute
0.800073 | /val/n03888257/n03888257_3142.JPEG | parachute
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: church
from: /train/n03028079/n03028079_17175.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.806021 | /train/n03028079/n03028079_12685.JPEG | church
0.800076 | /train/n03028079/n03028079_23514.JPEG | church
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: golf_ball
from: /val/n03445777/n03445777_6350.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.806152 | /train/n03445777/n03445777_2468.JPEG | golf_ball
0.800086 | /val/n03445777/n03445777_7480.JPEG | golf_ball
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label: cassette_player
from: /train/n02979186/n02979186_10666.JPEG
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.800088 | /train/n02979186/n02979186_2383.JPEG | cassette_player
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
(index) | from | to | label | label2 | distance
7505 | imagenette2-160/val/n03028079/n03028079_13002.JPEG | [imagenette2-160/train/n03028079/n03028079_3839.JPEG] | [church] | [church] | [0.800002]
3429 | imagenette2-160/train/n03394916/n03394916_32478.JPEG | [imagenette2-160/train/n03394916/n03394916_35573.JPEG] | [French_horn] | [French_horn] | [0.800012]
1700 | imagenette2-160/train/n02979186/n02979186_14524.JPEG | [imagenette2-160/val/n02979186/n02979186_11000.JPEG, imagenette2-160/train/n02979186/n02979186_213.JPEG] | [cassette_player, cassette_player] | [cassette_player, cassette_player] | [0.800015, 0.806502]
7055 | imagenette2-160/val/n02979186/n02979186_11000.JPEG | [imagenette2-160/train/n02979186/n02979186_14524.JPEG, imagenette2-160/train/n02979186/n02979186_10095.JPEG] | [cassette_player, cassette_player] | [cassette_player, cassette_player] | [0.800015, 0.820827]
471 | imagenette2-160/train/n01440764/n01440764_44.JPEG | [imagenette2-160/val/n01440764/n01440764_5490.JPEG, imagenette2-160/train/n01440764/n01440764_14249.JPEG] | [tench, tench] | [tench, tench] | [0.800023, 0.803563]
... | ... | ... | ... | ... | ...
870 | imagenette2-160/train/n02102040/n02102040_1306.JPEG | [imagenette2-160/train/n02102040/n02102040_876.JPEG, imagenette2-160/train/n02102040/n02102040_3114.JPEG] | [English_springer, English_springer] | [English_springer, English_springer] | [0.936799, 0.949252]
1050 | imagenette2-160/train/n02102040/n02102040_3114.JPEG | [imagenette2-160/train/n02102040/n02102040_1055.JPEG, imagenette2-160/train/n02102040/n02102040_1306.JPEG] | [English_springer, English_springer] | [English_springer, English_springer] | [0.941953, 0.949252]
231 | imagenette2-160/train/n01440764/n01440764_13978.JPEG | [imagenette2-160/val/n01440764/n01440764_6341.JPEG, imagenette2-160/val/n01440764/n01440764_8210.JPEG] | [tench, tench] | [tench, tench] | [0.943767, 0.945909]
6846 | imagenette2-160/val/n02102040/n02102040_350.JPEG | [imagenette2-160/val/n02102040/n02102040_312.JPEG, imagenette2-160/train/n02102040/n02102040_6313.JPEG] | [English_springer, English_springer] | [English_springer, English_springer] | [0.945413, 0.947323]
1339 | imagenette2-160/train/n02102040/n02102040_6313.JPEG | [imagenette2-160/val/n02102040/n02102040_350.JPEG, imagenette2-160/train/n02102040/n02102040_3767.JPEG] | [English_springer, English_springer] | [English_springer, English_springer] | [0.947323, 0.950174]
\n", "

9069 rows × 5 columns

\n", "
" ], "text/plain": [ " from to label label2 distance\n", "7505 imagenette2-160/val/n03028079/n03028079_13002.JPEG [imagenette2-160/train/n03028079/n03028079_3839.JPEG] [church] [church] [0.800002]\n", "3429 imagenette2-160/train/n03394916/n03394916_32478.JPEG [imagenette2-160/train/n03394916/n03394916_35573.JPEG] [French_horn] [French_horn] [0.800012]\n", "1700 imagenette2-160/train/n02979186/n02979186_14524.JPEG [imagenette2-160/val/n02979186/n02979186_11000.JPEG, imagenette2-160/train/n02979186/n02979186_213.JPEG] [cassette_player, cassette_player] [cassette_player, cassette_player] [0.800015, 0.806502]\n", "7055 imagenette2-160/val/n02979186/n02979186_11000.JPEG [imagenette2-160/train/n02979186/n02979186_14524.JPEG, imagenette2-160/train/n02979186/n02979186_10095.JPEG] [cassette_player, cassette_player] [cassette_player, cassette_player] [0.800015, 0.820827]\n", "471 imagenette2-160/train/n01440764/n01440764_44.JPEG [imagenette2-160/val/n01440764/n01440764_5490.JPEG, imagenette2-160/train/n01440764/n01440764_14249.JPEG] [tench, tench] [tench, tench] [0.800023, 0.803563]\n", "... ... ... ... ... 
...\n", "870 imagenette2-160/train/n02102040/n02102040_1306.JPEG [imagenette2-160/train/n02102040/n02102040_876.JPEG, imagenette2-160/train/n02102040/n02102040_3114.JPEG] [English_springer, English_springer] [English_springer, English_springer] [0.936799, 0.949252]\n", "1050 imagenette2-160/train/n02102040/n02102040_3114.JPEG [imagenette2-160/train/n02102040/n02102040_1055.JPEG, imagenette2-160/train/n02102040/n02102040_1306.JPEG] [English_springer, English_springer] [English_springer, English_springer] [0.941953, 0.949252]\n", "231 imagenette2-160/train/n01440764/n01440764_13978.JPEG [imagenette2-160/val/n01440764/n01440764_6341.JPEG, imagenette2-160/val/n01440764/n01440764_8210.JPEG] [tench, tench] [tench, tench] [0.943767, 0.945909]\n", "6846 imagenette2-160/val/n02102040/n02102040_350.JPEG [imagenette2-160/val/n02102040/n02102040_312.JPEG, imagenette2-160/train/n02102040/n02102040_6313.JPEG] [English_springer, English_springer] [English_springer, English_springer] [0.945413, 0.947323]\n", "1339 imagenette2-160/train/n02102040/n02102040_6313.JPEG [imagenette2-160/val/n02102040/n02102040_350.JPEG, imagenette2-160/train/n02102040/n02102040_3767.JPEG] [English_springer, English_springer] [English_springer, English_springer] [0.947323, 0.950174]\n", "\n", "[9069 rows x 5 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.similarity_gallery() " ] }, { "cell_type": "markdown", "id": "c2c393be-2b42-4814-8688-03d2be9e8998", "metadata": {}, "source": [ "## Similar Image Pairs\n", "\n", "Find similar image pairs within and across the train and validation subfolders. Pairs may include train-train, train-val, val-train, and val-val." 
] }, { "cell_type": "code", "execution_count": 12, "id": "9e065403-582b-4f94-855b-33fd8f4826a1", "metadata": { "scrolled": false, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:106: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n", "/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:106: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n", "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 437.97it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Stored similarity visual view in work_dir/galleries/duplicates.html\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Duplicates Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Duplicates Report

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.968786
From: /val/n03394916/n03394916_30631.JPEG
To: /train/n03394916/n03394916_44127.JPEG
From_Label: French_horn
To_Label: French_horn
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.962458
From: /train/n03445777/n03445777_13918.JPEG
To: /val/n03445777/n03445777_6882.JPEG
From_Label: golf_ball
To_Label: golf_ball
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.953837
From: /train/n02102040/n02102040_1564.JPEG
To: /train/n02102040/n02102040_3837.JPEG
From_Label: English_springer
To_Label: English_springer
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.953413
From: /train/n01440764/n01440764_7457.JPEG
To: /train/n01440764/n01440764_11339.JPEG
From_Label: tench
To_Label: tench
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.952239
From: /train/n03417042/n03417042_12906.JPEG
To: /train/n03417042/n03417042_1578.JPEG
From_Label: garbage_truck
To_Label: garbage_truck
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.951679
From: /val/n03394916/n03394916_6830.JPEG
To: /val/n03394916/n03394916_21092.JPEG
From_Label: French_horn
To_Label: French_horn
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.950477
From: /train/n03888257/n03888257_21027.JPEG
To: /val/n03888257/n03888257_11210.JPEG
From_Label: parachute
To_Label: parachute
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.950174
From: /train/n02102040/n02102040_3767.JPEG
To: /train/n02102040/n02102040_6313.JPEG
From_Label: English_springer
To_Label: English_springer
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.949877
From: /train/n02102040/ILSVRC2012_val_00032959.JPEG
To: /val/n02102040/n02102040_662.JPEG
From_Label: English_springer
To_Label: English_springer
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.949252
From: /train/n02102040/n02102040_3114.JPEG
To: /train/n02102040/n02102040_1306.JPEG
From_Label: English_springer
To_Label: English_springer
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.duplicates_gallery()" ] }, { "cell_type": "markdown", "id": "e10989e1", "metadata": {}, "source": [ "List the most similar image pairs as a DataFrame. Each pair appears twice, once in each direction, and a `distance` closer to 1 means the two images are more similar." ] }, { "cell_type": "code", "execution_count": 13, "id": "3ea590e9-d221-4202-b03b-e5fef4487c89", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 270 }, "executionInfo": { "elapsed": 499, "status": "ok", "timestamp": 1677667342908, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "3ea590e9-d221-4202-b03b-e5fef4487c89", "outputId": "3c5f4cc0-0ba5-42a0-e01b-f165e9cf655c", "tags": [] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
(index) | from | to | distance | filename_from | label_from | split_from | index_x | error_code_from | is_valid_from | fd_index_from | filename_to | label_to | split_to | index_y | error_code_to | is_valid_to | fd_index_to
0 | 11960 | 5925 | 0.968786 | imagenette2-160/val/n03394916/n03394916_30631.JPEG | French_horn | val | 11960 | VALID | True | 11960 | imagenette2-160/train/n03394916/n03394916_44127.JPEG | French_horn | train | 5925 | VALID | True | 5925
1 | 5925 | 11960 | 0.968786 | imagenette2-160/train/n03394916/n03394916_44127.JPEG | French_horn | train | 5925 | VALID | True | 5925 | imagenette2-160/val/n03394916/n03394916_30631.JPEG | French_horn | val | 11960 | VALID | True | 11960
2 | 12613 | 7916 | 0.962458 | imagenette2-160/val/n03445777/n03445777_6882.JPEG | golf_ball | val | 12613 | VALID | True | 12613 | imagenette2-160/train/n03445777/n03445777_13918.JPEG | golf_ball | train | 7916 | VALID | True | 7916
3 | 7916 | 12613 | 0.962458 | imagenette2-160/train/n03445777/n03445777_13918.JPEG | golf_ball | train | 7916 | VALID | True | 7916 | imagenette2-160/val/n03445777/n03445777_6882.JPEG | golf_ball | val | 12613 | VALID | True | 12613
4 | 3464 | 3486 | 0.953837 | imagenette2-160/train/n02102040/n02102040_3837.JPEG | English_springer | train | 3464 | VALID | True | 3464 | imagenette2-160/train/n02102040/n02102040_1564.JPEG | English_springer | train | 3486 | VALID | True | 3486
\n", "
" ], "text/plain": [ " from to distance filename_from label_from split_from index_x error_code_from is_valid_from fd_index_from filename_to label_to split_to index_y error_code_to is_valid_to fd_index_to\n", "0 11960 5925 0.968786 imagenette2-160/val/n03394916/n03394916_30631.JPEG French_horn val 11960 VALID True 11960 imagenette2-160/train/n03394916/n03394916_44127.JPEG French_horn train 5925 VALID True 5925\n", "1 5925 11960 0.968786 imagenette2-160/train/n03394916/n03394916_44127.JPEG French_horn train 5925 VALID True 5925 imagenette2-160/val/n03394916/n03394916_30631.JPEG French_horn val 11960 VALID True 11960\n", "2 12613 7916 0.962458 imagenette2-160/val/n03445777/n03445777_6882.JPEG golf_ball val 12613 VALID True 12613 imagenette2-160/train/n03445777/n03445777_13918.JPEG golf_ball train 7916 VALID True 7916\n", "3 7916 12613 0.962458 imagenette2-160/train/n03445777/n03445777_13918.JPEG golf_ball train 7916 VALID True 7916 imagenette2-160/val/n03445777/n03445777_6882.JPEG golf_ball val 12613 VALID True 12613\n", "4 3464 3486 0.953837 imagenette2-160/train/n02102040/n02102040_3837.JPEG English_springer train 3464 VALID True 3464 imagenette2-160/train/n02102040/n02102040_1564.JPEG English_springer train 3486 VALID True 3486" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.similarity().head(5)" ] }, { "cell_type": "markdown", "id": "95d21e6d-a951-48dd-8c4c-894c8ba556fd", "metadata": {}, "source": [ "## Image Clusters" ] }, { "cell_type": "code", "execution_count": 14, "id": "4a6db529-cb1e-4655-af50-d97f3e131319", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000, "output_embedded_package_id": "1Wh1vmG-F-RG0ZYZP1oRgiyqHAtnfsuEk" }, "executionInfo": { "elapsed": 6376, "status": "ok", "timestamp": 1677667352994, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "4a6db529-cb1e-4655-af50-d97f3e131319", "outputId": 
"adfc3ee1-84c9-4aa6-a0db-09a6a800b566", "scrolled": false, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "cassette_player\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 68.44it/s]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Finished OK. Components are stored as image files work_dir/galleries/components_[index].jpg\n", "Stored components visual view in work_dir/galleries/components.html\n", "Execution time in seconds 1.5\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Components Report\n", " \n", " \n", "\n", "\n", "\n", "
<table>\n", "<caption>Components Report: showing groups of similar images</caption>\n", "<tr><th>component</th><th>num_images</th><th>mean_distance</th><th colspan=2>Label</th></tr>\n", "<tr><td>1894</td><td>161</td><td>0.9001</td><td>tench</td><td>54</td></tr>\n", "<tr><td>2812</td><td>70</td><td>0.9004</td><td>English_springer</td><td>54</td></tr>\n", "<tr><td>7313</td><td>69</td><td>0.9001</td><td>golf_ball</td><td>54</td></tr>\n", "<tr><td>1072</td><td>21</td><td>0.9001</td><td>garbage_truck</td><td>21</td></tr>\n", "<tr><td>5498</td><td>13</td><td>0.9004</td><td>French_horn</td><td>13</td></tr>\n", "<tr><td>994</td><td>12</td><td>0.9025</td><td>garbage_truck</td><td>12</td></tr>\n", "<tr><td>1391</td><td>10</td><td>0.9</td><td>garbage_truck</td><td>10</td></tr>\n", "<tr><td>5644</td><td>8</td><td>0.902</td><td>French_horn</td><td>8</td></tr>\n", "<tr><td>1315</td><td>8</td><td>0.9041</td><td>garbage_truck</td><td>8</td></tr>\n", "<tr><td>2781</td><td>8</td><td>0.9062</td><td>English_springer</td><td>8</td></tr>\n", "<tr><td>984</td><td>7</td><td>0.9064</td><td>garbage_truck</td><td>7</td></tr>\n", "<tr><td>3034</td><td>6</td><td>0.903</td><td>English_springer</td><td>6</td></tr>\n", "<tr><td>5639</td><td>6</td><td>0.902</td><td>French_horn</td><td>6</td></tr>\n", "<tr><td>1951</td><td>5</td><td>0.9</td><td>tench</td><td>5</td></tr>\n", "<tr><td>7294</td><td>5</td><td>0.9019</td><td>golf_ball</td><td>5</td></tr>\n", "<tr><td>4921</td><td>5</td><td>0.9004</td><td>parachute</td><td>5</td></tr>\n", "<tr><td>5548</td><td>5</td><td>0.9043</td><td>French_horn</td><td>5</td></tr>\n", "<tr><td>100</td><td>5</td><td>0.9011</td><td>cassette_player</td><td>5</td></tr>\n", "<tr><td>7292</td><td>4</td><td>0.9021</td><td>golf_ball</td><td>4</td></tr>\n", "<tr><td>2143</td><td>4</td><td>0.9001</td><td>tench</td><td>4</td></tr>\n", "</table>
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.component_gallery()" ] }, { "cell_type": "markdown", "id": "ca5d4b6e-7ff6-49b8-b487-6ba1573ab104", "metadata": {}, "source": [ "You can also visualize clusters with specific labels using the `slice` parameter. For example let's visualize clusters with the `chain_saw` label" ] }, { "cell_type": "code", "execution_count": 15, "id": "4b38dacf-becc-4631-9aeb-6fe9bd235aa1", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000, "output_embedded_package_id": "1xYIrPsODG8kAMaZOpGeKNRoa4-HjPC-w" }, "executionInfo": { "elapsed": 5130, "status": "ok", "timestamp": 1677667368207, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "4b38dacf-becc-4631-9aeb-6fe9bd235aa1", "outputId": "131d0f11-5627-4beb-b58c-3801e09a3b42", "scrolled": false, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "chain_saw\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 449.14it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Finished OK. 
Components are stored as image files work_dir/galleries/components_[index].jpg\n", "Stored components visual view in work_dir/galleries/components.html\n", "Execution time in seconds 0.2\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Components Report\n", " \n", " \n", "\n", "\n", "\n", "
<table>\n", "<caption>Components Report: showing groups of similar images, for label: chain_saw</caption>\n", "<tr><th>component</th><th>num_images</th><th>mean_distance</th><th colspan=2>Label</th></tr>\n", "<tr><td>6981</td><td>3</td><td>0.9064</td><td>chain_saw</td><td>3</td></tr>\n", "<tr><td>6421</td><td>2</td><td>0.9222</td><td>chain_saw</td><td>2</td></tr>\n", "<tr><td>6478</td><td>2</td><td>0.9355</td><td>chain_saw</td><td>2</td></tr>\n", "<tr><td>6621</td><td>2</td><td>0.9029</td><td>chain_saw</td><td>2</td></tr>\n", "<tr><td>6766</td><td>2</td><td>0.9208</td><td>chain_saw</td><td>2</td></tr>\n", "<tr><td>6831</td><td>2</td><td>0.9198</td><td>chain_saw</td><td>2</td></tr>\n", "<tr><td>6862</td><td>2</td><td>0.9139</td><td>chain_saw</td><td>2</td></tr>\n", "<tr><td>6901</td><td>2</td><td>0.9073</td><td>chain_saw</td><td>2</td></tr>\n", "<tr><td>7033</td><td>2</td><td>0.9345</td><td>chain_saw</td><td>2</td></tr>\n", "<tr><td>7067</td><td>2</td><td>0.9192</td><td>chain_saw</td><td>2</td></tr>\n", "<tr><td>11637</td><td>2</td><td>0.9039</td><td>chain_saw</td><td>2</td></tr>\n", "</table>
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.component_gallery(slice='chain_saw')" ] }, { "cell_type": "markdown", "id": "28498d81-d073-4f3d-baa4-732e1df93a34", "metadata": {}, "source": [ "## Connected Components" ] }, { "cell_type": "code", "execution_count": 16, "id": "0346be91-5380-48b9-a8df-074c342efcd3", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "executionInfo": { "elapsed": 1036, "status": "ok", "timestamp": 1677667380699, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "0346be91-5380-48b9-a8df-074c342efcd3", "outputId": "ffa6bd9d-b5b3-4ed5-86e1-c47ca9658667", "tags": [] }, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n", "<tr><th></th><th>index</th><th>component_id</th><th>sum</th><th>count</th><th>mean_distance</th><th>min_distance</th><th>max_distance</th><th>filename</th><th>label</th><th>split</th><th>error_code</th><th>is_valid</th><th>fd_index</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>179</th><td>2355</td><td>1894</td><td>513.6729</td><td>562.0</td><td>0.914</td><td>0.9001</td><td>0.9534</td><td>imagenette2-160/train/n01440764/n01440764_8673.JPEG</td><td>tench</td><td>train</td><td>VALID</td><td>True</td><td>2355</td></tr>\n", "<tr><th>143</th><td>2147</td><td>1894</td><td>513.6729</td><td>562.0</td><td>0.914</td><td>0.9001</td><td>0.9534</td><td>imagenette2-160/train/n01440764/n01440764_5658.JPEG</td><td>tench</td><td>train</td><td>VALID</td><td>True</td><td>2147</td></tr>\n", "<tr><th>145</th><td>2150</td><td>1894</td><td>513.6729</td><td>562.0</td><td>0.914</td><td>0.9001</td><td>0.9534</td><td>imagenette2-160/train/n01440764/n01440764_10726.JPEG</td><td>tench</td><td>train</td><td>VALID</td><td>True</td><td>2150</td></tr>\n", "<tr><th>146</th><td>2174</td><td>1894</td><td>513.6729</td><td>562.0</td><td>0.914</td><td>0.9001</td><td>0.9534</td><td>imagenette2-160/train/n01440764/n01440764_6974.JPEG</td><td>tench</td><td>train</td><td>VALID</td><td>True</td><td>2174</td></tr>\n", "<tr><th>147</th><td>2177</td><td>1894</td><td>513.6729</td><td>562.0</td><td>0.914</td><td>0.9001</td><td>0.9534</td><td>imagenette2-160/train/n01440764/n01440764_14294.JPEG</td><td>tench</td><td>train</td><td>VALID</td><td>True</td><td>2177</td></tr>\n", "</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " index component_id sum count mean_distance min_distance max_distance filename label split error_code is_valid fd_index\n", "179 2355 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_8673.JPEG tench train VALID True 2355\n", "143 2147 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_5658.JPEG tench train VALID True 2147\n", "145 2150 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_10726.JPEG tench train VALID True 2150\n", "146 2174 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_6974.JPEG tench train VALID True 2174\n", "147 2177 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_14294.JPEG tench train VALID True 2177" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cc_df, _ = fd.connected_components()\n", "cc_df.sort_values('count', ascending=False).head(5)" ] }, { "cell_type": "markdown", "id": "569cb878", "metadata": {}, "source": [ "We can also get metadata for individual images using their `fastdup_id` available in `fd.annotations()`" ] }, { "cell_type": "code", "execution_count": 17, "id": "e80d6817-fed6-4fa4-8714-b01214e0d3f8", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "executionInfo": { "elapsed": 990, "status": "ok", "timestamp": 1677667384644, "user": { "displayName": "Tom Shani", "userId": "00667426488827942961" }, "user_tz": -120 }, "id": "e80d6817-fed6-4fa4-8714-b01214e0d3f8", "outputId": "4f973aba-572d-4e50-d22d-c5bfc8cf3d2d", "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'filename': 'imagenette2-160/train/n02979186/n02979186_2819.JPEG',\n", " 'label': 'cassette_player',\n", " 'split': 'train',\n", " 'index': 349,\n", " 'error_code': 'VALID',\n", " 'is_valid': True,\n", " 'fd_index': 349}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"fd[349]" ] }, { "cell_type": "markdown", "id": "b059951d", "metadata": {}, "source": [ "## Wrap Up\n", "\n", "Next, feel free to check out other tutorials -\n", "\n", "+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!\n", "+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.\n", "+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze for potential issues. If you have labeled ImageNet-style folder structure, have a go!\n", "+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try. " ] }, { "cell_type": "markdown", "id": "47cf9410", "metadata": {}, "source": [ "\n", "## VL Profiler\n", "If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com) - VL Profiler is our first no-code commercial product that lets you visualize and inspect your dataset in your browser. 
\n", "\n", "[Sign up](https://app.visual-layer.com) now, it's free.\n", "\n", "[![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/vl_profiler_promo.svg)](https://app.visual-layer.com)\n", "\n", "As usual, feedback is welcome! \n", "\n", "Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues)." ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 5 }