{
"cells": [
{
"cell_type": "markdown",
"id": "SwSYWR4vzk_e",
"metadata": {
"id": "SwSYWR4vzk_e",
"tags": []
},
"source": [
"# Analyzing Image Classification Dataset\n",
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb)\n",
"[![Open in Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb)\n",
"\n",
"This notebook shows how you can use [fastdup](https://github.com/visual-layer/fastdup) to analyze an image classification dataset for:\n",
"\n",
"+ Duplicates.\n",
"+ Outliers.\n",
"+ Wrong labels.\n",
"+ Image clusters.\n",
"\n",
"If you're new, run the notebook in Google Colab or Kaggle for free.\n",
"\n",
"> **Note** - No GPU needed! You can run on an instance with only CPU.\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "bbed0117-e8d1-4df6-b8b7-7bcce10b8655",
"metadata": {
"tags": []
},
"source": [
"## Installation\n",
"\n",
"First let's install [fastdup](https://github.com/visual-layer/fastdup) from PyPI with:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "506e82b4-a1c2-4262-a326-d0924bb018b6",
"metadata": {
"id": "506e82b4-a1c2-4262-a326-d0924bb018b6"
},
"outputs": [],
"source": [
"!pip install -Uqq fastdup"
]
},
{
"cell_type": "markdown",
"id": "a5c3a1ab",
"metadata": {},
"source": [
"Now, test the installation. If there's no error message, we are ready to go."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7f69d8b2",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'0.930'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import fastdup\n",
"fastdup.__version__"
]
},
{
"cell_type": "markdown",
"id": "8a79fb1b-b089-4d4d-8fa8-3e2b2ef7f886",
"metadata": {
"id": "8a79fb1b-b089-4d4d-8fa8-3e2b2ef7f886",
"tags": []
},
"source": [
"## Download Dataset\n",
"\n",
"We will analyze the [Imagenette](https://github.com/fastai/imagenette) dataset - a subset of 10 easily classified classes from Imagenet (tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute)."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "be5b7ca5-34f5-4a0f-b081-2e78be6a425a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--2023-05-16 08:53:02-- https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz\n",
"Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.216.251.158, 52.217.83.238, 52.217.96.150, ...\n",
"Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.216.251.158|:443... connected.\n",
"HTTP request sent, awaiting response... 200 OK\n",
"Length: 99003388 (94M) [application/x-tar]\n",
"Saving to: ‘imagenette2-160.tgz.1’\n",
"\n",
"imagenette2-160.tgz 100%[===================>] 94.42M 46.0MB/s in 2.1s \n",
"\n",
"2023-05-16 08:53:04 (46.0 MB/s) - ‘imagenette2-160.tgz.1’ saved [99003388/99003388]\n",
"\n"
]
}
],
"source": [
"!wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz\n",
"!tar -xf imagenette2-160.tgz"
]
},
{
"cell_type": "markdown",
"id": "f01586fe-db75-4154-aa15-9ea2709c9461",
"metadata": {
"id": "f01586fe-db75-4154-aa15-9ea2709c9461"
},
"source": [
"## Load and Format Annotations"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ff90fe31-7c39-46c5-8c58-3ae349fbcc91",
"metadata": {
"executionInfo": {
"elapsed": 949,
"status": "ok",
"timestamp": 1677666765166,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "ff90fe31-7c39-46c5-8c58-3ae349fbcc91",
"tags": []
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "21d2474d-3fa5-4148-a0f1-ea8d55d63b85",
"metadata": {
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1677666768281,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "21d2474d-3fa5-4148-a0f1-ea8d55d63b85",
"tags": []
},
"outputs": [],
"source": [
"data_dir = 'imagenette2-160/'\n",
"csv_path = 'imagenette2-160/noisy_imagenette.csv'"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "2cb91ccb-9cb6-42ba-9489-96182eccc583",
"metadata": {
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1677666769859,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "2cb91ccb-9cb6-42ba-9489-96182eccc583",
"tags": []
},
"outputs": [],
"source": [
"label_map = {\n",
" 'n02979186': 'cassette_player', \n",
" 'n03417042': 'garbage_truck', \n",
" 'n01440764': 'tench', \n",
" 'n02102040': 'English_springer', \n",
" 'n03028079': 'church',\n",
" 'n03888257': 'parachute', \n",
" 'n03394916': 'French_horn', \n",
" 'n03000684': 'chain_saw', \n",
" 'n03445777': 'golf_ball', \n",
" 'n03425413': 'gas_pump'\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "8aba34e1",
"metadata": {},
"source": [
"Load the annotation provided with the dataset."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e2e90600-b02d-4a2a-a348-7b67157f9129",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
},
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1677666769859,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "e2e90600-b02d-4a2a-a348-7b67157f9129",
"outputId": "f9f72c0d-f613-4aac-d29c-3646b2301dcb",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" path | \n",
" noisy_labels_0 | \n",
" noisy_labels_1 | \n",
" noisy_labels_5 | \n",
" noisy_labels_25 | \n",
" noisy_labels_50 | \n",
" is_valid | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" train/n02979186/n02979186_9036.JPEG | \n",
" n02979186 | \n",
" n02979186 | \n",
" n02979186 | \n",
" n02979186 | \n",
" n02979186 | \n",
" False | \n",
"
\n",
" \n",
" 1 | \n",
" train/n02979186/n02979186_11957.JPEG | \n",
" n02979186 | \n",
" n02979186 | \n",
" n02979186 | \n",
" n02979186 | \n",
" n03000684 | \n",
" False | \n",
"
\n",
" \n",
" 2 | \n",
" train/n02979186/n02979186_9715.JPEG | \n",
" n02979186 | \n",
" n02979186 | \n",
" n02979186 | \n",
" n03417042 | \n",
" n03000684 | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" path noisy_labels_0 noisy_labels_1 noisy_labels_5 noisy_labels_25 noisy_labels_50 is_valid\n",
"0 train/n02979186/n02979186_9036.JPEG n02979186 n02979186 n02979186 n02979186 n02979186 False\n",
"1 train/n02979186/n02979186_11957.JPEG n02979186 n02979186 n02979186 n02979186 n03000684 False\n",
"2 train/n02979186/n02979186_9715.JPEG n02979186 n02979186 n02979186 n03417042 n03000684 False"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_annot = pd.read_csv(csv_path)\n",
"df_annot.head(3)"
]
},
{
"cell_type": "markdown",
"id": "dfc957bf",
"metadata": {},
"source": [
"Transform the annotation to fastdup supported format.\n",
"\n",
"fastdup expects an annotation `DataFrame` that contains the following column:\n",
"\n",
"+ filename - contains the path to the image file.\n",
"+ label - contains a label of the image.\n",
"+ split - whether the image is subset of the training, validation or test dataset."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "473185d1-89f5-4746-b87b-f2b3ef7c445b",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
},
"executionInfo": {
"elapsed": 1012,
"status": "ok",
"timestamp": 1677666771201,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "473185d1-89f5-4746-b87b-f2b3ef7c445b",
"outputId": "c09c986d-bcef-4545-8ceb-ee5196b40ee6",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" filename | \n",
" label | \n",
" split | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" imagenette2-160/train/n02979186/n02979186_9036.JPEG | \n",
" cassette_player | \n",
" train | \n",
"
\n",
" \n",
" 1 | \n",
" imagenette2-160/train/n02979186/n02979186_11957.JPEG | \n",
" cassette_player | \n",
" train | \n",
"
\n",
" \n",
" 2 | \n",
" imagenette2-160/train/n02979186/n02979186_9715.JPEG | \n",
" cassette_player | \n",
" train | \n",
"
\n",
" \n",
" 3 | \n",
" imagenette2-160/train/n02979186/n02979186_21736.JPEG | \n",
" cassette_player | \n",
" train | \n",
"
\n",
" \n",
" 4 | \n",
" imagenette2-160/train/n02979186/ILSVRC2012_val_00046953.JPEG | \n",
" cassette_player | \n",
" train | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 13389 | \n",
" imagenette2-160/val/n03425413/n03425413_17521.JPEG | \n",
" gas_pump | \n",
" val | \n",
"
\n",
" \n",
" 13390 | \n",
" imagenette2-160/val/n03425413/n03425413_20711.JPEG | \n",
" gas_pump | \n",
" val | \n",
"
\n",
" \n",
" 13391 | \n",
" imagenette2-160/val/n03425413/n03425413_19050.JPEG | \n",
" gas_pump | \n",
" val | \n",
"
\n",
" \n",
" 13392 | \n",
" imagenette2-160/val/n03425413/n03425413_13831.JPEG | \n",
" gas_pump | \n",
" val | \n",
"
\n",
" \n",
" 13393 | \n",
" imagenette2-160/val/n03425413/n03425413_1242.JPEG | \n",
" gas_pump | \n",
" val | \n",
"
\n",
" \n",
"
\n",
"
13394 rows × 3 columns
\n",
"
"
],
"text/plain": [
" filename label split\n",
"0 imagenette2-160/train/n02979186/n02979186_9036.JPEG cassette_player train\n",
"1 imagenette2-160/train/n02979186/n02979186_11957.JPEG cassette_player train\n",
"2 imagenette2-160/train/n02979186/n02979186_9715.JPEG cassette_player train\n",
"3 imagenette2-160/train/n02979186/n02979186_21736.JPEG cassette_player train\n",
"4 imagenette2-160/train/n02979186/ILSVRC2012_val_00046953.JPEG cassette_player train\n",
"... ... ... ...\n",
"13389 imagenette2-160/val/n03425413/n03425413_17521.JPEG gas_pump val\n",
"13390 imagenette2-160/val/n03425413/n03425413_20711.JPEG gas_pump val\n",
"13391 imagenette2-160/val/n03425413/n03425413_19050.JPEG gas_pump val\n",
"13392 imagenette2-160/val/n03425413/n03425413_13831.JPEG gas_pump val\n",
"13393 imagenette2-160/val/n03425413/n03425413_1242.JPEG gas_pump val\n",
"\n",
"[13394 rows x 3 columns]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# take relevant columns\n",
"df_annot = df_annot[['path', 'noisy_labels_0']]\n",
"\n",
"# rename columns to fastdup's column names\n",
"df_annot = df_annot.rename({'noisy_labels_0': 'label', 'path': 'filename'}, axis='columns')\n",
"\n",
"# append datadir\n",
"df_annot['filename'] = df_annot['filename'].apply(lambda x: data_dir + x)\n",
"\n",
"# create split column\n",
"df_annot['split'] = df_annot['filename'].apply(lambda x: x.split(\"/\")[1])\n",
"\n",
"# map label ids to regular labels\n",
"df_annot['label'] = df_annot['label'].map(label_map)\n",
"\n",
"# show formated annotations\n",
"df_annot"
]
},
{
"cell_type": "markdown",
"id": "0c648ed1-5016-4230-9873-546eb510b764",
"metadata": {
"id": "0c648ed1-5016-4230-9873-546eb510b764"
},
"source": [
"## Run fastdup\n",
"\n",
"With the images and annotations, we are now ready to run an analysis."
]
},
{
"cell_type": "markdown",
"id": "0a39243e",
"metadata": {},
"source": [
"+ `work_dir` is the path to store the artifacts from the analysis.\n",
"\n",
"+ `input_dir` is the path to the downloaded images."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "92a6e2f9-e60c-44c0-b48a-f7413f7594ae",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.\n",
"2023-05-16 08:53:06 [INFO] Going to loop over dir imagenette2-160\n",
"2023-05-16 08:53:06 [INFO] Found total 13394 images to run on, 13394 train, 0 test, name list 13394, counter 13394 \n",
"2023-05-16 08:53:20 [INFO] Found total 13394 images to run onimated: 0 Minutes\n",
"Finished histogram 7.122\n",
"Finished bucket sort 7.177\n",
"2023-05-16 08:53:20 [INFO] 309) Finished write_index() NN model\n",
"2023-05-16 08:53:20 [INFO] Stored nn model index file fastdup_imagenette/nnf.index\n",
"2023-05-16 08:53:21 [INFO] Total time took 14601 ms\n",
"2023-05-16 08:53:21 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %\n",
"2023-05-16 08:53:21 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %\n",
"2023-05-16 08:53:21 [INFO] Found a total of 16757 above threshold images (d>0.800), which are 62.55 %\n",
"2023-05-16 08:53:21 [INFO] Found a total of 1339 outlier images (d<0.050), which are 5.00 %\n",
"2023-05-16 08:53:21 [INFO] Min distance found 0.476 max distance 0.969\n",
"2023-05-16 08:53:21 [INFO] Running connected components for ccthreshold 0.900000 \n",
".0\n",
" ########################################################################################\n",
"\n",
"Dataset Analysis Summary: \n",
"\n",
" Dataset contains 13394 images\n",
" Valid images are 100.00% (13,394) of the data, invalid are 0.00% (0) of the data\n",
" Similarity: 3.11% (416) belong to 19 similarity clusters (components).\n",
" 96.89% (12,978) images do not belong to any similarity cluster.\n",
" Largest cluster has 566 (4.23%) images.\n",
" For a detailed analysis, use `.connected_components()`\n",
"(similarity threshold used is 0.8, connected component threshold used is 0.9).\n",
"\n",
" Outliers: 6.23% (835) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n",
" For a detailed list of outliers, use `.outliers()`.\n"
]
}
],
"source": [
"work_dir = 'fastdup_imagenette'\n",
"\n",
"fd = fastdup.create(work_dir=work_dir, input_dir=data_dir) \n",
"fd.run(annotations=df_annot, ccthreshold=0.9, threshold=0.8)"
]
},
{
"cell_type": "markdown",
"id": "62e35a12-fadd-4b3f-bcab-69e6e67862a4",
"metadata": {},
"source": [
"## Outliers\n",
"\n",
"Visualize outliers from the dataset."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "b39ec702-3ea1-4afe-a948-f026ba8fcb47",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"executionInfo": {
"elapsed": 2658,
"status": "ok",
"timestamp": 1677667336302,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "b39ec702-3ea1-4afe-a948-f026ba8fcb47",
"outputId": "caa992d2-5267-408c-b44a-3a4a66e1ab5f",
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 9642.08it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored outliers visual view in fastdup_imagenette/galleries/outliers.html\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Outliers Report\n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
Outliers Report
Showing image outliers, one per row
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.489022 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n02979186/n02979186_3967.JPEG | \n",
"
\n",
"\n",
" label | \n",
" cassette_player | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.51468 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03445777/n03445777_5218.JPEG | \n",
"
\n",
"\n",
" label | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.541967 | \n",
"
\n",
"\n",
" Path | \n",
" /val/n03417042/n03417042_5301.JPEG | \n",
"
\n",
"\n",
" label | \n",
" garbage_truck | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.57066 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03888257/n03888257_34639.JPEG | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.578252 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03445777/n03445777_3254.JPEG | \n",
"
\n",
"\n",
" label | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.58389 | \n",
"
\n",
"\n",
" Path | \n",
" /val/n03445777/n03445777_5932.JPEG | \n",
"
\n",
"\n",
" label | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.590838 | \n",
"
\n",
"\n",
" Path | \n",
" /val/n02102040/n02102040_7670.JPEG | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.609527 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03888257/n03888257_7793.JPEG | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.611143 | \n",
"
\n",
"\n",
" Path | \n",
" /val/n01440764/n01440764_4962.JPEG | \n",
"
\n",
"\n",
" label | \n",
" tench | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.61373 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03445777/n03445777_6033.JPEG | \n",
"
\n",
"\n",
" label | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.61618 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03394916/n03394916_37544.JPEG | \n",
"
\n",
"\n",
" label | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.616785 | \n",
"
\n",
"\n",
" Path | \n",
" /val/n03445777/n03445777_9292.JPEG | \n",
"
\n",
"\n",
" label | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.617952 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03888257/n03888257_16223.JPEG | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.619739 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03028079/n03028079_24708.JPEG | \n",
"
\n",
"\n",
" label | \n",
" church | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.619768 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03888257/n03888257_79145.JPEG | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.620815 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03888257/n03888257_5703.JPEG | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.625504 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03394916/n03394916_33663.JPEG | \n",
"
\n",
"\n",
" label | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.626412 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03445777/n03445777_9199.JPEG | \n",
"
\n",
"\n",
" label | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.630812 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n02979186/n02979186_10289.JPEG | \n",
"
\n",
"\n",
" label | \n",
" cassette_player | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.631131 | \n",
"
\n",
"\n",
" Path | \n",
" /train/n03888257/n03888257_75495.JPEG | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fd.vis.outliers_gallery()"
]
},
{
"cell_type": "markdown",
"id": "67378b58",
"metadata": {},
"source": [
"Show outliers image data."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "aa1c0e5d-6038-491b-8a91-1d76a87590d4",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 270
},
"executionInfo": {
"elapsed": 429,
"status": "ok",
"timestamp": 1677667331251,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "aa1c0e5d-6038-491b-8a91-1d76a87590d4",
"outputId": "b38332f8-7e4e-45de-f7d3-828a52757ec2",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" outlier | \n",
" nearest | \n",
" distance | \n",
" filename_outlier | \n",
" label_outlier | \n",
" split_outlier | \n",
" index_x | \n",
" error_code_outlier | \n",
" is_valid_outlier | \n",
" fd_index_outlier | \n",
" filename_nearest | \n",
" label_nearest | \n",
" split_nearest | \n",
" index_y | \n",
" error_code_nearest | \n",
" is_valid_nearest | \n",
" fd_index_nearest | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2664 | \n",
" 9763 | \n",
" 0.476124 | \n",
" imagenette2-160/train/n02979186/n02979186_3967.JPEG | \n",
" cassette_player | \n",
" train | \n",
" 2664 | \n",
" VALID | \n",
" True | \n",
" 2664 | \n",
" imagenette2-160/val/n01440764/n01440764_710.JPEG | \n",
" tench | \n",
" val | \n",
" 9763 | \n",
" VALID | \n",
" True | \n",
" 9763 | \n",
"
\n",
" \n",
" 1 | \n",
" 8150 | \n",
" 7831 | \n",
" 0.514680 | \n",
" imagenette2-160/train/n03445777/n03445777_5218.JPEG | \n",
" golf_ball | \n",
" train | \n",
" 8150 | \n",
" VALID | \n",
" True | \n",
" 8150 | \n",
" imagenette2-160/train/n03445777/n03445777_18756.JPEG | \n",
" golf_ball | \n",
" train | \n",
" 7831 | \n",
" VALID | \n",
" True | \n",
" 7831 | \n",
"
\n",
" \n",
" 2 | \n",
" 12076 | \n",
" 956 | \n",
" 0.539276 | \n",
" imagenette2-160/val/n03417042/n03417042_5301.JPEG | \n",
" garbage_truck | \n",
" val | \n",
" 12076 | \n",
" VALID | \n",
" True | \n",
" 12076 | \n",
" imagenette2-160/train/n01440764/n01440764_9898.JPEG | \n",
" tench | \n",
" train | \n",
" 956 | \n",
" VALID | \n",
" True | \n",
" 956 | \n",
"
\n",
" \n",
" 3 | \n",
" 9087 | \n",
" 8628 | \n",
" 0.544795 | \n",
" imagenette2-160/train/n03888257/n03888257_34639.JPEG | \n",
" parachute | \n",
" train | \n",
" 9087 | \n",
" VALID | \n",
" True | \n",
" 9087 | \n",
" imagenette2-160/train/n03888257/n03888257_12053.JPEG | \n",
" parachute | \n",
" train | \n",
" 8628 | \n",
" VALID | \n",
" True | \n",
" 8628 | \n",
"
\n",
" \n",
" 4 | \n",
" 7966 | \n",
" 1630 | \n",
" 0.555266 | \n",
" imagenette2-160/train/n03445777/n03445777_3254.JPEG | \n",
" golf_ball | \n",
" train | \n",
" 7966 | \n",
" VALID | \n",
" True | \n",
" 7966 | \n",
" imagenette2-160/train/n02102040/n02102040_585.JPEG | \n",
" English_springer | \n",
" train | \n",
" 1630 | \n",
" VALID | \n",
" True | \n",
" 1630 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" outlier nearest distance filename_outlier label_outlier split_outlier index_x error_code_outlier is_valid_outlier fd_index_outlier filename_nearest label_nearest split_nearest index_y error_code_nearest is_valid_nearest fd_index_nearest\n",
"0 2664 9763 0.476124 imagenette2-160/train/n02979186/n02979186_3967.JPEG cassette_player train 2664 VALID True 2664 imagenette2-160/val/n01440764/n01440764_710.JPEG tench val 9763 VALID True 9763\n",
"1 8150 7831 0.514680 imagenette2-160/train/n03445777/n03445777_5218.JPEG golf_ball train 8150 VALID True 8150 imagenette2-160/train/n03445777/n03445777_18756.JPEG golf_ball train 7831 VALID True 7831\n",
"2 12076 956 0.539276 imagenette2-160/val/n03417042/n03417042_5301.JPEG garbage_truck val 12076 VALID True 12076 imagenette2-160/train/n01440764/n01440764_9898.JPEG tench train 956 VALID True 956\n",
"3 9087 8628 0.544795 imagenette2-160/train/n03888257/n03888257_34639.JPEG parachute train 9087 VALID True 9087 imagenette2-160/train/n03888257/n03888257_12053.JPEG parachute train 8628 VALID True 8628\n",
"4 7966 1630 0.555266 imagenette2-160/train/n03445777/n03445777_3254.JPEG golf_ball train 7966 VALID True 7966 imagenette2-160/train/n02102040/n02102040_585.JPEG English_springer train 1630 VALID True 1630"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.outliers().head(5)"
]
},
{
"cell_type": "markdown",
"id": "bc16596d-899a-45eb-87ca-1d2b96a6ad96",
"metadata": {},
"source": [
"## Comparing Labels of Similar Images\n",
"Find possible mislabels by comparing a query image to other images in the dataset."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "4d7cf1b9-c6c0-4b90-b7bb-59ca7bdbdcd7",
"metadata": {
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 106.60it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored similar images visual view in fastdup_imagenette/galleries/similarity.html\n"
]
},
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Similarity Report\n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
Similarity Report
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" French_horn | \n",
"
\n",
"\n",
" from | \n",
" /train/n03394916/n03394916_44127.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.968786 | \n",
" /val/n03394916/n03394916_30631.JPEG | \n",
" French_horn | \n",
"
\n",
"\n",
" 0.918324 | \n",
" /train/n03394916/n03394916_36016.JPEG | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" French_horn | \n",
"
\n",
"\n",
" from | \n",
" /val/n03394916/n03394916_30631.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.968786 | \n",
" /train/n03394916/n03394916_44127.JPEG | \n",
" French_horn | \n",
"
\n",
"\n",
" 0.903753 | \n",
" /train/n03394916/n03394916_29969.JPEG | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" golf_ball | \n",
"
\n",
"\n",
" from | \n",
" /val/n03445777/n03445777_6882.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.962458 | \n",
" /train/n03445777/n03445777_13918.JPEG | \n",
" golf_ball | \n",
"
\n",
"\n",
" 0.918005 | \n",
" /val/n03445777/n03445777_5912.JPEG | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" golf_ball | \n",
"
\n",
"\n",
" from | \n",
" /train/n03445777/n03445777_13918.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.962458 | \n",
" /val/n03445777/n03445777_6882.JPEG | \n",
" golf_ball | \n",
"
\n",
"\n",
" 0.917039 | \n",
" /val/n03445777/n03445777_8820.JPEG | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
"\n",
" from | \n",
" /train/n02102040/n02102040_1564.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.953837 | \n",
" /train/n02102040/n02102040_3837.JPEG | \n",
" English_springer | \n",
"
\n",
"\n",
" 0.908732 | \n",
" /train/n02102040/n02102040_3586.JPEG | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
"\n",
" from | \n",
" /train/n02102040/n02102040_3837.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.953837 | \n",
" /train/n02102040/n02102040_1564.JPEG | \n",
" English_springer | \n",
"
\n",
"\n",
" 0.893944 | \n",
" /train/n02102040/n02102040_3027.JPEG | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" tench | \n",
"
\n",
"\n",
" from | \n",
" /train/n01440764/n01440764_7457.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.953413 | \n",
" /train/n01440764/n01440764_11339.JPEG | \n",
" tench | \n",
"
\n",
"\n",
" 0.918778 | \n",
" /train/n01440764/n01440764_9315.JPEG | \n",
" tench | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" tench | \n",
"
\n",
"\n",
" from | \n",
" /train/n01440764/n01440764_11339.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.953413 | \n",
" /train/n01440764/n01440764_7457.JPEG | \n",
" tench | \n",
"
\n",
"\n",
" 0.889166 | \n",
" /train/n01440764/n01440764_12279.JPEG | \n",
" tench | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" garbage_truck | \n",
"
\n",
"\n",
" from | \n",
" /train/n03417042/n03417042_1578.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.952239 | \n",
" /train/n03417042/n03417042_12906.JPEG | \n",
" garbage_truck | \n",
"
\n",
"\n",
" 0.837864 | \n",
" /val/n03417042/n03417042_9610.JPEG | \n",
" garbage_truck | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" garbage_truck | \n",
"
\n",
"\n",
" from | \n",
" /train/n03417042/n03417042_12906.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.952239 | \n",
" /train/n03417042/n03417042_1578.JPEG | \n",
" garbage_truck | \n",
"
\n",
"\n",
" 0.828749 | \n",
" /train/n03417042/n03417042_27686.JPEG | \n",
" garbage_truck | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" French_horn | \n",
"
\n",
"\n",
" from | \n",
" /val/n03394916/n03394916_6830.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.951679 | \n",
" /val/n03394916/n03394916_21092.JPEG | \n",
" French_horn | \n",
"
\n",
"\n",
" 0.89308 | \n",
" /train/n03394916/n03394916_35469.JPEG | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" French_horn | \n",
"
\n",
"\n",
" from | \n",
" /val/n03394916/n03394916_21092.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.951679 | \n",
" /val/n03394916/n03394916_6830.JPEG | \n",
" French_horn | \n",
"
\n",
"\n",
" 0.865771 | \n",
" /train/n03394916/n03394916_35469.JPEG | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
"\n",
" from | \n",
" /train/n03888257/n03888257_21027.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.950477 | \n",
" /val/n03888257/n03888257_11210.JPEG | \n",
" parachute | \n",
"
\n",
"\n",
" 0.92043 | \n",
" /val/n03888257/n03888257_12491.JPEG | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
"\n",
" from | \n",
" /val/n03888257/n03888257_11210.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.950477 | \n",
" /train/n03888257/n03888257_21027.JPEG | \n",
" parachute | \n",
"
\n",
"\n",
" 0.865155 | \n",
" /val/n03888257/n03888257_12491.JPEG | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
"\n",
" from | \n",
" /train/n02102040/n02102040_6313.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.950174 | \n",
" /train/n02102040/n02102040_3767.JPEG | \n",
" English_springer | \n",
"
\n",
"\n",
" 0.947323 | \n",
" /val/n02102040/n02102040_350.JPEG | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
"\n",
" from | \n",
" /train/n02102040/n02102040_3767.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.950174 | \n",
" /train/n02102040/n02102040_6313.JPEG | \n",
" English_springer | \n",
"
\n",
"\n",
" 0.914057 | \n",
" /val/n02102040/n02102040_350.JPEG | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
"\n",
" from | \n",
" /train/n02102040/ILSVRC2012_val_00032959.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.949877 | \n",
" /val/n02102040/n02102040_662.JPEG | \n",
" English_springer | \n",
"
\n",
"\n",
" 0.933114 | \n",
" /train/n02102040/n02102040_3114.JPEG | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
"\n",
" from | \n",
" /val/n02102040/n02102040_662.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.949877 | \n",
" /train/n02102040/ILSVRC2012_val_00032959.JPEG | \n",
" English_springer | \n",
"
\n",
"\n",
" 0.927345 | \n",
" /val/n02102040/n02102040_3502.JPEG | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
"\n",
" from | \n",
" /train/n02102040/n02102040_3114.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.949252 | \n",
" /train/n02102040/n02102040_1306.JPEG | \n",
" English_springer | \n",
"
\n",
"\n",
" 0.941953 | \n",
" /train/n02102040/n02102040_1055.JPEG | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
"\n",
" from | \n",
" /train/n02102040/n02102040_1306.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.949252 | \n",
" /train/n02102040/n02102040_3114.JPEG | \n",
" English_springer | \n",
"
\n",
"\n",
" 0.936799 | \n",
" /train/n02102040/n02102040_876.JPEG | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" from | \n",
" to | \n",
" label | \n",
" label2 | \n",
" distance | \n",
"
\n",
" \n",
" \n",
" \n",
" 3630 | \n",
" imagenette2-160/train/n03394916/n03394916_44127.JPEG | \n",
" [imagenette2-160/val/n03394916/n03394916_30631.JPEG, imagenette2-160/train/n03394916/n03394916_36016.JPEG] | \n",
" [French_horn, French_horn] | \n",
" [French_horn, French_horn] | \n",
" [0.968786, 0.918324] | \n",
"
\n",
" \n",
" 7823 | \n",
" imagenette2-160/val/n03394916/n03394916_30631.JPEG | \n",
" [imagenette2-160/train/n03394916/n03394916_44127.JPEG, imagenette2-160/train/n03394916/n03394916_29969.JPEG] | \n",
" [French_horn, French_horn] | \n",
" [French_horn, French_horn] | \n",
" [0.968786, 0.903753] | \n",
"
\n",
" \n",
" 8758 | \n",
" imagenette2-160/val/n03445777/n03445777_6882.JPEG | \n",
" [imagenette2-160/train/n03445777/n03445777_13918.JPEG, imagenette2-160/val/n03445777/n03445777_5912.JPEG] | \n",
" [golf_ball, golf_ball] | \n",
" [golf_ball, golf_ball] | \n",
" [0.962458, 0.918005] | \n",
"
\n",
" \n",
" 5363 | \n",
" imagenette2-160/train/n03445777/n03445777_13918.JPEG | \n",
" [imagenette2-160/val/n03445777/n03445777_6882.JPEG, imagenette2-160/val/n03445777/n03445777_8820.JPEG] | \n",
" [golf_ball, golf_ball] | \n",
" [golf_ball, golf_ball] | \n",
" [0.962458, 0.917039] | \n",
"
\n",
" \n",
" 896 | \n",
" imagenette2-160/train/n02102040/n02102040_1564.JPEG | \n",
" [imagenette2-160/train/n02102040/n02102040_3837.JPEG, imagenette2-160/train/n02102040/n02102040_3586.JPEG] | \n",
" [English_springer, English_springer] | \n",
" [English_springer, English_springer] | \n",
" [0.953837, 0.908732] | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 6224 | \n",
" imagenette2-160/train/n03888257/n03888257_38633.JPEG | \n",
" [imagenette2-160/train/n03888257/n03888257_12816.JPEG] | \n",
" [parachute] | \n",
" [parachute] | \n",
" [0.800073] | \n",
"
\n",
" \n",
" 5917 | \n",
" imagenette2-160/train/n03888257/n03888257_12816.JPEG | \n",
" [imagenette2-160/train/n03888257/n03888257_38633.JPEG] | \n",
" [parachute] | \n",
" [parachute] | \n",
" [0.800073] | \n",
"
\n",
" \n",
" 4324 | \n",
" imagenette2-160/train/n03417042/n03417042_3236.JPEG | \n",
" [imagenette2-160/train/n03417042/n03417042_12297.JPEG] | \n",
" [garbage_truck] | \n",
" [garbage_truck] | \n",
" [0.800025] | \n",
"
\n",
" \n",
" 3429 | \n",
" imagenette2-160/train/n03394916/n03394916_32478.JPEG | \n",
" [imagenette2-160/train/n03394916/n03394916_35573.JPEG] | \n",
" [French_horn] | \n",
" [French_horn] | \n",
" [0.800012] | \n",
"
\n",
" \n",
" 7503 | \n",
" imagenette2-160/val/n03028079/n03028079_13002.JPEG | \n",
" [imagenette2-160/train/n03028079/n03028079_3839.JPEG] | \n",
" [church] | \n",
" [church] | \n",
" [0.800002] | \n",
"
\n",
" \n",
"
\n",
"
9064 rows × 5 columns
\n",
"
"
],
"text/plain": [
" from to label label2 distance\n",
"3630 imagenette2-160/train/n03394916/n03394916_44127.JPEG [imagenette2-160/val/n03394916/n03394916_30631.JPEG, imagenette2-160/train/n03394916/n03394916_36016.JPEG] [French_horn, French_horn] [French_horn, French_horn] [0.968786, 0.918324]\n",
"7823 imagenette2-160/val/n03394916/n03394916_30631.JPEG [imagenette2-160/train/n03394916/n03394916_44127.JPEG, imagenette2-160/train/n03394916/n03394916_29969.JPEG] [French_horn, French_horn] [French_horn, French_horn] [0.968786, 0.903753]\n",
"8758 imagenette2-160/val/n03445777/n03445777_6882.JPEG [imagenette2-160/train/n03445777/n03445777_13918.JPEG, imagenette2-160/val/n03445777/n03445777_5912.JPEG] [golf_ball, golf_ball] [golf_ball, golf_ball] [0.962458, 0.918005]\n",
"5363 imagenette2-160/train/n03445777/n03445777_13918.JPEG [imagenette2-160/val/n03445777/n03445777_6882.JPEG, imagenette2-160/val/n03445777/n03445777_8820.JPEG] [golf_ball, golf_ball] [golf_ball, golf_ball] [0.962458, 0.917039]\n",
"896 imagenette2-160/train/n02102040/n02102040_1564.JPEG [imagenette2-160/train/n02102040/n02102040_3837.JPEG, imagenette2-160/train/n02102040/n02102040_3586.JPEG] [English_springer, English_springer] [English_springer, English_springer] [0.953837, 0.908732]\n",
"... ... ... ... ... ...\n",
"6224 imagenette2-160/train/n03888257/n03888257_38633.JPEG [imagenette2-160/train/n03888257/n03888257_12816.JPEG] [parachute] [parachute] [0.800073]\n",
"5917 imagenette2-160/train/n03888257/n03888257_12816.JPEG [imagenette2-160/train/n03888257/n03888257_38633.JPEG] [parachute] [parachute] [0.800073]\n",
"4324 imagenette2-160/train/n03417042/n03417042_3236.JPEG [imagenette2-160/train/n03417042/n03417042_12297.JPEG] [garbage_truck] [garbage_truck] [0.800025]\n",
"3429 imagenette2-160/train/n03394916/n03394916_32478.JPEG [imagenette2-160/train/n03394916/n03394916_35573.JPEG] [French_horn] [French_horn] [0.800012]\n",
"7503 imagenette2-160/val/n03028079/n03028079_13002.JPEG [imagenette2-160/train/n03028079/n03028079_3839.JPEG] [church] [church] [0.800002]\n",
"\n",
"[9064 rows x 5 columns]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.similarity_gallery() "
]
},
{
"cell_type": "markdown",
"id": "c2c393be-2b42-4814-8688-03d2be9e8998",
"metadata": {},
"source": [
"## Similar Image Pairs\n",
"\n",
"Find similar image pairs within and across the train and validation subfolders. Pairs may include train-train, train-val, val-train, and val-val."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "9e065403-582b-4f94-855b-33fd8f4826a1",
"metadata": {
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/apps/volume/dataset-volume/mambaforge/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:106: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n",
"/apps/volume/dataset-volume/mambaforge/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:106: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n",
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 188.62it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored similarity visual view in fastdup_imagenette/galleries/duplicates.html\n"
]
},
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Duplicates Report\n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
Duplicates Report
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.968786 | \n",
"
\n",
"\n",
" From | \n",
" /val/n03394916/n03394916_30631.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n03394916/n03394916_44127.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" French_horn | \n",
"
\n",
"\n",
" To_Label | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.962458 | \n",
"
\n",
"\n",
" From | \n",
" /train/n03445777/n03445777_13918.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /val/n03445777/n03445777_6882.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" golf_ball | \n",
"
\n",
"\n",
" To_Label | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.953837 | \n",
"
\n",
"\n",
" From | \n",
" /train/n02102040/n02102040_3837.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n02102040/n02102040_1564.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" English_springer | \n",
"
\n",
"\n",
" To_Label | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.953413 | \n",
"
\n",
"\n",
" From | \n",
" /train/n01440764/n01440764_7457.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n01440764/n01440764_11339.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" tench | \n",
"
\n",
"\n",
" To_Label | \n",
" tench | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.952239 | \n",
"
\n",
"\n",
" From | \n",
" /train/n03417042/n03417042_1578.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n03417042/n03417042_12906.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" garbage_truck | \n",
"
\n",
"\n",
" To_Label | \n",
" garbage_truck | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.951679 | \n",
"
\n",
"\n",
" From | \n",
" /val/n03394916/n03394916_6830.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /val/n03394916/n03394916_21092.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" French_horn | \n",
"
\n",
"\n",
" To_Label | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.950477 | \n",
"
\n",
"\n",
" From | \n",
" /val/n03888257/n03888257_11210.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n03888257/n03888257_21027.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" parachute | \n",
"
\n",
"\n",
" To_Label | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.950174 | \n",
"
\n",
"\n",
" From | \n",
" /train/n02102040/n02102040_6313.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n02102040/n02102040_3767.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" English_springer | \n",
"
\n",
"\n",
" To_Label | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.949877 | \n",
"
\n",
"\n",
" From | \n",
" /train/n02102040/ILSVRC2012_val_00032959.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /val/n02102040/n02102040_662.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" English_springer | \n",
"
\n",
"\n",
" To_Label | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.949252 | \n",
"
\n",
"\n",
" From | \n",
" /train/n02102040/n02102040_1306.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n02102040/n02102040_3114.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" English_springer | \n",
"
\n",
"\n",
" To_Label | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fd.vis.duplicates_gallery()"
]
},
{
"cell_type": "markdown",
"id": "e10989e1",
"metadata": {},
"source": [
"Show similar image pairs."
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "3ea590e9-d221-4202-b03b-e5fef4487c89",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 270
},
"executionInfo": {
"elapsed": 499,
"status": "ok",
"timestamp": 1677667342908,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "3ea590e9-d221-4202-b03b-e5fef4487c89",
"outputId": "3c5f4cc0-0ba5-42a0-e01b-f165e9cf655c",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" from | \n",
" to | \n",
" distance | \n",
" filename_from | \n",
" label_from | \n",
" split_from | \n",
" index_x | \n",
" error_code_from | \n",
" is_valid_from | \n",
" fd_index_from | \n",
" filename_to | \n",
" label_to | \n",
" split_to | \n",
" index_y | \n",
" error_code_to | \n",
" is_valid_to | \n",
" fd_index_to | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 11521 | \n",
" 5390 | \n",
" 0.968786 | \n",
" imagenette2-160/val/n03394916/n03394916_30631.JPEG | \n",
" French_horn | \n",
" val | \n",
" 11521 | \n",
" VALID | \n",
" True | \n",
" 11521 | \n",
" imagenette2-160/train/n03394916/n03394916_44127.JPEG | \n",
" French_horn | \n",
" train | \n",
" 5390 | \n",
" VALID | \n",
" True | \n",
" 5390 | \n",
"
\n",
" \n",
" 1 | \n",
" 5390 | \n",
" 11521 | \n",
" 0.968786 | \n",
" imagenette2-160/train/n03394916/n03394916_44127.JPEG | \n",
" French_horn | \n",
" train | \n",
" 5390 | \n",
" VALID | \n",
" True | \n",
" 5390 | \n",
" imagenette2-160/val/n03394916/n03394916_30631.JPEG | \n",
" French_horn | \n",
" val | \n",
" 11521 | \n",
" VALID | \n",
" True | \n",
" 11521 | \n",
"
\n",
" \n",
" 2 | \n",
" 12914 | \n",
" 7715 | \n",
" 0.962458 | \n",
" imagenette2-160/val/n03445777/n03445777_6882.JPEG | \n",
" golf_ball | \n",
" val | \n",
" 12914 | \n",
" VALID | \n",
" True | \n",
" 12914 | \n",
" imagenette2-160/train/n03445777/n03445777_13918.JPEG | \n",
" golf_ball | \n",
" train | \n",
" 7715 | \n",
" VALID | \n",
" True | \n",
" 7715 | \n",
"
\n",
" \n",
" 3 | \n",
" 7715 | \n",
" 12914 | \n",
" 0.962458 | \n",
" imagenette2-160/train/n03445777/n03445777_13918.JPEG | \n",
" golf_ball | \n",
" train | \n",
" 7715 | \n",
" VALID | \n",
" True | \n",
" 7715 | \n",
" imagenette2-160/val/n03445777/n03445777_6882.JPEG | \n",
" golf_ball | \n",
" val | \n",
" 12914 | \n",
" VALID | \n",
" True | \n",
" 12914 | \n",
"
\n",
" \n",
" 4 | \n",
" 1404 | \n",
" 1117 | \n",
" 0.953837 | \n",
" imagenette2-160/train/n02102040/n02102040_3837.JPEG | \n",
" English_springer | \n",
" train | \n",
" 1404 | \n",
" VALID | \n",
" True | \n",
" 1404 | \n",
" imagenette2-160/train/n02102040/n02102040_1564.JPEG | \n",
" English_springer | \n",
" train | \n",
" 1117 | \n",
" VALID | \n",
" True | \n",
" 1117 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" from to distance filename_from label_from split_from index_x error_code_from is_valid_from fd_index_from filename_to label_to split_to index_y error_code_to is_valid_to fd_index_to\n",
"0 11521 5390 0.968786 imagenette2-160/val/n03394916/n03394916_30631.JPEG French_horn val 11521 VALID True 11521 imagenette2-160/train/n03394916/n03394916_44127.JPEG French_horn train 5390 VALID True 5390\n",
"1 5390 11521 0.968786 imagenette2-160/train/n03394916/n03394916_44127.JPEG French_horn train 5390 VALID True 5390 imagenette2-160/val/n03394916/n03394916_30631.JPEG French_horn val 11521 VALID True 11521\n",
"2 12914 7715 0.962458 imagenette2-160/val/n03445777/n03445777_6882.JPEG golf_ball val 12914 VALID True 12914 imagenette2-160/train/n03445777/n03445777_13918.JPEG golf_ball train 7715 VALID True 7715\n",
"3 7715 12914 0.962458 imagenette2-160/train/n03445777/n03445777_13918.JPEG golf_ball train 7715 VALID True 7715 imagenette2-160/val/n03445777/n03445777_6882.JPEG golf_ball val 12914 VALID True 12914\n",
"4 1404 1117 0.953837 imagenette2-160/train/n02102040/n02102040_3837.JPEG English_springer train 1404 VALID True 1404 imagenette2-160/train/n02102040/n02102040_1564.JPEG English_springer train 1117 VALID True 1117"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.similarity().head(5)"
]
},
{
"cell_type": "markdown",
"id": "95d21e6d-a951-48dd-8c4c-894c8ba556fd",
"metadata": {},
"source": [
"## Image Clusters"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4a6db529-cb1e-4655-af50-d97f3e131319",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000,
"output_embedded_package_id": "1Wh1vmG-F-RG0ZYZP1oRgiyqHAtnfsuEk"
},
"executionInfo": {
"elapsed": 6376,
"status": "ok",
"timestamp": 1677667352994,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "4a6db529-cb1e-4655-af50-d97f3e131319",
"outputId": "adfc3ee1-84c9-4aa6-a0db-09a6a800b566",
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tench\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 36.72it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished OK. Components are stored as image files fastdup_imagenette/galleries/components_[index].jpg\n",
"Stored components visual view in fastdup_imagenette/galleries/components.html\n",
"Execution time in seconds 3.0\n"
]
},
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Components Report\n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
Components Report
Showing groups of similar images
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 6 | \n",
"
\n",
"\n",
" num_images | \n",
" 162 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9001 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" tench | \n",
" 54 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 850 | \n",
"
\n",
"\n",
" num_images | \n",
" 70 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9004 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" English_springer | \n",
" 54 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 7240 | \n",
"
\n",
"\n",
" num_images | \n",
" 69 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9001 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" golf_ball | \n",
" 54 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 5410 | \n",
"
\n",
"\n",
" num_images | \n",
" 21 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9001 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 21 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 4512 | \n",
"
\n",
"\n",
" num_images | \n",
" 13 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9004 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" French_horn | \n",
" 13 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 5397 | \n",
"
\n",
"\n",
" num_images | \n",
" 12 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9025 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 12 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 5539 | \n",
"
\n",
"\n",
" num_images | \n",
" 10 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 10 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 1139 | \n",
"
\n",
"\n",
" num_images | \n",
" 8 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9062 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" English_springer | \n",
" 8 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 5632 | \n",
"
\n",
"\n",
" num_images | \n",
" 8 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9041 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 8 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 4494 | \n",
"
\n",
"\n",
" num_images | \n",
" 8 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.902 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" French_horn | \n",
" 8 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 1239 | \n",
"
\n",
"\n",
" num_images | \n",
" 6 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.903 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" English_springer | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 4531 | \n",
"
\n",
"\n",
" num_images | \n",
" 6 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.902 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" French_horn | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 5678 | \n",
"
\n",
"\n",
" num_images | \n",
" 6 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9064 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 8335 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9004 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" parachute | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 199 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" tench | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2174 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9011 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" cassette_player | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 7386 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9019 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" golf_ball | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 4616 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9043 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" French_horn | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 8979 | \n",
"
\n",
"\n",
" num_images | \n",
" 4 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9013 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" parachute | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 4764 | \n",
"
\n",
"\n",
" num_images | \n",
" 4 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9032 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" French_horn | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fd.vis.component_gallery()"
]
},
{
"cell_type": "markdown",
"id": "ca5d4b6e-7ff6-49b8-b487-6ba1573ab104",
"metadata": {},
"source": [
"You can also visualize clusters with specific labels using the `slice` parameter. For example let's visualize clusters with the `chain_saw` label"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "4b38dacf-becc-4631-9aeb-6fe9bd235aa1",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000,
"output_embedded_package_id": "1xYIrPsODG8kAMaZOpGeKNRoa4-HjPC-w"
},
"executionInfo": {
"elapsed": 5130,
"status": "ok",
"timestamp": 1677667368207,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "4b38dacf-becc-4631-9aeb-6fe9bd235aa1",
"outputId": "131d0f11-5627-4beb-b58c-3801e09a3b42",
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"chain_saw\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 250.94it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished OK. Components are stored as image files fastdup_imagenette/galleries/components_[index].jpg\n",
"Stored components visual view in fastdup_imagenette/galleries/components.html\n",
"Execution time in seconds 0.3\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Components Report\n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
Components Report
Showing groups of similar images, for label: chain_saw
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2876 | \n",
"
\n",
"\n",
" num_images | \n",
" 3 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9064 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2798 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9029 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2815 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9208 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2862 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9222 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2989 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9139 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2992 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9198 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 3001 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9073 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 3002 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9192 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 3077 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9355 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 3305 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9345 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 10204 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9039 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"fd.vis.component_gallery(slice='chain_saw')"
]
},
{
"cell_type": "markdown",
"id": "28498d81-d073-4f3d-baa4-732e1df93a34",
"metadata": {},
"source": [
"## Connected Components"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "0346be91-5380-48b9-a8df-074c342efcd3",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"executionInfo": {
"elapsed": 1036,
"status": "ok",
"timestamp": 1677667380699,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "0346be91-5380-48b9-a8df-074c342efcd3",
"outputId": "ffa6bd9d-b5b3-4ed5-86e1-c47ca9658667",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" component_id | \n",
" sum | \n",
" count | \n",
" mean_distance | \n",
" min_distance | \n",
" max_distance | \n",
" filename | \n",
" label | \n",
" split | \n",
" error_code | \n",
" is_valid | \n",
" fd_index | \n",
"
\n",
" \n",
" \n",
" \n",
" 235 | \n",
" 235 | \n",
" 6 | \n",
" 517.2897 | \n",
" 566.0 | \n",
" 0.9139 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_13304.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 235 | \n",
"
\n",
" \n",
" 121 | \n",
" 121 | \n",
" 6 | \n",
" 517.2897 | \n",
" 566.0 | \n",
" 0.9139 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_11486.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 121 | \n",
"
\n",
" \n",
" 685 | \n",
" 685 | \n",
" 6 | \n",
" 517.2897 | \n",
" 566.0 | \n",
" 0.9139 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_6174.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 685 | \n",
"
\n",
" \n",
" 689 | \n",
" 689 | \n",
" 6 | \n",
" 517.2897 | \n",
" 566.0 | \n",
" 0.9139 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_6249.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 689 | \n",
"
\n",
" \n",
" 706 | \n",
" 706 | \n",
" 6 | \n",
" 517.2897 | \n",
" 566.0 | \n",
" 0.9139 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_6494.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 706 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index component_id sum count mean_distance min_distance max_distance filename label split error_code is_valid fd_index\n",
"235 235 6 517.2897 566.0 0.9139 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_13304.JPEG tench train VALID True 235\n",
"121 121 6 517.2897 566.0 0.9139 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_11486.JPEG tench train VALID True 121\n",
"685 685 6 517.2897 566.0 0.9139 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_6174.JPEG tench train VALID True 685\n",
"689 689 6 517.2897 566.0 0.9139 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_6249.JPEG tench train VALID True 689\n",
"706 706 6 517.2897 566.0 0.9139 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_6494.JPEG tench train VALID True 706"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cc_df, _ = fd.connected_components()\n",
"cc_df.sort_values('count', ascending=False).head(5)"
]
},
{
"cell_type": "markdown",
"id": "569cb878",
"metadata": {},
"source": [
"We can also get metadata for individual images using their `fastdup_id` available in `fd.annotations()`"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "e80d6817-fed6-4fa4-8714-b01214e0d3f8",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 990,
"status": "ok",
"timestamp": 1677667384644,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "e80d6817-fed6-4fa4-8714-b01214e0d3f8",
"outputId": "4f973aba-572d-4e50-d22d-c5bfc8cf3d2d",
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'filename': 'imagenette2-160/train/n01440764/n01440764_1778.JPEG',\n",
" 'label': 'tench',\n",
" 'split': 'train',\n",
" 'index': 349,\n",
" 'error_code': 'VALID',\n",
" 'is_valid': True,\n",
" 'fd_index': 349}"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd[349]"
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}