{
"cells": [
{
"cell_type": "markdown",
"id": "6fc7a410",
"metadata": {},
"source": [
"[![image](https://raw.githubusercontent.com/visual-layer/visuallayer/main/imgs/vl_horizontal_logo.png)](https://www.visual-layer.com)"
]
},
{
"cell_type": "markdown",
"id": "SwSYWR4vzk_e",
"metadata": {
"id": "SwSYWR4vzk_e",
"tags": []
},
"source": [
"# Analyzing an Image Classification Dataset\n",
"\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb)\n",
"[![Open in Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb)\n",
"\n",
"This notebook shows how you can use [fastdup](https://github.com/visual-layer/fastdup) to analyze an image classification dataset for:\n",
"\n",
"+ Duplicates\n",
"+ Outliers\n",
"+ Wrong labels\n",
"+ Image clusters\n",
"\n",
"\n",
"> **Note** - No GPU needed! You can run this notebook on a CPU-only instance.\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "bbed0117-e8d1-4df6-b8b7-7bcce10b8655",
"metadata": {
"tags": []
},
"source": [
"## Installation\n",
"\n",
"First, install [fastdup](https://github.com/visual-layer/fastdup) from PyPI:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "506e82b4-a1c2-4262-a326-d0924bb018b6",
"metadata": {
"id": "506e82b4-a1c2-4262-a326-d0924bb018b6"
},
"outputs": [],
"source": [
"!pip install -Uq fastdup"
]
},
{
"cell_type": "markdown",
"id": "a5c3a1ab",
"metadata": {},
"source": [
"Now test the installation by importing fastdup and printing its version. If no error appears, we are ready to go."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7f69d8b2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/usr/bin/dpkg\n"
]
},
{
"data": {
"text/plain": [
"'1.26'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import fastdup\n",
"fastdup.__version__"
]
},
{
"cell_type": "markdown",
"id": "8a79fb1b-b089-4d4d-8fa8-3e2b2ef7f886",
"metadata": {
"id": "8a79fb1b-b089-4d4d-8fa8-3e2b2ef7f886",
"tags": []
},
"source": [
"## Download Dataset\n",
"\n",
"We will analyze the [Imagenette](https://github.com/fastai/imagenette) dataset, a subset of 10 easily classified classes from ImageNet (tench, English springer, cassette player, chain saw, church, French horn, garbage truck, gas pump, golf ball, parachute)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "be5b7ca5-34f5-4a0f-b081-2e78be6a425a",
"metadata": {},
"outputs": [],
"source": [
"!wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz\n",
"!tar -xf imagenette2-160.tgz"
]
},
{
"cell_type": "markdown",
"id": "f01586fe-db75-4154-aa15-9ea2709c9461",
"metadata": {
"id": "f01586fe-db75-4154-aa15-9ea2709c9461"
},
"source": [
"## Load and Format Annotations"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "ff90fe31-7c39-46c5-8c58-3ae349fbcc91",
"metadata": {
"executionInfo": {
"elapsed": 949,
"status": "ok",
"timestamp": 1677666765166,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "ff90fe31-7c39-46c5-8c58-3ae349fbcc91",
"tags": []
},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "21d2474d-3fa5-4148-a0f1-ea8d55d63b85",
"metadata": {
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1677666768281,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "21d2474d-3fa5-4148-a0f1-ea8d55d63b85",
"tags": []
},
"outputs": [],
"source": [
"data_dir = 'imagenette2-160/'\n",
"csv_path = 'imagenette2-160/noisy_imagenette.csv'"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "2cb91ccb-9cb6-42ba-9489-96182eccc583",
"metadata": {
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1677666769859,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "2cb91ccb-9cb6-42ba-9489-96182eccc583",
"tags": []
},
"outputs": [],
"source": [
"label_map = {\n",
" 'n02979186': 'cassette_player', \n",
" 'n03417042': 'garbage_truck', \n",
" 'n01440764': 'tench', \n",
" 'n02102040': 'English_springer', \n",
" 'n03028079': 'church',\n",
" 'n03888257': 'parachute', \n",
" 'n03394916': 'French_horn', \n",
" 'n03000684': 'chain_saw', \n",
" 'n03445777': 'golf_ball', \n",
" 'n03425413': 'gas_pump'\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "8aba34e1",
"metadata": {},
"source": [
"Load the annotations provided with the dataset."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "e2e90600-b02d-4a2a-a348-7b67157f9129",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 143
},
"executionInfo": {
"elapsed": 2,
"status": "ok",
"timestamp": 1677666769859,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "e2e90600-b02d-4a2a-a348-7b67157f9129",
"outputId": "f9f72c0d-f613-4aac-d29c-3646b2301dcb",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>path</th><th>noisy_labels_0</th><th>noisy_labels_1</th><th>noisy_labels_5</th><th>noisy_labels_25</th><th>noisy_labels_50</th><th>is_valid</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>train/n02979186/n02979186_9036.JPEG</td><td>n02979186</td><td>n02979186</td><td>n02979186</td><td>n02979186</td><td>n02979186</td><td>False</td></tr>\n",
"    <tr><th>1</th><td>train/n02979186/n02979186_11957.JPEG</td><td>n02979186</td><td>n02979186</td><td>n02979186</td><td>n02979186</td><td>n03000684</td><td>False</td></tr>\n",
"    <tr><th>2</th><td>train/n02979186/n02979186_9715.JPEG</td><td>n02979186</td><td>n02979186</td><td>n02979186</td><td>n03417042</td><td>n03000684</td><td>False</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" path noisy_labels_0 noisy_labels_1 noisy_labels_5 noisy_labels_25 noisy_labels_50 is_valid\n",
"0 train/n02979186/n02979186_9036.JPEG n02979186 n02979186 n02979186 n02979186 n02979186 False\n",
"1 train/n02979186/n02979186_11957.JPEG n02979186 n02979186 n02979186 n02979186 n03000684 False\n",
"2 train/n02979186/n02979186_9715.JPEG n02979186 n02979186 n02979186 n03417042 n03000684 False"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_annot = pd.read_csv(csv_path)\n",
"df_annot.head(3)"
]
},
{
"cell_type": "markdown",
"id": "dfc957bf",
"metadata": {},
"source": [
"Transform the annotations into the format fastdup supports.\n",
"\n",
"fastdup expects an annotation `DataFrame` that contains the following columns:\n",
"\n",
"+ filename - the path to the image file\n",
"+ label - the label of the image\n",
"+ split - whether the image belongs to the training, validation, or test subset"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "473185d1-89f5-4746-b87b-f2b3ef7c445b",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 424
},
"executionInfo": {
"elapsed": 1012,
"status": "ok",
"timestamp": 1677666771201,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "473185d1-89f5-4746-b87b-f2b3ef7c445b",
"outputId": "c09c986d-bcef-4545-8ceb-ee5196b40ee6",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>filename</th><th>label</th><th>split</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>imagenette2-160/train/n02979186/n02979186_9036.JPEG</td><td>cassette_player</td><td>train</td></tr>\n",
"    <tr><th>1</th><td>imagenette2-160/train/n02979186/n02979186_11957.JPEG</td><td>cassette_player</td><td>train</td></tr>\n",
"    <tr><th>2</th><td>imagenette2-160/train/n02979186/n02979186_9715.JPEG</td><td>cassette_player</td><td>train</td></tr>\n",
"    <tr><th>3</th><td>imagenette2-160/train/n02979186/n02979186_21736.JPEG</td><td>cassette_player</td><td>train</td></tr>\n",
"    <tr><th>4</th><td>imagenette2-160/train/n02979186/ILSVRC2012_val_00046953.JPEG</td><td>cassette_player</td><td>train</td></tr>\n",
"    <tr><th>...</th><td>...</td><td>...</td><td>...</td></tr>\n",
"    <tr><th>13389</th><td>imagenette2-160/val/n03425413/n03425413_17521.JPEG</td><td>gas_pump</td><td>val</td></tr>\n",
"    <tr><th>13390</th><td>imagenette2-160/val/n03425413/n03425413_20711.JPEG</td><td>gas_pump</td><td>val</td></tr>\n",
"    <tr><th>13391</th><td>imagenette2-160/val/n03425413/n03425413_19050.JPEG</td><td>gas_pump</td><td>val</td></tr>\n",
"    <tr><th>13392</th><td>imagenette2-160/val/n03425413/n03425413_13831.JPEG</td><td>gas_pump</td><td>val</td></tr>\n",
"    <tr><th>13393</th><td>imagenette2-160/val/n03425413/n03425413_1242.JPEG</td><td>gas_pump</td><td>val</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"<p>13394 rows × 3 columns</p>\n",
"</div>"
],
"text/plain": [
" filename label split\n",
"0 imagenette2-160/train/n02979186/n02979186_9036.JPEG cassette_player train\n",
"1 imagenette2-160/train/n02979186/n02979186_11957.JPEG cassette_player train\n",
"2 imagenette2-160/train/n02979186/n02979186_9715.JPEG cassette_player train\n",
"3 imagenette2-160/train/n02979186/n02979186_21736.JPEG cassette_player train\n",
"4 imagenette2-160/train/n02979186/ILSVRC2012_val_00046953.JPEG cassette_player train\n",
"... ... ... ...\n",
"13389 imagenette2-160/val/n03425413/n03425413_17521.JPEG gas_pump val\n",
"13390 imagenette2-160/val/n03425413/n03425413_20711.JPEG gas_pump val\n",
"13391 imagenette2-160/val/n03425413/n03425413_19050.JPEG gas_pump val\n",
"13392 imagenette2-160/val/n03425413/n03425413_13831.JPEG gas_pump val\n",
"13393 imagenette2-160/val/n03425413/n03425413_1242.JPEG gas_pump val\n",
"\n",
"[13394 rows x 3 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# take relevant columns\n",
"df_annot = df_annot[['path', 'noisy_labels_0']]\n",
"\n",
"# rename columns to fastdup's column names\n",
"df_annot = df_annot.rename({'noisy_labels_0': 'label', 'path': 'filename'}, axis='columns')\n",
"\n",
"# prepend data_dir to each path\n",
"df_annot['filename'] = df_annot['filename'].apply(lambda x: data_dir + x)\n",
"\n",
"# create split column\n",
"df_annot['split'] = df_annot['filename'].apply(lambda x: x.split(\"/\")[1])\n",
"\n",
"# map label ids to regular labels\n",
"df_annot['label'] = df_annot['label'].map(label_map)\n",
"\n",
"# show formatted annotations\n",
"df_annot"
]
},
{
"cell_type": "markdown",
"id": "0c648ed1-5016-4230-9873-546eb510b764",
"metadata": {
"id": "0c648ed1-5016-4230-9873-546eb510b764"
},
"source": [
"## Run fastdup\n",
"\n",
"With the images and annotations ready, we can proceed with running an analysis on the data."
]
},
{
"cell_type": "markdown",
"id": "0a39243e",
"metadata": {},
"source": [
"+ `input_dir` is the path to the downloaded images\n",
"+ `work_dir` is the path where the analysis artifacts are stored (optional; if omitted, fastdup writes to a folder named `work_dir` in the current working directory)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "92a6e2f9-e60c-44c0-b48a-f7413f7594ae",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.\n",
"FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.\n",
"2023-07-13 19:22:31 [INFO] Going to loop over dir /tmp/tmpqm6imqyr.csv\n",
"2023-07-13 19:22:31 [INFO] Found total 13394 images to run on, 13394 train, 0 test, name list 13394, counter 13394 \n",
"2023-07-13 19:23:04 [INFO] Found total 13394 images to run onimated: 0 Minutes\n",
"Finished histogram 3.121\n",
"Finished bucket sort 3.151\n",
"2023-07-13 19:23:04 [INFO] 544) Finished write_index() NN model\n",
"2023-07-13 19:23:04 [INFO] Stored nn model index file work_dir/nnf.index\n",
"2023-07-13 19:23:05 [INFO] Total time took 34024 ms\n",
"2023-07-13 19:23:05 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %\n",
"2023-07-13 19:23:05 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %\n",
"2023-07-13 19:23:05 [INFO] Found a total of 16764 above threshold images (d>0.800), which are 62.58 %\n",
"2023-07-13 19:23:05 [INFO] Found a total of 1339 outlier images (d<0.050), which are 5.00 %\n",
"2023-07-13 19:23:05 [INFO] Min distance found 0.519 max distance 0.969\n",
"2023-07-13 19:23:05 [INFO] Running connected components for ccthreshold 0.900000 \n",
".0\n",
" ########################################################################################\n",
"\n",
"Dataset Analysis Summary: \n",
"\n",
" Dataset contains 13394 images\n",
" Valid images are 100.00% (13,394) of the data, invalid are 0.00% (0) of the data\n",
" Similarity: 3.11% (416) belong to 18 similarity clusters (components).\n",
" 96.89% (12,978) images do not belong to any similarity cluster.\n",
" Largest cluster has 562 (4.20%) images.\n",
" For a detailed analysis, use `.connected_components()`\n",
"(similarity threshold used is 0.8, connected component threshold used is 0.9).\n",
"\n",
" Outliers: 6.24% (836) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n",
" For a detailed list of outliers, use `.outliers()`.\n"
]
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd = fastdup.create(input_dir=data_dir)\n",
"fd.run(annotations=df_annot, ccthreshold=0.9, threshold=0.8)"
]
},
{
"cell_type": "markdown",
"id": "62e35a12-fadd-4b3f-bcab-69e6e67862a4",
"metadata": {},
"source": [
"## Outliers\n",
"\n",
"Visualize outliers from the dataset."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b39ec702-3ea1-4afe-a948-f026ba8fcb47",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"executionInfo": {
"elapsed": 2658,
"status": "ok",
"timestamp": 1677667336302,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "b39ec702-3ea1-4afe-a948-f026ba8fcb47",
"outputId": "caa992d2-5267-408c-b44a-3a4a66e1ab5f",
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 26723.82it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored outliers visual view in work_dir/galleries/outliers.html\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
"<h2>Outliers Report</h2>\n",
"<p>Showing image outliers, one per row</p>\n",
"<table>\n",
"  <thead>\n",
"    <tr><th>Distance</th><th>Path</th><th>label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>0.523752</td><td>/train/n03445777/n03445777_5218.JPEG</td><td>golf_ball</td></tr>\n",
"    <tr><td>0.57066</td><td>/train/n03888257/n03888257_34639.JPEG</td><td>parachute</td></tr>\n",
"    <tr><td>0.578252</td><td>/train/n03445777/n03445777_3254.JPEG</td><td>golf_ball</td></tr>\n",
"    <tr><td>0.58389</td><td>/val/n03445777/n03445777_5932.JPEG</td><td>golf_ball</td></tr>\n",
"    <tr><td>0.599957</td><td>/train/n03888257/n03888257_79145.JPEG</td><td>parachute</td></tr>\n",
"    <tr><td>0.605961</td><td>/train/n01440764/n01440764_5638.JPEG</td><td>tench</td></tr>\n",
"    <tr><td>0.608525</td><td>/train/n03394916/n03394916_33663.JPEG</td><td>French_horn</td></tr>\n",
"    <tr><td>0.609527</td><td>/train/n03888257/n03888257_7793.JPEG</td><td>parachute</td></tr>\n",
"    <tr><td>0.611143</td><td>/val/n01440764/n01440764_4962.JPEG</td><td>tench</td></tr>\n",
"    <tr><td>0.61373</td><td>/train/n03445777/n03445777_6033.JPEG</td><td>golf_ball</td></tr>\n",
"    <tr><td>0.61618</td><td>/train/n03394916/n03394916_37544.JPEG</td><td>French_horn</td></tr>\n",
"    <tr><td>0.616704</td><td>/val/n03888257/n03888257_11450.JPEG</td><td>parachute</td></tr>\n",
"    <tr><td>0.616785</td><td>/val/n03445777/n03445777_9292.JPEG</td><td>golf_ball</td></tr>\n",
"    <tr><td>0.617952</td><td>/train/n03888257/n03888257_16223.JPEG</td><td>parachute</td></tr>\n",
"    <tr><td>0.619739</td><td>/train/n03028079/n03028079_24708.JPEG</td><td>church</td></tr>\n",
"    <tr><td>0.619787</td><td>/train/n01440764/ILSVRC2012_val_00037834.JPEG</td><td>tench</td></tr>\n",
"    <tr><td>0.620815</td><td>/train/n03888257/n03888257_5703.JPEG</td><td>parachute</td></tr>\n",
"    <tr><td>0.626412</td><td>/train/n03445777/n03445777_9199.JPEG</td><td>golf_ball</td></tr>\n",
"    <tr><td>0.628011</td><td>/train/n03888257/n03888257_32518.JPEG</td><td>parachute</td></tr>\n",
"    <tr><td>0.630812</td><td>/train/n02979186/n02979186_10289.JPEG</td><td>cassette_player</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.outliers_gallery()"
]
},
{
"cell_type": "markdown",
"id": "67378b58",
"metadata": {},
"source": [
"List the outliers and their nearest neighbors as a `DataFrame`."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "aa1c0e5d-6038-491b-8a91-1d76a87590d4",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 270
},
"executionInfo": {
"elapsed": 429,
"status": "ok",
"timestamp": 1677667331251,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "aa1c0e5d-6038-491b-8a91-1d76a87590d4",
"outputId": "b38332f8-7e4e-45de-f7d3-828a52757ec2",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr><th></th><th>outlier</th><th>nearest</th><th>distance</th><th>filename_outlier</th><th>label_outlier</th><th>split_outlier</th><th>index_x</th><th>error_code_outlier</th><th>is_valid_outlier</th><th>fd_index_outlier</th><th>filename_nearest</th><th>label_nearest</th><th>split_nearest</th><th>index_y</th><th>error_code_nearest</th><th>is_valid_nearest</th><th>fd_index_nearest</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>8293</td><td>13217</td><td>0.519030</td><td>imagenette2-160/train/n03445777/n03445777_5218.JPEG</td><td>golf_ball</td><td>train</td><td>8293</td><td>VALID</td><td>True</td><td>8293</td><td>imagenette2-160/val/n03425413/n03425413_11460.JPEG</td><td>gas_pump</td><td>val</td><td>13217</td><td>VALID</td><td>True</td><td>13217</td></tr>\n",
"    <tr><th>1</th><td>5457</td><td>5500</td><td>0.544795</td><td>imagenette2-160/train/n03888257/n03888257_34639.JPEG</td><td>parachute</td><td>train</td><td>5457</td><td>VALID</td><td>True</td><td>5457</td><td>imagenette2-160/train/n03888257/n03888257_12053.JPEG</td><td>parachute</td><td>train</td><td>5500</td><td>VALID</td><td>True</td><td>5500</td></tr>\n",
"    <tr><th>2</th><td>8076</td><td>3016</td><td>0.555266</td><td>imagenette2-160/train/n03445777/n03445777_3254.JPEG</td><td>golf_ball</td><td>train</td><td>8076</td><td>VALID</td><td>True</td><td>8076</td><td>imagenette2-160/train/n02102040/n02102040_585.JPEG</td><td>English_springer</td><td>train</td><td>3016</td><td>VALID</td><td>True</td><td>3016</td></tr>\n",
"    <tr><th>3</th><td>2790</td><td>4510</td><td>0.568702</td><td>imagenette2-160/train/n01440764/n01440764_5638.JPEG</td><td>tench</td><td>train</td><td>2790</td><td>VALID</td><td>True</td><td>2790</td><td>imagenette2-160/train/n03028079/n03028079_6607.JPEG</td><td>church</td><td>train</td><td>4510</td><td>VALID</td><td>True</td><td>4510</td></tr>\n",
"    <tr><th>4</th><td>5478</td><td>11775</td><td>0.582118</td><td>imagenette2-160/train/n03888257/n03888257_79145.JPEG</td><td>parachute</td><td>train</td><td>5478</td><td>VALID</td><td>True</td><td>5478</td><td>imagenette2-160/val/n03888257/n03888257_8080.JPEG</td><td>parachute</td><td>val</td><td>11775</td><td>VALID</td><td>True</td><td>11775</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" outlier nearest distance filename_outlier label_outlier split_outlier index_x error_code_outlier is_valid_outlier fd_index_outlier filename_nearest label_nearest split_nearest index_y error_code_nearest is_valid_nearest fd_index_nearest\n",
"0 8293 13217 0.519030 imagenette2-160/train/n03445777/n03445777_5218.JPEG golf_ball train 8293 VALID True 8293 imagenette2-160/val/n03425413/n03425413_11460.JPEG gas_pump val 13217 VALID True 13217\n",
"1 5457 5500 0.544795 imagenette2-160/train/n03888257/n03888257_34639.JPEG parachute train 5457 VALID True 5457 imagenette2-160/train/n03888257/n03888257_12053.JPEG parachute train 5500 VALID True 5500\n",
"2 8076 3016 0.555266 imagenette2-160/train/n03445777/n03445777_3254.JPEG golf_ball train 8076 VALID True 8076 imagenette2-160/train/n02102040/n02102040_585.JPEG English_springer train 3016 VALID True 3016\n",
"3 2790 4510 0.568702 imagenette2-160/train/n01440764/n01440764_5638.JPEG tench train 2790 VALID True 2790 imagenette2-160/train/n03028079/n03028079_6607.JPEG church train 4510 VALID True 4510\n",
"4 5478 11775 0.582118 imagenette2-160/train/n03888257/n03888257_79145.JPEG parachute train 5478 VALID True 5478 imagenette2-160/val/n03888257/n03888257_8080.JPEG parachute val 11775 VALID True 11775"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.outliers().head(5)"
]
},
{
"cell_type": "markdown",
"id": "bc16596d-899a-45eb-87ca-1d2b96a6ad96",
"metadata": {},
"source": [
"## Comparing Labels of Similar Images\n",
"Find possibly mislabeled images by comparing the label of each query image with the labels of its most similar images in the dataset."
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "4d7cf1b9-c6c0-4b90-b7bb-59ca7bdbdcd7",
"metadata": {
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 237.91it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored similar images visual view in work_dir/galleries/similarity.html\n"
]
},
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Similarity Report\n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
Similarity Report
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" church | \n",
"
\n",
"\n",
" from | \n",
" /val/n03028079/n03028079_13002.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.800002 | \n",
" /train/n03028079/n03028079_3839.JPEG | \n",
" church | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" French_horn | \n",
"
\n",
"\n",
" from | \n",
" /train/n03394916/n03394916_32478.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.800012 | \n",
" /train/n03394916/n03394916_35573.JPEG | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" cassette_player | \n",
"
\n",
"\n",
" from | \n",
" /train/n02979186/n02979186_14524.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.806502 | \n",
" /train/n02979186/n02979186_213.JPEG | \n",
" cassette_player | \n",
"
\n",
"\n",
" 0.800015 | \n",
" /val/n02979186/n02979186_11000.JPEG | \n",
" cassette_player | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" cassette_player | \n",
"
\n",
"\n",
" from | \n",
" /val/n02979186/n02979186_11000.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.820827 | \n",
" /train/n02979186/n02979186_10095.JPEG | \n",
" cassette_player | \n",
"
\n",
"\n",
" 0.800015 | \n",
" /train/n02979186/n02979186_14524.JPEG | \n",
" cassette_player | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" tench | \n",
"
\n",
"\n",
" from | \n",
" /train/n01440764/n01440764_44.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.803563 | \n",
" /train/n01440764/n01440764_14249.JPEG | \n",
" tench | \n",
"
\n",
"\n",
" 0.800023 | \n",
" /val/n01440764/n01440764_5490.JPEG | \n",
" tench | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" garbage_truck | \n",
"
\n",
"\n",
" from | \n",
" /train/n03417042/n03417042_3236.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.800025 | \n",
" /train/n03417042/n03417042_12297.JPEG | \n",
" garbage_truck | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
"\n",
" from | \n",
" /train/n03888257/n03888257_20704.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.804987 | \n",
" /train/n03888257/n03888257_20473.JPEG | \n",
" parachute | \n",
"
\n",
"\n",
" 0.800034 | \n",
" /train/n03888257/n03888257_8614.JPEG | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" gas_pump | \n",
"
\n",
"\n",
" from | \n",
" /train/n03425413/n03425413_14249.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.810811 | \n",
" /val/n03425413/n03425413_20360.JPEG | \n",
" gas_pump | \n",
"
\n",
"\n",
" 0.800035 | \n",
" /train/n03425413/n03425413_719.JPEG | \n",
" gas_pump | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
"\n",
" from | \n",
" /val/n03888257/n03888257_31790.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.810816 | \n",
" /train/n03888257/n03888257_17326.JPEG | \n",
" parachute | \n",
"
\n",
"\n",
" 0.800036 | \n",
" /train/n03888257/n03888257_8199.JPEG | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
"\n",
" from | \n",
" /train/n03888257/n03888257_8199.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.834109 | \n",
" /train/n03888257/n03888257_17326.JPEG | \n",
" parachute | \n",
"
\n",
"\n",
" 0.800036 | \n",
" /val/n03888257/n03888257_31790.JPEG | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" chain_saw | \n",
"
\n",
"\n",
" from | \n",
" /val/n03000684/n03000684_24542.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.803641 | \n",
" /val/n03000684/n03000684_2610.JPEG | \n",
" chain_saw | \n",
"
\n",
"\n",
" 0.80004 | \n",
" /train/n03000684/n03000684_26357.JPEG | \n",
" chain_saw | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" chain_saw | \n",
"
\n",
"\n",
" from | \n",
" /val/n03000684/n03000684_17431.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.807598 | \n",
" /train/n03000684/n03000684_1034.JPEG | \n",
" chain_saw | \n",
"
\n",
"\n",
" 0.800068 | \n",
" /train/n03000684/n03000684_807.JPEG | \n",
" chain_saw | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" chain_saw | \n",
"
\n",
"\n",
" from | \n",
" /train/n03000684/n03000684_807.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.811944 | \n",
" /val/n03000684/n03000684_18140.JPEG | \n",
" chain_saw | \n",
"
\n",
"\n",
" 0.800068 | \n",
" /val/n03000684/n03000684_17431.JPEG | \n",
" chain_saw | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" English_springer | \n",
"
\n",
"\n",
" from | \n",
" /train/n02102040/n02102040_139.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.841122 | \n",
" /train/n02102040/n02102040_2528.JPEG | \n",
" English_springer | \n",
"
\n",
"\n",
" 0.800071 | \n",
" /val/n02102040/n02102040_1121.JPEG | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
"\n",
" from | \n",
" /train/n03888257/n03888257_38633.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.800073 | \n",
" /train/n03888257/n03888257_12816.JPEG | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
"\n",
" from | \n",
" /train/n03888257/n03888257_12816.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.800073 | \n",
" /train/n03888257/n03888257_38633.JPEG | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" parachute | \n",
"
\n",
"\n",
" from | \n",
" /val/n03888257/n03888257_66961.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.805559 | \n",
" /val/n03888257/n03888257_13410.JPEG | \n",
" parachute | \n",
"
\n",
"\n",
" 0.800073 | \n",
" /val/n03888257/n03888257_3142.JPEG | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" church | \n",
"
\n",
"\n",
" from | \n",
" /train/n03028079/n03028079_17175.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.806021 | \n",
" /train/n03028079/n03028079_12685.JPEG | \n",
" church | \n",
"
\n",
"\n",
" 0.800076 | \n",
" /train/n03028079/n03028079_23514.JPEG | \n",
" church | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" golf_ball | \n",
"
\n",
"\n",
" from | \n",
" /val/n03445777/n03445777_6350.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.806152 | \n",
" /train/n03445777/n03445777_2468.JPEG | \n",
" golf_ball | \n",
"
\n",
"\n",
" 0.800086 | \n",
" /val/n03445777/n03445777_7480.JPEG | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info From | \n",
"
\n",
"\n",
" label | \n",
" cassette_player | \n",
"
\n",
"\n",
" from | \n",
" /train/n02979186/n02979186_10666.JPEG | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info To | \n",
"
\n",
"\n",
" 0.800088 | \n",
" /train/n02979186/n02979186_2383.JPEG | \n",
" cassette_player | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tQuery Image | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\tSimilar | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t\t\t\t | \n",
"\t\t\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t\t\t\n",
"\t\t\t\t\t\t\t
\n",
"\t\t\t\t\t\t
\n",
"\t\t\t\t\t
\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" from | \n",
" to | \n",
" label | \n",
" label2 | \n",
" distance | \n",
"
\n",
" \n",
" \n",
" \n",
" 7505 | \n",
" imagenette2-160/val/n03028079/n03028079_13002.JPEG | \n",
" [imagenette2-160/train/n03028079/n03028079_3839.JPEG] | \n",
" [church] | \n",
" [church] | \n",
" [0.800002] | \n",
"
\n",
" \n",
" 3429 | \n",
" imagenette2-160/train/n03394916/n03394916_32478.JPEG | \n",
" [imagenette2-160/train/n03394916/n03394916_35573.JPEG] | \n",
" [French_horn] | \n",
" [French_horn] | \n",
" [0.800012] | \n",
"
\n",
" \n",
" 1700 | \n",
" imagenette2-160/train/n02979186/n02979186_14524.JPEG | \n",
" [imagenette2-160/val/n02979186/n02979186_11000.JPEG, imagenette2-160/train/n02979186/n02979186_213.JPEG] | \n",
" [cassette_player, cassette_player] | \n",
" [cassette_player, cassette_player] | \n",
" [0.800015, 0.806502] | \n",
"
\n",
" \n",
" 7055 | \n",
" imagenette2-160/val/n02979186/n02979186_11000.JPEG | \n",
" [imagenette2-160/train/n02979186/n02979186_14524.JPEG, imagenette2-160/train/n02979186/n02979186_10095.JPEG] | \n",
" [cassette_player, cassette_player] | \n",
" [cassette_player, cassette_player] | \n",
" [0.800015, 0.820827] | \n",
"
\n",
" \n",
" 471 | \n",
" imagenette2-160/train/n01440764/n01440764_44.JPEG | \n",
" [imagenette2-160/val/n01440764/n01440764_5490.JPEG, imagenette2-160/train/n01440764/n01440764_14249.JPEG] | \n",
" [tench, tench] | \n",
" [tench, tench] | \n",
" [0.800023, 0.803563] | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 870 | \n",
" imagenette2-160/train/n02102040/n02102040_1306.JPEG | \n",
" [imagenette2-160/train/n02102040/n02102040_876.JPEG, imagenette2-160/train/n02102040/n02102040_3114.JPEG] | \n",
" [English_springer, English_springer] | \n",
" [English_springer, English_springer] | \n",
" [0.936799, 0.949252] | \n",
"
\n",
" \n",
" 1050 | \n",
" imagenette2-160/train/n02102040/n02102040_3114.JPEG | \n",
" [imagenette2-160/train/n02102040/n02102040_1055.JPEG, imagenette2-160/train/n02102040/n02102040_1306.JPEG] | \n",
" [English_springer, English_springer] | \n",
" [English_springer, English_springer] | \n",
" [0.941953, 0.949252] | \n",
"
\n",
" \n",
" 231 | \n",
" imagenette2-160/train/n01440764/n01440764_13978.JPEG | \n",
" [imagenette2-160/val/n01440764/n01440764_6341.JPEG, imagenette2-160/val/n01440764/n01440764_8210.JPEG] | \n",
" [tench, tench] | \n",
" [tench, tench] | \n",
" [0.943767, 0.945909] | \n",
"
\n",
" \n",
" 6846 | \n",
" imagenette2-160/val/n02102040/n02102040_350.JPEG | \n",
" [imagenette2-160/val/n02102040/n02102040_312.JPEG, imagenette2-160/train/n02102040/n02102040_6313.JPEG] | \n",
" [English_springer, English_springer] | \n",
" [English_springer, English_springer] | \n",
" [0.945413, 0.947323] | \n",
"
\n",
" \n",
" 1339 | \n",
" imagenette2-160/train/n02102040/n02102040_6313.JPEG | \n",
" [imagenette2-160/val/n02102040/n02102040_350.JPEG, imagenette2-160/train/n02102040/n02102040_3767.JPEG] | \n",
" [English_springer, English_springer] | \n",
" [English_springer, English_springer] | \n",
" [0.947323, 0.950174] | \n",
"
\n",
" \n",
"
\n",
"
9069 rows × 5 columns
\n",
"
"
],
"text/plain": [
" from to label label2 distance\n",
"7505 imagenette2-160/val/n03028079/n03028079_13002.JPEG [imagenette2-160/train/n03028079/n03028079_3839.JPEG] [church] [church] [0.800002]\n",
"3429 imagenette2-160/train/n03394916/n03394916_32478.JPEG [imagenette2-160/train/n03394916/n03394916_35573.JPEG] [French_horn] [French_horn] [0.800012]\n",
"1700 imagenette2-160/train/n02979186/n02979186_14524.JPEG [imagenette2-160/val/n02979186/n02979186_11000.JPEG, imagenette2-160/train/n02979186/n02979186_213.JPEG] [cassette_player, cassette_player] [cassette_player, cassette_player] [0.800015, 0.806502]\n",
"7055 imagenette2-160/val/n02979186/n02979186_11000.JPEG [imagenette2-160/train/n02979186/n02979186_14524.JPEG, imagenette2-160/train/n02979186/n02979186_10095.JPEG] [cassette_player, cassette_player] [cassette_player, cassette_player] [0.800015, 0.820827]\n",
"471 imagenette2-160/train/n01440764/n01440764_44.JPEG [imagenette2-160/val/n01440764/n01440764_5490.JPEG, imagenette2-160/train/n01440764/n01440764_14249.JPEG] [tench, tench] [tench, tench] [0.800023, 0.803563]\n",
"... ... ... ... ... ...\n",
"870 imagenette2-160/train/n02102040/n02102040_1306.JPEG [imagenette2-160/train/n02102040/n02102040_876.JPEG, imagenette2-160/train/n02102040/n02102040_3114.JPEG] [English_springer, English_springer] [English_springer, English_springer] [0.936799, 0.949252]\n",
"1050 imagenette2-160/train/n02102040/n02102040_3114.JPEG [imagenette2-160/train/n02102040/n02102040_1055.JPEG, imagenette2-160/train/n02102040/n02102040_1306.JPEG] [English_springer, English_springer] [English_springer, English_springer] [0.941953, 0.949252]\n",
"231 imagenette2-160/train/n01440764/n01440764_13978.JPEG [imagenette2-160/val/n01440764/n01440764_6341.JPEG, imagenette2-160/val/n01440764/n01440764_8210.JPEG] [tench, tench] [tench, tench] [0.943767, 0.945909]\n",
"6846 imagenette2-160/val/n02102040/n02102040_350.JPEG [imagenette2-160/val/n02102040/n02102040_312.JPEG, imagenette2-160/train/n02102040/n02102040_6313.JPEG] [English_springer, English_springer] [English_springer, English_springer] [0.945413, 0.947323]\n",
"1339 imagenette2-160/train/n02102040/n02102040_6313.JPEG [imagenette2-160/val/n02102040/n02102040_350.JPEG, imagenette2-160/train/n02102040/n02102040_3767.JPEG] [English_springer, English_springer] [English_springer, English_springer] [0.947323, 0.950174]\n",
"\n",
"[9069 rows x 5 columns]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.similarity_gallery() "
]
},
{
"cell_type": "markdown",
"id": "c2c393be-2b42-4814-8688-03d2be9e8998",
"metadata": {},
"source": [
"## Similar Image Pairs\n",
"\n",
"Find similar image pairs within and across the train and validation subfolders. Pairs may include train-train, train-val, val-train, and val-val."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "9e065403-582b-4f94-855b-33fd8f4826a1",
"metadata": {
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:106: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n",
"/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:106: SettingWithCopyWarning: \n",
"A value is trying to be set on a copy of a slice from a DataFrame.\n",
"Try using .loc[row_indexer,col_indexer] = value instead\n",
"\n",
"See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
" df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n",
"100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 437.97it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored similarity visual view in work_dir/galleries/duplicates.html\n"
]
},
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Duplicates Report\n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
Duplicates Report
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.968786 | \n",
"
\n",
"\n",
" From | \n",
" /val/n03394916/n03394916_30631.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n03394916/n03394916_44127.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" French_horn | \n",
"
\n",
"\n",
" To_Label | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.962458 | \n",
"
\n",
"\n",
" From | \n",
" /train/n03445777/n03445777_13918.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /val/n03445777/n03445777_6882.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" golf_ball | \n",
"
\n",
"\n",
" To_Label | \n",
" golf_ball | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.953837 | \n",
"
\n",
"\n",
" From | \n",
" /train/n02102040/n02102040_1564.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n02102040/n02102040_3837.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" English_springer | \n",
"
\n",
"\n",
" To_Label | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.953413 | \n",
"
\n",
"\n",
" From | \n",
" /train/n01440764/n01440764_7457.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n01440764/n01440764_11339.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" tench | \n",
"
\n",
"\n",
" To_Label | \n",
" tench | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.952239 | \n",
"
\n",
"\n",
" From | \n",
" /train/n03417042/n03417042_12906.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n03417042/n03417042_1578.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" garbage_truck | \n",
"
\n",
"\n",
" To_Label | \n",
" garbage_truck | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.951679 | \n",
"
\n",
"\n",
" From | \n",
" /val/n03394916/n03394916_6830.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /val/n03394916/n03394916_21092.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" French_horn | \n",
"
\n",
"\n",
" To_Label | \n",
" French_horn | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.950477 | \n",
"
\n",
"\n",
" From | \n",
" /train/n03888257/n03888257_21027.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /val/n03888257/n03888257_11210.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" parachute | \n",
"
\n",
"\n",
" To_Label | \n",
" parachute | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.950174 | \n",
"
\n",
"\n",
" From | \n",
" /train/n02102040/n02102040_3767.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n02102040/n02102040_6313.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" English_springer | \n",
"
\n",
"\n",
" To_Label | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.949877 | \n",
"
\n",
"\n",
" From | \n",
" /train/n02102040/ILSVRC2012_val_00032959.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /val/n02102040/n02102040_662.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" English_springer | \n",
"
\n",
"\n",
" To_Label | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" Distance | \n",
" 0.949252 | \n",
"
\n",
"\n",
" From | \n",
" /train/n02102040/n02102040_3114.JPEG | \n",
"
\n",
"\n",
" To | \n",
" /train/n02102040/n02102040_1306.JPEG | \n",
"
\n",
"\n",
" From_Label | \n",
" English_springer | \n",
"
\n",
"\n",
" To_Label | \n",
" English_springer | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.duplicates_gallery()"
]
},
{
"cell_type": "markdown",
"id": "e10989e1",
"metadata": {},
"source": [
"Show similar image pairs."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "3ea590e9-d221-4202-b03b-e5fef4487c89",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 270
},
"executionInfo": {
"elapsed": 499,
"status": "ok",
"timestamp": 1677667342908,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "3ea590e9-d221-4202-b03b-e5fef4487c89",
"outputId": "3c5f4cc0-0ba5-42a0-e01b-f165e9cf655c",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" from | \n",
" to | \n",
" distance | \n",
" filename_from | \n",
" label_from | \n",
" split_from | \n",
" index_x | \n",
" error_code_from | \n",
" is_valid_from | \n",
" fd_index_from | \n",
" filename_to | \n",
" label_to | \n",
" split_to | \n",
" index_y | \n",
" error_code_to | \n",
" is_valid_to | \n",
" fd_index_to | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 11960 | \n",
" 5925 | \n",
" 0.968786 | \n",
" imagenette2-160/val/n03394916/n03394916_30631.JPEG | \n",
" French_horn | \n",
" val | \n",
" 11960 | \n",
" VALID | \n",
" True | \n",
" 11960 | \n",
" imagenette2-160/train/n03394916/n03394916_44127.JPEG | \n",
" French_horn | \n",
" train | \n",
" 5925 | \n",
" VALID | \n",
" True | \n",
" 5925 | \n",
"
\n",
" \n",
" 1 | \n",
" 5925 | \n",
" 11960 | \n",
" 0.968786 | \n",
" imagenette2-160/train/n03394916/n03394916_44127.JPEG | \n",
" French_horn | \n",
" train | \n",
" 5925 | \n",
" VALID | \n",
" True | \n",
" 5925 | \n",
" imagenette2-160/val/n03394916/n03394916_30631.JPEG | \n",
" French_horn | \n",
" val | \n",
" 11960 | \n",
" VALID | \n",
" True | \n",
" 11960 | \n",
"
\n",
" \n",
" 2 | \n",
" 12613 | \n",
" 7916 | \n",
" 0.962458 | \n",
" imagenette2-160/val/n03445777/n03445777_6882.JPEG | \n",
" golf_ball | \n",
" val | \n",
" 12613 | \n",
" VALID | \n",
" True | \n",
" 12613 | \n",
" imagenette2-160/train/n03445777/n03445777_13918.JPEG | \n",
" golf_ball | \n",
" train | \n",
" 7916 | \n",
" VALID | \n",
" True | \n",
" 7916 | \n",
"
\n",
" \n",
" 3 | \n",
" 7916 | \n",
" 12613 | \n",
" 0.962458 | \n",
" imagenette2-160/train/n03445777/n03445777_13918.JPEG | \n",
" golf_ball | \n",
" train | \n",
" 7916 | \n",
" VALID | \n",
" True | \n",
" 7916 | \n",
" imagenette2-160/val/n03445777/n03445777_6882.JPEG | \n",
" golf_ball | \n",
" val | \n",
" 12613 | \n",
" VALID | \n",
" True | \n",
" 12613 | \n",
"
\n",
" \n",
" 4 | \n",
" 3464 | \n",
" 3486 | \n",
" 0.953837 | \n",
" imagenette2-160/train/n02102040/n02102040_3837.JPEG | \n",
" English_springer | \n",
" train | \n",
" 3464 | \n",
" VALID | \n",
" True | \n",
" 3464 | \n",
" imagenette2-160/train/n02102040/n02102040_1564.JPEG | \n",
" English_springer | \n",
" train | \n",
" 3486 | \n",
" VALID | \n",
" True | \n",
" 3486 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" from to distance filename_from label_from split_from index_x error_code_from is_valid_from fd_index_from filename_to label_to split_to index_y error_code_to is_valid_to fd_index_to\n",
"0 11960 5925 0.968786 imagenette2-160/val/n03394916/n03394916_30631.JPEG French_horn val 11960 VALID True 11960 imagenette2-160/train/n03394916/n03394916_44127.JPEG French_horn train 5925 VALID True 5925\n",
"1 5925 11960 0.968786 imagenette2-160/train/n03394916/n03394916_44127.JPEG French_horn train 5925 VALID True 5925 imagenette2-160/val/n03394916/n03394916_30631.JPEG French_horn val 11960 VALID True 11960\n",
"2 12613 7916 0.962458 imagenette2-160/val/n03445777/n03445777_6882.JPEG golf_ball val 12613 VALID True 12613 imagenette2-160/train/n03445777/n03445777_13918.JPEG golf_ball train 7916 VALID True 7916\n",
"3 7916 12613 0.962458 imagenette2-160/train/n03445777/n03445777_13918.JPEG golf_ball train 7916 VALID True 7916 imagenette2-160/val/n03445777/n03445777_6882.JPEG golf_ball val 12613 VALID True 12613\n",
"4 3464 3486 0.953837 imagenette2-160/train/n02102040/n02102040_3837.JPEG English_springer train 3464 VALID True 3464 imagenette2-160/train/n02102040/n02102040_1564.JPEG English_springer train 3486 VALID True 3486"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.similarity().head(5)"
]
},
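{
"cell_type": "markdown",
"id": "similarity-mirrored-pairs-sketch",
"metadata": {},
"source": [
"Note that each pair appears twice in the table above, once per direction (rows 0 and 1 describe the same two images). The sketch below is one possible way, using plain pandas rather than any fastdup helper, to collapse the mirrored rows and flag pairs whose images sit in different splits, which may point to train/val leakage. The `pairs` frame is a hand-built stand-in copied from the rows shown above; on real data you would presumably start from `fd.similarity()` instead.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy stand-in for fd.similarity(), copied from the rows shown above (assumption: same column names).\n",
"pairs = pd.DataFrame({\n",
"    'from': [11960, 5925, 12613, 7916, 3464],\n",
"    'to': [5925, 11960, 7916, 12613, 3486],\n",
"    'distance': [0.968786, 0.968786, 0.962458, 0.962458, 0.953837],\n",
"    'split_from': ['val', 'train', 'val', 'train', 'train'],\n",
"    'split_to': ['train', 'val', 'train', 'val', 'train'],\n",
"})\n",
"\n",
"# Build an order-independent key so that (a, b) and (b, a) compare equal.\n",
"pairs['lo'] = pairs[['from', 'to']].min(axis=1)\n",
"pairs['hi'] = pairs[['from', 'to']].max(axis=1)\n",
"unique_pairs = pairs.drop_duplicates(subset=['lo', 'hi']).drop(columns=['lo', 'hi'])\n",
"\n",
"# Pairs whose two images belong to different splits.\n",
"cross_split = unique_pairs[unique_pairs['split_from'] != unique_pairs['split_to']]\n",
"print(cross_split[['from', 'to', 'distance']])\n",
"```\n",
"\n",
"Keeping the first row of each mirrored pair is an arbitrary choice; either direction carries the same information."
]
},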
{
"cell_type": "markdown",
"id": "95d21e6d-a951-48dd-8c4c-894c8ba556fd",
"metadata": {},
"source": [
"## Image Clusters"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "4a6db529-cb1e-4655-af50-d97f3e131319",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000,
"output_embedded_package_id": "1Wh1vmG-F-RG0ZYZP1oRgiyqHAtnfsuEk"
},
"executionInfo": {
"elapsed": 6376,
"status": "ok",
"timestamp": 1677667352994,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "4a6db529-cb1e-4655-af50-d97f3e131319",
"outputId": "adfc3ee1-84c9-4aa6-a0db-09a6a800b566",
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cassette_player\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 68.44it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished OK. Components are stored as image files work_dir/galleries/components_[index].jpg\n",
"Stored components visual view in work_dir/galleries/components.html\n",
"Execution time in seconds 1.5\n"
]
},
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Components Report\n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
Components Report
Showing groups of similar images
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 1894 | \n",
"
\n",
"\n",
" num_images | \n",
" 161 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9001 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" tench | \n",
" 54 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2812 | \n",
"
\n",
"\n",
" num_images | \n",
" 70 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9004 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" English_springer | \n",
" 54 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 7313 | \n",
"
\n",
"\n",
" num_images | \n",
" 69 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9001 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" golf_ball | \n",
" 54 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 1072 | \n",
"
\n",
"\n",
" num_images | \n",
" 21 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9001 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 21 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 5498 | \n",
"
\n",
"\n",
" num_images | \n",
" 13 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9004 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" French_horn | \n",
" 13 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 994 | \n",
"
\n",
"\n",
" num_images | \n",
" 12 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9025 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 12 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 1391 | \n",
"
\n",
"\n",
" num_images | \n",
" 10 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 10 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 5644 | \n",
"
\n",
"\n",
" num_images | \n",
" 8 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.902 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" French_horn | \n",
" 8 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 1315 | \n",
"
\n",
"\n",
" num_images | \n",
" 8 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9041 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 8 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2781 | \n",
"
\n",
"\n",
" num_images | \n",
" 8 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9062 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" English_springer | \n",
" 8 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 984 | \n",
"
\n",
"\n",
" num_images | \n",
" 7 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9064 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" garbage_truck | \n",
" 7 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 3034 | \n",
"
\n",
"\n",
" num_images | \n",
" 6 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.903 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" English_springer | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 5639 | \n",
"
\n",
"\n",
" num_images | \n",
" 6 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.902 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" French_horn | \n",
" 6 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 1951 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" tench | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 7294 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9019 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" golf_ball | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 4921 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9004 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" parachute | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 5548 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9043 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" French_horn | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 100 | \n",
"
\n",
"\n",
" num_images | \n",
" 5 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9011 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" cassette_player | \n",
" 5 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 7292 | \n",
"
\n",
"\n",
" num_images | \n",
" 4 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9021 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" golf_ball | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 2143 | \n",
"
\n",
"\n",
" num_images | \n",
" 4 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9001 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" tench | \n",
" 4 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.component_gallery()"
]
},
{
"cell_type": "markdown",
"id": "ca5d4b6e-7ff6-49b8-b487-6ba1573ab104",
"metadata": {},
"source": [
"You can also visualize clusters with specific labels using the `slice` parameter. For example let's visualize clusters with the `chain_saw` label"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "4b38dacf-becc-4631-9aeb-6fe9bd235aa1",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000,
"output_embedded_package_id": "1xYIrPsODG8kAMaZOpGeKNRoa4-HjPC-w"
},
"executionInfo": {
"elapsed": 5130,
"status": "ok",
"timestamp": 1677667368207,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "4b38dacf-becc-4631-9aeb-6fe9bd235aa1",
"outputId": "131d0f11-5627-4beb-b58c-3801e09a3b42",
"scrolled": false,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"chain_saw\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 449.14it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished OK. Components are stored as image files work_dir/galleries/components_[index].jpg\n",
"Stored components visual view in work_dir/galleries/components.html\n",
"Execution time in seconds 0.2\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" \n",
" Components Report\n",
" \n",
" \n",
"\n",
"\n",
"\n",
" \n",
" \n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
Components Report
Showing groups of similar images, for label: chain_saw
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 6981 | \n",
"
\n",
"\n",
" num_images | \n",
" 3 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9064 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 6421 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9222 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 6478 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9355 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 6621 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9029 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 6766 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9208 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 6831 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9198 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 6862 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9139 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 6901 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9073 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 7033 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9345 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 7067 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9192 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Info | \n",
"
\n",
"\n",
" component | \n",
" 11637 | \n",
"
\n",
"\n",
" num_images | \n",
" 2 | \n",
"
\n",
"\n",
" mean_distance | \n",
" 0.9039 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
" \n",
" Label | \n",
"
\n",
"\n",
" chain_saw | \n",
" 2 | \n",
"
\n",
" \n",
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
"
\n",
" \n",
" \n",
" \n",
" \n",
" "
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.component_gallery(slice='chain_saw')"
]
},
{
"cell_type": "markdown",
"id": "28498d81-d073-4f3d-baa4-732e1df93a34",
"metadata": {},
"source": [
"## Connected Components"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "0346be91-5380-48b9-a8df-074c342efcd3",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"executionInfo": {
"elapsed": 1036,
"status": "ok",
"timestamp": 1677667380699,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "0346be91-5380-48b9-a8df-074c342efcd3",
"outputId": "ffa6bd9d-b5b3-4ed5-86e1-c47ca9658667",
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" index | \n",
" component_id | \n",
" sum | \n",
" count | \n",
" mean_distance | \n",
" min_distance | \n",
" max_distance | \n",
" filename | \n",
" label | \n",
" split | \n",
" error_code | \n",
" is_valid | \n",
" fd_index | \n",
"
\n",
" \n",
" \n",
" \n",
" 179 | \n",
" 2355 | \n",
" 1894 | \n",
" 513.6729 | \n",
" 562.0 | \n",
" 0.914 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_8673.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 2355 | \n",
"
\n",
" \n",
" 143 | \n",
" 2147 | \n",
" 1894 | \n",
" 513.6729 | \n",
" 562.0 | \n",
" 0.914 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_5658.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 2147 | \n",
"
\n",
" \n",
" 145 | \n",
" 2150 | \n",
" 1894 | \n",
" 513.6729 | \n",
" 562.0 | \n",
" 0.914 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_10726.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 2150 | \n",
"
\n",
" \n",
" 146 | \n",
" 2174 | \n",
" 1894 | \n",
" 513.6729 | \n",
" 562.0 | \n",
" 0.914 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_6974.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 2174 | \n",
"
\n",
" \n",
" 147 | \n",
" 2177 | \n",
" 1894 | \n",
" 513.6729 | \n",
" 562.0 | \n",
" 0.914 | \n",
" 0.9001 | \n",
" 0.9534 | \n",
" imagenette2-160/train/n01440764/n01440764_14294.JPEG | \n",
" tench | \n",
" train | \n",
" VALID | \n",
" True | \n",
" 2177 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" index component_id sum count mean_distance min_distance max_distance filename label split error_code is_valid fd_index\n",
"179 2355 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_8673.JPEG tench train VALID True 2355\n",
"143 2147 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_5658.JPEG tench train VALID True 2147\n",
"145 2150 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_10726.JPEG tench train VALID True 2150\n",
"146 2174 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_6974.JPEG tench train VALID True 2174\n",
"147 2177 1894 513.6729 562.0 0.914 0.9001 0.9534 imagenette2-160/train/n01440764/n01440764_14294.JPEG tench train VALID True 2177"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"cc_df, _ = fd.connected_components()\n",
"cc_df.sort_values('count', ascending=False).head(5)"
]
},
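{
"cell_type": "markdown",
"id": "component-summary-sketch",
"metadata": {},
"source": [
"Because every row of `cc_df` repeats the statistics of its component, a per-component summary can be easier to read. The following is a minimal sketch with plain pandas, not a fastdup helper. It assumes a frame with the `component_id`, `filename`, and `label` columns shown above; the `cc_toy` rows and file names are hypothetical and only serve as an illustration.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy rows that mimic a few columns of cc_df.\n",
"cc_toy = pd.DataFrame({\n",
"    'component_id': [1894, 1894, 1894, 2812, 2812],\n",
"    'filename': ['t1.JPEG', 't2.JPEG', 't3.JPEG', 's1.JPEG', 's2.JPEG'],\n",
"    'label': ['tench', 'tench', 'tench', 'English_springer', 'English_springer'],\n",
"})\n",
"\n",
"# One row per component: member count, number of distinct labels,\n",
"# and the first file as a representative to keep.\n",
"summary = (\n",
"    cc_toy.groupby('component_id')\n",
"    .agg(num_images=('filename', 'count'),\n",
"         num_labels=('label', 'nunique'),\n",
"         keep=('filename', 'first'))\n",
"    .sort_values('num_images', ascending=False)\n",
")\n",
"\n",
"# Every file that is not a representative is a candidate for review or removal.\n",
"to_review = cc_toy[~cc_toy['filename'].isin(summary['keep'])]['filename'].tolist()\n",
"print(summary)\n",
"print(to_review)\n",
"```\n",
"\n",
"Components group images that are similar above a threshold, not necessarily exact duplicates, so it is probably worth inspecting the candidates before deleting anything. A component with `num_labels` greater than 1 may also be worth a look for labeling mistakes."
]
},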
{
"cell_type": "markdown",
"id": "569cb878",
"metadata": {},
"source": [
"We can also get metadata for individual images using their `fastdup_id` available in `fd.annotations()`"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "e80d6817-fed6-4fa4-8714-b01214e0d3f8",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"executionInfo": {
"elapsed": 990,
"status": "ok",
"timestamp": 1677667384644,
"user": {
"displayName": "Tom Shani",
"userId": "00667426488827942961"
},
"user_tz": -120
},
"id": "e80d6817-fed6-4fa4-8714-b01214e0d3f8",
"outputId": "4f973aba-572d-4e50-d22d-c5bfc8cf3d2d",
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'filename': 'imagenette2-160/train/n02979186/n02979186_2819.JPEG',\n",
" 'label': 'cassette_player',\n",
" 'split': 'train',\n",
" 'index': 349,\n",
" 'error_code': 'VALID',\n",
" 'is_valid': True,\n",
" 'fd_index': 349}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd[349]"
]
},
{
"cell_type": "markdown",
"id": "b059951d",
"metadata": {},
"source": [
"## Wrap Up\n",
"\n",
"Next, feel free to check out other tutorials -\n",
"\n",
"+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!\n",
"+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.\n",
"+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze for potential issues. If you have labeled ImageNet-style folder structure, have a go!\n",
"+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try. "
]
},
{
"cell_type": "markdown",
"id": "47cf9410",
"metadata": {},
"source": [
"\n",
"## VL Profiler\n",
"If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com) - VL Profiler is our first no-code commercial product that lets you visualize and inspect your dataset in your browser. \n",
"\n",
"[Sign up](https://app.visual-layer.com) now, it's free.\n",
"\n",
"[![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/vl_profiler_promo.svg)](https://app.visual-layer.com)\n",
"\n",
"As usual, feedback is welcome! \n",
"\n",
"Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues)."
]
}
],
"metadata": {
"colab": {
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}