# Analysing an Image Classification Dataset

## Installation & Setting Up

In [None]:
!pip install pip -U
!pip install fastdup

## Download Imagenette Dataset

In [None]:
!wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz
!tar -xf imagenette2-160.tgz

## Load and Format Annotations

In [1]:
import pandas as pd

In [2]:
data_dir = 'imagenette2-160/'
csv_path = 'imagenette2-160/noisy_imagenette.csv'

In [3]:
label_map = {
    'n02979186': 'cassette_player', 
    'n03417042': 'garbage_truck', 
    'n01440764': 'tench', 
    'n02102040': 'English_springer', 
    'n03028079': 'church',
    'n03888257': 'parachute', 
    'n03394916': 'French_horn', 
    'n03000684': 'chain_saw', 
    'n03445777': 'golf_ball', 
    'n03425413': 'gas_pump'
}

Load the annotations provided with the dataset.

In [4]:
df_annot = pd.read_csv(csv_path)
df_annot.head(3)

Unnamed: 0,path,noisy_labels_0,noisy_labels_1,noisy_labels_5,noisy_labels_25,noisy_labels_50,is_valid
0,train/n02979186/n02979186_9036.JPEG,n02979186,n02979186,n02979186,n02979186,n02979186,False
1,train/n02979186/n02979186_11957.JPEG,n02979186,n02979186,n02979186,n02979186,n03000684,False
2,train/n02979186/n02979186_9715.JPEG,n02979186,n02979186,n02979186,n03417042,n03000684,False
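
Judging by the column names and the three rows above, `noisy_labels_0` appears to hold the clean labels and the other columns progressively noisier versions of them (an assumption, not something the CSV states). A minimal sketch that measures how often each column disagrees with `noisy_labels_0`, using only the three rows shown rather than the full file:

```python
import pandas as pd

# The three rows shown above, re-typed by hand for illustration only.
sample = pd.DataFrame({
    "noisy_labels_0":  ["n02979186", "n02979186", "n02979186"],
    "noisy_labels_1":  ["n02979186", "n02979186", "n02979186"],
    "noisy_labels_5":  ["n02979186", "n02979186", "n02979186"],
    "noisy_labels_25": ["n02979186", "n02979186", "n03417042"],
    "noisy_labels_50": ["n02979186", "n03000684", "n03000684"],
})

noisy_cols = [c for c in sample.columns if c != "noisy_labels_0"]

# Fraction of rows whose label differs from the noisy_labels_0 column.
disagreement = {
    col: float((sample[col] != sample["noisy_labels_0"]).mean())
    for col in noisy_cols
}
print(disagreement)
```

On the real data, the same comparison can be run on the DataFrame returned by `pd.read_csv(csv_path)` before the columns are dropped.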


Transform the annotations into the format fastdup supports.

In [5]:
# take relevant columns
df_annot = df_annot[['path', 'noisy_labels_0']]

# rename columns to fastdup's column names
df_annot = df_annot.rename({'noisy_labels_0': 'label', 'path': 'filename'}, axis='columns')

# create split column from the first path component ('train' or 'val'),
# before the dataset directory is prepended to the path
df_annot['split'] = df_annot['filename'].apply(lambda x: x.split("/")[0])

# prepend data_dir
df_annot['filename'] = df_annot['filename'].apply(lambda x: data_dir + x)

# map label ids to regular labels
df_annot['label'] = df_annot['label'].map(label_map)

# show formatted annotations
df_annot

Unnamed: 0,filename,label,split
0,imagenette2-160/train/n02979186/n02979186_9036...,cassette_player,train
1,imagenette2-160/train/n02979186/n02979186_1195...,cassette_player,train
2,imagenette2-160/train/n02979186/n02979186_9715...,cassette_player,train
3,imagenette2-160/train/n02979186/n02979186_2173...,cassette_player,train
4,imagenette2-160/train/n02979186/ILSVRC2012_val...,cassette_player,train
...,...,...,...
13389,imagenette2-160/val/n03425413/n03425413_17521....,gas_pump,val
13390,imagenette2-160/val/n03425413/n03425413_20711....,gas_pump,val
13391,imagenette2-160/val/n03425413/n03425413_19050....,gas_pump,val
13392,imagenette2-160/val/n03425413/n03425413_13831....,gas_pump,val
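
One thing worth checking after the label mapping: `Series.map` with a dict returns `NaN` for any key missing from the dict, so a typo in `label_map` would silently produce missing labels. A small sketch of such a check on toy data (the id `n99999999` is made up for illustration):

```python
import pandas as pd

label_map = {"n01440764": "tench", "n03028079": "church"}

# Toy annotations; 'n99999999' is a made-up id that is absent from label_map.
toy = pd.DataFrame({
    "filename": ["a.JPEG", "b.JPEG", "c.JPEG"],
    "label": ["n01440764", "n99999999", "n03028079"],
})

mapped = toy["label"].map(label_map)

# Series.map returns NaN for keys missing from the dict, so collect those ids.
unmapped_ids = sorted(toy.loc[mapped.isna(), "label"].unique())
print(unmapped_ids)
```

An empty list means every label id was mapped.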


## Import & Run fastdup

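In this example, we run fastdup and pass the annotations DataFrame so that labels and splits appear in the results. In fastdup's documentation, `threshold` is the similarity above which image pairs are reported as similar, and `ccthreshold` is the similarity used when grouping images into connected components.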

In [6]:
import fastdup
fastdup.__version__

'0.922'

In [6]:
work_dir = 'fastdup_imagenette'

fd = fastdup.create(work_dir=work_dir, input_dir=data_dir) 
fd.run(annotations=df_annot, ccthreshold=0.9, threshold=0.8)

FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-03-20 17:57:26 [INFO] Going to loop over dir imagenette2-160
2023-03-20 17:57:26 [INFO] Found total 13394 images to run on
2023-03-20 17:57:55 [INFO] 1657) Finished write_index() NN model
2023-03-20 17:57:55 [INFO] Stored nn model index file fastdup_imagenette/nnf.index
2023-03-20 17:57:56 [INFO] Total time took 30624 ms
2023-03-20 17:57:56 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %
2023-03-20 17:57:56 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-03-20 17:57:56 [INFO] Found a total of 16741 above threshold images (d>0.800), which are 41.66 %
2023-03-20 17:57:56 [INFO] Found a total of 1339 outlier images         (d<0.050), which are 3.33 %
2023-03-20 17:57:56 [INFO] Min distance found 0.470 max distance 0.969
2023-03-20 17:57:56 [INFO] Running conne

## Outliers

Visualize outliers from the dataset.

In [7]:
fd.vis.outliers_gallery()

Stored outliers visual view in  fastdup_imagenette/galleries/outliers.html





Distance,Path,label
0.469904,val/n03417042/n03417042_29412.JPEG,garbage_truck
0.476124,train/n02979186/n02979186_3967.JPEG,cassette_player
0.47929,val/n03417042/n03417042_91.JPEG,garbage_truck
0.48977,val/n03417042/n03417042_7422.JPEG,garbage_truck
0.505358,train/n03417042/n03417042_15485.JPEG,garbage_truck
0.510293,train/n03417042/n03417042_19447.JPEG,garbage_truck
0.514679,train/n03445777/n03445777_5218.JPEG,golf_ball
0.515321,val/n03417042/n03417042_27581.JPEG,garbage_truck
0.536679,train/n03417042/n03417042_24856.JPEG,garbage_truck
0.541046,train/n03417042/n03417042_15198.JPEG,garbage_truck
0.544796,train/n03888257/n03888257_34639.JPEG,parachute
0.548765,val/n03417042/n03417042_6081.JPEG,garbage_truck
0.555266,train/n03445777/n03445777_3254.JPEG,golf_ball
0.569853,train/n03445777/n03445777_13576.JPEG,golf_ball
0.579928,val/n02102040/n02102040_7670.JPEG,English_springer
0.583889,val/n03445777/n03445777_5932.JPEG,golf_ball
0.590159,train/n03888257/n03888257_79145.JPEG,parachute
0.607759,train/n03394916/n03394916_37544.JPEG,French_horn
0.608525,train/n03394916/n03394916_33663.JPEG,French_horn
0.609526,train/n03888257/n03888257_7793.JPEG,parachute


Show the outlier data as a DataFrame.

In [8]:
fd.outliers().head(5)

Unnamed: 0,index,outlier,nearest,distance,img_filename_outlier,label_outlier,split_outlier,error_code_outlier,is_valid_outlier,img_filename_nearest,label_nearest,split_nearest,error_code_nearest,is_valid_nearest
0,1338,12009,1757,0.469904,val/n03417042/n03417042_29412.JPEG,garbage_truck,val,VALID,True,train/n02102040/n02102040_7256.JPEG,English_springer,train,VALID,True
1,1336,2664,9763,0.476124,train/n02979186/n02979186_3967.JPEG,cassette_player,train,VALID,True,val/n01440764/n01440764_710.JPEG,tench,val,VALID,True
2,1335,12172,1817,0.47929,val/n03417042/n03417042_91.JPEG,garbage_truck,val,VALID,True,train/n02102040/n02102040_7868.JPEG,English_springer,train,VALID,True
3,1332,12131,1522,0.48977,val/n03417042/n03417042_7422.JPEG,garbage_truck,val,VALID,True,train/n02102040/n02102040_4884.JPEG,English_springer,train,VALID,True
4,1330,5898,1392,0.505358,train/n03417042/n03417042_15485.JPEG,garbage_truck,train,VALID,True,train/n02102040/n02102040_3719.JPEG,English_springer,train,VALID,True
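
The outlier table can be summarised further with ordinary pandas operations. The sketch below re-types a few columns of the five rows above into a toy DataFrame; on the real data, the DataFrame returned by `fd.outliers()` would take the place of `toy`.

```python
import pandas as pd

# Selected columns of the five rows shown above, re-typed for illustration.
toy = pd.DataFrame({
    "distance": [0.469904, 0.476124, 0.47929, 0.48977, 0.505358],
    "label_outlier": ["garbage_truck", "cassette_player", "garbage_truck",
                      "garbage_truck", "garbage_truck"],
    "label_nearest": ["English_springer", "tench", "English_springer",
                      "English_springer", "English_springer"],
})

# How many outliers does each class contribute?
per_label = toy["label_outlier"].value_counts()

# Outliers whose nearest neighbour carries a different label.
cross_label = toy[toy["label_outlier"] != toy["label_nearest"]]

print(per_label.to_dict())
print(len(cross_label))
```

A class that dominates the outlier list, as `garbage_truck` does in these five rows, may be worth inspecting first.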


## Comparing Labels of Similar Images
Find possible mislabels by comparing the label of each query image with the labels of its most similar images in the dataset.

In [9]:
fd.vis.similarity_gallery() 


Stored similar images visual view in  fastdup_imagenette/galleries/similarity.html


Query,Query label,Similar,Similar label,Distance
/train/n03394916/n03394916_44127.JPEG,French_horn,/val/n03394916/n03394916_30631.JPEG,French_horn,0.968786
/train/n03394916/n03394916_44127.JPEG,French_horn,/train/n03394916/n03394916_36016.JPEG,French_horn,0.918324
/val/n03394916/n03394916_30631.JPEG,French_horn,/train/n03394916/n03394916_44127.JPEG,French_horn,0.968786
/val/n03394916/n03394916_30631.JPEG,French_horn,/train/n03394916/n03394916_29969.JPEG,French_horn,0.903754
/val/n03445777/n03445777_6882.JPEG,golf_ball,/train/n03445777/n03445777_13918.JPEG,golf_ball,0.962459
/val/n03445777/n03445777_6882.JPEG,golf_ball,/val/n03445777/n03445777_5912.JPEG,golf_ball,0.918005
/train/n03445777/n03445777_13918.JPEG,golf_ball,/val/n03445777/n03445777_6882.JPEG,golf_ball,0.962459
/train/n03445777/n03445777_13918.JPEG,golf_ball,/val/n03445777/n03445777_8820.JPEG,golf_ball,0.91704
/train/n02102040/n02102040_1564.JPEG,English_springer,/train/n02102040/n02102040_3837.JPEG,English_springer,0.953837
/train/n02102040/n02102040_1564.JPEG,English_springer,/train/n02102040/n02102040_3586.JPEG,English_springer,0.908732
/train/n02102040/n02102040_3837.JPEG,English_springer,/train/n02102040/n02102040_1564.JPEG,English_springer,0.953837
/train/n02102040/n02102040_3837.JPEG,English_springer,/train/n02102040/n02102040_3027.JPEG,English_springer,0.893944
/train/n01440764/n01440764_7457.JPEG,tench,/train/n01440764/n01440764_11339.JPEG,tench,0.953413
/train/n01440764/n01440764_7457.JPEG,tench,/train/n01440764/n01440764_9315.JPEG,tench,0.918778
/train/n01440764/n01440764_11339.JPEG,tench,/train/n01440764/n01440764_7457.JPEG,tench,0.953413
/train/n01440764/n01440764_11339.JPEG,tench,/train/n01440764/n01440764_12279.JPEG,tench,0.889166
/train/n03417042/n03417042_1578.JPEG,garbage_truck,/train/n03417042/n03417042_12906.JPEG,garbage_truck,0.952239
/train/n03417042/n03417042_1578.JPEG,garbage_truck,/val/n03417042/n03417042_9610.JPEG,garbage_truck,0.837864
/train/n03417042/n03417042_12906.JPEG,garbage_truck,/train/n03417042/n03417042_1578.JPEG,garbage_truck,0.952239
/train/n03417042/n03417042_12906.JPEG,garbage_truck,/train/n03417042/n03417042_27686.JPEG,garbage_truck,0.828749
/val/n03394916/n03394916_6830.JPEG,French_horn,/val/n03394916/n03394916_21092.JPEG,French_horn,0.951679
/val/n03394916/n03394916_6830.JPEG,French_horn,/train/n03394916/n03394916_35469.JPEG,French_horn,0.893079
/val/n03394916/n03394916_21092.JPEG,French_horn,/val/n03394916/n03394916_6830.JPEG,French_horn,0.951679
/val/n03394916/n03394916_21092.JPEG,French_horn,/train/n03394916/n03394916_35469.JPEG,French_horn,0.865771
/train/n03888257/n03888257_21027.JPEG,parachute,/val/n03888257/n03888257_11210.JPEG,parachute,0.950477
/train/n03888257/n03888257_21027.JPEG,parachute,/val/n03888257/n03888257_12491.JPEG,parachute,0.92043
/val/n03888257/n03888257_11210.JPEG,parachute,/train/n03888257/n03888257_21027.JPEG,parachute,0.950477
/val/n03888257/n03888257_11210.JPEG,parachute,/val/n03888257/n03888257_12491.JPEG,parachute,0.865155
/train/n02102040/n02102040_6313.JPEG,English_springer,/train/n02102040/n02102040_3767.JPEG,English_springer,0.950173
/train/n02102040/n02102040_6313.JPEG,English_springer,/val/n02102040/n02102040_350.JPEG,English_springer,0.947323
/train/n02102040/n02102040_3767.JPEG,English_springer,/train/n02102040/n02102040_6313.JPEG,English_springer,0.950173
/train/n02102040/n02102040_3767.JPEG,English_springer,/val/n02102040/n02102040_350.JPEG,English_springer,0.914056
/train/n02102040/ILSVRC2012_val_00032959.JPEG,English_springer,/val/n02102040/n02102040_662.JPEG,English_springer,0.949877
/train/n02102040/ILSVRC2012_val_00032959.JPEG,English_springer,/train/n02102040/n02102040_3114.JPEG,English_springer,0.933115
/val/n02102040/n02102040_662.JPEG,English_springer,/train/n02102040/ILSVRC2012_val_00032959.JPEG,English_springer,0.949877
/val/n02102040/n02102040_662.JPEG,English_springer,/val/n02102040/n02102040_3502.JPEG,English_springer,0.927345
/train/n02102040/n02102040_3114.JPEG,English_springer,/train/n02102040/n02102040_1306.JPEG,English_springer,0.949252
/train/n02102040/n02102040_3114.JPEG,English_springer,/train/n02102040/n02102040_1055.JPEG,English_springer,0.941953
/train/n02102040/n02102040_1306.JPEG,English_springer,/train/n02102040/n02102040_3114.JPEG,English_springer,0.949252
/train/n02102040/n02102040_1306.JPEG,English_springer,/train/n02102040/n02102040_876.JPEG,English_springer,0.936799


Unnamed: 0,from,to,label,label2,distance
3630,imagenette2-160/train/n03394916/n03394916_44127.JPEG,"[imagenette2-160/val/n03394916/n03394916_30631.JPEG, imagenette2-160/train/n03394916/n03394916_36016.JPEG]","[French_horn, French_horn]","[French_horn, French_horn]","[0.968786, 0.918324]"
7819,imagenette2-160/val/n03394916/n03394916_30631.JPEG,"[imagenette2-160/train/n03394916/n03394916_44127.JPEG, imagenette2-160/train/n03394916/n03394916_29969.JPEG]","[French_horn, French_horn]","[French_horn, French_horn]","[0.968786, 0.903754]"
8751,imagenette2-160/val/n03445777/n03445777_6882.JPEG,"[imagenette2-160/train/n03445777/n03445777_13918.JPEG, imagenette2-160/val/n03445777/n03445777_5912.JPEG]","[golf_ball, golf_ball]","[golf_ball, golf_ball]","[0.962459, 0.918005]"
5358,imagenette2-160/train/n03445777/n03445777_13918.JPEG,"[imagenette2-160/val/n03445777/n03445777_6882.JPEG, imagenette2-160/val/n03445777/n03445777_8820.JPEG]","[golf_ball, golf_ball]","[golf_ball, golf_ball]","[0.962459, 0.91704]"
896,imagenette2-160/train/n02102040/n02102040_1564.JPEG,"[imagenette2-160/train/n02102040/n02102040_3837.JPEG, imagenette2-160/train/n02102040/n02102040_3586.JPEG]","[English_springer, English_springer]","[English_springer, English_springer]","[0.953837, 0.908732]"
...,...,...,...,...,...
5911,imagenette2-160/train/n03888257/n03888257_12816.JPEG,[imagenette2-160/train/n03888257/n03888257_38633.JPEG],[parachute],[parachute],[0.800073]
6219,imagenette2-160/train/n03888257/n03888257_38633.JPEG,[imagenette2-160/train/n03888257/n03888257_12816.JPEG],[parachute],[parachute],[0.800073]
4320,imagenette2-160/train/n03417042/n03417042_3236.JPEG,[imagenette2-160/train/n03417042/n03417042_12297.JPEG],[garbage_truck],[garbage_truck],[0.800024]
3429,imagenette2-160/train/n03394916/n03394916_32478.JPEG,[imagenette2-160/train/n03394916/n03394916_35573.JPEG],[French_horn],[French_horn],[0.800012]
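
One way to turn this comparison into an automatic check is to flag a query image when a strict majority of its most similar images carry a different label. The sketch below is plain Python on made-up records; `flag_possible_mislabel` is a hypothetical helper, not part of fastdup, and the majority rule is only one possible heuristic.

```python
from collections import Counter

def flag_possible_mislabel(query_label, neighbor_labels):
    """Return the majority neighbour label if it disagrees with the query, else None."""
    if not neighbor_labels:
        return None
    majority, votes = Counter(neighbor_labels).most_common(1)[0]
    # Require a strict majority so that a 1-1 tie is not flagged.
    if majority != query_label and votes > len(neighbor_labels) / 2:
        return majority
    return None

# Hypothetical records: (query label, labels of its most similar images).
records = [
    ("French_horn", ["French_horn", "French_horn"]),
    ("church", ["gas_pump", "gas_pump"]),
    ("tench", ["tench", "golf_ball"]),
]

suspects = [(q, flag_possible_mislabel(q, n)) for q, n in records]
print(suspects)
```

Flagged images are candidates for manual review, not confirmed errors, since visually similar images can legitimately belong to different classes.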


## Similar Image Pairs

Find similar image pairs within and across the train and validation subfolders. Pairs may include train-train, train-val, val-train, and val-val.

In [16]:
fd.vis.duplicates_gallery()


Stored similarity visual view in  fastdup_imagenette/galleries/duplicates.html


Distance,From,To
0.968786,French_horn,French_horn
0.962459,golf_ball,golf_ball
0.953837,English_springer,English_springer
0.953413,tench,tench
0.952239,garbage_truck,garbage_truck
0.951679,French_horn,French_horn
0.950477,parachute,parachute
0.950173,English_springer,English_springer
0.949877,English_springer,English_springer
0.949252,English_springer,English_springer


Show the similar image pairs as a DataFrame.

In [10]:
fd.similarity().head(5)

Unnamed: 0,from,to,distance,img_filename_from,label_from,split_from,error_code_from,is_valid_from,img_filename_to,label_to,split_to,error_code_to,is_valid_to
0,11521,5390,0.968786,val/n03394916/n03394916_30631.JPEG,French_horn,val,VALID,True,train/n03394916/n03394916_44127.JPEG,French_horn,train,VALID,True
1,5390,11521,0.968786,train/n03394916/n03394916_44127.JPEG,French_horn,train,VALID,True,val/n03394916/n03394916_30631.JPEG,French_horn,val,VALID,True
2,12914,7715,0.962459,val/n03445777/n03445777_6882.JPEG,golf_ball,val,VALID,True,train/n03445777/n03445777_13918.JPEG,golf_ball,train,VALID,True
3,7715,12914,0.962459,train/n03445777/n03445777_13918.JPEG,golf_ball,train,VALID,True,val/n03445777/n03445777_6882.JPEG,golf_ball,val,VALID,True
4,1117,1404,0.953837,train/n02102040/n02102040_1564.JPEG,English_springer,train,VALID,True,train/n02102040/n02102040_3837.JPEG,English_springer,train,VALID,True
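
In the rows above, each pair appears twice, once in each direction, and some pairs straddle the train/val boundary, which may indicate leakage between the splits. A minimal pandas sketch, using a few columns of the five rows above re-typed into a toy DataFrame, that keeps each pair once and isolates the cross-split ones:

```python
import pandas as pd

# Selected columns of the five rows shown above, re-typed for illustration.
toy = pd.DataFrame({
    "from": [11521, 5390, 12914, 7715, 1117],
    "to": [5390, 11521, 7715, 12914, 1404],
    "distance": [0.968786, 0.968786, 0.962459, 0.962459, 0.953837],
    "split_from": ["val", "train", "val", "train", "train"],
    "split_to": ["train", "val", "train", "val", "train"],
})

# Each pair appears twice (A->B and B->A); build an order-independent key.
lo = toy[["from", "to"]].min(axis=1)
hi = toy[["from", "to"]].max(axis=1)
unique_pairs = toy.assign(lo=lo, hi=hi).drop_duplicates(subset=["lo", "hi"])

# Pairs that straddle the train/val boundary hint at possible leakage.
cross_split = unique_pairs[unique_pairs["split_from"] != unique_pairs["split_to"]]

print(len(unique_pairs), len(cross_split))
```

On the real data, the DataFrame returned by `fd.similarity()` would take the place of `toy`.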


## Image Clusters

In [11]:
fd.vis.component_gallery()

tench



Finished OK. Components are stored as image files fastdup_imagenette/galleries/components_[index].jpg
Stored components visual view in  fastdup_imagenette/galleries/components.html
Execution time in seconds 2.3


component,num_images,mean_distance,Label,count
36.0,24.0,0.9003,tench,24
7332.0,22.0,0.9011,golf_ball,22
143.0,16.0,0.9003,tench,16
6.0,13.0,0.9023,tench,13
10.0,11.0,0.9065,tench,11
4589.0,11.0,0.9005,French_horn,11
900.0,10.0,0.9018,English_springer,10
5491.0,10.0,0.9001,garbage_truck,10
150.0,10.0,0.9032,tench,10
7341.0,9.0,0.9112,golf_ball,9
7355.0,8.0,0.9057,golf_ball,8
5478.0,8.0,0.9025,garbage_truck,8
151.0,7.0,0.9006,tench,7
902.0,7.0,0.9044,English_springer,7
4571.0,6.0,0.9038,French_horn,6
41.0,6.0,0.9007,tench,6
5718.0,6.0,0.9043,garbage_truck,6
917.0,5.0,0.9037,English_springer,5
8448.0,5.0,0.9004,parachute,5
218.0,5.0,0.9,tench,5


You can also restrict the gallery to clusters with a specific label using the `slice` parameter. For example, let's visualize clusters with the `chain_saw` label.

In [12]:
fd.vis.component_gallery(slice='chain_saw')

chain_saw


Finished OK. Components are stored as image files fastdup_imagenette/galleries/components_[index].jpg
Stored components visual view in  fastdup_imagenette/galleries/components.html
Execution time in seconds 1.4





component,num_images,mean_distance,Label,count
2953.0,3.0,0.9064,chain_saw,3
2875.0,2.0,0.9029,chain_saw,2
2891.0,2.0,0.9208,chain_saw,2
2939.0,2.0,0.9222,chain_saw,2
3065.0,2.0,0.9139,chain_saw,2
3068.0,2.0,0.9198,chain_saw,2
3077.0,2.0,0.9073,chain_saw,2
3078.0,2.0,0.9192,chain_saw,2
3153.0,2.0,0.9355,chain_saw,2
3381.0,2.0,0.9345,chain_saw,2
10340.0,2.0,0.9039,chain_saw,2


## Connected Components

In [13]:
cc_df, _ = fd.connected_components()
cc_df.sort_values('count', ascending=False).head(5)

Unnamed: 0,fastdup_id,component_id,sum,count,mean_distance,min_distance,max_distance,img_filename,label,split,error_code,is_valid
7778,7778,7332,36.6734,40.0,0.9168,0.9011,0.9328,train/n03445777/n03445777_16186.JPEG,golf_ball,train,VALID,True
7990,7990,7332,36.6734,40.0,0.9168,0.9011,0.9328,train/n03445777/n03445777_3503.JPEG,golf_ball,train,VALID,True
682,682,36,36.5815,40.0,0.9145,0.9003,0.9339,train/n01440764/n01440764_6159.JPEG,tench,train,VALID,True
9545,9545,36,36.5815,40.0,0.9145,0.9003,0.9339,val/n01440764/n01440764_12250.JPEG,tench,val,VALID,True
7651,7651,7332,36.6734,40.0,0.9168,0.9011,0.9328,train/n03445777/n03445777_11389.JPEG,golf_ball,train,VALID,True
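
Because every row carries a `component_id`, the table lends itself to ordinary pandas grouping. The sketch below uses a toy table with the same column names but made-up values; it counts the images per component and keeps one representative image from each, which is one possible way to thin out near-duplicate clusters.

```python
import pandas as pd

# Toy table with the same column names as above; the values are made up.
toy = pd.DataFrame({
    "fastdup_id": [0, 1, 2, 3, 4, 5],
    "component_id": [10, 10, 10, 20, 20, 30],
    "label": ["tench", "tench", "tench", "church", "church", "gas_pump"],
})

# Number of images per component, largest first.
sizes = toy.groupby("component_id").size().sort_values(ascending=False)

# Keep the first image of every component as its representative.
representatives = toy.drop_duplicates(subset="component_id", keep="first")

print(sizes.to_dict())
print(representatives["fastdup_id"].tolist())
```

Whether it is appropriate to drop the remaining images of a component depends on the use case; similar images are not necessarily redundant.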


We can also get the metadata of an individual image by indexing `fd` with its `fastdup_id`, which is listed in `fd.annotations()`.

In [14]:
fd[349]

{'img_filename': 'train/n01440764/n01440764_17789.JPEG',
 'label': 'tench',
 'split': 'train',
 'fastdup_id': 349,
 'error_code': 'VALID',
 'is_valid': True}