{
"cells": [
{
"cell_type": "markdown",
"id": "ee446f4a",
"metadata": {},
"source": [
"[![image](https://raw.githubusercontent.com/visual-layer/visuallayer/main/imgs/vl_horizontal_logo.png)](https://www.visual-layer.com)"
]
},
{
"cell_type": "markdown",
"id": "2d3a2ba6-3ba0-4770-b025-c88adf5b292e",
"metadata": {},
"source": [
"# fastdup for Satellite Imagery\n",
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/satellite-image-analysis.ipynb)\n",
"[![Open in Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/satellite-image-analysis.ipynb)\n",
"\n",
"In this notebook we load satellite data from Mafat Competition https://mafatchallenge.mod.gov.il/, which consists of 16 bit grayscale images with rotated bounding boxes.\n",
"\n",
"The dataset is also available on Kaggle [here](https://www.kaggle.com/datasets/dragonzhang/mafat-train-dataset).\n",
"\n",
"We show how to work with this dataset using fastdup. It takes 140 seconds to process 18,000 bounding boxes and find all similarities.\n",
"\n",
"We use components gallery to highly suspected wrong bounding boxes as well as correct bounding boxes.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "b2cc8c20-4069-4183-a247-0dc28788b158",
"metadata": {},
"outputs": [],
"source": [
"!pip install fastdup -Uq"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "51b8ea18",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/usr/bin/dpkg\n"
]
},
{
"data": {
"text/plain": [
"'1.26'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import fastdup\n",
"fastdup.__version__"
]
},
{
"cell_type": "markdown",
"id": "eb290525",
"metadata": {},
"source": [
"Download mafat traing data, extract the zip file and put the notebook one level below images/ folder"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "73ec897b",
"metadata": {},
"outputs": [],
"source": [
"!kaggle datasets download -d dragonzhang/mafat-train-dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15dc9cf9",
"metadata": {},
"outputs": [],
"source": [
"!unzip mafat-train-dataset.zip"
]
},
{
"cell_type": "markdown",
"id": "538d2699-4678-4f0b-a570-412d4a97c7ae",
"metadata": {},
"source": [
"## Prepare annotation for fastdup format\n",
"\n",
"\n",
"Here we read the data as given in the competition, one annotation file per each image. We combine all files into a single flat table"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "8e6087e1-9a59-4958-9110-a199c35c10f6",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"files=!ls labelTxt\n",
"files = [os.path.join('labelTxt', f) for f in files]"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d64f0fa9-2ae4-4636-8866-a5303a490669",
"metadata": {},
"outputs": [],
"source": [
"def read_annotations(f):\n",
" with open(f, 'r') as fd:\n",
" lines = fd.readlines()\n",
"\n",
" bounding_boxes = []\n",
"\n",
" for line in lines:\n",
" tokens = line.split()\n",
" x1, y1, x2, y2, x3, y3, x4, y4 = map(float, tokens[:8])\n",
" label = tokens[8]\n",
" bounding_box = {'annot':f , 'x1': x1, 'y1': y1, 'x2': x2, 'y2': y2, 'x3': x3, 'y3': y3, 'x4': x4, 'y4': y4, 'label': label}\n",
" bounding_boxes.append(bounding_box)\n",
" return bounding_boxes"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "696a9865-8a7d-45e4-9f8b-eea4b424c91f",
"metadata": {},
"outputs": [],
"source": [
"annot = []\n",
"for f in files:\n",
" annot.extend(read_annotations(f))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "d6d95cd5-990c-4ce0-9a0d-c127a8a456b6",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" annot | \n",
" x1 | \n",
" y1 | \n",
" x2 | \n",
" y2 | \n",
" x3 | \n",
" y3 | \n",
" x4 | \n",
" y4 | \n",
" label | \n",
" filename | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" labelTxt/126_0_0.txt | \n",
" 1221.94 | \n",
" 423.54 | \n",
" 1229.28 | \n",
" 404.73 | \n",
" 1236.34 | \n",
" 407.49 | \n",
" 1229.00 | \n",
" 426.30 | \n",
" large_vehicle | \n",
" images/126_0_0.tiff | \n",
"
\n",
" \n",
" 1 | \n",
" labelTxt/126_0_0.txt | \n",
" 445.80 | \n",
" 729.00 | \n",
" 457.34 | \n",
" 729.60 | \n",
" 457.01 | \n",
" 735.82 | \n",
" 445.47 | \n",
" 735.22 | \n",
" medium_vehicle | \n",
" images/126_0_0.tiff | \n",
"
\n",
" \n",
" 2 | \n",
" labelTxt/126_0_0.txt | \n",
" 1059.83 | \n",
" 237.72 | \n",
" 1079.99 | \n",
" 225.27 | \n",
" 1084.31 | \n",
" 232.27 | \n",
" 1064.15 | \n",
" 244.72 | \n",
" heavy_equipment | \n",
" images/126_0_0.tiff | \n",
"
\n",
" \n",
" 3 | \n",
" labelTxt/126_0_0.txt | \n",
" 964.83 | \n",
" 831.37 | \n",
" 981.88 | \n",
" 832.92 | \n",
" 981.26 | \n",
" 839.71 | \n",
" 964.21 | \n",
" 838.16 | \n",
" medium_vehicle | \n",
" images/126_0_0.tiff | \n",
"
\n",
" \n",
" 4 | \n",
" labelTxt/126_0_0.txt | \n",
" 985.48 | \n",
" 867.08 | \n",
" 1001.37 | \n",
" 868.52 | \n",
" 1000.75 | \n",
" 875.29 | \n",
" 984.86 | \n",
" 873.85 | \n",
" medium_vehicle | \n",
" images/126_0_0.tiff | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" annot x1 y1 x2 y2 x3 y3 x4 y4 label filename\n",
"0 labelTxt/126_0_0.txt 1221.94 423.54 1229.28 404.73 1236.34 407.49 1229.00 426.30 large_vehicle images/126_0_0.tiff\n",
"1 labelTxt/126_0_0.txt 445.80 729.00 457.34 729.60 457.01 735.82 445.47 735.22 medium_vehicle images/126_0_0.tiff\n",
"2 labelTxt/126_0_0.txt 1059.83 237.72 1079.99 225.27 1084.31 232.27 1064.15 244.72 heavy_equipment images/126_0_0.tiff\n",
"3 labelTxt/126_0_0.txt 964.83 831.37 981.88 832.92 981.26 839.71 964.21 838.16 medium_vehicle images/126_0_0.tiff\n",
"4 labelTxt/126_0_0.txt 985.48 867.08 1001.37 868.52 1000.75 875.29 984.86 873.85 medium_vehicle images/126_0_0.tiff"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.DataFrame(annot)\n",
"df['filename'] = df['annot'].apply(lambda x: x.replace('labelTxt', 'images').replace('.txt', '.tiff'))\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "1b4ccdaa-6162-4684-9808-303966e080bd",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total annotations 117\n"
]
}
],
"source": [
"print('total annotations', len(df))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "c46545d0-3e52-4257-91cf-68e1a2b8d10c",
"metadata": {},
"outputs": [],
"source": [
"df.index.name = 'index'\n",
"df[['filename', 'x1', 'y1', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4', 'label']].to_csv('mafat.csv',index_label='index')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "92a5df03-d456-40a2-a01f-42a47f6835b5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"index,filename,x1,y1,x2,y2,x3,y3,x4,y4,label\r\n",
"0,images/126_0_0.tiff,1221.94,423.54,1229.28,404.73,1236.34,407.49,1229.0,426.3,large_vehicle\r\n",
"1,images/126_0_0.tiff,445.8,729.0,457.34,729.6,457.01,735.82,445.47,735.22,medium_vehicle\r\n",
"2,images/126_0_0.tiff,1059.83,237.72,1079.99,225.27,1084.31,232.27,1064.15,244.72,heavy_equipment\r\n",
"3,images/126_0_0.tiff,964.83,831.37,981.88,832.92,981.26,839.71,964.21,838.16,medium_vehicle\r\n",
"4,images/126_0_0.tiff,985.48,867.08,1001.37,868.52,1000.75,875.29,984.86,873.85,medium_vehicle\r\n",
"5,images/126_0_0.tiff,1012.44,839.59,1031.34,841.31,1030.73,848.08,1011.83,846.36,large_vehicle\r\n",
"6,images/126_0_0.tiff,7.4,262.78,25.79,261.82,26.21,269.89,7.82,270.85,large_vehicle\r\n",
"7,images/126_0_0.tiff,1121.18,877.51,1137.87,879.03,1137.25,885.8,1120.56,884.28,medium_vehicle\r\n",
"8,images/126_0_0.tiff,571.05,753.26,585.66,754.02,585.31,760.57,570.7,759.81,medium_vehicle\r\n"
]
}
],
"source": [
"# This is the required input by fastdup\n",
"!head mafat.csv"
]
},
{
"cell_type": "markdown",
"id": "620799ea-3318-4a74-8dd0-d74ec3f42849",
"metadata": {},
"source": [
"## Run fastdup to crop and build a model for the crops"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d8dcc080-7ef8-4789-8e14-7b56794c4d22",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import cv2\n",
"\n",
"!rm -fr output"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "d5abac7d-3b78-4090-9c6a-50abea31b0db",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import fastdup\n",
"df = pd.read_csv('mafat.csv')\n",
"fd = fastdup.create(input_dir='.', work_dir='output')\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "94156d52-1c7d-400f-a0c2-63df5648a0e9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.\n",
"2023-07-13 18:58:04 [INFO] Going to loop over dir /tmp/tmplebc1a_5.csv\n",
"2023-07-13 18:58:04 [INFO] Found total 117 images to run on, 117 train, 0 test, name list 117, counter 117 \n",
"FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.utes\n",
"2023-07-13 18:58:05 [INFO] Going to loop over dir /tmp/crops_input.csv\n",
"2023-07-13 18:58:05 [INFO] Found total 117 images to run on, 117 train, 0 test, name list 117, counter 117 \n",
"2023-07-13 18:58:06 [INFO] Found total 117 images to run onstimated: 0 Minutes\n",
"Finished histogram 0.048\n",
"Finished bucket sort 0.056\n",
"2023-07-13 18:58:06 [INFO] 10) Finished write_index() NN model\n",
"2023-07-13 18:58:06 [INFO] Stored nn model index file output/nnf.index\n",
"2023-07-13 18:58:06 [INFO] Total time took 1021 ms\n",
"2023-07-13 18:58:06 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %\n",
"2023-07-13 18:58:06 [INFO] Found a total of 2 nearly identical images(d>0.980), which are 0.85 %\n",
"2023-07-13 18:58:06 [INFO] Found a total of 193 above threshold images (d>0.900), which are 82.48 %\n",
"2023-07-13 18:58:06 [INFO] Found a total of 11 outlier images (d<0.050), which are 4.70 %\n",
"2023-07-13 18:58:06 [INFO] Min distance found 0.455 max distance 0.982\n",
"2023-07-13 18:58:06 [INFO] Running connected components for ccthreshold 0.950000 \n",
".0\n",
" ########################################################################################\n",
"\n",
"Dataset Analysis Summary: \n",
"\n",
" Dataset contains 117 images\n",
" Valid images are 100.00% (117) of the data, invalid are 0.00% (0) of the data\n",
" Similarity: 18.80% (22) belong to 4 similarity clusters (components).\n",
" 81.20% (95) images do not belong to any similarity cluster.\n",
" Largest cluster has 82 (70.09%) images.\n",
" For a detailed analysis, use `.connected_components()`\n",
"(similarity threshold used is 0.9, connected component threshold used is 0.95).\n",
"\n",
" Outliers: 5.98% (7) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n",
" For a detailed list of outliers, use `.outliers()`.\n"
]
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.run(annotations=df, overwrite=True, bounding_box='rotated', augmentation_additive_margin=15,\n",
" verbose=False, ccthreshold=0.95)"
]
},
{
"cell_type": "markdown",
"id": "a834aaaa-a76c-49bc-b293-c3c3e114d7aa",
"metadata": {},
"source": [
"## Find suspected wrong bounding boxes\n",
"\n",
"In the gallery, **From** is the crop image name and **To** lists the similar images whose labels do not match."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "4e445a56-ffa9-448d-9e74-715413fc4f3c",
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"medium_vehicle\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 357.88it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Finished OK. Components are stored as image files output/galleries/components_[index].jpg\n",
"Stored components visual view in output/galleries/components.html\n",
"Execution time in seconds 0.1\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
"<h2>Components Report</h2>\n",
"<p>Showing groups of similar images, from different classes</p>\n",
"<table border=\"1\">\n",
"  <thead>\n",
"    <tr><th>component</th><th>num_images</th><th>mean_distance</th><th>Label (count)</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>45</td><td>2</td><td>0.9688</td><td>medium_vehicle (1), small_vessel (1)</td></tr>\n",
"    <tr><td>59</td><td>2</td><td>0.9576</td><td>medium_vehicle (1), medium_vessel (1)</td></tr>\n",
"    <tr><td>63</td><td>5</td><td>0.9573</td><td>heavy_equipment (2), medium_vehicle (2), small_aircraft (1)</td></tr>\n",
"    <tr><td>64</td><td>2</td><td>0.9554</td><td>heavy_equipment (1), small_vessel (1)</td></tr>\n",
"    <tr><td>15</td><td>10</td><td>0.955</td><td>small_vessel (4), medium_vehicle (3), medium_vessel (2), heavy_equipment (1)</td></tr>\n",
"    <tr><td>13</td><td>23</td><td>0.95</td><td>small_vessel (10), medium_vehicle (7), medium_vessel (4), heavy_equipment (1), large_aircraft (1)</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.component_gallery(load_crops=True, enhance_image=True, keep_aspect_ratio=True, \n",
" slice='diff', num_images=20, save_artifacts=True)"
]
},
{
"cell_type": "markdown",
"id": "c0b9d32a",
"metadata": {},
"source": [
"Looking at the raw cluster to link back cluster name to to file"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "d1129fcd-ab0b-4ef7-93a0-30fea445be2f",
"metadata": {},
"outputs": [],
"source": [
"df = pd.read_csv('output/galleries/components.csv')"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "1422b9cd-34cf-496f-be2a-48ca5f358193",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\"><th></th><th>Unnamed: 0</th><th>component_id</th><th>files</th><th>label</th><th>files_ids</th><th>distance</th><th>len</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><th>0</th><td>45</td><td>45</td><td>['output/crops/images126_0_5120.tiff_704_1078_710_1079_709_1091_703_1091.jpg', 'output/crops/images126_0_5120.tiff_991_1081_1004_1081_1004_1086_991_1086.jpg']</td><td>['medium_vehicle', 'small_vessel']</td><td>[50, 72]</td><td>0.9688</td><td>2</td></tr>\n",
"    <tr><th>1</th><td>59</td><td>59</td><td>['output/crops/images126_0_5120.tiff_241_1265_259_1265_259_1273_241_1273.jpg', 'output/crops/images126_0_5120.tiff_1166_1005_1181_1005_1181_1009_1166_1010.jpg']</td><td>['medium_vehicle', 'medium_vessel']</td><td>[88, 90]</td><td>0.9576</td><td>2</td></tr>\n",
"    <tr><th>2</th><td>63</td><td>63</td><td>['output/crops/images126_1280_5120.tiff_996_134_1012_134_1012_141_996_141.jpg', 'output/crops/images126_1280_5120.tiff_192_81_197_80_197_91_193_91.jpg', 'output/crops/images126_1280_5120.tiff_191_101_196_101_196_111_191_111.jpg', 'output/crops/images126_1280_5120.tiff_1012_148_1030_161_1024_170_1006_156.jpg', 'output/crops/images126_1280_5120.tiff_909_1133_909_1107_939_1107_939_1132.jpg']</td><td>['heavy_equipment', 'medium_vehicle', 'medium_vehicle', 'heavy_equipment', 'small_aircraft']</td><td>[93, 99, 103, 104, 114]</td><td>0.9573</td><td>5</td></tr>\n",
"    <tr><th>3</th><td>64</td><td>64</td><td>['output/crops/images126_0_5120.tiff_1134_1049_1134_1061_1129_1061_1129_1050.jpg', 'output/crops/images126_1280_5120.tiff_267_1221_253_1206_259_1201_273_1215.jpg']</td><td>['small_vessel', 'heavy_equipment']</td><td>[94, 115]</td><td>0.9554</td><td>2</td></tr>\n",
"    <tr><th>4</th><td>15</td><td>15</td><td>['output/crops/images126_0_0.tiff_964_831_981_832_981_839_964_838.jpg', 'output/crops/images126_0_5120.tiff_987_1097_997_1097_997_1101_986_1101.jpg', 'output/crops/images126_0_5120.tiff_1149_1050_1149_1065_1143_1065_1143_1051.jpg', 'output/crops/images126_0_5120.tiff_1163_998_1174_998_1174_1003_1163_1003.jpg', 'output/crops/images126_0_5120.tiff_1063_1171_1075_1171_1075_1176_1063_1177.jpg', 'output/crops/images126_0_5120.tiff_1124_1050_1125_1064_1120_1064_1119_1051.jpg', 'output/crops/images126_1280_5120.tiff_1049_127_1064_127_1064_134_1049_134.jpg', 'output/crops/images126_0_5120.tiff_1228_1005_1243_1005_1243_1011_1228_1011.jpg', 'output/crops/images126_1280_5120.tiff_931_143_937_143_937_161_931_161.jpg', 'output/crops/images126_1280_5120.tiff_300_170_315_170_315_177_300_177.jpg']</td><td>['medium_vehicle', 'small_vessel', 'medium_vessel', 'small_vessel', 'small_vessel', 'small_vessel', 'heavy_equipment', 'medium_vessel', 'medium_vehicle', 'medium_vehicle']</td><td>[15, 48, 55, 67, 75, 80, 87, 97, 106, 107]</td><td>0.9550</td><td>10</td></tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Unnamed: 0 component_id files label \\\n",
"0 45 45 ['output/crops/images126_0_5120.tiff_704_1078_710_1079_709_1091_703_1091.jpg', 'output/crops/images126_0_5120.tiff_991_1081_1004_1081_1004_1086_991_1086.jpg'] ['medium_vehicle', 'small_vessel'] \n",
"1 59 59 ['output/crops/images126_0_5120.tiff_241_1265_259_1265_259_1273_241_1273.jpg', 'output/crops/images126_0_5120.tiff_1166_1005_1181_1005_1181_1009_1166_1010.jpg'] ['medium_vehicle', 'medium_vessel'] \n",
"2 63 63 ['output/crops/images126_1280_5120.tiff_996_134_1012_134_1012_141_996_141.jpg', 'output/crops/images126_1280_5120.tiff_192_81_197_80_197_91_193_91.jpg', 'output/crops/images126_1280_5120.tiff_191_101_196_101_196_111_191_111.jpg', 'output/crops/images126_1280_5120.tiff_1012_148_1030_161_1024_170_1006_156.jpg', 'output/crops/images126_1280_5120.tiff_909_1133_909_1107_939_1107_939_1132.jpg'] ['heavy_equipment', 'medium_vehicle', 'medium_vehicle', 'heavy_equipment', 'small_aircraft'] \n",
"3 64 64 ['output/crops/images126_0_5120.tiff_1134_1049_1134_1061_1129_1061_1129_1050.jpg', 'output/crops/images126_1280_5120.tiff_267_1221_253_1206_259_1201_273_1215.jpg'] ['small_vessel', 'heavy_equipment'] \n",
"4 15 15 ['output/crops/images126_0_0.tiff_964_831_981_832_981_839_964_838.jpg', 'output/crops/images126_0_5120.tiff_987_1097_997_1097_997_1101_986_1101.jpg', 'output/crops/images126_0_5120.tiff_1149_1050_1149_1065_1143_1065_1143_1051.jpg', 'output/crops/images126_0_5120.tiff_1163_998_1174_998_1174_1003_1163_1003.jpg', 'output/crops/images126_0_5120.tiff_1063_1171_1075_1171_1075_1176_1063_1177.jpg', 'output/crops/images126_0_5120.tiff_1124_1050_1125_1064_1120_1064_1119_1051.jpg', 'output/crops/images126_1280_5120.tiff_1049_127_1064_127_1064_134_1049_134.jpg', 'output/crops/images126_0_5120.tiff_1228_1005_1243_1005_1243_1011_1228_1011.jpg', 'output/crops/images126_1280_5120.tiff_931_143_937_143_937_161_931_161.jpg', 'output/crops/images126_1280_5120.tiff_300_170_315_170_315_177_300_177.jpg'] ['medium_vehicle', 'small_vessel', 'medium_vessel', 'small_vessel', 'small_vessel', 'small_vessel', 'heavy_equipment', 'medium_vessel', 'medium_vehicle', 'medium_vehicle'] \n",
"\n",
" files_ids distance len \n",
"0 [50, 72] 0.9688 2 \n",
"1 [88, 90] 0.9576 2 \n",
"2 [93, 99, 103, 104, 114] 0.9573 5 \n",
"3 [94, 115] 0.9554 2 \n",
"4 [15, 48, 55, 67, 75, 80, 87, 97, 106, 107] 0.9550 10 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.head()"
]
},
{
"cell_type": "markdown",
"id": "937b6733",
"metadata": {},
"source": [
"Looking at good labels"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "5225bde9-baea-4a45-92fd-baab7d6d4553",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Traceback (most recent call last):\n",
" File \"/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/__init__.py\", line 1376, in create_components_gallery\n",
" ret = do_create_components_gallery(work_dir, save_path, num_images, lazy_load, get_label_func, group_by, slice,\n",
" File \"/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py\", line 1399, in do_create_components_gallery\n",
" ret = visualize_top_components(work_dir, save_dir, num_images,\n",
" File \"/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py\", line 795, in visualize_top_components\n",
" top_components = do_find_top_components(work_dir=work_dir, get_label_func=get_label_func, group_by=group_by,\n",
" File \"/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py\", line 1236, in do_find_top_components\n",
" assert len(comps), \"No components found with more than one image/video\"\n",
"AssertionError: No components found with more than one image/video\n"
]
}
],
"source": [
"fd.vis.component_gallery(load_crops=True, enhance_image=True, keep_aspect_ratio=True,\n",
" slice='same', num_images=20, save_artifacts=True)"
]
},
{
"cell_type": "markdown",
"id": "8bf752ae",
"metadata": {},
"source": [
"## Outliers\n",
"\n",
"Let's look on outliers on the satellite image level"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "4082bd38-22ab-445b-a9a2-a72856352870",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 26144.37it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored outliers visual view in output/galleries/outliers.html\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
"<h2>Outliers Report</h2>\n",
"<p>Showing image outliers, one per row</p>\n",
"<table border=\"1\">\n",
"  <thead>\n",
"    <tr><th>Distance</th><th>Path</th><th>label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>0.46795</td><td>/crops/images126_1280_5120tiff_333_977_331_879_448_877_449_975jpg</td><td>large_aircraft</td></tr>\n",
"    <tr><td>0.848818</td><td>/crops/images126_0_2560tiff_1221_1277_1244_1273_1245_1280_1222_1283jpg</td><td>bus</td></tr>\n",
"    <tr><td>0.855832</td><td>/crops/images126_0_0tiff_7_262_25_261_26_269_7_270jpg</td><td>large_vehicle</td></tr>\n",
"    <tr><td>0.858068</td><td>/crops/images126_1280_5120tiff_-2_933_47_930_52_991_1_994jpg</td><td>large_aircraft</td></tr>\n",
"    <tr><td>0.859666</td><td>/crops/images126_1280_5120tiff_267_1221_253_1206_259_1201_273_1215jpg</td><td>heavy_equipment</td></tr>\n",
"    <tr><td>0.863308</td><td>/crops/images126_0_0tiff_1059_237_1079_225_1084_232_1064_244jpg</td><td>heavy_equipment</td></tr>\n",
"    <tr><td>0.867095</td><td>/crops/images126_1280_5120tiff_601_1050_600_1015_642_1015_643_1049jpg</td><td>small_aircraft</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.outliers_gallery()"
]
},
{
"cell_type": "markdown",
"id": "b6a420e5",
"metadata": {},
"source": [
"Now we look at outliers at the crop level"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "925c986e-18d9-4a6f-adc5-2cd7949f8424",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 17445.11it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored outliers visual view in output/galleries/outliers.html\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
"<h2>Outliers Report</h2>\n",
"<p>Showing image outliers, one per row</p>\n",
"<table border=\"1\">\n",
"  <thead>\n",
"    <tr><th>Distance</th><th>Path</th><th>label</th></tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr><td>0.46795</td><td>/crops/images126_1280_5120tiff_333_977_331_879_448_877_449_975jpg</td><td>large_aircraft</td></tr>\n",
"    <tr><td>0.848818</td><td>/crops/images126_0_2560tiff_1221_1277_1244_1273_1245_1280_1222_1283jpg</td><td>bus</td></tr>\n",
"    <tr><td>0.855832</td><td>/crops/images126_0_0tiff_7_262_25_261_26_269_7_270jpg</td><td>large_vehicle</td></tr>\n",
"    <tr><td>0.858068</td><td>/crops/images126_1280_5120tiff_-2_933_47_930_52_991_1_994jpg</td><td>large_aircraft</td></tr>\n",
"    <tr><td>0.859666</td><td>/crops/images126_1280_5120tiff_267_1221_253_1206_259_1201_273_1215jpg</td><td>heavy_equipment</td></tr>\n",
"    <tr><td>0.863308</td><td>/crops/images126_0_0tiff_1059_237_1079_225_1084_232_1064_244jpg</td><td>heavy_equipment</td></tr>\n",
"    <tr><td>0.867095</td><td>/crops/images126_1280_5120tiff_601_1050_600_1015_642_1015_643_1049jpg</td><td>small_aircraft</td></tr>\n",
"  </tbody>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.outliers_gallery(load_crops=True)"
]
},
{
"cell_type": "markdown",
"id": "ad571f11",
"metadata": {},
"source": [
"## Brightest Image\n",
"\n",
"We look for the brightest satellite images"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "4a861aab-50a2-4f39-944e-f139fe60327a",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 6562.32it/s]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored mean visual view in output/galleries/mean.html\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
},
{
"data": {
"text/html": [
"<h2>Bright Image Report</h2>\n",
"<p>Showing example images, sort by descending order</p>\n",
"<table>\n",
"<tr><th>mean</th><th>filename</th><th>label</th></tr>\n",
"<tr><td>115.5904</td><td>output/crops/images126_0_0.tiff_949_234_950_219_956_220_955_234.jpg</td><td>N/A</td></tr>\n",
"<tr><td>92.5701</td><td>output/crops/images126_0_0.tiff_1030_250_1036_246_1041_256_1034_259.jpg</td><td>N/A</td></tr>\n",
"<tr><td>91.7934</td><td>output/crops/images126_1280_5120.tiff_601_1050_600_1015_642_1015_643_1049.jpg</td><td>N/A</td></tr>\n",
"<tr><td>90.4244</td><td>output/crops/images126_1280_5120.tiff_794_1192_794_1157_833_1156_834_1192.jpg</td><td>N/A</td></tr>\n",
"<tr><td>90.1924</td><td>output/crops/images126_0_0.tiff_1059_237_1079_225_1084_232_1064_244.jpg</td><td>N/A</td></tr>\n",
"<tr><td>89.8846</td><td>output/crops/images126_0_0.tiff_1221_423_1229_404_1236_407_1229_426.jpg</td><td>N/A</td></tr>\n",
"<tr><td>88.5196</td><td>output/crops/images126_1280_5120.tiff_1012_148_1030_161_1024_170_1006_156.jpg</td><td>N/A</td></tr>\n",
"<tr><td>88.3636</td><td>output/crops/images126_1280_5120.tiff_996_134_1012_134_1012_141_996_141.jpg</td><td>N/A</td></tr>\n",
"<tr><td>86.5223</td><td>output/crops/images126_1280_5120.tiff_592_900_592_889_611_888_611_900.jpg</td><td>N/A</td></tr>\n",
"<tr><td>85.2022</td><td>output/crops/images126_0_5120.tiff_20_1049_28_1057_24_1060_16_1052.jpg</td><td>N/A</td></tr>\n",
"<tr><td>84.5831</td><td>output/crops/images126_1280_5120.tiff_889_1005_888_982_917_981_917_1005.jpg</td><td>N/A</td></tr>\n",
"<tr><td>82.7721</td><td>output/crops/images126_0_0.tiff_1028_192_1033_185_1039_190_1033_197.jpg</td><td>N/A</td></tr>\n",
"<tr><td>82.3681</td><td>output/crops/images126_0_1280.tiff_330_829_348_832_347_841_329_838.jpg</td><td>N/A</td></tr>\n",
"<tr><td>79.9242</td><td>output/crops/images126_1280_5120.tiff_909_1133_909_1107_939_1107_939_1132.jpg</td><td>N/A</td></tr>\n",
"<tr><td>78.291</td><td>output/crops/images126_1280_5120.tiff_907_1078_906_1055_924_1054_925_1078.jpg</td><td>N/A</td></tr>\n",
"<tr><td>78.1487</td><td>output/crops/images126_1280_5120.tiff_906_1106_906_1080_935_1080_936_1106.jpg</td><td>N/A</td></tr>\n",
"<tr><td>77.239</td><td>output/crops/images126_0_5120.tiff_1059_1066_1059_1079_1055_1079_1055_1066.jpg</td><td>N/A</td></tr>\n",
"<tr><td>77.0893</td><td>output/crops/images126_1280_5120.tiff_1062_124_1080_125_1079_133_1061_132.jpg</td><td>N/A</td></tr>\n",
"<tr><td>76.4959</td><td>output/crops/images126_0_2560.tiff_595_253_615_258_613_266_594_261.jpg</td><td>N/A</td></tr>\n",
"<tr><td>76.3903</td><td>output/crops/images126_0_1280.tiff_150_799_170_798_171_806_150_807.jpg</td><td>N/A</td></tr>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.stats_gallery(metric='mean')"
]
},
{
"cell_type": "markdown",
"id": "00277963",
"metadata": {},
"source": [
"## Blurry Images\n",
"Now we look for the blurriest crops, ranked by the `blur` metric in ascending order."
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "c0a2d9d9-5180-4ebe-b073-f7feef1e4c6d",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 6341.55it/s]\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Stored blur visual view in output/galleries/blur.html\n"
]
},
{
"data": {
"text/html": [
"<h2>Blurry Image Report</h2>\n",
"<p>Showing example images, sort by ascending order</p>\n",
"<table>\n",
"<tr><th>blur</th><th>filename</th><th>label</th></tr>\n",
"<tr><td>5.0175</td><td>output/crops/images126_1280_5120.tiff_267_1221_253_1206_259_1201_273_1215.jpg</td><td>heavy_equipment</td></tr>\n",
"<tr><td>5.6971</td><td>output/crops/images126_0_3840.tiff_631_554_638_551_647_568_641_572.jpg</td><td>heavy_equipment</td></tr>\n",
"<tr><td>5.7064</td><td>output/crops/images126_0_0.tiff_964_831_981_832_981_839_964_838.jpg</td><td>medium_vehicle</td></tr>\n",
"<tr><td>7.9295</td><td>output/crops/images126_0_0.tiff_1121_877_1137_879_1137_885_1120_884.jpg</td><td>medium_vehicle</td></tr>\n",
"<tr><td>8.1925</td><td>output/crops/images126_0_2560.tiff_482_267_493_277_488_282_477_271.jpg</td><td>medium_vehicle</td></tr>\n",
"<tr><td>8.464</td><td>output/crops/images126_0_2560.tiff_621_487_624_477_629_478_626_489.jpg</td><td>heavy_equipment</td></tr>\n",
"<tr><td>9.0193</td><td>output/crops/images126_0_3840.tiff_864_534_871_538_863_554_856_551.jpg</td><td>heavy_equipment</td></tr>\n",
"<tr><td>10.5485</td><td>output/crops/images126_0_2560.tiff_556_455_570_454_570_462_557_463.jpg</td><td>heavy_equipment</td></tr>\n",
"<tr><td>11.0871</td><td>output/crops/images126_0_0.tiff_965_859_985_859_985_866_965_865.jpg</td><td>large_vehicle</td></tr>\n",
"<tr><td>11.5538</td><td>output/crops/images126_0_0.tiff_985_867_1001_868_1000_875_984_873.jpg</td><td>medium_vehicle</td></tr>\n",
"<tr><td>11.8231</td><td>output/crops/images126_0_1280.tiff_527_824_557_825_557_836_526_835.jpg</td><td>heavy_equipment</td></tr>\n",
"<tr><td>12.157</td><td>output/crops/images126_0_5120.tiff_704_1078_710_1079_709_1091_703_1091.jpg</td><td>medium_vehicle</td></tr>\n",
"<tr><td>12.5293</td><td>output/crops/images126_0_5120.tiff_688_1078_694_1078_694_1092_688_1092.jpg</td><td>medium_vehicle</td></tr>\n",
"<tr><td>12.76</td><td>output/crops/images126_0_0.tiff_1012_839_1031_841_1030_848_1011_846.jpg</td><td>large_vehicle</td></tr>\n",
"<tr><td>13.6241</td><td>output/crops/images126_0_2560.tiff_307_267_315_267_314_283_306_282.jpg</td><td>heavy_equipment</td></tr>\n",
"<tr><td>14.0418</td><td>output/crops/images126_0_1280.tiff_584_919_592_918_593_935_584_935.jpg</td><td>large_vehicle</td></tr>\n",
"<tr><td>14.3275</td><td>output/crops/images126_0_3840.tiff_226_1209_245_1205_247_1213_228_1217.jpg</td><td>large_vehicle</td></tr>\n",
"<tr><td>14.3412</td><td>output/crops/images126_0_0.tiff_1044_803_1057_803_1056_810_1044_809.jpg</td><td>medium_vehicle</td></tr>\n",
"<tr><td>14.3537</td><td>output/crops/images126_0_2560.tiff_574_412_576_394_582_395_580_413.jpg</td><td>large_vehicle</td></tr>\n",
"<tr><td>14.4656</td><td>output/crops/images126_0_2560.tiff_528_419_546_427_540_439_523_431.jpg</td><td>heavy_equipment</td></tr>\n",
"</table>"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fd.vis.stats_gallery(metric='blur',load_crops=True)"
]
},
{
"cell_type": "markdown",
"id": "a8fe9bbf-6be1-4907-b555-53605befbf6d",
"metadata": {},
"source": [
"## Wrap Up\n",
"\n",
"Next, feel free to check out these other tutorials:\n",
"\n",
"+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!\n",
"+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.\n",
"+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze it for potential issues. If you have a labeled ImageNet-style folder structure, have a go!\n",
"+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze them for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try."
]
},
{
"cell_type": "markdown",
"id": "12d9492f",
"metadata": {},
"source": [
"\n",
"## VL Profiler\n",
"If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com). VL Profiler is our first no-code commercial product, and it lets you visualize and inspect your dataset in your browser.\n",
"\n",
"[Sign up](https://app.visual-layer.com) now; it's free.\n",
"\n",
"[![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/vl_profiler_promo.svg)](https://app.visual-layer.com)\n",
"\n",
"As usual, feedback is welcome! \n",
"\n",
"Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues)."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}