[![image](https://raw.githubusercontent.com/visual-layer/visuallayer/main/imgs/vl_horizontal_logo.png)](https://www.visual-layer.com)

# Optical Character Recognition

This notebook shows how to run optical character recognition (OCR) with fastdup on frames extracted from a video dataset.

In [None]:
!pip install -Uq fastdup paddleocr paddlepaddle kaggle

In [2]:
import fastdup
fastdup.__version__


'1.26'

## Download Video Dataset

Let's download a TikTok [trending video dataset](https://www.kaggle.com/datasets/erikvdven/tiktok-trending-december-2020) from Kaggle. The dataset consists of the first 1000 trending videos scraped from TikTok in December 2020.

You can download the dataset manually from the dataset [homepage](https://www.kaggle.com/datasets/erikvdven/tiktok-trending-december-2020) or by using the [Kaggle API](https://github.com/Kaggle/kaggle-api).

Let's use the Kaggle API to download the dataset:

In [None]:
!kaggle datasets download -d erikvdven/tiktok-trending-december-2020
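
The download command needs Kaggle API credentials. As far as we know, the Kaggle client reads them either from the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables or from a `kaggle.json` token file in `~/.kaggle`. The sketch below checks for both before you run the download; `kaggle_credentials_available` is a hypothetical helper written for this notebook, not part of the Kaggle package.

```python
import os
from pathlib import Path


def kaggle_credentials_available(home=None, env=None):
    """Return True if Kaggle API credentials appear to be configured.

    Hypothetical helper: it only checks the two common locations and does
    not validate the credentials themselves.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    # Option 1: both environment variables are set and non-empty.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True
    # Option 2: the token file downloaded from the Kaggle account page exists.
    return (home / ".kaggle" / "kaggle.json").is_file()


if __name__ == "__main__":
    print("Kaggle credentials found:", kaggle_credentials_available())
```

If this prints `False`, create an API token from your Kaggle account settings and place it in `~/.kaggle/kaggle.json` before retrying.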

Now, unzip the dataset into the local directory. You'll find a folder named `videos` that contains all the trending clips.

In [None]:
!unzip tiktok-trending-december-2020.zip
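
If the `unzip` command is not available on your system, the standard library can do the same job. This is a minimal sketch, assuming the archive sits in the current directory under the name used above; `extract_archive` is a hypothetical helper, and the `videos/` prefix is taken from the folder name mentioned earlier.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir and return the archive's member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


if __name__ == "__main__":
    archive_path = Path("tiktok-trending-december-2020.zip")
    if archive_path.exists():
        names = extract_archive(archive_path)
        # Count regular files under videos/ (directory entries end with "/").
        videos = [n for n in names if n.startswith("videos/") and not n.endswith("/")]
        print(f"Extracted {len(names)} entries, {len(videos)} under videos/")
```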

## Extract frames
To run fastdup, we first need to extract frames from the clips in the `videos` folder and store them in a separate folder, which we'll name `frames`. fastdup provides a convenience function for this.

In [3]:
fastdup.extract_video_frames('videos', 'frames')

FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-07-13 17:48:56 [INFO] Going to loop over dir videos
2023-07-13 17:48:56 [INFO] Found total 1000 videos to run on, 1000 train, 0 test, name list 1000, counter 1000 


0

## Run fastdup
With the extracted frames, we can run fastdup to analyze them. 

To use the optical character recognition feature, specify `bounding_box='ocr'` in the `run` method.

For demonstration, we'll pass `num_images=1000` to the `run` method, which limits the run to 1000 images. Feel free to specify a different value, or omit this parameter altogether to run on the entire dataset.

In [16]:
fd = fastdup.create(input_dir='./frames', work_dir='work_dir')
fd.run(bounding_box='ocr', num_images=1000)

FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-07-13 18:01:48 [INFO] Going to loop over dir frames
2023-07-13 18:01:48 [INFO] Found total 1000 images to run on, 1000 train, 0 test, name list 1000, counter 1000 


FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-07-13 18:15:46 [INFO] Going to loop over dir /tmp/crops_input.csv
2023-07-13 18:15:46 [INFO] Found total 1000 images to run on, 1000 train, 0 test, name list 1000, counter 1000 
2023-07-13 18:15:49 [INFO] Found total 1000 images to run on
975030598917.mp4output_000005.jpg_166_533_195_533_195_541_166_541.jpg - file does not exist
[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■        ] 83% Estimated: 0 Minutes
Finished histogram 0.211
Finished bucket sort 0.219
2023-07-13 18:15:49 [INFO] 70) Finished write_index() NN model
2023-07-13 18:15:49 [INFO] Stored nn model index file work_dir/nnf.index
2023-07-13 18:15:49 [INFO] Total time took 3118 ms
2023-07-13 18:15:49 [INFO] Found a total of 219 fully identical images (d>0.990), which are 10.95 %
2023-07-13 18:15:49 [INFO] Found a total of 148 nearly identical images(d>0.980), which are 7.40 %
2023-07-13 18:15:49 [INFO] Found a total of 1569 above th

0

## Duplicate/Near-duplicate Detections

In [24]:
fd.vis.duplicates_gallery()

100%|██████████| 20/20 [00:00<00:00, 459.87it/s]


Stored similarity visual view in  work_dir/galleries/duplicates.html


Info,Unnamed: 1
Distance,0.999333
From,/crops/tmpvideos6875872124968439046.mp4output_000004.jpg_78_604_308_605_308_623_78_622.jpg
To,/crops/tmpvideos6875872124968439046.mp4output_000003.jpg_78_604_308_605_308_623_78_622.jpg
From_Label,
To_Label,

Info,Unnamed: 1
Distance,0.999164
From,/crops/tmpvideos6875872124968439046.mp4output_000003.jpg_74_554_333_555_332_576_74_575.jpg
To,/crops/tmpvideos6875872124968439046.mp4output_000004.jpg_74_554_333_555_332_576_74_575.jpg
From_Label,
To_Label,

Info,Unnamed: 1
Distance,0.99886
From,/crops/tmpvideos6877747486451043590.mp4output_000002.jpg_72_122_320_122_320_135_72_135.jpg
To,/crops/tmpvideos6877747486451043590.mp4output_000005.jpg_72_122_320_122_320_135_72_135.jpg
From_Label,
To_Label,

Info,Unnamed: 1
Distance,0.998814
From,/crops/tmpvideos6875405441472498949.mp4output_000002.jpg_76_161_320_161_320_179_76_179.jpg
To,/crops/tmpvideos6875405441472498949.mp4output_000005.jpg_76_161_320_161_320_179_76_179.jpg
From_Label,
To_Label,

Info,Unnamed: 1
Distance,0.998212
From,/crops/tmpvideos6875436892226178305.mp4output_000002.jpg_108_286_468_286_468_311_108_311.jpg
To,/crops/tmpvideos6875436892226178305.mp4output_000001.jpg_108_286_468_286_468_311_108_311.jpg
From_Label,
To_Label,

Info,Unnamed: 1
Distance,0.998153
From,/crops/tmpvideos6878165800902085890.mp4output_000001.jpg_125_581_381_473_400_517_144_625.jpg
To,/crops/tmpvideos6878165800902085890.mp4output_000002.jpg_125_581_381_473_400_517_144_625.jpg
From_Label,
To_Label,

Info,Unnamed: 1
Distance,0.998042
From,/crops/tmpvideos6876156631651077381.mp4output_000002.jpg_78_206_311_206_311_221_78_221.jpg
To,/crops/tmpvideos6876156631651077381.mp4output_000003.jpg_78_206_311_206_311_221_78_221.jpg
From_Label,
To_Label,

Info,Unnamed: 1
Distance,0.997734
From,/crops/tmpvideos6875405441472498949.mp4output_000004.jpg_76_161_320_161_320_179_76_179.jpg
To,/crops/tmpvideos6875405441472498949.mp4output_000002.jpg_76_161_320_161_320_179_76_179.jpg
From_Label,
To_Label,

Info,Unnamed: 1
Distance,0.997723
From,/crops/tmpvideos6877796349459336450.mp4output_000004.jpg_435_988_562_988_562_1006_435_1006.jpg
To,/crops/tmpvideos6877796349459336450.mp4output_000002.jpg_435_988_562_988_562_1006_435_1006.jpg
From_Label,
To_Label,

Info,Unnamed: 1
Distance,0.997721
From,/crops/tmpvideos6875872124968439046.mp4output_000003.jpg_78_604_308_605_308_623_78_622.jpg
To,/crops/tmpvideos6875872124968439046.mp4output_000005.jpg_78_604_308_605_308_623_78_622.jpg
From_Label,
To_Label,


0
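
The crop paths in the gallery output above appear to encode the source video, the frame index, and eight integers. The sketch below parses such a name, assuming (this is inferred from the printed names, not from fastdup documentation) that the eight integers are the x, y coordinates of the four corners of the detected text box. `parse_crop_name` is a hypothetical helper.

```python
import re

# Pattern inferred from the crop names printed above (an assumption, not a
# documented fastdup format): <video id>.mp4output_<frame>.jpg_<8 ints>.jpg
CROP_RE = re.compile(r"(\d+)\.mp4output_(\d+)\.jpg_((?:\d+_){7}\d+)\.jpg$")


def parse_crop_name(path):
    """Split a crop path into video id, frame index, and four (x, y) corners."""
    match = CROP_RE.search(path)
    if match is None:
        raise ValueError(f"Unrecognized crop name: {path}")
    video_id, frame, coords = match.groups()
    values = [int(v) for v in coords.split("_")]
    # Pair consecutive values as (x, y); the axis order is an assumption.
    corners = list(zip(values[0::2], values[1::2]))
    return {"video_id": video_id, "frame": int(frame), "corners": corners}


# The first duplicate pair from the gallery output above.
src = parse_crop_name(
    "/crops/tmpvideos6875872124968439046.mp4output_000004.jpg_78_604_308_605_308_623_78_622.jpg"
)
dst = parse_crop_name(
    "/crops/tmpvideos6875872124968439046.mp4output_000003.jpg_78_604_308_605_308_623_78_622.jpg"
)

same_video = src["video_id"] == dst["video_id"]
same_box = src["corners"] == dst["corners"]
print(same_video, same_box, src["frame"], dst["frame"])
```

For this pair, both crops come from the same video and share the same box coordinates in different frames, which is consistent with static on-screen text persisting across frames.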

## Outliers

Let's visualize the outliers in the OCR detections.

In [25]:
fd.vis.outliers_gallery(load_crops=True)

100%|██████████| 20/20 [00:00<00:00, 22976.19it/s]


Stored outliers visual view in  work_dir/galleries/outliers.html


Info,Unnamed: 1
Distance,0.644383
Path,/crops/tmpvideos6875749962681044230.mp4output_000001.jpg_329_267_431_253_437_299_335_313.jpg
label,0.745696

Info,Unnamed: 1
Distance,0.654496
Path,/crops/tmpvideos6876369534018669825.mp4output_000005.jpg_220_419_445_470_424_563_199_511.jpg
label,0.672772

Info,Unnamed: 1
Distance,0.675129
Path,/crops/tmpvideos6877789463221947650.mp4output_000001.jpg_45_72_503_79_499_321_41_314.jpg
label,0.947268

Info,Unnamed: 1
Distance,0.695931
Path,/crops/tmpvideos6876857501397159170.mp4output_000001.jpg_218_537_326_543_324_588_216_582.jpg
label,0.732296

Info,Unnamed: 1
Distance,0.697093
Path,/crops/tmpvideos6876369534018669825.mp4output_000003.jpg_128_630_279_582_309_680_158_728.jpg
label,0.992506

Info,Unnamed: 1
Distance,0.700451
Path,/crops/tmpvideos6875317312082201857.mp4output_000001.jpg_307_399_352_399_352_446_307_446.jpg
label,0.536331

Info,Unnamed: 1
Distance,0.70144
Path,/crops/tmpvideos6875872124968439046.mp4output_000006.jpg_38_914_91_958_72_981_19_938.jpg
label,0.58447

Info,Unnamed: 1
Distance,0.711805
Path,/crops/tmpvideos6877179236386376961.mp4output_000002.jpg_10_139_86_83_120_129_43_185.jpg
label,0.992461

Info,Unnamed: 1
Distance,0.714143
Path,/crops/tmpvideos6876787355181665537.mp4output_000004.jpg_289_391_476_391_476_508_289_508.jpg
label,0.651683

Info,Unnamed: 1
Distance,0.720347
Path,/crops/tmpvideos6876245387871587590.mp4output_000001.jpg_510_252_565_252_565_298_510_298.jpg
label,0.556968

Info,Unnamed: 1
Distance,0.722568
Path,/crops/tmpvideos6875323773755657474.mp4output_000002.jpg_410_216_485_209_489_249_414_256.jpg
label,0.665932

Info,Unnamed: 1
Distance,0.722777
Path,/crops/tmpvideos6875323773755657474.mp4output_000003.jpg_176_383_419_390_417_438_174_431.jpg
label,0.802938

Info,Unnamed: 1
Distance,0.726905
Path,/crops/tmpvideos6877191692341054721.mp4output_000002.jpg_206_521_239_535_210_602_177_587.jpg
label,0.676398

Info,Unnamed: 1
Distance,0.728571
Path,/crops/tmpvideos6875739742340762885.mp4output_000002.jpg_206_580_278_586_276_613_204_606.jpg
label,0.62836

Info,Unnamed: 1
Distance,0.732963
Path,/crops/tmpvideos6875323773755657474.mp4output_000002.jpg_161_217_238_217_238_261_161_261.jpg
label,0.977016

Info,Unnamed: 1
Distance,0.733121
Path,/crops/tmpvideos6876603307708665093.mp4output_000007.jpg_301_476_439_434_453_482_315_523.jpg
label,0.517169

Info,Unnamed: 1
Distance,0.734482
Path,/crops/tmpvideos6876369534018669825.mp4output_000002.jpg_144_448_173_444_177_474_148_478.jpg
label,0.554273

Info,Unnamed: 1
Distance,0.735593
Path,/crops/tmpvideos6875872124968439046.mp4output_000005.jpg_36_917_107_959_91_986_20_945.jpg
label,0.596334

Info,Unnamed: 1
Distance,0.736088
Path,/crops/tmpvideos6877178763474423041.mp4output_000002.jpg_177_808_543_829_533_1000_168_979.jpg
label,0.522156

Info,Unnamed: 1
Distance,0.740881
Path,/crops/tmpvideos6877301356101782785.mp4output_000003.jpg_308_509_335_509_335_559_308_559.jpg
label,0.568036


0

## Dark Detections

In [26]:
fd.vis.stats_gallery(load_crops=True)

100%|██████████| 20/20 [00:00<00:00, 1555.75it/s]


Stored mean visual view in  work_dir/galleries/mean.html


Info,Unnamed: 1
mean,25.3559
filename,work_dir/crops/framestmpvideos6875739742340762885.mp4output_000002.jpg_206_580_278_586_276_613_204_606.jpg
label,0.62836

Info,Unnamed: 1
mean,39.847
filename,work_dir/crops/framestmpvideos6878259877353966850.mp4output_000003.jpg_347_115_425_123_422_152_344_144.jpg
label,0.898937

Info,Unnamed: 1
mean,41.6365
filename,work_dir/crops/framestmpvideos6878259877353966850.mp4output_000002.jpg_348_115_425_123_422_152_345_144.jpg
label,0.873811

Info,Unnamed: 1
mean,41.8634
filename,work_dir/crops/framestmpvideos6878259877353966850.mp4output_000001.jpg_348_115_425_123_422_152_345_144.jpg
label,0.84548

Info,Unnamed: 1
mean,43.4009
filename,work_dir/crops/framestmpvideos6876130849683770625.mp4output_000004.jpg_548_329_632_329_632_343_548_343.jpg
label,0.984834

Info,Unnamed: 1
mean,45.9549
filename,work_dir/crops/framestmpvideos6877816601308073218.mp4output_000001.jpg_8_45_81_45_81_59_8_59.jpg
label,0.987654

Info,Unnamed: 1
mean,46.0729
filename,work_dir/crops/framestmpvideos6876760427229957377.mp4output_000001.jpg_14_66_156_68_155_91_14_88.jpg
label,0.992584

Info,Unnamed: 1
mean,46.3902
filename,work_dir/crops/framestmpvideos6877301356101782785.mp4output_000001.jpg_13_63_149_64_149_82_13_81.jpg
label,0.95529

Info,Unnamed: 1
mean,46.9471
filename,work_dir/crops/framestmpvideos6875436892226178305.mp4output_000002.jpg_463_985_562_988_562_1007_463_1005.jpg
label,0.987316

Info,Unnamed: 1
mean,47.8048
filename,work_dir/crops/framestmpvideos6876860979787959554.mp4output_000001.jpg_109_140_465_142_465_178_109_176.jpg
label,0.932088

Info,Unnamed: 1
mean,47.9955
filename,work_dir/crops/framestmpvideos6875436892226178305.mp4output_000001.jpg_14_67_111_69_111_89_14_86.jpg
label,0.951615

Info,Unnamed: 1
mean,48.379
filename,work_dir/crops/framestmpvideos6877473712652799234.mp4output_000001.jpg_8_46_123_47_123_61_8_60.jpg
label,0.912621

Info,Unnamed: 1
mean,48.5566
filename,work_dir/crops/framestmpvideos6875436892226178305.mp4output_000003.jpg_463_987_562_989_562_1008_463_1006.jpg
label,0.990549

Info,Unnamed: 1
mean,49.5633
filename,work_dir/crops/framestmpvideos6877301356101782785.mp4output_000005.jpg_387_926_521_927_521_944_387_943.jpg
label,0.98103

Info,Unnamed: 1
mean,49.7569
filename,work_dir/crops/framestmpvideos6877301356101782785.mp4output_000002.jpg_387_926_521_927_521_944_387_943.jpg
label,0.982998

Info,Unnamed: 1
mean,49.8006
filename,work_dir/crops/framestmpvideos6877301356101782785.mp4output_000006.jpg_387_926_521_927_521_944_387_943.jpg
label,0.965701

Info,Unnamed: 1
mean,49.9399
filename,work_dir/crops/framestmpvideos6877301356101782785.mp4output_000003.jpg_387_926_520_927_520_944_387_943.jpg
label,0.984911

Info,Unnamed: 1
mean,49.9419
filename,work_dir/crops/framestmpvideos6877301356101782785.mp4output_000004.jpg_387_926_520_927_520_944_387_943.jpg
label,0.983885

Info,Unnamed: 1
mean,50.1834
filename,work_dir/crops/framestmpvideos6877301356101782785.mp4output_000007.jpg_387_926_521_927_521_944_387_943.jpg
label,0.937984

Info,Unnamed: 1
mean,50.8746
filename,work_dir/crops/framestmpvideos6875528457388903681.mp4output_000005.jpg_894_539_1010_537_1010_558_894_560.jpg
label,0.947736


0

## Blurry Detections

In [27]:
fd.vis.stats_gallery(metric='blur', load_crops=True)

100%|██████████| 20/20 [00:00<00:00, 4670.46it/s]


Stored blur visual view in  work_dir/galleries/blur.html


Info,Unnamed: 1
blur,11.368
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000005.jpg_197_611_218_608_220_620_199_623.jpg
label,0.763627

Info,Unnamed: 1
blur,12.6647
filename,work_dir/crops/framestmpvideos6877179236386376961.mp4output_000002.jpg_103_849_341_829_348_914_110_934.jpg
label,0.958016

Info,Unnamed: 1
blur,14.282
filename,work_dir/crops/framestmpvideos6877179236386376961.mp4output_000003.jpg_48_470_108_470_108_493_48_493.jpg
label,0.932451

Info,Unnamed: 1
blur,17.6397
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000005.jpg_160_596_210_583_213_597_163_610.jpg
label,0.709151

Info,Unnamed: 1
blur,20.3299
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000005.jpg_166_644_220_629_225_645_171_660.jpg
label,0.887496

Info,Unnamed: 1
blur,21.5456
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000003.jpg_171_135_206_139_204_159_169_154.jpg
label,0.717846

Info,Unnamed: 1
blur,23.9951
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000005.jpg_217_656_235_656_235_669_217_669.jpg
label,0.677347

Info,Unnamed: 1
blur,27.2666
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000002.jpg_184_91_221_94_219_115_183_112.jpg
label,0.790454

Info,Unnamed: 1
blur,33.5997
filename,work_dir/crops/framestmpvideos6877606750900538626.mp4output_000004.jpg_2_990_53_984_54_996_3_1002.jpg
label,0.51632

Info,Unnamed: 1
blur,38.4142
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000003.jpg_311_573_343_573_343_594_311_594.jpg
label,0.99319

Info,Unnamed: 1
blur,39.8873
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000003.jpg_311_517_342_517_342_533_311_533.jpg
label,0.506171

Info,Unnamed: 1
blur,41.6052
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000003.jpg_271_609_337_601_339_620_274_628.jpg
label,0.844861

Info,Unnamed: 1
blur,43.3049
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000002.jpg_330_503_363_503_363_521_330_521.jpg
label,0.660055

Info,Unnamed: 1
blur,45.5358
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000003.jpg_196_301_240_301_240_311_196_311.jpg
label,0.771191

Info,Unnamed: 1
blur,46.0039
filename,work_dir/crops/framestmpvideos6876369534018669825.mp4output_000003.jpg_128_630_279_582_309_680_158_728.jpg
label,0.992506

Info,Unnamed: 1
blur,48.2854
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000003.jpg_312_635_341_635_341_654_312_654.jpg
label,0.972614

Info,Unnamed: 1
blur,49.4907
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000003.jpg_377_287_438_285_439_300_377_302.jpg
label,0.845535

Info,Unnamed: 1
blur,52.3385
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000003.jpg_200_315_256_315_256_325_200_325.jpg
label,0.530487

Info,Unnamed: 1
blur,56.4384
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000003.jpg_227_272_300_277_299_292_226_287.jpg
label,0.697303

Info,Unnamed: 1
blur,57.5365
filename,work_dir/crops/framestmpvideos6876369534018669825.mp4output_000005.jpg_89_932_107_932_107_953_89_953.jpg
label,0.960756


0

## Bright Detections

In [28]:
fd.vis.stats_gallery(metric='bright', load_crops=True)

100%|██████████| 20/20 [00:00<00:00, 3067.02it/s]


Stored mean visual view in  work_dir/galleries/mean.html


Info,Unnamed: 1
mean,241.9651
filename,work_dir/crops/framestmpvideos6876805952117558529.mp4output_000003.jpg_182_641_350_641_350_702_182_702.jpg
label,0.885485

Info,Unnamed: 1
mean,239.1625
filename,work_dir/crops/framestmpvideos6876805952117558529.mp4output_000004.jpg_147_654_356_656_355_721_146_719.jpg
label,0.735413

Info,Unnamed: 1
mean,238.5339
filename,work_dir/crops/framestmpvideos6876805952117558529.mp4output_000002.jpg_150_635_381_640_379_708_149_702.jpg
label,0.841424

Info,Unnamed: 1
mean,238.1793
filename,work_dir/crops/framestmpvideos6875739742340762885.mp4output_000001.jpg_60_271_111_271_111_282_60_282.jpg
label,0.866655

Info,Unnamed: 1
mean,237.8891
filename,work_dir/crops/framestmpvideos6876805952117558529.mp4output_000006.jpg_229_637_396_650_391_712_224_698.jpg
label,0.804841

Info,Unnamed: 1
mean,234.394
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000005.jpg_166_644_220_629_225_645_171_660.jpg
label,0.887496

Info,Unnamed: 1
mean,233.5309
filename,work_dir/crops/framestmpvideos6876156631651077381.mp4output_000004.jpg_78_206_312_206_312_221_78_221.jpg
label,0.882049

Info,Unnamed: 1
mean,233.4604
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000002.jpg_330_503_363_503_363_521_330_521.jpg
label,0.660055

Info,Unnamed: 1
mean,233.2266
filename,work_dir/crops/framestmpvideos6876156631651077381.mp4output_000003.jpg_78_206_311_206_311_221_78_221.jpg
label,0.888761

Info,Unnamed: 1
mean,233.2228
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000005.jpg_160_596_210_583_213_597_163_610.jpg
label,0.709151

Info,Unnamed: 1
mean,233.1428
filename,work_dir/crops/framestmpvideos6876156631651077381.mp4output_000002.jpg_78_206_311_206_311_221_78_221.jpg
label,0.914209

Info,Unnamed: 1
mean,233.1201
filename,work_dir/crops/framestmpvideos6876156631651077381.mp4output_000005.jpg_78_206_311_206_311_221_78_221.jpg
label,0.911282

Info,Unnamed: 1
mean,233.0334
filename,work_dir/crops/framestmpvideos6876156631651077381.mp4output_000001.jpg_78_206_311_206_311_221_78_221.jpg
label,0.937742

Info,Unnamed: 1
mean,232.7812
filename,work_dir/crops/framestmpvideos6876643374548438274.mp4output_000002.jpg_15_64_174_65_174_79_15_78.jpg
label,0.829832

Info,Unnamed: 1
mean,232.7233
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000002.jpg_281_537_359_532_360_550_282_555.jpg
label,0.873083

Info,Unnamed: 1
mean,232.0589
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000005.jpg_217_656_235_656_235_669_217_669.jpg
label,0.677347

Info,Unnamed: 1
mean,231.8961
filename,work_dir/crops/framestmpvideos6876787355181665537.mp4output_000005.jpg_197_611_218_608_220_620_199_623.jpg
label,0.763627

Info,Unnamed: 1
mean,230.7012
filename,work_dir/crops/framestmpvideos6876857501397159170.mp4output_000001.jpg_67_175_238_175_238_188_67_188.jpg
label,0.90828

Info,Unnamed: 1
mean,229.6535
filename,work_dir/crops/framestmpvideos6877774951361875201.mp4output_000001.jpg_61_25_148_25_148_54_61_54.jpg
label,0.89313

Info,Unnamed: 1
mean,229.6462
filename,work_dir/crops/framestmpvideos6876369534018669825.mp4output_000001.jpg_14_67_146_71_146_89_14_85.jpg
label,0.804711


0

## Detection Clusters

In [29]:
fd.vis.component_gallery()

0.912512


  0%|          | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py", line 699, in load_one_image
    assert img is not None, f"Failed to read image {f} {input_dir} {kwargs}"
AssertionError: Failed to read image work_dir/crops/framestmpvideos6877337975030598917.mp4output_000005.jpg_166_533_195_533_195_541_166_541.jpg work_dir {'save_artifacts': False, 'id_to_filename_func': <function FastdupVisualizer._get_filneme_func.<locals>.get_filename_func at 0x7fe316dd1b40>, 'jupyter_html': True, 'load_crops': True, 'draw_bbox': False, 'sort_by': 'comp_size', 'lazy_load': False, 'run_hierarchical

Failed to read image from img_path work_dir/crops/framestmpvideos6877337975030598917.mp4output_000005.jpg_166_533_195_533_195_541_166_541.jpg
Failed to read image from img_path work_dir/crops/framestmpvideos6877036154982616321.mp4output_000002.jpg_197_420_231_420_231_427_197_427.jpg
Finished OK. Components are stored as image files work_dir/galleries/components_[index].jpg
Stored components visual view in  work_dir/galleries/components.html
Execution time in seconds 0.2





Info,Unnamed: 1
component,16.0
num_images,36.0
mean_distance,0.9601

Label,Unnamed: 1
0.566012,1
0.716122,1
0.962239,1
0.964565,1
0.964638,1
0.965408,1
0.967614,1
0.970122,1
0.972217,1
0.97659,1

Info,Unnamed: 1
component,3.0
num_images,18.0
mean_distance,0.9601

Label,Unnamed: 1
0.582775,1
0.776851,1
0.968139,1
0.967538,1
0.9673,1
0.963748,1
0.955813,1
0.952471,1
0.943044,1
0.940816,1

Info,Unnamed: 1
component,557.0
num_images,12.0
mean_distance,0.9615

Label,Unnamed: 1
0.51632,1
0.568036,1
0.576053,1
0.658896,1
0.872467,1
0.905367,1
0.910863,1
0.932909,1
0.934804,1
0.938696,1

Info,Unnamed: 1
component,491.0
num_images,10.0
mean_distance,0.9865

Label,Unnamed: 1
0.694068,1
0.90516,1
0.921454,1
0.943429,1
0.944783,1
0.944897,1
0.958016,1
0.969519,1
0.974236,1
0.976503,1

Info,Unnamed: 1
component,185.0
num_images,10.0
mean_distance,0.9604

Label,Unnamed: 1
0.898507,1
0.899538,1
0.905345,1
0.905881,1
0.917017,1
0.921527,1
0.940975,1
0.984398,1
0.986932,1
0.987669,1

Info,Unnamed: 1
component,29.0
num_images,9.0
mean_distance,0.9614

Label,Unnamed: 1
0.897021,1
0.915021,1
0.917719,1
0.918148,1
0.935715,1
0.940089,1
0.951066,1
0.980969,1
0.982242,1

Info,Unnamed: 1
component,106.0
num_images,8.0
mean_distance,0.9633

Label,Unnamed: 1
0.89313,1
0.922129,1
0.930321,1
0.951415,1
0.953237,1
0.957237,1
0.993251,1
0.996365,1

Info,Unnamed: 1
component,62.0
num_images,8.0
mean_distance,0.9635

Label,Unnamed: 1
0.639288,1
0.910821,1
0.915858,1
0.918426,1
0.93316,1
0.934579,1
0.949467,1
0.968144,1

Info,Unnamed: 1
component,490.0
num_images,8.0
mean_distance,0.9808

Label,Unnamed: 1
0.872586,1
0.905988,1
0.929356,1
0.930516,1
0.946273,1
0.950314,1
0.950973,1
0.957764,1

Info,Unnamed: 1
component,455.0
num_images,8.0
mean_distance,0.9603

Label,Unnamed: 1
0.7064,1
0.90828,1
0.927472,1
0.931385,1
0.937934,1
0.953506,1
0.982314,1
0.993107,1

Info,Unnamed: 1
component,27.0
num_images,7.0
mean_distance,0.9668

Label,Unnamed: 1
0.903862,1
0.910805,1
0.912512,1
0.912723,1
0.916848,1
0.930749,1
0.960439,1

Info,Unnamed: 1
component,620.0
num_images,7.0
mean_distance,0.9652

Label,Unnamed: 1
0.89443,1
0.911825,1
0.913297,1
0.917963,1
0.955546,1
0.960902,1
0.97509,1

Info,Unnamed: 1
component,119.0
num_images,7.0
mean_distance,0.9606

Label,Unnamed: 1
0.882049,1
0.888761,1
0.911282,1
0.914209,1
0.937742,1
0.950713,1
0.965892,1

Info,Unnamed: 1
component,26.0
num_images,7.0
mean_distance,0.9764

Label,Unnamed: 1
0.897349,1
0.901716,1
0.930532,1
0.93272,1
0.932828,1
0.941728,1
0.94206,1

Info,Unnamed: 1
component,492.0
num_images,6.0
mean_distance,0.9629

Label,Unnamed: 1
0.876738,1
0.906598,1
0.936391,1
0.949162,1
0.951972,1
0.959641,1

Info,Unnamed: 1
component,619.0
num_images,6.0
mean_distance,0.9776

Label,Unnamed: 1
0.889022,1
0.893933,1
0.90888,1
0.928806,1
0.937852,1
0.950581,1

Info,Unnamed: 1
component,148.0
num_images,6.0
mean_distance,0.9956

Label,Unnamed: 1
0.940358,1
0.942999,1
0.943252,1
0.945848,1
0.945968,1
0.967571,1

Info,Unnamed: 1
component,621.0
num_images,6.0
mean_distance,0.9836

Label,Unnamed: 1
0.921039,1
0.95587,1
0.966351,1
0.97009,1
0.970162,1
0.97053,1

Info,Unnamed: 1
component,623.0
num_images,6.0
mean_distance,0.9788

Label,Unnamed: 1
0.987933,1
0.988892,1
0.992727,1
0.993201,1
0.995423,1
0.997629,1

Info,Unnamed: 1
component,150.0
num_images,6.0
mean_distance,0.9906

Label,Unnamed: 1
0.958481,1
0.967046,1
0.970228,1
0.97752,1
0.979932,1
0.987422,1


0
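
The component gallery above still finishes, but it reports two crops that could not be read. A minimal sketch for checking whether reported paths exist on disk, relative to the directory the notebook runs in, is shown below. `split_by_existence` is a hypothetical helper, not part of fastdup, and it only tests for file presence, not whether the image decodes.

```python
from pathlib import Path


def split_by_existence(paths):
    """Partition paths into (present, missing) lists based on the filesystem."""
    present, missing = [], []
    for path in paths:
        (present if Path(path).is_file() else missing).append(path)
    return present, missing


# The two crop paths reported as unreadable in the output above.
reported = [
    "work_dir/crops/framestmpvideos6877337975030598917.mp4output_000005.jpg_166_533_195_533_195_541_166_541.jpg",
    "work_dir/crops/framestmpvideos6877036154982616321.mp4output_000002.jpg_197_420_231_420_231_427_197_427.jpg",
]
present, missing = split_by_existence(reported)
print(f"{len(missing)} of {len(reported)} reported crops are missing on disk")
```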

## Similar Detections

In [30]:
fd.vis.similarity_gallery()

100%|██████████| 20/20 [00:00<00:00, 371.96it/s]


Stored similar images visual view in  work_dir/galleries/similarity.html


Info From,Unnamed: 1
label,0.732384
from,/crops/tmpvideos6876674985746795778.mp4output_000004.jpg_18_24_154_21_155_55_18_58.jpg

Info To,Unnamed: 1,Unnamed: 2
0.925805,/crops/tmpvideos6876760427229957377.mp4output_000002.jpg_427_944_561_942_561_976_427_978.jpg,0.901466
0.900407,/crops/tmpvideos6875872124968439046.mp4output_000003.jpg_470_946_560_948_560_975_469_973.jpg,0.937073

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.757851
from,/crops/tmpvideos6875872124968439046.mp4output_000006.jpg_428_946_560_944_560_974_428_976.jpg

Info To,Unnamed: 1,Unnamed: 2
0.903237,/crops/tmpvideos6875872124968439046.mp4output_000003.jpg_470_946_560_948_560_975_469_973.jpg,0.937073
0.900435,/crops/tmpvideos6875872124968439046.mp4output_000004.jpg_470_948_560_948_560_975_470_975.jpg,0.957974

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.925193
from,/crops/tmpvideos6875378565614013697.mp4output_000001.jpg_14_65_139_67_138_89_14_86.jpg

Info To,Unnamed: 1,Unnamed: 2
0.901463,/crops/tmpvideos6876827705950752005.mp4output_000001.jpg_14_65_156_65_156_86_14_86.jpg,0.982604
0.900878,/crops/tmpvideos6876674985746795778.mp4output_000007.jpg_456_987_564_986_564_1005_456_1006.jpg,0.981889

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.909594
from,/crops/tmpvideos6875453919879908614.mp4output_000002.jpg_162_662_381_662_381_719_162_719.jpg

Info To,Unnamed: 1,Unnamed: 2
0.963362,/crops/tmpvideos6875453919879908614.mp4output_000001.jpg_162_665_381_665_381_721_162_721.jpg,0.973569
0.900944,/crops/tmpvideos6875453919879908614.mp4output_000002.jpg_161_854_380_854_380_911_161_911.jpg,0.986378

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.929418
from,/crops/tmpvideos6876716527186382081.mp4output_000002.jpg_456_987_563_987_563_1008_456_1008.jpg

Info To,Unnamed: 1,Unnamed: 2
0.906595,/crops/tmpvideos6877747486451043590.mp4output_000001.jpg_11_67_120_69_119_89_10_86.jpg,0.911825
0.901163,/crops/tmpvideos6875639469563759873.mp4output_000002.jpg_454_987_562_987_562_1008_454_1008.jpg,0.878065

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.802544
from,/crops/tmpvideos6877411003160694017.mp4output_000002.jpg_429_944_561_942_561_975_429_977.jpg

Info To,Unnamed: 1,Unnamed: 2
0.901313,/crops/tmpvideos6876760427229957377.mp4output_000005.jpg_428_940_562_940_562_977_428_977.jpg,0.883697

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.883697
from,/crops/tmpvideos6876760427229957377.mp4output_000005.jpg_428_940_562_940_562_977_428_977.jpg

Info To,Unnamed: 1,Unnamed: 2
0.917096,/crops/tmpvideos6877178763474423041.mp4output_000002.jpg_467_945_560_949_559_975_466_971.jpg,0.946273
0.901313,/crops/tmpvideos6877411003160694017.mp4output_000002.jpg_429_944_561_942_561_975_429_977.jpg,0.802544

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.964565
from,/crops/tmpvideos6876603307708665093.mp4output_000005.jpg_125_746_450_746_450_767_125_767.jpg

Info To,Unnamed: 1,Unnamed: 2
0.903434,/crops/tmpvideos6877179236386376961.mp4output_000010.jpg_388_985_562_989_562_1010_387_1007.jpg,0.975002
0.901328,/crops/tmpvideos6875317312082201857.mp4output_000001.jpg_13_67_202_69_202_88_13_85.jpg,0.918337

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.959249
from,/crops/tmpvideos6878259877353966850.mp4output_000001.jpg_12_62_100_66_99_83_12_79.jpg

Info To,Unnamed: 1,Unnamed: 2
0.902454,/crops/tmpvideos6876262384093236485.mp4output_000001.jpg_11_62_133_65_133_83_11_80.jpg,0.951905
0.901708,/crops/tmpvideos6876674985746795778.mp4output_000007.jpg_456_987_564_986_564_1005_456_1006.jpg,0.981889

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.839425
from,/crops/tmpvideos6875405441472498949.mp4output_000006.jpg_428_944_561_942_561_975_428_977.jpg

Info To,Unnamed: 1,Unnamed: 2
0.916229,/crops/tmpvideos6875405441472498949.mp4output_000007.jpg_458_944_560_946_560_976_457_974.jpg,0.824197
0.902018,/crops/tmpvideos6876369534018669825.mp4output_000006.jpg_434_943_561_943_561_976_434_976.jpg,0.804691

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.647072
from,/crops/tmpvideos6877251331145370881.mp4output_000002.jpg_38_16_60_38_34_65_12_43.jpg

Info To,Unnamed: 1,Unnamed: 2
0.902097,/crops/tmpvideos6876145412105899265.mp4output_000001.jpg_31_16_57_29_41_62_15_48.jpg,0.536829

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.536829
from,/crops/tmpvideos6876145412105899265.mp4output_000001.jpg_31_16_57_29_41_62_15_48.jpg

Info To,Unnamed: 1,Unnamed: 2
0.90959,/crops/tmpvideos6876369534018669825.mp4output_000005.jpg_438_941_461_949_453_977_429_969.jpg,0.830409
0.902097,/crops/tmpvideos6877251331145370881.mp4output_000002.jpg_38_16_60_38_34_65_12_43.jpg,0.647072

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.972871
from,/crops/tmpvideos6876369534018669825.mp4output_000002.jpg_342_752_442_719_447_732_346_766.jpg

Info To,Unnamed: 1,Unnamed: 2
0.90236,/crops/tmpvideos6876369534018669825.mp4output_000002.jpg_130_703_319_656_322_670_133_717.jpg,0.931903

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.931903
from,/crops/tmpvideos6876369534018669825.mp4output_000002.jpg_130_703_319_656_322_670_133_717.jpg

Info To,Unnamed: 1,Unnamed: 2
0.90236,/crops/tmpvideos6876369534018669825.mp4output_000002.jpg_342_752_442_719_447_732_346_766.jpg,0.972871

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.908666
from,/crops/tmpvideos6875621663564680450.mp4output_000003.jpg_242_200_315_204_313_234_240_230.jpg

Info To,Unnamed: 1,Unnamed: 2
0.902605,/crops/tmpvideos6877869447999245569.mp4output_000001.jpg_216_171_324_171_324_195_216_195.jpg,0.830085

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.513795
from,/crops/tmpvideos6876835225989664002.mp4output_000002.jpg_402_903_418_887_430_899_414_915.jpg

Info To,Unnamed: 1,Unnamed: 2
0.912662,/crops/tmpvideos6877179236386376961.mp4output_000005.jpg_445_946_460_958_443_978_429_966.jpg,0.624278
0.90278,/crops/tmpvideos6877167604100910337.mp4output_000002.jpg_38_26_52_39_34_59_20_45.jpg,0.618125

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.618125
from,/crops/tmpvideos6877167604100910337.mp4output_000002.jpg_38_26_52_39_34_59_20_45.jpg

Info To,Unnamed: 1,Unnamed: 2
0.955131,/crops/tmpvideos6877179236386376961.mp4output_000005.jpg_445_946_460_958_443_978_429_966.jpg,0.624278
0.90278,/crops/tmpvideos6876835225989664002.mp4output_000002.jpg_402_903_418_887_430_899_414_915.jpg,0.513795

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.962116
from,/crops/tmpvideos6876603307708665093.mp4output_000001.jpg_10_66_116_64_117_85_11_88.jpg

Info To,Unnamed: 1,Unnamed: 2
0.902827,/crops/tmpvideos6877747486451043590.mp4output_000007.jpg_456_987_563_989_563_1008_455_1006.jpg,0.943226

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.974686
from,/crops/tmpvideos6876760427229957377.mp4output_000006.jpg_470_946_560_948_560_976_469_974.jpg

Info To,Unnamed: 1,Unnamed: 2
0.916783,/crops/tmpvideos6876512902929599745.mp4output_000001.jpg_57_25_142_25_142_50_57_50.jpg,0.966201
0.902836,/crops/tmpvideos6875691073499352321.mp4output_000002.jpg_470_946_559_948_559_975_469_973.jpg,0.912156

0
Query Image

0
Similar

Info From,Unnamed: 1
label,0.940584
from,/crops/tmpvideos6875528457388903681.mp4output_000004.jpg_915_496_1006_499_1005_529_914_526.jpg

Info To,Unnamed: 1,Unnamed: 2
0.905662,/crops/tmpvideos6876374255974501633.mp4output_000001.jpg_50_22_127_22_127_46_50_46.jpg,0.942806
0.902972,/crops/tmpvideos6875651291343883522.mp4output_000003.jpg_465_945_560_945_560_976_465_976.jpg,0.963449

0
Query Image

0
Similar
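The crop paths in the gallery output above end with eight underscore-separated integers. A minimal sketch for pulling them apart, assuming (this is not confirmed by the notebook) that they are the four `(x, y)` corners of the OCR quadrilateral, and that the part before them is the source frame name. `parse_crop_name` is a hypothetical helper, not part of fastdup.

```python
import re

# Assumption: a crop name is "<frame>.jpg_<8 integers>.jpg", and the eight
# integers are read here as four (x, y) corner points of the text box.
CROP_RE = re.compile(r"^(?P<frame>.+?\.jpg)_(?P<coords>\d+(?:_\d+){7})\.jpg$")


def parse_crop_name(path):
    """Split a crop path into its source frame name and four corner points."""
    name = path.rsplit("/", 1)[-1]  # drop any directory prefix
    match = CROP_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected crop name: {name}")
    values = [int(v) for v in match.group("coords").split("_")]
    # Pair up consecutive values: (x1, y1), (x2, y2), ...
    corners = list(zip(values[0::2], values[1::2]))
    return match.group("frame"), corners


frame, corners = parse_crop_name(
    "/crops/tmpvideos6875405441472498949.mp4output_000006.jpg"
    "_428_944_561_942_561_975_428_977.jpg"
)
print(frame)
print(corners)
```

Grouping crops by the returned frame name is one way to see how many text regions were found per frame.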


Unnamed: 0,from,to,label,label2,distance
378,work_dir/crops/framestmpvideos6876674985746795778.mp4output_000004.jpg_18_24_154_21_155_55_18_58.jpg,"[work_dir/crops/framestmpvideos6875872124968439046.mp4output_000003.jpg_470_946_560_948_560_975_469_973.jpg, work_dir/crops/framestmpvideos6876760427229957377.mp4output_000002.jpg_427_944_561_942_561_976_427_978.jpg]","[0.732384, 0.732384]","[0.937073, 0.901466]","[0.900407, 0.925805]"
182,work_dir/crops/framestmpvideos6875872124968439046.mp4output_000006.jpg_428_946_560_944_560_974_428_976.jpg,"[work_dir/crops/framestmpvideos6875872124968439046.mp4output_000004.jpg_470_948_560_948_560_975_470_975.jpg, work_dir/crops/framestmpvideos6875872124968439046.mp4output_000003.jpg_470_946_560_948_560_975_469_973.jpg]","[0.757851, 0.757851]","[0.957974, 0.937073]","[0.900435, 0.903237]"
22,work_dir/crops/framestmpvideos6875378565614013697.mp4output_000001.jpg_14_65_139_67_138_89_14_86.jpg,"[work_dir/crops/framestmpvideos6876674985746795778.mp4output_000007.jpg_456_987_564_986_564_1005_456_1006.jpg, work_dir/crops/framestmpvideos6876827705950752005.mp4output_000001.jpg_14_65_156_65_156_86_14_86.jpg]","[0.925193, 0.925193]","[0.981889, 0.982604]","[0.900878, 0.901463]"
91,work_dir/crops/framestmpvideos6875453919879908614.mp4output_000002.jpg_162_662_381_662_381_719_162_719.jpg,"[work_dir/crops/framestmpvideos6875453919879908614.mp4output_000002.jpg_161_854_380_854_380_911_161_911.jpg, work_dir/crops/framestmpvideos6875453919879908614.mp4output_000001.jpg_162_665_381_665_381_721_162_721.jpg]","[0.909594, 0.909594]","[0.986378, 0.973569]","[0.900944, 0.963362]"
402,work_dir/crops/framestmpvideos6876716527186382081.mp4output_000002.jpg_456_987_563_987_563_1008_456_1008.jpg,"[work_dir/crops/framestmpvideos6875639469563759873.mp4output_000002.jpg_454_987_562_987_562_1008_454_1008.jpg, work_dir/crops/framestmpvideos6877747486451043590.mp4output_000001.jpg_11_67_120_69_119_89_10_86.jpg]","[0.929418, 0.929418]","[0.878065, 0.911825]","[0.901163, 0.906595]"
...,...,...,...,...,...
707,work_dir/crops/framestmpvideos6877747486451043590.mp4output_000002.jpg_72_122_320_122_320_135_72_135.jpg,"[work_dir/crops/framestmpvideos6877747486451043590.mp4output_000003.jpg_72_122_320_122_320_135_72_135.jpg, work_dir/crops/framestmpvideos6877747486451043590.mp4output_000005.jpg_72_122_320_122_320_135_72_135.jpg]","[0.970586, 0.970586]","[0.970162, 0.966351]","[0.997698, 0.99886]"
162,work_dir/crops/framestmpvideos6875872124968439046.mp4output_000002.jpg_78_604_308_605_308_623_78_622.jpg,"[work_dir/crops/framestmpvideos6875872124968439046.mp4output_000005.jpg_78_604_308_605_308_623_78_622.jpg, work_dir/crops/framestmpvideos6875872124968439046.mp4output_000001.jpg_78_604_308_605_308_623_78_622.jpg]","[0.945968, 0.945968]","[0.943252, 0.940358]","[0.997702, 0.997718]"
179,work_dir/crops/framestmpvideos6875872124968439046.mp4output_000005.jpg_78_604_308_605_308_623_78_622.jpg,"[work_dir/crops/framestmpvideos6875872124968439046.mp4output_000002.jpg_78_604_308_605_308_623_78_622.jpg, work_dir/crops/framestmpvideos6875872124968439046.mp4output_000003.jpg_78_604_308_605_308_623_78_622.jpg]","[0.943252, 0.943252]","[0.945968, 0.942999]","[0.997702, 0.997721]"
167,work_dir/crops/framestmpvideos6875872124968439046.mp4output_000003.jpg_78_604_308_605_308_623_78_622.jpg,"[work_dir/crops/framestmpvideos6875872124968439046.mp4output_000005.jpg_78_604_308_605_308_623_78_622.jpg, work_dir/crops/framestmpvideos6875872124968439046.mp4output_000004.jpg_78_604_308_605_308_623_78_622.jpg]","[0.942999, 0.942999]","[0.943252, 0.967571]","[0.997721, 0.999333]"
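Each row of the table above holds a `from` crop and parallel lists in `to`, `label`, `label2`, and `distance`. A minimal sketch, in plain Python, of flattening such rows into one record per crop pair. It assumes the rows are available as dictionaries with those column names; `flatten_neighbors` is a hypothetical helper, and the sample row is copied from the first line of the table.

```python
def flatten_neighbors(rows):
    """Turn rows with parallel list columns into one dict per (from, to) pair."""
    pairs = []
    for row in rows:
        # The list columns are assumed to have equal length within a row.
        for to, label, label2, distance in zip(
            row["to"], row["label"], row["label2"], row["distance"]
        ):
            pairs.append(
                {
                    "from": row["from"],
                    "to": to,
                    "label": label,
                    "label2": label2,
                    "distance": distance,
                }
            )
    return pairs


rows = [
    {
        "from": "work_dir/crops/framestmpvideos6876674985746795778.mp4output_000004.jpg_18_24_154_21_155_55_18_58.jpg",
        "to": [
            "work_dir/crops/framestmpvideos6875872124968439046.mp4output_000003.jpg_470_946_560_948_560_975_469_973.jpg",
            "work_dir/crops/framestmpvideos6876760427229957377.mp4output_000002.jpg_427_944_561_942_561_976_427_978.jpg",
        ],
        "label": [0.732384, 0.732384],
        "label2": [0.937073, 0.901466],
        "distance": [0.900407, 0.925805],
    }
]

pairs = flatten_neighbors(rows)
# Largest value first; this treats `distance` as a similarity score (assumption).
pairs.sort(key=lambda pair: pair["distance"], reverse=True)
for pair in pairs:
    print(pair["distance"], pair["to"])
```

If the table is already a pandas DataFrame, `DataFrame.explode` with a list of column names (pandas 1.3 or later) gives a similar one-row-per-pair layout.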


## Wrap Up

In this notebook, we showed how to run OCR with fastdup and analyze the resulting bounding boxes for issues.

Next, feel free to check out other tutorials:

+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!
+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.
+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze it for potential issues. If you have a labeled dataset in an ImageNet-style folder structure, have a go!
+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze them for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try.


# VL Profiler
If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com). VL Profiler is our first no-code commercial product, and it lets you visualize and inspect your dataset in your browser.

[Sign up](https://app.visual-layer.com) now. It's free.

[![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/vl_profiler_promo.svg)](https://app.visual-layer.com)

As usual, feedback is welcome! 

Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues).