{ "cells": [ { "cell_type": "markdown", "id": "221f5c62-51ac-447e-91b0-c42c7b602af9", "metadata": {}, "source": [ "
\n", " \n", " \n", " \n", " \n", " \"vl\n", " \n", " \n", "
\n", "
\n", " \n", " \"Logo\"\n", " \n", " \n", " \"Logo\"\n", " \n", " \n", " \"Logo\"\n", " \n", " \n", " \"Logo\"\n", " \n", " \n", " \"Logo\"\n", " \n", "
" ] }, { "cell_type": "markdown", "id": "7aad46c3-7e0a-463f-9064-0b5751501039", "metadata": {}, "source": [ "# Run fastdup with TIMM Embeddings\n", "\n", "[![Open in Colab](https://img.shields.io/badge/Open%20in%20Colab-blue?style=for-the-badge&logo=&labelColor=gray)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/embeddings-timm.ipynb)\n", "[![Kaggle](https://img.shields.io/badge/Open%20in%20Kaggle-blue?style=for-the-badge&logo=&labelColor=gray)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/embeddings-timm.ipynb)\n", "[![Explore the Docs](https://img.shields.io/badge/Explore%20the%20Docs-blue?style=for-the-badge&labelColor=gray&logo=read-the-docs)](https://visual-layer.readme.io/docs/embeddings-timm)" ] }, { "cell_type": "markdown", "id": "bae6d61b-3beb-46ad-b53a-895e78d3cf5f", "metadata": {}, "source": [ "In this notebook we show an end-to-end example on how you can pre-compute embeddings using any models from TIMM run fastdup on top of the embeddings to surface dataset issues." ] }, { "cell_type": "markdown", "id": "55b99f27-269c-49d6-8f51-b2af6d2019bb", "metadata": {}, "source": [ "## Installation\n", "\n", "First, let's install the neccessary packages:\n", "\n", "- [fastdup](https://github.com/visual-layer/fastdup) - To analyze issues in the dataset.\n", "- [TIMM (PyTorch Image Models)](https://github.com/huggingface/pytorch-image-models) - To acquire pre-trained models." ] }, { "cell_type": "code", "execution_count": null, "id": "fc42cae3-4659-4060-b781-48e2983411fd", "metadata": {}, "outputs": [], "source": [ "!pip install -Uq fastdup timm" ] }, { "cell_type": "markdown", "id": "e6722adf-0f74-4aae-8e67-76107456a91b", "metadata": {}, "source": [ "Now, test the installation. If there's no error message, we are ready to go." 
] }, { "cell_type": "code", "execution_count": 2, "id": "efc6af00-4688-454d-b84b-05e15c95fb86", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "'1.46'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import fastdup\n", "fastdup.__version__" ] }, { "cell_type": "markdown", "id": "c5ba0892-e860-4f3c-84a9-8c0fc189b77d", "metadata": {}, "source": [ "## Download Dataset\n", "\n", "In this notebook, we will use the [Price Match Guarantee Dataset](https://www.kaggle.com/competitions/shopee-product-matching/) by Shopee, hosted on Kaggle. \n", "The dataset consists of images from users who sell products on the Shopee online platform.\n", "\n", "Download the dataset [here](https://www.kaggle.com/competitions/shopee-product-matching/data), unzip, and place it in the current directory.\n", "\n", "Here's a snapshot showing some of the images from the dataset.\n", "![img](https://files.readme.io/09f6849-download.png)" ] }, { "cell_type": "markdown", "id": "91910747-6be2-4283-959e-c931e45f1f2c", "metadata": {}, "source": [ "## List TIMM Models\n", "There are currently 1212 computer vision models on TIMM. Pick a model of your choice to compute the embeddings with.\n", "\n", "For demonstration, we will go with a relatively new model, `vit_small_patch14_dinov2.lvd142m` from Meta AI. \n", "\n", "Let's list the models that match the keyword `dino`." 
] }, { "cell_type": "code", "execution_count": 3, "id": "dd166b91-38a1-4dfe-9e9e-590e3f550242", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "['resmlp_12_224.fb_dino',\n", " 'resmlp_24_224.fb_dino',\n", " 'vit_base_patch8_224.dino',\n", " 'vit_base_patch14_dinov2.lvd142m',\n", " 'vit_base_patch16_224.dino',\n", " 'vit_giant_patch14_dinov2.lvd142m',\n", " 'vit_large_patch14_dinov2.lvd142m',\n", " 'vit_small_patch8_224.dino',\n", " 'vit_small_patch14_dinov2.lvd142m',\n", " 'vit_small_patch16_224.dino']" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import timm\n", "timm.list_models(\"*dino*\", pretrained=True)" ] }, { "cell_type": "markdown", "id": "633dce0c-47eb-4039-8cd4-a36874c49b8a", "metadata": {}, "source": [ "DINOv2 models produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning. Read more about DINOv2 [here](https://github.com/facebookresearch/dinov2).\n", "\n", "It makes sense for us to use DINOv2 as a model to create an embedding of the dataset." ] }, { "cell_type": "markdown", "id": "9a1e56b1-d8cd-4457-9b2b-83c1aa3ccaaf", "metadata": {}, "source": [ "## Compute Embeddings using TIMM\n", "\n", "Loading TIMM models in fastdup is seamless with the `TimmEncoder` wrapper class. This ensures all TIMM models can be used in fastdup to compute the embeddings of your dataset. \n", "Under the hood, the wrapper class loads the model from TIMM excluding the final classification layer.\n", "\n", "Next, let's load the DINOv2 model using the `TimmEncoder` wrapper." 
] }, { "cell_type": "code", "execution_count": 5, "id": "e3a9e5f2-a92e-4536-be12-19ccc47a7ca4", "metadata": { "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:fastdup.embeddings.timm:Initializing model - vit_small_patch14_dinov2.lvd142m.\n", "INFO:timm.models._builder:Loading pretrained weights from Hugging Face hub (timm/vit_small_patch14_dinov2.lvd142m)\n", "INFO:timm.models._hub:[timm/vit_small_patch14_dinov2.lvd142m] Safe alternative available for 'pytorch_model.bin' (as 'model.safetensors'). Loading weights using safetensors.\n", "INFO:fastdup.embeddings.timm:Model loaded on device - cuda\n" ] } ], "source": [ "from fastdup.embeddings_timm import TimmEncoder\n", "timm_model = TimmEncoder('vit_small_patch14_dinov2.lvd142m') " ] }, { "cell_type": "markdown", "id": "54dc55ca-25bb-4400-adf7-01666f5569bf", "metadata": {}, "source": [ "Here are the parameters for `TimmEncoder`:\n", "\n", "+ `model_name` (str): The name of the model architecture to use.\n", "+ `num_classes` (int): The number of classes for the model. Use `num_classes=0` to exclude the last layer. Default: `0`.\n", "+ `pretrained` (bool): Whether to load pretrained weights. Default: `True`.\n", "+ `device` (str): Which device to load the model on. Choices: \"cuda\" or \"cpu\". Default: `None`.\n", "+ `torch_compile` (bool): Whether to use `torch.compile` to optimize the model. Default: `False`." ] }, { "cell_type": "markdown", "id": "8a86dbb0-7387-4e2f-86d3-5b7d6c41815d", "metadata": {}, "source": [ "To start computing embeddings, specify the directory where the images are stored." 
] }, { "cell_type": "code", "execution_count": 6, "id": "7477d7e6-71a6-42a0-9a0d-5806072141ed", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e21d1b0c58c3469d93b2e8f9d884cff3", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Computing embeddings: 0%| | 0/32412 [00:000.990), which are 14.70 % of total graph edges\n", "2023-10-16 11:43:52 [INFO] Found a total of 5040 nearly identical images(d>0.980), which are 7.77 % of total graph edges\n", "2023-10-16 11:43:52 [INFO] Found a total of 27522 above threshold images (d>0.900), which are 42.46 % of total graph edges\n", "2023-10-16 11:43:52 [INFO] Found a total of 3241 outlier images (d<0.050), which are 5.00 % of total graph edges\n", "2023-10-16 11:43:52 [INFO] Min distance found 0.105 max distance 1.000\n", "2023-10-16 11:43:52 [INFO] Running connected components for ccthreshold 0.960000 \n", ".0\n", " ########################################################################################\n", "\n", "Dataset Analysis Summary: \n", "\n", " Dataset contains 32412 images\n", " Valid images are 100.00% (32,412) of the data, invalid are 0.00% (0) of the data\n", " Similarity: 41.05% (13,304) belong to 29 similarity clusters (components).\n", " 58.95% (19,108) images do not belong to any similarity cluster.\n", " Largest cluster has 86 (0.27%) images.\n", " For a detailed analysis, use `.connected_components()`\n", "(similarity threshold used is 0.9, connected component threshold used is 0.96).\n", "\n", " Outliers: 7.43% (2,409) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n", " For a detailed list of outliers, use `.outliers()`.\n", "\n", "########################################################################################\n", "Would you like to see awesome visualizations for some of the most popular academic datasets?\n", "Click here to see and learn more: 
https://app.visual-layer.com/vl-datasets?utm_source=fastdup\n", "########################################################################################\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"/home/dnth/anaconda3/envs/fastdup-comb/lib/python3.10/site-packages/fastdup/fastdup_controller.py\", line 669, in summary\n", " stats_df = self.img_stats()\n", " File \"/home/dnth/anaconda3/envs/fastdup-comb/lib/python3.10/site-packages/fastdup/fastdup_controller.py\", line 390, in img_stats\n", " assert df is not None, f'No stats file found in {self._work_dir}'\n", "AssertionError: No stats file found in work_dir\n" ] }, { "data": { "text/plain": [ "0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = fastdup.create(input_dir=timm_model.img_folder)\n", "fd.run(annotations=timm_model.file_paths, embeddings=timm_model.embeddings)" ] }, { "cell_type": "markdown", "id": "3caa9e1a-5cb5-47d3-baa0-948d879b78b3", "metadata": {}, "source": [ "## Visualize\n", "\n", "You can use all of fastdup gallery methods to view duplicates, clusters, etc.\n", "\n", "Let's view the image clusters." ] }, { "cell_type": "code", "execution_count": 8, "id": "0dbea899-8560-4e0d-9c25-7ba882dd06e0", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "753106823d2d45e5a8e8a9b6e50695d4", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Components Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Components Report

Showing groups of similar images

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 146
num_images: 22
mean_distance: 0.9625
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 495
num_images: 20
mean_distance: 0.9626
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 1349
num_images: 20
mean_distance: 0.96
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 258
num_images: 18
mean_distance: 0.9626
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 486
num_images: 18
mean_distance: 0.9675
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 2125
num_images: 16
mean_distance: 0.9712
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 2534
num_images: 16
mean_distance: 0.96
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 2753
num_images: 16
mean_distance: 0.9802
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 626
num_images: 15
mean_distance: 0.9737
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 1724
num_images: 15
mean_distance: 0.9607
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 3364
num_images: 14
mean_distance: 0.9687
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 645
num_images: 13
mean_distance: 0.9701
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 3866
num_images: 13
mean_distance: 0.988
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 1750
num_images: 13
mean_distance: 0.9831
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 2704
num_images: 13
mean_distance: 0.9756
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 4928
num_images: 12
mean_distance: 0.9867
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 2235
num_images: 12
mean_distance: 0.9943
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 713
num_images: 12
mean_distance: 0.9608
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 3840
num_images: 12
mean_distance: 0.9751
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
component: 6185
num_images: 11
mean_distance: 0.9836
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.component_gallery()" ] }, { "cell_type": "markdown", "id": "c9c58233-e12a-4608-bcac-b311a98eedd4", "metadata": {}, "source": [ "And duplicates gallery." ] }, { "cell_type": "code", "execution_count": 9, "id": "06f822db-959a-4d37-8fdc-508233179ddf", "metadata": { "tags": [] }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "981a4e66248c4b788b46759dec72f852", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Duplicates Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Duplicates Report

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 1.0
From: /media/dnth/Active-Projects/fastdup/examples//e76626814516c662e78fcf71858379f3.jpg
To: /media/dnth/Active-Projects/fastdup/examples//f4da907524bf9cb6b4f2588ba2134af3.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 1.0
From: /media/dnth/Active-Projects/fastdup/examples//b78206278f2ca82501b83ced2c77c844.jpg
To: /media/dnth/Active-Projects/fastdup/examples//eb66d39daaf7ef264d427e1d0e670eff.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 1.0
From: /media/dnth/Active-Projects/fastdup/examples//8f8f3bb971e994cd5cd525b3a2571081.jpg
To: /media/dnth/Active-Projects/fastdup/examples//6a44bfb5f0d61dd6ac57b495c54f0dc3.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 1.0
From: /media/dnth/Active-Projects/fastdup/examples//d95f57c80178d86166a25c7b8679f9ba.jpg
To: /media/dnth/Active-Projects/fastdup/examples//4812e9df89c8ac537a50b818a7236033.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 1.0
From: /media/dnth/Active-Projects/fastdup/examples//664d02d07a08338ad0d3eb07456d13d9.jpg
To: /media/dnth/Active-Projects/fastdup/examples//bd59cad80ec6c0339c822fbb45b6a1d1.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 1.0
From: /media/dnth/Active-Projects/fastdup/examples//84e8eedb8f7a783bbf78d547bea1fcdb.jpg
To: /media/dnth/Active-Projects/fastdup/examples//14d84ed1a6c78da17f560e16073cff24.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.999999
From: /media/dnth/Active-Projects/fastdup/examples//28cd408859e1a811c6cae6fb56fcc434.jpg
To: /media/dnth/Active-Projects/fastdup/examples//7d109a5c2d9caa09e6bc71c35f9ac830.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.999999
From: /media/dnth/Active-Projects/fastdup/examples//9cd6ae6159fe2e0d5160bc06bab05653.jpg
To: /media/dnth/Active-Projects/fastdup/examples//fa54691b63af4d44432f99987730e2ea.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.999999
From: /media/dnth/Active-Projects/fastdup/examples//eed4675990746a72c574da3393dfec45.jpg
To: /media/dnth/Active-Projects/fastdup/examples//733dee44474dfc4aa8304d58312e7893.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.999999
From: /media/dnth/Active-Projects/fastdup/examples//ff13d1b54edbe8cc25ae51f4d4ecd936.jpg
To: /media/dnth/Active-Projects/fastdup/examples//11244cae08d34a160374ed7c3dc19b37.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.999999
From: /media/dnth/Active-Projects/fastdup/examples//e19791e61df217a4828fb6e69f3056d2.jpg
To: /media/dnth/Active-Projects/fastdup/examples//479c8de633316df70873141b44dce545.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.999999
From: /media/dnth/Active-Projects/fastdup/examples//65af7ae970f86e283d8b8e45337d5f17.jpg
To: /media/dnth/Active-Projects/fastdup/examples//38c75058bcbe6777bbf4b0e057f36954.jpg
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance: 0.999999
From: /media/dnth/Active-Projects/fastdup/examples//ccdf0cfc8001617a3b36a1184e0dd5b1.jpg
To: /media/dnth/Active-Projects/fastdup/examples//2eeaeb80df88558ef8b7ab30f164933a.jpg
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.duplicates_gallery()" ] }, { "cell_type": "markdown", "id": "bc8a3ce2", "metadata": {}, "source": [ "## Wrap Up\n", "In this tutorial, we showed how you can compute embeddings on your dataset using TIMM and run fastdup on top of it to surface dataset issues.\n", "\n", "Questions about this tutorial? Reach out to us on our [Slack channel](https://visuallayer.slack.com/)!\n", "\n", "\n", "\n", "Next, feel free to check out other tutorials -\n", "\n", "+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!\n", "+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.\n", "+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze for potential issues. If you have labeled ImageNet-style folder structure, have a go!\n", "+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze for potential issues. 
If you have a COCO-style labeled object detection dataset, give this example a try." ] }, { "cell_type": "markdown", "id": "44acb813-730b-4513-9266-17e0348f8584", "metadata": {}, "source": [ "\n", "## VL Profiler - A faster and easier way to diagnose and visualize dataset issues\n", "\n", "If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com) - VL Profiler is our first no-code commercial product that lets you visualize and inspect your dataset in your browser. \n", "\n", "VL Profiler is free to get started. Upload up to 1,000,000 images for analysis at zero cost!\n", "\n", "[Sign up](https://app.visual-layer.com) now.\n", "\n", "[![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/github_banner_profiler.gif)](https://app.visual-layer.com)\n", "\n", "As usual, feedback is welcome! Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues)." ] }, { "cell_type": "markdown", "id": "75e95b2c-5354-46b6-8f5a-23d3c20e1864", "metadata": {}, "source": [ "
\n", " \n", " \n", " \n", " \n", " \"vl\n", " \n", "
\n", " GitHub •\n", " Join Slack Community •\n", " Discussion Forum \n", "
\n", "\n", "
\n", " Blog •\n", " Documentation •\n", " About Us \n", "
\n", "\n", "
\n", " LinkedIn •\n", " Twitter \n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.13" } }, "nbformat": 4, "nbformat_minor": 5 }