{ "cells": [ { "cell_type": "markdown", "id": "09937637-0401-42a3-a54e-bf20a3256464", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "id": "fbf7bc42-ba7d-498f-9b82-09584215a5db", "metadata": {}, "source": [ "# Analyzing Hugging Face Datasets\n", "\n", "[![Open in Colab](https://img.shields.io/badge/Open%20in%20Colab-blue?style=for-the-badge&logo=&labelColor=gray)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/analyzing-hf-datasets.ipynb)\n", "[![Kaggle](https://img.shields.io/badge/Open%20in%20Kaggle-blue?style=for-the-badge&logo=&labelColor=gray)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/analyzing-hf-datasets.ipynb)\n", "[![Explore the Docs](https://img.shields.io/badge/Explore%20the%20Docs-blue?style=for-the-badge&labelColor=gray&logo=read-the-docs)](https://visual-layer.readme.io/docs/hugging-face-datasets)\n", "\n", "This notebook shows how you can use fastdup to analyze any dataset from [Hugging Face Datasets](https://huggingface.co/docs/datasets/index).\n", "\n", "We will analyze an image classification dataset for:\n", "\n", "+ Duplicates / near-duplicates\n", "+ Outliers\n", "+ Wrong labels" ] }, { "cell_type": "markdown", "id": "34d4d2db", "metadata": {}, "source": [ "## Installation" ] }, { "cell_type": "code", "execution_count": 1, "id": "7176a4bc", "metadata": {}, "outputs": [], "source": [ "!pip install -Uq fastdup datasets" ] }, { "cell_type": "markdown", "id": "4dea523f", "metadata": {}, "source": [ "Now, test the installation. If there's no error message, we are ready to go." 
] }, { "cell_type": "code", "execution_count": 2, "id": "655330c1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/usr/bin/dpkg\n" ] }, { "data": { "text/plain": [ "'1.41'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import fastdup\n", "fastdup.__version__" ] }, { "cell_type": "markdown", "id": "40145087", "metadata": {}, "source": [ "## Load Dataset\n", "\n", "In this example we load the Tiny ImageNet dataset from [Hugging Face Datasets](https://huggingface.co/datasets).\n", "\n", "Tiny ImageNet contains 200 classes of images downsized to 64×64 color images. Each class has 500 training images, 50 validation images, and 50 test images, so the training split used here holds 100,000 images.\n", "\n", "Let's load the training split into a local directory." ] }, { "cell_type": "code", "execution_count": 3, "id": "9fb0fffc-ba54-4b77-beff-e068ae2f7753", "metadata": {}, "outputs": [], "source": [ "from fastdup.datasets import FastdupHFDataset" ] }, { "cell_type": "code", "execution_count": 4, "id": "ae56315c-07d6-4559-9a4d-fdf1772918e7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:No changes in dataset folder: /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images. Skipping image conversion.\n" ] } ], "source": [ "dataset = FastdupHFDataset(\"zh-plus/tiny-imagenet\", split=\"train\")" ] }, { "cell_type": "markdown", "id": "be18cac4", "metadata": {}, "source": [ "We can inspect the `dataset` object." 
] }, { "cell_type": "code", "execution_count": 5, "id": "85ea7e08", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Dataset({\n", " features: ['image', 'label'],\n", " num_rows: 100000\n", "})" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset" ] }, { "cell_type": "code", "execution_count": 6, "id": "3e05ba85", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'image': ,\n", " 'label': 0}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[0]" ] }, { "cell_type": "code", "execution_count": 7, "id": "e1078a54", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[0]['image']" ] }, { "cell_type": "code", "execution_count": 8, "id": "07daca49", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, 
"execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[0]['label']" ] }, { "cell_type": "code", "execution_count": 9, "id": "403cdf68-b3c3-4c64-b96b-4a0e1d86a526", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " filename label\n", "0 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71384.jpg 142\n", "1 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71204.jpg 142\n", "2 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71036.jpg 142\n", "3 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71014.jpg 142\n", "4 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71334.jpg 142\n", "... ... ...\n", "99995 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63864.jpg 127\n", "99996 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63822.jpg 127\n", "99997 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63874.jpg 127\n", "99998 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63824.jpg 127\n", "99999 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63752.jpg 127\n", "\n", "[100000 rows x 2 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.annotations" ] }, { "cell_type": "markdown", "id": "6aac94ea", "metadata": {}, "source": [ "## Run fastdup" ] }, { "cell_type": "code", "execution_count": 10, "id": "8e90af72", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.\n", "FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. 
Danny Bickson.\n", "\n", "\n", " \n", " ad88 88 \n", " d8\" ,d 88 \n", " 88 88 88 \n", "MM88MMM ,adPPYYba, ,adPPYba, MM88MMM ,adPPYb,88 88 88 8b,dPPYba, \n", " 88 \"\" `Y8 I8[ \"\" 88 a8\" `Y88 88 88 88P' \"8a \n", " 88 ,adPPPPP88 `\"Y8ba, 88 8b 88 88 88 88 d8 \n", " 88 88, ,88 aa ]8I 88, \"8a, ,d88 \"8a, ,a88 88b, ,a8\" \n", " 88 `\"8bbdP\"Y8 `\"YbbdP\"' \"Y888 `\"8bbdP\"Y8 `\"YbbdP'Y8 88`YbbdP\"' \n", " 88 \n", " 88 \n", "\n", "\n", "2023-09-19 21:40:45 [INFO] Going to loop over dir /tmp/tmpjb5y9uvf.csv\n", "2023-09-19 21:40:45 [INFO] Found total 100000 images to run on, 100000 train, 0 test, name list 100000, counter 100000 \n", "2023-09-19 21:44:25 [INFO] Found total 100000 images to run onmated: 0 Minutes\n", "Finished histogram 38.760\n", "Finished bucket sort 38.971\n", "2023-09-19 21:44:45 [INFO] 20456) Finished write_index() NN model\n", "2023-09-19 21:44:45 [INFO] Stored nn model index file work_dir/nnf.index\n", "2023-09-19 21:44:56 [INFO] Total time took 251595 ms\n", "2023-09-19 21:44:56 [INFO] Found a total of 40 fully identical images (d>0.990), which are 0.02 % of total graph edges\n", "2023-09-19 21:44:56 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 % of total graph edges\n", "2023-09-19 21:44:56 [INFO] Found a total of 11083 above threshold images (d>0.900), which are 5.54 % of total graph edges\n", "2023-09-19 21:44:56 [INFO] Found a total of 10001 outlier images (d<0.050), which are 5.00 % of total graph edges\n", "2023-09-19 21:44:56 [INFO] Min distance found 0.601 max distance 1.000\n", "2023-09-19 21:44:56 [INFO] Running connected components for ccthreshold 0.960000 \n", ".0\n", " ########################################################################################\n", "\n", "Dataset Analysis Summary: \n", "\n", " Dataset contains 100000 images\n", " Valid images are 100.00% (100,000) of the data, invalid are 0.00% (0) of the data\n", " Similarity: 0.05% (46) belong to 1 similarity clusters (components).\n", " 
99.95% (99,954) images do not belong to any similarity cluster.\n", " Largest cluster has 4 (0.00%) images.\n", " For a detailed analysis, use `.connected_components()`\n", "(similarity threshold used is 0.9, connected component threshold used is 0.96).\n", "\n", " Outliers: 6.34% (6,344) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n", " For a detailed list of outliers, use `.outliers()`.\n", "\n", "########################################################################################\n", "Would you like to see awesome visualizations for some of the most popular academic datasets?\n", "Click here to see and learn more: https://app.visual-layer.com/vl-datasets?utm_source=fastdup\n", "########################################################################################\n" ] }, { "data": { "text/plain": [ "0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = fastdup.create(input_dir=dataset.img_dir)\n", "fd.run(annotations=dataset.annotations)" ] }, { "cell_type": "markdown", "id": "676d9175", "metadata": {}, "source": [ "## Inspect Issues" ] }, { "cell_type": "markdown", "id": "1017106b", "metadata": {}, "source": [ "There are several methods we can use to inspect the issues found:\n", "\n", "```python\n", "fd.vis.duplicates_gallery() # create a visual gallery of duplicates\n", "fd.vis.outliers_gallery() # create a visual gallery of anomalies\n", "fd.vis.component_gallery() # create a visualization of connected components\n", "fd.vis.stats_gallery() # create a visualization of images statistics (e.g. 
blur)\n", "fd.vis.similarity_gallery() # create a gallery of similar images\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "id": "8f558b89", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:100: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n", "/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:100: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "abca93b0350b4db8b9f144bbe2543bb9", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Duplicates Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "Duplicates Report\n", "\n",
"| Distance | From | To | From_Label | To_Label |\n",
"| --- | --- | --- | --- | --- |\n",
"| 1.0 | /199/99657.jpg | /178/89263.jpg | 199 | 178 |\n",
"| 1.0 | /199/99613.jpg | /178/89460.jpg | 199 | 178 |\n",
"| 1.0 | /177/88577.jpg | /199/99767.jpg | 177 | 199 |\n",
"| 1.0 | /190/95277.jpg | /13/6631.jpg | 190 | 13 |\n",
"| 1.0 | /198/99073.jpg | /14/7463.jpg | 198 | 14 |\n",
"| 1.0 | /37/18797.jpg | /35/17643.jpg | 37 | 35 |\n",
"| 1.0 | /8/4204.jpg | /141/70895.jpg | 8 | 141 |\n",
"| 1.0 | /102/51355.jpg | /138/69225.jpg | 102 | 138 |\n",
"| 1.0 | /67/33640.jpg | /125/62558.jpg | 67 | 125 |\n",
"| 1.0 | /180/90258.jpg | /174/87495.jpg | 180 | 174 |\n",
"| 1.0 | /125/62815.jpg | /67/33973.jpg | 125 | 67 |\n",
"| 1.0 | /17/8525.jpg | /16/8111.jpg | 17 | 16 |\n", "
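Every pair in this report has distance 1.0 but two different labels, so at least one label in each pair is suspect. A minimal sketch of that check, assuming the report rows are copied by hand into plain tuples (`pairs`, `conflicting_labels` and `count_label_pairs` are illustrative names, not fastdup API, and the 0.99 cutoff simply mirrors the d>0.990 figure in the run log):

```python
# Rows copied by hand from the duplicates report:
# (distance, from_path, to_path, from_label, to_label).
# Illustrative structure only; this is not the object fastdup returns.
pairs = [
    (1.0, '/199/99657.jpg', '/178/89263.jpg', 199, 178),
    (1.0, '/199/99613.jpg', '/178/89460.jpg', 199, 178),
    (1.0, '/177/88577.jpg', '/199/99767.jpg', 177, 199),
    (1.0, '/190/95277.jpg', '/13/6631.jpg', 190, 13),
    (1.0, '/198/99073.jpg', '/14/7463.jpg', 198, 14),
    (1.0, '/37/18797.jpg', '/35/17643.jpg', 37, 35),
    (1.0, '/8/4204.jpg', '/141/70895.jpg', 8, 141),
    (1.0, '/102/51355.jpg', '/138/69225.jpg', 102, 138),
    (1.0, '/67/33640.jpg', '/125/62558.jpg', 67, 125),
    (1.0, '/180/90258.jpg', '/174/87495.jpg', 180, 174),
    (1.0, '/125/62815.jpg', '/67/33973.jpg', 125, 67),
    (1.0, '/17/8525.jpg', '/16/8111.jpg', 17, 16),
]


def conflicting_labels(rows, min_distance=0.99):
    '''Return pairs scored at or above min_distance whose labels differ.'''
    return [row for row in rows if row[0] >= min_distance and row[3] != row[4]]


def count_label_pairs(rows):
    '''Count conflicts per unordered label pair, for example (67, 125).'''
    counts = {}
    for _, _, _, label_a, label_b in rows:
        key = (min(label_a, label_b), max(label_a, label_b))
        counts[key] = counts.get(key, 0) + 1
    return counts


conflicts = conflicting_labels(pairs)
label_pair_counts = count_label_pairs(conflicts)
for labels, n in sorted(label_pair_counts.items()):
    print(labels, n)
```

On these rows the tally shows labels 178/199 and 67/125 each colliding twice, which suggests reviewing those class pairs first.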
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.duplicates_gallery()" ] }, { "cell_type": "code", "execution_count": 12, "id": "de484e82", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cdef5a592f57462195f2cbb6fccc6822", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Outliers Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "Outliers Report\n", "\n", "Showing image outliers, one per row\n", "\n",
"| Distance | Path | label |\n",
"| --- | --- | --- |\n",
"| 0.600712 | /198/99254.jpg | 198 |\n",
"| 0.639867 | /12/6152.jpg | 12 |\n",
"| 0.642672 | /94/47232.jpg | 94 |\n",
"| 0.654982 | /35/17626.jpg | 35 |\n",
"| 0.663625 | /10/5240.jpg | 10 |\n",
"| 0.665014 | /173/86745.jpg | 173 |\n",
"| 0.666785 | /197/98818.jpg | 197 |\n",
"| 0.668334 | /54/27267.jpg | 54 |\n",
"| 0.668349 | /78/39235.jpg | 78 |\n",
"| 0.668735 | /196/98461.jpg | 196 |\n",
"| 0.66936 | /54/27129.jpg | 54 |\n",
"| 0.671666 | /84/42148.jpg | 84 |\n",
"| 0.673422 | /94/47006.jpg | 94 |\n",
"| 0.67446 | /196/98207.jpg | 196 |\n",
"| 0.674789 | /196/98021.jpg | 196 |\n",
"| 0.676092 | /197/98911.jpg | 197 |\n",
"| 0.677318 | /87/43785.jpg | 87 |\n",
"| 0.678071 | /160/80147.jpg | 160 |\n",
"| 0.678366 | /140/70208.jpg | 140 |\n",
"| 0.679228 | /196/98027.jpg | 196 |\n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.outliers_gallery()" ] }, { "cell_type": "code", "execution_count": 13, "id": "c5a7080b-04ff-42e3-8bdc-eb91d16e695d", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "eb733f841f1349aabcf170f09495074c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/7287 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Similarity Report, label_score\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Similarity Report, label_score

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label0
from/0/35.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906011/85/42517.jpg85
0.905423/190/95331.jpg190
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/513.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.911764/85/42716.jpg85
0.907565/166/83446.jpg166
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/515.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.933797/9/4557.jpg9
0.931858/93/46608.jpg93
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/521.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.916001/5/2756.jpg5
0.915641/5/2731.jpg5
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/650.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.923444/7/3800.jpg7
0.909015/17/8647.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/657.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.930722/166/83497.jpg166
0.930567/17/8749.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/671.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.925565/198/99447.jpg198
0.917802/17/8681.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/692.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.915914/15/7715.jpg15
0.907856/2/1496.jpg2
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/712.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906601/197/98725.jpg197
0.903525/196/98381.jpg196
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/732.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906571/5/2741.jpg5
0.900979/9/4555.jpg9
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/737.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.930051/17/8949.jpg17
0.926385/35/17505.jpg35
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/763.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.942992/195/97948.jpg195
0.940392/17/8642.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/769.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.923526/46/23481.jpg46
0.914404/5/2583.jpg5
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/852.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.923934/148/74041.jpg148
0.920057/7/3839.jpg7
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/857.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906768/24/12085.jpg24
0.904972/191/95642.jpg191
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/868.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.909222/3/1599.jpg3
0.905293/111/55995.jpg111
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/871.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.914131/145/72763.jpg145
0.913387/9/4771.jpg9
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/899.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.911091/7/3800.jpg7
0.905312/5/2839.jpg5
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/948.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.924178/40/20152.jpg40
0.921691/198/99487.jpg198
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/964.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.939224/35/17505.jpg35
0.939199/17/8749.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fromtolabellabel2distancescorelength
4/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/0/35.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/190/95331.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42517.jpg][0, 0][190, 85][0.905423, 0.906011]0.02
14/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/513.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/166/83446.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42716.jpg][1, 1][166, 85][0.907565, 0.911764]0.02
16/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/515.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/93/46608.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/9/4557.jpg][1, 1][93, 9][0.931858, 0.933797]0.02
19/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/521.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2731.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2756.jpg][1, 1][5, 5][0.915641, 0.916001]0.02
68/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/650.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/17/8647.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/7/3800.jpg][1, 1][17, 7][0.909015, 0.923444]0.02
........................
7273/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49882.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49544.jpg][99, 99][99, 99][0.904351, 0.913638]100.02
7275/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49895.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49799.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49806.jpg][99, 99][99, 99][0.92261, 0.92414]100.02
7279/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49919.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49734.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49875.jpg][99, 99][99, 99][0.904262, 0.913118]100.02
7283/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49940.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49877.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49858.jpg][99, 99][99, 99][0.91175, 0.914616]100.02
7285/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49644.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49705.jpg][99, 99][99, 99][0.913667, 0.917606]100.02
\n", "

3796 rows × 7 columns

\n", "
" ], "text/plain": [ " from to label label2 distance score length\n", "4 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/0/35.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/190/95331.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42517.jpg] [0, 0] [190, 85] [0.905423, 0.906011] 0.0 2\n", "14 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/513.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/166/83446.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42716.jpg] [1, 1] [166, 85] [0.907565, 0.911764] 0.0 2\n", "16 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/515.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/93/46608.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/9/4557.jpg] [1, 1] [93, 9] [0.931858, 0.933797] 0.0 2\n", "19 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/521.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2731.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2756.jpg] [1, 1] [5, 5] [0.915641, 0.916001] 0.0 2\n", "68 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/650.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/17/8647.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/7/3800.jpg] [1, 1] [17, 7] [0.909015, 0.923444] 0.0 2\n", "... ... ... ... ... ... ... 
...\n", "7273 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49882.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49544.jpg] [99, 99] [99, 99] [0.904351, 0.913638] 100.0 2\n", "7275 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49895.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49799.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49806.jpg] [99, 99] [99, 99] [0.92261, 0.92414] 100.0 2\n", "7279 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49919.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49734.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49875.jpg] [99, 99] [99, 99] [0.904262, 0.913118] 100.0 2\n", "7283 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49940.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49877.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49858.jpg] [99, 99] [99, 99] [0.91175, 0.914616] 100.0 2\n", "7285 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49644.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49705.jpg] [99, 99] [99, 99] [0.913667, 0.917606] 100.0 2\n", "\n", "[3796 rows x 7 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.similarity_gallery(slice='diff')" ] }, { "cell_type": "markdown", "id": "a4eb87fa", "metadata": {}, "source": [ "## Wrap Up\n", "\n", "That's a wrap! In this notebook we showed how you can run fastdup on a Hugging Face Dataset. 
You can use the same approach to run fastdup on other datasets from [Hugging Face Datasets](https://huggingface.co/datasets).\n", "\n", "Try it out and let us know what issues you find.\n", "\n", "\n", "Next, feel free to check out our other tutorials:\n", "\n", "+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset, analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, and dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!\n", "+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze a folder of images for potential issues, clean it, and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.\n", "+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze it for potential issues. If you have a labeled ImageNet-style folder structure, have a go!\n", "+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze them for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try. 
" ] }, { "cell_type": "markdown", "id": "4c372614-6511-45b1-ad09-9e99828ac0f5", "metadata": {}, "source": [ "\n", "## VL Profiler - A faster and easier way to diagnose and visualize dataset issues\n", "\n", "If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com) - VL Profiler is our first no-code commercial product that lets you visualize and inspect your dataset in your browser. \n", "\n", "VL Profiler is free to get started. Upload up to 1,000,000 images for analysis at zero cost!\n", "\n", "[Sign up](https://app.visual-layer.com) now.\n", "\n", "[![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/github_banner_profiler.gif)](https://app.visual-layer.com)\n", "\n", "As usual, feedback is welcome! Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues)." ] }, { "cell_type": "markdown", "id": "e4dec7f1-c7c3-4b4b-9fba-f3867768d056", "metadata": {}, "source": [ "
\n", " \n", " \n", " \n", " \n", " \"vl\n", " \n", "
\n", " GitHub •\n", " Join Slack Community •\n", " Discussion Forum \n", "
\n", "\n", "
\n", " Blog •\n", " Documentation •\n", " About Us \n", "
\n", "\n", "
\n", " LinkedIn •\n", " Twitter \n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 5 }