{ "cells": [ { "cell_type": "markdown", "id": "09937637-0401-42a3-a54e-bf20a3256464", "metadata": {}, "source": [ "
\n", " \n", " \n", " \n", " \n", " \"vl\n", " \n", " \n", "
\n", "
\n", " \n", " \"Logo\"\n", " \n", " \n", " \"Logo\"\n", " \n", " \n", " \"Logo\"\n", " \n", " \n", " \"Logo\"\n", " \n", " \n", " \"Logo\"\n", " \n", "
" ] }, { "cell_type": "markdown", "id": "fbf7bc42-ba7d-498f-9b82-09584215a5db", "metadata": {}, "source": [ "# Analyzing Hugging Face Datasets\n", "\n", "[![Open in Colab](https://img.shields.io/badge/Open%20in%20Colab-blue?style=for-the-badge&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMAAAABuCAYAAABxyhyZAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH4wYOByseuquXkAAAAB1pVFh0Q29tbWVudAAAAAAAQ3JlYXRlZCB3aXRoIEdJTVBkLmUHAAAUvklEQVR42u2d23YaSbKGv8gqqjjofHRPt+dy3qHbPQ+/V9vTaz/Cvps9u9stA8KWEOJUVbEvskBIBlEgKoGCXEtLlmQJKvKPiD8iIyOELV3aqCoEkBhQH0oRxFXwHu1/iKsQRWASMKPfiuyHiYAEuegJO770tqIkBvBBPYgCwIBvwPO//4U4giixMpQITAySADFy8bh18pTtAHuoSABJBSIfogpINd0QH4YCmj6KepCkv+ip/WzSz36cfj+CZAgyBNOFIAKvC8kAuewXVinGcoxLMCxbecZlC/bYh8iDJH38WECmiELVosao/fBjK88oAa8Hfhf8VJ463Hgjs7FvTutHCmVIQujXQEsQBxCb1Oq/fALN9oj64kcmAS8Bb2AVIuyA6YAZIhftrVcGbVaV5AC0DL2qlWMUWqBPynEsKl0MPq/J0+tCMAB5AOlspHHZqDekt4fKsALJIfTLqZXyUkFrfu9Y0z8sgKh176UhBI9g2lDqIufbowx6W1HiCkQHMDyAQdla+JEc836SsXES+4UXWVmGXTD3YB6Ri81QBlm/hSqrtU7HwBF0A0j8OVbdoVj8GMI+eHdQukMu7zbXazYP9Qn0BzAM1izHKTINBlB6AK8NQRu56MhOKoA2QyWpQf8MhocwKFkeL7phsEpFZBLw+1DugN8Gc7cxVkzrR0pyDN0jGFYtTYQNlOWEPP3YUqRKC/yHtdFN5y+qjVCJj2F4BL0jG5C5cMurcu0i4A2h3IbSV/Dba1MEbZyoleNJyuvNhlj7BeBnEig9QuUBzFfk6l4KqwBaP1X650/A36rNmiI2bwjhHZRbyNVXZ7LUZlUZnEDvwvJ7ZItl+VIRWuC1nAXMTl5EGzVleA69cxgEW75ZU8RXGkCtBX4rd1eu9SulewL9Y0t1pCiyxFLgUgyle6h8Q67qsvUKoDdXSv/M8tOtt1SviVGtGw+byLvVb5w2jpX+KfRP0+C2iHKckGdpAEELKvkaldz+sNaP7Ib1zrec7ixowfyhdePl25VtnNavlc4F9A82KKvjyDZX7qBcR65bsjUKoPVTpXtdcKv/SqBsgPI9lG+Q629Ly1hvK9aIdK93wOq/AtGgC9VGLrHB6l31H6GSvIfOebH46TKKEPas9frbzcJy1sax0ruE7nF6LrLDskTs4eTBNwjrKz2UXKkC6MdQ8RWufJAfoXe6JdVGOVKiahuqfy50gKZfTpTeOwv+nQb+FLjWWlBuIJffVoIss7K9/hQqCvQF6hHon5YG7PIqxVBqLQb++pny+OMe/LPcaucMHt9ZI7EpCqAfQ8VWGNvVGynBfyD8uqP7KDadF95ll+PNlfLwUxrs7sE/Uwm6x/D4ozUW61YA/RjqGPiTxKo36Ql2TQnSNF6liZx3JRv4T5X2OxhU9uDPogTDMsQh+j++rk0BxuDXGdHFriqBSaByi1y1MoL/Qnl8D/Ee/JmSCyaG8mdoNeDWs/TbtQK8Cv6dVgKBsA3B14wB75lNGe8tfzbZejGEn6Ftwc/Qfls/ldWZAowDXs32nndGCZS0Pijb6aVNdb7bc/7MaI2s5X9IwT+i3pEVvn5c3BOYpcDPRMDLXgmePWftLlMphDYPld4VdA/34F+E9rQnwC8TP49ZKuVuFga/jjRuCXAUWQlU7HVKvzH/vzZDpXcOndM9+DN51ZT2PEwB/+T/i1jYC2RWAP0t5fzxGy1kUZXAS2zhVpZ69ugMumd7cGeS6wT4mzPA/wJj+nv2eMAsBN5VgLWQSiBQ+QZea74hqZ8pj5dpgeB+vSrTEe15zfJ/Z1wA1cyZoUwKMI6wdXXP9p0SvOWPjQLy0eX2aR/P/s+KN6o0sGXQcwq1tF5Tuhebm/FZVI6a4/sYBbztBcA/qQQLQHE++FXfRn1ee9DyRO1Q/3Qx0Av2bqmJLTeTQfp54s2qB3iggf23emlrFVmiDcg0E6Jw8Cfyt8/zZXnzXrl/99R7Z1MsrabP4SVWdpJBlni2Y0fkPe3FqjTCZOD8c6mT/R358LpR8ueDNCfwv/QEV39CyBwlSHlYaQDlHiSPUOpaa+ENIbFdyiabMWkzVCsND5KSrawcVsBUoVeGYekN/E4gfLB5/3libJwo7dMNusU1Q5beIAV9zMvueWNZimcp3EplOQH+ZS3/5IqBkg2K5dfZSvCqAiyTV129Ekzcvw0ebaZF2uD3M5UZzLqwrq2K4oegh7bxVr+2WNmxYi+/hM255bnaDJXBuaU+6wT/5KX+kSy9Nkg308X+V2VZqmC7fCwhy8lszyrAP0mFvDdQIP2UFrm52pxndOjsCfjlewjvQe5yuSytjapaRTi0F/ajUgagChx8QX76dwbqc6Xcv0+bfK0R/H70JEu9R657OcryyN5bHpoMIE7r/d9Ke2ZFuWY2FfJfBX/icINeeoKqghoIW7m3HpHLR4FHtNlWSvdW+XpHab9RnU19ggxZny9lpXeyvkstiu2RGrYhbDqW5V0GWb4x4H1jpDvzR/rbAuUOq14B8B6omMzVlCvFTP1IiU6gez79KqKJ4egz8u6v+da/fq3c/Zi2JlwDzy8NoPIFgm9r6d5s212e2Tr+l7LMg/bMDIgF+fC9x/Nncv91gF9SlxWB/LS+rmv2MOserT8onR+e1+ooULnPlvNvVpXO8XqaAgg2vVxpZq5KzeVtnLcF2uhNxxb9TcryZWFbHuAfxz46kyFN/4XEsaRGXO3Xvsg/N6Rx6lVLqP0fVJtpckMgGEJwmy0WiQ5sBzx1jHyjUKtD7a+1gv/Zu3rXFA7+gNptauiWOORadqV/e1pSx/+e+6d5f9fg90B+2cD22Vf3os1YEYHuKQStTC06tBEqvWObJ3eV+VFsJ+aDBoSNtdDH1+ODO9FmpPhDGA7yt/wZMkJm6ndix+CXzQT/U/qvI1Q+w+EfUG1lBGPNdmh2Zv7FBrtHTQi+bBz4n8kyrEOvAa3JQzQHxmGKFzDfZX5idQ/+Xzd/KotcPAqlVvaWHEnNbeArCgdfwG9sTNfqmW/1rCvyj0jGFRauVvT965nvvnJosLYF/E9KkO292pqfWtqt2ZEwK7dQam7VnC75tS8jI+jO6MorChA5DH697QL/YqsGUdWNNVHsmUTlNs3Bb9eSX/syqttxIqtYn12fNM/ojyvxGSh0x6x+ZWLKTc4b6kdQa66sUdSa1MAdHF4U7ZnvBOriDfhMPZQowrJjR2tuKj4Ntgnv1ZetlqV86Fkv4IIxRrNiAHFEf3yQn4s7ipS4bKe15G5NxA6dq9wVQmzyS0qFHDGQUTbIgKOqz5T3F/4K7KBisz95q7hJoPwNufhWHGOi4EQJ4pcewJX11/kXFLZ/E6sOsj8CQS9TOcZ2UaG+ODGQ+sR9zPgbeSuAccTx1on9ek2hkr+XU4XgDrnsFM+YuMBJQjrDePRSrvL/SbEVAPFhkHPZswpUhrbxbhGXM4zYPTL60UH6Uyy325Qit/xkWk6vPOYsS+mAdItpQ/7p6FwgtrGvQcDJra9d6P/klfO/9WUSCNsbX+6w8VhJACMYjOT/gqUdoD+QjoDNGZcSg+kXW45JihkHrMQ4qcUwUnz6AxDlbElUwO8DxVYA+WdfXtbs5EODFEOS86YZ+0JFX/qlrEQ5lz8IthvGVaf4xiTWfLNBaebTzO3xvwoF2AX6I2nvoVxfQ8HfkamDCfkrgBY+M+80Ara3v/JWgGCwF/UKlcw4sc4ie2GvTJbDHXlON5hxowCqO7JheZdAJxDFu6EAjjDzVAqx9wAriFDzxr9n+5/uEEVxowBb/hAbZrpy3K2Y/do2BTA7QoHGwJd8hRkEe3Su9CVc5IF2gQLpssPT9u50vUGwi5Ng3Q+CWw3+fdAdGa3kNAjerxWs2E6ryXsNgr2oV4j+/D1AsiNqpqOpKnkrgGfbLu4C/8+T8clYAXLmPwngFT8GkOue4DuoKx8EoGHxDYon+SuAuKgFAkjUzhso+sq9TkdBAqBcbGf6W6i5F2mmSmacdOYd7ggNCgb5W5NEyL3qdBPoT94VH2oZq0HVTSuKXTgMjnv5H1YlBoZHtgFXYfmkqxhDMbY3ozjRtsLTIOnZWbuasyyjiv0oKv2JHdDykiC/9sUSk0jdaV2hyWsEQZSvCRO1s7YGh8WlPy5WMvlyipteLAU/yLQ3tbpujMnwGG0cF8+jusDJxC3Fp85wLjZN0i7UhdaCR9u5IW9OGYUQHxXLgbrqUG543hhLPqSDCnIPEncgGA66dk5X7qllA4+naOOkOAZFcDOeyzx1JzfPXI+j9tT6e4G9gNdLOzc4OBMYlKF/Ugzr/3uoTvpTvThhfoK8EXdTOiKeTekoFAM67wqmY0eVujCZ3TP05mqrZamfyhb8ruZT6DQFcNm6JIFCt4oLu2ByzgaNYwEfupd2uv32qsDaEiRjBRgPLHOVhoodziVwvjopDXJk0Xo16F2m3am3DPof07y/K7Sb57PpzHfCdKj0aDGVQK46QuXenZcToHsO/Qu0WdWtAn+CWzLw2phU+dlRNmiSCm2JEixcemA6UBq6lWXvHKS8PeB3zYSnjOcyUy3zOpTgX5urBFo/UwYnaPMw+3uUDgQPbtyqAl4MwV9Qb2+8QdGPoY7KY5yS/fiVGODZ8hxLJNncmEDrZ0rnB7h/D9FZdvxf9oXSnb0lpjkrgRdD+BkeGlA3kGyuQRmD33XQ62ViROmb/JRyM9dvUp5Uct1DtLURKvEZPF7awXeopTS1P5F39WwT45tVpfMTdE9z8vVis03lz9BuwK3HuLx91Kcr2YyB5PpbesrrmvNPBr9T5tOZmS51LVJK3dRJgt6cqjbWE9Bp41jp/wDtH57AD7YIbYGUo1w8CpU78Ib5yGoW+Ec/j6zlW7dn1U+h4q0J/HMwPdMy6L9Cd4cTk+skhtNDiN7bU9WwBV7bUoq8ZdSsKoMTGJxDrzr74Q/qUP4r05QW/VJWen9fvRcwE7TnJfinuX/j3hvop5TurIPyTCK8NHs2tf8qL/fJ/2bOpIaepuDvv4eoDFKBwQGU79H6vSJ3uSiCNg+VQQ06J9A/TOd8vQLW7hn4j0B9vvyve6I335T+4YrGJ8nrln/aGtXXS1pykMi4FiY3nj85enedVcA+r7Zrko0JWE5egv/FW/SGEDxC2AFpQ6mPnHWX3kS9rShRBfQQeke2ribJOi5TIHyAgz+Ry/mDqrVZVgY/wv3F27zAKNuzCPin7fhokKWu1iOMga9rpjuTBF9ef8b5m/fJQZHSTPBPs36JrbYs9yB5hFJadmCG49YkL72ENkOFtKlUHMCwAqYKj+lU95cFIlmRVG1C9S/L9efGFSdK++/Qr7zuXeZZ/qy0J4sieBOP7gsMdSGFGJe2j1LnLprjLWj95w1m9zNrUh5eYCrtmfMLiUAS2PYg5tBeQTRpTx6JgQH6v/10Y9LHaweggd3xJB1koenfk2WjfoXemfVKPM7H2+U30ZtDJQrtM+RNe7LIPprY36Ha85j/CnU8oG7SmpuJr0fJimTie5s2tsBktwPzZfWvtF5j1Upwsgj4M0b3s3pKjlrtrZT5iq3/r/0Huc5Aheo1pfvjYgHxKmjPMt5BXjGAymbXMhqmnvourSfySw4lEqsA/+RmjTdNp3/kcutNbZp0cJmpVEKuOkK5bpUm05sR9+BnwuqPPqIXX2964YrJBv4FHMVEOm0Vwl0V+DdiKXRPYHiRTV+vvwnVxvyzgXl5/v16BaOyiK5kNLQf+oLI26cAnRYJ/OmK0+uJ9bNsttFvQaU13/KvKuDdlZXmMxZJ8S7kAeRDT95ULHdSQPCDzeoMKtC9Qpvza/Lloi+Ub6H2dTqqR5b/oQHNPfgXiWbnZX3eRoGYuDizyMtogcE/+ZD9Q+gdZ5PjRdvGA5X2kzAVm9Ha057FwW8WB/9SCjDWMn+BjTktOvjTlRjoX6M3F9lqhS7vhPKNPVTTPe1ZGvz+8gd6yzN65emYWXeQ9swSyjCA/hXa6KpcduY+rVy3RG/SG/SmtQf/EuB/S3r+TSLWT6GOO/nqroP/hXE4rCM//Tv7qeofofKg8GWP+szILVnwL0N93kSBnlGheIYf2VXwjzZnwXYl8lNf+Cp2U/c6MH95bwf/mxVgHBSPKkdlD/7xikv27sACndvk576MqeVeCWYbF2+5jM/KKdBUSnSwB/8z8R7fQOkLctFboMis/DQhJdlj/pm5ltVWsK6+OOD2TLn/AXoHS1Y9Fgj84QPU/kKuWrK0QXF9eXyTKY9Jy3JW7FBWHwPeXCm969dvVRU2ABZboVpuQbmBXN2/LdHw32VloLvtCQzgCfLL6i/x5HcrqHGs9K5snUzmiyYFsPqlAdSa4NVXdntt3KRg1yjRxIFrXlc5c2XoeltRhidpZ4Uyy1082Qarjz3EqtxDcItct/LxrOvq1rGOQHfE9z/ke4fZSYhqvcGlHeszDIqjBCq2C3TQscVtXiv3y/vjIRJxQRXB8JTidHCB312HgGaoRIfQv4B+DRJ/+xUhGELwFcJvme4G7xVhTpA7qun52V3nCudJyjEt6h8/KYLqlqRLxWa2/KGlO95tpptgucnyY3oSD6ylhc2qLP6oGdoH9w281rd5txUlPoLuMUQ1iEr2ruympU5HbQ29BIIeBHfg3a8V+FO9gaZWdNQCJdlw0JN/gLvRCvBEjapKcmC7JcST7UlYo0mbEIsfQ+kBwjvwu8jFt432Vfp7WVF9fq1xU9Yk65V8exNtjQI827z6kRJVgZod+hCV0mZSki9N0gnLJGq7SwSP6aSXNnhdO/pom+LzkVcoiZ0DHbOePvyj0gUdfaFroTpboQDPPcOJkoSQlFPvENh2KIk8eQid9hQ6+/Fe/kg05fQRyMBOdQkGwD2YrpN2jM6UYbSSiedfJVUyL/7uCPS6fpqzlQrwHU0isHdv4zJEFfvZT6Ue++nPZtT2ecnTZy+CJE2dmC54fdtcS4aZGlwVRiH81DOMAmgzkYmBp/KL74wGz0soJ3sLKeBJWqXZ25a0xpZu4rjbW1o6mRjQNKUw8hAmJcGSpE8a2by9RFtHafJXinRqp0mbiCb6HPyj3kBi+TuSNilT3VjrnmX9P9CP0/2YVCGzAAAAAElFTkSuQmCC&labelColor=gray)](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/analyzing-hf-datasets.ipynb)\n", "[![Kaggle](https://img.shields.io/badge/Open%20in%20Kaggle-blue?style=for-the-badge&logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAJYAAACWCAYAAAA8AXHiAAAABGdBTUEAALGPC/xhBQAAACBjSFJNAAB6JgAAgIQAAPoAAACA6AAAdTAAAOpgAAA6mAAAF3CculE8AAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAAAsTAAALEwEAmpwYAAAAB3RJTUUH5wURChYLYQ3XmQAAClFJREFUeNrtnWmsXVUVgL913n0dgJYKVFGh94ESiIqWaEIwRkhfheBALZJqRSXS0slSK5WOtAydR9S2SmLxh1H6gxjaGA2xXI0aGo2a1sYBceDdQpqWaq1CKOW+d5c/9n6vt6/TG/b2nmF9yfvXrnPOvV/2XmfftdcGwzAMwzAMwzAMwzCMoEizb6CRckUBEuA64H0Cpf7coSoAxxF+I8KfUbRjXKoesTCUmn0Dp2Es8AOBcn+1F/FyKXtVmQh0NPthikrS7Bs4De9hAFJ1I+7/XQO8o9kPUmRSJZZ3qVUGP3slpHM0Lgzp+vAlZUmfMWBSNWIZ+cHEMqJgYhlRMLGMKJhYRhRMLCMKJpYRBRPLiIKJZUTBxDKiYGIZUTCxjCiYWEYUTCwjCiaWEQUTy4iCiWVEwcQyomBiGVEwsYwomFhGFEwsIwomlhEFE8uIgollRMHEMqJgYhlRMLGMKJhYRhRMLCMKJpYRBRPLiIKJZUTBxDKiYGIZUTCxjCiYWEYUTCwjCiaWEQUTy4iCiWVEIV0nU+QQf6JZCXei2Q3AeQHCCnAI+CFwuNqevvM8TKyIeKmGA3OAecDowJfYUDvG/HJFNW1ymViR8FINAxYCC0QYGvoaqryzdTgloNbs5+2N5VgRaJBqAfGkOgBsE0mfVGBiBcdLNRS4H1gYSar9wIzkMD9GIW3TINhUGJQGqb4KLBZhWOhrqPICMEtKPF0fnU6pwMQKRoNU84AlkaT6GzBDEiramV6pwKbCIHiphgBfAR4QYXjQCyio8hwwNREqWk+3VGAj1qBp81IpzAWWBpcKUPgDMF2E3fWU5lS9sRFrELSdGKnmAA+KBFn8PAlV9gJTRNitGZEKbMQaMOWKotAKzBZ4KJJUvwWmCezJklRgYg0In1O1ArOBhxHOD3oBBYVfAdOBfSpQHZcdqcCmwn7T8NvfLOARES4IGV/d3y+BKZJRqcDE6hcNUs0ElseQCuVnwFSEP6nC/gxKBSZWn2mQajqwUoQRwS+i/AS4B3heFarjsykVmFh9okGqacCq0FIpoMqPgGki/B2F/RlK1E+HJe/nwEvVAkzFSTUyZHxVAHYA9wIvkWQzp+qNjVhnoUGqKcAaES4MGd9L9STuReAl6YKOm7IvFdiIdUbKFUWVFhG+CKyNIJUC24H7gEO1TjhwSz6kAhPrtLRVFJQE4S5gnQijQsZXpQ58F1da88/qy8Dk/EgFJtYplCtKXUnESbVehDeFjK9KF/AdXGXpkSytpvcHy7EaKPuRSoTPAxtEuChkfIVO4DFgPjmWCkysHtoqikCCcCewMbhUSg1lC7AYOJpnqcCmQqCnSiEBPiuwCeHikPFVeQP4GrAceDXvUoGJ1V2lIMBnvFSXhIyvynFgA7AKeK0IUkHBxSpXlHodSRI+DTyKhN33p8oxYA2wHjhWFKmgwGKVKwpdSNLCJODrIrw5ZHxVXsNNfY8Cx4skFRQ0eS9XlM5OhBbuII5UrwLLgE0UUCoo4IhV/qmigpRK3A58Q4S3hIyvyn+BB3DLCrUiSgUFE6tcUbQTkRYmAptFuDRkfFWOAouAbUBnUaWCAk2F5YpSHgXSwgScVG8NGV+Vf+H2FH6bgksFBRmxyhWlJFA9ym3AFhHeFjK+KoeBeQrfF6gXXSoowIhVriilEnQqnwC2Crw9ZHxVDgJzVE2qRnI/YtVbobPGx4CtIlwWMrbv+HKvwFMIqetR1UxyPWKJQFLjVuCbIlweMnZ3x5fXW3lKTapTyPWIpcp44FsijAkc13V8SXh6WC1bG0n/X+RVrDrwEWCCCOWQgX3Hl5kiPEMGmnM0i7yKNRSYHbrpme/4MgPh5yh0mFRnJJdiiSAQXKoOYKoIz2al40szyXXyHpgSuD2AptS5MbH6iF+q2IhypQCXu+JA4wyYWP3jemCVwij74M6OfT79QNwceAeueW1r2UatM2Ji9RMRWoAvA3e2lnp2Sxu9MLEGgG9ftLzWyY0CjDG5TsHEGiA+md+gcJW9JZ6KiTUIBD4ArAYusinxZEysweCGqom4M3OGmFwnMLEGiQgJrsntF67YgbSZXICJFQTfivvhFz7JOATKz5hcxRRLUb9DORi+3Hm9KlcjMGZXseUqnFjqeqjvBO5T5Ujg8NcBa4FLpHCf7MkU6vF9F70ngS+hPAZsVKUzVHy/Mn8bbgvY0CIn84URy3fR+x4u0T6AUAc2A9s14PfvS3ZmAnerIkWVqxBi+S5624C5Ai9rqaee6hVgKbA75PX8CWAPinBzUVfmcy+WKjVgK76LXke7sP9Gv1buiquqwHy/OSIYfuv+OoV3CcX7TTHXYvmGZ5uAJcB/eld9VscLKCTCs8Ay38wjGALvBdZB2KYjWSC3YvneVKuBRzhLF71qu6Cu89oTwBafi4XBXfKjOLGHFWnUyqVYPhnfhBPrnF30OtoFFWrAeoSdIe/FJ/PTgHuoFyeZz6VYwBu4o9n63JvK/6sjKItV+X3Im/EHjy8l4VaRYuRbeRWr33T482tEeA6XzB8KGV9cG8p1qlwL+V+ZN7EaaMi3dgErVHk96AWEd+P6kV6a5PyTz/nj9Z9qu/iTc3kceJyQi6fu7xZgmcJ5eZ4STazT4POyY8AKhV0hV+Z9Mnc3MF2UJK9lNibWGah3AcJBXBHf8yFj+63/S1T4eF7LbEysM/DizW7xVIQ9wCJV/h0yvrjTL9aqMjaPb4om1lnoTuZRduJqrYJVQgCIcA1uQ0bQ1pVpwMQ6B9V2AaEL2AI8ETTfcrQDDwHn52nUMrH6QEMlxDLCV0IA3AXMUqUlL3KZWH3F5VuxKiGGAAtFmID0nEaWaUysPlId7/KtlliVEO58xDUo71ey/6ZoYvWDarvQ5b7v8JUQgAhX4Y6guyzrTbhMrH7i8y1XCQE7Q48rAjfhSn0uyHK+ZWINnCPAIpS9QaO6kepzwBzIbjJvYg2A7lIcgb8ACyJUQrQC9wOfyuriqYk1QPyP1dTrcSohRBgFrFblesjehgwTaxBU24UkOVEJEXrxVOBKXDI/RshWDZeJNUgaKyGAXSHLbHy+9SEfe2SWdldn6FbTi18ZOAjMV5d3hYvtgk8G5gKlrORbJlYAuk+oEGEvsDhCJUQJd8jmpKwk8yZWIHoqIYhWCTESWKnKB7Owu9rECojPt6JVQojQhiuzuSLtC/MmVmB8ThSlEsJzA7ASuDDNU6KJFZiOca7yFKJVQgBMwuVcqT3EwMSKQHW82/4srhJiaYRKiBbcW+LkJKWHGJhYkeg4kcxvBzZHqIQYAayod/LhNG7ISJ1YEUp/m0ZDJcQGYEfo+P6c65Uoo9NWZpMqsbxUtQBy1SHs6/5A6ekJ4da39kS4xFjg6mY/Z29SJZZnH/CPgbrlpfwj8NdmPwj4xVPpqYSYrcqvVekKMjK7GC3AqGY/Z2/SeHTvPuB2lGtxK859/woUAY4Dv3tlCC+OqDX7URzVcUK5okjCbq0zEbdkcDFhDms9BPyi2c9oGIZhGIZhGIZhGIYRmf8B3B08w5ZeUIcAAAAldEVYdGRhdGU6Y3JlYXRlADIwMjMtMDUtMTdUMDk6Mzk6MTArMDA6MDBf+TKDAAAAJXRFWHRkYXRlOm1vZGlmeQAyMDIzLTA1LTE3VDA5OjM5OjEwKzAwOjAwLqSKPwAAAABJRU5ErkJggg==&labelColor=gray)](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/analyzing-hf-datasets.ipynb)\n", "[![Explore the Docs](https://img.shields.io/badge/Explore%20the%20Docs-blue?style=for-the-badge&labelColor=gray&logo=read-the-docs)](https://visual-layer.readme.io/docs/hugging-face-datasets)\n", "\n", "This notebook shows how you can use fastdup to analyze any dataset from [Hugging Face Datasets](https://huggingface.co/docs/datasets/index).\n", "\n", "We will analyze an image classification dataset for:\n", "\n", "+ Duplicates / near-duplicates\n", "+ Outliers\n", "+ Wrong labels" ] }, { "cell_type": "markdown", "id": "34d4d2db", "metadata": {}, "source": [ "## Installation" ] }, { "cell_type": "code", "execution_count": 1, "id": "7176a4bc", "metadata": {}, "outputs": [], "source": [ "!pip install -Uq fastdup datasets" ] }, { "cell_type": "markdown", "id": "4dea523f", "metadata": {}, "source": [ "Now, test the installation. If there's no error message, we are ready to go." ] }, { "cell_type": "code", "execution_count": 2, "id": "655330c1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/usr/bin/dpkg\n" ] }, { "data": { "text/plain": [ "'1.41'" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import fastdup\n", "fastdup.__version__" ] }, { "cell_type": "markdown", "id": "40145087", "metadata": {}, "source": [ "## Load Dataset\n", "\n", "In this example we load the Tiny ImageNet dataset from [Hugging Face Datasets](https://huggingface.co/datasets)..\n", "\n", "Tiny ImageNet contains 100,000 images of 200 classes (500 for each class) downsized to 64×64 colored images. Each class has 500 training images, 50 validation images, and 50 test images.\n", "\n", "Let's load the dataset into our local directory." ] }, { "cell_type": "code", "execution_count": 3, "id": "9fb0fffc-ba54-4b77-beff-e068ae2f7753", "metadata": {}, "outputs": [], "source": [ "from fastdup.datasets import FastdupHFDataset" ] }, { "cell_type": "code", "execution_count": 4, "id": "ae56315c-07d6-4559-9a4d-fdf1772918e7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:No changes in dataset folder: /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images. Skipping image conversion.\n" ] } ], "source": [ "dataset = FastdupHFDataset(\"zh-plus/tiny-imagenet\", split=\"train\")" ] }, { "cell_type": "markdown", "id": "be18cac4", "metadata": {}, "source": [ "We can inspect the `dataset` object." ] }, { "cell_type": "code", "execution_count": 5, "id": "85ea7e08", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Dataset({\n", " features: ['image', 'label'],\n", " num_rows: 100000\n", "})" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset" ] }, { "cell_type": "code", "execution_count": 6, "id": "3e05ba85", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'image': ,\n", " 'label': 0}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[0]" ] }, { "cell_type": "code", "execution_count": 7, "id": "e1078a54", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAEAAAABACAIAAAAlC+aJAAAgG0lEQVR4nD26ya6uWZIlZM1uvuZvTnMbd7/eRJPhEZFkFdVQpRwUYsKAB2CA6i0YId6AR6ga1gSpJkgUSCkVMKCTACVUZoYSD88M765fd7/3nubvvmZ3ZsbgpJD2ZO+p7WW21rKF9l/9z0BWuliHqNGDuVAxFoIWILcz6PuN//Gj4d2H43dXdO9BuQNDD+pRowGaoqiZjWPPZGwNZcJ2oXZxljtQLvZse/Pq9qUDXE9nh7QZRgV5eHgY9z0z17TmeV6nORDe3t4Ou201aWBCgIhqhkaMpFlrzveP59p0//yZeo/O/zf/5r91sDQjEKuABEaIQI1MFJcMxOhZIae1Xo5LYqdD5zedIjgjAkMEMjAAABARMEBQVEVVEBOTZuLJe3ZEhAYA0Fqbpqm0Goc+p5rL2RMOw8iAeVpOpxMFL2hKiI6BkRAB0AyIaLfbMfk5F+/DVIpIa604gA0i9EpWqJkpqEmrVViNup4HolBLW9ZLg67r4o21jVFHZB6QwZgQyIOaCYCpkREyoSMiVDJVQvSBmQlFQwgtl5zLmteN35HzveMu+g5Ra6W+32w2WtUYzcAQCJGIwMCakBEaOGbHTIREIKDSigNzgASGKMqI5ExAC5awDW2QMkDmVO1sVANW78epLuSIkZnAAzA5IkSEWo2ImMlBcxSIOmzqEEDAOYeIiNh1nbJbbRXT0/FMjtihiaLjUqo1UVX2zhDIQEQMkcghmAAE5wGglCIlx3HoopOac1lc08rmBUAMgIwCCeKqloc2ufVE7d6XmZbqC7rqcOtsCzZ4wwDkwBiMAIkAGZEAEQkckSMiQAQAM4uOGU2aMDsGql47dmEYp+VyvlzMbP/s2e2zOD0e07yOuy0iioigITtmz4xstM7LOI5DF8mh95xrVWtSsrvcBheckZWWhRt6WiAd2tTIHmQ+QFo9XFxbWQCr1pN3Nwg1IHhQMiADImID8mAGoKCm0po0gZxbSZ7YMQNok+KQCNDMnHPd0A/jGGM0EUSOIdJOpdSSkzlqpgoGPjIgATbVL7/88ueffrbf74PzYFXqalrYmfv2avHeo2nOa7OmDiZb7/Q8aznYMvsGIWjkxs2BRplNLl4iQ8feOyQidA7YgfPQGpRKrai1JjlbSpaWwIymjARqDukJ8ezC4XjuNz0wXY5Hzelmt+2InKNpKmTOkBRBm7RSW9P1Mi2X6XQ6iVZj6q82iGDWalnd/5O+7qBzYE0yO2LnVyr3PD3acuYkkWLPoSMzdGQMLS9HDo5x9NB7Hz274MB1wB5yBZMnNDVrteYCORUmA2EwInDOmQF5F0IIrYrI+Xw+Xc6bF7dXVzunupzPBoqI+NTdVHPOecnHw+Gzzz6LIUzTVEHCGFwAncvj4737Qr95Ob642mw9utB3/bZf03y+P9gwmNE5T6EtL+Oz3gVu4puBZJcv/eB2Xa+aUYDElxkxgBFED844Z1ylQSsoTQQYIKekTcys1ua9byqh787T6dnz5yXNIbhlmXpHiHZ1tfOxfzifclpijE//7dNPP2UFMIlLWFsGACJ6fHwcN727+s2NH6Kg5JJWW3TBu+n003zPfptRDZWRWAFzs1SNlpvx+nR8PKYzpenq+vl2uwWPl6S1oCCCtlZWbRm0EgqRBQ4OEMkckWkzM0QioiLVOYcELrrcslooRRD0dJr2N9R1cS15Wqeh37gQRCS4YEYx9BSd60KGupaViNz783eH2QViAvZIajilJJqWQ0IfGNmDpylbBVuLekjp3fRwBKPOyjaQDQFBoUmMY1OrWqFlrQkkszUG671DAwIERjNTVXBsZDln9mjY+t6n9WIQc117F0pdm+bNbt+gHc6zIfR9X1NWJDRxzoUY1eO8rufzGRDdu8fXDmnoxu2wFd/V1pJUJGgpBRUCD4YCARVYMHjwmvbBhr6/3QYnaT28xz6pejOraqWkki6tzKjFMQRizyStBHTeewRUFTRqojmv/RjNig+0XhJgEU3kne851dyD9Jv+tKaqVaw9jXBVVdWefZW25vp4OBGRe/XyBSJ2cejHoSnk0yWtqRSJ7BwotGKm4iKxc7ELo+ulsW+9a06XPOd8Pqo/VYh+c10NqiSpC8kSQbpAIzlumFJic2MfANDMVCS3UlrhBohNIBtkpMZO2Mlm190fT3a0cX/tAy/zgoqOfMdRVUWbbz63IiKXy4V8cJ6YvHO9h+BEVANbJFSBpiJVchNmF9iHLrvWg4ui8+Eu0QGtjdtnLoTGYKKdAzBTVbUGUtUyQBNyPfm8rCTUhR4ARQQQc05mUmpiqiktziNgc94M2na3ezidT9OJ+t4Flkmnddlv9obAzCLSVKs0Yr+kjIju/vEYukit2roKghhAHxCpXBYyVBRjy76uZGJJkwwWtMw+wFXvb55duX6fcZgqJENVk6bVqtQVdCmtIvLVfp9SkmpdHJijiKCjpuIDl7pi0GWZrgZn1rwnkezJuiEu5+lyOcVuy0ytKhAJmHMOGoqBNGPHKaXUqvM+kgvSbMlTNUXvCEC1UmRnaI6cD2GIDbBIqSUdzpMta9+PHJ1ozstplWkWWhtmkZxTyTPWFaGYU2KKzFZLqa2VFZyaqgMirYFdy5WNWlrdbq9gyFRqobRGZKv1fDjurryqtmYiT+yXDbmpiqkRppTWltz19XU/DuBwWuYpz0Wl1lpbInBGjj2ptHSeduPu+fhsY+EK7f1jaeZe//BWfngnzp1T3t48z6WN4wZFMWevxmAlF7NM12U/7FThuz988ctf/rr3EbVsCIYQonbTcr7d7kpKD5b1anO7v+aEPvIZ/cvPfr5UcyzFYz9sTaBUNYRlTXHTf/PmG++5GblaK6fsAhOgMyylSsnQmpCF6D1wVbVUKqzObcbOo1VAOl3Op7dvp5K6zVYYz+v5xcuXp9M0hp5FTodLH+Km6y+X6e7uXfchE7qu969ff73bXXXDpuu6NF1US+9d3+8EkqLkBudledFfn6fLZZn3ZQ3dPm76eWlLWglYVQmMiJ5oqaqKiHv37p33vhti6DszFVGpzURVgQI4ZjGtuRRwbWwAkHMixiIt57VJ9dFtt6OPfa315uZ2Pl6my+Vqv6+53D0+3Gz3p3naThfvIzn35oc3HztPXbfpQuSYasp1LZqaiFgRcwz8YOeKzW+7FWQ5H9gl5Jhq8RS1thgcIoLofL6YgDVzZU3zPPc5XtONH0J0vmEBMFOsuQGApIYABJDXdGpHmmaz2g/9pzcfG7vhah/HAZyvTd/fv9uP+xfDy7uf3kcfnj1/fri7j1f7d493YzeQC7nmJDkOkZzz0QmpQjMVJGVw3oFzTtB4E6/irTle0woFrq9HbmpNi5ZghIS11sfHg4mCmtvvNvOyAAAi9rFjZjAruamqNs01OeNNP2ziaE2O5/s9udqyg/jy5XMKca5ZwKy1+/vHm5tnl9P0uObnz56tl+n16ze3+11uNR2TfxbWeQGi0+WMRMjweDgAqnO8G3cAklqSkmttvg8Nda1rLplip2ZFyrRcNt1OpJkZIdZSH+7uRUwVnNRmqiqyzkuM3ndxt9m3rpliS7UsJbC/Ha+Grk9LXlN2CMU0l4UJXEDJFRtnkX6Ib9683o67q6v9w+OdM37+/Pn58SFuN8FzP3QpJXZ0OhyOp8fYee9crbmlLFnUWilZWjGrsAmb6230Lq05+uHt3WMKFSoI55SWTRc9+dba4/EgIrWpm05nRVCw48NjSml3te/7kYmZnfPoO3ZGBIiiDrBj1lodYy15XZc+OGZ20Z0OUy16td8v01rXcjXu8prO5+P11c6Bbbo+zxObaauS89e//zKdL99/9/00XebLqaRcW7ZWEYy4/cN/9Cc//+Wnwp3WNo7Xusx5bbvtzXo5pmlqm7HvoyKc56mp5Fadd855X6Sdp8s0TSmV/b6E0DkkNsZmpebLUgs7NOLaWklEAFXXeQHnDK0Vd3h/f/P8+Xw+WSPn/Xw+k+Gm66FKDJ5F3n3/wziOrRqJffXFF6e370Sk5Iyl9QgDhmpaS0Zt/9t/92fffvYqI49XN//sP/5PcF5Pl8N6mHbDviwXaLeEhmilZAFrWt0QgxGqkiMmYFaCamrC0Z8PJ01l2228w7SWznfbcdNEpZUhjiZitQHjcr5cb3fYdPCxmUFuZOTZBYCOnc7L6WGFqutaVbVW2TmGnINBRAYCRrucTxvmDz/49Hi4u261XzFNp9ffvfvXf/vDzUevPvzkF8744Th9+OrjzvHD3VuMfr/fPkyntaxuu9kAYhvAufDwcDg8HAn4ww+v87JaUStaLRGLAwQUyYWRCRySc+QYHahhayAqNSNyz849+QKitpbShPJqOYMROSUAUkAE3xoDkgGa9hxCiA4wVglz7jB8/7ffvvzs57LK+bw8th+nx2mt7fPf/Htpu9H9zqRN58v5chQtt9d7R4Bq1vkYrjtQvLt/bLlKaiQYfadCT90qhBCYsCkbqTEakSA2ACASdA2cC4jIxt5ARSQXWauU0huwAqJ5BCIyRHzyrNQ8ESqMiL33kotbFprXmpe/94vf/OGHN7vttuvd67t7VIx9/+/+r//j/bvPlmWC6C55uRweCqjro6u11lqd13G7++D5ixiGdVpPj4fr/U3svALnaSEzR+yBpCoLaUMAaNmcU2TyRshx7Hpt0tba1qxrs9pYlMQ67xGeVC6SITlgJkcUgDr2zZK3ZrXk6ZLJYWs/++RnX371h49efdTdXP+fv/urP/n1b//m22+Ol/PzFx/89Prb+7u34/X+F3/8qz/9x//gh7t3f/j2G9oOwzAMJno5nXNKY99vhhEUWclz8BwIWJtYbVpqSxmUQJxV0gySDRtF6rZhgKy6NluyLRlz9rV1Yj0RqSAAmKg1JAvBD1039t127K52Y3RW8zSfHk9379L54JmqyH/6n/3zfrP97vWbJvZn//bP5rT86hc/X+bT8+urX372MWv94i/+4vDjTy93u//on/wT9/z581LKw8Pxp3d3OT/0w7bvtrvNlokYkImj8wiAatoEFYjZlATAKkoxs8bMwHp4eGAFp+AUAjpHwApgAKDkHTOS466L49j3XRcQrbbeuaPU6Xw4nx4v50PHeLPdNsJ/9a//67Db/ff/078Nu+1pnS0G18dPP/00pSQ5dwi+7/rteFznH3947cCstfYk/pe5nI+n2uk4bgMqgrGodw5UQc1MHDpppmpmIFV0zbqqmRFamtYARIiIzAYeiAEAwJzzXRc7z973Q7fZDF1wpJLnxUCW9XI+H3NZ1ZpjG7b9Qytut/03/+P/sH/x7D//L/+L//uv/uJ/+V//999/+eXnn39uJh9/8DL04XA5rbVcTsddCO7x4XK5XGqR3XgVeXh8PM6nc1nWcPNCDNiUFEBEqjQCJGKoUAEQDZtKy7WUkkBtMwxsCg0UWgUiZCIGBkUFBg7OB++jI4eKKiJEJKXlZc1rwqYBuWPfx5jyxfX++tnut3//t19+8bvz8f7bP3z5p//sT18+33/11fvz6e423jqU6KxjG/Yb+t1ffv3+7fxwP1nGz3/+69/84o9uNpvnm21A9ahkKjWXUqq0JlakOUKHxtqgZkgLpdXXFkV9ad6M0BQEHfpNhNFXNnKA1MwqYFNtqk1EzMz78Ob1m5ql4249r8H8vrtygn30SI2svv6bLwI0nc6jl3ff/eH+3evbq95HnZfHJR1yPlztfXDFgXkwbqk+Ph7H/r6WhQ0iE7RqBiBq2lSBDBTQDEwqNAUzNEJUUkFVoiejFwABAYHQCADhqQJIxg6JgAjMTESstlYNgKRZztWR70PvwQUX//iXn/QPb3/89JO//uKLf/Uv/uWc5leffPwf/uk/pVZffPjh7mp/WeZSl5ZbykvK2YGoQ1LmmsvlfGY27z0i1lLQAEVBFJUQEA0MrJRizciAmREU1AiUEQkNDRAR0QgBVRCR0UyaqlMVNWmtAYCpSirUNEZPRCklMC3SzusccrK1tNR++6vfXu+v/varP4jZZz//GSteTpdnz57dl4d3799XFQVb17mZunVdN5shxtiq1VoZCNXSuoIaGZAaqKERETpAVKu1QgMGBAAEVVVHSEQABqYASGgopVX1xICq2qTURKm1VlxlZlOVUp3C0MXYd9VUWz2WRgbYh1evXjqxSO7lzYuhGze7bRyHu8Pjhy8+7Lrx7uHhcprZu+P5/OPbn4jBtVJbreyDiKzTDJ2DKiWvvQ+gZgZo4BAZCQ3QoFYBRUVQBAIFNUNkJDADVURDNFAVqUbEzEbapGjWUhDZOefQQGvTVN1m52Lohn7Ntaay1Dyl9fjwuBt2aHR6PLVcV06AfL27ZgpprnlpCKEWffvTw48/3m2vdm4zjASoqtYk5eogBHYOCURNzdQcEQKwAZiZahNhI2QkswaKZgwMBACq1sCMCKWJWiMi7z06fkKtmAE15xwBWpP5crHSyGx/e+PZ1XntfU8h/tVf/79//Pf/3vX1NSBlaUWaIg27PUd/WZfLUh7P8/3Dwzff/6gAr66fu7HvUQ1ACfDvBDExAKiqidJTEdhMFABaMwUABEJQMDADsyc1hwiA+HQ1s9aaQxICkUagzCxmQKqqpiCtVbGH42kfN5ubmxC6Mi3OOIEZ0/3hsEprYhTjptsJkoDt9jfvp/n+PH3/9u6nd+8ep7Ubh/fHyUltItV7z4hiVktB0VrS2PUgqmYKqKiIYGagoEgIoACEYGoAZgiKgExqT0smFDBVVQQzzCUzGjkHhGggqtq0lDbEeJ7W4NvV5gaR1UhKneYZQ/fT4+NQMiDPOfl+EIIf7++ntH7zw5uf7t5nUfQcxm1B+P3r713nQyMopRQpBAhgAuKcK6V0LhBAyUUR+r4HwFJLTmKGaZ3zso5D98HzF9yFooJozGhAVaRIFVFw7JGLqK55GNg7L2rIBISpzOuaCUkQ1btnH7/Kc3r301upZWmtNj2djiH2DW0+PCjh77/75ts332OMPzzer7V98rPPqneXeYrbjZPWECCyVwIGZGfB+eBonucqjZHG7WaIW0Q8nS7H+SINY+zj0DvvY3DGlFpF0e3trdQsrYpCFsk5T8vK89xQXfBBI4rmVj12PvhhszPDmkoBupSCa045XUo71UZdt7vaf/f9ayv1cD6d5vnudFhrO0vNl3VVhegf1wXSuru+/tWvf+1KyojGiKqtiQC0xq54IiJpAgyltVzP0zQdD+dlWV/cvmBPwXnrzDMqYxa1WqZ1TetSc+GnlhV6FRHQWmpVZed78l0/jrt9DP26JjNcL2utcso1tSkt6+M8H1M6vH/3Cf9sNV1rTgTf3b2/OzzEcbM2WWrJAExU0Z6/fPH555//4o9+6Tr2gAoAwADeAzQG/LsXwqayTufjYTqdTsx+t9332w27YE9GOUEDa1Jba4c3b6bTOafUh7jfbYZh8OwcuT526zqn0sIAvuv7YdNEz/MiYjnVdc0llZLqcpnO52ld16+++/bdMn38yWc+eOztkn9Pfb/WYo4QHIJUaLurq3/0H/zDn/3ilyktDs1UxMycR+89sxMwtaaqubYnuZNB/WbY7a5ePntBwKimqgCgSElqXvI6L4fD4Xw4TtPSx+7Zze3V1dVmGDzTqxc34Wm9x+4yp6U+rKnc3T0cThczyKlejpfHx+PldM45i8iS1vLda/Dd57/51Z//u7+Y18WYlrRur68KqAc/9OH22fVmO+S8vn33o+tcaIqqCibaRKRVFbXGzi05LcvC3u2vr0IXmXySEtABEBECQGo1zcvpcJ6mBdSSIfpAXf90uBtCDKdp6aLrh40BHs6X2k5imJq8v7tHdjnV+/vHh7uHZUmqCqBEdDpPf/m7v/r0lz//8z//8+31VSs5DN1ut6NEzfTq9qbv41df/S0AEYHT1tiRYy5NUkpNMzA5TzlnM3PBx67b396M45hznU5nYkI0AMy5Xi6X0+N5mpZa63bY9ptN9GG/v97v98MwRN9Fh/mQEArzqobztDY1QVrW1NRE2rSs07ImVfOeENEgzVNrrS3L48Nhu9n1fQ/U3T5/9vyD58qYS/HRhRjuH+5TyjfPb91f/u53/Th0XRBrS1kVpN/0m90oql3XDf2uG/qr6+3V/kZV5/32/U/vibC0dl4ud/f3x+NZFbyL1SyEbtjuxv1ViF1tWsuaQG53+2U6z8dLU6hVDOGyru/f3xtQKe3xfD5MUyqVmZmcttoQ1ezZyw+mZf7Hf/pPv/72m3Ecdrvthx++urrZP56Oh8NDCPF6uzvpKRi67e11SulxPgtIbkVRYfB9YPah226vr6+327HrOgQt6yqQKUBJeZqmx+Ph/fFxmlIXh27bjVfXaNQAl3WttZJB9CEMww8PD9txUxTnvDD53OrDabqkqqohxGG7WXIxyiEEIqqVw2YQrQW0IvrO/8k/+PcPh4P37s2bN4j44YsPbrb7t2/fcoObce/QOR8DBgoSixYUb2Suj4rw2c9+5r1zjqqpplmbLMuyTGsqeV3TvK5VBQjZuWEzXt1ce9+jmUfvXEA0MBORnFcgp8RKvKRyuRwOx+M8r4jYdT0SDcPmgw+CGYYQELG0lkouJRGRIGxijDE+e/aMmabzsazp4f1drdVq28bRe2Zmh2Sd7xAxS47W+SEMmz50/v37908UjZG6rhu6zgzI+cPh7TInEWPnx3E0Td77GKMjB6poZiZPlNS0SbEY4zKd794/3N3dnc/n4/FYq/Sb0RETwDj6/bObod947wFATB+Oh5wzIraaGbYMuN3vVJVNSy7n06HW6tmN4/h3+QtmDs41ay03Iwih6/vRBdacayu5pNaKX3wXopm13HIVM+y6fjNut1t58AcAAjXnQASsiYgROyYgQ9EGTO/e/PjVV9+s69r3fe8DaGUFLbWKNu8D7sYYmFkAgexq3ObgHfPxeEQAadXBmGoa+r7knC5zSinGCKKZUVXd2HeGkNaW12Rose907NXhfr+vUpYcay2I6J0zEwB68cHLtFRm3oxba8jka24xxs7HhrVJVqlgRo5ATETevH379s0Pp8eHcRxf3DwDgGmZmZnZA4Azg9a0FGQ2QGDovfOuZ+Z1ntnMaqs5p3nx7FpanyJgVktWyQBPZoqrtYIoM5Oj6EOM/TD2p3lChuii99571/c9OwQBJ/74cL5cLsuSPMZh2GhQT9z5KIbVqWpmU2xQSsnL+jd//YWJbGL/7OrmajPWKmjmvX9CLbnAYFIyOodMprCmFQCKoZZ8fjgQQZnn2rKJ1lqhSSRHaKaGiMzOAZpoQ4S+i+w8ImkRExjiwIGBrGpVlSpNEVk4hh5xnud1XfKm32zHXReDNmu5QBM0JUBGItW6rtPxCK0Nfd913RgjgRGoR4hMwbsYI5Grtda8OuoJuYoc7+8MYZ2TSE2p9H1kpBjj0wrdE7P3T8KDnQvs3P8faGNiETkfjvM896fho88+8ezJEwsLiHMcQvDkdDVTbLmVtUiwGLuAPsk6XSY2IBVWYyLVJjmty/TZpx8Tudbak70XnEMz75yWSjEwWm5ZVZtDBC+5rsukqo8PRx/4fDjrfsvsPQIAgKqQOUBAFBFSA2J3dbVDNGQAoyKiq1qzVuTh7f1mv9ld7aILRYpWoQBd6GvRNCdG9+nHn3Vddzme8pL7EFsu7LyJttYayetvv352c7sZ+he3L1T1crnUWgMTB2/aclmD7z795NWL5x+8v3/37bffzudTCIGDv95uj6fTpx9/tCzzftj0Q5fWfLXb1lpFJMZunRdAvNrtc84m6lJaas1mxkQRSQMYQnB+Oc2t1JpyGKIL7CNp01ZKXRobefIm0lKppZQ1tTVbbd1IZV1345DnaRg6lcoOX736EIDWdc0555zvHh+WZenHbrsbv/326y+//CJ08aOPPtrtNj/88NPvf/97FzrvaLfdeIc5V4dEYKWUmgsickdPEVQTJcDgvLvM55STGTJ5IvLMKkCCLbeS83w5hT5uduO479EAsn775fdScOyGq+1VjH3H8b7ezedLH3wIocxL13XrdNxsBim17+PffPWHJ9bVdR0ANK3kcLMZ+j7G3tlFHx/v7+7edWNH5IbtIEW22+31bssoUiqCOkaVWvJaa62llFL6EKHvyADVnJoQI4JjZlBAJahSSwnItWotNTdBE9MmqQTu1jltwmYT+zEMQz+MoevYn/pjXhcTIYZlnVprBGJozLzZXeXaUlpKqyGEbojTVE+X8/F88p1n5n7TAwAyttJaK6ZGBLms67rWkmKM4zgS0dD1x+NR1cyeAqosrbbWnO88i0clIpJiBFBaa1WZOTB3YYMOyaAtacoJde5dsGZ3b+9+evMuxrjdbh0xAOScT9PkGY8P77vgJOehiwr6zXdf+9illFpr+/02Dn03Dq0V77yYtNq8913XocNCpWk7r+ec11JKSokInKO+j7XW/W4fQgghpJSsGREZKDvv4MkEN2nFtFgrrZUqRYwoDv0YBx9JoKaackraCpbYSl6WlEuJMbZSQwjSyjqtp+Px9npfStlthgriowvRf/qzz8btfp7nd+/eibXammgV048+eFGkpZROl+P7H96XVsd+2O12m804DENrLcYtKLbWRGSe53VOMcbdbue9Xy7LE6ZDCK7UCoogZkKgSESdD83EOe+QrbaqKiCmjdTICIiIHQ+83+/7bgwh1JaT1FIKEcUYP/jgg7EPwV+v06Xruu++/zEejjmvKaVhGMg7R46Z7w+PS1pVWz8Orz75eF6n4+PhzZvXHz1/RYzQoOs6U83nFQBEKjFO05mIEJGMENHMcs6OiJ4EliExO/IOIkjVPnQprXOac10AlCP56LyLadK8ljUnZs6pOudqy6UkFYkxRh+e3e5bWa+vdq8vZwF79erV+TJP0znGeHV7w8zLMpVSyJNzzpCcc2KttUYOh834/u5tXve1tNvbWyYqpey2w3a7/fyPfv3111+fz5Oq3uxv+r5f1/V0Ov1/F6iO6Be7PdgAAAAASUVORK5CYII=", "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[0]['image']" ] }, { "cell_type": "code", "execution_count": 8, "id": "07daca49", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset[0]['label']" ] }, { "cell_type": "code", "execution_count": 9, "id": "403cdf68-b3c3-4c64-b96b-4a0e1d86a526", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
filenamelabel
0/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71384.jpg142
1/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71204.jpg142
2/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71036.jpg142
3/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71014.jpg142
4/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71334.jpg142
.........
99995/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63864.jpg127
99996/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63822.jpg127
99997/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63874.jpg127
99998/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63824.jpg127
99999/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63752.jpg127
\n", "

100000 rows × 2 columns

\n", "
" ], "text/plain": [ " filename label\n", "0 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71384.jpg 142\n", "1 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71204.jpg 142\n", "2 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71036.jpg 142\n", "3 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71014.jpg 142\n", "4 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/142/71334.jpg 142\n", "... ... ...\n", "99995 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63864.jpg 127\n", "99996 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63822.jpg 127\n", "99997 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63874.jpg 127\n", "99998 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63824.jpg 127\n", "99999 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/127/63752.jpg 127\n", "\n", "[100000 rows x 2 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.annotations" ] }, { "cell_type": "markdown", "id": "6aac94ea", "metadata": {}, "source": [ "## Run fastdup" ] }, { "cell_type": "code", "execution_count": 10, "id": "8e90af72", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warning: fastdup create() without work_dir argument, output is stored in a folder named work_dir in your current working path.\n", "FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.\n", "\n", "\n", " \n", " ad88 88 \n", " d8\" ,d 88 \n", " 88 88 88 \n", "MM88MMM ,adPPYYba, ,adPPYba, MM88MMM ,adPPYb,88 88 88 8b,dPPYba, \n", " 88 \"\" `Y8 I8[ \"\" 88 a8\" `Y88 88 88 88P' \"8a \n", " 88 ,adPPPPP88 `\"Y8ba, 88 8b 88 88 88 88 d8 \n", " 88 88, ,88 aa ]8I 88, \"8a, ,d88 \"8a, ,a88 88b, ,a8\" \n", " 88 `\"8bbdP\"Y8 `\"YbbdP\"' \"Y888 `\"8bbdP\"Y8 `\"YbbdP'Y8 88`YbbdP\"' \n", " 88 \n", " 88 \n", "\n", "\n", "2023-09-19 21:40:45 [INFO] Going to loop over dir /tmp/tmpjb5y9uvf.csv\n", "2023-09-19 21:40:45 [INFO] Found total 100000 images to run on, 100000 train, 0 test, name list 100000, counter 100000 \n", "2023-09-19 21:44:25 [INFO] Found total 100000 images to run onmated: 0 Minutes\n", "Finished histogram 38.760\n", "Finished bucket sort 38.971\n", "2023-09-19 21:44:45 [INFO] 20456) Finished write_index() NN model\n", "2023-09-19 21:44:45 [INFO] Stored nn model index file work_dir/nnf.index\n", "2023-09-19 21:44:56 [INFO] Total time took 251595 ms\n", "2023-09-19 21:44:56 [INFO] Found a total of 40 fully identical images (d>0.990), which are 0.02 % of total graph edges\n", "2023-09-19 21:44:56 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 % of total graph edges\n", "2023-09-19 21:44:56 [INFO] Found a total of 11083 above threshold images (d>0.900), which are 5.54 % of total graph edges\n", "2023-09-19 21:44:56 [INFO] Found a total of 10001 outlier images (d<0.050), which are 5.00 % of total graph edges\n", "2023-09-19 21:44:56 [INFO] Min distance found 0.601 max distance 1.000\n", "2023-09-19 21:44:56 [INFO] Running connected components for ccthreshold 0.960000 \n", ".0\n", " ########################################################################################\n", "\n", "Dataset Analysis Summary: \n", "\n", " Dataset contains 100000 images\n", " Valid images are 100.00% (100,000) of the data, invalid are 0.00% (0) of the data\n", " Similarity: 0.05% (46) belong to 1 similarity clusters (components).\n", " 99.95% (99,954) images do not belong to any similarity cluster.\n", " Largest cluster has 4 (0.00%) images.\n", " For a detailed analysis, use `.connected_components()`\n", "(similarity threshold used is 0.9, connected component threshold used is 0.96).\n", "\n", " Outliers: 6.34% (6,344) of images are possible outliers, and fall in the bottom 5.00% of similarity values.\n", " For a detailed list of outliers, use `.outliers()`.\n", "\n", "########################################################################################\n", "Would you like to see awesome visualizations for some of the most popular academic datasets?\n", "Click here to see and learn more: https://app.visual-layer.com/vl-datasets?utm_source=fastdup\n", "########################################################################################\n" ] }, { "data": { "text/plain": [ "0" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd = fastdup.create(input_dir=dataset.img_dir)\n", "fd.run(annotations=dataset.annotations)" ] }, { "cell_type": "markdown", "id": "676d9175", "metadata": {}, "source": [ "## Inspect Issues" ] }, { "cell_type": "markdown", "id": "1017106b", "metadata": {}, "source": [ "There are several methods we can use to inspect the issues found:\n", "\n", "```python\n", "fd.vis.duplicates_gallery() # create a visual gallery of duplicates\n", "fd.vis.outliers_gallery() # create a visual gallery of anomalies\n", "fd.vis.component_gallery() # create a visualization of connected components\n", "fd.vis.stats_gallery() # create a visualization of images statistics (e.g. blur)\n", "fd.vis.similarity_gallery() # create a gallery of similar images\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "id": "8f558b89", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:100: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n", "/home/dnth/anaconda3/envs/fastdup/lib/python3.10/site-packages/fastdup/galleries.py:100: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " df[out_col] = df[in_col].apply(lambda x: get_label_func.get(x, MISSING_LABEL))\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "abca93b0350b4db8b9f144bbe2543bb9", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Duplicates Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Duplicates Report

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/199/99657.jpg
To/178/89263.jpg
From_Label199
To_Label178
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/199/99613.jpg
To/178/89460.jpg
From_Label199
To_Label178
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/177/88577.jpg
To/199/99767.jpg
From_Label177
To_Label199
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/190/95277.jpg
To/13/6631.jpg
From_Label190
To_Label13
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/198/99073.jpg
To/14/7463.jpg
From_Label198
To_Label14
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/37/18797.jpg
To/35/17643.jpg
From_Label37
To_Label35
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/8/4204.jpg
To/141/70895.jpg
From_Label8
To_Label141
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/102/51355.jpg
To/138/69225.jpg
From_Label102
To_Label138
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/67/33640.jpg
To/125/62558.jpg
From_Label67
To_Label125
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/180/90258.jpg
To/174/87495.jpg
From_Label180
To_Label174
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/125/62815.jpg
To/67/33973.jpg
From_Label125
To_Label67
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance1.0
From/17/8525.jpg
To/16/8111.jpg
From_Label17
To_Label16
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.duplicates_gallery()" ] }, { "cell_type": "code", "execution_count": 12, "id": "de484e82", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cdef5a592f57462195f2cbb6fccc6822", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/20 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Outliers Report\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Outliers Report

Showing image outliers, one per row

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.600712
Path/198/99254.jpg
label198
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.639867
Path/12/6152.jpg
label12
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.642672
Path/94/47232.jpg
label94
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.654982
Path/35/17626.jpg
label35
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.663625
Path/10/5240.jpg
label10
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.665014
Path/173/86745.jpg
label173
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.666785
Path/197/98818.jpg
label197
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.668334
Path/54/27267.jpg
label54
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.668349
Path/78/39235.jpg
label78
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.668735
Path/196/98461.jpg
label196
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.66936
Path/54/27129.jpg
label54
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.671666
Path/84/42148.jpg
label84
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.673422
Path/94/47006.jpg
label94
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.67446
Path/196/98207.jpg
label196
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.674789
Path/196/98021.jpg
label196
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.676092
Path/197/98911.jpg
label197
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.677318
Path/87/43785.jpg
label87
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.678071
Path/160/80147.jpg
label160
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.678366
Path/140/70208.jpg
label140
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info
Distance0.679228
Path/196/98027.jpg
label196
\n", "
\n", "
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.outliers_gallery()" ] }, { "cell_type": "code", "execution_count": 13, "id": "c5a7080b-04ff-42e3-8bdc-eb91d16e695d", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "eb733f841f1349aabcf170f09495074c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Generating gallery: 0%| | 0/7287 [00:00\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Similarity Report, label_score\n", " \n", " \n", "\n", "\n", "\n", "
\n", "
\n", "
\n", " \n", " \"logo\"\n", " \n", "
\n", " \n", "
\n", "
\n", "
\n", "

Similarity Report, label_score

\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label0
from/0/35.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906011/85/42517.jpg85
0.905423/190/95331.jpg190
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/513.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.911764/85/42716.jpg85
0.907565/166/83446.jpg166
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/515.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.933797/9/4557.jpg9
0.931858/93/46608.jpg93
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/521.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.916001/5/2756.jpg5
0.915641/5/2731.jpg5
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/650.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.923444/7/3800.jpg7
0.909015/17/8647.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/657.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.930722/166/83497.jpg166
0.930567/17/8749.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/671.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.925565/198/99447.jpg198
0.917802/17/8681.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/692.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.915914/15/7715.jpg15
0.907856/2/1496.jpg2
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/712.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906601/197/98725.jpg197
0.903525/196/98381.jpg196
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/732.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906571/5/2741.jpg5
0.900979/9/4555.jpg9
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/737.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.930051/17/8949.jpg17
0.926385/35/17505.jpg35
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/763.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.942992/195/97948.jpg195
0.940392/17/8642.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/769.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.923526/46/23481.jpg46
0.914404/5/2583.jpg5
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/852.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.923934/148/74041.jpg148
0.920057/7/3839.jpg7
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/857.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.906768/24/12085.jpg24
0.904972/191/95642.jpg191
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/868.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.909222/3/1599.jpg3
0.905293/111/55995.jpg111
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/871.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.914131/145/72763.jpg145
0.913387/9/4771.jpg9
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/899.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.911091/7/3800.jpg7
0.905312/5/2839.jpg5
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/948.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.924178/40/20152.jpg40
0.921691/198/99487.jpg198
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", "\n", "\n", " \n", " \n", "\n", " \n", "
Info From
label1
from/1/964.jpg
\n", "
\n", "
\n", "
\n", "
\n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", "\n", " \n", "
Info To
0.939224/35/17505.jpg35
0.939199/17/8749.jpg17
\n", "
\n", "
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Query Image
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t\t\n", "\t\t\t\t\t\t\t
Similar
\n", "\t\t\t\t\t\t
\n", "\t\t\t\t\t
\n", "
\n", " \n", "
\n", "
\n", " \n", "
\n", " \n", " " ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
fromtolabellabel2distancescorelength
4/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/0/35.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/190/95331.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42517.jpg][0, 0][190, 85][0.905423, 0.906011]0.02
14/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/513.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/166/83446.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42716.jpg][1, 1][166, 85][0.907565, 0.911764]0.02
16/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/515.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/93/46608.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/9/4557.jpg][1, 1][93, 9][0.931858, 0.933797]0.02
19/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/521.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2731.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2756.jpg][1, 1][5, 5][0.915641, 0.916001]0.02
68/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/650.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/17/8647.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/7/3800.jpg][1, 1][17, 7][0.909015, 0.923444]0.02
........................
7273/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49882.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49544.jpg][99, 99][99, 99][0.904351, 0.913638]100.02
7275/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49895.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49799.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49806.jpg][99, 99][99, 99][0.92261, 0.92414]100.02
7279/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49919.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49734.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49875.jpg][99, 99][99, 99][0.904262, 0.913118]100.02
7283/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49940.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49877.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49858.jpg][99, 99][99, 99][0.91175, 0.914616]100.02
7285/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg[/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49644.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49705.jpg][99, 99][99, 99][0.913667, 0.917606]100.02
\n", "

3796 rows × 7 columns

\n", "
" ], "text/plain": [ " from to label label2 distance score length\n", "4 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/0/35.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/190/95331.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42517.jpg] [0, 0] [190, 85] [0.905423, 0.906011] 0.0 2\n", "14 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/513.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/166/83446.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/85/42716.jpg] [1, 1] [166, 85] [0.907565, 0.911764] 0.0 2\n", "16 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/515.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/93/46608.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/9/4557.jpg] [1, 1] [93, 9] [0.931858, 0.933797] 0.0 2\n", "19 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/521.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2731.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/5/2756.jpg] [1, 1] [5, 5] [0.915641, 0.916001] 0.0 2\n", "68 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/1/650.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/17/8647.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/7/3800.jpg] [1, 1] [17, 7] [0.909015, 0.923444] 0.0 2\n", "... ... ... ... ... ... ... ...\n", "7273 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49882.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49544.jpg] [99, 99] [99, 99] [0.904351, 0.913638] 100.0 2\n", "7275 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49895.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49799.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49806.jpg] [99, 99] [99, 99] [0.92261, 0.92414] 100.0 2\n", "7279 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49919.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49734.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49875.jpg] [99, 99] [99, 99] [0.904262, 0.913118] 100.0 2\n", "7283 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49940.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49877.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49858.jpg] [99, 99] [99, 99] [0.91175, 0.914616] 100.0 2\n", "7285 /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49977.jpg [/home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49644.jpg, /home/dnth/.cache/huggingface/datasets/tiny-imagenet/jpg_images/99/49705.jpg] [99, 99] [99, 99] [0.913667, 0.917606] 100.0 2\n", "\n", "[3796 rows x 7 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fd.vis.similarity_gallery(slice='diff')" ] }, { "cell_type": "markdown", "id": "a4eb87fa", "metadata": {}, "source": [ "## Wrap Up\n", "\n", "That's a wrap! In this notebook we showed how you can run fastdup on a Hugging Face Dataset. You can use similar methods to run on other similar datasets on [Huggging Face Datasets](https://huggingface.co/datasets).\n", "\n", "Try it out and let us know what issues you find.\n", "\n", "\n", "Next, feel free to check out other tutorials -\n", "\n", "+ ⚡ [**Quickstart**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/quick-dataset-analysis.ipynb): Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!\n", "+ 🧹 [**Clean Image Folder**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/cleaning-image-dataset.ipynb): Learn how to analyze and clean a folder of images from potential issues and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start.\n", "+ 🖼 [**Analyze Image Classification Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-image-classification-dataset.ipynb): Learn how to load a labeled image classification dataset and analyze for potential issues. If you have labeled ImageNet-style folder structure, have a go!\n", "+ 🎁 [**Analyze Object Detection Dataset**](https://nbviewer.org/github/visual-layer/fastdup/blob/main/examples/analyzing-object-detection-dataset.ipynb): Learn how to load bounding box annotations for object detection and analyze for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try. " ] }, { "cell_type": "markdown", "id": "4c372614-6511-45b1-ad09-9e99828ac0f5", "metadata": {}, "source": [ "\n", "## VL Profiler - A faster and easier way to diagnose and visualize dataset issues\n", "\n", "If you prefer a no-code platform to inspect and visualize your dataset, [**try our free cloud product VL Profiler**](https://app.visual-layer.com) - VL Profiler is our first no-code commercial product that lets you visualize and inspect your dataset in your browser. \n", "\n", "VL Profiler is free to get started. Upload up to 1,000,000 images for analysis at zero cost!\n", "\n", "[Sign up](https://app.visual-layer.com) now.\n", "\n", "[![image](https://raw.githubusercontent.com/visual-layer/fastdup/main/gallery/github_banner_profiler.gif)](https://app.visual-layer.com)\n", "\n", "As usual, feedback is welcome! Questions? Drop by our [Slack channel](https://visualdatabase.slack.com/join/shared_invite/zt-19jaydbjn-lNDEDkgvSI1QwbTXSY6dlA#/shared-invite/email) or open an issue on [GitHub](https://github.com/visual-layer/fastdup/issues)." ] }, { "cell_type": "markdown", "id": "e4dec7f1-c7c3-4b4b-9fba-f3867768d056", "metadata": {}, "source": [ "
\n", " \n", " \n", " \n", " \n", " \"vl\n", " \n", "
\n", " GitHub •\n", " Join Slack Community •\n", " Discussion Forum \n", "
\n", "\n", "
\n", " Blog •\n", " Documentation •\n", " About Us \n", "
\n", "\n", "
\n", " LinkedIn •\n", " Twitter \n", "
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 5 }