{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Please cite us if you use the software

\n", "\n", "\n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Distance/Similarity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyCM's `distance` method provides users with a wide range of string distance/similarity metrics to evaluate a confusion matrix by measuring its distance to a perfect confusion matrix. Distance/Similarity metrics measure the distance between two vectors of numbers. Small distances between two objects indicate similarity. In PyCM's `distance` method, a distance measure can be chosen from `DistanceType`. The measures' names are chosen based on the naming style suggested in [[1]](#ref1)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pycm import ConfusionMatrix, DistanceType" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "cm = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$TP \\rightarrow True Positive$$\n", "$$TN \\rightarrow True Negative$$\n", "$$FP \\rightarrow False Positive$$\n", "$$FN \\rightarrow False Negative$$\n", "$$POP \\rightarrow Population$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AMPLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "AMPLE similarity [[2]](#ref2) [[3]](#ref3)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{AMPLE}=|\\frac{TP}{TP+FP}-\\frac{FN}{FN+TN}|$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6, 1: 0.3, 2: 0.17142857142857143}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.AMPLE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Anderberg's D" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Anderberg's D similarity [[4]](#ref4)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Anderberg} =\n", "\\frac{(max(TP,FP)+max(FN,TN)+max(TP,FN)+max(FP,TN))-\n", "(max(TP+FN,FP+TN)+max(TP+FP,FN+TN))}{2\\times POP}$$" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.16666666666666666, 1: 0.0, 2: 0.041666666666666664}" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Anderberg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Andres & Marzo's Delta" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Andres & Marzo's Delta correlation [[5]](#ref5)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{AndresMarzo_\\Delta} = \\Delta =\n", "\\frac{TP+TN-2 \\times \\sqrt{FP \\times FN}}{POP}$$" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.8333333333333334, 1: 0.5142977396044842, 2: 0.17508504286947035}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.AndresMarzoDelta)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baroni-Urbani & Buser I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baroni-Urbani & Buser I similarity [[6]](#ref6)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{BaroniUrbaniBuserI} =\n", "\\frac{\\sqrt{TP\\times TN}+TP}{\\sqrt{TP\\times TN}+TP+FP+FN}$$" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.79128784747792, 1: 0.5606601717798213, 2: 0.5638559245324765}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaroniUrbaniBuserI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baroni-Urbani & Buser II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baroni-Urbani & Buser II correlation [[6]](#ref6)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{BaroniUrbaniBuserII} =\n", "\\frac{\\sqrt{TP \\times TN}+TP-FP-FN}{\\sqrt{TP \\times TN}+TP+FP+FN}$$" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.58257569495584, 1: 0.12132034355964261, 2: 0.1277118490649528}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaroniUrbaniBuserII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Batagelj & Bren" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Batagelj & Bren distance [[7]](#ref7)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BatageljBren} =\n", "\\frac{FP \\times FN}{TP \\times TN}$$" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.0, 1: 0.25, 2: 0.5}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BatageljBren)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu I distance [[8]](#ref8)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuI} =\n", "\\frac{(TP+FP) \\times (TP+FN)-TP^2}{(TP+FP) \\times (TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.4, 1: 0.8333333333333334, 2: 0.7}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu II similarity [[8]](#ref8)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{BaulieuII} =\n", "\\frac{TP^2 \\times TN^2}{(TP+FP) \\times (TP+FN) \\times (FP+TN) \\times (FN+TN)}$$" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.4666666666666667, 1: 0.11851851851851852, 2: 0.11428571428571428}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu III" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu III distance [[8]](#ref8)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuIII} =\n", "\\frac{POP^2 - 4 \\times (TP \\times TN-FP \\times FN)}{2 \\times POP^2}$$" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.20833333333333334, 1: 0.4166666666666667, 2: 0.4166666666666667}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuIII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu IV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu IV distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuIV} = \\frac{FP+FN-(TP+\\frac{1}{2})\\times(TN+\\frac{1}{2})\\times TN \\times k}{POP}$$" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: -41.45702383161246, 1: -22.855395541901885, 2: -13.85431293274332}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuIV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* The default value of k is Euler's number $e$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu V" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu V distance [[9]](#ref9)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuV} = \\frac{FP+FN+1}{TP+FP+FN+1}$$" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5, 1: 0.8, 2: 0.6666666666666666}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu VI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu VI distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuVI} = \\frac{FP+FN}{TP+FP+FN+1}$$" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.3333333333333333, 1: 0.6, 2: 0.5555555555555556}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuVI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu VII" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu VII distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuVII} = \\frac{FP+FN}{POP + TP \\times (TP-4)^2}$$" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.13333333333333333, 1: 0.14285714285714285, 2: 0.3333333333333333}" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuVII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu VIII" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu VIII distance [[9]](#ref9)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuVIII} = \\frac{(FP-FN)^2}{POP^2}$$" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.027777777777777776, 1: 0.006944444444444444, 2: 0.006944444444444444}" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuVIII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu IX" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu IX distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuIX} = \\frac{FP+2 \\times FN}{TP+FP+2 \\times FN+TN}$$" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.16666666666666666, 1: 0.35714285714285715, 2: 0.5333333333333333}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuIX)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu X" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu X distance [[9]](#ref9)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuX} = \\frac{FP+FN+max(FP,FN)}{POP+max(FP,FN)}$$" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.2857142857142857, 1: 0.35714285714285715, 2: 0.5333333333333333}" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuX)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XI distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXI} = \\frac{FP+FN}{FP+FN+TN}$$" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.2222222222222222, 1: 0.2727272727272727, 2: 0.5555555555555556}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XII" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XII distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXII} = \\frac{FP+FN}{TP+FP+FN-1}$$" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5, 1: 1.0, 2: 0.7142857142857143}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XIII" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XIII distance [[9]](#ref9)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXIII} = \\frac{FP+FN}{TP+FP+FN+TP \\times (TP-4)^2}$$" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.25, 1: 0.23076923076923078, 2: 0.45454545454545453}" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXIII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XIV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XIV distance [[9]](#ref9)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXIV} = \\frac{FP+2 \\times FN}{TP+FP+2 \\times FN}$$" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.4, 1: 0.8333333333333334, 2: 0.7272727272727273}" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXIV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Baulieu XV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Baulieu XV distance [[9]](#ref9)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{BaulieuXV} = \\frac{FP+FN+max(FP, FN)}{TP+FP+FN+max(FP, FN)}$$" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5714285714285714, 1: 0.8333333333333334, 2: 0.7272727272727273}" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BaulieuXV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Benini I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Benini I correlation [[10]](#ref10)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{BeniniI} = \\frac{TP \\times TN-FP \\times FN}{(TP+FN)\\times(FN+TN)}$$" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.0, 1: 0.2, 2: 0.14285714285714285}" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BeniniI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Benini II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Benini II correlation [[10]](#ref10)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{BeniniII} = \\frac{TP \\times TN-FP \\times FN}{min((TP+FN)\\times(FN+TN), (TP+FP)\\times(FP+TN))}$$" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.0, 1: 0.3333333333333333, 2: 0.2}" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.BeniniII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Canberra" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Canberra distance [[11]](#ref11) [[12]](#ref12)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$dist_{Canberra} =\n", "\\frac{FP+FN}{(TP+FP)+(TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.25, 1: 0.6, 2: 0.45454545454545453}" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Canberra)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Clement" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clement similarity [[13]](#ref13)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Clement} =\n", "\\frac{TP}{TP+FP}\\times\\Big(1 - \\frac{TP+FP}{POP}\\Big) +\n", "\\frac{TN}{FN+TN}\\times\\Big(1 - \\frac{FN+TN}{POP}\\Big)$$" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.7666666666666666, 1: 0.55, 2: 0.588095238095238}" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Clement)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini I similarity [[14]](#ref14)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ConsonniTodeschiniI} =\n", "\\frac{log(1+TP+TN)}{log(1+POP)}$$" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.9348704159880586, 1: 0.8977117175026231, 2: 0.8107144632819592}" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini II similarity [[14]](#ref14)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ConsonniTodeschiniII} =\n", "\\frac{log(1+POP)-log(1+FP+FN)}{log(1+POP)}$$" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5716826589686053, 1: 0.4595236911453605, 2: 0.3014445045412856}" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini III" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini III similarity [[14]](#ref14)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ConsonniTodeschiniIII} =\n", "\\frac{log(1+TP)}{log(1+POP)}$$" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5404763088546395, 1: 0.27023815442731974, 2: 0.5404763088546395}" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniIII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini IV" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini IV similarity [[14]](#ref14)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ConsonniTodeschiniIV} =\n", "\\frac{log(1+TP)}{log(1+TP+FP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.7737056144690831, 1: 0.43067655807339306, 2: 0.6309297535714574}" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniIV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Consonni & Todeschini V" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Consonni & Todeschini V correlation [[14]](#ref14)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{ConsonniTodeschiniV} =\n", "\\frac{log(1+TP \\times TN)-log(1+FP \\times FN)}{log(1+\\frac{POP^2}{4})}$$" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.8560267854703983, 1: 0.30424737289682985, 2: 0.17143541431350617}" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ConsonniTodeschiniV)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dennis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dennis similarity [[15]](#ref15)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Dennis} =\n", "\\frac{TP-\\frac{(TP+FP)\\times(TP+FN)}{POP}}{\\sqrt{\\frac{(TP+FP)\\times(TP+FN)}{POP}}}$$" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.5652475842498528, 1: 0.7071067811865475, 2: 0.31622776601683794}" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Dennis)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Digby" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Digby correlation [[16]](#ref16)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{Digby} =\n", "\\frac{(TP \\times TN) ^\\frac{3}{4}-(FP \\times FN)^\\frac{3}{4}}{(TP \\times TN)^\\frac{3}{4}+(FP \\times FN)^\\frac{3}{4}}$$" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.0, 1: 0.47759225007251715, 2: 0.2542302383508219}" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Digby)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dispersion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dispersion correlation [[17]](#ref17)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{dispersion} =\n", "\\frac{TP \\times TN -FP \\times FN}{POP^2}\n", "$$" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.14583333333333334, 1: 0.041666666666666664, 2: 0.041666666666666664}" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Dispersion)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Doolittle" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Doolittle similarity [[18]](#ref18)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Doolittle} =\n", "\\frac{(TP\\times POP - (TP+FP)\\times(TP+FN))^2}{(TP+FP)\\times(TP+FN)\\times(FP+TN)\\times(FN+TN)}$$" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.4666666666666667, 1: 0.06666666666666667, 2: 0.02857142857142857}" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Doolittle)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Eyraud" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Eyraud similarity [[19]](#ref19)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Eyraud} =\n", "\\frac{TP-(TP+FP)\\times(TP+FN)}{(TP+FP)\\times(TP+FN)\\times(FP+TN)\\times(FN+TN)}$$" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: -0.012698412698412698, 1: -0.009259259259259259, 2: -0.02142857142857143}" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Eyraud)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fager & McGowan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fager & McGowan similarity [[20]](#ref20) [[21]](#ref21)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{FagerMcGowan} =\n", "\\frac{TP}{\\sqrt{(TP+FP)\\times(TP+FN)}} - \\frac{1}{2\\sqrt{max(TP+FP, TP+FN)}}$$" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5509898714915045, 1: 0.11957315586905015, 2: 0.3435984122732345}" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.FagerMcGowan)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Faith" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Faith similarity [[22]](#ref22)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Faith} =\n", "\\frac{TP+\\frac{TN}{2}}{POP}$$" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5416666666666666, 1: 0.4166666666666667, 2: 0.4166666666666667}" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Faith)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fleiss-Levin-Paik" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fleiss-Levin-Paik similarity [[23]](#ref23)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{FleissLevinPaik} =\n", "\\frac{2 \\times TN}{2 \\times TN + FP + FN}$$" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.875, 1: 0.8421052631578947, 2: 0.6153846153846154}" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.FleissLevinPaik)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Forbes I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Forbes I similarity [[24]](#ref24) [[25]](#ref25)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{ForbesI} =\n", "\\frac{POP \\times TP}{(TP+FP)\\times(TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 2.4, 1: 2.0, 2: 1.2}" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ForbesI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Forbes II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Forbes II correlation [[26]](#ref26)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{ForbesII} =\n", "\\frac{FP \\times FN-TP \\times TN}{(TP+FP)\\times(TP+FN) - POP \\times min(TP+FP, TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 1.0, 1: 0.3333333333333333, 2: 0.2}" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.ForbesII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fossum" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fossum similarity [[27]](#ref27)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Fossum} =\n", "\\frac{POP \\times (TP-\\frac{1}{2})^2}{(TP+FP)\\times(TP+FN)}$$" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 5.0, 1: 0.5, 2: 2.5}" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Fossum)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gilbert & Wells" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Gilbert & Wells similarity [[28]](#ref28)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{GilbertWells} =\n", "ln \\frac{POP^3}{2\\pi (TP+FP)\\times(TP+FN)\\times(FP+TN)\\times(FN+TN)} +\n", "2ln \\frac{POP! \\times TP! \\times FP! \\times FN! \\times TN!}{(TP+FP)! \\times (TP+FN)! \\times (FP+TN)! \\times (FN+TN)!}$$" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 4.947742862177545, 1: 1.1129094954405283, 2: 0.4195337173255813}" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GilbertWells)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goodall" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Goodall similarity [[29]](#ref29) [[30]](#ref30)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Goodall} =\\frac{2}{\\pi} \\sin^{-1}\\Big(\n", "\\sqrt{\\frac{TP + TN}{POP}}\n", "\\Big)$$" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.7322795271987701, 1: 0.6666666666666666, 2: 0.5533003790381138}" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Goodall)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goodman & Kruskal's Lambda" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Goodman & Kruskal's Lambda similarity [[31]](#ref31)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{GK_\\lambda} =\n", "\\frac{\\frac{1}{2}((max(TP,FP)+max(FN,TN)+max(TP,FN)+max(FP,TN))-\n", "(max(TP+FP,FN+TN)+max(TP+FN,FP+TN)))}\n", "{POP-\\frac{1}{2}(max(TP+FP,FN+TN)+max(TP+FN,FP+TN))}$$" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5, 1: 0.0, 2: 0.09090909090909091}" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GoodmanKruskalLambda)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goodman & Kruskal Lambda-r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Goodman & Kruskal Lambda-r correlation [[31]](#ref31)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{GK_{\\lambda_r}} =\n", "\\frac{TP + TN - \\frac{1}{2}(max(TP+FP,FN+TN)+max(TP+FN,FP+TN))}\n", "{POP - \\frac{1}{2}(max(TP+FP,FN+TN)+max(TP+FN,FP+TN))}\n", "$$" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.5, 1: -0.2, 2: 0.09090909090909091}" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GoodmanKruskalLambdaR)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Guttman's Lambda A" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Guttman's Lambda A similarity [[32]](#ref32)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Guttman_{\\lambda_a}} =\n", "\\frac{max(TP, FN) + max(FP, TN) - max(TP+FP, FN+TN)}{POP - max(TP+FP, FN+TN)}\n", "$$" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6, 1: 0.0, 2: 0.0}" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GuttmanLambdaA)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Guttman's Lambda B" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Guttman's Lambda B similarity [[32]](#ref32)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{Guttman_{\\lambda_b}} =\n", "\\frac{max(TP, FP) + max(FN, TN) - max(TP+FN, FP+TN)}{POP - max(TP+FN, FP+TN)}\n", "$$" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.3333333333333333, 1: 0.0, 2: 0.16666666666666666}" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.GuttmanLambdaB)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hamann" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hamann correlation [[33]](#ref33)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{Hamann} =\n", "\\frac{TP+TN-FP-FN}{POP}\n", "$$" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6666666666666666, 1: 0.5, 2: 0.16666666666666666}" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.Hamann)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Harris & Lahey" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Harris & Lahey similarity [[34]](#ref34)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{HarrisLahey} =\n", "\\frac{TP}{TP+FP+FN} \\times \\frac{2TN+FP+FN}{2POP}+\n", "\\frac{TN}{TN+FP+FN} \\times \\frac{2TP+FP+FN}{2POP}\n", "$$" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6592592592592592, 1: 0.3494318181818182, 2: 0.4068287037037037}" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.HarrisLahey)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Hawkins & Dotson" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hawkins & Dotson similarity [[35]](#ref35)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{HawkinsDotson} =\n", "\\frac{1}{2} \\times \\Big(\\frac{TP}{TP+FP+FN}+\\frac{TN}{FP+FN+TN}\\Big)\n", "$$" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6888888888888889, 1: 0.48863636363636365, 2: 0.4097222222222222}" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.HawkinsDotson)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kendall's Tau" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kendall's Tau correlation [[36]](#ref36)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$corr_{KendallTau} =\n", "\\frac{2 \\times (TP+TN-FP-FN)}{POP \\times (POP-1)}\n", "$$" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.12121212121212122, 1: 0.09090909090909091, 2: 0.030303030303030304}" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KendallTau)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kent & Foster I" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kent & Foster I similarity [[37]](#ref37)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{KentFosterI} =\n", "\\frac{TP-\\frac{(TP+FP)\\times(TP+FN)}{TP+FP+FN}}{TP-\\frac{(TP+FP)\\times(TP+FN)}{TP+FP+FN}+FP+FN}\n", "$$" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.0, 1: -0.2, 2: -0.17647058823529413}" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KentFosterI)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Kent & Foster II" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kent & Foster II similarity [[37]](#ref37)." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{KentFosterII} =\n", "\\frac{TN-\\frac{(FP+TN)\\times(FN+TN)}{FP+FN+TN}}{TN-\\frac{(FP+TN)\\times(FN+TN)}{FP+FN+TN}+FP+FN}\n", "$$" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.0, 1: -0.06451612903225801, 2: -0.15384615384615394}" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.KentFosterII)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
1- C. C. Little, \"Abydos Documentation,\" 2018.
\n", "\n", "
2- V. Dallmeier, C. Lindig, and A. Zeller, \"Lightweight defect localization for Java,\" in European Conference on Object-Oriented Programming, 2005: Springer, pp. 528-550.
\n", "\n", "
3- R. Abreu, P. Zoeteweij, and A. J. Van Gemund, \"An evaluation of similarity coefficients for software fault localization,\" in 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06), 2006: IEEE, pp. 39-46.
\n", "\n", "
4- M. R. Anderberg, Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks. Academic Press, 2014.
\n", "\n", "
5- A. M. Andrés and P. F. Marzo, \"Delta: A new measure of agreement between two raters,\" British Journal of Mathematical and Statistical Psychology, vol. 57, no. 1, pp. 1-19, 2004.
\n", "\n", "
6- C. Baroni-Urbani and M. W. Buser, \"Similarity of binary data,\" Systematic Zoology, vol. 25, no. 3, pp. 251-259, 1976.
\n", "\n", "
7- V. Batagelj and M. Bren, \"Comparing resemblance measures,\" Journal of Classification, vol. 12, no. 1, pp. 73-90, 1995.
\n", "\n", "
8- F. B. Baulieu, \"A classification of presence/absence based dissimilarity coefficients,\" Journal of Classification, vol. 6, no. 1, pp. 233-246, 1989.
\n", "\n", "
9- F. B. Baulieu, \"Two variant axiom systems for presence/absence based dissimilarity coefficients,\" Journal of Classification, vol. 14, no. 1, pp. 159-170, 1997.
\n", "\n", "
10- R. Benini, Principii di demografia. Barbera, 1901.
\n", "\n", "
11- G. N. Lance and W. T. Williams, \"Computer programs for hierarchical polythetic classification (“similarity analyses”),\" The Computer Journal, vol. 9, no. 1, pp. 60-64, 1966.
\n", "\n", "
12- G. N. Lance and W. T. Williams, \"Mixed-Data Classificatory Programs I - Agglomerative Systems,\" Australian Computer Journal, vol. 1, no. 1, pp. 15-20, 1967.
\n", "\n", "
13- P. W. Clement, \"A formula for computing inter-observer agreement,\" Psychological Reports, vol. 39, no. 1, pp. 257-258, 1976.
\n", "\n", "
14- V. Consonni and R. Todeschini, \"New similarity coefficients for binary data,\" Match-Communications in Mathematical and Computer Chemistry, vol. 68, no. 2, p. 581, 2012.
\n", "\n", "
15- S. F. Dennis, \"The Construction of a Thesaurus Automatically From,\" in Statistical Association Methods for Mechanized Documentation: Symposium Proceedings, 1965, vol. 269: US Government Printing Office, p. 61.
\n", "\n", "
16- P. G. Digby, \"Approximating the tetrachoric correlation coefficient,\" Biometrics, pp. 753-757, 1983.
\n", "\n", "
17- IBM Corp, \"IBM SPSS Statistics Algorithms,\" ed: IBM Corp Armonk, NY, USA, 2017.
\n", "\n", "
18- M. H. Doolittle, \"The verification of predictions,\" Bulletin of the Philosophical Society of Washington, vol. 7, pp. 122-127, 1885.
\n", "\n", "
19- H. Eyraud, \"Les principes de la mesure des correlations,\" Ann. Univ. Lyon, III. Ser., Sect. A, vol. 1, no. 30-47, p. 111, 1936.
\n", "\n", "
20- E. W. Fager, \"Determination and analysis of recurrent groups,\" Ecology, vol. 38, no. 4, pp. 586-595, 1957.
\n", "\n", "
21- E. W. Fager and J. A. McGowan, \"Zooplankton Species Groups in the North Pacific: Co-occurrences of species can be used to derive groups whose members react similarly to water-mass types,\" Science, vol. 140, no. 3566, pp. 453-460, 1963.
\n", "\n", "
22- D. P. Faith, \"Asymmetric binary similarity measures,\" Oecologia, vol. 57, pp. 287-290, 1983.
\n", "\n", "
23- J. L. Fleiss, B. Levin, and M. C. Paik, Statistical methods for rates and proportions. John Wiley & Sons, 2013.
\n", "\n", "
24- S. A. Forbes, On the local distribution of certain Illinois fishes: an essay in statistical ecology. Illinois State Laboratory of Natural History, 1907.
\n", "\n", "
25- A. Mozley, \"The statistical analysis of the distribution of pond molluscs in western Canada,\" The American Naturalist, vol. 70, no. 728, pp. 237-244, 1936.
\n", "\n", "
26- S. A. Forbes, \"Method of determining and measuring the associative relations of species,\" Science, vol. 61, no. 1585, pp. 518-524, 1925.
\n", "\n", "
27- E. G. Fossum and G. Kaskey, \"Optimization and standardization of information retrieval language and systems,\" SPERRY RAND CORP PHILADELPHIA PA UNIVAC DIV, 1966.
\n", "\n", "
28- N. Gilbert and T. C. Wells, \"Analysis of quadrat data,\" The Journal of Ecology, pp. 675-685, 1966.
\n", "\n", "
29- D. W. Goodall, \"The distribution of the matching coefficient,\" Biometrics, pp. 647-656, 1967.
\n", "\n", "
30- B. Austin and R. R. Colwell, \"Evaluation of some coefficients for use in numerical taxonomy of microorganisms,\" International Journal of Systematic and Evolutionary Microbiology, vol. 27, no. 3, pp. 204-210, 1977.
\n", "\n", "
31- L. A. Goodman and W. H. Kruskal, Measures of association for cross classifications. Springer, 1979.
\n", "\n", "
32- L. Guttman, \"An outline of the statistical theory of prediction,\" The prediction of personal adjustment, vol. 48, pp. 253-318, 1941.
\n", "\n", "
33- U. Hamann, \"Merkmalsbestand und Verwandtschaftsbeziehungen der Farinosae: ein Beitrag zum System der Monokotyledonen,\" Willdenowia, pp. 639-768, 1961.
\n", "\n", "
34- F. C. Harris and B. B. Lahey, \"A method for combining occurrence and nonoccurrence interobserver agreement scores,\" Journal of Applied Behavior Analysis, vol. 11, no. 4, pp. 523-527, 1978.
\n", "\n", "
35- R. P. Hawkins and V. A. Dotson, \"Reliability Scores That Delude: An Alice in Wonderland Trip Through the Misleading Characteristics of Inter-Observer Agreement Scores in Interval Recording,\" 1973.
\n", "\n", "
36- M. G. Kendall, \"A new measure of rank correlation,\" Biometrika, vol. 30, no. 1/2, pp. 81-93, 1938.
\n", "\n", "
37- R. N. Kent and S. L. Foster, \"Direct observational procedures: Methodological issues in naturalistic settings,\" Handbook of behavioral assessment, pp. 279-328, 1977.
" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Distance/Similarity", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }