{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Please cite us if you use the software
\n", "\n", "\n", " \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Distance/Similarity" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "PyCM's `distance` method provides users with a wide range of string distance/similarity metrics to evaluate a confusion matrix by measuring its distance to a perfect confusion matrix. Distance/Similarity metrics measure the distance between two vectors of numbers. Small distances between two objects indicate similarity. In the PyCM's `distance` method, a distance measure can be chosen from `DistanceType`. The measures' names are chosen based on the namig style suggested in [[1]](#ref1)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pycm import ConfusionMatrix, DistanceType" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "cm = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$TP \\rightarrow True Positive$$\n", "$$TN \\rightarrow True Negative$$\n", "$$FP \\rightarrow False Positive$$\n", "$$FN \\rightarrow False Negative$$\n", "$$POP \\rightarrow Population$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AMPLE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "AMPLE similarity [[2]](#ref2) [[3]](#ref3)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$sim_{AMPLE}=|\\frac{TP}{TP+FP}-\\frac{FN}{FN+TN}|$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{0: 0.6, 1: 0.3, 2: 0.17142857142857143}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cm.distance(metric=DistanceType.AMPLE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1- C. C. Little, \"Abydos Documentation,\" 2018.\n", "\n", "
2- V. Dallmeier, C. Lindig, and A. Zeller, \"Lightweight defect localization for Java,\" in European conference on object-oriented programming, 2005: Springer, pp. 528-550.\n", "\n", "
3- R. Abreu, P. Zoeteweij, and A. J. Van Gemund, \"An evaluation of similarity coefficients for software fault localization,\" in 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06), 2006: IEEE, pp. 39-46.\n", "\n", "
4- M. R. Anderberg, Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks. Academic press, 2014.\n", "\n", "
5- A. M. Andrés and P. F. Marzo, \"Delta: A new measure of agreement between two raters,\" British journal of mathematical and statistical psychology, vol. 57, no. 1, pp. 1-19, 2004.\n", "\n", "
6- C. Baroni-Urbani and M. W. Buser, \"Similarity of binary data,\" Systematic Zoology, vol. 25, no. 3, pp. 251-259, 1976.\n", "\n", "
7- V. Batagelj and M. Bren, \"Comparing resemblance measures,\" Journal of classification, vol. 12, no. 1, pp. 73-90, 1995.\n", "\n", "
8- F. B. Baulieu, \"A classification of presence/absence based dissimilarity coefficients,\" Journal of Classification, vol. 6, no. 1, pp. 233-246, 1989.\n", "\n", "
9- F. B. Baulieu, \"Two variant axiom systems for presence/absence based dissimilarity coefficients,\" Journal of Classification, vol. 14, no. 1, pp. 0159-0170, 1997.\n", "\n", "
10- R. Benini, Principii di demografia. Barbera, 1901.\n", "\n", "
11- G. N. Lance and W. T. Williams, \"Computer programs for hierarchical polythetic classification (“similarity analyses”),\" The Computer Journal, vol. 9, no. 1, pp. 60-64, 1966.\n", "\n", "
12- G. N. Lance and W. T. Williams, \"Mixed-Data Classificatory Programs I - Agglomerative Systems,\" Australian Computer Journal, vol. 1, no. 1, pp. 15-20, 1967.\n", "\n", "
13- P. W. Clement, \"A formula for computing inter-observer agreement,\" Psychological Reports, vol. 39, no. 1, pp. 257-258, 1976.\n", "\n", "
14- V. Consonni and R. Todeschini, \"New similarity coefficients for binary data,\" Match-Communications in Mathematical and Computer Chemistry, vol. 68, no. 2, p. 581, 2012.\n", "\n", "
15- S. F. Dennis, \"The Construction of a Thesaurus Automatically From,\" in Statistical Association Methods for Mechanized Documentation: Symposium Proceedings, 1965, vol. 269: US Government Printing Office, p. 61.\n", "\n", "
16- P. G. Digby, \"Approximating the tetrachoric correlation coefficient,\" Biometrics, pp. 753-757, 1983.\n", "\n", "
17- IBM Corp, \"IBM SPSS Statistics Algorithms,\" ed: IBM Corp Armonk, NY, USA, 2017.\n", "\n", "
18- M. H. Doolittle, \"The verification of predictions,\" Bulletin of the Philosophical Society of Washington, vol. 7, pp. 122-127, 1885.\n", "\n", "
19- H. Eyraud, \"Les principes de la mesure des correlations,\" Ann. Univ. Lyon, III. Ser., Sect. A, vol. 1, no. 30-47, p. 111, 1936.\n", "\n", "
20- E. W. Fager, \"Determination and analysis of recurrent groups,\" Ecology, vol. 38, no. 4, pp. 586-595, 1957.\n", "\n", "
21- E. W. Fager and J. A. McGowan, \"Zooplankton Species Groups in the North Pacific: Co-occurrences of species can be used to derive groups whose members react similarly to water-mass types,\" Science, vol. 140, no. 3566, pp. 453-460, 1963.\n", "\n", "
22- D. P. Faith, \"Asymmetric binary similarity measures,\" Oecologia, vol. 57, pp. 287-290, 1983.\n", "\n", "
23- J. L. Fleiss, B. Levin, and M. C. Paik, Statistical methods for rates and proportions. john wiley & sons, 2013.\n", "\n", "
24- S. A. Forbes, On the local distribution of certain Illinois fishes: an essay in statistical ecology. Illinois State Laboratory of Natural History, 1907.\n", "\n", "
25- A. Mozley, \"The statistical analysis of the distribution of pond molluscs in western Canada,\" The American Naturalist, vol. 70, no. 728, pp. 237-244, 1936.\n", "\n", "
26- S. A. Forbes, \"Method of determining and measuring the associative relations of species,\" Science, vol. 61, no. 1585, pp. 518-524, 1925.\n", "\n", "
27- E. G. Fossum and G. Kaskey, \"Optimization and standardization of information retrieval language and systems,\" SPERRY RAND CORP PHILADELPHIA PA UNIVAC DIV, 1966.\n", "\n", "
28- N. Gilbert and T. C. Wells, \"Analysis of quadrat data,\" The Journal of Ecology, pp. 675-685, 1966.\n", "\n", "
29- D. W. Goodall, \"The distribution of the matching coefficient,\" Biometrics, pp. 647-656, 1967.\n", "\n", "
30- B. Austin and R. R. Colwell, \"Evaluation of some coefficients for use in numerical taxonomy of microorganisms,\" International Journal of Systematic and Evolutionary Microbiology, vol. 27, no. 3, pp. 204-210, 1977.\n", "\n", "
31- L. A. Goodman, W. H. Kruskal, L. A. Goodman, and W. H. Kruskal, Measures of association for cross classifications. Springer, 1979.\n", "\n", "
32- L. Guttman, \"An outline of the statistical theory of prediction,\" The prediction of personal adjustment, vol. 48, pp. 253-318, 1941.\n", "\n", "
33- U. Hamann, \"Merkmalsbestand und verwandtschaftsbeziehungen der farinosae: ein beitrag zum system der monokotyledonen,\" Willdenowia, pp. 639-768, 1961.\n", "\n", "
34- F. C. Harris and B. B. Lahey, \"A method for combining occurrence and nonoccurrence interobserver agreement scores,\" Journal of Applied Behavior Analysis, vol. 11, no. 4, pp. 523-527, 1978.\n", "\n", "
35- R. P. Hawkins and V. A. Dotson, \"Reliability Scores That Delude: An Alice in Wonderland Trip Through the Misleading Characteristics of Inter-Observer Agreement Scores in Interval Recording,\" 1973.\n", "\n", "
36- M. G. Kendall, \"A new measure of rank correlation,\" Biometrika, vol. 30, no. 1/2, pp. 81-93, 1938.\n", "\n", "
37- R. N. Kent and S. L. Foster, \"Direct observational procedures: Methodological issues in naturalistic settings,\" Handbook of behavioral assessment, pp. 279-328, 1977." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Distance/Similarity", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }