{ "metadata": { "name": "", "signature": "sha256:44d2aa8ef5d85aec0081655c5f05c38e8c1b6ee3879de1a46ad0476ef3f11c98" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "[Sebastian Raschka](http://www.sebastianraschka.com) \n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%load_ext watermark" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "%watermark -v -d -u" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Last updated: 18/07/2014 \n", "\n", "CPython 3.4.1\n", "IPython 2.1.0\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "[More information](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/ipython_magic/watermark.ipynb) about the `watermark` magic command extension." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This IPython notebook can be found in the GitHub repository [rasbt/algorithms_in_ipython_notebooks](https://github.com/rasbt/algorithms_in_ipython_notebooks)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "I would be happy to hear your comments and suggestions. \n", "Please feel free to drop me a note via\n", "[twitter](https://twitter.com/rasbt), [email](mailto:bluewoodtree@gmail.com), or [google+](https://plus.google.com/118404394130788869227).\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Dixon's Q test to identify outliers for small sample sizes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Dixon's Q test 1 is a convenient way to quickly identify outliers in datasets that only contains a small number of observations: typically $3 \\leq n \\leq 10$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "[1] R. B. Dean and W. J. Dixon (1951) [\"Simplified Statistics for Small Numbers of Observations\"](http://pubs.acs.org/doi/abs/10.1021/ac60052a025). Anal. Chem., 1951, 23 (4), 636\u2013638\n", "
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Sections" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Application](#Application)\n", "\n", "- [Application](#Application)\n", "\n", "- [Criticism](#Criticism)\n", "\n", "- [Method](#Method)\n", "\n", " - [1) Arange values for observations in ascending order](#step1)\n", "\n", " - [2) Calculate Q](#step2)\n", "\n", " - [3) Compare calculated $Q_{exp}$ to the tabulated critical Q-value $Q_{crit}$](#step3)\n", " \n", " - [4) Example](#step4)\n", "\n", "- [Implementation in Python](#Implementation)\n", "\n", "- [Plotting the Data](#Plotting-the-Data)\n", "\n", " - [Bar plot of the sample means with standard deviation](#Bar-plot-of-the-sample-means-with-standard-deviation)\n", " \n", " - [Boxplot](#Boxplot)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Application" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although (at least in my opinion), the removal of outliers is a very questionable practice, this test is quite popular in the field of chemistry to \"objectively\" detect and reject outliers that are due to systematic errors by the experimentalist.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Criticism" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to use this test to legitimately remove (potential) outliers from a dataset, we should keep in mind that \n", "\n", "- our data has to be normal distributed,\n", "\n", "- and that we are not supposed to use this test more than once the same data set.\n", "\n", "In my opinion, the Dixon Q-test should only be used with great caution, since this simple statistic is based on the assumption that the data is normal distributed, which can be quite challenging to predict for small sample sizes (if no prior/additional information is provided). \n", "Personally, I would use the Dixon Q-test to only **detect** outliers and not to **remove** those, which can help with the identification of uncertainties in the data set or problems in experimental procedures.\n", "Intuitively, this is quite similar to an approach of identifying samples that have a large standard deviation.\n", "\n", "For example, if I tested ~1000 chemical compounds in some sort of activity assay - each compound 5 times, I would mark compounds that contain Q-test outliers for re-testing, because there might have been some problem in the measurement procedure that could have caused this inconsistency." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Method" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "1) Arange values for observations in ascending order" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we arange the data for our sample in ascending order (from the lowest to the highest value):\n", "\n", "$x_1 < x_2 < . . . < x_N$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "2) Calculate Q" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we calculate the experimental Q-value ($Q_{exp}$).\n", "\n", "Note that in a later paper in 1953, Dixon and Dean 3 revisted the calculation of the Q-value and reported different equations for different scenarios:\n", "\n", "![](https://raw.githubusercontent.com/rasbt/algorithms_in_ipython_notebooks/master/images/dixon_q_equations.png)\n", "\n", "(from: Rorabacher, David B., 1991 2)\n", "\n", "- $r_{10} \\;\\; for \\;\\; 3 \\geq n \\leq 7$\n", "- $r_{11} \\;\\; for \\;\\; 8 \\geq n \\leq 10$\n", "- $r_{21} \\;\\; for \\;\\; 11 \\geq n \\leq 13$\n", "- $r_{22} \\;\\; for \\;\\; n \\leq 14$\n", "\n", "However, according to a statement/observation in a more recent paper (Rorabacher, David B., 1991 2): \"The *rl0* ratio is commonly designated as 'Q' and is generally considered to be the most convenient, legitimate, statistical test available for the rejection of deviant values from a small sample conforming to a Gaussian distribution. (It is equally well suited to larger data sets if only one outlier is present.)\"\n", "\n", "Therefore, I will use *rl0* for the following implementation of the Dixon Q-test:\n", "\n", "\\begin{equation} Q_{exp} = \\frac{x_{2} - x_{1}}{x_{n} - x_{1}} \\end{equation}\n", "\n", "where it is assumed that the data is arranged in ascending order: $x_1 < x_2 < . . . < x_N$\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "[2] Rorabacher, David B. (1991) \u201c[Statistical Treatment for Rejection of Deviant Values: Critical Values of Dixon\u2019s \u2018Q\u2019 Parameter and Related Subrange Ratios at the 95% Confidence Level.](http://pubs.acs.org/doi/pdf/10.1021/ac00002a010)\u201d Analytical Chemistry 63, no. 2 (1991): 139\u201346.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[3] W. J. Dixon (1953) [\"Processing Data for Outliers\"](http://webspace.ship.edu/pgmarr/Geo441/Readings/Dixon%201953%20-%20Processing%20Data%20for%20Outliers.pdf). Biometrics 9 (1953): 74\u201389" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
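The ratio from step 2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the helper name `q_ratios` is hypothetical and is not the function implemented later in this notebook, and it returns the r10 ratio for both ends of the sorted sample so that either the smallest or the largest value can be examined.

```python
def q_ratios(values):
    # Hypothetical helper for illustration only.
    ordered = sorted(values)                 # step 1: ascending order
    if len(ordered) < 3:
        raise ValueError('need at least 3 observations')
    value_range = ordered[-1] - ordered[0]   # x_n - x_1
    if value_range == 0:
        # All observations are identical, so there is no gap to test.
        return 0.0, 0.0
    q_low = (ordered[1] - ordered[0]) / value_range     # suspect: smallest value
    q_high = (ordered[-1] - ordered[-2]) / value_range  # suspect: largest value
    return q_low, q_high


q_low, q_high = q_ratios([0.142, 0.153, 0.135, 0.002, 0.175])
print('Q for the smallest value:', round(q_low, 4))
print('Q for the largest value:', round(q_high, 4))
```

Both ratios share the same denominator (the range of the sample), so only the gap next to the suspect value changes.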
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "3) Compare calculated $Q_{exp}$ to the tabulated critical Q-value $Q_{crit}$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "In the next step, we will compare the calculated $Q_{exp}$ value to the to the tabulated critical Q-value $Q_{crit}$ for a chosen confidence interval. \n", "If the calculated Q-value for a particular observation is larger than the critical Q-value ($Q_{exp}$ > $Q_{crit}$), this observation is considered to be an outlier according to the Q-test.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**r10 Critical values for Dixon's two-tailored Q-Test for 3 different confidence levels**\n", "\n", "| N | Q90% | Q95% | Q99% |\n", "|----|-------|-------|-------|\n", "| 3 | 0.941 | 0.97 | 0.994 |\n", "| 4 | 0.765 | 0.829 | 0.926 |\n", "| 5 | 0.642 | 0.71 | 0.821 |\n", "| 6 | 0.56 | 0.625 | 0.74 |\n", "| 7 | 0.507 | 0.568 | 0.68 |\n", "| 8 | 0.468 | 0.526 | 0.634 |\n", "| 9 | 0.437 | 0.493 | 0.598 |\n", "| 10 | 0.412 | 0.466 | 0.568 |\n", "| 11 | 0.392 | 0.444 | 0.542 |\n", "| 12 | 0.376 | 0.426 | 0.522 |\n", "| 13 | 0.361 | 0.41 | 0.503 |\n", "| 14 | 0.349 | 0.396 | 0.488 |\n", "| 15 | 0.338 | 0.384 | 0.475 |\n", "| 16 | 0.329 | 0.374 | 0.463 |\n", "| 17 | 0.32 | 0.365 | 0.452 |\n", "| 18 | 0.313 | 0.356 | 0.442 |\n", "| 19 | 0.306 | 0.349 | 0.433 |\n", "| 20 | 0.3 | 0.342 | 0.425 |\n", "| 21 | 0.295 | 0.337 | 0.418 |\n", "| 22 | 0.29 | 0.331 | 0.411 |\n", "| 23 | 0.285 | 0.326 | 0.404 |\n", "| 24 | 0.281 | 0.321 | 0.399 |\n", "| 25 | 0.277 | 0.317 | 0.393 |\n", "| 26 | 0.273 | 0.312 | 0.388 |\n", "| 27 | 0.269 | 0.308 | 0.384 |\n", "| 28 | 0.266 | 0.305 | 0.38 |\n", "| 29 | 0.263 | 0.301 | 0.376 |\n", "| 30 | 0.26 | 0.29 | 0.372 |\n", "\n" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "4) Example" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's consider the following sample consisting of 5 observations:\n", "\n", "0.142, 0.153, 0.135, 0.002, 0.175\n", "\n", "- First, we sort it in ascending order: \n", "0.002, 0.135, 0.142, 0.153, 0.175\n", "\n", "- Next, we calculate the Q-value:\n", "\n", "\\begin{equation} Q = \\frac{0.135 - 0.002}{0.175-0.002} \\approx 0.7687 \\end{equation}\n", "\n", "- Now, we look up the critical value for n=5 for a confidence level 95% in the Q-table $\\Rightarrow 0.71$ \n", "and we conclude that 0.002 (since 0.7687 > 0.71), that the observation 0.002 is an outlier at a confidence level of 95% according to Dixon's Q-test." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
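The worked example above can be checked against all three tabulated confidence levels. The following is a hedged sketch, not the implementation used later in this notebook: the dictionary name `Q_CRIT_N5` is an assumption introduced here, and its values are simply the N = 5 row of the table in step 3.

```python
# Critical values for N = 5, copied from the table in step 3.
Q_CRIT_N5 = {90: 0.642, 95: 0.71, 99: 0.821}

sample = sorted([0.142, 0.153, 0.135, 0.002, 0.175])
q_exp = (sample[1] - sample[0]) / (sample[-1] - sample[0])

# The same Q value is compared against each confidence level.
verdicts = {level: q_exp > q_crit for level, q_crit in Q_CRIT_N5.items()}
for level in sorted(verdicts):
    print(level, 'outlier' if verdicts[level] else 'not flagged')
```

The verdict depends on the chosen confidence level, which is why the level should be fixed before looking at the data.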
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Implementation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Building dictionaries for Q-value look-up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will build a simple set of dictionaries for different confidence intervals from the tabulated data in David B. Rorabacher's paper: \n", "Rorabacher, David B. (1991) \u201c[Statistical Treatment for Rejection of Deviant Values: Critical Values of Dixon\u2019s\u2018 Q\u2019 Parameter and Related Subrange Ratios at the 95% Confidence Level.](http://pubs.acs.org/doi/pdf/10.1021/ac00002a010)\u201d Analytical Chemistry 63, no. 2 (1991): 139\u201346. \n", "\n", "which we will use to look up the critical Q-values (dictionary values) for different sample sizes (dictionary keys)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "q90 = [0.941, 0.765, 0.642, 0.56, 0.507, 0.468, 0.437, \n", " 0.412, 0.392, 0.376, 0.361, 0.349, 0.338, 0.329, \n", " 0.32, 0.313, 0.306, 0.3, 0.295, 0.29, 0.285, 0.281, \n", " 0.277, 0.273, 0.269, 0.266, 0.263, 0.26\n", " ]\n", "\n", "q95 = [0.97, 0.829, 0.71, 0.625, 0.568, 0.526, 0.493, 0.466, \n", " 0.444, 0.426, 0.41, 0.396, 0.384, 0.374, 0.365, 0.356, \n", " 0.349, 0.342, 0.337, 0.331, 0.326, 0.321, 0.317, 0.312, \n", " 0.308, 0.305, 0.301, 0.29\n", " ]\n", "\n", "q99 = [0.994, 0.926, 0.821, 0.74, 0.68, 0.634, 0.598, 0.568, \n", " 0.542, 0.522, 0.503, 0.488, 0.475, 0.463, 0.452, 0.442, \n", " 0.433, 0.425, 0.418, 0.411, 0.404, 0.399, 0.393, 0.388, \n", " 0.384, 0.38, 0.376, 0.372\n", " ]\n", "\n", "Q90 = {n:q for n,q in zip(range(3,len(q90)+1), q90)}\n", "Q95 = {n:q for n,q in zip(range(3,len(q95)+1), q95)}\n", "Q99 = {n:q for n,q in zip(range(3,len(q99)+1), q99)}" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
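When a list of critical values is paired with sample sizes, it is easy to lose entries if the key range is shorter than the list. One hedged alternative, shown here as a sketch rather than as the notebook's own approach, is to let `enumerate` generate the keys. The names `q95_head` and `lookup` are assumptions for illustration, and the list holds only the first four 95% values from the table.

```python
# First four 95% critical values from the table (N = 3, 4, 5, 6).
q95_head = [0.97, 0.829, 0.71, 0.625]

# enumerate(..., start=3) pairs every value with its sample size,
# so no entry can be dropped by a key range that is too short.
lookup = {n: q for n, q in enumerate(q95_head, start=3)}

print(lookup)
print('largest supported sample size:', max(lookup))
```

A quick check that the dictionary has as many entries as the source list is a cheap safeguard against this kind of off-by-two error.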
" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Implementing a Dixon Q-test Function" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, I wrote some simple Python code to test one data row for Dixon Q-test outliers:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def dixon_test(data, left=True, right=True, q_dict=Q95):\n", " \"\"\"\n", " Keyword arguments:\n", " data = A ordered or unordered list of data points (int or float).\n", " left = Q-test of minimum value in the ordered list if True.\n", " right = Q-test of maximum value in the ordered list if True.\n", " q_dict = A dictionary of Q-values for a given confidence level,\n", " where the dict. keys are sample sizes N, and the associated values\n", " are the corresponding critical Q values. E.g.,\n", " {3: 0.97, 4: 0.829, 5: 0.71, 6: 0.625, ...}\n", " \n", " Returns a list of 2 values for the outliers, or None.\n", " E.g.,\n", " for [1,1,1] -> [None, None]\n", " for [5,1,1] -> [None, 5]\n", " for [5,1,5] -> [1, None]\n", " \n", " \"\"\"\n", " assert(left or right), 'At least one of the variables, `left` or `right`, must be True.'\n", " assert(len(data) >= 3), 'At least 3 data points are required'\n", " assert(len(data) <= max(q_dict.keys())), 'Sample size too large'\n", " \n", " sdata = sorted(data)\n", " Q_mindiff, Q_maxdiff = (0,0), (0,0)\n", " \n", " if left:\n", " Q_min = (sdata[1] - sdata[0]) \n", " try:\n", " Q_min /= (sdata[-1] - sdata[0])\n", " except ZeroDivisionError:\n", " pass\n", " Q_mindiff = (Q_min - q_dict[len(data)], sdata[0])\n", " \n", " if right:\n", " Q_max = abs((sdata[-2] - sdata[-1]))\n", " try:\n", " Q_max /= abs((sdata[0] - sdata[-1]))\n", " except ZeroDivisionError:\n", " pass\n", " Q_maxdiff = (Q_max - q_dict[len(data)], sdata[-1])\n", "\n", " if not Q_mindiff[0] > 0 and not Q_maxdiff[0] > 0:\n", " outliers = [None, None]\n", " \n", " elif 
Q_mindiff[0] == Q_maxdiff[0]:\n", " outliers = [Q_mindiff[1], Q_maxdiff[1]]\n", " \n", " elif Q_mindiff[0] > Q_maxdiff[0]:\n", " outliers = [Q_mindiff[1], None]\n", " \n", " else:\n", " outliers = [None, Q_maxdiff[1]]\n", " \n", " return outliers" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 3, "metadata": {}, "source": [ "Assertion Tests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some simple assertion tests to make sure that the Dixon Q-test function behaves as expected/desired." ] }, { "cell_type": "code", "collapsed": false, "input": [ "test_data1 = [0.142, 0.153, 0.135, 0.002, 0.175]\n", "test_data2 = [0.542, 0.153, 0.135, 0.002, 0.175]\n", "\n", "assert(dixon_test(test_data1) == [0.002, None]), 'expect [0.002, None]'\n", "assert(dixon_test(test_data1, right=False) == [0.002, None]), 'expect [0.002, None]'\n", "assert(dixon_test(test_data2) == [None, None]), 'expect [None, None]'\n", "assert(dixon_test(test_data2, q_dict=Q90) == [None, 0.542]), 'expect [None, 0.542]'\n", "\n", "print('ok')" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "ok\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Example application" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below, I want to go through a naive example for our Dixon Q-test function using an example CSV file. \n", "In \"real\" application I would prefer `NumPy` and/or `pandas`, however for this simple case the in-built Python `csv` library should suffice." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below the example CSV file is shown that we are going to read in:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%%writefile ../../data/dixon_test_in.csv\n", ",x1,x2,x3,x4,x5\n", "id1,0.95,-0.65,0.6,0.82,NaN\n", "id2,2.08,NaN,-1.43,0.38,NaN\n", "id3,-0.46,NaN,-1.25,-2.62,0.22\n", "id4,0.24,1.88,-0.49,-0.73,-0.49\n", "id5,-1.65,2.1,-0.09,NaN,0.8\n", "id6,-0.44,0.93,0.19,-4.36,-0.88\n", "id7,0.36,-0.47,NaN,0.4,2.12\n", "id8,1.29,-0.48,-0.6,-0.38,0.27\n", "id9,-1.25,-1.35,1.13,1.7,-0.81\n", "id10,0.04,1.98,NaN,NaN,NaN" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Overwriting ../../data/dixon_test_in.csv\n" ] } ], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "import csv\n", "\n", "def csv_to_list(csv_file, delimiter=','):\n", " \"\"\" \n", " Reads in a CSV file and returns the contents as list,\n", " where every row is stored as a sublist, and each element\n", " in the sublist represents 1 cell in the table.\n", " \n", " \"\"\"\n", " with open(csv_file, 'r') as csv_con:\n", " reader = csv.reader(csv_con, delimiter=delimiter)\n", " return list(reader)\n", " \n", " \n", "def print_csv(csv_content):\n", " \"\"\" Prints CSV file to standard output.\"\"\"\n", " print(50*'-')\n", " for row in csv_content:\n", " row = [str(e) for e in row]\n", " print('\\t'.join(row))\n", " print(50*'-')\n", "\n", " \n", "def convert_cells_to_floats(csv_cont):\n", " \"\"\" \n", " Converts cells to floats if possible\n", " (modifies input CSV content list).\n", " \n", " \"\"\"\n", " for row in range(len(csv_cont)):\n", " for cell in range(len(csv_cont[row])):\n", " try:\n", " csv_cont[row][cell] = float(csv_cont[row][cell])\n", " except ValueError:\n", " pass " ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "csv_cont = 
csv_to_list('../../data/dixon_test_in.csv')\n", "convert_cells_to_floats(csv_cont)\n", "print_csv(csv_cont)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "--------------------------------------------------\n", "\tx1\tx2\tx3\tx4\tx5\n", "id1\t0.95\t-0.65\t0.6\t0.82\tnan\n", "id2\t2.08\tnan\t-1.43\t0.38\tnan\n", "id3\t-0.46\tnan\t-1.25\t-2.62\t0.22\n", "id4\t0.24\t1.88\t-0.49\t-0.73\t-0.49\n", "id5\t-1.65\t2.1\t-0.09\tnan\t0.8\n", "id6\t-0.44\t0.93\t0.19\t-4.36\t-0.88\n", "id7\t0.36\t-0.47\tnan\t0.4\t2.12\n", "id8\t1.29\t-0.48\t-0.6\t-0.38\t0.27\n", "id9\t-1.25\t-1.35\t1.13\t1.7\t-0.81\n", "id10\t0.04\t1.98\tnan\tnan\tnan\n", "--------------------------------------------------\n" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let us add a new `outlier` column and apply the Dixon Q-test function to our data set." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import math\n", "\n", "csv_cont[0].append('outlier')\n", "\n", "for row in csv_cont[1:]: # skips header\n", " nan_removed = [i for i in row[1:] if not math.isnan(i)]\n", " if len(nan_removed) >= 3:\n", " row.append(dixon_test(nan_removed, left=True, right=True, q_dict=Q90))\n", " else:\n", " row.append('NaN')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "print_csv(csv_cont)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "--------------------------------------------------\n", "\tx1\tx2\tx3\tx4\tx5\toutlier\n", "id1\t0.95\t-0.65\t0.6\t0.82\tnan\t[-0.65, None]\n", "id2\t2.08\tnan\t-1.43\t0.38\tnan\t[None, None]\n", "id3\t-0.46\tnan\t-1.25\t-2.62\t0.22\t[None, None]\n", "id4\t0.24\t1.88\t-0.49\t-0.73\t-0.49\t[None, None]\n", "id5\t-1.65\t2.1\t-0.09\tnan\t0.8\t[None, None]\n", "id6\t-0.44\t0.93\t0.19\t-4.36\t-0.88\t[-4.36, None]\n", "id7\t0.36\t-0.47\tnan\t0.4\t2.12\t[None, None]\n", "id8\t1.29\t-0.48\t-0.6\t-0.38\t0.27\t[None, None]\n", "id9\t-1.25\t-1.35\t1.13\t1.7\t-0.81\t[None, None]\n", "id10\t0.04\t1.98\tnan\tnan\tnan\tNaN\n", "--------------------------------------------------\n" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we can see in the table above, we have 2 potential outliers in our data set." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we let us write the results to a new CSV file for future reference:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "def write_csv(dest, csv_cont):\n", " \"\"\" Writes a comma-delimited CSV file. 
\"\"\"\n", "\n", " with open(dest, 'w') as out_file:\n", " writer = csv.writer(out_file, delimiter=',')\n", " for row in csv_cont:\n", " writer.writerow(row)\n", " \n", "write_csv('../../data/dixon_test_out.csv', csv_cont)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
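The NaN handling used in the example application can be tried without touching the file system. The sketch below is an illustration under stated assumptions: it reads two rows of the example file from an in-memory string instead of from disk, and the names `raw` and `usable` are hypothetical. It relies only on the standard-library modules `csv`, `io` and `math`.

```python
import csv
import io
import math

# Two rows taken from the example file, held in memory instead of on disk.
raw = ''',x1,x2,x3,x4,x5
id1,0.95,-0.65,0.6,0.82,NaN
id10,0.04,1.98,NaN,NaN,NaN
'''

rows = list(csv.reader(io.StringIO(raw)))
usable = {}
for row in rows[1:]:                              # skip the header row
    values = [float(cell) for cell in row[1:]]    # the text NaN parses to nan
    usable[row[0]] = [v for v in values if not math.isnan(v)]

for row_id, values in usable.items():
    status = 'testable' if len(values) >= 3 else 'too few values'
    print(row_id, values, status)
```

Rows with fewer than 3 remaining values cannot be passed to the Q-test, which is why such rows receive a placeholder in the outlier column.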
" ] }, { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Plotting the Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To get a visual impression of how our data looks like, let us make some simple plots." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Bar plot of the sample means with standard deviation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, let's create a bar plot with standard deviation, since such bar plots with standard deviation or standard error bars are probably the most common plots in any field. Although it is not always approapriate it is certainly the kind of data visualization we are most familiar with - due to the frequent exposure when reading scientific research articles." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "from matplotlib import pyplot as plt\n", "\n", "all_means = [np.nanmean(row[1:6]) for row in csv_cont[1:]]\n", "all_stddevs = [np.nanstd(row[1:6]) for row in csv_cont[1:]]\n", "\n", "fig = plt.figure(figsize=(8,6))\n", "\n", "y_pos = np.arange(len(csv_cont[1:]))\n", "y_pos = [x for x in y_pos]\n", "plt.yticks(y_pos, [row[0] for row in csv_cont[1:]], fontsize=10)\n", "plt.xlabel('measurement x')\n", "t = plt.title('Bar plot with standard deviation')\n", "\n", "plt.grid()\n", "plt.barh(y_pos, all_means, xerr=all_stddevs, align='center', alpha=0.4, color='g')\n", "\n", "plt.show()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAecAAAGJCAYAAACw8/t+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3X2YXVV96PHvVJSLksnQ6vAmZlolEHtLhkq1JZKM5NFi\nqvRS8LEvCKNFwm2J6bUW2qZeh9pQpX2UePNYTFWGyotVJFR7tffG0GMoVAWbgBKDoDCIBAYVklCh\n9srcP9Y+MzuTOXMm55y195qzv5/nOc85a589Z6+zzpr923v/9t4LJEmSJEmSJEmSJEmSJEmSJEmS\nJElSFxsF3lvwMv8E+NtZ3h8Gbi2mKm0ZprP1HAE+cRDzPwv8XJvLfAmwD+hp8e+b/ZZSy36q7Aqo\n0h4EfkRYQf4Q+EfgxQUufyJ7zEUnggHAXwJvz14PZJ8b4/9whIMLdmWb6+/QSQ8BC+a47CHgu9Om\n5X9LqaMMzirTBPAGwgryaOAx4H+1+FmHtPh3B7PX1OoeVlmfm6rnlF0BKXUGZ6XiP4DPAC/PTfs1\nYDuwh7CX857cewOEvc63AWPAF2f4zCHgYcLhx8eBB4DfnqUObwfuA34A/ANhgwFgW/Z8F2Ev/00z\n/O0Y8IvZ69/J6rYkK/8usDl7PcLUHm39c58E9gK/zNRe3F8RjiZ8BzhjljpfSviOe4FdwOnZ/H8C\nvDmr7/Zs3rcCO7N5vw1cmPucoexz3knYSHqEcOi67meAzxJ+i68AL51Wjw2E32gPcCfw6tx7I8CN\n2ffeA5wP/Czwpawu/xd44SzfEeCPsjo9TPjN8w4F/prwGzwK/A3wX7L3vknoR3WHEPrCIAceuWjU\nPi8AvgAcQ2jPvYS+McL+RyfOBO4BngD+GTgx996DwB8S+tCTwCezektSch4AVmavnw9cQ8gD160A\nfj57/QuEFe+vZ+UBwop1FDiMmVd0Q8B/ElbczwWWA08Bx2fvX81Uzvl0plbazwM+RAgedc0Oa19D\nCGwAmwhB/qKs/HfA2uz1CFMr9EUceFh7GPgxIaD3ZJ/xvQbLPIEQEI/Kyi/J1fE92XLzVhGCIoS2\n+Hfg5Kw8RGirEcKe7euz9xdm738yexxG+E0eZmrjAsIGyRHZd3knsJvQjvXv/GNC8IIQOP+Vqd/l\nNELAm17fujMIv/3LCf3kevb/PT4I3Az0AYcTNiIuz957N3Bt7rN+jRBA4cDgPFv7rODAw9rvYeq3\nXEzoWysJ7fdHhD5QP6LzAPBlwm91BGEjYHWD7ytJpXqQsCfyBGHl/TDwX2eZ/0rgA9nrAcKKdWCW\n+YcIAeew3LS/B/4se3018OfZ648B78vN94KsTi/Jys2C89sIe9sQVrxvA27Iyg8Sgj7sH5zr32F6\ncL4vV35+Nk//DMt8GWEvdyUhyOXll9PIZuAd2eshQv4/X5fHgFcSgs2PCQGobj2znxD2Q8IGVb0u\ntdx7L+HA3+W6Wer7caaCLYSNq/rv0UMIivnf5lcIRxwgtNFepvakr2Pq9x9g9pz/9PaZHpxHcnV+\nN2Hjpa6H0J+XZ+XpR23eT9jDl2bkYW2VaYKwJ3wEYc93DWFv9cjs/VcRDg+OEw4FriYcXs2bvsKc\n7gng6Vx5jKnD1XlHZ+/V/Tvh8Paxzb5EZhthD/AoQjD7NLCMsHe8ENgxx8+BsJdY96Ps+fAZ5rsf\n+ANCkHiMsDEw03erez1h7+0HhHZZxf7t+QNCsMov+3DgRYQ9wHxbPzTts99F2Ch5Mvvshex/qPrh\n3OtjmPl3aeToWZb9IsIGzNeyz3yCcAi6vuz7CYe2z8zmeyNhz3smzdpnNsdMq9dEVud8/8n/rk8z\n828qAQZnpWOCsKfyE0JQg7ASvZlwBncfcBUH9tlmZ9oeQVgp1y0i5C6ne4T998JfQFgxNzqkPN39\nhGBW38DYR1gZX8j+e5gTDV636gbCRsGi7PPe3+CzDyXk9K8g7
IUfAXyeuZ2M9jjw/5g6isC016cR\nDuO+ifA7HUHILec/O1+f3cz8uzRqj92zLPv7hED38uwzj8jq0Jub5wbgtwgbgjuZ2qvOa9Y+zX6r\n72Xfoa4HOI7G/aeMs9M1jxicVbae3HN9L/qb2bTDmTrk/UrCYcFWVmqXMZXb/DXCXm19mfXl30A4\nIWgpYUV9OWEvqr439BgHngQ13ZeAi5nKVdemlevLrHucsKfa7HMbWUzIlR9KOKHuGcLGDYQNg4Hc\n8p6XPb6fLfP1wOvmuJyfADcR9tAPIwTC85n6LRYQgvf3s2X8T/YPjtONEU4aq/8uryactd/IpwiH\n+5cQAnr+xMBnCdcaX0nYi4awt5r/bp8EfpWQv7+uwTKatc9jhI21Rt/r04S+dXr2nf6Q8Hvc3mD+\nqp2hr4NkcFbZPkfYy9xDODnrPKaC8+8RcsJ7CTm9v5/2t3MJ1I8SAvwjhPzgauBbub+vf8bWbBmf\nyeb9WeA3c58zQjjp6wngnAbL+hJhg2Jbg/L0Zf6IkLu9jZCjfRUzX3vd6HseSrjW9nHC3uULCWdp\nw9QGyA8IgXAfIX/6qWxZv8VUjrzZciBsZBxOaM+PZ4+6f8oe3yLk15/mwEO80z/7twnf94eEYH7N\nLMv+J0LwvSVbxtZpn3cp4cjFlwn9aAv758cfJQTJX6FxH2rWPrsIG3Dfyd4/etr3uhc4l3Ap4OOE\nQP1GwkbLTA7mGntJ6ipDNM9JS1Jy3HOWJCkxBmd1Ow8dSpIkSZIkSV2l1NP5ly5dOnHXXXeVWQVJ\nkop2F1N3DZxR2dfaTUxMmBKMbWRkhJGRkbKr0dVs4/hOe+1pvOV9bym7Gl3vE3/8CW7dMh+GFJ+/\nenp6oEn89YSwCnjwwQfLrkLXs43j2/PDPWVXoRJs5zQYnCVJSozBuQKGh4fLrkLXs43jO+mUk8qu\nQiXYzmkwOFfA0NBQ2VXoerZxfIteuqj5TGqb7ZwGg3MF1Gq1sqvQ9Wzj+Ma+PduokuoU2zkNBmdJ\nkhJjcK4AD7nGZxvH5+HWYtjOaWgWnG9rMH0UODt7fTFhuLZngZ+eNt+HgPsIF1yf3FoVJUmqlmbB\neVmD6fmxSP8FWEkYQD1vFfAy4HjgQuBvWqyj2mQ+ND7bOD5zocWwndPQLDg/lT33ABsJA45vAfqZ\nurvJDg4MzABnMjWA+leAPuDIdiorSXX33nl02VWQDsrBbMM3C871veOzgMXAEuA84FSaD8V3LPsP\ndP8w8OK5V02dYj40Pts4vum50G997ZiSatLdzDnH08ngXLccuJ4QkHcDt8zx76bfO9QbaUuS1MQh\nc5xvgoMfJON7wHG58ouzafsZHh5mYGAAgL6+PgYHByf3Qup5PMvtlevTUqlPN5ant3XZ9enG8ldv\n/SrPLHyGE045AYDttQf4x79dDIT3oZY9W26vvJjLL02pPt1QrgGjHHkkwABz0Szg7gMWEA5rryac\n5HUkcA9wAXBTbt4HgFOAH2TlVYQzuVcBvwxcmT3nOSpVAWq12uQKTnHYxvGdu/pcll+4fLL8uY+8\ngjeu/lqJNepO2zZt49qPXFt2NbrSyEh4dGJUqnrk3Ey4JGon4SSv23PzvIOQWz4WuBvYlE3/PPAd\nwmVWHwF+b65fQJ1l0IjPNo7PXGgxbOc0NDus3Zt7vabBPB/KHjO5+KBrJElzsPgVj5RdBemgHMw2\nvHcIq4B8PlRx2MbxTb/+9oRTdpdUk+7mdc7xGJwlSZrHDM4VYD40Pts4PnOhxbCd02BwliQpMQbn\nCjAfGp9tHJ+50GLYzmkwOEuSlBiDcwWYD43PNo7PXGgxbOc0GJwlSUqMwbkCzIfGZxvHZy60GLZz\nGuY68IUklarvBX2MbTVwxNb3gr6yqyAOfqSpTnPgC0lSpXRi4AtJklQwg3MFmA+NzzaOzzYuhu2c\nBoOzJEmJMecsSVKBzDlLk
jQPeSlVBdRqNe9gFZlt3LoNV21gfO940/n2PLqHjR/YWECNqs2+nAaD\ns6RSje8dZ9HK5reM3LZpWwG1kdLgYe0KcCs4Pts4Pu/5XAz7choMzpIkJcbgXAFetxifbRyf93wu\nhn05DQZnSZISY3CuAHNI8dnG8ZlzLoZ9OQ0GZ0mSEmNwrgBzSPHZxvGZcy6GfTkN7QTn2xpMHwXO\nzl6fDnwN+Ho2/TltLE+SpEpoJzgvazB9Inv8FCEgvxn4BWAMOL+N5alF5pDis43jM+dcDPtyGtoJ\nzk9lzz3ARmAXsAXoz6b/DPBj4P6s/EWm9qiltnn0TUqH/4+d1U5wrg8ndRawGFgCnAecmk1/nHB7\n0Fdk5XOA49pYnlrUrTmklL5Wt7ZxSsw5F6PVvuy/QGd14oSw5cD1hGC9G7gl995vAh8EvgLsBX7S\ngeVJktTVOjHwxQSNx6X8MiF4A7wOOH76DMPDwwwMDADQ19fH4ODgZM6jvgVn2fJM5ZtvrnHZZQCh\nDLXsuYzyUMnLn8/l9XOcfzHXbaolUF/LM5WXLq0xNJTO+iGlcq1WY3R0FGAy3jUz62DPTewDFhAO\na68GVgFHAvcAFwA3EfLP48ChwP8G/oKpXxRgYmJiAqkVIyPhoflt3RXr5jQq1djWMdZfsr7pfCqH\n/49z19PTA03ibydyzpuB+4CdwDXA7bl53pVNvwv4LPsHZhXEfGh8tnF85pyLYV9OQzuHtXtzr9c0\nmOeS7CF1nFd8SOnw/7GzvENYBXTrdYspfa1ubeOUeJ1zMVrty/4LdJbBWZKkxBicK8AcUny2cXzm\nnIthX06DwVmSpMQYnCvAfGh8tnF85pyLYV9Og8FZkqTEGJwrwBxSfLZxfOaci2FfToPBWZKkxBic\nK8AcUny2cXzmnIthX05DJwa+kKSW9ff2M7a1+SHr/t7+pvNI3aKdgS86wYEvClCr1dwajsw2js82\nLobtHF/sgS8kSVIE7jlLklQg95wlSZqHDM4V4HWL8dnG8dnGxbCd02BwliQpMeacJc0LG67awPje\ncfp7+1l70dqyqyO1zJyzpK4xvnecRSsXMb53vOyqSNEZnCvAHFJ8tnF83lu7GPblNBicJUlKjMG5\nArzbT3y2cXzeW7sY9uU0GJwlSUqMwbkCzCHFZxvHZ865GPblNBicJUlKjMG5AswhxWcbx2fOuRj2\n5TS0E5xvazB9FDg7e70K2AFsB24FXtrG8iRJqoR2gvOyBtMnsgfAh4E3AycD1wN/1sby1CJzSPHZ\nxvGZcy6GfTkN7QTnp7LnHmAjsAvYAvTn5tkNLMxe9wHfa2N5kjKuP6vJ37062gnO9b3js4DFwBLg\nPODU3DwXA18AvgucC7y/jeWpReaQ4iu6jau4kjbnXMzv7voiDZ04IWw54ZD1BGFP+ZbcZ38COAM4\nDrga+EAHlidJUlc7pAOfMcHMo2u8CHgecEdW/hRhL3o/w8PDDAwMANDX18fg4ODklls992G5vXJ9\nWir16cby9LaOvzzo6akvcyh77vbyiwgH6S7k8ktTqE/x5aVLYWQklF1fzJ9yrVZjdHQUYDLeNdPO\nkJH7gAWEw9qrCWdmHwncA1wA3Aw8DKwA7gN+l7AX/abcZzhkZAFqtdpkh1EcRbfxyEh4VMm5q89l\n+YXLGds6xvpL1pddnVIU8bu7vohvLkNGtrPnXI+qm4HTgZ3AQ8Dt2fRngbcR9ph7gB9mZRXMf7T4\nbOP4zDkXw76chnaCc2/u9ZoG8/xT9pDUQa4/q8nfvTo6cUKYEpfPJSmOotu4iitpr3Mu5nd3fZEG\ng7MkSYkxOFeAOaT4bOP4zDkXw76cBoOzJEmJMThXgDmk+Gzj+Mw5F8O+nAaDsyRJiTE4V4A5pPhs\n4/jMORfDvpwGg7MkSYkxOFeAOaT4bOP4zDkXw76cBoOzJEmJaWfgi05w4AtJc7Lhqg2M7x2
nv7ef\ntRetLbs6UsvmMvCFwVmSpALNJTh7WLsCzCHFZxvHZxsXw3ZOg8FZkqTEeFhbkqQCeVhbkqR56JCy\nK6D4arWad/2JzDaO7+J3XszCoxaWXY2OS+3sc/tyGgzOkuaFJ//9SU5aeVLZ1ei4sa3eXEUH8rB2\nBbgVHJ9tHJ/31i6GfTkNBmdJkhJjcK4Ar1uMzzaOz3trF8O+nAaDsyRJiTE4V4A5pPhs4/jMORfD\nvpwGg7MkSYkxOFeAOaT4bOP4zDkXw76chnaC820Npo8CZ2evbwW2Z4/vAZvbWJ4kSZXQzk1IljWY\nPpE9AE7LTb8RuLmN5alF5pDis43jM+dcDPtyGtrZc34qe+4BNgK7gC1APwfe0LsXOB2DszTvedRT\nKeuW/tlOcK7vHZ8FLAaWAOcBp+beq/tvwBeZCugqkDmk+KrUxmV9VXPOxZjvfXmeV39SJ04IWw5c\nTwjIu4FbZpjnt4AbOrAsSZK6XicGvphg9nEpXwj8EvDrM705PDzMwMAAAH19fQwODk7mPOpbcJYt\np14eGhpKqj4xy7XaEGE42lCGoew5dvkCrttU5PKKKl/I5ZemVJ/5XV6xIpRT+X+prxtGR0cBJuNd\nM7MO9tzEPmAB4bD2amAVcCRwD3ABcFM230XAq4C3zvAZExMT04+AS0rZyEh4FG3dFetYtLL7Tgob\n2zrG+kvWl12NrlFW/zwYPWHrdtb424mc82bgPmAncA1w+7T53oyHtEtV34JTPLZxfOaci2FfTkM7\nh7V7c6/XzDLfa9pYhqTEeKWNUtYt/dM7hFXAULf01oRVqY3L+qpe51yM+d6X53n1JxmcJUlKjMG5\nAswhxWcbx2fOuRj25TQYnCVJSozBuQLmew5pPrCN4zPnXAz7choMzpIkJcbgXAHmkOKzjeMz51wM\n+3IaDM6SJCXG4FwB5pDis43jM+dcDPtyGgzOkiQlphOjUilxtVrNreHIbOP49jy6h7Gt3Zd37u/t\nL7sK+7Evp8HgLGleOOfMcwwaqox2hozsBIeMlCRVSuwhIyVJUgQG5wrwusX4bOP4bONi2M5pMDhL\nkpQYc86SJBVoLjlnz9aWCrDhqg2M7x0vuxrzWn9vP2svWlt2NaRCGJwrwOsW42vWxuN7x1m00jtc\ntWPbpm1lV6ESXF+kwZyzJEmJMThXgFvB8dnG8Xlv7WLYl9NgcJYkKTEG5wrwusX4bOP4HM+5GPbl\nNBicJUlKjMG5AswhxWcbx2fOuRj25TQYnCVJSkyrwfm2BtNHgbNz5fXAvcBOYE2Ly1KbzCHFZxvH\nZ865GPblNLQanJc1mD6RPQDeChwLnAC8HPhki8tqmX1MkvbnenF+aDU4P5U99wAbgV3AFqA/N89F\nwJ/nyo+3uKyW2QkDc0jx2cbxmXPujGbrRftyGloNzvW947OAxcAS4Dzg1Nw8LwV+E7gD+DzwshaX\nJUlSpbR7b+3lwPWEYL0buCX33qHA08AvEYL4x7P59zM8PMzAwAAAfX19DA4OTm651XMfrZZvvrnG\nZZcBDGVLq2XPVSvXp6VSn24s1183en99yfXrhvKVXH5pLaH6zM/yihWh3Gj9WZ/W7vrX8lS5Vqsx\nOjoKMBnvmml1yMh9wALgg8DdwNXZ9M8A1wE3Ad8EzgDGsuU8AfRN+5yoQ0aOjIRH1Xkj+/iatfG6\nK9Y58EWbtm3axrUfubbsasx7zdaLri/im8uQke1eSrUNeHP2OUcDr8m9dzNwevZ6BeGsbZXAf7T4\nbOP4zDkXw76chlYPa9d3dzcTAvBO4CHg9tw87yPsRf8Pwp72BS0uq2X2MUnan+vF+aHVPefe3Os1\nwInA64A3EA5pA+zJyicRLr36eovLapmdMPC6xfhs4/i8zrkzmq0X7ctpaPewtiRJ6jCDcwWYQ4rP\nNo7PnHMx7MtpMDhLkpQYg3MFmEOKzzaOz5xzMezLaTA
4S5KUGINzBZhDis82js+cczHsy2kwOEuS\nlBiDcwWYQ4rPNo7PnHMx7MtpMDhLkpSYVge+6JSoA19Iqdhw1QbG946XXY15rb+3n7UXrS27GlLb\n5jLwhcFZkqQCFTEqleYBc0jx2cbx2cbFsJ3TYHCWJCkxHtaWJKlAHtaWJGkeMjhXgDmk+Gzj+Gzj\nYtjOaTik7ApI0lzc+Nkb2fLVLdGX4yVbSoHBuQK8V258tnF8C49ayKKV8e+vPba12ncisy+nwcPa\nkiQlxuBcAeaQ4rON4/Pe2sWwL6fB4CxJUmIMzhVgDik+2zg+x3Muhn05DQZnSZISY3CuAHNI8dnG\n8ZlzLoZ9OQ0GZ0mSEtNOcL6twfRR4Ozc6+8A27PHSW0sTy0yhxSfbRyfOedi2JfT0E5wXtZg+kT2\nqL9+F3By9ri7jeVJpfJon9Q9Uv9/bic4P5U99wAbgV3AFqB/2nxlj3xVeeaQOmO2ZrSN4zPnXIyq\n9OXUv2Y7wbm+d3wWsBhYApwHnDptvr8E7gI+ADyvjeVJklQJ7ezV7gMWAFcCOwj5ZYDPANcBNwFH\nAY8SgvIm4NvAe3OfMXH++eczMDAAQF9fH4ODg5M5j/oWnGXLKZQHB2vcdRdAKEMte7Zs2fJ8Ky9d\nWuPKK4tZf9RqNUZHRwEYGBjgsssugybxtxPB+YOEXPLV2fR8cM5bQcg/vzE3bWJiYgJpPhgZCQ+V\nY90V6wob+GL9JeujL0flKvP/uaenB5rE305cSrUNeHP2WUcDr8m9d3S9LoTD31/vwPJ0kKqSQyqT\nbRyfOedi2JfT0M6QkfVd3s3A6cBO4CHg9tw81wIvIgTn7cCftrE8qVReYSJ1j9T/n8s+k9rD2pLm\nxMPa6hZFHdaWJEkdZHCuAHNI8dnG8ZlzLoZ9OQ0GZ0mSEmNwrgDvlRufbRyf99Yuhn05DQZnSZIS\nY3CuAHNI8dnG8ZlzLoZ9OQ0GZ0mSEmNwrgBzSPHZxvGZcy6GfTkNBmdJkhLTzu07NU/UajW3hiOz\njePb8+gexrbGzzv3904fkr5a7MtpMDhLmhfOOfMcg4Yqw3trS5JUIO+tLUnSPGRwrgCvW4zPNo7P\nNi6G7ZwGg7MkSYkx5yxJUoHmknP2bG1J88KGqzYwvne84fv9vf2svWhtgTWS4jE4V4DXLcZnG8d3\nx/Y7WH7h8obvF3ENdBXYl9NgzlmSpMQYnCvAreD4bOP4vLd2MezLaTA4S5KUGINzBXjdYny2cXyO\n51wM+3IaDM6SJCXG4FwB5pDis43jM+dcDPtyGgzOkiQlptXgfFuD6aPA2dOmfQjY1+Jy1AHmkOKz\njeMz51wM+3IaWg3OyxpMn8gedacAfdOmSZKkWbQanJ/KnnuAjcAuYAvQz9T9Qp8DXAFcQvn38K40\nc0jxHUwbu2PSGnPOxcj3ZftqeVoNzvU94bOAxcAS4Dzg1Nx7FwP/ADzaTgWlbuMKT/OFfbU87Z4Q\nthy4nhCQdwO3ZNOPAc4h7FW711wyc0jx2cbxmXMuhn05De0OfDHBgcG3BxgEXgbcn017PvAtwl72\nfoaHhxkYGACgr6+PwcHBycMq9U5iub1yXSr1qXq5VhsijBgXyjCUPVuevXwK121aPOv8l19aS6i+\n87XMZHnpUhgZCe+n8v8zH8u1Wo3R0VGAyXjXTKt7tfuABYTD2quBVcCRwD3ABcBNDeafzvGcVTkj\nI+Ghg7PuinUsWtk47zy2dYz1l6wvsEbdz74aR8zxnOsRdTNwOrATeAi4vcn8kiSpiVZzzr2512uA\nE4HXAW/gwL3m6fOrYNMPb6vzDqaNPXm+Neaci5Hvy/bV8niHMKlgrvA0X9hXy2NwroAh/8Ois43j\n8zrnYtiX02BwliQpMQbnCjDnHJ9tHJ8552LYl9NgcJYkKTEG5wowhxSfbRyfOedi2JfTYHCWJCkx\nBucKMIcUn20cnzn
nYtiX02BwliQpMWWPGOW9tSXNyYarNjC+d7zh+/29/ay9aG2BNZJaM5d7axuc\nJUkq0FyCs4e1K8AcUny2cXy2cTFs5zQYnCVJSoyHtSVJKpCHtSVJmocMzhVgDik+2zg+27gYtnMa\nDim7AtJ8Vr+8Z+zbY2z56payq9PV9jy6x1tLqjIMzhXgCi2e8b3jLFq5iEUrve9zbGNbvUNYEVxf\npMHD2pIkJcbgXAHmkOK79857y65C1/Pe2sVwfZEGg7MkSYkxOFeAOaT4TjjlhLKr0PUcz7kYri/S\nYHCWJCkxBucKMIcUnznn+Mw5F8P1RRoMzpIkJaad4Hxbg+mjwNnZ648BO4C7gc3AwjaWpxaZQ4rP\nnHN85pyL4foiDe0E52UNpk9kD4A/AAaBk4DvAGvaWJ4kSZXQTnB+KnvuATYCu4AtQD9To23sy81z\nGPD9NpanBpqliMwhxWfOOb58ztkuHY/rizS0E5zre8dnAYuBJcB5wKm59wCuBnYT9p4/2sby1ID/\nS6oa+7y6XSdOCFsOXE8IyLuBW6a9/1bgGELeeV0HlqeDZA4pPnPO8ZlzLobrizR0YuCLCZoMGg08\nC3wSuGT6G8PDwwwMDADQ19fH4ODgZOeoH16xPHu5VhsijN0dyjCUPVuOX35tYvXp7vLll9aAIVas\nSOf/z7LlZuVarcbo6CjAZLxrpllQnc0+YAHhsPZqYBVwJHAPcAFwE/Ay4P5sOX8FPA28O/cZExMT\n+SPgasXISHg0UqvV3BqOZN0V61i0chH33nmve8+Rbdu0jWs/ci3QvM+rda4v4usJe1Ozxt929pzr\nUXUzcDqwE3gIuL2+fMJlVb1Z+U7g99tYniRJldBOcO7NvW50idSr2/h8zVGzjVy3guNzrzm+fM7Z\nLh2P64s0eIewLuD/kqrGPq9uZ3CugPqJCYrH65zj897axXB9kQaDsyRJiTE4V4A5pPjMOcfndc7F\ncH2RBoOzJEmJMThXgDmk+Mw5x2fOuRiuL9JgcJYkKTEG5wowhxSfOef4zDkXw/VFGgzOkiQlphMD\nXyhx3is3nv7efsa2jjH27TH37CLb8+iesqtQCa4v0mBwltqw9qK1gCu0IniikqqknVGpOsFRqSRJ\nlTKXUanMOUuSlBiDcwV4ODA+2zg+27gYtnMaDM6SJCXGnLMkSQWaS87Zs7UlzQsbrtrA+N7xUpbd\n39s/eWa+VASDcwV4mU98tnF8d2y/g+UXLi9l2WNbq3Nfb/tyGsw5S5KUGINzBbgVHJ9tHJ93YCuG\nfTkNBmdrynS5AAAKTElEQVRJkhJjcK4Ar1uMzzaOz/Gci2FfToPBWZKkxBicK8AcUny2cXzmnIth\nX06DwVmSpMS0E5xvazB9FDg7e30dsAv4OvAxvK66FOaQ4rON4zPnXAz7chraCc7LGkyfyB4A1wIn\nAr8AHAZc0MbyJEmqhHaC81PZcw+wkbCHvAXoZ+qeoV/IzX8H8OI2lqcWmUNqX7OdCds4PnPOxbAv\nH6iMgwntBOf63vFZwGJgCXAecGruvbrnAueyf7CW5g2P9EnVNd+Cc91y4HpCQN4N3DLDPB8GvkTj\nPLUiMocUn20cnznnYtiX09CJE7QmmH3oq/cAPwO8faY3h4eHGRgYAKCvr4/BwcHJwyr1TmK5vXJd\nKvWZj+VaDXp6QhmGsmfLxZZP4bpNi0tb/uWXFru88so0eb965RUr2l1/1BgdHQWYjHfNtDOe8z5g\nAeGw9mpgFXAkcA/hxK+bsue3AiuBZ2b4DMdz1rwwMhIeKs+6K9axaGU5eeexrWOsv2R9KctW+Tr9\n/z+X8Zw7kXPeDNwH7ASuAW7PzfM3hBPE/hXYDvxZG8uTJKkS2gnOvbnXawiXTL0OeANhrxnCiWDH\nAydnj79oY3lqkTmk9jU7gdU2js+cczHsywcq4wR27xAmzYFXl0jVZXBWFF63GJ9tH
J/XORfDvpwG\ng7MkSYkxOFeAOaT4bOP4zDkXw76cBoOzJEmJMThXgDmk+Gzj+Mw5F8O+nAaDsyRJiTE4V4A5pPhs\n4/jMORfDvpwGg7MkSYkxOFeAOaT4bOP4zDkXw76chk6MSiVJ0fX39jO2tZxD2/29/aUsV9VlcK6A\nWq3m1nBktnF8S09cahsXwL6cBg9rS5KUmHbGc+4Ex3OWJFVK7PGcJUlSBAbnCvC6xfhs4/hs42LY\nzmnwhDBJ88KNn72RLV/dUnY1ZtTf28/ai9aWXQ11EYNzBXjmZXy2cXwLj1rIopVpXutc1iVeMdiX\n0+BhbUmSEmNwrgBzSPHZxvF5b+1i2JfTYHCWJCkxBucKMIcUn20cn/fWLoZ9OQ0GZ0mSEmNwrgBz\nSPHZxvGZcy6GfTkNBmdJkhLTTnC+rcH0UeDs7PXFwP3As8BPt7EstcEcUny2cXzmnIthX05DO8F5\nWYPpE9kD4F+AlYDHoyRJmqN2gvNT2XMPsBHYBWwB+pkabWMHBubSmUOKzzZurFNNY865GDP1Zbt3\n8doJzvW947OAxcAS4Dzg1Nx7kirOFfv8529YvE6cELYcuJ4QkHcDt3TgM9VB5pDis43jM+dcDPty\nGjox8MUETQaNns3w8DADAwMA9PX1MTg4ONk56odXLFu2PH/LtdoQYWz5UIah7Plgy69t8+/jli+/\nNK36dLK8YkU6/Wk+lmu1GqOjowCT8a6ZloMqsA9YQDisvRpYBRwJ3ANcANyUm/cB4BTgB9M+Y2Ji\nwiPgsdVqtckOozhs48ZGRsKjXeeuPpflFy5v/4MiGNs6xvpL1pddjY6YqS936jdU0BO2VmeNv53I\nOW8G7gN2AtcAt+fmeQfwXeBY4G5gUxvLkySpEto5rN2be72mwTwfyh4qkXt08dnGjXWqacw5F2Om\nvmz3Ll4nTgiTpIZcsc9//obFMzhXQP3EBMVjG8fndc7FsC+nweAsSVJiDM4VYD40Pts4PnPOxbAv\np8HgLElSYgzOFWAOKT7bOD5zzsWwL6fB4FwBO3bsKLsKXc82ju+xRx4ruwqVYF9Og8G5Ap588smy\nq9D1bOP4nnnmmbKrUAn25TQYnCVJSozBuQIefPDBsqvQ9Wzj+Pb8cE/ZVagE+3Ia2hn4ohN2AEtL\nroMkSUW6CxgsuxKSJEmSJEmSJHWR9xKOv+8AtgLHlVudrvRXwDcJ7XwTsLDc6nStNwH3AD8BfrHk\nunSbM4BdhLHjLy25Lt3o48BjwNfLrkiXOw74Z8J64hvAO8qtzuwW5F6vAT5aVkW62GuZOjP/fdlD\nnXcisJjwz2dw7pznAPcDA8BzCRvyS8qsUBc6DTgZg3NsRzF1ItjhwL006MspXEq1L/f6cOD7ZVWk\ni20Bns1efwV4cYl16Wa7gG+VXYku9EpCcH4Q+E/gk8Cvl1mhLnQr8ETZlaiARwkblwBPEY5oHjPT\njIcUVaMm1gNvAX4E/HLJdel2bwNuKLsS0kE4Fvhurvww8KqS6iJ1ygDhaMVXZnqzqOC8hbA7P92f\nAp8D1mWPPwY+CLy1oHp1k2ZtDKGNfwxcX1SlutBc2lmdNVF2BaQOOxy4EVhL2IM+QFHB+bVznO96\n4PMxK9LFmrXxMLAKWBm/Kl1trn1ZnfM99j9R9DjC3rM0Hz0X+AxwLXBzyXWZ1fG512uAT5RVkS52\nBuHswBeWXZGK+GfgFWVXooscAnybcBjweXhCWCwDeEJYbD3A3xGOECfvRkKH2EHYmugvtzpd6T5g\nDNiePT5cbnW61lmE3OjThBM/vlBudbrK6wlntt4P/EnJdelGNwCPAP9B6MOmFuN4NeHk3B1MrY/P\nKLVGkiRJkiRJkiRJkiRJkiRJkiRJklSe55RdAUmVsIJwg3/v7CXNQQqjUkkqVhkb5a8BTi1huZIk\nHWCAMJTk1YQ7XF0HvA64jTC85C9l872AMOD9V
4B/A87M/f024GvZ41ey6Udn07cT7rC3LJuev4n+\nOdlyAUaBq4AvA38NvJRwB7M7s885ITffh4F/Jdwycwi4BtiZ+yyy73B7VqdPZfWHMKzjSDb97uxz\nB4DdhL3m7YS7JOVdCbw7e/2rwJeQJCmiAcIYxD9PuK/uncDHsvfOBDZnry8Hfid73UcI5M8HDgMO\nzaYfD9yRvf5DwkhYEI6AHZ69zo+Pfjb7B+fPZnUA2Aq8LHv9qqxcn68+atmZwN5pdV9KuEf7l7K6\nAVzKVHB9APj97PV/B/42e/0e4J3M7DDgG4S9613AzzaYT6qMVMZzlrrZA4SBR8iev5i9/gYheEPY\nE30j8K6sfChh9KVHgY2EoPgTpgaK+SphT/u5hJFt7mpShwng09nz4YQ98E/n3n9ebr760JffyJaf\nr/tAVq+XE/ac639bfw1wU/b8b8Bv5Kb3MLOngbcDtxKG0HugyXeRup7BWYrvP3KvnyWMqV1/nf8f\n/A3CICV5I4RDwm8h5IqfyabfCpwGvIGwt/sBwohu+bGPD2N/P8qefwp4kjDQ+0zy9Zte90MIGwlb\ngN9u8Pf1v/kJc1/HnAQ8Dhw7x/mlruYJYVIa/g/wjly5Hjh7CXuvAOcxdTLXSwjB7KOEw+T1+R8D\nTiT8b5/F/sG6bi9h7/ScrNxDCI5zMUHIWy8j5K0h5JuPb/gXwT5gQYP3FhEOeZ9MGH3qlXOsi9S1\nDM5SfNMD5MQMr99LOER9N+Fw8mXZ9A8D5xOGmDuBqRO+XpNN+zfgTcCGbPofA/9IOOHskVmW+zvA\n72af8Q2mTkBrVL+87wPDhGEG7yIc0j5hhvkmcn//OcLGwnamTl6DsGHwUUIO/dGsTh9l6jC7JEmS\nJEmSJEmSJEmSJEmSJEmSJEmSJEmSJEnV8/8BayVjGlZHuFwAAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we would expect, basically every sample has a very (relatively) large standard deviation. However, we can't derive any detail about a possible cause for this large deviation: whether it is just due to 1 outlier, or if the whole data widely spread in general." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
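To make the limitation above a bit more concrete, here is a minimal sketch with two made-up samples (hypothetical values, not the data set used in this notebook; the helper name `stdev_without_extreme` is only illustrative). Both samples have a similar standard deviation, but only one of them owes it to a single extreme value, which is exactly what a standard deviation bar alone cannot reveal.

```python
import statistics

# Hypothetical measurements, NOT taken from this notebook's data set.
one_outlier = [10.0, 10.1, 9.9, 10.0, 14.0]  # tight cluster plus one extreme value
wide_spread = [8.0, 9.0, 10.0, 11.0, 12.0]   # values spread out evenly


def stdev_without_extreme(values):
    # Sample standard deviation after dropping the value farthest from the median.
    center = statistics.median(values)
    extreme = max(values, key=lambda v: abs(v - center))
    trimmed = list(values)
    trimmed.remove(extreme)
    return statistics.stdev(trimmed)


for name, sample in (('one_outlier', one_outlier), ('wide_spread', wide_spread)):
    full = statistics.stdev(sample)
    trimmed = stdev_without_extreme(sample)
    print('%s: stdev = %.3f, without most extreme value = %.3f' % (name, full, trimmed))
```

In the first sample the standard deviation collapses once the most extreme value is dropped, while in the second it barely changes. This is only an illustration of why the bar plot is ambiguous and not an outlier test in itself.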
" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Boxplot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[[back to top]](#Sections)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A more useful plot, in my opinion, is Tukey's boxplot 4. Boxplots are in fact one of my preferred approaches to quickly and visually indicate outliers in a Gaussian data set. However, boxplots also have to be used with caution, and they might not be very informative for small sample sizes.\n", "\n", "\n", "\n", " 4 Robert McGill, John W. Tukey and Wayne A. Larsen: \"[Variations of Box Plots](http://www.jstor.org/discover/10.2307/2683468?uid=3739256&uid=2&uid=4&sid=21104147331297)\"\n", "The American Statistician, Vol. 32, No. 1 (Feb., 1978), pp. 12-16" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](https://raw.githubusercontent.com/rasbt/algorithms_in_ipython_notebooks/master/images/boxplot.png)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# drop the NaN placeholders so that each row only holds actual measurements\n", "csv_nonan = [[x for x in row[1:6] if not math.isnan(x)] for row in csv_cont[1:]]\n", "fig = plt.figure(figsize=(8,6))\n", "# horizontal boxplot without notches; outliers are drawn as red squares ('rs')\n", "plt.boxplot(csv_nonan, notch=False, sym='rs', vert=False)\n", "plt.yticks([y+1 for y in y_pos], [row[0] for row in csv_cont[1:]])\n", "plt.xlabel('measurement x')\n", "t = plt.title('Box plot')\n", "plt.show()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAecAAAGJCAYAAACw8/t+AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHglJREFUeJzt3X2YXFdh3/Hv9WtNJNuzAcv4BTYFC5QEG0oM1AJ5/ZLU\n1UOhiuAhJKmRQY3SgtyE0CYO8OgqNDhtGkgRj2nBxmtelDxAJSekSVMTeYSREheCLIMUg1W8tgyy\ngew2lmqCXPv2j3NHO7s7o325M3PPvfP9PM88e+fOnTnn7s7sb865594DkiRJkiRJkiRJkiRJkiRJ\nkiRJkiRJQ20UeAY4peR6SEPFD5wUrwngSeAoMAn8CXBRmRWaRwp8suxKSHVgOEvxyoDXAsuB5wKP\nA9tKrZEkSUPuIeDqtvtrgW+03T8H+ATwXUIr+91AAowAhwnBDrAMOAT8YpdymsDNwL3A3wF3Ao38\nsVFmdmtfAPwx8LfAg8DGfP11wA+B44SW/r4F76UkSRXyEHBNvvws4A5gvO3xTwA7gR8Bnk8I7rfm\nj/00cAR4DvAx4DMnKacJPAr8eF7O55junh5lZjh/EfgwcAZwGeGLwVX5Y1vyOkmSVFsThFboFKFF\n+ijwk/ljpxJaqi9u2/6XgLvb7n8I+BqhFd2gu7uB97fdX5W/dsLMcL4Y+H+ELwMt7wduz5dTPOYs\n9YTHnKV4ZcDrCcF6JrAZ2A2cBzwbOB14uG37R4AL2+5/DPgJQmt7ap6yDs96ndPzMtpdQBiY9n9P\nUqakHjCcpWrICF3YTwOvBr4PPEVo2bY8j9C6htCy/iihm/ntwAvmef3nzVp+Ki+j3XcIx7OXdSkz\nm383JEmqtvZjzgmhFf0UodsZQhfyDkJYPh/4G6aPOb8X+FL+vJuAPXT/Mt4ktJxXEY45fxb4VP7Y\nKHOPOW8jtOQvBR5jetDaJuCevExJkmrpIabPc34CuB94c9vj5xIC+ruE7uX3EILx5YTu53+Yb3cK\nIahv6lJO65hza7T2HxFayBDC+Wmmw/lC4POE0dqHCMe5W0YI4TwJfGVxuypJktrdzXSLW1IEPOYs\nCeyKlqJiOEsCB3NJkiRJkiSpMko9znTZZZdl+/fvL7MKkiQN2n7gpSfboNRjzvv37yfLstretmzZ\nUnod3D/3bxj3r8775v5V/0a4Lv1JOSBMkqTIGM6SJEXGcO6jsbGxsqvQV+5ftdV5/+q8b+D+DYOy\nLzyQ5f3vkiQNhSRJYJ78teUsSVJkDGdJkiJzWtkVkKS6GBmBqanBlJWRkAzoqquNBkxODqQo5Tzm\nLEk9kiQwsH9pAyxsoPs1BDzmLElSBRnOkiRFZr5w3tNl/TiwPl9+B3AIeAYYmbXdh4AHCdcRfdnS\nqihJ0nCZL5xXd1mfMT3/65eAa4CHZ22zFnghcAnwS8BHllhHSVqy/PieKsK/VzDfaO1jwDLCgett\nwLXAYeA40wez7+vy3NcBd+TL9wLnAiuAxwvUV5Kk2puv5dxqHa8DVgKrgOuBK9oe6+ZCQpC3PApc\ntIQ6SpI0VBY6IGwNsJ0QyEeAXQt83uz+CQfjS5I0j4WGc8biz4n+NnBx2/2L8nUzpGl64tZsNhdZ\nhCTNL0nCLU07P56m09u03xa7fZ314vezkO3rqNlszsi6hZjvV3EUWE7o1t5EGOS1AjgAbAR2tG37\nEPBTwN/m99cSRnKvBV4F/H7+s50XIZHUV0mSMKj/M16EpBdlDe7vVZZeXISk9RvaSTgl6iBhkNfe\ntm1uJBxbvhC4H/hovv5PgW8RTrP6r8C/XnjVJUkaXmV3IthyltRXtpwrVZQt55xXCJMkKTKGs6Ra\nq3srrG78ewWGsyRJkTGcJUmKzHyX75QkLcKgztXNBlhWozGYcjTNcJakHhns4dLMSy7WmN3akiRF\nxnCWJCkyhrMkSZExnCVJiozhLElSZAxnSZIiYzhLkhQZw1mSp
MgYzpIkRcZwliQpMoazJEmRMZwl\nSYqM4SxJUmQMZ0mSImM4S5IUGedzlipuZASmpvr3+hkJiTMHV0qjAZOTZddCRSQll59lg52dXKqd\nJIG+foz6XoB6zT9Z3JIkgXny125tSZIiUySc93RZPw6sz5evBv4a+Fq+/tQC5UmSNBSKhPPqLuuz\n/HYKIZDfBLwEeBh4S4HypKGWd4VJUfN92htFwvlY/jMBPgw8ANwFnJev/1HgOHAov/8FplvUkiSp\niyLh3BpusA5YCawCrgeuyNd/jzAa/OX5/TcAFxcoT5KkodCLAWFrgO2EsD4C7Gp77OeADwL3Ak8A\nT/egPEmSaq0X5zlndB8S/leE8Ab4GeCS2RukaXpieWxsjLGxsR5USaonD+dpoTq9V7ZsgbZ/uSek\nKWzd2pvtNVez2aTZbC7qOUU+6keB5YRu7U3AWmAFcADYCOwgHH/+LnAm8N+Bfw+019DznKUFSpKE\nTp8Xz3PWbGX+ybq9TzVtIec5F2k5t377OwmnTB0EHgH2tm3zLuC1hO7zW5gZzJIkqYOyO8lsOUsL\nZMtZC2XLOW5eIUySpAoynKWKsDWiKvB92huGsyRJkTGcJUmKjOEsSVJkenEREkkl6+fFSbI+v756\nr9EouwYqynCWKq7/428yHOIjDZbd2pIkRcZwliQpMoazJEmRMZwlSYqM4SxJUmQMZ0mSImM4S5IU\nGcNZkqTIGM6SJEXGcJYkKTKGsyRJkTGcJUmKjOEsSVJkDGdJkiLjlJGSBmZkBKamZq7LSEgim5Sy\n0YDJybJroWFW9hTqWdb/yWglRSJJOsw/3XFluSKskmokSRKYJ3/t1pYkKTKGsyRJkSkSznu6rB8H\n1ufLa4H7gH3APcALCpQnSdJQKBLOq7usz/IbwC3Am4CXAduB9xQoT1IF5MfTFDH/RvErMlr7GLCM\ncFB7G3AtcBg43rbNEeCcfPlc4NsFypMkaSgUCedW63gdsBJYBZwPHARuyx97B/A/gSeBJ4BXFShP\nkqSh0IsBYWsIXdYZoaW8q+21PwlcB1wM3A58oAflSZJUa70I54zO52s9BzgD+HJ+/zPAFbM3StP0\nxK3ZbPagOpLKlqbhXOHZtyrpVP807bxtt/2tyvbqr2azOSPrFqLIx+UosJzQrb2JMDJ7BXAA2Ajc\nCTwKXAk8CLyN0Ip+Y9treBESqWaSJKHb59qLkMThZH8j9d9CLkLSi2POO4GrCceaHwH25uufAd5K\naDEnwGR+X5IknUTZHU22nKWaseUcP1vO5fLynZIkVZDhLKmnbJHFz79R/AxnSZIiYzhLkhSZIqO1\nJWnRZp/vnHVYV7ZGo+waaNgZzpIGpvOhzgyPgEoz2a0tSVJkDGdJkiJjOEuSFBnDWZKkyBjOkiRF\nxnCWJCkyhrMkSZExnCVJiozhLElSZAxnSZIiYzhLkhQZw1mSpMgYzpIkRcZwliQpMoazJEmRcT5n\nSUNpZASmphb3nIyEpM+zTzcaMDnZ1yJUAUnJ5WdZ59nXJamvkgQW/e9nSU+KrgiVLEkSmCd/7daW\nJCkyRcJ5T5f148D6fPkeYF9++zaws0B5kiQNhSLHnFd3WZ/lN4DXtK3/HHBngfIkVVySJHgoS5pf\nkXA+Biwj9JtvA64FDgPHmduXfjZwNbChQHmSpCGUbtgAExNzHxgdJR0fH3BtBqNIOLe+/q4DVgKr\ngPOBg8Bts7b958AXCIEuSdLCTUyQ7t49Z3U6+JoMTC8GhK0BthPC+giwq8M2bwb+oAdlSZJUe704\nzznj5EPCnw1cDry+04Npmp5YHhsbY2xsrAdVklQVaQpbt85dv2VLeKxf20uD0mw2aTabi3pOkfOc\njwLLCd3am4C1wArgALAR2JFv98vAK4EbOryG5zlLQySmAWGe51wd6dhY527tK68kXWToxWAh5zn3\n4pjzTsJgr4PAI8DeWdu9C
bi5QDmSJA2VIuF8dtvy5pNsd1WBMiRJw250tPPgr9HRwdZjgLx8p6SB\nsVs7iiJUMi/fKSkqsQSzFDvDWZKkyBjOkiRFxnCWJCkyvbgIiSRVUrLIIbHZEp6zWI1Gf19f1WA4\nSxpKSxubluGQNg2C3dqSJEXGcJYkKTKGsyRJkTGcJUmKjOEsSVJkDGdJkiJjOEuSFBnDWZKkyBjO\nkiRFxnCWJCkyhrMkSZExnCVJiozhLElSZAxnSZIi45SRkmYYGYGpqf6WkZGQRDL5YqMBk5Nl10Ka\nqc/Ths8ry5Y2qaqkPkmSpc51HFshCxNRVTQkkiSBefLXbm1JkiJjOEuSFJmlhvOeLuvHgfVt938b\n+AZwENi8xLIkaajl3aAaIksdELa6y/osvwHcAFwIvCi//5wlliVJ0lBZajgfA5YRDmhvA64FDgPH\n27b5ZeDNbfe/t8Sy+irdsAEmJuY+MDpKOj4+4NpIkrT0cG61jtcBK4FVwPmE7uvb8sdeAPxcvs33\ngBuBQ0uuab9MTJDu3j1ndTr4mkiSBBQfELYG2E4I6yPArrbHzgR+AFwOfAz4eMGyJEkaCkXDOaP7\nuVqPAjvy5TuBSzttlKbpiVuz2SxYHUlavCSZe0vTztum6eC3V7U1m80ZWbcQS/2zHwWWE7qsNwFr\ngRXAAWAjIZRvBr4J3A6MAf8BeOWs1yn9IiTp2Fjnbu0rryT1y4KGkBchiU+SJJT9v1K9s5CLkBQ9\n5rwTuJpwrPkRYG/bNr8DfBr4VUKYb1xiWZIkDZWlhvPZbcvdzl/+O+C1S3z9wRkd7Tz4a3R0sPWQ\nJClX9tGM0ru1Jc1kt3Z87NauF6+tLUk1YDAPH8NZkqTIGM6SJEVmqQPCJNVYv8+tzQZQxkI1GmXX\nQJrLcJY0w2AOb2Z4FFXqzm5tSZIiYzhLkhQZw1mSpMgYzpIkRcZwliQpMoazJEmRMZwlSYqM4SxJ\nUmQMZ0mSImM4S5IUGcNZkqTIGM6SJEXGcJYkKTKGsyRJkTGcJUmKjPM5SxqYkRGYmiq7FouXkZBU\nZAbqRgMmJ8uuhYpKSi4/ywYzs7ukCCQJVPIjX6GKV6iqQytJEpgnf+3WliQpMkXCeU+X9ePA+rbl\nbwH78tulBcqTJGkoFDnmvLrL+iy/tZbfBewoUI5UeUmS4CEcqZhh+hwVCedjwDJCv/k24FrgMHB8\n1nZlH9eWJKlSinRrt76+rANWAquA64ErZm13M7Af+ABwRoHyJEkaCr0YELYG2E4I6yPArrbHbiIE\n9+XACPDrPShPkqRa60U4Z3Tvun4s/3kcuB14xewN0jQ9cWs2mz2ojhSnJJl7S9PO26ZpPbfXYFTh\nvdCr7aug2WzOyLqFKPJxOQosJ3RrbwLWAiuAA8BGwiCw5xJa0wnwQeBJ4DfbXsPznDUUhmkgy8lU\n9hzcClW8QlVdtLp8jhZynnORAWGt39BO4GrgIPAIsLdtm08Bz8krsY+ZwSxJkjoou6PJlrOGQl2+\n8RdV2VZdhSpeoaouWl0+R14hTJKkCjKcpQGow7d9qWzD9DkynCVJiozhLElSZAxnSZIiU+RUKkla\ntCpejCSjOvVuNMqugXrBcJY0MNUdz5NR2aqrkuzWliQpMoazJEmRMZwlSYqM4SxJUmQMZ0mSImM4\nS5IUGcNZkqTIGM6SJEXGcJYkKTKGsyRJkTGcJUmKjOEsSVJkDGdJkiJjOEuSFBmnjJQ0GElCEvnE\ni40GTE6WXQvJcJY0QLHP55wkZddACuzWliQpMoazJEmRWWo47+myfhxYP2vdh4CjSyxHUoQS+3+l\nvlpqOK/usj7Lby0/BZw7a50kSTqJpQ4IOwYsAxJgG3AtcBg4nq8DOBX4j8DPA+uKVVOSVHXphg0w\nMTH3gdFR0vHxAdcmbksN51ZLeB2wElgFnA8cBG7LH3sH8EfAY0UqKEmqiYkJ0t2756xOB1+
T6BU9\nlWoNsJ0Q1keAXfn6C4A3AGNMt6QlSdICFA3njLnhmwAvBV4IHMrXPQv4JqGVPUOapieWx8bGGBsb\nK1glSYOQpuHWaf3WrXPXO/BEw6rZbNJsNhf1nKW2ao8Cywnd2puAtcAK4ACwEdjRZfvZsiz2qxJI\nmiNJEhb92U2S6K9CUoEqVlo6Nta5W/vKK0kXGV5Vlp/tcNL8LXrMeSdwNeFY8yPA3nm2lyRJ81hq\nOJ/dtrx5kdtLkobR6GjnwV+jo4OtRwWUPVjLbm2pguzWlpZuId3aXr5T0qL5pVrqL8NZkqTIGM6S\nJEXG+ZwlDUzs82U0GmXXQAoMZ0mDkWWeUyktkN3akiRFxnCWJCkyhrMkSZExnCVJiozhLElSZAxn\nSZIiYzhLkhQZw1mSpMgYzpIkRcZwliQpMoazJEmRMZwlSYqM4SxJUmQMZ0mSImM4S5IUGedzlkow\nMgJTU2XX4uQyEpKSZmBuNGByspSipSgkJZefZZnTr2v4JAlE/9YvsZKV+P1IS5QkCcyTv3ZrS5IU\nmSLhvKfL+nFgfb58G3AfcD+wEzinQHmSJA2FIuG8usv6LL8B/ArwUuBS4FvA5gLlaQjk3T3SwPne\nU0yKhPOx/GcCfBh4ALgLOI/pvvSjbducBXy/QHmSJA2FIuHcah2vA1YCq4DrgSvaHgO4HThCaD3f\nWqA8SZKGQi8GhK0BthMC+Qiwa9bjNwAXEI47v7sH5UmSVGu9COeM+U/Jegb4Q+Dy2Q+kaXri1mw2\ne1Ad1UmahtNqZt/StNrba36D/rtI/dJsNmdk3UIUeUseBZYTurU3AWuBFcABYCOwA3ghcCgv53eB\nHwDvbXsNz3PWDEmSMAzviUqcxztk5zkPy3tP5VvIec5FrhDWehfvBK4GDgKPAHtb5RNOqzo7v/8V\n4O0FypMkaSiU3Zljy1kzDEvrxZZzfEUPy3tP5fMKYZIkVZDhrKjYclFZfO8pJoazJEmRMZwlSYqM\n8zlLJYn93NqM8urYaJRTrhQLw1kqQTUOb2ZUoppSDdmtLUlSZAxnSZIiYzhLkhQZw1mSpMgYzpIk\nRcZwliQpMoazJEmRMZwlSYqM4SxJUmQMZ0mSImM4S5IUGcNZkqTIGM6SJEXGcJYkKTKGsyRJkXE+\nZ0l9NTICU1Nl16I3MhKSis5y3WjA5GTZtdBCJSWXn2XVmHVe0hIlCdTmY17hnalw1WsnSRKYJ3/t\n1pYkKTKGsyRJkSkSznu6rB8H1ufLnwYeAL4G3IbHuKUFy7u+JDF8n4ci4by6y/osvwF8Cngx8BLg\nLGBjgfIkSRoKRVqyx4BlhIPa24BrgcPAcaYPdP9Z2/ZfBi4qUJ4kSUOhSMu51TpeB6wEVgHXA1e0\nPdZyOvCLzAxrSZLUQS8GhK0BthMC+Qiwq8M2twC76X6cWpIk5XoRzhknP19rC/CjwDs7PZim6Ylb\ns9nsQXWk+kgSSNPOj6VpeHz2LbbtFY8qvF+6bV9lzWZzRtYtRJGPzlFgOaFbexOwFlgBHCAM/NqR\n/7wBuAb4+w6v4UVIpC6SJKEOn49aXfyiwjtT4aoD9fk8QP8vQtL6Le0EHgQOAncAe9u2+QhwHvCX\nwD7gPQXKkyRpKJTd6WTLWeqiLi2FqrfYZqjwzlS46kB9Pg/g5TslSaokw1mKVF1aCVIvDNvnwXCW\nJCkyhrMkSZFxIgpJfVeX850zqrsvjUbZNdBiGM6S+qpehwqzOdcmlvrBbm1JkiJjOEuSFBnDWZKk\nyBjOkiRFxnCWJCkyhrMkSZExnCVJiozhLElSZAxnSZIiYzhLkhQZw1mSpMgYzpIkRcZwliQpMoaz\nJEmRMZwlSYqM4SxJs4yMQJL070aSnPTxkZGyfwMqW1Jy+VlWr5nYJdVAkkBf/zXNU0Dfy1epkiSB\nefLXlrMkSZEpEs57uqwfB9bny+8ADgHPAHbUSJK0AEX
CeXWX9Vl+A/gScA3wcIFyJGno5V2hGhKn\nFXjuMWAZod98G3AtcBg4znRf+n2FaicpWumGDTAxMfeB0VHS8fEB10aqlyLh3GodrwNWAquA84GD\nwG0F6yUpdhMTpLt3z1mdDr4mUu30YkDYGmA7IayPALt68JqSJA2tIi3nlowCp2SlaXpieWxsjLGx\nseI1kqSK8xBzfTSbTZrN5qKe04tw/iKwCbgDWAFcBXy6w3Yd32rt4SxJCmaf52xYV9fshufWrVvn\nfU6Rbu3WW2cn8CDhWPMdwN62bW4kDBK7ELgf+GiB8iRJGgpFWs5nty1v7rLNh/KbpLoZHe08+Gt0\ndLD1kGqo7I4SL98pKToxXr4zSRL8f1kPXr5TkmrCYB4uhrMkSZExnCVJikwvTqWSpNrp56lL2Tyv\n32j0r2xVg+EsSbP0//BuhkeQdTJ2a0uSFBnDWZKkyBjOkiRFxnDuo8Ve6Lxq3L9qq/P+1XnfwP0b\nBoZzH9X9Deb+VVud96/O+wbu3zAwnCVJiozhLElSZMqe+OI+4LKS6yBJ0iDtB15adiUkSZIkSZIk\nSaqRFHgU2Jffriu1Nv3za8AzwEjZFemx9xGOn9wH/AVwcbnV6bnfBf6GsI87gHPKrU5PvRE4ADwN\n/KOS69JL1wEPAA8Cv15yXXrt48DjwNfKrkifXAzcTXhffh24sdzq9NQ/AO4l/K88CNxcbnXmtwV4\nZ9mV6LOLgf8BPET9wnl52/Jm4NayKtInP830WQ2/k9/q4sXASsI/w7qE86nAIWAUOJ3wj3BVmRXq\nsdcAL6O+4Xw+0wOllgHfoF5/v2flP08D/gp4dbcNYzmVquxR4/32AeDflV2JPjnatrwM+H5ZFemT\nuwg9HhC+9V5UYl167QHgm2VXosdeQQjnCeAp4A+B15dZoR67B5gquxJ99BjhCxXAMUKv1QXlVafn\nnsx/nkH4IjnZbcNYwnkzodvwNuDckuvSa68ndNvfX3ZF+ui3gUeAt1CvluVsbwX+tOxK6KQuBA63\n3X80X6fqGSX0Etxbcj166RTCl4/HCT1WB7ttOKj5nO8idFfM9m7gI8Bv5fffB/we8LYB1atXTrZ/\nNwE/07auir0E3fbvN4HPE/bz3cBvAB8Ebhhc1Xpivv2DsH/Hge2DqlSPLGTf6sRpkuthGfA54N8Q\nWtB18Qyh2/4c4M+BMaBZYn0WbJR6HUv5ScI3pIfy21OE7rbzSqxTPz2PMIijbjYAewgDOuqoTsec\nX0UY39FyE/UbFDZKvf5PznY6Ibh+peyK9Nl7gXeVXYmTeW7b8q9SvZbJYtRxQNglbcubgU+WVZE+\nuY4wcvTZZVekj+4GXl52JXrkNOB/EwLsDOo3IAzqHc4J8AlCD1zdPJvpw7ZnAV8ErimvOvP7BOF4\n7H7gTmBFudXpq29Rv3D+HOEfxX3Af6N+vQIPAg8zfarfLeVWp6fWEY7P/oAwEOfPyq1Oz/xTwijf\nQ4SWc538AfAd4IeEv13VDiHN59WErt/7qN/ptS8BvkrYt/uBf1tudSRJkiRJkiRJkiRJkiRJkiRJ\nkiSpdKeWXQFJQ+FKwgQGj5ZdEakKYpn4QtLglPGl/CrgihLKlSRpjlHC1Iy3E65a9WnCRCh7CNM1\nXp5v9yPAxwkz8HwVeF3b878I/HV++8f5+ufm6/cRrtC2Ol/fPknAG/JyAcaB/0KYQ/Y/AS8gXBHs\nK/nrvKhtu1uAvyRcBnMMuIMwe07rtcj3YW9ep8/k9Ydw7fg0X39//rqjwBFCq3kfc+ew/X3CdYYB\n/gmwG0mS+miUMOHJTxCuG/wVwtSoEAJ4Z778fuAX8uVzCUH+LMI1eM/M118CfDlf/jXCzFIQesCW\n5cvt82uvZ2Y4/zHTs6L9BfDCfPmV+f3Wdq3r278OeGJW3S8jXCN4d143CBNLtML1IeDt+fK/Aj6W\nL28B3klnZxEmTLm
K8EXmx7psJw2NQU0ZKQ2zhwiTZ5D//EK+/HVCeENoif4zpmepORO4mHDN6w8T\nQvFppica+V+ElvbphGvS75+nDhnw2fznMkIL/LNtj5/Rtl1rKsmv5+W31300r9ePE1rOree2lgF2\n5D+/Cvxs2/pu06X+APiXwD2EKQIfmmdfpNoznKX++2Hb8jOEeaFby+2fwZ8lTLTRLiV0Cf8LwrHi\nv8/X3wO8BngtobX7AcKMYO3zGZ/FTE/mP08B/g9hIvtO2us3u+6nEb4k3AX8fJfnt57zNAv/H3Mp\n8D3gwgVuL9WaA8KkOPw5cGPb/VZwnk1ovQJcz/RgrucRwuxWQjd5a/vHgRcTPtvrmBnWLU8QWqdv\nyO8nhHBciIxw3Ho14bg1hOPNl3R9RnAUWN7lsecTurxfRphR6hULrItUW4az1H+zAzLrsPw+Qhf1\n/YTu5K35+luAtxCmmXsR0wO+rsrXfRV4I/Cf8/W/AfwJYcDZd05S7i8Ab8tf4+tMD0DrVr923wc2\nEKYv3E/o0n5Rh+2ytud/nvBlYR/Tg9cgfDG4lXAM/bG8Trcy3c0uSZIkSZIkSZIkSZIkSZIkSZIk\nSZIkSZIkSdLw+f9QuMBXenTmUgAAAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The red squares indicate the outliers here. Quite interestingly, both outliers for the samples \"id6\" and \"id1\" where also picked up in our previous Dixon Q-test. However, the outliers in \"id4\" and \"id7\" were not indicated as outliers by Dixon's outlier test." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**I really don't want to draw any conclusion about which approach is right or wrong here, since in my opinion, drawing any conclusion from a data set that is based on such a small number of observations simply just doesn't make sense!** \n", "\n", "So you may wonder why I wasted your time if you read this article up to this point? Since Dixon's Q-test is still quite popular in certain scientific fields (e.g., chemistry) that it is important to understand its principles in order to draw your own conclusion of the presented research data that you might stumble upon in research articles or scientific talks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
" ] } ], "metadata": {} } ] }