{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Module 3 : Homework\n",
    "**Prof. Bruce Hamilton**  \n",
    "**Winter 2016**\n",
    "\n",
    "* * *\n",
    "\n",
    "## Correcting the Professor.\n",
    "  \n",
    "Nxf1 encodes a protein recruited to RNPs during nucleocytoplasmic transport. It has previously been demonstrated that for 6 of 7 target genes with a particular kind of retrotransposon in an intron, the Nxf1 allele was strongly correlated with the level of target gene expression.  Specifically, the “C” allele of Nxf1 allowed a higher level of target gene RNA expression than the “B” allele (no difference was detected in the 7th gene) with ~2x effect size.  \n",
    "  \n",
    "The research question is whether expression of an 8th gene that has the same class of retrotransposon in an intron is Nxf1-dependent like the majority of the others.  The tables downloaded in class (B6.txt and balbF2.txt) contain relative gene expression values from two different RT-qPCR experiments on different sets of animals. Both experiments are measuring the same (8th) gene, which has the retrotransposon inserted into an intron, just like the previously reported gene. The samples are NOT explicitly paired.\n",
    "\n",
    "#### 1. Is this a one-tailed or a two-tailed test and why?"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    " "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.a. What test statistic(s) would be most appropriate to compare data in B6.txt and why?  "
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 2.b. Perform the test in R."
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 3. What test statistic(s) would be most appropriate to compare data in F2.txt?  Perform the test in R. (For this exercise, you may either exclude the heterozygous individual, EB216, or treat as if C; extra credit–does this sample matter to your overall conclusion?)\n",
    "\n",
    "*Notes: usage of “less” or “greater” for direction of the test has opposite meaning in the two tests; see Details in the help page for each test: >?ks.test and >?wilcox.test.  The ks.test does not allow conditional calls to normal.f2~nxf1.f2*"
   ]
  },
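  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The notes above can be illustrated with a minimal sketch. The numbers and the names `expr`, `geno`, `x_b`, and `x_c` are invented for illustration; they are not the homework data or its column names. The sketch shows `wilcox.test` taking a formula while `ks.test` is given two separate vectors, and the same direction (B lower than C) needing opposite `alternative` values in the two tests.\n",
    "\n",
    "```r\n",
    "# Made-up expression values (NOT the homework data)\n",
    "expr <- c(1.0, 1.2, 0.9, 1.1, 2.0, 2.3, 1.9, 2.2)\n",
    "geno <- factor(c('B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'))\n",
    "\n",
    "# Split the values into one vector per genotype\n",
    "x_b <- expr[geno == 'B']\n",
    "x_c <- expr[geno == 'C']\n",
    "\n",
    "# wilcox.test accepts a formula; 'less' is the alternative that the\n",
    "# first factor level (B) is shifted to the left of the second (C)\n",
    "w <- wilcox.test(expr ~ geno, alternative = 'less')\n",
    "\n",
    "# ks.test is given the two vectors directly; 'greater' is the alternative\n",
    "# that the CDF of x_b lies above that of x_c, i.e. B values tend to be smaller\n",
    "k <- ks.test(x_b, x_c, alternative = 'greater')\n",
    "\n",
    "w$p.value\n",
    "k$p.value\n",
    "```\n",
    "\n",
    "Both calls ask the same one-sided question about these invented values; check the Details section of each help page before choosing a direction for the real data."
   ]
  },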
  {
   "cell_type": "raw",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 4. What do you conclude about the influence of Nxf1 on gene 8?  How strong is the evidence overall?"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 5. Are parametric or non-parametric test more sensitive to outliers values?"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* * *\n",
    "\n",
    "## Some helpful bits of R.\n",
    "\n",
    "#### Creates an object “b6” that holds the data table in b6.txt."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "b6<-read.table(\"hot.txt\",header=T)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Allows R to call a column of data according to its header."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "attach(b6)"
   ]
  },
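  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hedged alternative to `attach()`, columns can also be reached through the data frame itself. The sketch below uses a small invented table; `demo`, `genotype`, and `expression` are hypothetical names, not the columns of the homework files.\n",
    "\n",
    "```r\n",
    "# Hypothetical stand-in for a table read with read.table()\n",
    "demo <- data.frame(genotype = c('B', 'B', 'C', 'C'),\n",
    "                   expression = c(1.0, 1.1, 2.0, 2.1))\n",
    "\n",
    "# Refer to a column by its header through the data frame\n",
    "overall_mean <- mean(demo$expression)\n",
    "\n",
    "# with() evaluates an expression using the columns as variables,\n",
    "# without adding the data frame to the search path\n",
    "group_means <- with(demo, tapply(expression, genotype, mean))\n",
    "\n",
    "overall_mean\n",
    "group_means\n",
    "```\n",
    "\n",
    "`attach()` is convenient for short sessions, but attached columns can mask other objects with the same name, so `$` or `with()` may be easier to follow in longer analyses."
   ]
  },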
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Calls up the documentation page for the t-test implementation, including expected or allowable arguments, which may define the version of the test that is run (paired, unpaired, 2-tailed, alternative directions for 1-tailed, etc.)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<table width=\"100%\" summary=\"page for t.test {stats}\"><tr><td>t.test {stats}</td><td style=\"text-align: right;\">R Documentation</td></tr></table>\n",
       "\n",
       "<h2>Student's t-Test</h2>\n",
       "\n",
       "<h3>Description</h3>\n",
       "\n",
       "<p>Performs one and two sample t-tests on vectors of data.\n",
       "</p>\n",
       "\n",
       "\n",
       "<h3>Usage</h3>\n",
       "\n",
       "<pre>\n",
       "t.test(x, ...)\n",
       "\n",
       "## Default S3 method:\n",
       "t.test(x, y = NULL,\n",
       "       alternative = c(\"two.sided\", \"less\", \"greater\"),\n",
       "       mu = 0, paired = FALSE, var.equal = FALSE,\n",
       "       conf.level = 0.95, ...)\n",
       "\n",
       "## S3 method for class 'formula'\n",
       "t.test(formula, data, subset, na.action, ...)\n",
       "</pre>\n",
       "\n",
       "\n",
       "<h3>Arguments</h3>\n",
       "\n",
       "<table summary=\"R argblock\">\n",
       "<tr valign=\"top\"><td><code>x</code></td>\n",
       "<td>\n",
       "<p>a (non-empty) numeric vector of data values.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>y</code></td>\n",
       "<td>\n",
       "<p>an optional (non-empty) numeric vector of data values.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>alternative</code></td>\n",
       "<td>\n",
       "<p>a character string specifying the alternative\n",
       "hypothesis, must be one of <code>\"two.sided\"</code> (default),\n",
       "<code>\"greater\"</code> or <code>\"less\"</code>.  You can specify just the initial\n",
       "letter.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>mu</code></td>\n",
       "<td>\n",
       "<p>a number indicating the true value of the mean (or\n",
       "difference in means if you are performing a two sample test).</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>paired</code></td>\n",
       "<td>\n",
       "<p>a logical indicating whether you want a paired\n",
       "t-test.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>var.equal</code></td>\n",
       "<td>\n",
       "<p>a logical variable indicating whether to treat the\n",
       "two variances as being equal. If <code>TRUE</code> then the pooled\n",
       "variance is used to estimate the variance otherwise the Welch\n",
       "(or Satterthwaite) approximation to the degrees of freedom is used.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>conf.level</code></td>\n",
       "<td>\n",
       "<p>confidence level of the interval.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>formula</code></td>\n",
       "<td>\n",
       "<p>a formula of the form <code>lhs ~ rhs</code> where <code>lhs</code>\n",
       "is a numeric variable giving the data values and <code>rhs</code> a factor\n",
       "with two levels giving the corresponding groups.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>data</code></td>\n",
       "<td>\n",
       "<p>an optional matrix or data frame (or similar: see\n",
       "<code>model.frame</code>) containing the variables in the\n",
       "formula <code>formula</code>.  By default the variables are taken from\n",
       "<code>environment(formula)</code>.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>subset</code></td>\n",
       "<td>\n",
       "<p>an optional vector specifying a subset of observations\n",
       "to be used.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>na.action</code></td>\n",
       "<td>\n",
       "<p>a function which indicates what should happen when\n",
       "the data contain <code>NA</code>s.  Defaults to\n",
       "<code>getOption(\"na.action\")</code>.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>...</code></td>\n",
       "<td>\n",
       "<p>further arguments to be passed to or from methods.</p>\n",
       "</td></tr>\n",
       "</table>\n",
       "\n",
       "\n",
       "<h3>Details</h3>\n",
       "\n",
       "<p>The formula interface is only applicable for the 2-sample tests.\n",
       "</p>\n",
       "<p><code>alternative = \"greater\"</code> is the alternative that <code>x</code> has a\n",
       "larger mean than <code>y</code>.\n",
       "</p>\n",
       "<p>If <code>paired</code> is <code>TRUE</code> then both <code>x</code> and <code>y</code> must\n",
       "be specified and they must be the same length.  Missing values are\n",
       "silently removed (in pairs if <code>paired</code> is <code>TRUE</code>).  If\n",
       "<code>var.equal</code> is <code>TRUE</code> then the pooled estimate of the\n",
       "variance is used.  By default, if <code>var.equal</code> is <code>FALSE</code>\n",
       "then the variance is estimated separately for both groups and the\n",
       "Welch modification to the degrees of freedom is used.\n",
       "</p>\n",
       "<p>If the input data are effectively constant (compared to the larger of the\n",
       "two means) an error is generated.\n",
       "</p>\n",
       "\n",
       "\n",
       "<h3>Value</h3>\n",
       "\n",
       "<p>A list with class <code>\"htest\"</code> containing the following components:\n",
       "</p>\n",
       "<table summary=\"R valueblock\">\n",
       "<tr valign=\"top\"><td><code>statistic</code></td>\n",
       "<td>\n",
       "<p>the value of the t-statistic.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>parameter</code></td>\n",
       "<td>\n",
       "<p>the degrees of freedom for the t-statistic.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>p.value</code></td>\n",
       "<td>\n",
       "<p>the p-value for the test.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>conf.int</code></td>\n",
       "<td>\n",
       "<p>a confidence interval for the mean appropriate to the\n",
       "specified alternative hypothesis.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>estimate</code></td>\n",
       "<td>\n",
       "<p>the estimated mean or difference in means depending on\n",
       "whether it was a one-sample test or a two-sample test.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>null.value</code></td>\n",
       "<td>\n",
       "<p>the specified hypothesized value of the mean or mean\n",
       "difference depending on whether it was a one-sample test or a\n",
       "two-sample test.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>alternative</code></td>\n",
       "<td>\n",
       "<p>a character string describing the alternative\n",
       "hypothesis.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>method</code></td>\n",
       "<td>\n",
       "<p>a character string indicating what type of t-test was\n",
       "performed.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>data.name</code></td>\n",
       "<td>\n",
       "<p>a character string giving the name(s) of the data.</p>\n",
       "</td></tr>\n",
       "</table>\n",
       "\n",
       "\n",
       "<h3>See Also</h3>\n",
       "\n",
       "<p><code>prop.test</code>\n",
       "</p>\n",
       "\n",
       "\n",
       "<h3>Examples</h3>\n",
       "\n",
       "<pre>\n",
       "require(graphics)\n",
       "\n",
       "t.test(1:10, y = c(7:20))      # P = .00001855\n",
       "t.test(1:10, y = c(7:20, 200)) # P = .1245    -- NOT significant anymore\n",
       "\n",
       "## Classical example: Student's sleep data\n",
       "plot(extra ~ group, data = sleep)\n",
       "## Traditional interface\n",
       "with(sleep, t.test(extra[group == 1], extra[group == 2]))\n",
       "## Formula interface\n",
       "t.test(extra ~ group, data = sleep)\n",
       "</pre>\n",
       "\n",
       "<hr /><div style=\"text-align: center;\">[Package <em>stats</em> version 3.2.2 ]</div>"
      ],
      "text/latex": [
       "\\inputencoding{utf8}\n",
       "\\HeaderA{t.test}{Student's t-Test}{t.test}\n",
       "\\methaliasA{t.test.default}{t.test}{t.test.default}\n",
       "\\methaliasA{t.test.formula}{t.test}{t.test.formula}\n",
       "\\keyword{htest}{t.test}\n",
       "%\n",
       "\\begin{Description}\\relax\n",
       "Performs one and two sample t-tests on vectors of data.\n",
       "\\end{Description}\n",
       "%\n",
       "\\begin{Usage}\n",
       "\\begin{verbatim}\n",
       "t.test(x, ...)\n",
       "\n",
       "## Default S3 method:\n",
       "t.test(x, y = NULL,\n",
       "       alternative = c(\"two.sided\", \"less\", \"greater\"),\n",
       "       mu = 0, paired = FALSE, var.equal = FALSE,\n",
       "       conf.level = 0.95, ...)\n",
       "\n",
       "## S3 method for class 'formula'\n",
       "t.test(formula, data, subset, na.action, ...)\n",
       "\\end{verbatim}\n",
       "\\end{Usage}\n",
       "%\n",
       "\\begin{Arguments}\n",
       "\\begin{ldescription}\n",
       "\\item[\\code{x}] a (non-empty) numeric vector of data values.\n",
       "\\item[\\code{y}] an optional (non-empty) numeric vector of data values.\n",
       "\\item[\\code{alternative}] a character string specifying the alternative\n",
       "hypothesis, must be one of \\code{\"two.sided\"} (default),\n",
       "\\code{\"greater\"} or \\code{\"less\"}.  You can specify just the initial\n",
       "letter.\n",
       "\\item[\\code{mu}] a number indicating the true value of the mean (or\n",
       "difference in means if you are performing a two sample test).\n",
       "\\item[\\code{paired}] a logical indicating whether you want a paired\n",
       "t-test.\n",
       "\\item[\\code{var.equal}] a logical variable indicating whether to treat the\n",
       "two variances as being equal. If \\code{TRUE} then the pooled\n",
       "variance is used to estimate the variance otherwise the Welch\n",
       "(or Satterthwaite) approximation to the degrees of freedom is used.\n",
       "\\item[\\code{conf.level}] confidence level of the interval.\n",
       "\\item[\\code{formula}] a formula of the form \\code{lhs \\textasciitilde{} rhs} where \\code{lhs}\n",
       "is a numeric variable giving the data values and \\code{rhs} a factor\n",
       "with two levels giving the corresponding groups.\n",
       "\\item[\\code{data}] an optional matrix or data frame (or similar: see\n",
       "\\code{\\LinkA{model.frame}{model.frame}}) containing the variables in the\n",
       "formula \\code{formula}.  By default the variables are taken from\n",
       "\\code{environment(formula)}.\n",
       "\\item[\\code{subset}] an optional vector specifying a subset of observations\n",
       "to be used.\n",
       "\\item[\\code{na.action}] a function which indicates what should happen when\n",
       "the data contain \\code{NA}s.  Defaults to\n",
       "\\code{getOption(\"na.action\")}.\n",
       "\\item[\\code{...}] further arguments to be passed to or from methods.\n",
       "\\end{ldescription}\n",
       "\\end{Arguments}\n",
       "%\n",
       "\\begin{Details}\\relax\n",
       "The formula interface is only applicable for the 2-sample tests.\n",
       "\n",
       "\\code{alternative = \"greater\"} is the alternative that \\code{x} has a\n",
       "larger mean than \\code{y}.\n",
       "\n",
       "If \\code{paired} is \\code{TRUE} then both \\code{x} and \\code{y} must\n",
       "be specified and they must be the same length.  Missing values are\n",
       "silently removed (in pairs if \\code{paired} is \\code{TRUE}).  If\n",
       "\\code{var.equal} is \\code{TRUE} then the pooled estimate of the\n",
       "variance is used.  By default, if \\code{var.equal} is \\code{FALSE}\n",
       "then the variance is estimated separately for both groups and the\n",
       "Welch modification to the degrees of freedom is used.\n",
       "\n",
       "If the input data are effectively constant (compared to the larger of the\n",
       "two means) an error is generated.\n",
       "\\end{Details}\n",
       "%\n",
       "\\begin{Value}\n",
       "A list with class \\code{\"htest\"} containing the following components:\n",
       "\\begin{ldescription}\n",
       "\\item[\\code{statistic}] the value of the t-statistic.\n",
       "\\item[\\code{parameter}] the degrees of freedom for the t-statistic.\n",
       "\\item[\\code{p.value}] the p-value for the test.\n",
       "\\item[\\code{conf.int}] a confidence interval for the mean appropriate to the\n",
       "specified alternative hypothesis.\n",
       "\\item[\\code{estimate}] the estimated mean or difference in means depending on\n",
       "whether it was a one-sample test or a two-sample test.\n",
       "\\item[\\code{null.value}] the specified hypothesized value of the mean or mean\n",
       "difference depending on whether it was a one-sample test or a\n",
       "two-sample test.\n",
       "\\item[\\code{alternative}] a character string describing the alternative\n",
       "hypothesis.\n",
       "\\item[\\code{method}] a character string indicating what type of t-test was\n",
       "performed.\n",
       "\\item[\\code{data.name}] a character string giving the name(s) of the data.\n",
       "\\end{ldescription}\n",
       "\\end{Value}\n",
       "%\n",
       "\\begin{SeeAlso}\\relax\n",
       "\\code{\\LinkA{prop.test}{prop.test}}\n",
       "\\end{SeeAlso}\n",
       "%\n",
       "\\begin{Examples}\n",
       "\\begin{ExampleCode}\n",
       "require(graphics)\n",
       "\n",
       "t.test(1:10, y = c(7:20))      # P = .00001855\n",
       "t.test(1:10, y = c(7:20, 200)) # P = .1245    -- NOT significant anymore\n",
       "\n",
       "## Classical example: Student's sleep data\n",
       "plot(extra ~ group, data = sleep)\n",
       "## Traditional interface\n",
       "with(sleep, t.test(extra[group == 1], extra[group == 2]))\n",
       "## Formula interface\n",
       "t.test(extra ~ group, data = sleep)\n",
       "\\end{ExampleCode}\n",
       "\\end{Examples}"
      ],
      "text/plain": [
       "t.test                  package:stats                  R Documentation\n",
       "\n",
       "_\bS_\bt_\bu_\bd_\be_\bn_\bt'_\bs _\bt-_\bT_\be_\bs_\bt\n",
       "\n",
       "_\bD_\be_\bs_\bc_\br_\bi_\bp_\bt_\bi_\bo_\bn:\n",
       "\n",
       "     Performs one and two sample t-tests on vectors of data.\n",
       "\n",
       "_\bU_\bs_\ba_\bg_\be:\n",
       "\n",
       "     t.test(x, ...)\n",
       "     \n",
       "     ## Default S3 method:\n",
       "     t.test(x, y = NULL,\n",
       "            alternative = c(\"two.sided\", \"less\", \"greater\"),\n",
       "            mu = 0, paired = FALSE, var.equal = FALSE,\n",
       "            conf.level = 0.95, ...)\n",
       "     \n",
       "     ## S3 method for class 'formula'\n",
       "     t.test(formula, data, subset, na.action, ...)\n",
       "     \n",
       "_\bA_\br_\bg_\bu_\bm_\be_\bn_\bt_\bs:\n",
       "\n",
       "       x: a (non-empty) numeric vector of data values.\n",
       "\n",
       "       y: an optional (non-empty) numeric vector of data values.\n",
       "\n",
       "alternative: a character string specifying the alternative hypothesis,\n",
       "          must be one of ‘\"two.sided\"’ (default), ‘\"greater\"’ or\n",
       "          ‘\"less\"’.  You can specify just the initial letter.\n",
       "\n",
       "      mu: a number indicating the true value of the mean (or difference\n",
       "          in means if you are performing a two sample test).\n",
       "\n",
       "  paired: a logical indicating whether you want a paired t-test.\n",
       "\n",
       "var.equal: a logical variable indicating whether to treat the two\n",
       "          variances as being equal. If ‘TRUE’ then the pooled variance\n",
       "          is used to estimate the variance otherwise the Welch (or\n",
       "          Satterthwaite) approximation to the degrees of freedom is\n",
       "          used.\n",
       "\n",
       "conf.level: confidence level of the interval.\n",
       "\n",
       " formula: a formula of the form ‘lhs ~ rhs’ where ‘lhs’ is a numeric\n",
       "          variable giving the data values and ‘rhs’ a factor with two\n",
       "          levels giving the corresponding groups.\n",
       "\n",
       "    data: an optional matrix or data frame (or similar: see\n",
       "          ‘model.frame’) containing the variables in the formula\n",
       "          ‘formula’.  By default the variables are taken from\n",
       "          ‘environment(formula)’.\n",
       "\n",
       "  subset: an optional vector specifying a subset of observations to be\n",
       "          used.\n",
       "\n",
       "na.action: a function which indicates what should happen when the data\n",
       "          contain ‘NA’s.  Defaults to ‘getOption(\"na.action\")’.\n",
       "\n",
       "     ...: further arguments to be passed to or from methods.\n",
       "\n",
       "_\bD_\be_\bt_\ba_\bi_\bl_\bs:\n",
       "\n",
       "     The formula interface is only applicable for the 2-sample tests.\n",
       "\n",
       "     ‘alternative = \"greater\"’ is the alternative that ‘x’ has a larger\n",
       "     mean than ‘y’.\n",
       "\n",
       "     If ‘paired’ is ‘TRUE’ then both ‘x’ and ‘y’ must be specified and\n",
       "     they must be the same length.  Missing values are silently removed\n",
       "     (in pairs if ‘paired’ is ‘TRUE’).  If ‘var.equal’ is ‘TRUE’ then\n",
       "     the pooled estimate of the variance is used.  By default, if\n",
       "     ‘var.equal’ is ‘FALSE’ then the variance is estimated separately\n",
       "     for both groups and the Welch modification to the degrees of\n",
       "     freedom is used.\n",
       "\n",
       "     If the input data are effectively constant (compared to the larger\n",
       "     of the two means) an error is generated.\n",
       "\n",
       "_\bV_\ba_\bl_\bu_\be:\n",
       "\n",
       "     A list with class ‘\"htest\"’ containing the following components:\n",
       "\n",
       "statistic: the value of the t-statistic.\n",
       "\n",
       "parameter: the degrees of freedom for the t-statistic.\n",
       "\n",
       " p.value: the p-value for the test.\n",
       "\n",
       "conf.int: a confidence interval for the mean appropriate to the\n",
       "          specified alternative hypothesis.\n",
       "\n",
       "estimate: the estimated mean or difference in means depending on\n",
       "          whether it was a one-sample test or a two-sample test.\n",
       "\n",
       "null.value: the specified hypothesized value of the mean or mean\n",
       "          difference depending on whether it was a one-sample test or a\n",
       "          two-sample test.\n",
       "\n",
       "alternative: a character string describing the alternative hypothesis.\n",
       "\n",
       "  method: a character string indicating what type of t-test was\n",
       "          performed.\n",
       "\n",
       "data.name: a character string giving the name(s) of the data.\n",
       "\n",
       "_\bS_\be_\be _\bA_\bl_\bs_\bo:\n",
       "\n",
       "     ‘prop.test’\n",
       "\n",
       "_\bE_\bx_\ba_\bm_\bp_\bl_\be_\bs:\n",
       "\n",
       "     require(graphics)\n",
       "     \n",
       "     t.test(1:10, y = c(7:20))      # P = .00001855\n",
       "     t.test(1:10, y = c(7:20, 200)) # P = .1245    -- NOT significant anymore\n",
       "     \n",
       "     ## Classical example: Student's sleep data\n",
       "     plot(extra ~ group, data = sleep)\n",
       "     ## Traditional interface\n",
       "     with(sleep, t.test(extra[group == 1], extra[group == 2]))\n",
       "     ## Formula interface\n",
       "     t.test(extra ~ group, data = sleep)\n",
       "     "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "?t.test"
   ]
  },
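  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To show how the arguments listed in the help page select the version of the test, here is a minimal sketch on invented numbers. The vectors `b_vals` and `c_vals` are hypothetical and are not the homework data; the call requests an unpaired, one-tailed test without assuming equal variances.\n",
    "\n",
    "```r\n",
    "# Made-up values for two independent groups (NOT the homework data)\n",
    "b_vals <- c(1.0, 1.2, 0.9, 1.1, 1.3)\n",
    "c_vals <- c(2.0, 2.3, 1.9, 2.2, 2.4)\n",
    "\n",
    "# alternative = 'greater' is the alternative that x (c_vals) has a larger\n",
    "# mean than y (b_vals); paired = FALSE and var.equal = FALSE are the defaults\n",
    "# and are written out here only to make the chosen version explicit\n",
    "res <- t.test(c_vals, b_vals, alternative = 'greater',\n",
    "              paired = FALSE, var.equal = FALSE)\n",
    "\n",
    "res$method    # which version of the t-test was run\n",
    "res$p.value   # one-tailed p-value\n",
    "```\n",
    "\n",
    "Swapping the order of the two vectors, or switching to `alternative = 'less'`, changes which direction is being tested, so it is worth confirming the direction against the Details section above."
   ]
  },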
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Calls up the documentation page for the non-parametric tests, including the Wilcoxon rank sum (a.k.a. Mann-Whitney, unpaired samples including unequal numbers) and Wilcoxon signed rank (paired samples, must be equal numbers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "\n",
       "<table width=\"100%\" summary=\"page for wilcox.test {stats}\"><tr><td>wilcox.test {stats}</td><td style=\"text-align: right;\">R Documentation</td></tr></table>\n",
       "\n",
       "<h2>Wilcoxon Rank Sum and Signed Rank Tests</h2>\n",
       "\n",
       "<h3>Description</h3>\n",
       "\n",
       "<p>Performs one- and two-sample Wilcoxon tests on vectors of data; the\n",
       "latter is also known as &lsquo;Mann-Whitney&rsquo; test.\n",
       "</p>\n",
       "\n",
       "\n",
       "<h3>Usage</h3>\n",
       "\n",
       "<pre>\n",
       "wilcox.test(x, ...)\n",
       "\n",
       "## Default S3 method:\n",
       "wilcox.test(x, y = NULL,\n",
       "            alternative = c(\"two.sided\", \"less\", \"greater\"),\n",
       "            mu = 0, paired = FALSE, exact = NULL, correct = TRUE,\n",
       "            conf.int = FALSE, conf.level = 0.95, ...)\n",
       "\n",
       "## S3 method for class 'formula'\n",
       "wilcox.test(formula, data, subset, na.action, ...)\n",
       "</pre>\n",
       "\n",
       "\n",
       "<h3>Arguments</h3>\n",
       "\n",
       "<table summary=\"R argblock\">\n",
       "<tr valign=\"top\"><td><code>x</code></td>\n",
       "<td>\n",
       "<p>numeric vector of data values.  Non-finite (e.g., infinite or\n",
       "missing) values will be omitted.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>y</code></td>\n",
       "<td>\n",
       "<p>an optional numeric vector of data values: as with <code>x</code>\n",
       "non-finite values will be omitted.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>alternative</code></td>\n",
       "<td>\n",
       "<p>a character string specifying the alternative\n",
       "hypothesis, must be one of <code>\"two.sided\"</code> (default),\n",
       "<code>\"greater\"</code> or <code>\"less\"</code>.  You can specify just the initial\n",
       "letter.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>mu</code></td>\n",
       "<td>\n",
       "<p>a number specifying an optional parameter used to form the\n",
       "null hypothesis.  See &lsquo;Details&rsquo;.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>paired</code></td>\n",
       "<td>\n",
       "<p>a logical indicating whether you want a paired test.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>exact</code></td>\n",
       "<td>\n",
       "<p>a logical indicating whether an exact p-value\n",
       "should be computed.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>correct</code></td>\n",
       "<td>\n",
       "<p>a logical indicating whether to apply continuity\n",
       "correction in the normal approximation for the p-value.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>conf.int</code></td>\n",
       "<td>\n",
       "<p>a logical indicating whether a confidence interval\n",
       "should be computed.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>conf.level</code></td>\n",
       "<td>\n",
       "<p>confidence level of the interval.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>formula</code></td>\n",
       "<td>\n",
       "<p>a formula of the form <code>lhs ~ rhs</code> where <code>lhs</code>\n",
       "is a numeric variable giving the data values and <code>rhs</code> a factor\n",
       "with two levels giving the corresponding groups.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>data</code></td>\n",
       "<td>\n",
       "<p>an optional matrix or data frame (or similar: see\n",
       "<code>model.frame</code>) containing the variables in the\n",
       "formula <code>formula</code>.  By default the variables are taken from\n",
       "<code>environment(formula)</code>.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>subset</code></td>\n",
       "<td>\n",
       "<p>an optional vector specifying a subset of observations\n",
       "to be used.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>na.action</code></td>\n",
       "<td>\n",
       "<p>a function which indicates what should happen when\n",
       "the data contain <code>NA</code>s.  Defaults to\n",
       "<code>getOption(\"na.action\")</code>.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>...</code></td>\n",
       "<td>\n",
       "<p>further arguments to be passed to or from methods.</p>\n",
       "</td></tr>\n",
       "</table>\n",
       "\n",
       "\n",
       "<h3>Details</h3>\n",
       "\n",
       "<p>The formula interface is only applicable for the 2-sample tests.\n",
       "</p>\n",
       "<p>If only <code>x</code> is given, or if both <code>x</code> and <code>y</code> are given\n",
       "and <code>paired</code> is <code>TRUE</code>, a Wilcoxon signed rank test of the\n",
       "null that the distribution of <code>x</code> (in the one sample case) or of\n",
       "<code>x - y</code> (in the paired two sample case) is symmetric about\n",
       "<code>mu</code> is performed.\n",
       "</p>\n",
       "<p>Otherwise, if both <code>x</code> and <code>y</code> are given and <code>paired</code>\n",
       "is <code>FALSE</code>, a Wilcoxon rank sum test (equivalent to the\n",
       "Mann-Whitney test: see the Note) is carried out.  In this case, the\n",
       "null hypothesis is that the distributions of <code>x</code> and <code>y</code>\n",
       "differ by a location shift of <code>mu</code> and the alternative is that\n",
       "they differ by some other location shift (and the one-sided\n",
       "alternative <code>\"greater\"</code> is that <code>x</code> is shifted to the right\n",
       "of <code>y</code>).\n",
       "</p>\n",
       "<p>By default (if <code>exact</code> is not specified), an exact p-value\n",
       "is computed if the samples contain less than 50 finite values and\n",
       "there are no ties.  Otherwise, a normal approximation is used.\n",
       "</p>\n",
       "<p>Optionally (if argument <code>conf.int</code> is true), a nonparametric\n",
       "confidence interval and an estimator for the pseudomedian (one-sample\n",
       "case) or for the difference of the location parameters <code>x-y</code> is\n",
       "computed.  (The pseudomedian of a distribution <i>F</i> is the median\n",
       "of the distribution of <i>(u+v)/2</i>, where <i>u</i> and <i>v</i> are\n",
       "independent, each with distribution <i>F</i>.  If <i>F</i> is symmetric,\n",
       "then the pseudomedian and median coincide.  See Hollander &amp; Wolfe\n",
       "(1973), page 34.)  Note that in the two-sample case the estimator for\n",
       "the difference in location parameters does <b>not</b> estimate the\n",
       "difference in medians (a common misconception) but rather the median\n",
       "of the difference between a sample from <code>x</code> and a sample from\n",
       "<code>y</code>.\n",
       "</p>\n",
       "<p>If exact p-values are available, an exact confidence interval is\n",
       "obtained by the algorithm described in Bauer (1972), and the\n",
       "Hodges-Lehmann estimator is employed.  Otherwise, the returned\n",
       "confidence interval and point estimate are based on normal\n",
       "approximations.  These are continuity-corrected for the interval but\n",
       "<em>not</em> the estimate (as the correction depends on the\n",
       "<code>alternative</code>).\n",
       "</p>\n",
       "<p>With small samples it may not be possible to achieve very high\n",
       "confidence interval coverages. If this happens a warning will be given\n",
       "and an interval with lower coverage will be substituted.\n",
       "</p>\n",
       "\n",
       "\n",
       "<h3>Value</h3>\n",
       "\n",
       "<p>A list with class <code>\"htest\"</code> containing the following components:\n",
       "</p>\n",
       "<table summary=\"R valueblock\">\n",
       "<tr valign=\"top\"><td><code>statistic</code></td>\n",
       "<td>\n",
       "<p>the value of the test statistic with a name\n",
       "describing it.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>parameter</code></td>\n",
       "<td>\n",
       "<p>the parameter(s) for the exact distribution of the\n",
       "test statistic.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>p.value</code></td>\n",
       "<td>\n",
       "<p>the p-value for the test.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>null.value</code></td>\n",
       "<td>\n",
       "<p>the location parameter <code>mu</code>.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>alternative</code></td>\n",
       "<td>\n",
       "<p>a character string describing the alternative\n",
       "hypothesis.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>method</code></td>\n",
       "<td>\n",
       "<p>the type of test applied.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>data.name</code></td>\n",
       "<td>\n",
       "<p>a character string giving the names of the data.</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>conf.int</code></td>\n",
       "<td>\n",
       "<p>a confidence interval for the location parameter.\n",
       "(Only present if argument <code>conf.int = TRUE</code>.)</p>\n",
       "</td></tr>\n",
       "<tr valign=\"top\"><td><code>estimate</code></td>\n",
       "<td>\n",
       "<p>an estimate of the location parameter.\n",
       "(Only present if argument <code>conf.int = TRUE</code>.)</p>\n",
       "</td></tr>\n",
       "</table>\n",
       "\n",
       "\n",
       "<h3>Warning</h3>\n",
       "\n",
       "<p>This function can use large amounts of memory and stack (and even\n",
       "crash <span style=\"font-family: Courier New, Courier; color: #666666;\"><b>R</b></span> if the stack limit is exceeded) if <code>exact = TRUE</code> and\n",
       "one sample is large (several thousands or more).\n",
       "</p>\n",
       "\n",
       "\n",
       "<h3>Note</h3>\n",
       "\n",
       "<p>The literature is not unanimous about the definitions of the Wilcoxon\n",
       "rank sum and Mann-Whitney tests.  The two most common definitions\n",
       "correspond to the sum of the ranks of the first sample with the\n",
       "minimum value subtracted or not: <span style=\"font-family: Courier New, Courier; color: #666666;\"><b>R</b></span> subtracts and S-PLUS does not,\n",
       "giving a value which is larger by <i>m(m+1)/2</i> for a first sample\n",
       "of size <i>m</i>.  (It seems Wilcoxon's original paper used the\n",
       "unadjusted sum of the ranks but subsequent tables subtracted the\n",
       "minimum.)\n",
       "</p>\n",
       "<p><span style=\"font-family: Courier New, Courier; color: #666666;\"><b>R</b></span>'s value can also be computed as the number of all pairs\n",
       "<code>(x[i], y[j])</code> for which <code>y[j]</code> is not greater than\n",
       "<code>x[i]</code>, the most common definition of the Mann-Whitney test.\n",
       "</p>\n",
       "\n",
       "\n",
       "<h3>References</h3>\n",
       "\n",
       "<p>David F. Bauer (1972),\n",
       "Constructing confidence sets using rank statistics.\n",
       "<em>Journal of the American Statistical Association</em>\n",
       "<b>67</b>, 687&ndash;690.\n",
       "</p>\n",
       "<p>Myles Hollander and Douglas A. Wolfe (1973),\n",
       "<em>Nonparametric Statistical Methods.</em>\n",
       "New York: John Wiley &amp; Sons.\n",
       "Pages 27&ndash;33 (one-sample), 68&ndash;75 (two-sample).<br />\n",
       "Or second edition (1999).\n",
       "</p>\n",
       "\n",
       "\n",
       "<h3>See Also</h3>\n",
       "\n",
       "<p><code>psignrank</code>, <code>pwilcox</code>.\n",
       "</p>\n",
       "<p><code>wilcox_test</code> in package\n",
       "<a href=\"http://CRAN.R-project.org/package=coin\"><span class=\"pkg\">coin</span></a> for exact, asymptotic and Monte Carlo\n",
       "<em>conditional</em> p-values, including in the presence of ties.\n",
       "</p>\n",
       "<p><code>kruskal.test</code> for testing homogeneity in location\n",
       "parameters in the case of two or more samples;\n",
       "<code>t.test</code> for an alternative under normality\n",
       "assumptions [or large samples]\n",
       "</p>\n",
       "\n",
       "\n",
       "<h3>Examples</h3>\n",
       "\n",
       "<pre>\n",
       "require(graphics)\n",
       "## One-sample test.\n",
       "## Hollander &amp; Wolfe (1973), 29f.\n",
       "## Hamilton depression scale factor measurements in 9 patients with\n",
       "##  mixed anxiety and depression, taken at the first (x) and second\n",
       "##  (y) visit after initiation of a therapy (administration of a\n",
       "##  tranquilizer).\n",
       "x &lt;- c(1.83,  0.50,  1.62,  2.48, 1.68, 1.88, 1.55, 3.06, 1.30)\n",
       "y &lt;- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)\n",
       "wilcox.test(x, y, paired = TRUE, alternative = \"greater\")\n",
       "wilcox.test(y - x, alternative = \"less\")    # The same.\n",
       "wilcox.test(y - x, alternative = \"less\",\n",
       "            exact = FALSE, correct = FALSE) # H&amp;W large sample\n",
       "                                            # approximation\n",
       "\n",
       "## Two-sample test.\n",
       "## Hollander &amp; Wolfe (1973), 69f.\n",
       "## Permeability constants of the human chorioamnion (a placental\n",
       "##  membrane) at term (x) and between 12 to 26 weeks gestational\n",
       "##  age (y).  The alternative of interest is greater permeability\n",
       "##  of the human chorioamnion for the term pregnancy.\n",
       "x &lt;- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)\n",
       "y &lt;- c(1.15, 0.88, 0.90, 0.74, 1.21)\n",
       "wilcox.test(x, y, alternative = \"g\")        # greater\n",
       "wilcox.test(x, y, alternative = \"greater\",\n",
       "            exact = FALSE, correct = FALSE) # H&amp;W large sample\n",
       "                                            # approximation\n",
       "\n",
       "wilcox.test(rnorm(10), rnorm(10, 2), conf.int = TRUE)\n",
       "\n",
       "## Formula interface.\n",
       "boxplot(Ozone ~ Month, data = airquality)\n",
       "wilcox.test(Ozone ~ Month, data = airquality,\n",
       "            subset = Month %in% c(5, 8))\n",
       "</pre>\n",
       "\n",
       "<hr /><div style=\"text-align: center;\">[Package <em>stats</em> version 3.2.2 ]</div>"
      ],
      "text/latex": [
       "\\inputencoding{utf8}\n",
       "\\HeaderA{wilcox.test}{Wilcoxon Rank Sum and Signed Rank Tests}{wilcox.test}\n",
       "\\methaliasA{wilcox.test.default}{wilcox.test}{wilcox.test.default}\n",
       "\\methaliasA{wilcox.test.formula}{wilcox.test}{wilcox.test.formula}\n",
       "\\keyword{htest}{wilcox.test}\n",
       "%\n",
       "\\begin{Description}\\relax\n",
       "Performs one- and two-sample Wilcoxon tests on vectors of data; the\n",
       "latter is also known as `Mann-Whitney' test.\n",
       "\\end{Description}\n",
       "%\n",
       "\\begin{Usage}\n",
       "\\begin{verbatim}\n",
       "wilcox.test(x, ...)\n",
       "\n",
       "## Default S3 method:\n",
       "wilcox.test(x, y = NULL,\n",
       "            alternative = c(\"two.sided\", \"less\", \"greater\"),\n",
       "            mu = 0, paired = FALSE, exact = NULL, correct = TRUE,\n",
       "            conf.int = FALSE, conf.level = 0.95, ...)\n",
       "\n",
       "## S3 method for class 'formula'\n",
       "wilcox.test(formula, data, subset, na.action, ...)\n",
       "\\end{verbatim}\n",
       "\\end{Usage}\n",
       "%\n",
       "\\begin{Arguments}\n",
       "\\begin{ldescription}\n",
       "\\item[\\code{x}] numeric vector of data values.  Non-finite (e.g., infinite or\n",
       "missing) values will be omitted.\n",
       "\\item[\\code{y}] an optional numeric vector of data values: as with \\code{x}\n",
       "non-finite values will be omitted.\n",
       "\\item[\\code{alternative}] a character string specifying the alternative\n",
       "hypothesis, must be one of \\code{\"two.sided\"} (default),\n",
       "\\code{\"greater\"} or \\code{\"less\"}.  You can specify just the initial\n",
       "letter.\n",
       "\\item[\\code{mu}] a number specifying an optional parameter used to form the\n",
       "null hypothesis.  See `Details'.\n",
       "\\item[\\code{paired}] a logical indicating whether you want a paired test.\n",
       "\\item[\\code{exact}] a logical indicating whether an exact p-value\n",
       "should be computed.\n",
       "\\item[\\code{correct}] a logical indicating whether to apply continuity\n",
       "correction in the normal approximation for the p-value.\n",
       "\\item[\\code{conf.int}] a logical indicating whether a confidence interval\n",
       "should be computed.\n",
       "\\item[\\code{conf.level}] confidence level of the interval.\n",
       "\\item[\\code{formula}] a formula of the form \\code{lhs \\textasciitilde{} rhs} where \\code{lhs}\n",
       "is a numeric variable giving the data values and \\code{rhs} a factor\n",
       "with two levels giving the corresponding groups.\n",
       "\\item[\\code{data}] an optional matrix or data frame (or similar: see\n",
       "\\code{\\LinkA{model.frame}{model.frame}}) containing the variables in the\n",
       "formula \\code{formula}.  By default the variables are taken from\n",
       "\\code{environment(formula)}.\n",
       "\\item[\\code{subset}] an optional vector specifying a subset of observations\n",
       "to be used.\n",
       "\\item[\\code{na.action}] a function which indicates what should happen when\n",
       "the data contain \\code{NA}s.  Defaults to\n",
       "\\code{getOption(\"na.action\")}.\n",
       "\\item[\\code{...}] further arguments to be passed to or from methods.\n",
       "\\end{ldescription}\n",
       "\\end{Arguments}\n",
       "%\n",
       "\\begin{Details}\\relax\n",
       "The formula interface is only applicable for the 2-sample tests.\n",
       "\n",
       "If only \\code{x} is given, or if both \\code{x} and \\code{y} are given\n",
       "and \\code{paired} is \\code{TRUE}, a Wilcoxon signed rank test of the\n",
       "null that the distribution of \\code{x} (in the one sample case) or of\n",
       "\\code{x - y} (in the paired two sample case) is symmetric about\n",
       "\\code{mu} is performed.\n",
       "\n",
       "Otherwise, if both \\code{x} and \\code{y} are given and \\code{paired}\n",
       "is \\code{FALSE}, a Wilcoxon rank sum test (equivalent to the\n",
       "Mann-Whitney test: see the Note) is carried out.  In this case, the\n",
       "null hypothesis is that the distributions of \\code{x} and \\code{y}\n",
       "differ by a location shift of \\code{mu} and the alternative is that\n",
       "they differ by some other location shift (and the one-sided\n",
       "alternative \\code{\"greater\"} is that \\code{x} is shifted to the right\n",
       "of \\code{y}).\n",
       "\n",
       "By default (if \\code{exact} is not specified), an exact p-value\n",
       "is computed if the samples contain less than 50 finite values and\n",
       "there are no ties.  Otherwise, a normal approximation is used.\n",
       "\n",
       "Optionally (if argument \\code{conf.int} is true), a nonparametric\n",
       "confidence interval and an estimator for the pseudomedian (one-sample\n",
       "case) or for the difference of the location parameters \\code{x-y} is\n",
       "computed.  (The pseudomedian of a distribution \\eqn{F}{} is the median\n",
       "of the distribution of \\eqn{(u+v)/2}{}, where \\eqn{u}{} and \\eqn{v}{} are\n",
       "independent, each with distribution \\eqn{F}{}.  If \\eqn{F}{} is symmetric,\n",
       "then the pseudomedian and median coincide.  See Hollander \\& Wolfe\n",
       "(1973), page 34.)  Note that in the two-sample case the estimator for\n",
       "the difference in location parameters does \\bold{not} estimate the\n",
       "difference in medians (a common misconception) but rather the median\n",
       "of the difference between a sample from \\code{x} and a sample from\n",
       "\\code{y}.\n",
       "\n",
       "If exact p-values are available, an exact confidence interval is\n",
       "obtained by the algorithm described in Bauer (1972), and the\n",
       "Hodges-Lehmann estimator is employed.  Otherwise, the returned\n",
       "confidence interval and point estimate are based on normal\n",
       "approximations.  These are continuity-corrected for the interval but\n",
       "\\emph{not} the estimate (as the correction depends on the\n",
       "\\code{alternative}).\n",
       "\n",
       "With small samples it may not be possible to achieve very high\n",
       "confidence interval coverages. If this happens a warning will be given\n",
       "and an interval with lower coverage will be substituted.\n",
       "\\end{Details}\n",
       "%\n",
       "\\begin{Value}\n",
       "A list with class \\code{\"htest\"} containing the following components:\n",
       "\\begin{ldescription}\n",
       "\\item[\\code{statistic}] the value of the test statistic with a name\n",
       "describing it.\n",
       "\\item[\\code{parameter}] the parameter(s) for the exact distribution of the\n",
       "test statistic.\n",
       "\\item[\\code{p.value}] the p-value for the test.\n",
       "\\item[\\code{null.value}] the location parameter \\code{mu}.\n",
       "\\item[\\code{alternative}] a character string describing the alternative\n",
       "hypothesis.\n",
       "\\item[\\code{method}] the type of test applied.\n",
       "\\item[\\code{data.name}] a character string giving the names of the data.\n",
       "\\item[\\code{conf.int}] a confidence interval for the location parameter.\n",
       "(Only present if argument \\code{conf.int = TRUE}.)\n",
       "\\item[\\code{estimate}] an estimate of the location parameter.\n",
       "(Only present if argument \\code{conf.int = TRUE}.)\n",
       "\\end{ldescription}\n",
       "\\end{Value}\n",
       "%\n",
       "\\begin{Section}{Warning}\n",
       "This function can use large amounts of memory and stack (and even\n",
       "crash \\R{} if the stack limit is exceeded) if \\code{exact = TRUE} and\n",
       "one sample is large (several thousands or more).\n",
       "\\end{Section}\n",
       "%\n",
       "\\begin{Note}\\relax\n",
       "The literature is not unanimous about the definitions of the Wilcoxon\n",
       "rank sum and Mann-Whitney tests.  The two most common definitions\n",
       "correspond to the sum of the ranks of the first sample with the\n",
       "minimum value subtracted or not: \\R{} subtracts and S-PLUS does not,\n",
       "giving a value which is larger by \\eqn{m(m+1)/2}{} for a first sample\n",
       "of size \\eqn{m}{}.  (It seems Wilcoxon's original paper used the\n",
       "unadjusted sum of the ranks but subsequent tables subtracted the\n",
       "minimum.)\n",
       "\n",
       "\\R{}'s value can also be computed as the number of all pairs\n",
       "\\code{(x[i], y[j])} for which \\code{y[j]} is not greater than\n",
       "\\code{x[i]}, the most common definition of the Mann-Whitney test.\n",
       "\\end{Note}\n",
       "%\n",
       "\\begin{References}\\relax\n",
       "David F. Bauer (1972),\n",
       "Constructing confidence sets using rank statistics.\n",
       "\\emph{Journal of the American Statistical Association}\n",
       "\\bold{67}, 687--690.\n",
       "\n",
       "Myles Hollander and Douglas A. Wolfe (1973),\n",
       "\\emph{Nonparametric Statistical Methods.}\n",
       "New York: John Wiley \\& Sons.\n",
       "Pages 27--33 (one-sample), 68--75 (two-sample).\\\\{}\n",
       "Or second edition (1999).\n",
       "\\end{References}\n",
       "%\n",
       "\\begin{SeeAlso}\\relax\n",
       "\\code{\\LinkA{psignrank}{psignrank}}, \\code{\\LinkA{pwilcox}{pwilcox}}.\n",
       "\n",
       "\\code{\\LinkA{wilcox\\_test}{wilcox.Rul.test}} in package\n",
       "\\Rhref{http://CRAN.R-project.org/package=coin}{\\pkg{coin}} for exact, asymptotic and Monte Carlo\n",
       "\\emph{conditional} p-values, including in the presence of ties.\n",
       "\n",
       "\\code{\\LinkA{kruskal.test}{kruskal.test}} for testing homogeneity in location\n",
       "parameters in the case of two or more samples;\n",
       "\\code{\\LinkA{t.test}{t.test}} for an alternative under normality\n",
       "assumptions [or large samples]\n",
       "\\end{SeeAlso}\n",
       "%\n",
       "\\begin{Examples}\n",
       "\\begin{ExampleCode}\n",
       "require(graphics)\n",
       "## One-sample test.\n",
       "## Hollander & Wolfe (1973), 29f.\n",
       "## Hamilton depression scale factor measurements in 9 patients with\n",
       "##  mixed anxiety and depression, taken at the first (x) and second\n",
       "##  (y) visit after initiation of a therapy (administration of a\n",
       "##  tranquilizer).\n",
       "x <- c(1.83,  0.50,  1.62,  2.48, 1.68, 1.88, 1.55, 3.06, 1.30)\n",
       "y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)\n",
       "wilcox.test(x, y, paired = TRUE, alternative = \"greater\")\n",
       "wilcox.test(y - x, alternative = \"less\")    # The same.\n",
       "wilcox.test(y - x, alternative = \"less\",\n",
       "            exact = FALSE, correct = FALSE) # H&W large sample\n",
       "                                            # approximation\n",
       "\n",
       "## Two-sample test.\n",
       "## Hollander & Wolfe (1973), 69f.\n",
       "## Permeability constants of the human chorioamnion (a placental\n",
       "##  membrane) at term (x) and between 12 to 26 weeks gestational\n",
       "##  age (y).  The alternative of interest is greater permeability\n",
       "##  of the human chorioamnion for the term pregnancy.\n",
       "x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)\n",
       "y <- c(1.15, 0.88, 0.90, 0.74, 1.21)\n",
       "wilcox.test(x, y, alternative = \"g\")        # greater\n",
       "wilcox.test(x, y, alternative = \"greater\",\n",
       "            exact = FALSE, correct = FALSE) # H&W large sample\n",
       "                                            # approximation\n",
       "\n",
       "wilcox.test(rnorm(10), rnorm(10, 2), conf.int = TRUE)\n",
       "\n",
       "## Formula interface.\n",
       "boxplot(Ozone ~ Month, data = airquality)\n",
       "wilcox.test(Ozone ~ Month, data = airquality,\n",
       "            subset = Month %in% c(5, 8))\n",
       "\\end{ExampleCode}\n",
       "\\end{Examples}"
      ],
      "text/plain": [
       "wilcox.test               package:stats                R Documentation\n",
       "\n",
       "_\bW_\bi_\bl_\bc_\bo_\bx_\bo_\bn _\bR_\ba_\bn_\bk _\bS_\bu_\bm _\ba_\bn_\bd _\bS_\bi_\bg_\bn_\be_\bd _\bR_\ba_\bn_\bk _\bT_\be_\bs_\bt_\bs\n",
       "\n",
       "_\bD_\be_\bs_\bc_\br_\bi_\bp_\bt_\bi_\bo_\bn:\n",
       "\n",
       "     Performs one- and two-sample Wilcoxon tests on vectors of data;\n",
       "     the latter is also known as ‘Mann-Whitney’ test.\n",
       "\n",
       "_\bU_\bs_\ba_\bg_\be:\n",
       "\n",
       "     wilcox.test(x, ...)\n",
       "     \n",
       "     ## Default S3 method:\n",
       "     wilcox.test(x, y = NULL,\n",
       "                 alternative = c(\"two.sided\", \"less\", \"greater\"),\n",
       "                 mu = 0, paired = FALSE, exact = NULL, correct = TRUE,\n",
       "                 conf.int = FALSE, conf.level = 0.95, ...)\n",
       "     \n",
       "     ## S3 method for class 'formula'\n",
       "     wilcox.test(formula, data, subset, na.action, ...)\n",
       "     \n",
       "_\bA_\br_\bg_\bu_\bm_\be_\bn_\bt_\bs:\n",
       "\n",
       "       x: numeric vector of data values.  Non-finite (e.g., infinite or\n",
       "          missing) values will be omitted.\n",
       "\n",
       "       y: an optional numeric vector of data values: as with ‘x’\n",
       "          non-finite values will be omitted.\n",
       "\n",
       "alternative: a character string specifying the alternative hypothesis,\n",
       "          must be one of ‘\"two.sided\"’ (default), ‘\"greater\"’ or\n",
       "          ‘\"less\"’.  You can specify just the initial letter.\n",
       "\n",
       "      mu: a number specifying an optional parameter used to form the\n",
       "          null hypothesis.  See ‘Details’.\n",
       "\n",
       "  paired: a logical indicating whether you want a paired test.\n",
       "\n",
       "   exact: a logical indicating whether an exact p-value should be\n",
       "          computed.\n",
       "\n",
       " correct: a logical indicating whether to apply continuity correction\n",
       "          in the normal approximation for the p-value.\n",
       "\n",
       "conf.int: a logical indicating whether a confidence interval should be\n",
       "          computed.\n",
       "\n",
       "conf.level: confidence level of the interval.\n",
       "\n",
       " formula: a formula of the form ‘lhs ~ rhs’ where ‘lhs’ is a numeric\n",
       "          variable giving the data values and ‘rhs’ a factor with two\n",
       "          levels giving the corresponding groups.\n",
       "\n",
       "    data: an optional matrix or data frame (or similar: see\n",
       "          ‘model.frame’) containing the variables in the formula\n",
       "          ‘formula’.  By default the variables are taken from\n",
       "          ‘environment(formula)’.\n",
       "\n",
       "  subset: an optional vector specifying a subset of observations to be\n",
       "          used.\n",
       "\n",
       "na.action: a function which indicates what should happen when the data\n",
       "          contain ‘NA’s.  Defaults to ‘getOption(\"na.action\")’.\n",
       "\n",
       "     ...: further arguments to be passed to or from methods.\n",
       "\n",
       "_\bD_\be_\bt_\ba_\bi_\bl_\bs:\n",
       "\n",
       "     The formula interface is only applicable for the 2-sample tests.\n",
       "\n",
       "     If only ‘x’ is given, or if both ‘x’ and ‘y’ are given and\n",
       "     ‘paired’ is ‘TRUE’, a Wilcoxon signed rank test of the null that\n",
       "     the distribution of ‘x’ (in the one sample case) or of ‘x - y’ (in\n",
       "     the paired two sample case) is symmetric about ‘mu’ is performed.\n",
       "\n",
       "     Otherwise, if both ‘x’ and ‘y’ are given and ‘paired’ is ‘FALSE’,\n",
       "     a Wilcoxon rank sum test (equivalent to the Mann-Whitney test: see\n",
       "     the Note) is carried out.  In this case, the null hypothesis is\n",
       "     that the distributions of ‘x’ and ‘y’ differ by a location shift\n",
       "     of ‘mu’ and the alternative is that they differ by some other\n",
       "     location shift (and the one-sided alternative ‘\"greater\"’ is that\n",
       "     ‘x’ is shifted to the right of ‘y’).\n",
       "\n",
       "     By default (if ‘exact’ is not specified), an exact p-value is\n",
       "     computed if the samples contain less than 50 finite values and\n",
       "     there are no ties.  Otherwise, a normal approximation is used.\n",
       "\n",
       "     Optionally (if argument ‘conf.int’ is true), a nonparametric\n",
       "     confidence interval and an estimator for the pseudomedian\n",
       "     (one-sample case) or for the difference of the location parameters\n",
       "     ‘x-y’ is computed.  (The pseudomedian of a distribution F is the\n",
       "     median of the distribution of (u+v)/2, where u and v are\n",
       "     independent, each with distribution F.  If F is symmetric, then\n",
       "     the pseudomedian and median coincide.  See Hollander & Wolfe\n",
       "     (1973), page 34.)  Note that in the two-sample case the estimator\n",
       "     for the difference in location parameters does *not* estimate the\n",
       "     difference in medians (a common misconception) but rather the\n",
       "     median of the difference between a sample from ‘x’ and a sample\n",
       "     from ‘y’.\n",
       "\n",
       "     If exact p-values are available, an exact confidence interval is\n",
       "     obtained by the algorithm described in Bauer (1972), and the\n",
       "     Hodges-Lehmann estimator is employed.  Otherwise, the returned\n",
       "     confidence interval and point estimate are based on normal\n",
       "     approximations.  These are continuity-corrected for the interval\n",
       "     but _not_ the estimate (as the correction depends on the\n",
       "     ‘alternative’).\n",
       "\n",
       "     With small samples it may not be possible to achieve very high\n",
       "     confidence interval coverages. If this happens a warning will be\n",
       "     given and an interval with lower coverage will be substituted.\n",
       "\n",
       "_\bV_\ba_\bl_\bu_\be:\n",
       "\n",
       "     A list with class ‘\"htest\"’ containing the following components:\n",
       "\n",
       "statistic: the value of the test statistic with a name describing it.\n",
       "\n",
       "parameter: the parameter(s) for the exact distribution of the test\n",
       "          statistic.\n",
       "\n",
       " p.value: the p-value for the test.\n",
       "\n",
       "null.value: the location parameter ‘mu’.\n",
       "\n",
       "alternative: a character string describing the alternative hypothesis.\n",
       "\n",
       "  method: the type of test applied.\n",
       "\n",
       "data.name: a character string giving the names of the data.\n",
       "\n",
       "conf.int: a confidence interval for the location parameter.  (Only\n",
       "          present if argument ‘conf.int = TRUE’.)\n",
       "\n",
       "estimate: an estimate of the location parameter.  (Only present if\n",
       "          argument ‘conf.int = TRUE’.)\n",
       "\n",
       "_\bW_\ba_\br_\bn_\bi_\bn_\bg:\n",
       "\n",
       "     This function can use large amounts of memory and stack (and even\n",
       "     crash R if the stack limit is exceeded) if ‘exact = TRUE’ and one\n",
       "     sample is large (several thousands or more).\n",
       "\n",
       "_\bN_\bo_\bt_\be:\n",
       "\n",
       "     The literature is not unanimous about the definitions of the\n",
       "     Wilcoxon rank sum and Mann-Whitney tests.  The two most common\n",
       "     definitions correspond to the sum of the ranks of the first sample\n",
       "     with the minimum value subtracted or not: R subtracts and S-PLUS\n",
       "     does not, giving a value which is larger by m(m+1)/2 for a first\n",
       "     sample of size m.  (It seems Wilcoxon's original paper used the\n",
       "     unadjusted sum of the ranks but subsequent tables subtracted the\n",
       "     minimum.)\n",
       "\n",
       "     R's value can also be computed as the number of all pairs ‘(x[i],\n",
       "     y[j])’ for which ‘y[j]’ is not greater than ‘x[i]’, the most\n",
       "     common definition of the Mann-Whitney test.\n",
       "\n",
       "_\bR_\be_\bf_\be_\br_\be_\bn_\bc_\be_\bs:\n",
       "\n",
       "     David F. Bauer (1972), Constructing confidence sets using rank\n",
       "     statistics.  _Journal of the American Statistical Association_\n",
       "     *67*, 687-690.\n",
       "\n",
       "     Myles Hollander and Douglas A. Wolfe (1973), _Nonparametric\n",
       "     Statistical Methods._ New York: John Wiley & Sons.  Pages 27-33\n",
       "     (one-sample), 68-75 (two-sample).\n",
       "     Or second edition (1999).\n",
       "\n",
       "_\bS_\be_\be _\bA_\bl_\bs_\bo:\n",
       "\n",
       "     ‘psignrank’, ‘pwilcox’.\n",
       "\n",
       "     ‘wilcox_test’ in package ‘coin’ for exact, asymptotic and Monte\n",
       "     Carlo _conditional_ p-values, including in the presence of ties.\n",
       "\n",
       "     ‘kruskal.test’ for testing homogeneity in location parameters in\n",
       "     the case of two or more samples; ‘t.test’ for an alternative under\n",
       "     normality assumptions [or large samples]\n",
       "\n",
       "_\bE_\bx_\ba_\bm_\bp_\bl_\be_\bs:\n",
       "\n",
       "     require(graphics)\n",
       "     ## One-sample test.\n",
       "     ## Hollander & Wolfe (1973), 29f.\n",
       "     ## Hamilton depression scale factor measurements in 9 patients with\n",
       "     ##  mixed anxiety and depression, taken at the first (x) and second\n",
       "     ##  (y) visit after initiation of a therapy (administration of a\n",
       "     ##  tranquilizer).\n",
       "     x <- c(1.83,  0.50,  1.62,  2.48, 1.68, 1.88, 1.55, 3.06, 1.30)\n",
       "     y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)\n",
       "     wilcox.test(x, y, paired = TRUE, alternative = \"greater\")\n",
       "     wilcox.test(y - x, alternative = \"less\")    # The same.\n",
       "     wilcox.test(y - x, alternative = \"less\",\n",
       "                 exact = FALSE, correct = FALSE) # H&W large sample\n",
       "                                                 # approximation\n",
       "     \n",
       "     ## Two-sample test.\n",
       "     ## Hollander & Wolfe (1973), 69f.\n",
       "     ## Permeability constants of the human chorioamnion (a placental\n",
       "     ##  membrane) at term (x) and between 12 to 26 weeks gestational\n",
       "     ##  age (y).  The alternative of interest is greater permeability\n",
       "     ##  of the human chorioamnion for the term pregnancy.\n",
       "     x <- c(0.80, 0.83, 1.89, 1.04, 1.45, 1.38, 1.91, 1.64, 0.73, 1.46)\n",
       "     y <- c(1.15, 0.88, 0.90, 0.74, 1.21)\n",
       "     wilcox.test(x, y, alternative = \"g\")        # greater\n",
       "     wilcox.test(x, y, alternative = \"greater\",\n",
       "                 exact = FALSE, correct = FALSE) # H&W large sample\n",
       "                                                 # approximation\n",
       "     \n",
       "     wilcox.test(rnorm(10), rnorm(10, 2), conf.int = TRUE)\n",
       "     \n",
       "     ## Formula interface.\n",
       "     boxplot(Ozone ~ Month, data = airquality)\n",
       "     wilcox.test(Ozone ~ Month, data = airquality,\n",
       "                 subset = Month %in% c(5, 8))\n",
       "     "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "?wilcox.test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Calls only the values of “stuff” from rows 1-6 of the data table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<ol class=list-inline>\n",
       "\t<li>0.4202</li>\n",
       "\t<li>0.4718</li>\n",
       "\t<li>0.5351</li>\n",
       "\t<li>0.4955</li>\n",
       "\t<li>0.4299</li>\n",
       "\t<li>0.5609</li>\n",
       "</ol>\n"
      ],
      "text/latex": [
       "\\begin{enumerate*}\n",
       "\\item 0.4202\n",
       "\\item 0.4718\n",
       "\\item 0.5351\n",
       "\\item 0.4955\n",
       "\\item 0.4299\n",
       "\\item 0.5609\n",
       "\\end{enumerate*}\n"
      ],
      "text/markdown": [
       "1. 0.4202\n",
       "2. 0.4718\n",
       "3. 0.5351\n",
       "4. 0.4955\n",
       "5. 0.4299\n",
       "6. 0.5609\n",
       "\n",
       "\n"
      ],
      "text/plain": [
       "[1] 0.4202 0.4718 0.5351 0.4955 0.4299 0.5609"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lukewarm[1:6]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### As an argument to plot sets the range of y to values of 0-5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "ylim=c(0,5) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* * *\n",
    "\n",
    "([Return to top.](#Module-3-:-Homework))\n",
    "\n",
    "* * *"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.2.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}