{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Statistical Inference for Everyone: Technical Supplement\n", "\n", "\n", "\n", "This document is the technical supplement, for instructors, for [Statistical Inference for Everyone], the introductory statistical inference textbook from the perspective of \"probability theory as logic\".\n", "\n", "\n", "\n", "[Statistical Inference for Everyone]: http://web.bryant.edu/~bblais/statistical-inference-for-everyone-sie.html\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimating the Paired-Data Difference Between Means, $\\delta_k \\equiv x_k-y_k$\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$\\newcommand{\\bvec}[1]{\\mathbf{#1}}$\n", "\n", "We want\n", "\\begin{eqnarray}\n", "p(\\mu_\\delta|\\bvec{x},\\bvec{y},\\sigma_x,\\sigma_y,I)\n", "\\end{eqnarray}\n", "where $\\delta_k \\equiv x_k-y_k$.\n", "\n", "We have from the Normal model the following likelihoods for $x_k$ and\n", "$y_k$:\n", "\\begin{eqnarray}\n", "p(x_k|\\mu,\\sigma_x,I)&=&\\frac{1}{\\sqrt{2\\pi\\sigma_x^2}}e^{-(x_k -\n", "\\mu_x)^2/2\\sigma_x^2}\\\\\\\\\n", "p(y_k|\\mu,\\sigma_y,I)&=&\\frac{1}{\\sqrt{2\\pi\\sigma_y^2}}e^{-(y_k -\n", "\\mu_y)^2/2\\sigma_y^2}\n", "\\end{eqnarray}\n", "\n", "Now we need to find the likelihood function for $\\delta_k \\equiv x_k-y_k$. \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing Variables\n", "\n", "If we have $Z=f(X,Y)$, and we know about $X$ and $Y$, we can learn about $Z$.\n", "\\begin{eqnarray}\n", "p(Z|I)&=&\\int \\int p(Z|X,Y,I) \\times p(X,Y|I) dXdY \\\\\\\\\n", "&=&\\int \\int \\delta(Z-f(X,Y)) \\times p(X,Y|I) dXdY\n", "\\end{eqnarray}\n", "\n", "Say, $Z=X-Y$, and $X$ and $Y$ are independent, then $p(X,Y|I)=p(X|I)p(Y|I)$\n", "and we have \n", "\\begin{eqnarray}\n", "p(Z|I) &=& \\int dX p(X,I) \\int dY p(Y|I)\\delta(Z-X+Y) \\\\\\\\\n", "&=& \\int dX p(X,I)p(Y=X-Z|I)\n", "\\end{eqnarray}\n", "\n", "Further, if the probabilities are Gaussian, then we have\n", "\n", "\\begin{eqnarray}\n", "p(Z|I) &=& \\frac{1}{2\\pi\\sigma_x\\sigma_y}\\int_{-\\infty}^{\\infty} dX\n", "e^{-(X-\\mu_x)^2/2\\sigma_x^2}\\times e^{-(X-Z-\\mu_y)^2/2\\sigma_y^2} \n", "\\end{eqnarray}\n", "One can do some pretty boring algebra at this point (factoring the exponents),\n", "or use a program like *xmaxima*:\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " (C1) ASSUME_POS:TRUE;\n", " (D1) TRUE\n", " (C2) 1/(2*%PI)/sx/sy*integrate(exp(-(x-xo)^2/(2*sx^2))*\n", " exp(-(x-z-yo)^2/(2*sy^2)),x,-inf,inf);\n", "\n", " 2 2 2\n", " z + (2 yo - 2 xo) z + yo - 2 xo yo + xo\n", " - ------------------------------------------\n", " 2 2\n", " 2 sy + 2 sx\n", " SQRT(2) %E\n", " (D2) ------------------------------------------------------\n", " 2 2\n", " 2 SQRT(%PI) SQRT(sy + sx )\n", "\n", " (C3) factor(z^2+(2*yo-2*xo)*z+yo^2-2*xo*yo+xo^2);\n", " 2\n", " (D3) (z + yo - xo)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "So we get\n", "\n", "\\begin{eqnarray}\n", "p(Z|I) &=&\n", "\\frac{1}{\\sqrt{2\\pi(\\sigma_x^2+\\sigma_y^2)}}\n", "e^{-(z-(\\mu_x-\\mu_y))^2/2(\\sigma_x^2+\\sigma_y^2)} \\\\\\\\\n", "&=&\\frac{1}{\\sqrt{2\\pi\\sigma_z}}\n", "e^{-(z-\\mu_z)^2/2\\sigma_z} \\\\\\\\ \\mbox{ where }\n", "\\mu_z&\\equiv& \\mu_x-\\mu_y \\\\\\\\\n", "\\sigma_z^2&\\equiv&\\sigma_x^2+\\sigma_y^2\n", "\\end{eqnarray}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Continuing with Paired Data\n", "\n", "Changing variables to $\\delta_k$, it is clear that the likelihood for\n", "$\\delta_k$ is the same form as $\\delta_x$ and $\\delta_y$. Thus we have the\n", "*exact same* results on the paired difference, both for known and unknown\n", "$\\sigma$, quoted in z-test and t-test sections.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Difference of Means, $\\delta\\equiv \\mu_x - \\mu_y$, known $\\sigma_x$ and $\\sigma_y$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, the change of variables trick works, but since we are given the means\n", "($\\mu_x$ and $\\mu_y$) we need to use the posterior distributions,\n", "$p(\\mu_x|\\bvec{x},\\sigma_x,I)$ and $p(\\mu_y|\\bvec{y},\\sigma_y,I)$.\n", "\n", "\\begin{eqnarray}\n", "p(\\mu_x|\\bvec{x},\\sigma_x,I)&=& \n", "\\sqrt{\\frac{n}{2\\pi \\sigma_x^2}}e^{-n(\\bar{x}-\\mu_x)^2/2\\sigma_x^2} \\\\\\\\\n", "p(\\mu_y|\\bvec{y},\\sigma_y,I)&=& \n", "\\sqrt{\\frac{m}{2\\pi \\sigma_y^2}}e^{-n(\\bar{y}-\\mu_y)^2/2\\sigma_y^2}\n", "\\end{eqnarray}\n", "\n", "Performing the change of variables to $\\delta \\equiv \\mu_x-\\mu_y$ we get\n", "\n", "\\begin{eqnarray}\n", "p(\\delta|\\bvec{x},\\bvec{y},\\sigma_x,\\sigma_y,I)&=&\n", "\\frac{\\sqrt{nm}}{2\\pi\\sigma_x\\sigma_y}\\int d\\mu_y\n", "e^{-n(\\bar{x}-\\delta-\\mu_y)^2/2\\sigma_x^2}\n", "e^{-m(\\bar{y}-\\mu_y)^2/2\\sigma_y^2} \n", "\\end{eqnarray}\n", "\n", "Again, using *xmaxima*,\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " (C1) ASSUME_POS:TRUE;\n", "\n", " (D1) TRUE\n", " (C2) f(d):=sqrt(n*m)/(2*%PI*sx*sy)*integrate(exp(-n*(xbar-d-my)^2/(2*sx^2))*\n", " exp(-m*(ybar-my)^2/(2*sy^2)),my,-inf,inf);\n", " f(d);\n", "\n", " 2\n", " SQRT(n m) (- n) (xbar - d - my)\n", " (D2) f(d) := ----------- INTEGRATE(EXP(----------------------)\n", " 2 %PI sx sy 2\n", " 2 sx\n", "\n", " 2\n", " (- m) (ybar - my)\n", " EXP(------------------), my, - INF, INF)\n", " 2\n", " 2 sy\n", " (C3) \n", " 2\n", " (D3) SQRT(2) SQRT(m) SQRT(n) EXPT(%E, - (m n ybar\n", "\n", " 2 2\n", " + (- 2 m n xbar + 2 d m n) ybar + m n xbar - 2 d m n xbar + d m n)\n", "\n", " 2 2 2 2\n", " /(2 n sy + 2 m sx ))/(2 SQRT(%PI) SQRT(n sy + m sx ))\n", " (C4) factor((m*n)*ybar^2+(-2*m*n*xbar+2*d*m*n)*ybar+m*n*xbar^2-2*d*m*n*xbar+m*n*d^2);\n", "\n", " 2\n", " (D4) m n (ybar - xbar + d)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rewritten, this is\n", "\n", "\\begin{eqnarray}\n", "p(\\delta|\\bvec{x},\\bvec{y},\\sigma_x,\\sigma_y,I)&=&\n", "\\sqrt{\\frac{nm}{2\\pi(n\\sigma_x^2+m\\sigma_y^2)}}\n", " e^{-mn(\\delta-(\\bar{x}-\\bar{y}))^2/2(n\\sigma_x^2+m\\sigma_y^2)}\n", "\\end{eqnarray}\n", "\n", "or\n", "\n", "\\begin{eqnarray}\n", "\\mu_\\delta &\\equiv& \\mu_x-\\mu_y \\\\\n", "\\sigma_\\delta &\\equiv& \\frac{\\sigma_x^2}{n}+\\frac{\\sigma_y^2}{m}\\\\\n", "p(\\delta|\\bvec{x},\\bvec{y},\\sigma_x,\\sigma_y,I)&=&\n", "\\frac{1}{\\sqrt{2\\pi\\sigma_\\delta^2}}\n", " e^{-(\\delta-\\mu_\\delta)^2/2\\sigma_\\delta^2}\n", "\\end{eqnarray}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Difference of Means, $\\delta\\equiv \\mu_x - \\mu_y$, unknown $\\sigma_x$ and $\\sigma_y$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Making definitions as before for the $t$ distribution for each variable\n", "\n", "\\begin{eqnarray}\n", "t_x&\\equiv&\\frac{\\mu_x-\\bar{x}}{S_x/\\sqrt{n}} \\\\\\\\\n", "t_y&\\equiv&\\frac{\\mu_y-\\bar{y}}{S_y/\\sqrt{n}} \\\\\\\\\n", "S_x^2&\\equiv&\\frac{1}{(n-1)}\\sum_{k=1}^{n} (x_k-\\mu_x)^2 \\\\\\\\\n", "S_y^2&\\equiv&\\frac{1}{(m-1)}\\sum_{k=1}^{m} (y_k-\\mu_y)^2\n", "\\end{eqnarray}\n", "From the addition of variables we get\n", "\\begin{eqnarray}\n", "t&\\equiv&\\frac{\\delta-(\\bar{x}-\\bar{y})}{\\sqrt{S_x^2/m+S_y^2/n}} \\\\\\\\\n", "\\tan \\theta &\\equiv& \\frac{S_x/\\sqrt{n}}{S_y/\\sqrt{m}}\n", "\\end{eqnarray}\n", "\n", "$\\tan \\theta$ depends on the data, and $t_x$, and $t_y$ are known, so the\n", "distribution for $t$ should be known. It is named the Behren's distribution.\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---------------------" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from IPython.core.display import HTML\n", "\n", "\n", "def css_styling():\n", " styles = open(\"../styles/custom.css\", \"r\").read()\n", " return HTML(styles)\n", "css_styling()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.9" } }, "nbformat": 4, "nbformat_minor": 0 }