{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Statistical Inference for Everyone: Technical Supplement\n",
"\n",
"\n",
"\n",
"This document is the technical supplement, for instructors, to [Statistical Inference for Everyone], an introductory statistical inference textbook written from the perspective of \"probability theory as logic\".\n",
"\n",
"\n",
"\n",
"[Statistical Inference for Everyone]: http://web.bryant.edu/~bblais/statistical-inference-for-everyone-sie.html\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Estimating the Paired-Data Difference Between Means, $\\delta_k \\equiv x_k-y_k$\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$\\newcommand{\\bvec}[1]{\\mathbf{#1}}$\n",
"\n",
"We want\n",
"\\begin{eqnarray}\n",
"p(\\mu_\\delta|\\bvec{x},\\bvec{y},\\sigma_x,\\sigma_y,I)\n",
"\\end{eqnarray}\n",
"where $\\delta_k \\equiv x_k-y_k$.\n",
"\n",
"From the Normal model we have the following likelihoods for $x_k$ and\n",
"$y_k$:\n",
"\\begin{eqnarray}\n",
"p(x_k|\\mu_x,\\sigma_x,I)&=&\\frac{1}{\\sqrt{2\\pi\\sigma_x^2}}e^{-(x_k -\n",
"\\mu_x)^2/2\\sigma_x^2}\\\\\\\\\n",
"p(y_k|\\mu_y,\\sigma_y,I)&=&\\frac{1}{\\sqrt{2\\pi\\sigma_y^2}}e^{-(y_k -\n",
"\\mu_y)^2/2\\sigma_y^2}\n",
"\\end{eqnarray}\n",
"\n",
"Now we need to find the likelihood function for $\\delta_k \\equiv x_k-y_k$.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Changing Variables\n",
"\n",
"If $Z=f(X,Y)$ and we know the distributions of $X$ and $Y$, we can find the distribution of $Z$\n",
"by marginalizing over $X$ and $Y$:\n",
"\\begin{eqnarray}\n",
"p(Z|I)&=&\\int \\int p(Z|X,Y,I) \\times p(X,Y|I) dXdY \\\\\\\\\n",
"&=&\\int \\int \\delta(Z-f(X,Y)) \\times p(X,Y|I) dXdY\n",
"\\end{eqnarray}\n",
"where $\\delta(\\cdot)$ is the Dirac delta function, not to be confused with the paired difference $\\delta_k$.\n",
"\n",
"Suppose $Z=X-Y$, with $X$ and $Y$ independent, so that $p(X,Y|I)=p(X|I)p(Y|I)$.\n",
"Then\n",
"\\begin{eqnarray}\n",
"p(Z|I) &=& \\int dX p(X|I) \\int dY p(Y|I)\\delta(Z-X+Y) \\\\\\\\\n",
"&=& \\int dX p(X|I)p(Y=X-Z|I)\n",
"\\end{eqnarray}\n",
"\n",
"If, in addition, $p(X|I)$ and $p(Y|I)$ are Gaussian, then we have\n",
"\n",
"\\begin{eqnarray}\n",
"p(Z|I) &=& \\frac{1}{2\\pi\\sigma_x\\sigma_y}\\int_{-\\infty}^{\\infty} dX\n",
"e^{-(X-\\mu_x)^2/2\\sigma_x^2}\\times e^{-(X-Z-\\mu_y)^2/2\\sigma_y^2} \n",
"\\end{eqnarray}\n",
"One can do some tedious algebra at this point (completing the square in the exponent),\n",
"or use a computer algebra program such as *xmaxima*:\n",
"\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"    (C1) ASSUME_POS:TRUE;\n",
"    (D1) TRUE\n",
"    (C2) 1/(2*%PI)/sx/sy*integrate(exp(-(x-xo)^2/(2*sx^2))*\n",
"         exp(-(x-z-yo)^2/(2*sy^2)),x,-inf,inf);\n",
"\n",
"    (D2) SQRT(2) %E^(-(z^2 + (2 yo - 2 xo) z + yo^2 - 2 xo yo + xo^2)/(2 sy^2 + 2 sx^2))\n",
"         / (2 SQRT(%PI) SQRT(sy^2 + sx^2))\n",
"\n",
"    (C3) factor(z^2+(2*yo-2*xo)*z+yo^2-2*xo*yo+xo^2);\n",
"    (D3) (z + yo - xo)^2\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"So we get\n",
"\n",
"\\begin{eqnarray}\n",
"p(Z|I) &=&\n",
"\\frac{1}{\\sqrt{2\\pi(\\sigma_x^2+\\sigma_y^2)}}\n",
"e^{-(Z-(\\mu_x-\\mu_y))^2/2(\\sigma_x^2+\\sigma_y^2)} \\\\\\\\\n",
"&=&\\frac{1}{\\sqrt{2\\pi\\sigma_z^2}}\n",
"e^{-(Z-\\mu_z)^2/2\\sigma_z^2} \\\\\\\\ \\mbox{ where }\n",
"\\mu_z&\\equiv& \\mu_x-\\mu_y \\\\\\\\\n",
"\\sigma_z^2&\\equiv&\\sigma_x^2+\\sigma_y^2\n",
"\\end{eqnarray}\n"
]
},
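{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numerical sanity check of this result, the sketch below evaluates the integral\n",
"$p(Z|I)=\\int dX p(X|I)p(Y=X-Z|I)$ directly as a Riemann sum on a fine grid. It then compares the sum with\n",
"a Normal density of mean $\\mu_x-\\mu_y$ and variance $\\sigma_x^2+\\sigma_y^2$. The parameter values\n",
"are arbitrary illustrations (assumptions, not taken from the text). The grid is chosen wide and\n",
"fine enough that the truncated Gaussian tails and the discretization error are negligible.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def gaussian(x, mu, sigma):\n",
"    # Normal density with mean mu and standard deviation sigma\n",
"    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)\n",
"\n",
"# Illustrative parameters (assumed values, not taken from the text)\n",
"mu_x, sigma_x = 3.0, 1.0\n",
"mu_y, sigma_y = 1.0, 2.0\n",
"\n",
"# Integration grid for X, wide enough that the tails contribute nothing measurable\n",
"X = np.linspace(-30.0, 30.0, 20001)\n",
"dX = X[1] - X[0]\n",
"\n",
"\n",
"def p_Z_numeric(Z):\n",
"    # p(Z|I) = integral dX p(X|I) p(Y=X-Z|I), approximated by a Riemann sum\n",
"    return np.sum(gaussian(X, mu_x, sigma_x) * gaussian(X - Z, mu_y, sigma_y)) * dX\n",
"\n",
"\n",
"def p_Z_closed_form(Z):\n",
"    # Normal with mu_z = mu_x - mu_y and sigma_z^2 = sigma_x^2 + sigma_y^2\n",
"    return gaussian(Z, mu_x - mu_y, np.sqrt(sigma_x**2 + sigma_y**2))\n",
"\n",
"\n",
"Z_values = np.linspace(-5.0, 9.0, 15)\n",
"numeric = np.array([p_Z_numeric(z) for z in Z_values])\n",
"closed = p_Z_closed_form(Z_values)\n",
"max_error = np.max(np.abs(numeric - closed))\n",
"print(max_error)"
]
},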
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Continuing with Paired Data\n",
"\n",
"Changing variables to $\\delta_k$, the result above shows that the likelihood for\n",
"$\\delta_k$ has the same Normal form as those for $x_k$ and $y_k$, with mean\n",
"$\\mu_\\delta=\\mu_x-\\mu_y$ and variance $\\sigma_\\delta^2=\\sigma_x^2+\\sigma_y^2$. Thus we have the\n",
"*exact same* results for the paired difference, for both known and unknown\n",
"$\\sigma$, as those quoted in the z-test and t-test sections.\n",
"\n"
]
},
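{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of this reduction uses synthetic paired data; the seed, sample size, and\n",
"generating values are arbitrary assumptions for illustration. It forms $\\delta_k=x_k-y_k$ and then treats\n",
"the differences as a single sample. From that sample it computes the mean $\\bar{\\delta}$, the standard\n",
"deviation $S_\\delta$, and the standard error $S_\\delta/\\sqrt{n}$ that enter the one-sample t-test\n",
"results.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Synthetic paired measurements (hypothetical): y_k is measured on the same unit as x_k\n",
"rng = np.random.RandomState(42)\n",
"n = 20\n",
"x = rng.normal(10.0, 1.0, size=n)\n",
"y = x - 0.5 + rng.normal(0.0, 0.5, size=n)\n",
"\n",
"# Reduce the paired problem to a one-sample problem on delta_k = x_k - y_k\n",
"delta = x - y\n",
"delta_bar = np.mean(delta)\n",
"S_delta = np.std(delta, ddof=1)  # sample standard deviation, n-1 in the denominator\n",
"standard_error = S_delta / np.sqrt(n)\n",
"\n",
"# t value at the hypothesized difference mu_delta = 0, written as (mu - mean)/(S/sqrt(n))\n",
"t_zero = (0.0 - delta_bar) / standard_error\n",
"print(delta_bar)\n",
"print(standard_error)\n",
"print(t_zero)"
]
},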
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Difference of Means, $\\delta\\equiv \\mu_x - \\mu_y$, known $\\sigma_x$ and $\\sigma_y$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Again, the change-of-variables trick works, but since we now want the difference between\n",
"the means $\\mu_x$ and $\\mu_y$ themselves, we need to use their posterior distributions,\n",
"$p(\\mu_x|\\bvec{x},\\sigma_x,I)$ and $p(\\mu_y|\\bvec{y},\\sigma_y,I)$, where $n$ and $m$ are the\n",
"numbers of $x$ and $y$ data points, respectively.\n",
"\n",
"\\begin{eqnarray}\n",
"p(\\mu_x|\\bvec{x},\\sigma_x,I)&=& \n",
"\\sqrt{\\frac{n}{2\\pi \\sigma_x^2}}e^{-n(\\bar{x}-\\mu_x)^2/2\\sigma_x^2} \\\\\\\\\n",
"p(\\mu_y|\\bvec{y},\\sigma_y,I)&=& \n",
"\\sqrt{\\frac{m}{2\\pi \\sigma_y^2}}e^{-m(\\bar{y}-\\mu_y)^2/2\\sigma_y^2}\n",
"\\end{eqnarray}\n",
"\n",
"Performing the change of variables to $\\delta \\equiv \\mu_x-\\mu_y$ (so that $\\mu_x=\\delta+\\mu_y$)\n",
"and integrating over $\\mu_y$, we get\n",
"\n",
"\\begin{eqnarray}\n",
"p(\\delta|\\bvec{x},\\bvec{y},\\sigma_x,\\sigma_y,I)&=&\n",
"\\frac{\\sqrt{nm}}{2\\pi\\sigma_x\\sigma_y}\\int d\\mu_y\n",
"e^{-n(\\bar{x}-\\delta-\\mu_y)^2/2\\sigma_x^2}\n",
"e^{-m(\\bar{y}-\\mu_y)^2/2\\sigma_y^2} \n",
"\\end{eqnarray}\n",
"\n",
"Again, using *xmaxima*:\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"    (C1) ASSUME_POS:TRUE;\n",
"    (D1) TRUE\n",
"    (C2) f(d):=sqrt(n*m)/(2*%PI*sx*sy)*integrate(exp(-n*(xbar-d-my)^2/(2*sx^2))*\n",
"         exp(-m*(ybar-my)^2/(2*sy^2)),my,-inf,inf);\n",
"\n",
"    (D2) f(d) := SQRT(n m)/(2 %PI sx sy) INTEGRATE(EXP((- n) (xbar - d - my)^2/(2 sx^2))\n",
"         EXP((- m) (ybar - my)^2/(2 sy^2)), my, - INF, INF)\n",
"\n",
"    (C3) f(d);\n",
"\n",
"    (D3) SQRT(2) SQRT(m) SQRT(n) EXPT(%E, - (m n ybar^2 + (- 2 m n xbar + 2 d m n) ybar\n",
"         + m n xbar^2 - 2 d m n xbar + d^2 m n)/(2 n sy^2 + 2 m sx^2))\n",
"         / (2 SQRT(%PI) SQRT(n sy^2 + m sx^2))\n",
"\n",
"    (C4) factor((m*n)*ybar^2+(-2*m*n*xbar+2*d*m*n)*ybar+m*n*xbar^2-2*d*m*n*xbar+m*n*d^2);\n",
"\n",
"    (D4) m n (ybar - xbar + d)^2\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Rewritten, this is\n",
"\n",
"\\begin{eqnarray}\n",
"p(\\delta|\\bvec{x},\\bvec{y},\\sigma_x,\\sigma_y,I)&=&\n",
"\\sqrt{\\frac{nm}{2\\pi(m\\sigma_x^2+n\\sigma_y^2)}}\n",
" e^{-mn(\\delta-(\\bar{x}-\\bar{y}))^2/2(m\\sigma_x^2+n\\sigma_y^2)}\n",
"\\end{eqnarray}\n",
"\n",
"or\n",
"\n",
"\\begin{eqnarray}\n",
"\\mu_\\delta &\\equiv& \\bar{x}-\\bar{y} \\\\\\\\\n",
"\\sigma_\\delta^2 &\\equiv& \\frac{\\sigma_x^2}{n}+\\frac{\\sigma_y^2}{m}\\\\\\\\\n",
"p(\\delta|\\bvec{x},\\bvec{y},\\sigma_x,\\sigma_y,I)&=&\n",
"\\frac{1}{\\sqrt{2\\pi\\sigma_\\delta^2}}\n",
" e^{-(\\delta-\\mu_\\delta)^2/2\\sigma_\\delta^2}\n",
"\\end{eqnarray}\n"
]
},
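{
"cell_type": "markdown",
"metadata": {},
"source": [
"The closed form can be checked numerically. The sketch below picks hypothetical summary values\n",
"for $\\bar{x}$, $\\bar{y}$, $n$, $m$ and the known $\\sigma_x$, $\\sigma_y$ (assumptions, not from\n",
"the text). It evaluates the $\\mu_y$ integral directly as a Riemann sum and compares it with a Normal\n",
"density of mean $\\bar{x}-\\bar{y}$ and variance $\\sigma_x^2/n+\\sigma_y^2/m$. It also forms the\n",
"central 95% interval $\\mu_\\delta\\pm 1.96\\sigma_\\delta$.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Hypothetical summary statistics with known sigma_x and sigma_y (assumed values)\n",
"xbar, sigma_x, n = 5.2, 1.5, 12\n",
"ybar, sigma_y, m = 4.1, 2.0, 8\n",
"\n",
"# Closed-form posterior for delta = mu_x - mu_y\n",
"mu_delta = xbar - ybar\n",
"sigma_delta = np.sqrt(sigma_x**2 / n + sigma_y**2 / m)\n",
"\n",
"\n",
"def posterior_closed(d):\n",
"    # Normal density with mean mu_delta and standard deviation sigma_delta\n",
"    return np.exp(-(d - mu_delta)**2 / (2 * sigma_delta**2)) / np.sqrt(2 * np.pi * sigma_delta**2)\n",
"\n",
"\n",
"# Grid over mu_y for the direct numerical integral\n",
"mu_y = np.linspace(ybar - 20.0, ybar + 20.0, 40001)\n",
"dmu = mu_y[1] - mu_y[0]\n",
"\n",
"\n",
"def posterior_numeric(d):\n",
"    # sqrt(nm)/(2 pi sigma_x sigma_y) times the integral over mu_y of the two Gaussian factors\n",
"    integrand = (np.exp(-n * (xbar - d - mu_y)**2 / (2 * sigma_x**2)) *\n",
"                 np.exp(-m * (ybar - mu_y)**2 / (2 * sigma_y**2)))\n",
"    return np.sqrt(n * m) / (2 * np.pi * sigma_x * sigma_y) * np.sum(integrand) * dmu\n",
"\n",
"\n",
"d_values = np.linspace(mu_delta - 3.0, mu_delta + 3.0, 13)\n",
"errors = [abs(posterior_numeric(d) - posterior_closed(d)) for d in d_values]\n",
"max_error = max(errors)\n",
"\n",
"# Central 95% interval for delta under the Normal posterior\n",
"interval = (mu_delta - 1.96 * sigma_delta, mu_delta + 1.96 * sigma_delta)\n",
"print(max_error)\n",
"print(interval)"
]
},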
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Difference of Means, $\\delta\\equiv \\mu_x - \\mu_y$, unknown $\\sigma_x$ and $\\sigma_y$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Making the same definitions as before for the $t$ distribution of each variable,\n",
"\n",
"\\begin{eqnarray}\n",
"t_x&\\equiv&\\frac{\\mu_x-\\bar{x}}{S_x/\\sqrt{n}} \\\\\\\\\n",
"t_y&\\equiv&\\frac{\\mu_y-\\bar{y}}{S_y/\\sqrt{m}} \\\\\\\\\n",
"S_x^2&\\equiv&\\frac{1}{(n-1)}\\sum_{k=1}^{n} (x_k-\\bar{x})^2 \\\\\\\\\n",
"S_y^2&\\equiv&\\frac{1}{(m-1)}\\sum_{k=1}^{m} (y_k-\\bar{y})^2\n",
"\\end{eqnarray}\n",
"and changing variables to $\\delta\\equiv\\mu_x-\\mu_y$, we get\n",
"\\begin{eqnarray}\n",
"t&\\equiv&\\frac{\\delta-(\\bar{x}-\\bar{y})}{\\sqrt{S_x^2/n+S_y^2/m}} \\\\\\\\\n",
"&=& t_x\\sin\\theta - t_y\\cos\\theta \\\\\\\\\n",
"\\tan \\theta &\\equiv& \\frac{S_x/\\sqrt{n}}{S_y/\\sqrt{m}}\n",
"\\end{eqnarray}\n",
"\n",
"Since $\\theta$ depends only on the data, and the distributions of $t_x$ and $t_y$ are known\n",
"(Student's $t$ with $n-1$ and $m-1$ degrees of freedom, respectively), the distribution of $t$\n",
"is determined. It is known as the Behrens (or Behrens-Fisher) distribution.\n",
"\n"
]
},
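{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because $t=t_x\\sin\\theta-t_y\\cos\\theta$ combines two independent Student's $t$ variables with\n",
"fixed, data-dependent weights, its distribution is easy to explore by Monte Carlo. The sketch below\n",
"is only one possible approach, and the helper `behrens_samples` and all numerical inputs are hypothetical.\n",
"It draws $t_x$ and $t_y$ with NumPy's `RandomState.standard_t` and reads off a central 95% interval\n",
"for $t$. It then maps that interval back to $\\delta$ through $\\delta=(\\bar{x}-\\bar{y})+t\\sqrt{S_x^2/n+S_y^2/m}$.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"\n",
"def behrens_samples(n, m, S_x, S_y, num_samples=100000, seed=0):\n",
"    # Hypothetical helper: Monte Carlo draws of t = t_x sin(theta) - t_y cos(theta)\n",
"    rng = np.random.RandomState(seed)\n",
"    # theta satisfies tan(theta) = (S_x/sqrt(n)) / (S_y/sqrt(m)); both arguments are positive\n",
"    theta = np.arctan2(S_x / np.sqrt(n), S_y / np.sqrt(m))\n",
"    t_x = rng.standard_t(n - 1, size=num_samples)  # Student's t, n-1 degrees of freedom\n",
"    t_y = rng.standard_t(m - 1, size=num_samples)  # Student's t, m-1 degrees of freedom\n",
"    return t_x * np.sin(theta) - t_y * np.cos(theta)\n",
"\n",
"\n",
"# Hypothetical sample sizes, sample standard deviations, and sample means\n",
"n, m = 10, 15\n",
"S_x, S_y = 2.0, 1.0\n",
"xbar, ybar = 5.0, 3.5\n",
"\n",
"t = behrens_samples(n, m, S_x, S_y)\n",
"t_lo, t_hi = np.percentile(t, [2.5, 97.5])\n",
"\n",
"# Map the central 95% interval for t back to delta = mu_x - mu_y\n",
"scale = np.sqrt(S_x**2 / n + S_y**2 / m)\n",
"delta_interval = (xbar - ybar + t_lo * scale, xbar - ybar + t_hi * scale)\n",
"print(delta_interval)"
]
},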
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---------------------"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n"
],
"text/plain": [
""
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from IPython.display import HTML\n",
"\n",
"\n",
"def css_styling():\n",
"    # Read the shared notebook stylesheet and return it as renderable HTML\n",
"    with open(\"../styles/custom.css\", \"r\") as f:\n",
"        styles = f.read()\n",
"    return HTML(styles)\n",
"\n",
"\n",
"css_styling()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 0
}