{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Bayes Theorem\n", "\n", "### Rohan L. Fernando\n", "\n", "### November 2015" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Motivation\n", "\n", "\n", "* In whole-genome analyses, the number $k$ of marker covariates typically exceeds the number $n$ of observations \n", "* Cannot use least squares methods to simultaneously estimate the effects of all the $k$ covariates\n", "* Bayesian inference is widely used to overcome this problem by combining prior information with the observed data\n", "* In the Bayesian approach, inferences are based on\n", "conditional probabilities, and the Bayes theorem is a statement on\n", "conditional probability" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Conditional Probability of $X$ Given $Y$\n", "\n", "\n", "Suppose $X$ and $Y$are two random variables with joint probability\n", "distribution $\\Pr(X,Y)$. Then, the conditional probability of $X$ given\n", "$Y$ is given by Bayes theorem as\n", "\n", "$$\\Pr(X|Y) = \\frac{\\Pr(X,Y)}{\\Pr(Y)} \\tag{1}$$\n", "\n", "where $\\Pr(Y)$ is the probability distribution of $Y$. \n", "\n", "Similarly, thethe conditional probability of $Y$ given $X$ is\n", "\n", "$$\\Pr(Y|X)=\\frac{\\Pr(X,Y)}{\\Pr(X)},$$ \n", "\n", "which upon rearranging gives\n", "\n", "$$\\Pr(X,Y)=\\Pr(Y|X)\\Pr(X). \\tag{2}$$\n", "\n", "Then, substituting (2) in (1) gives\n", "\n", "$$\\begin{eqnarray} \n", "\\Pr(X|Y) &= &\\frac{\\Pr(X,Y)}{\\Pr(Y)}\\\\\n", " &= &\\frac{\\Pr(Y|X)\\Pr(X)}{\\Pr(Y)},\n", "\\end{eqnarray}$$\n", "\n", "which is the form of the formula that is used for inference of $X$ given\n", "$Y.$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Bayes Theorem by Example\n", "----------------------------------------------\n", "\n", "Here we consider a simple example to justify the formula\n", "(1). The following table gives the joint distribution of\n", "smoking and lung cancer in a hypothetical population of 1,000,000\n", "individuals.\n", "\n", "$$\n", "\\begin{array}{c|lc|r}\n", " & \\text{Smoking} \\\\\n", "\\hline \n", "\\text{Cancer} & \\text{Yes} & \\text{No} & \\text{} \\\\\n", "\\hline\n", "\\text{Yes} & 42,500 & 7,500 & 50,000 \\\\\n", "\\text{No} & 207,500 & 742,500 & 950,000 \\\\\n", "\\hline \n", " & 250,000 & 750,000 & 1,000,000\n", "\\end{array}\n", "$$\n", "\n", "* Given these numbers, consider how you would compute the relative\n", " frequency of lung cancer among smokers" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " \n", "* There are a total of 250,000 smokers in this population, and among these 250,000 individuals, 42,500\n", " have lung cancer\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " \n", "* So, relative frequency of lung cancer among smokers is\n", " $\\frac{42,500}{250,000}$\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " \n", "* As we reason next, this relative frequency is also the conditional probability of lung cancer given the \n", " individual is a smoker" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* The frequentist definition of probability of an event is the\n", " limiting value of its relative frequency in a “large” number of\n", " trials.\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* Suppose we sample with replacement individuals from the 250,000\n", " smokers and compute the relative frequency of the incidence of lung\n", " cancer.\n", "\n", "* It can be shown that as the sample size goes to infinity, this\n", " relative frequency will approach $\\frac{42,500}{250,000}=0.17$.\n", " \n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "* This ratio can also be written as\n", " $$\\frac{42,500/1,000,000}{250,000/1,000,000}=0.17.$$\n", "\n", "* The ratio in the numerator is the joint probability of smoking and\n", " lung cancer, and the ratio in the denominator is the marginal\n", " probability of smoking.\n", " \n", "* So, $$\\Pr(X|Y) = \\frac{\\Pr(X,Y)}{\\Pr(Y)} $$ " ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[NbConvertApp] Converting notebook BayesTheorem.ipynb to slides\n", "[NbConvertApp] Writing 197700 bytes to BayesTheorem.slides.html\n" ] } ], "source": [ ";ipython nbconvert --to slides BayesTheorem.ipynb" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Julia 0.4.0", "language": "julia", "name": "julia-0.4" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.4.0" } }, "nbformat": 4, "nbformat_minor": 0 }