#+AUTHOR: Arnaud Legrand
#+TITLE: Introduction to Probabilities and Statistics
#+DATE: MOSIG Lecture
#+STARTUP: beamer overview indent
#+TAGS: noexport(n)
#+LaTeX_CLASS: beamer
#+LaTeX_CLASS_OPTIONS: [11pt,xcolor=dvipsnames,presentation]
#+OPTIONS: H:3 num:t toc:nil \n:nil @:t ::t |:t ^:nil -:t f:t *:t <:t
#+LATEX_HEADER: \input{org-babel-style-preembule.tex}
#+LATEX_HEADER: \usepackage{commath}
#+LaTeX: \input{org-babel-document-preembule.tex}
* List                                                             :noexport:
** TODO Jean-Marc
- I did not mention the moment generating series, which would get confused
  with the generating function in the discrete case $\Esp t^X$, whose
  coefficients at 0 are the probabilities and whose coefficients at 1 are the
  moments
- Actually $M_X$ is mostly a computational tool; there are many difficulties
  in properly defining its domain when the variables are real-valued
  (singularities, ...). I think the right intuition should be given (the
  Fourier transform, of which $M_X$ is an avatar).
- On the other hand, I did present the characteristic function in class,
  together with the relevant properties
- Before the Bienaymé-Chebyshev inequality one should present Markov's
  inequality, which is really fundamental: it links the expectation and the
  cdf. Chebyshev then appears as an obvious corollary and the process can be
  generalized (which is done later when studying large deviations)
- The last limit on slide 21 is not correct; it is better to say that we have
  an approximation $\frac{S_n}{n} \simeq \mathcal{N}(\mu, \frac{\sigma^2}{n})$
- Regarding notation, since there are several types of convergence in
  probability theory (in law, in probability, almost sure, in $L^1$, ...,
  $L^p$, ...), it is better to always be explicit; here we have convergence in
  law, which I denote $\stackrel{\longrightarrow}{\mathcal{L}}$
- Slide 25: to match the variances one has to divide by $\sqrt{2}$ (this
  explains the $\sqrt{n}$ in the formula)
* Brief reminder on probabilities
** 
*** Probabilities
\vspace{-.3em}
- Using probabilities makes it possible to model *uncertainty* that may result
  from *incomplete information* or *imprecise measurements*
\pause

A *random variable* (or stochastic variable) is, roughly speaking, a variable
whose value results from a measurement (or an observation)

You can think of it as a *small box*:
- Every time you open the box, you get a _different_ value.
- I will use this box analogy throughout the whole lecture and I encourage you
  to ask yourself what the box can be in your own studies\medskip
\pause
- Formally a *probability space* is defined by $(\Omega, \F, \P)$ where:
  - $\Omega$, the *sample space*, is the set of all possible *outcomes*
    - \Eg all the possible combinations of your DNA with that of your {girl|boy}friend
    - You may or may not be able to observe the outcome directly.
  \pause
  - \F is the set of *events*, where an event is a set containing zero or more outcomes
    - \Eg the event "the DNA corresponds to a girl with blue eyes"
    - An event is somehow more tangible and can generally be observed
  \pause
  - The *probability measure* $\P:\F \to [0,1]$ is a function returning an event's probability
    #+LaTeX: ($\P$("having a brown-eyed baby girl") = 0.0005)
*** Continuous random variable
- A *random variable* associates a *numerical value* with *outcomes*
  \begin{equation*}
    \rv{X}: \Omega \to \R
  \end{equation*}
  \vspace{-1.6em}
  - \Eg the weight of the baby at birth (assuming it solely depends on DNA,
    which is quite false, but it's for the sake of the example)
- Since many computer science experiments are based on time measurements, we
  focus on *continuous* variables
- \textbf{Note:} To distinguish random variables, which are complex objects,
  from other mathematical objects, they will always be written in blue capital
  letters in this set of slides (\eg \rv{X})
- The probability measure on \Omega induces probabilities on the *values* of \rv{X}
  - $\P(\rv{X}=0.5213)$ is generally 0 as the outcome never exactly matches
  - $\P(0.5213\leq\rv{X}\leq0.5214)$ may however be non-zero
*** Probability distribution
#+begin_src R :results output graphics :file "pdf_babel/Gamma_distribution.pdf" :exports none :width 6 :height 3 :session
library(ggplot2)
library(ggthemes)
df = data.frame(x=c(-2,10), y=c(0,.3))
func = dgamma
pfunc = pgamma
args = list(shape = 3)
xmin = 1
xmax = 6
x = seq(from=xmin,to=xmax,length.out=50)
y = do.call(func,c(list(x=x),args))
area = data.frame(x=x, y=y)
integral = diff(range(do.call(pfunc,c(list(q=c(xmin,xmax)),args))))
label = paste("P(paste(",xmin," <= X) <= ",xmax,") == ", integral)
r2.value <- 0.90
p = ggplot(data=df,aes(x=x,y=y)) + geom_point(size=0) + theme_classic() +
    stat_function(fun = func, colour = "darkgreen", args = args) +
    geom_area(data=area,aes(x=x,y=y),fill="lightskyblue2") +
    geom_text(x=.7*max(df$x),y=.7*max(df$y), label=label, parse=T) +
    ylab(expression(paste(f[X],"(",italic(w),")"))) + xlim(df$x) +
    xlab(expression(italic(w)))
p
# ggsave(p,file="pdf_babel/Gamma_distribution.pdf",width=6,height=4)
#+end_src

#+RESULTS:
[[file:pdf_babel/Gamma_distribution.pdf]]

A *probability distribution* (a.k.a. *probability density function* or
p.d.f.) is used to describe the probabilities of different *values* occurring
- A random variable \rv{X} has density $f_X$, where $f_X$ is a non-negative
  and integrable function, if: \qquad\quad
  $\displaystyle \boxed{\P[a \leq \rv{X} \leq b] = \int_a^b f_X(w) \, \dif w}$
  (see the numerical check at the end of this slide)
#+BEGIN_EXPORT latex
%\vspace{-.5em}
\begin{columns}
  \begin{column}{.7\linewidth}
    \includegraphics[width=\linewidth]{pdf_babel/Gamma_distribution.pdf}
  \end{column}
  \begin{column}{.3\linewidth}
    \begin{boxedminipage}{1.1\linewidth}
      \scriptsize Note: \texttt{the} $\text{X}$ in \hbox{$1\leq\text{X}\leq6$}
      \textit{should be in blue...}
    \end{boxedminipage}
  \end{column}
\end{columns}
#+END_EXPORT
# \vspace{-.5em}
- \textbf{Note}: people often confuse the sample space with the random
  variable. Try to make the distinction when modeling your system, it will
  help you.
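A minimal numerical check of the boxed formula, using the same Gamma density
(shape 3) and the same bounds as in the figure:
#+begin_src R :results output :exports both :session
## P(1 <= X <= 6) for X ~ Gamma(shape = 3), computed in two equivalent ways
pgamma(6, shape = 3) - pgamma(1, shape = 3)          # via the cumulative distribution function
integrate(dgamma, lower = 1, upper = 6, shape = 3)   # by integrating the density f_X directly
#+end_src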
*** Characterizing a random variable
The probability density function *fully characterizes* the random variable,
but it is also a complex object
- It may be symmetrical or not
- It may have one or several *modes*
- It may have a bounded support or not, hence the random variable may have a
  *minimal* and/or a *maximal* value
- The *median* cuts the probabilities in half
#+begin_src R :results output graphics :file "pdf_babel/distribution_characteristics.pdf" :exports none :width 6 :height 3 :session
library(ggplot2)
library(ggthemes)
xmin = -2
xmax = 10
ymin = 0
ymax = .3
func = dgamma
pfunc = pgamma
args = list(shape = 3)
x = seq(from=xmin,to=xmax,length.out=500)
y = do.call(pfunc,c(list(q=x),args))
dfminx = data.frame(x=x,y=y)
minx = tail(dfminx[dfminx$y==0,],n=1)$x
if(length(minx)==0) {minx=NA}
maxx = head(dfminx[dfminx$y==1,],n=1)$x
if(length(maxx)==0) {maxx=NA}
medianx = tail(dfminx[dfminx$y<.5,],n=1)$x
if(length(medianx)==0) {medianx=NA}
y = do.call(func,c(list(x=x),args))
dfminx = data.frame(x=x,y=y)
modex = dfminx[dfminx$y==max(dfminx$y),]$x
espx = sum(dfminx$x*dfminx$y)*diff(range(head(dfminx$x,n=2)))
dfstat = data.frame(name = c("min", "median", "max","mode","expected value"),
                    x = c(minx,medianx,maxx,modex,espx), y = ymax)
p = ggplot(data=dfstat,aes(x=x,y=y,color=name)) + geom_line(alpha=1) +
    xlim(xmin,xmax) + ylim(ymin,ymax) + theme_classic() + ylab("f(x)") +
    stat_function(fun = func, colour = "black", args = args) +
    geom_vline(aes(xintercept=x,color=name)) +
    guides(colour = guide_legend(""))
p
# ggsave(p,file="pdf_babel/Gamma_distribution.pdf",width=6,height=4)
#+end_src

#+RESULTS:
[[file:pdf_babel/distribution_characteristics.pdf]]

#+BEGIN_CENTER
\includegraphics[width=.7\linewidth]{pdf_babel/distribution_characteristics.pdf}
#+END_CENTER
\vspace{-1em}
\bf These are interesting aspects of $\mathbf{f_X}$ but they barely summarize it
*** Expected value and variance
- When one speaks of the "expected price", "expected height", etc. one means
  the *expected value* of a random variable that is a price, a height, etc.
  \begin{align*}
    \E[\rv{X}] & = x_1p_1 + x_2p_2 + \ldots + x_kp_k = \int_{-\infty}^\infty x f_X(x)\, \dif x
  \end{align*}
  The expected value of \rv{X} is the ``average value'' of \rv{X}.\smallskip

  It is \textbf{not} the most probable value. The mean is _one_ aspect of the
  distribution of \rv{X}. The *median* or the *mode* are other interesting
  aspects.
- The *variance* is a measure of how far the values of a random variable are
  spread out from each other. If a random variable \rv{X} has the expected
  value (mean) $\mu = \E[\rv{X}]$, then the variance of \rv{X} is given by:
  #+BEGIN_EXPORT latex
  \begin{align*}
    \Var(\rv{X}) &= \E\left[(\rv{X} - \mu)^2 \right] = \int_{-\infty}^\infty (x-\mu)^2 f_X(x)\, \dif x
  \end{align*}
  #+END_EXPORT
- The *standard deviation $\sigma$* is the square root of the variance. This
  normalization allows one to compare it with the expected value
* Moment Generating Function
** Intuitions
*** Definition
Working with the density function is not always convenient, especially when
*summing* random variables (it implies *convolving* the pdf, as illustrated by
the short simulation below). We need an /alternate representation/.\medskip

How could we summarize a random variable?
- By its mean, its variance, its skewness, \dots by its moments $\mu_k = \E(\rv{X}^k)$
- It is not clear whether this would be sufficient, although we would know a
  lot about $f_{\rv{X}}$.
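A minimal illustration of this convolution effect (the figure name
=sum_convolution.pdf= is only an illustrative choice, not part of the original
deck): the sum of two independent Uniform(0,1) variables has the triangular
density on $[0,2]$.
#+begin_src R :results output graphics :file pdf_babel/sum_convolution.pdf :exports none :width 6 :height 3 :session
## Density of X1 + X2 for two independent Uniform(0,1) variables:
## the histogram of the sum matches the convolution of the two densities,
## i.e. the triangular density 1 - |x - 1| on [0,2].
library(ggplot2)
s = runif(50000) + runif(50000)
ggplot(data.frame(s=s), aes(x=s)) +
  geom_histogram(aes(y = ..density..), binwidth=.05) +
  stat_function(fun = function(x) pmax(0, 1 - abs(x - 1)), colour = "darkgreen") +
  theme_classic() + xlab("x") + ylab("density")
#+end_src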
Let's define the *moment generating function* \M as follows:
#+BEGIN_EXPORT latex
\begin{align*}
  \M & = \E\left(e^{t\rv{X}}\right) = \E\left(\sum_{k=0}^\infty \frac{t^k\rv{X}^k}{k!}\right)
       = \sum_{k=0}^\infty \frac{t^k\E\left(\rv{X}^k\right)}{k!} = \sum_{k=0}^\infty \mu_k\frac{t^k}{k!}\\
     & = \int e^{tx}f_{\rv{X}}(x)dx
\end{align*}
#+END_EXPORT
*** Deriving moments with the mgf
Remember we have $\displaystyle \M = \sum_{k=0}^\infty \mu_k\frac{t^k}{k!}$

Therefore $\displaystyle \frac{d^n \MM}{dt^n}(0) = \mu_n$ \medskip

All the moments of $\rv{X}$ are encoded in \M. Is there more?
*** Characterization of a distribution through the mgf
Let's assume that $\rv{X}$ is discrete $\left((x_1,p_1),\dots, (x_n,p_n)\right)$
with $x_1<\dots<x_n$. Then:
#+BEGIN_EXPORT latex
$$\M = p_1e^{tx_1} + p_2e^{tx_2} + \dots + p_ne^{tx_n}$$
#+END_EXPORT
Knowing \M, one can recover the $(x_i,p_i)$ (the dominant exponential term as
$t\to\infty$ reveals $x_n$ and $p_n$; subtract it and iterate), hence the whole
distribution of $\rv{X}$. More generally, two random variables with the same
mgf (finite in a neighborhood of 0) have the same distribution: \M
*characterizes* the distribution of $\rv{X}$.
** Law of Large Numbers
*** Chebyshev's inequality
**** Chebyshev's inequality
Let \rv{X} be a random variable with finite expected value $\mu$ and finite
variance $\Var(\rv{X})$. Let $\epsilon>0$ be any positive real number. Then
$\P(|\rv{X}-\mu|\geq \epsilon) \leq \frac{\Var(\rv{X})}{\epsilon^2}$.
**** Proof
#+BEGIN_EXPORT latex
\begin{align*}
  \Var(\rv{X}) = \int (x-\mu)^2f(x).dx & \geq \int_{|x-\mu|\geq\epsilon} (x-\mu)^2f(x).dx \\
  & \geq \int_{|x-\mu|\geq\epsilon} \epsilon^2f(x).dx = \epsilon^2 \underbrace{\int_{|x-\mu|\geq\epsilon} f(x).dx}_{\P(|\rv{X}-\mu|\geq \epsilon)}
\end{align*}
#+END_EXPORT
*** Law of Large Numbers
**** Law of Large Numbers
Let \rv{X_1}, \rv{X_2}, \dots, \rv{X_n} be a sequence of independent and
identically distributed random variables with finite expected value
$\mu=\E(\rv{X_i})$ and finite variance $\sigma^2=\Var(\rv{X_i})$.

Let $\rv{S_n} = \rv{X_1} + \rv{X_2} + \dots + \rv{X_n}$. Then for any
$\epsilon>0$, $\P(|\rv{S_n}/n-\mu|\geq \epsilon) \xrightarrow[n\to\infty]{} 0$.
**** Proof
The $\rv{X_i}$ are i.i.d., hence:
- $\Var(\rv{S_n}) = n.\sigma^2 \leadsto \Var(\rv{S_n}/n) = \sigma^2/n$.
- $\E(\rv{S_n}/n) = \mu$.
Using Chebyshev's inequality:
#+BEGIN_EXPORT latex
$$\P(|\rv{S_n}/n-\mu|\geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2} \xrightarrow[n\to\infty]{} 0 \text{ (for a fixed $\varepsilon$)}$$
#+END_EXPORT
*** Illustration: convergence in probability
#+begin_src R :results output graphics :file pdf_babel/tcl_large_numbers.pdf :exports results :width 7 :height 2 :session
library(ggplot2)
N=10000;
repsamp = function(N,r) {
  x = 0 ;
  for(i in 1:r) {
    x = x + sample(x=c(0,1),replace=T,N);
  }
  x = x/r
  data.frame(x=x,r=r)
}
df = rbind(repsamp(N,1), repsamp(N,10), repsamp(N,100), repsamp(N,1000),repsamp(N,10000));
ggplot(df,aes(x=x)) + geom_histogram(binwidth=.05) + facet_wrap(~r,nrow=1) +
  theme_bw() + scale_x_continuous(breaks=c(0,.5,1)) + xlab("Sn")
#+end_src

#+RESULTS:
[[file:pdf_babel/tcl_large_numbers.pdf]]

So we do converge to a spike, but how?

Assume $\sigma=1$ and that we aim at a precision of $\epsilon=.1$. For
$n=500$, the previous formula only gives us
$\P(|\rv{S_n}/n-\mu|\geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2} = \frac{100}{n} = 0.2 \quad\frowny$

In general, for an $\alpha$ confidence interval (i.e.,
$\P(|\rv{S_n}/n-\mu|\le \delta)\ge\alpha$), we get
$\delta=\frac{1}{\sqrt{1-\alpha}}.\frac{\sigma}{\sqrt{n}}$ (checked numerically
below)

| $\alpha$ | Chebyshev's Range $\frowny$   | CLT range $\smiley$           |
|----------+-------------------------------+-------------------------------|
| .95      | $4.47\frac{\sigma}{\sqrt{n}}$ | $1.96\frac{\sigma}{\sqrt{n}}$ |
| .999     | $31.6\frac{\sigma}{\sqrt{n}}$ | $3.29\frac{\sigma}{\sqrt{n}}$ |
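These multipliers can be recovered numerically; a minimal check with the same
assumptions as above ($\sigma=1$, $\epsilon=.1$, $n=500$):
#+begin_src R :results output :exports both :session
## Chebyshev bound vs CLT approximation for P(|Sn/n - mu| >= eps)
sigma = 1; eps = .1; n = 500
sigma^2 / (n * eps^2)               # Chebyshev bound: 0.2
2 * pnorm(-eps * sqrt(n) / sigma)   # CLT approximation: ~0.025
## Half-width multipliers of the table (in units of sigma/sqrt(n))
sapply(c(.95, .999), function(a) 1 / sqrt(1 - a))          # Chebyshev: 4.47, 31.6
sapply(c(.95, .999), function(a) qnorm(1 - (1 - a) / 2))   # CLT: 1.96, 3.29
#+end_src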
** Central Limit Theorem
*** Central Limit Theorem [\textbf{CLT}]
- Let $\{\rv{X_1}, \rv{X_2}, \dots, \rv{X_n}\}$ be a random sample of size $n$
  (\ie a sequence of *independent* and *identically distributed* random
  variables with expected value $\mu$ and variance $\sigma^2$)
- We know that $\E(\rv{S_n}/n) = \mu$ and $\Var(\rv{S_n}) = n\sigma^2$.
- Let's define the *standardized mean* of these random variables as:
  #+BEGIN_EXPORT latex
  $$\displaystyle \rv{S^*_n} = \frac{\rv{S_n}-n\mu}{\sqrt{n\sigma^2}}$$
  #+END_EXPORT
  We have $\E(\rv{S^*_n}) = 0$ and $\Var(\rv{S^*_n}) = 1$.
- For large $n$, the distribution of $\rv{S^*_n}$ is approximately *normal*
  (convergence in law)
  #+BEGIN_EXPORT latex
  \begin{equation*}
    \rv{S^*_n} \xrightarrow[n\to\infty]{\mathcal{L}} \N\left(0,1\right)
  \end{equation*}
  #+END_EXPORT
  Or equivalently, for large $n$ (an approximation rather than a limit, since
  the right-hand side still depends on $n$):
  #+BEGIN_EXPORT latex
  \begin{equation*}
    \frac{\rv{S_n}}{n} \approx \N\left(\mu,\frac{\sigma^2}{n}\right)
  \end{equation*}
  #+END_EXPORT
*** CLT Illustration: the mean smooths distributions
#+begin_src R :results output graphics :file "pdf_babel/CLT_illustration.pdf" :exports none :width 9 :height 6 :session
library(ggplot2)
library(ggthemes)
triangle <- function(n=10) { sqrt(runif(n)) }
broken <- function(n=10) { x=runif(n); x/(1-x); }
broken_mid <- function(n=10) { x=(runif(n)+runif(n))/2; x/(1-x); }
generate <- function(n=50000,N=c(1,2,5,10,15,20,30,100), law=c("unif","binom","triangle")) {
  df=data.frame();
  for(l in law) {
    for(p in N) {
      X=rep.int(0,n);
      for(i in 1:p) {
        X = X + switch(l, unif = runif(n), binom = rbinom(n,1,.5),
                       exp=rexp(n,rate = 2), norm=rnorm(n,mean = .5),
                       triangle=triangle(n)-1/6, broken=broken(n),
                       broken_mid=broken_mid(n));
      }
      X = X/p;
      df=rbind(df,data.frame(N=p,SN=X,law=l));
    }
  }
  df;
}
d=generate()
ggplot(data=d,aes(x=SN)) + geom_density(aes(y = ..density..)) + facet_grid(law~N) +
  theme_classic() + xlab("") + scale_x_continuous(breaks=c(0,.5,1))
#+end_src

#+RESULTS:
[[file:pdf_babel/CLT_illustration.pdf]]

Start with an *arbitrary* distribution and compute the distribution of
$\rv{S_n}/n$ for increasing values of $n$.
#+BEGIN_CENTER
#+LaTeX: \includegraphics<1>[width=.8\linewidth]{pdf_babel/CLT_illustration.pdf}
#+END_CENTER
*** The Normal distribution
#+BEGIN_EXPORT latex
\begin{overlayarea}{\linewidth}{4.5cm}
  \begin{center}%
    \includegraphics<1>[height=4.5cm]{pdf_babel/normal_distribution.pdf}%
    \includegraphics<2>[height=4.5cm]{images/Standard_deviation_diagram.pdf}%
  \end{center}
  \vspace{-5.7cm}
  \begin{flushright}
    \fbox{\textbf{Density}: $f_{\mu,\sigma}(x)= \frac{1}{\sigma\sqrt{2\pi}} e^{ -\frac{(x-\mu)^2}{2\sigma^2} }$}
  \end{flushright}
\end{overlayarea}
\uncover<1->{The smaller the variance the more ``spiky'' the distribution.}
\uncover<2->{
#+END_EXPORT
- Dark blue is less than one standard deviation from the mean: \approx 68% of the set.
- Two standard deviations from the mean (medium and dark blue): \approx 95%
- Three standard deviations (light, medium, and dark blue): \approx 99.7%
  (these percentages follow directly from the normal cdf, see the check below)
#+LaTeX: }
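A one-line check of these percentages from the normal cdf:
#+begin_src R :results output :exports both :session
## P(|X - mu| <= k*sigma) for k = 1, 2, 3 and a normal variable
sapply(1:3, function(k) pnorm(k) - pnorm(-k))
#+end_src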
*** The Normal distribution (property 1)
The family of normal distributions is *closed under linear transformations*:
if \rv{X} is normally distributed with mean $\mu$ and standard deviation
$\sigma$, then the variable \rv{Y} = a\rv{X} + b is also normally distributed,
with mean $a\mu + b$ and standard deviation $|a|\sigma$.
#+begin_src R :results output graphics :file pdf_babel/normal_distribution_linearity.pdf :exports results :width 11 :height 4 :session
par(mfrow=c(1,2))
hist(rnorm(10000,mean=0,sd=1),breaks=30)
hist(3*rnorm(10000,mean=0,sd=1)+10,breaks=30)
par(mfrow=c(1,1))
#+end_src

#+RESULTS:
[[file:pdf_babel/normal_distribution_linearity.pdf]]
*** The Normal distribution (property 2)
*Convolution*: if \rv{X_1} and \rv{X_2} are two independent normal random
variables, with means $\mu_1$, $\mu_2$ and standard deviations $\sigma_1$,
$\sigma_2$, then their sum $\rv{X_1} + \rv{X_2}$ will also be normally
distributed, with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.
#+begin_src R :results output graphics :file pdf_babel/normal_distribution_convolution.pdf :exports results :width 11 :height 4 :session
hist(rnorm(10000,mean=2,sd=3) + rnorm(10000,mean=3,sd=4),breaks=30)
#+end_src

#+RESULTS:
[[file:pdf_babel/normal_distribution_convolution.pdf]]

Intuitively, if $\rv{S^*_n}$ converges to something (say $\L$), it "/has to/"
be a normal distribution (the $\frac{1}{\sqrt{2}}$ is what keeps the variance
equal to 1):
#+BEGIN_EXPORT latex
$$\frac{1}{\sqrt{2}}(\underbrace{\rv{S^*_{1\dots n}}}_{\sim\L} + \underbrace{\rv{S^*_{n+1\dots2n}}}_{\sim\L}) = \underbrace{\rv{S^*_{2n}}}_{\sim\L}$$
#+END_EXPORT
*** Moment generating function of the normal distribution
Let's assume $\rv{X}\sim\N(0,1)$.
#+BEGIN_EXPORT latex
\begin{align*}
  \M & = \int e^{tx}f_\N(x).dx = \int e^{tx}\frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}}dx = \int \frac{e^{\frac{1}{2}(-x^2+2tx)}}{\sqrt{2\pi}}dx \\
     & = \int \frac{e^{\frac{1}{2}(-(x-t)^2+t^2)}}{\sqrt{2\pi}}dx = e^{\frac{t^2}{2}} \int \frac{e^{\frac{-(x-t)^2}{2}}}{\sqrt{2\pi}}dx = e^{\frac{t^2}{2}} \int \frac{e^{\frac{-x^2}{2}}}{\sqrt{2\pi}}dx \\
     & = e^{\frac{t^2}{2}}
\end{align*}
#+END_EXPORT
Actually, if we assume $\rv{X}\sim\N(\mu,\sigma^2)$, one can easily prove in
the same way that:
#+BEGIN_EXPORT latex
\begin{align*}
  \M = e^{\mu t + \frac{1}{2}\sigma^2t^2}
\end{align*}
#+END_EXPORT
*** Proof of the CLT
#+BEGIN_EXPORT latex
$\boxed{
  \begin{array}{l}
    \M = \E(e^{t\rv{X}}) = 1+\E(\rv{X})\,t+ \E(\rv{X}^2)\frac{t^2}{2}+o(t^2)\\
    \leadsto \log(\M[\rv{X-\mu}]) = \sigma^2\frac{t^2}{2}+o(t^2)
  \end{array}
}$ \hfill
$\boxed{\begin{cases}\rv{S_n}=\rv{X_1}+\dots+\rv{X_n}\\ \rv{S^*_n}=\frac{\rv{S_n}-n\mu}{\sigma\sqrt{n}}\end{cases}}$
#+END_EXPORT
We have:
#+BEGIN_EXPORT latex
\begin{align*}
  \M[\rv{S^*_n}] & = \E(e^{t\rv{S^*_n}}) = \E(e^{t{\frac{\rv{S_n}-n\mu}{\sigma\sqrt{n}}}}) = \E(e^{\frac{t}{\sigma\sqrt{n}}(\rv{S_n}-n\mu)}) = \MM[\rv{S_n-n\mu}]\left(\frac{t}{\sigma\sqrt{n}}\right)\\
  & = \Bigg(\MM[\rv{X-\mu}]\Big(\underbrace{\frac{t}{\sigma\sqrt{n}}}_{\xrightarrow[n\to\infty]{} 0}\Big)\Bigg)^n \text{\qquad (since $\M[\rv{X}+\rv{Y}]=\M[\rv{X}]\M[\rv{Y}]$ for independent variables)}\\
  & = \exp\left(\!n\log\left(\!\MM[\rv{X-\mu}]\left(\!\frac{t}{\sigma\sqrt{n}}\!\right)\!\right)\!\right) = \exp\left(\!n\left(\!\sigma^2\frac{t^2}{2n\sigma^2}+o\left(\!\frac{t^2}{n}\!\right)\!\right)\!\right)\\
  & = \exp\left(\frac{t^2}{2} + n\,o(t^2/n)\right) \xrightarrow[n\to\infty]{} e^{t^2/2} \text{, which is the mgf of $\N(0,1)$}\qed
\end{align*}
#+END_EXPORT
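A quick Monte-Carlo sanity check that $e^{t^2/2}$ is indeed the mgf of
$\N(0,1)$ (the sample size is an illustrative choice):
#+begin_src R :results output :exports both :session
## E(exp(t*X)) for X ~ N(0,1) should be close to exp(t^2/2)
t = c(-1, .5, 1, 2)
X = rnorm(1e6)
rbind(empirical = sapply(t, function(s) mean(exp(s*X))), exact = exp(t^2/2))
#+end_src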
*** CLT = convergence of laws
The law of \rv{S^*_n} converges to $\N(0,1)$. In other words, whatever the
initial law of $\rv{X}$:
#+LaTeX: $$\lim_{n\to\infty} \P[a<\rv{S^*_n}<b] = \frac{1}{\sqrt{2\pi}}\int_a^b e^{-\frac{x^2}{2}} \dif x$$
*** Confidence interval for the mean
#+begin_src R :results output graphics :file pdf_babel/CI_illustration.pdf :exports none :session
library(ggplot2)
## The original setup of this block was lost; the values below are
## illustrative assumptions: n sample means, each computed from N samples.
n = 100                                      # number of repeated experiments (assumption)
N = 1000                                     # number of samples per experiment (assumption)
mu = 1776                                    # true mean (assumption)
sigma = 10                                   # true standard deviation (assumption)
X = replicate(n, mean(rnorm(N, mean = mu, sd = sigma)))  # observed sample means
ci = 2 * sigma / sqrt(N)                     # ~95% confidence half-width
length(X[X >= 1775.5 & X <= 1776.6])/length(X)
df = data.frame(x = X, y = seq(1:length(X)))
df$valid = 1
df[abs(df$x - mu) > ci, ]$valid = 0
ggplot(df, aes(x = x, y = y, color = factor(valid))) + geom_point() +
  geom_errorbarh(aes(xmax = x - ci, xmin = x + ci)) + geom_vline(xintercept = mu) +
  theme_classic() + guides(colour = guide_legend("")) + xlim(mu-3*ci,mu+3*ci) +
  ylab("Trial #") + xlab("Observation: sample mean with \nconfidence interval") +
  coord_flip() + ggtitle(paste(n," observations of the mean of ",N," samples"))
#+end_src

#+RESULTS:
[[file:pdf_babel/CI_illustration.pdf]]

#+BEGIN_EXPORT latex
\begin{overlayarea}{\linewidth}{4.5cm}
  \begin{center}%
    \includegraphics<1>[height=4.5cm]{images/Standard_deviation_diagram.pdf}%
    \includegraphics<2>[height=4.5cm]{pdf_babel/CI_illustration.pdf}%
  \end{center}
\end{overlayarea}
#+END_EXPORT
When $n$ is large:
#+BEGIN_EXPORT latex
\begin{center}
  \scalebox{.9}{$\displaystyle \P\left(\mu\in \left[\frac{\rv{S_n}}{n}-2\frac{\sigma}{\sqrt{n}},\frac{\rv{S_n}}{n}+2\frac{\sigma}{\sqrt{n}}\right]\right) = \P\left(\frac{\rv{S_n}}{n}\in \left[\mu-2\frac{\sigma}{\sqrt{n}},\mu+2\frac{\sigma}{\sqrt{n}}\right]\right) \approx 95\%$}
\end{center}
\uncover<2>{There is a 95\% chance that the \alert{true mean} lies within $2\frac{\sigma}{\sqrt{n}}$ of the \alert{sample mean}.}
#+END_EXPORT
*** Without any particular hypothesis
- Assume you have evaluated two *alternatives* $A$ and $B$ on $n$ different *setups*
- You therefore consider the associated random variables \rv{A} and \rv{B} and
  try to *estimate* their expected values $\mu_A$ and $\mu_B$
#+BEGIN_EXPORT latex
\begin{center}
  \begin{overlayarea}{.9\linewidth}{4.5cm}
    \begin{center}%
      \includegraphics<1>[scale=.911,subfig=1]{fig/2sample_comp_1.fig}%
      \includegraphics<2>[scale=.911,subfig=2]{fig/2sample_comp_2.fig}%
      \includegraphics<3->[scale=.911,subfig=3]{fig/2sample_comp_3.fig}%
    \end{center}
  \end{overlayarea}
\end{center}
\vspace{-.8em}
\begin{overlayarea}{\linewidth}{1.5cm}%
\only<1>{The two 95\% confidence intervals do not overlap\vspace{-.8em}
  \begin{flushright}
    $\leadsto \mu_A<\mu_B$ with more than 90\% confidence \smiley
  \end{flushright}
}%
\only<2>{The two 95\% confidence intervals do overlap\vspace{-.8em}
  \begin{flushright}
    $\leadsto$ Nothing can be concluded \frowny\\
    Reduce the C.I.?
  \end{flushright}
}%
\only<3>{The two 70\% confidence intervals do not overlap\vspace{-.8em}
  \begin{flushright}
    $\leadsto\mu_A<\mu_B$ with less than 50\% confidence \frowny
    $\leadsto$ more experiments...
  \end{flushright}
}%
\only<4->{The width of the confidence interval is proportional to $\frac{\sigma}{\sqrt{n}}$\vspace{-.8em}
  \begin{flushright}
    You can estimate how many more experiments you need\smiley\\
    4 times more to halve it! \frowny Try to \alert{reduce variance} if you can...\smiley
  \end{flushright}
}
\end{overlayarea}
#+END_EXPORT
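As a quick numerical illustration of that last point (the values of $\sigma$
and $n$ are illustrative): since the confidence-interval width scales as
$\sigma/\sqrt{n}$, quadrupling $n$ halves it.
#+begin_src R :results output :exports both :session
## ~95% CI width (4*sigma/sqrt(n)) for sigma = 1: quadrupling n halves the width
sigma = 1
n = c(100, 400, 1600)
data.frame(n = n, ci.width = 4 * sigma / sqrt(n))
#+end_src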