{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Solutions for Chapter 2 Probability\n",
"+ date: 2017-03-08\n",
"+ tags: mlapp, statistics\n",
"+ author: Peijun Zhu\n",
"+ slug: mlapp-ch2-sol"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"$\\def\\bra#1{\\mathinner{\\langle{#1}|}}\n",
"\\def\\ket#1{\\mathinner{|{#1}\\rangle}}\n",
"\\def\\braket#1{\\mathinner{\\langle{#1}\\rangle}}\n",
"\\newcommand{\\mdef}{\\overset{\\mathrm{def}}{=}}\n",
"\\newcommand{\\bm}{\\mathbf}\n",
"\\DeclareMathOperator{\\Var}{Var} \n",
"\\DeclareMathOperator{\\det}{det} \n",
"\\DeclareMathOperator{\\tr}{tr} \n",
"\\DeclareMathOperator{\\E}{E} \n",
"\\DeclareMathOperator{\\Cov}{Cov} \n",
"\\DeclareMathOperator{\\Beta}{B} \n",
"\\DeclareMathOperator{\\Bdist}{Beta}\n",
"\\DeclareMathOperator{\\sgn}{sgn}\n",
"\\DeclareMathOperator{\\adj}{adj}\n",
"\\DeclareMathOperator{\\ii}{i} \n",
"\\DeclareMathOperator{\\dd}{d} \n",
"\\newcommand{\\pp}{\\partial} \n",
"\\DeclareMathOperator{\\rhs}{RHS}\n",
"\\DeclareMathOperator{\\lhs}{LHS}\n",
"\\DeclareMathOperator{\\Beta}{B}$\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.1\n",
"1. $P=1/3$\n",
"1. $P=1/2$, the difference is that you happen to see a child."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2 Obvious"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.3\n",
"\\begin{align}\n",
"\\Var[x+y]&=\\E[(x+y)^2]-\\E[x+y]^2\\\\\n",
"&=\\E[x^2]+\\E[y^2]+2\\E[xy]-\\E[x]^2-\\E[y]^2-2\\E[x]\\E[y]\\\\\n",
"&=\\Var[x]+\\Var[y]+2\\Cov[x,y]\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.4\n",
"$I, N$ means infected and not infected, $P$ means positive\n",
"$$\\Pr(I|P)=\\frac{\\Pr(IP)}{\\Pr(P)}=\\frac{\\Pr(IP)}{\\Pr(IP)+\\Pr(NP)}\\approx\\frac{10^{-4}}{10^{-4}+0.01}\\approx 0.01$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.5\n",
"The door we chose has probability $1/3$ for the prize. After the host has revealed another invalid door, the chance of the last door is $1-1/3=2/3$. Note the host is **not randomly** removing a door, so the chance of the two unchosen doors will fall to the last door."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.6\n",
"1. $$P(H|e_1,e_2)=\\frac{P(H,e_1,e_2)}{P(e_1,e_2}=\\frac{P(e_1,e_2|H)P(H)}{P(e_1,e_2)}$$\n",
" + Only ii is sufficient\n",
"2. Given $P(e_1,e_2|H)=P(e_1|H)P(e_2|H)$, we have $$P(H|e_1,e_2)=\\frac{P(e_1,H)P(e_2,H)}{P(H)P(e_1,e_2)}$$\n",
" + Use $P(e_1,e_2)=\\sum_h P(e_1,e_2|H=h)P(H=h)$, i is equivalent to iii\n",
" + Both i, iii are sufficient now"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.7\n",
"Pairwise independence means\n",
"$$P(X_iX_j)=P(X_i)P(X_j), \\forall i\\neq j\\in \\{1,\\ldots ,n\\}$$\n",
"\n",
"Mutually independent means\n",
"$$P(\\prod_{X\\in S}X)=\\prod_{X\\in S}, \\forall S\\subseteq \\{1,\\ldots ,n\\}$$\n",
"\n",
"Thus for $n\\geq 3$, the two group of identities are not necessarily equivalent.\n",
"### Counter example\n",
"$$X, Y\\sim \\operatorname{Ber}(0.5), \\quad Z=X\\oplus Y\\sim \\operatorname{Ber}(0.5)$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.8\n",
"We want to prove $$P(x,y|z)=P(x|z)P(y|z)\\Leftrightarrow P(x,y|z)=g(x,z)h(y,z)$$\n",
"\n",
"The LHS to RHS is obvious. We are now going from RHS to LHS.\n",
"First of all, $p(x,y|z)$ must satisfy unitary condition\n",
"$$\\iint p(x,y|z)dxdy=\\left(\\int g(x,z)dx\\right)\\left(\\int h(y,z)dy\\right)=1$$\n",
"and $$P(x|z)= g(x,z)\\int h(y,z)dy,\\qquad P(y|z)= h(y,z)\\int h(x,z)dx$$\n",
"And the deduction to LHS is trivial now."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.9\n",
"$$X\\perp W|Z,Y\\Rightarrow P(XWZY)=P(XZY)P(WZY)/P(ZY)$$\n",
"$$X\\perp Y|Z\\Rightarrow P(XYZ)=P(XZ)P(YZ)/P(Z)$$\n",
"$$X\\perp Y|W\\Rightarrow P(XYW)=P(XW)P(YW)/P(W)$$\n",
"### a. True\n",
"We need to prove $$P(XYWZ)=P(XZ)P(YWZ)/P(Z),$$\n",
"which is the result of plugging 2nd equation into 1st one.\n",
"### b. False\n",
"We need to prove $$P(XYZW)=P(XZW)P(YZW)/P(ZW),$$\n",
"which is not reachable."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.10\n",
"The pdf of $y=1/x$ is\n",
"\n",
"\\begin{align}\n",
"g(y)&=f(x(y))\\left|\\frac{dx}{dy}\\right|=f(1/y)/y^2\\\\\n",
"&=\\frac{b^a}{\\Gamma(a)}(1/y)^{a-1}e^{-b/y}/y^2\\\\\n",
"&=\\frac{b^a}{\\Gamma(a)}y^{-(a+1)}e^{-b/y}\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.11\n",
"\\begin{align}\n",
"Z^2&=\\int_0^{2\\pi}\\int_0^\\infty r\\exp\\left(-\\frac{r^2}{2\\sigma^2}\\right)drd\\theta\\\\\n",
"&=2\\pi\\int_0^\\infty\\exp\\left(-\\frac{t}{\\sigma^2}\\right)dt,\\quad t=r^2/2\\\\\n",
"&=2\\pi\\sigma^2\n",
"\\end{align}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.12\n",
"Notice $-\\sum_x\\sum_y p(x,y)\\log p(x)=-\\sum_x p(x)\\log p(x)=H(X)$\n",
"\n",
"\\begin{align}\n",
"I(X,Y)&=\\sum_x\\sum_y p(x,y)\\log\\frac{p(x,y)}{p(x)p(y)}\\\\\n",
"&=H(X)+H(Y)-H(X,Y)\n",
"\\end{align}\n",
"\n",
"\\begin{align}\n",
"H(Y|X)&=-\\sum_y \\log p(x,y)\\log\\frac{p(x,y)}{p(x)}\\\\\n",
"&=H(X,Y)-H(X)\n",
"\\end{align}\n",
"\n",
"So $I(X,Y)=H(X)-H(X|Y)=H(Y)-H(Y|X)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.13\n",
"Suppose we have $$X_{\\pm}=\\frac{X_1\\pm X_2}{\\sqrt{2}}$$\n",
"then $$\\begin{bmatrix}\n",
"X_+\\\\\n",
"X_-\n",
"\\end{bmatrix}\\sim N\\left(\\mathbf{0},\\begin{bmatrix}\\sigma_+^2&\\\\&\\sigma_-^2\\end{bmatrix}\\right)=N(0,\\sigma_+^2)N(0,\\sigma_-^2)$$\n",
"\n",
"And $$\\sigma_+^2\\frac{(X_1+X_2)^2}{2}+\\sigma_-^2\\frac{(X_1-X_2)^2}{2}=\\sigma^2 (X_1^2+X_2^2)-2\\rho X_1X_2$$\n",
"\n",
"Thus $\\sigma_\\pm=(1\\pm\\rho)\\sigma^2$. And $X_+$ is independent on $X_-$. From $X_\\pm$, we can get the distribution of $X_1$ and $X_2$.\n",
"\n",
"If $\\rho=1$, then $X_1=X_2=X/\\sqrt 2$, the mutual information $I(X_1,X_2)=H(X_1)$\n",
"\n",
"If $\\rho=-1$, similiar.\n",
"\n",
"If $\\rho=0$, they are independent. So $I(X_1,X_2)=0$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.14\n",
"1. $r=1-H(Y|X)/H(X)=(H(X)-H(Y|X))/H(X)=I(Y, X)/H(x)$\n",
"1. It is obvious that $H(X)>0$.\n",
" 1. $r\\geq 0$ because we have $I(X,Y)\\geq 0$ using Jensen inequality: $$I(X,Y)=-\\sum_{i,j} \\Pr(i,j)\\log \\frac{p_ip_j}{\\Pr(i,j)}\\geq - \\log \\sum_{i,j}p_ip_j=0$$\n",
" 1. We need to prove $H(Y|X)\\geq 0$. And $H(Y|X)=H(X,Y)-H(Y)\\geq 0$. \n",
" \n",
"\\begin{align}\n",
"H(X,Y)&=-\\sum_i\\sum_j \\Pr(i,j)\\log \\Pr(i,j)\\\\\n",
"&\\geq -\\sum_i p_i\\log \\frac{\\sum_j p^2(i,j)}{p_i}\\\\\n",
"&\\geq -\\sum_i p_i\\log \\frac{p_i^2}{p_i}=H(Y)\n",
"\\end{align}\n",
"\n",
" As a result, $r\\leq 1$\n",
"1. $r=0$ holds iff $I(X,Y)=0$. In this case, $\\dfrac{p_ip_j}{\\Pr(i,j)}$ is a constant, so $X$ is independent on $Y$\n",
"1. $r=0$ holds iff $I(X,Y)=H(X)$. i.e. $X=Y$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.15\n",
"MLE(Maximum Likehood Estimation) is to maximize the likehood\n",
"$$\\prod_i q(x=i|\\theta)^{n_i}=\\exp(n_i\\log q_i)\\propto\\exp(\\sum_ip_i\\log q_i)$$\n",
"\n",
"It is equivalent to minimize $-\\sum_ip_i\\log q_i\\equiv -\\sum_ip_i\\log(q_i/p_i)$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.16\n",
"$f=x^{a-1}(1-x)^{b-1}/\\Beta(a,b)\\propto x^{a-1}(1-x)^{b-1}$\n",
"$$E(x)=\\int xf dx=\\Beta(a+1,b)/\\Beta(a,b)=\\frac{a}{a+b}$$\n",
"We can prove the last step as follows:\n",
"\n",
"\\begin{align}\n",
"\\Beta(p, q+1)&=\\int_0^1 x^{p-1}(1-x)^{q}dx\\\\\n",
"&=\\int_0^1 x^{p-1}(1-x)^{q-1}dx-\\int_0^1 x^{p}(1-x)^{q-1}dx\\\\\n",
"&=\\int_0^1 x^{p-1}(1-x)^{q-1}dx-\\int_0^1 x^{p}(1-x)^{q-1}dx\\\\\n",
"&=\\int_0^1 x^{p-1}(1-x)^{q-1}dx-\\frac{p}{q}\\int_0^1 x^{p-1}(1-x)^{q}dx\\\\\n",
"&=\\Beta(p,q)-\\frac{p}{q}\\Beta(p, q+1)\n",
"\\end{align}\n",
"\n",
"As a result, $\\Beta(p, q+1)=\\dfrac{q}{p+q}\\Beta(p,q)$. We can also use the relationship betweeen Beta and Gamma function, and $\\Gamma(p+1)=p\\Gamma(p)$\n",
"$$\\Beta(p,q+1)=\\frac{\\Gamma(p)\\Gamma(q)}{\\Gamma(p+q+1)}=\\frac{q\\Gamma(p)\\Gamma(q)}{(p+q)\\Gamma(p+q)}=\\frac{q}{p+q}\\Beta(p,q)$$\n",
"\n",
"$$E(x^2)=\\int x^2f dx=\\Beta(a+2,b)/\\Beta(a,b)=\\frac{a}{a+b}\\frac{a+1}{a+b+1}$$\n",
"\n",
"$$\\frac{f'}{f}=\\frac{a-1}{x}+\\frac{b-1}{1-x}=0\\Rightarrow x_m=\\frac{a-1}{a-b}$$\n",
"\n",
"$$Var(x)=E(x^2)-E(x)^2=\\frac{ab}{(a+b)^2(a+b+1)}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.17\n",
"The cdf(cumulative distribution function) of each variable is\n",
"\n",
"$$F_1(x)=\\Pr(X