{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Solutions for Chapter 2 Probability\n", "+ date: 2017-03-08\n", "+ tags: mlapp, statistics\n", "+ author: Peijun Zhu\n", "+ slug: mlapp-ch2-sol" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "$\\def\\bra#1{\\mathinner{\\langle{#1}|}}\n", "\\def\\ket#1{\\mathinner{|{#1}\\rangle}}\n", "\\def\\braket#1{\\mathinner{\\langle{#1}\\rangle}}\n", "\\newcommand{\\mdef}{\\overset{\\mathrm{def}}{=}}\n", "\\newcommand{\\bm}{\\mathbf}\n", "\\DeclareMathOperator{\\Var}{Var} \n", "\\DeclareMathOperator{\\det}{det} \n", "\\DeclareMathOperator{\\tr}{tr} \n", "\\DeclareMathOperator{\\E}{E} \n", "\\DeclareMathOperator{\\Cov}{Cov} \n", "\\DeclareMathOperator{\\Beta}{B} \n", "\\DeclareMathOperator{\\Bdist}{Beta}\n", "\\DeclareMathOperator{\\sgn}{sgn}\n", "\\DeclareMathOperator{\\adj}{adj}\n", "\\DeclareMathOperator{\\ii}{i} \n", "\\DeclareMathOperator{\\dd}{d} \n", "\\newcommand{\\pp}{\\partial} \n", "\\DeclareMathOperator{\\rhs}{RHS}\n", "\\DeclareMathOperator{\\lhs}{LHS}\n", "\\DeclareMathOperator{\\Beta}{B}$\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.1\n", "1. $P=1/3$\n", "1. $P=1/2$, the difference is that you happen to see a child." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2 Obvious" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.3\n", "\\begin{align}\n", "\\Var[x+y]&=\\E[(x+y)^2]-\\E[x+y]^2\\\\\n", "&=\\E[x^2]+\\E[y^2]+2\\E[xy]-\\E[x]^2-\\E[y]^2-2\\E[x]\\E[y]\\\\\n", "&=\\Var[x]+\\Var[y]+2\\Cov[x,y]\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.4\n", "$I, N$ means infected and not infected, $P$ means positive\n", "$$\\Pr(I|P)=\\frac{\\Pr(IP)}{\\Pr(P)}=\\frac{\\Pr(IP)}{\\Pr(IP)+\\Pr(NP)}\\approx\\frac{10^{-4}}{10^{-4}+0.01}\\approx 0.01$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.5\n", "The door we chose has probability $1/3$ for the prize. After the host has revealed another invalid door, the chance of the last door is $1-1/3=2/3$. Note the host is **not randomly** removing a door, so the chance of the two unchosen doors will fall to the last door." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.6\n", "1. $$P(H|e_1,e_2)=\\frac{P(H,e_1,e_2)}{P(e_1,e_2}=\\frac{P(e_1,e_2|H)P(H)}{P(e_1,e_2)}$$\n", " + Only ii is sufficient\n", "2. Given $P(e_1,e_2|H)=P(e_1|H)P(e_2|H)$, we have $$P(H|e_1,e_2)=\\frac{P(e_1,H)P(e_2,H)}{P(H)P(e_1,e_2)}$$\n", " + Use $P(e_1,e_2)=\\sum_h P(e_1,e_2|H=h)P(H=h)$, i is equivalent to iii\n", " + Both i, iii are sufficient now" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.7\n", "Pairwise independence means\n", "$$P(X_iX_j)=P(X_i)P(X_j), \\forall i\\neq j\\in \\{1,\\ldots ,n\\}$$\n", "\n", "Mutually independent means\n", "$$P(\\prod_{X\\in S}X)=\\prod_{X\\in S}, \\forall S\\subseteq \\{1,\\ldots ,n\\}$$\n", "\n", "Thus for $n\\geq 3$, the two group of identities are not necessarily equivalent.\n", "### Counter example\n", "$$X, Y\\sim \\operatorname{Ber}(0.5), \\quad Z=X\\oplus Y\\sim \\operatorname{Ber}(0.5)$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.8\n", "We want to prove $$P(x,y|z)=P(x|z)P(y|z)\\Leftrightarrow P(x,y|z)=g(x,z)h(y,z)$$\n", "\n", "The LHS to RHS is obvious. We are now going from RHS to LHS.\n", "First of all, $p(x,y|z)$ must satisfy unitary condition\n", "$$\\iint p(x,y|z)dxdy=\\left(\\int g(x,z)dx\\right)\\left(\\int h(y,z)dy\\right)=1$$\n", "and $$P(x|z)= g(x,z)\\int h(y,z)dy,\\qquad P(y|z)= h(y,z)\\int h(x,z)dx$$\n", "And the deduction to LHS is trivial now." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.9\n", "$$X\\perp W|Z,Y\\Rightarrow P(XWZY)=P(XZY)P(WZY)/P(ZY)$$\n", "$$X\\perp Y|Z\\Rightarrow P(XYZ)=P(XZ)P(YZ)/P(Z)$$\n", "$$X\\perp Y|W\\Rightarrow P(XYW)=P(XW)P(YW)/P(W)$$\n", "### a. True\n", "We need to prove $$P(XYWZ)=P(XZ)P(YWZ)/P(Z),$$\n", "which is the result of plugging 2nd equation into 1st one.\n", "### b. False\n", "We need to prove $$P(XYZW)=P(XZW)P(YZW)/P(ZW),$$\n", "which is not reachable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.10\n", "The pdf of $y=1/x$ is\n", "\n", "\\begin{align}\n", "g(y)&=f(x(y))\\left|\\frac{dx}{dy}\\right|=f(1/y)/y^2\\\\\n", "&=\\frac{b^a}{\\Gamma(a)}(1/y)^{a-1}e^{-b/y}/y^2\\\\\n", "&=\\frac{b^a}{\\Gamma(a)}y^{-(a+1)}e^{-b/y}\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.11\n", "\\begin{align}\n", "Z^2&=\\int_0^{2\\pi}\\int_0^\\infty r\\exp\\left(-\\frac{r^2}{2\\sigma^2}\\right)drd\\theta\\\\\n", "&=2\\pi\\int_0^\\infty\\exp\\left(-\\frac{t}{\\sigma^2}\\right)dt,\\quad t=r^2/2\\\\\n", "&=2\\pi\\sigma^2\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.12\n", "Notice $-\\sum_x\\sum_y p(x,y)\\log p(x)=-\\sum_x p(x)\\log p(x)=H(X)$\n", "\n", "\\begin{align}\n", "I(X,Y)&=\\sum_x\\sum_y p(x,y)\\log\\frac{p(x,y)}{p(x)p(y)}\\\\\n", "&=H(X)+H(Y)-H(X,Y)\n", "\\end{align}\n", "\n", "\\begin{align}\n", "H(Y|X)&=-\\sum_y \\log p(x,y)\\log\\frac{p(x,y)}{p(x)}\\\\\n", "&=H(X,Y)-H(X)\n", "\\end{align}\n", "\n", "So $I(X,Y)=H(X)-H(X|Y)=H(Y)-H(Y|X)$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.13\n", "Suppose we have $$X_{\\pm}=\\frac{X_1\\pm X_2}{\\sqrt{2}}$$\n", "then $$\\begin{bmatrix}\n", "X_+\\\\\n", "X_-\n", "\\end{bmatrix}\\sim N\\left(\\mathbf{0},\\begin{bmatrix}\\sigma_+^2&\\\\&\\sigma_-^2\\end{bmatrix}\\right)=N(0,\\sigma_+^2)N(0,\\sigma_-^2)$$\n", "\n", "And $$\\sigma_+^2\\frac{(X_1+X_2)^2}{2}+\\sigma_-^2\\frac{(X_1-X_2)^2}{2}=\\sigma^2 (X_1^2+X_2^2)-2\\rho X_1X_2$$\n", "\n", "Thus $\\sigma_\\pm=(1\\pm\\rho)\\sigma^2$. And $X_+$ is independent on $X_-$. From $X_\\pm$, we can get the distribution of $X_1$ and $X_2$.\n", "\n", "If $\\rho=1$, then $X_1=X_2=X/\\sqrt 2$, the mutual information $I(X_1,X_2)=H(X_1)$\n", "\n", "If $\\rho=-1$, similiar.\n", "\n", "If $\\rho=0$, they are independent. So $I(X_1,X_2)=0$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.14\n", "1. $r=1-H(Y|X)/H(X)=(H(X)-H(Y|X))/H(X)=I(Y, X)/H(x)$\n", "1. It is obvious that $H(X)>0$.\n", " 1. $r\\geq 0$ because we have $I(X,Y)\\geq 0$ using Jensen inequality: $$I(X,Y)=-\\sum_{i,j} \\Pr(i,j)\\log \\frac{p_ip_j}{\\Pr(i,j)}\\geq - \\log \\sum_{i,j}p_ip_j=0$$\n", " 1. We need to prove $H(Y|X)\\geq 0$. And $H(Y|X)=H(X,Y)-H(Y)\\geq 0$. \n", " \n", "\\begin{align}\n", "H(X,Y)&=-\\sum_i\\sum_j \\Pr(i,j)\\log \\Pr(i,j)\\\\\n", "&\\geq -\\sum_i p_i\\log \\frac{\\sum_j p^2(i,j)}{p_i}\\\\\n", "&\\geq -\\sum_i p_i\\log \\frac{p_i^2}{p_i}=H(Y)\n", "\\end{align}\n", "\n", " As a result, $r\\leq 1$\n", "1. $r=0$ holds iff $I(X,Y)=0$. In this case, $\\dfrac{p_ip_j}{\\Pr(i,j)}$ is a constant, so $X$ is independent on $Y$\n", "1. $r=0$ holds iff $I(X,Y)=H(X)$. i.e. $X=Y$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.15\n", "MLE(Maximum Likehood Estimation) is to maximize the likehood\n", "$$\\prod_i q(x=i|\\theta)^{n_i}=\\exp(n_i\\log q_i)\\propto\\exp(\\sum_ip_i\\log q_i)$$\n", "\n", "It is equivalent to minimize $-\\sum_ip_i\\log q_i\\equiv -\\sum_ip_i\\log(q_i/p_i)$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.16\n", "$f=x^{a-1}(1-x)^{b-1}/\\Beta(a,b)\\propto x^{a-1}(1-x)^{b-1}$\n", "$$E(x)=\\int xf dx=\\Beta(a+1,b)/\\Beta(a,b)=\\frac{a}{a+b}$$\n", "We can prove the last step as follows:\n", "\n", "\\begin{align}\n", "\\Beta(p, q+1)&=\\int_0^1 x^{p-1}(1-x)^{q}dx\\\\\n", "&=\\int_0^1 x^{p-1}(1-x)^{q-1}dx-\\int_0^1 x^{p}(1-x)^{q-1}dx\\\\\n", "&=\\int_0^1 x^{p-1}(1-x)^{q-1}dx-\\int_0^1 x^{p}(1-x)^{q-1}dx\\\\\n", "&=\\int_0^1 x^{p-1}(1-x)^{q-1}dx-\\frac{p}{q}\\int_0^1 x^{p-1}(1-x)^{q}dx\\\\\n", "&=\\Beta(p,q)-\\frac{p}{q}\\Beta(p, q+1)\n", "\\end{align}\n", "\n", "As a result, $\\Beta(p, q+1)=\\dfrac{q}{p+q}\\Beta(p,q)$. We can also use the relationship betweeen Beta and Gamma function, and $\\Gamma(p+1)=p\\Gamma(p)$\n", "$$\\Beta(p,q+1)=\\frac{\\Gamma(p)\\Gamma(q)}{\\Gamma(p+q+1)}=\\frac{q\\Gamma(p)\\Gamma(q)}{(p+q)\\Gamma(p+q)}=\\frac{q}{p+q}\\Beta(p,q)$$\n", "\n", "$$E(x^2)=\\int x^2f dx=\\Beta(a+2,b)/\\Beta(a,b)=\\frac{a}{a+b}\\frac{a+1}{a+b+1}$$\n", "\n", "$$\\frac{f'}{f}=\\frac{a-1}{x}+\\frac{b-1}{1-x}=0\\Rightarrow x_m=\\frac{a-1}{a-b}$$\n", "\n", "$$Var(x)=E(x^2)-E(x)^2=\\frac{ab}{(a+b)^2(a+b+1)}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.17\n", "The cdf(cumulative distribution function) of each variable is\n", "\n", "$$F_1(x)=\\Pr(X