{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Solutions for Chapter 2 Probability\n",
    "+ date: 2017-03-08\n",
    "+ tags: mlapp, statistics\n",
    "+ author: Peijun Zhu\n",
    "+ slug: mlapp-ch2-sol"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<span class=\"mathmacros\" style=\"display:none;\">\n",
    "$\\def\\bra#1{\\mathinner{\\langle{#1}|}}\n",
    "\\def\\ket#1{\\mathinner{|{#1}\\rangle}}\n",
    "\\def\\braket#1{\\mathinner{\\langle{#1}\\rangle}}\n",
    "\\newcommand{\\mdef}{\\overset{\\mathrm{def}}{=}}\n",
    "\\newcommand{\\bm}{\\mathbf}\n",
    "\\DeclareMathOperator{\\Var}{Var} \n",
    "\\DeclareMathOperator{\\det}{det} \n",
    "\\DeclareMathOperator{\\tr}{tr}   \n",
    "\\DeclareMathOperator{\\E}{E}     \n",
    "\\DeclareMathOperator{\\Cov}{Cov} \n",
    "\\DeclareMathOperator{\\Beta}{B}  \n",
    "\\DeclareMathOperator{\\Bdist}{Beta}\n",
    "\\DeclareMathOperator{\\sgn}{sgn}\n",
    "\\DeclareMathOperator{\\adj}{adj}\n",
    "\\DeclareMathOperator{\\ii}{i}   \n",
    "\\DeclareMathOperator{\\dd}{d}   \n",
    "\\newcommand{\\pp}{\\partial}   \n",
    "\\DeclareMathOperator{\\rhs}{RHS}\n",
    "\\DeclareMathOperator{\\lhs}{LHS}\n",
    "\\DeclareMathOperator{\\Beta}{B}$\n",
    "</span>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1\n",
    "1. $P=1/3$\n",
    "1. $P=1/2$, the difference is that you happen to see a child."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 Obvious"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.3\n",
    "\\begin{align}\n",
    "\\Var[x+y]&=\\E[(x+y)^2]-\\E[x+y]^2\\\\\n",
    "&=\\E[x^2]+\\E[y^2]+2\\E[xy]-\\E[x]^2-\\E[y]^2-2\\E[x]\\E[y]\\\\\n",
    "&=\\Var[x]+\\Var[y]+2\\Cov[x,y]\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.4\n",
    "$I, N$ means infected and not infected, $P$ means positive\n",
    "$$\\Pr(I|P)=\\frac{\\Pr(IP)}{\\Pr(P)}=\\frac{\\Pr(IP)}{\\Pr(IP)+\\Pr(NP)}\\approx\\frac{10^{-4}}{10^{-4}+0.01}\\approx 0.01$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5\n",
    "The door we chose has probability $1/3$ for the prize. After the host has revealed another invalid door, the chance of the last door is $1-1/3=2/3$. Note the host is **not randomly** removing a door, so the chance of the two unchosen doors will fall to the last door."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.6\n",
    "1. $$P(H|e_1,e_2)=\\frac{P(H,e_1,e_2)}{P(e_1,e_2}=\\frac{P(e_1,e_2|H)P(H)}{P(e_1,e_2)}$$\n",
    " + Only ii is sufficient\n",
    "2. Given $P(e_1,e_2|H)=P(e_1|H)P(e_2|H)$, we have $$P(H|e_1,e_2)=\\frac{P(e_1,H)P(e_2,H)}{P(H)P(e_1,e_2)}$$\n",
    " + Use $P(e_1,e_2)=\\sum_h P(e_1,e_2|H=h)P(H=h)$, i is equivalent to iii\n",
    " + Both i, iii are sufficient now"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.7\n",
    "Pairwise independence means\n",
    "$$P(X_iX_j)=P(X_i)P(X_j), \\forall i\\neq j\\in \\{1,\\ldots ,n\\}$$\n",
    "\n",
    "Mutually independent means\n",
    "$$P(\\prod_{X\\in S}X)=\\prod_{X\\in S}, \\forall S\\subseteq \\{1,\\ldots ,n\\}$$\n",
    "\n",
    "Thus for $n\\geq 3$, the two group of identities are not necessarily equivalent.\n",
    "### Counter example\n",
    "$$X, Y\\sim \\operatorname{Ber}(0.5), \\quad Z=X\\oplus Y\\sim \\operatorname{Ber}(0.5)$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.8\n",
    "We want to prove $$P(x,y|z)=P(x|z)P(y|z)\\Leftrightarrow P(x,y|z)=g(x,z)h(y,z)$$\n",
    "\n",
    "The LHS to RHS is obvious. We are now going from RHS to LHS.\n",
    "First of all, $p(x,y|z)$ must satisfy unitary condition\n",
    "$$\\iint p(x,y|z)dxdy=\\left(\\int g(x,z)dx\\right)\\left(\\int h(y,z)dy\\right)=1$$\n",
    "and $$P(x|z)= g(x,z)\\int h(y,z)dy,\\qquad P(y|z)= h(y,z)\\int h(x,z)dx$$\n",
    "And the deduction to LHS is trivial now."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.9\n",
    "$$X\\perp W|Z,Y\\Rightarrow P(XWZY)=P(XZY)P(WZY)/P(ZY)$$\n",
    "$$X\\perp Y|Z\\Rightarrow P(XYZ)=P(XZ)P(YZ)/P(Z)$$\n",
    "$$X\\perp Y|W\\Rightarrow P(XYW)=P(XW)P(YW)/P(W)$$\n",
    "### a. True\n",
    "We need to prove $$P(XYWZ)=P(XZ)P(YWZ)/P(Z),$$\n",
    "which is the result of plugging  2nd equation into 1st one.\n",
    "### b. False\n",
    "We need to prove $$P(XYZW)=P(XZW)P(YZW)/P(ZW),$$\n",
    "which is not reachable."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.10\n",
    "The pdf of $y=1/x$ is\n",
    "\n",
    "\\begin{align}\n",
    "g(y)&=f(x(y))\\left|\\frac{dx}{dy}\\right|=f(1/y)/y^2\\\\\n",
    "&=\\frac{b^a}{\\Gamma(a)}(1/y)^{a-1}e^{-b/y}/y^2\\\\\n",
    "&=\\frac{b^a}{\\Gamma(a)}y^{-(a+1)}e^{-b/y}\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.11\n",
    "\\begin{align}\n",
    "Z^2&=\\int_0^{2\\pi}\\int_0^\\infty r\\exp\\left(-\\frac{r^2}{2\\sigma^2}\\right)drd\\theta\\\\\n",
    "&=2\\pi\\int_0^\\infty\\exp\\left(-\\frac{t}{\\sigma^2}\\right)dt,\\quad t=r^2/2\\\\\n",
    "&=2\\pi\\sigma^2\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.12\n",
    "Notice $-\\sum_x\\sum_y p(x,y)\\log p(x)=-\\sum_x p(x)\\log p(x)=H(X)$\n",
    "\n",
    "\\begin{align}\n",
    "I(X,Y)&=\\sum_x\\sum_y p(x,y)\\log\\frac{p(x,y)}{p(x)p(y)}\\\\\n",
    "&=H(X)+H(Y)-H(X,Y)\n",
    "\\end{align}\n",
    "\n",
    "\\begin{align}\n",
    "H(Y|X)&=-\\sum_y \\log p(x,y)\\log\\frac{p(x,y)}{p(x)}\\\\\n",
    "&=H(X,Y)-H(X)\n",
    "\\end{align}\n",
    "\n",
    "So $I(X,Y)=H(X)-H(X|Y)=H(Y)-H(Y|X)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.13\n",
    "Suppose we have $$X_{\\pm}=\\frac{X_1\\pm X_2}{\\sqrt{2}}$$\n",
    "then $$\\begin{bmatrix}\n",
    "X_+\\\\\n",
    "X_-\n",
    "\\end{bmatrix}\\sim N\\left(\\mathbf{0},\\begin{bmatrix}\\sigma_+^2&\\\\&\\sigma_-^2\\end{bmatrix}\\right)=N(0,\\sigma_+^2)N(0,\\sigma_-^2)$$\n",
    "\n",
    "And $$\\sigma_+^2\\frac{(X_1+X_2)^2}{2}+\\sigma_-^2\\frac{(X_1-X_2)^2}{2}=\\sigma^2 (X_1^2+X_2^2)-2\\rho X_1X_2$$\n",
    "\n",
    "Thus $\\sigma_\\pm=(1\\pm\\rho)\\sigma^2$. And $X_+$ is independent on $X_-$. From $X_\\pm$, we can get the distribution of $X_1$ and $X_2$.\n",
    "\n",
    "If $\\rho=1$, then $X_1=X_2=X/\\sqrt 2$, the mutual information $I(X_1,X_2)=H(X_1)$\n",
    "\n",
    "If $\\rho=-1$, similiar.\n",
    "\n",
    "If $\\rho=0$, they are independent. So $I(X_1,X_2)=0$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.14\n",
    "1. $r=1-H(Y|X)/H(X)=(H(X)-H(Y|X))/H(X)=I(Y, X)/H(x)$\n",
    "1. It is obvious that $H(X)>0$.\n",
    " 1. $r\\geq 0$ because we have $I(X,Y)\\geq 0$ using Jensen inequality: $$I(X,Y)=-\\sum_{i,j} \\Pr(i,j)\\log \\frac{p_ip_j}{\\Pr(i,j)}\\geq - \\log \\sum_{i,j}p_ip_j=0$$\n",
    " 1. We need to prove $H(Y|X)\\geq 0$. And $H(Y|X)=H(X,Y)-H(Y)\\geq 0$. \n",
    " \n",
    "\\begin{align}\n",
    "H(X,Y)&=-\\sum_i\\sum_j \\Pr(i,j)\\log \\Pr(i,j)\\\\\n",
    "&\\geq -\\sum_i p_i\\log \\frac{\\sum_j p^2(i,j)}{p_i}\\\\\n",
    "&\\geq -\\sum_i p_i\\log \\frac{p_i^2}{p_i}=H(Y)\n",
    "\\end{align}\n",
    "\n",
    " As a result, $r\\leq 1$\n",
    "1. $r=0$ holds iff $I(X,Y)=0$. In this case, $\\dfrac{p_ip_j}{\\Pr(i,j)}$ is a constant, so $X$ is independent on $Y$\n",
    "1. $r=0$ holds iff $I(X,Y)=H(X)$. i.e. $X=Y$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.15\n",
    "MLE(Maximum Likehood Estimation) is to maximize the likehood\n",
    "$$\\prod_i q(x=i|\\theta)^{n_i}=\\exp(n_i\\log q_i)\\propto\\exp(\\sum_ip_i\\log q_i)$$\n",
    "\n",
    "It is equivalent to minimize $-\\sum_ip_i\\log q_i\\equiv -\\sum_ip_i\\log(q_i/p_i)$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.16\n",
    "$f=x^{a-1}(1-x)^{b-1}/\\Beta(a,b)\\propto x^{a-1}(1-x)^{b-1}$\n",
    "$$E(x)=\\int xf dx=\\Beta(a+1,b)/\\Beta(a,b)=\\frac{a}{a+b}$$\n",
    "We can prove the last step as follows:\n",
    "\n",
    "\\begin{align}\n",
    "\\Beta(p, q+1)&=\\int_0^1 x^{p-1}(1-x)^{q}dx\\\\\n",
    "&=\\int_0^1 x^{p-1}(1-x)^{q-1}dx-\\int_0^1 x^{p}(1-x)^{q-1}dx\\\\\n",
    "&=\\int_0^1 x^{p-1}(1-x)^{q-1}dx-\\int_0^1 x^{p}(1-x)^{q-1}dx\\\\\n",
    "&=\\int_0^1 x^{p-1}(1-x)^{q-1}dx-\\frac{p}{q}\\int_0^1 x^{p-1}(1-x)^{q}dx\\\\\n",
    "&=\\Beta(p,q)-\\frac{p}{q}\\Beta(p, q+1)\n",
    "\\end{align}\n",
    "\n",
    "As a result, $\\Beta(p, q+1)=\\dfrac{q}{p+q}\\Beta(p,q)$. We can also use the relationship betweeen Beta and Gamma function, and $\\Gamma(p+1)=p\\Gamma(p)$\n",
    "$$\\Beta(p,q+1)=\\frac{\\Gamma(p)\\Gamma(q)}{\\Gamma(p+q+1)}=\\frac{q\\Gamma(p)\\Gamma(q)}{(p+q)\\Gamma(p+q)}=\\frac{q}{p+q}\\Beta(p,q)$$\n",
    "\n",
    "$$E(x^2)=\\int x^2f dx=\\Beta(a+2,b)/\\Beta(a,b)=\\frac{a}{a+b}\\frac{a+1}{a+b+1}$$\n",
    "\n",
    "$$\\frac{f'}{f}=\\frac{a-1}{x}+\\frac{b-1}{1-x}=0\\Rightarrow x_m=\\frac{a-1}{a-b}$$\n",
    "\n",
    "$$Var(x)=E(x^2)-E(x)^2=\\frac{ab}{(a+b)^2(a+b+1)}$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.17\n",
    "The cdf(cumulative distribution function) of each variable is\n",
    "\n",
    "$$F_1(x)=\\Pr(X<x)=x$$\n",
    "\n",
    "So the cdf that the smallest one is smaller than $x$ is $$F_m(x)=1-(1-F_1(x))^2=x(2-x)$$\n",
    "\n",
    "So the density function is $f_m=F_m'=2(1-x)$. The subscript $m$ means minimum"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}