{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2ad87afe",
   "metadata": {},
   "source": [
    "# Discrete Data and the Multinomial Distribution\n",
    "\n",
    "- **[1]** (##) We consider IID data $D = \\{x_1,x_2,\\ldots,x_N\\}$ obtained from tossing a $K$-sided die. We use a *binary selection variable*\n",
    "$$x_{nk} \\equiv \\begin{cases} 1 & \\text{if $x_n$ lands on $k$th face}\\\\\n",
    "    0 & \\text{otherwise}\n",
    "\\end{cases}\n",
    "$$\n",
    "with probabilities $p(x_{nk} = 1)=\\theta_k$.      \n",
    "  (a) Write down the probability for the $n$th observation $p(x_n|\\theta)$ and derive the log-likelihood $\\log p(D|\\theta)$.     \n",
    "  (b) Derive the maximum likelihood estimate for $\\theta$.\n",
    "\n",
    "\n",
    "- **[2]** (#) In the notebook, Laplace's generalized rule of succession (the probability that we throw the $k$th face at the next toss) was derived as \n",
    "$$\\begin{align*}\n",
    "p(x_{\\bullet,k}=1|D) = \\frac{m_k + \\alpha_k }{ N+ \\sum_k \\alpha_k}\n",
    "\\end{align*}$$\n",
    "Provide an interpretation of the variables $m_k,N,\\alpha_k,\\sum_k\\alpha_k$.\n",
    "\n",
    "\n",
    "- **[3]** (##) Show that Laplace's generalized rule of succession can be worked out to a prediction that is composed of a prior prediction and data-based correction term. \n",
    "\n",
    "\n",
    "- **[4]** (#) Verify that    \n",
    "  (a) the categorial distribution is a special case of the multinomial for $N=1$.    \n",
    "  (b) the Bernoulli is a special case of the categorial distribution for $K=2$.    \n",
    "  (c) the binomial is a special case of the multinomial for $K=2$.\n",
    "\n",
    "<!---\n",
    "- **[5]** (###) Determine the mean, variance and mode of a Beta distribution.\n",
    "--->\n",
    "\n",
    "- **[5]** (###) Consider a data set of binary variables $D=\\{x_1,x_2,\\ldots,x_N\\}$ with a Bernoulli distribution $\\mathrm{Ber}(x_k|\\mu)$ as data generating distribution and a Beta prior for $\\mu$. Assume that you make $n$ observations with $x=1$ and $N-n$ observations with $x=0$. Now consider a new draw $x_\\bullet$. We are interested in computing $p(x_\\bullet|D)$. Show that the mean value for $p(x_\\bullet|D)$ lies in between the prior mean and Maximum Likelihood estimate.\n",
    "\n",
    "\n",
    "- **[6]** Consider a data set $D = \\{(x_1,y_1), (x_2,y_2),\\dots,(x_N,y_N)\\}$ with one-hot encoding for the $K$ discrete classes, i.e.,  $y_{nk} = 1$ if and only if $y_n \\in \\mathcal{C}_k$, else $y_{nk} = 0$. Also given are the class-conditional distribution $p(x_n| y_{nk}=1,\\theta) = \\mathcal{N}(x_n|\\mu_k,\\Sigma)$ and multinomial prior $p(y_{nk}=1) = \\pi_k$.       \n",
    "  (a) Proof that the joint log-likelihood is given by\n",
    "$$\\begin{equation*}\n",
    "\\log p(D|\\theta) =  \\sum_{n,k} y_{nk} \\log \\mathcal{N}(x_n|\\mu_k,\\Sigma) + \\sum_{n,k} y_{nk} \\log \\pi_k\n",
    "\\end{equation*}$$  \n",
    "  (b) Show now that the MLE of the *class-conditional* mean is given by\n",
    "$$\\begin{equation*}\n",
    " \\hat \\mu_k = \\frac{\\sum_n y_{nk} x_n}{\\sum_n y_{nk}} \n",
    "\\end{equation*}\n",
    "$$\n",
    "\n",
    "\n",
    "\n",
    "<!---\n",
    "- Show that the beta, categorical, multinomial and Dirichlet distributions are normalized.\n",
    "--->\n",
    "\n",
    "<!---\n",
    "- Show that the beta, categorical, multinomial and Dirichlet distributions are normalized.\n",
    "--->"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fc095f2c",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.8.2",
   "language": "julia",
   "name": "julia-1.8"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}