{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "d3027ced", "metadata": {}, "source": [ "# Bayesian Machine Learning\n", "\n", "\n", "- **[1]** (#) (a) Briefly explain the relation between machine learning and Bayes rule. \n", " (b) How are Maximum a Posteriori (MAP) and Maximum Likelihood (ML) estimation related to Bayes rule and machine learning?\n", "\n", "\n", "- **[2]** (#) What are the four stages of the Bayesian design approach?\n", "\n", "\n", "- **[3]** (##) The Bayes estimate summarizes a posterior distribution by a delta distribution located at its mean, i.e., \n", "$$\n", "\\hat \\theta_{bayes} = \\int \\theta \\, p\\left( \\theta |D \\right)\n", "\\,\\mathrm{d}{\\theta}\n", "$$\n", "Prove that the Bayes estimate minimizes the mean-squared error, i.e., prove that\n", "$$\n", "\\hat \\theta_{bayes} = \\arg\\min_{\\hat \\theta} \\int (\\hat \\theta -\\theta)^2 p \\left( \\theta |D \\right) \\,\\mathrm{d}{\\theta}\n", "$$\n", " \n", "- **[4]** (##) We consider the coin toss example from the notebook and use a conjugate prior for a Bernoulli likelihood function. \n", " (a) Derive the Maximum Likelihood estimate. \n", " (b) Derive the MAP estimate. \n", " (c) Do these two estimates ever coincide (if so, under what circumstances)? \n", "\n", "\n", "- **[5]** (##) A model $m_1$ is described by a single parameter $\\theta$, with $0 \\leq \\theta \\leq 1$. The system can produce data $x \\in \\{0,1\\}$. The sampling distribution and prior are given by\n", "$$\\begin{aligned}\n", "p(x|\\theta,m_1) &= \\theta^x (1-\\theta)^{(1-x)} \\\\\n", "p(\\theta|m_1) &= 6\\theta(1-\\theta)\n", "\\end{aligned}$$ \n", " (a) Work out the probability $p(x=1|m_1)$. \n", " (b) Determine the posterior $p(\\theta|x=1,m_1)$. 
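\n", "\n", "As a numerical sanity check for (a) and (b), the integral over $\\theta$ can be approximated in this notebook's Julia kernel. This is only a sketch: the `midpoint` helper below is an ad hoc midpoint-rule integrator in base Julia (no packages), not part of the course code.\n", "\n", "```julia\n", "# Midpoint-rule approximation of the integral of f over [0, 1]  (base Julia only)\n", "midpoint(f; N=10_000) = sum(f((i - 0.5) / N) for i in 1:N) / N\n", "\n", "likelihood(θ) = θ             # p(x=1|θ,m_1) = θ^1 (1-θ)^0\n", "prior(θ) = 6θ * (1 - θ)       # p(θ|m_1)\n", "\n", "# model evidence p(x=1|m_1); compare with your analytical answer to (a)\n", "evidence = midpoint(θ -> likelihood(θ) * prior(θ))\n", "\n", "# posterior p(θ|x=1,m_1) = likelihood × prior / evidence, cf. (b)\n", "posterior(θ) = likelihood(θ) * prior(θ) / evidence\n", "```\n", "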
\n", "\n", "Now consider a second model $m_2$ with the following sampling distribution and prior on $0 \\leq \\theta \\leq 1$:\n", "$$\\begin{aligned}\n", "p(x|\\theta,m_2) &= (1-\\theta)^x \\theta^{(1-x)} \\\\\n", "p(\\theta|m_2) &= 2\\theta\n", "\\end{aligned}$$\n", " (c) Determine the probability $p(x=1|m_2)$. \n", "\n", "Now assume that the model priors are given by\n", " $$\\begin{aligned}\n", " p(m_1) &= 1/3 \\\\\n", " p(m_2) &= 2/3\n", " \\end{aligned}$$ \n", " (d) Compute the probability $p(x=1)$ by \"Bayesian model averaging\", i.e., by weighting the predictions of both models appropriately. \n", " (e) Compute the ratio of posterior model probabilities $\\frac{p(m_1|x=1)}{p(m_2|x=1)}$. \n", " (f) Which model do you prefer after observing $x=1$?\n", "" ] }, { "cell_type": "code", "execution_count": null, "id": "27db69a5", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Julia 1.8.2", "language": "julia", "name": "julia-1.8" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.8.2" } }, "nbformat": 4, "nbformat_minor": 5 }