{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Bayesian Machine Learning" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Preliminaries\n", "\n", "- Goals\n", " - Introduction to Bayesian (i.e., probabilistic) modeling\n", "- Materials\n", " - Mandatory\n", " - These lecture notes\n", " - Optional\n", " - Bishop pp. 68-74 (on the coin toss example)\n", " - [Ariel Caticha - 2012 - Entropic Inference and the Foundations of Physics](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/files/Caticha-2012-Entropic-Inference-and-the-Foundations-of-Physics.pdf), pp.35-44 (section 2.9, on deriving Bayes rule for updating probabilities)\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Challenge: Predicting a Coin Toss\n", "\n", "- **Problem**: We observe the following sequence of heads (outcome $=1$) and tails (outcome $=0$) when tossing the same coin repeatedly $$D=\\{1011001\\}\\,.$$\n", "\n", "- What is the probability that heads comes up next?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- **Solution**: later in this lecture. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The Bayesian Machine Learning Framework\n", "\n", "- Suppose that your application is to predict a future observation $x$, based on $N$ past observations $D=\\{x_1,\\dotsc,x_N\\}$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- The Bayesian design approach to solving this task involves four stages: \n", "\n", "
| | Bayesian | Maximum Likelihood |
|:---|:---|:---|
| 1. Model Specification | Choose a model $m$ with data generating distribution $p(x\mid\theta,m)$ and parameter prior $p(\theta\mid m)$ | Choose a model $m$ with the same data generating distribution $p(x\mid\theta,m)$; no parameter prior is needed |
| 2. Learning | Use Bayes rule to find the parameter posterior, $p(\theta\mid D) \propto p(D\mid\theta)\, p(\theta)$ | By Maximum Likelihood (ML) optimization, $\hat\theta = \arg\max_{\theta} p(D\mid\theta)$ |
| 3. Prediction | $p(x\mid D) = \int p(x\mid\theta)\, p(\theta\mid D)\, \mathrm{d}\theta$ | $p(x\mid D) = p(x\mid\hat\theta)$ |

- The Bayesian framework also supports model evaluation: two candidate models $m_1$ and $m_2$ are compared through their Bayes factor $B_{12} = p(D\mid m_1)/p(D\mid m_2)$, i.e., the ratio of their model evidences. A common scale for interpreting its magnitude is given below; both the learning/prediction recipe and the Bayes factor are illustrated for the coin-toss data in the code sketches that follow.

| $\log_{10} B_{12}$ | Evidence for $m_1$ |
|:---|:---|
| 0 to 0.5 | not worth mentioning |
| 0.5 to 1 | substantial |
| 1 to 2 | strong |
| > 2 | decisive |
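- To make stages 1-3 concrete, here is a minimal Python sketch for the coin-toss challenge from the start of this lecture. The modeling choices are assumptions made only for this illustration: a Bernoulli data generating distribution $p(x\mid\theta)=\theta^x(1-\theta)^{1-x}$ and a conjugate $\mathrm{Beta}(a,b)$ prior on $\theta$ with the uniform choice $a=b=1$. The lecture's own worked solution follows later.

```python
# Coin-toss sketch: Bayesian vs Maximum Likelihood treatment of D = {1011001}.
# Assumptions (for illustration only): Bernoulli likelihood and a conjugate
# Beta(a, b) prior over the heads probability theta, with a = b = 1 (uniform).

D = [1, 0, 1, 1, 0, 0, 1]        # observed heads/tails sequence
n1 = sum(D)                      # number of heads
n0 = len(D) - n1                 # number of tails

# 1. Model specification: p(x|theta) = theta^x (1-theta)^(1-x), theta ~ Beta(a, b)
a, b = 1.0, 1.0

# 2. Learning (Bayesian): conjugacy gives the posterior p(theta|D) = Beta(a+n1, b+n0)
a_post, b_post = a + n1, b + n0

# 3. Prediction (Bayesian): p(x=1|D) equals the posterior mean of theta
p_heads_bayes = a_post / (a_post + b_post)

# Maximum Likelihood alternative: theta_hat = n1/N, plug-in prediction p(x=1|theta_hat)
p_heads_ml = n1 / len(D)

print(f"Bayesian p(x=1|D) = {p_heads_bayes:.3f}")   # 5/9 ~ 0.556
print(f"ML       p(x=1|D) = {p_heads_ml:.3f}")      # 4/7 ~ 0.571
```

- With only seven observed tosses the two answers differ: the uniform prior pulls the Bayesian prediction toward $1/2$, whereas ML simply reports the relative frequency $4/7$.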
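- The Bayes factor itself is straightforward to compute once the model evidence $p(D\mid m)$ is available. The sketch below compares two candidate models chosen purely for illustration (they are not prescribed by the table above): $m_1$, the Beta(1,1)-Bernoulli model from the previous sketch, whose evidence has the closed form $p(D\mid m_1)=B(a+n_1,\,b+n_0)/B(a,b)$ with $B(\cdot,\cdot)$ the Beta function, and $m_2$, a fixed fair coin with $\theta=1/2$.

```python
# Sketch: model evidence and log-Bayes factor for the same coin-toss data.
# The two candidate models are illustrative assumptions, not fixed by the lecture:
#   m1: Bernoulli likelihood with a Beta(1, 1) prior on theta
#       -> p(D|m1) = B(a + n1, b + n0) / B(a, b)     (B = Beta function)
#   m2: a fixed fair coin, theta = 0.5, with no free parameters
#       -> p(D|m2) = 0.5 ** N
from math import lgamma, log

D = [1, 0, 1, 1, 0, 0, 1]
n1 = sum(D)
n0 = len(D) - n1
N = len(D)

def log_beta_fn(x, y):
    """Natural log of the Beta function B(x, y) = Gamma(x) Gamma(y) / Gamma(x + y)."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

a, b = 1.0, 1.0
log_evidence_m1 = log_beta_fn(a + n1, b + n0) - log_beta_fn(a, b)
log_evidence_m2 = N * log(0.5)

log10_bayes_factor = (log_evidence_m1 - log_evidence_m2) / log(10)
print(f"log10 B_12 = {log10_bayes_factor:+.2f}")    # about -0.34
```

- For these seven tosses $\log_{10} B_{12} \approx -0.34$: on the scale above, the data provide evidence that is not worth mentioning either way, as one would expect from such a small sample.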