{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Bayesian Machine Learning" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Preliminaries\n", "\n", "- Goals\n", " - Introduction to Bayesian (i.e., probabilistic) modeling\n", "- Materials\n", " - Mandatory\n", " - These lecture notes\n", " - Optional\n", " - Bishop pp. 21-24 " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example Problem: Predicting a Coin Toss\n", "\n", "- **Question**. We observe a the following sequence of heads (h) and tails (t) when tossing the same coin repeatedly $$D=\\{hthhtth\\}\\,.$$\n", "\n", "- What is the probability that heads (h) comes up next?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- **Answer** later in this lecture. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### The Bayesian Machine Learning Framework\n", "\n", "- Suppose that your task is to predict a future observation $x$, based on $N$ past observations $D=\\{x_1,\\dotsc,x_N\\}$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- The Bayesian approach for this task involves three stages: \n", " 1. Model specification\n", " 1. Parameter estimation (i.e., learning from observed data; using Bayesian inference)\n", " 1. Prediction (apply the model)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ " \n", "- Next, we discuss these three stages in a bit more detail." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### (1) Model specification\n", "\n", "Your first task is to propose a model with tuning parameters $\\theta$ for generating the observations $x$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- This involves specification of $p(x|\\theta)$ and a prior for the parameters $p(\\theta)$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- _You_ choose the distribution $p(x|\\theta)$ based on your physical understanding of the data generating process.\n", "- Note that, for a given data set $D=\\{x_1,x_2,\\dots,x_N\\}$ with independent observations $x_n$,\n", "$$ p(D|\\theta) = \\prod_{n=1}^N p(x_n|\\theta)$$\n", "so usually you select a model for generating one observation $x_n$ and then use (in-)dependence assumptions to combine these models into a model for your observed data set $D$. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- _You_ choose the prior $p(\\theta)$ to reflect what you know about the parameter values before you see the data $D$.\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### (2) Parameter estimation\n", "\n", "- After model specification, you need to measure/collect a data set $D$. Then, use Bayes rule to find the posterior distribution for the parameters,\n", "$$\n", "p(\\theta|D) = \\frac{p(D|\\theta) p(\\theta)}{p(D)} \\propto p(D|\\theta) p(\\theta)\n", "$$ \n", " - The denominator is only a normalization factor." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Note that there's **no need for you to design a _smart_ parameter estimation algorithm**. The only complexity lies in the computational issues. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- This \"recipe\" works only if the RHS factors can be evaluated; this is what machine learning is about \n", " $\\Rightarrow$ **Machine learning is EASY, apart from computational details :) **\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### (3) Prediction\n", "\n", "- Given the data $D$, our knowledge about the yet unobserved datum $x$ is captured by\n", "$$\\begin{align*}\n", "p(x|D) &= \\int p(x,\\theta|D) \\,\\mathrm{d}\\theta\\\\\n", " &= \\int p(x|\\theta,D) p(\\theta|D) \\,\\mathrm{d}\\theta\\\\\n", " &= \\int p(x|\\theta) p(\\theta|D) \\,\\mathrm{d}\\theta\\\\\n", "\\end{align*}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Again, **no need to invent a special prediction algorithm**. Probability theory takes care of all that. The complexity of prediction is just computational: how to carry out the marginalization over $\\theta$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- In order to execute prediction, you need to have access to the factors $p(x|\\theta)$ and $p(\\theta|D)$. Where do these factors come from? Are they available?\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- What did we learn from $D$? Without access to $D$, we would predict new observations through\n", "$$\n", "p(x) = \\int p(x,\\theta) \\,\\mathrm{d}\\theta = \\int p(x|\\theta) p(\\theta) \\,\\mathrm{d}\\theta\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- NB The application of the learned posterior $p(\\theta|D)$ not necessarily has to be prediction. We use it here as an example, but other applications are of course also possible. \n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Bayesian Model Comparison \n", "\n", "- There appears to be a remaining problem: How good really were our model assumptions $p(x|\\theta)$ and $p(\\theta)$? " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Technically, this is a **model comparison** problem" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- [**Q**.] What if I have more candidate models, say $\\mathcal{M} = \\{m_1,\\ldots,m_K\\}$ where each model relates to specific prior $p(\\theta|m_k)$ and likelihood $p(D|\\theta,m_k)$? Can we evaluate the relative performance of a model against another model from the set?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- [**A**.]: Start again with **model specification**. Specify a prior $p(m_k)$ for each of the models and then solve the desired inference problem: \n", "$$\\begin{align*} \n", "p(m_k|D) &= \\frac{p(D|m_k) p(m_k)}{p(D)} \\\\\n", " &\\propto p(m_k) \\cdot p(D|m_k) \\\\\n", " &= p(m_k)\\cdot \\int_\\theta p(D,\\theta|m_k) \\,\\mathrm{d}\\theta\\\\\n", " &= \\underbrace{p(m_k)}_{\\substack{\\text{model}\\\\\\text{prior}}}\\cdot \\int_\\theta \\underbrace{p(D|\\theta,m_k)}_{\\text{likelihood}} \\,\\underbrace{p(\\theta|m_k)}_{\\text{prior}}\\, \\mathrm{d}\\theta\\\\\n", "\\end{align*}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Bayesian Model Comparison (continued) \n", "\n", "- You, the engineer, have to choose the factors $p(D|\\theta,m_k)$, $p(\\theta|m_k)$ and $p(m_k)$. After that, for a given data set $D$, the model posterior $p(m_k|D)$ can be computed. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- If you need to work with one model,select the model with largest posterior $p(m_k|D)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Alternatively, if you don't want to choose a model, you can do prediction by **Bayesian model averaging** to utilitize the predictive power from all models:\n", "$$\\begin{align*}\n", "p(x|D) &= \\sum_k \\int p(x,\\theta,m_k|D)\\,\\mathrm{d}\\theta \\\\\n", " &= \\sum_k \\underbrace{p(m_k|D)}_{\\substack{\\text{model}\\\\\\text{posterior}}} \\cdot \\int \\underbrace{p(\\theta|D)}_{\\substack{\\text{parameter}\\\\\\text{posterior}}} \\, \\underbrace{p(x|\\theta,m_k)}_{\\text{likelihood}} \\,\\mathrm{d}\\theta\n", "\\end{align*}$$ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- $\\Rightarrow$ In a Bayesian framework, **model comparison** follows the same recipe as parameter estimation; it just works at one higher hierarchical level." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- More on this in part 2 (Tjalkens). " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Machine Learning and the Scientific Method Revisited\n", "\n", "- Bayesian probability theory provides a unified framework for information processing (and even the Scientific Method).\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Now Solve the Example Problem: Predicting a Coin Toss\n", "\n", "- We observe a the following sequence of heads (h) and tails (t) when tossing the same coin repeatedly $$D=\\{hthhtth\\}\\,.$$\n", "\n", "- What is the probability that heads (h) comes up next? We solve this in the next slides ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Coin toss example (1): Model Specification\n", "\n", "We observe a sequence of $N$ coin tosses $D=\\{x_1,\\ldots,x_N\\}$ with $n$ heads. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "##### Likelihood\n", "- Assume a Bernoulli distributed variable $p(x_k=h|\\mu)=\\mu$, which leads to a **binomial** distribution for the likelihood (assume $n$ times heads were thrown):\n", "$$ \n", "p(D|\\mu) = \\prod_{k=1}^N p(x_k|\\mu) = \\mu^n (1-\\mu)^{N-n}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "##### Prior\n", "- Assume the prior belief is governed by a **beta distribution**\n", "$$\n", "p(\\mu) = \\mathcal{B}(\\mu|\\alpha,\\beta) = \\frac{\\Gamma(\\alpha+\\beta)}{\\Gamma(\\alpha)\\Gamma(\\beta)} \\mu^{\\alpha-1}(1-\\mu)^{\\beta-1}\n", "$$\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- The Beta distribution is a **conjugate prior** for the Binomial distribution, which means that \n", "$$\n", "\\text{beta} \\propto \\text{binomial} \\times \\text{beta}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- $\\alpha$ and $\\beta$ are called **hyperparameters**, since they parameterize the distribution for another parameter ($\\mu$). E.g., $\\alpha=\\beta=1$ (uniform).\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Coin toss example (2): Parameter estimation\n", "\n", "- Infer posterior PDF over $\\mu$ through Bayes rule\n", "\n", "$$\\begin{align*}\n", "p(\\mu|D) &\\propto p(D|\\mu)\\,p(\\mu|\\alpha,\\beta) \\\\ \n", " &= \\mu^n (1-\\mu)^{N-n} \\times \\mu^{\\alpha-1} (1-\\mu)^{\\beta-1} \\\\\n", " &= \\mu^{n+\\alpha-1} (1-\\mu)^{N-n+\\beta-1} \n", "\\end{align*}$$\n", "\n", "hence the posterior is also beta distributed as\n", "\n", "$$\n", "p(\\mu|D) = \\mathcal{B}(\\mu|\\,n+\\alpha, N-n+\\beta)\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Essentially, **here ends the machine learning activity**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Coin Toss Example (3): Prediction\n", "\n", "- Now, we want to **use** the trained model. Let's use it to predict future observations. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Marginalize over the parameter posterior to get the predictive PDF for a new coin toss $x_\\bullet$, given the data $D$,\n", "\n", "$$\\begin{align*}\n", "p(x_\\bullet=h|D) &= \\int_0^1 p(x_\\bullet=h|\\mu)\\,p(\\mu|D) \\,\\mathrm{d}\\mu \\\\\n", " &= \\int_0^1 \\mu \\times \\mathcal{B}(\\mu|\\,n+\\alpha, N-n+\\beta) \\,\\mathrm{d}\\mu \\\\\n", " &= \\frac{n+\\alpha}{N+\\alpha+\\beta} \\qquad \\mbox{(a.k.a. Laplace rule)}\\hfill\n", "\\end{align*}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Finally, we're ready to solve our example problem: for $D=\\{hthhtth\\}$ and uniform prior ($\\alpha=\\beta=1$), we get\n", "\n", "$$ p(x_\\bullet=h|D)=\\frac{n+1}{N+2} = \\frac{4+1}{7+2} = \\frac{5}{9}$$\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Coin Toss Example: What did we learn?\n", "\n", "- What did we learn from the data? Before seeing any data, we think that $$p(x_\\bullet=h)=\\left. p(x_\\bullet=h|D) \\right|_{n=N=0} = \\frac{\\alpha}{\\alpha + \\beta}\\,.$$ " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- After the $N$ coin tosses, we think that $p(x_\\bullet=h|D) = \\frac{n+\\alpha}{N+\\alpha+\\beta}$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Note the following decomposition\n", "\n", "$$\\begin{align*}\n", " p(x_\\bullet=h|\\,D) &= \\frac{n+\\alpha}{N+\\alpha+\\beta} = \\frac{n}{N+\\alpha+\\beta} + \\frac{\\alpha}{N+\\alpha+\\beta} \\\\\n", " &= \\frac{N}{N+\\alpha+\\beta}\\cdot \\frac{n}{N} + \\frac{\\alpha+\\beta}{N+\\alpha+\\beta} \\cdot \\frac{\\alpha}{\\alpha+\\beta} \\\\\n", " &= \\underbrace{\\frac{\\alpha}{\\alpha+\\beta}}_{prior} + \\underbrace{\\frac{N}{N+\\alpha+\\beta}}_{gain}\\cdot \\big( \\underbrace{\\frac{n}{N}}_{MLE} - \\underbrace{\\frac{\\alpha}{\\alpha+\\beta}}_{prior} \\big)\n", "\\end{align*}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Note that, since $0\\leq\\text{gain}\\lt 1$, the Bayesian estimate lies between prior and maximum likelihood estimate." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- For large $N$, the gain goes to $1$ and $p(x_\\bullet=h|D)$ goes to the maximum likelihood estimate (the relative frequency) $n/N$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### CODE EXAMPLE\n", "\n", "**Bayesian evolution of $p(\\mu|D)$ for the coin toss**\n", "\n", "Let's see how $p(\\mu|D)$ evolves as we increase the number of coin tosses $N$. We'll use two different priors to demonstrate the effect of the prior on the posterior (set $N=0$ to inspect the prior)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ " \n" ], "text/plain": [ "HTML{String}(\" \\n\")" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "HTML{String}(\"\")" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "HTML{String}(\"\")" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ " \n" ], "text/plain": [ "HTML{String}(\" \\n\")" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "
\n", "WebIO.mount(this.previousSibling,{"props":{},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"div"},"children":[{"props":{"className":"field"},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"div"},"children":[{"props":{},"nodeType":"Scope","type":"node","instanceArgs":{"imports":{"data":[{"name":"knockout","type":"js","url":"/assetserver/88bf0b823d344fcff0b77616d12840286b6c8fa3-knockout.js"},{"name":"knockout_punches","type":"js","url":"/assetserver/2bd5b097e96d6b4fb8b3b86c01ec3eb6f00ba8bc-knockout_punches.js"},{"name":null,"type":"js","url":"/assetserver/33fc17ead5d6c83c74b2449f7c19558c51387837-all.js"},{"name":null,"type":"css","url":"/assetserver/f0f1a82ab037979b58e3919f91c0a1436c2b13ea-style.css"},{"name":null,"type":"css","url":"/assetserver/84fa6bb423ab1a438691a358fbcb123d0820e96c-main.css"}],"type":"async_block"},"id":"knockout-component-4ba54b13-d2c1-42a7-80b9-41e2e57dc0e1","handlers":{"_promises":{"importsLoaded":[function (ko, koPunches) {\n", " ko.punches.enableAll();\n", " ko.bindingHandlers.numericValue = {\n", " init : function(element, valueAccessor, allBindings, data, context) {\n", " var stringified = ko.observable(ko.unwrap(valueAccessor()));\n", " stringified.subscribe(function(value) {\n", " var val = parseFloat(value);\n", " if (!isNaN(val)) {\n", " valueAccessor()(val);\n", " }\n", " })\n", " valueAccessor().subscribe(function(value) {\n", " var str = JSON.stringify(value);\n", " if ((str == "0") && (["-0", "-0."].indexOf(stringified()) >= 0))\n", " return;\n", " if (["null", ""].indexOf(str) >= 0)\n", " return;\n", " stringified(str);\n", " })\n", " ko.applyBindingsToNode(element, { value: stringified, valueUpdate: allBindings.get('valueUpdate')}, context);\n", " }\n", " };\n", " var json_data = JSON.parse("{\\"changes\\":0,\\"value\\":96}");\n", " var self = this;\n", " function AppViewModel() {\n", " for (var key in json_data) {\n", " var el = json_data[key];\n", " this[key] = Array.isArray(el) ? ko.observableArray(el) : ko.observable(el);\n", " }\n", " \n", " \n", " [this["changes"].subscribe((function (val){!(this.valueFromJulia["changes"]) ? (WebIO.setval({"name":"changes","scope":"knockout-component-4ba54b13-d2c1-42a7-80b9-41e2e57dc0e1","id":"ob_02","type":"observable"},val)) : undefined; return this.valueFromJulia["changes"]=false}),self),this["value"].subscribe((function (val){!(this.valueFromJulia["value"]) ? (WebIO.setval({"name":"value","scope":"knockout-component-4ba54b13-d2c1-42a7-80b9-41e2e57dc0e1","id":"ob_01","type":"observable"},val)) : undefined; return this.valueFromJulia["value"]=false}),self)]\n", " \n", " }\n", " self.model = new AppViewModel();\n", " self.valueFromJulia = {};\n", " for (var key in json_data) {\n", " self.valueFromJulia[key] = false;\n", " }\n", " ko.applyBindings(self.model, self.dom);\n", "}\n", "]},"changes":[(function (val){return (val!=this.model["changes"]()) ? (this.valueFromJulia["changes"]=true, this.model["changes"](val)) : undefined})],"value":[(function (val){return (val!=this.model["value"]()) ? (this.valueFromJulia["value"]=true, this.model["value"](val)) : undefined})]},"systemjs_options":null,"observables":{"changes":{"sync":false,"id":"ob_02","value":0},"value":{"sync":true,"id":"ob_01","value":96}}},"children":[{"props":{"attributes":{"class":"interact-flex-row"}},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"div"},"children":[{"props":{"attributes":{"class":"interact-flex-row-left"}},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"div"},"children":[{"props":{"className":"interact ","style":{"padding":"5px 10px 0px 10px"}},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"label"},"children":["N"]}]},{"props":{"attributes":{"class":"interact-flex-row-center"}},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"div"},"children":[{"props":{"max":192,"min":0,"attributes":{"type":"range","data-bind":"numericValue: value, valueUpdate: 'input', event: {change : function () {this.changes(this.changes()+1)}}","orient":"horizontal"},"step":1,"className":"slider slider is-fullwidth","style":{}},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"input"},"children":[]}]},{"props":{"attributes":{"class":"interact-flex-row-right"}},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"div"},"children":[{"props":{"attributes":{"data-bind":"text: value"}},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"p"},"children":[]}]}]}]}]},{"props":{},"nodeType":"Scope","type":"node","instanceArgs":{"imports":{"data":[],"type":"async_block"},"id":"scope-b2cdd369-f329-483a-82fb-04f269a831ce","handlers":{"obs-output":[function (updated_htmlstr) {\n", " var el = this.dom.querySelector("#out");\n", " WebIO.propUtils.setInnerHtml(el, updated_htmlstr);\n", "}]},"systemjs_options":null,"observables":{"obs-output":{"sync":false,"id":"ob_05","value":"<div class='display:none'></div><unsafe-script style='display:none'>\\nWebIO.mount(this.previousSibling,{&quot;props&quot;:{&quot;attributes&quot;:{&quot;class&quot;:&quot;interact-flex-row&quot;}},&quot;nodeType&quot;:&quot;DOM&quot;,&quot;type&quot;:&quot;node&quot;,&quot;instanceArgs&quot;:{&quot;namespace&quot;:&quot;html&quot;,&quot;tag&quot;:&quot;div&quot;},&quot;children&quot;:[{&quot;props&quot;:{&quot;setInnerHtml&quot;:&quot;&lt;img src=&#39;&#39;&gt;&lt;/img&gt;&quot;},&quot;nodeType&quot;:&quot;DOM&quot;,&quot;type&quot;:&quot;node&quot;,&quot;instanceArgs&quot;:{&quot;namespace&quot;:&quot;html&quot;,&quot;tag&quot;:&quot;div&quot;},&quot;children&quot;:[]}]})</unsafe-script>"}}},"children":[{"props":{"id":"out","setInnerHtml":"<div class='display:none'></div><unsafe-script style='display:none'>\\nWebIO.mount(this.previousSibling,{&quot;props&quot;:{&quot;attributes&quot;:{&quot;class&quot;:&quot;interact-flex-row&quot;}},&quot;nodeType&quot;:&quot;DOM&quot;,&quot;type&quot;:&quot;node&quot;,&quot;instanceArgs&quot;:{&quot;namespace&quot;:&quot;html&quot;,&quot;tag&quot;:&quot;div&quot;},&quot;children&quot;:[{&quot;props&quot;:{&quot;setInnerHtml&quot;:&quot;&lt;img src=&#39;&#39;&gt;&lt;/img&gt;&quot;},&quot;nodeType&quot;:&quot;DOM&quot;,&quot;type&quot;:&quot;node&quot;,&quot;instanceArgs&quot;:{&quot;namespace&quot;:&quot;html&quot;,&quot;tag&quot;:&quot;div&quot;},&quot;children&quot;:[]}]})</unsafe-script>"},"nodeType":"DOM","type":"node","instanceArgs":{"namespace":"html","tag":"div"},"children":[]}]}]})\n", "
" ], "text/plain": [ "Widget{:manipulate,Any}(OrderedDict{Symbol,Any}(:N=>Widget{:slider,Int64}(OrderedDict{Symbol,Any}(:changes=>Observable{Int64} with 1 listeners. Value:\n", "0,:value=>Observable{Int64} with 2 listeners. Value:\n", "96), Observable{Int64} with 2 listeners. Value:\n", "96, Scope(\"knockout-component-4ba54b13-d2c1-42a7-80b9-41e2e57dc0e1\", Node{DOM}(DOM(:html, :div), Any[Node{DOM}(DOM(:html, :div), Any[Node{DOM}(DOM(:html, :label), Any[\"N\"], Dict{Symbol,Any}(:className=>\"interact \",:style=>Dict{Any,Any}(:padding=>\"5px 10px 0px 10px\")), 1)], Dict{Symbol,Any}(:attributes=>Dict(\"class\"=>\"interact-flex-row-left\")), 2), Node{DOM}(DOM(:html, :div), Any[Node{DOM}(DOM(:html, :input), Any[], Dict{Symbol,Any}(:max=>192,:min=>0,:attributes=>Dict{Any,Any}(:type=>\"range\",Symbol(\"data-bind\")=>\"numericValue: value, valueUpdate: 'input', event: {change : function () {this.changes(this.changes()+1)}}\",\"orient\"=>\"horizontal\"),:step=>1,:className=>\"slider slider is-fullwidth\",:style=>Dict{Any,Any}()), 0)], Dict{Symbol,Any}(:attributes=>Dict(\"class\"=>\"interact-flex-row-center\")), 1), Node{DOM}(DOM(:html, :div), Any[Node{DOM}(DOM(:html, :p), Any[], Dict{Symbol,Any}(:attributes=>Dict(\"data-bind\"=>\"text: value\")), 0)], Dict{Symbol,Any}(:attributes=>Dict(\"class\"=>\"interact-flex-row-right\")), 1)], Dict{Symbol,Any}(:attributes=>Dict(\"class\"=>\"interact-flex-row\")), 7), Dict{String,Tuple{Observables.AbstractObservable,Union{Nothing, Bool}}}(\"changes\"=>(Observable{Int64} with 1 listeners. Value:\n", "0, nothing),\"value\"=>(Observable{Int64} with 2 listeners. Value:\n", "96, nothing)), Set(String[]), nothing, Any[\"knockout\"=>\"/Users/bert/.julia/packages/Knockout/JIqpG/src/../assets/knockout.js\", \"knockout_punches\"=>\"/Users/bert/.julia/packages/Knockout/JIqpG/src/../assets/knockout_punches.js\", \"/Users/bert/.julia/packages/InteractBase/3SqBl/src/../assets/all.js\", \"/Users/bert/.julia/packages/InteractBase/3SqBl/src/../assets/style.css\", \"/Users/bert/.julia/packages/InteractBulma/Ohu5Y/src/../assets/main.css\"], Dict{Any,Any}(\"_promises\"=>Dict{Any,Any}(\"importsLoaded\"=>Any[JSString(\"function (ko, koPunches) {\\n ko.punches.enableAll();\\n ko.bindingHandlers.numericValue = {\\n init : function(element, valueAccessor, allBindings, data, context) {\\n var stringified = ko.observable(ko.unwrap(valueAccessor()));\\n stringified.subscribe(function(value) {\\n var val = parseFloat(value);\\n if (!isNaN(val)) {\\n valueAccessor()(val);\\n }\\n })\\n valueAccessor().subscribe(function(value) {\\n var str = JSON.stringify(value);\\n if ((str == \\\"0\\\") && ([\\\"-0\\\", \\\"-0.\\\"].indexOf(stringified()) >= 0))\\n return;\\n if ([\\\"null\\\", \\\"\\\"].indexOf(str) >= 0)\\n return;\\n stringified(str);\\n })\\n ko.applyBindingsToNode(element, { value: stringified, valueUpdate: allBindings.get('valueUpdate')}, context);\\n }\\n };\\n var json_data = JSON.parse(\\\"{\\\\\\\"changes\\\\\\\":0,\\\\\\\"value\\\\\\\":96}\\\");\\n var self = this;\\n function AppViewModel() {\\n for (var key in json_data) {\\n var el = json_data[key];\\n this[key] = Array.isArray(el) ? ko.observableArray(el) : ko.observable(el);\\n }\\n \\n \\n [this[\\\"changes\\\"].subscribe((function (val){!(this.valueFromJulia[\\\"changes\\\"]) ? (WebIO.setval({\\\"name\\\":\\\"changes\\\",\\\"scope\\\":\\\"knockout-component-4ba54b13-d2c1-42a7-80b9-41e2e57dc0e1\\\",\\\"id\\\":\\\"ob_02\\\",\\\"type\\\":\\\"observable\\\"},val)) : undefined; return this.valueFromJulia[\\\"changes\\\"]=false}),self),this[\\\"value\\\"].subscribe((function (val){!(this.valueFromJulia[\\\"value\\\"]) ? (WebIO.setval({\\\"name\\\":\\\"value\\\",\\\"scope\\\":\\\"knockout-component-4ba54b13-d2c1-42a7-80b9-41e2e57dc0e1\\\",\\\"id\\\":\\\"ob_01\\\",\\\"type\\\":\\\"observable\\\"},val)) : undefined; return this.valueFromJulia[\\\"value\\\"]=false}),self)]\\n \\n }\\n self.model = new AppViewModel();\\n self.valueFromJulia = {};\\n for (var key in json_data) {\\n self.valueFromJulia[key] = false;\\n }\\n ko.applyBindings(self.model, self.dom);\\n}\\n\")]),\"changes\"=>Any[JSString(\"(function (val){return (val!=this.model[\\\"changes\\\"]()) ? (this.valueFromJulia[\\\"changes\\\"]=true, this.model[\\\"changes\\\"](val)) : undefined})\")],\"value\"=>Any[JSString(\"(function (val){return (val!=this.model[\\\"value\\\"]()) ? (this.valueFromJulia[\\\"value\\\"]=true, this.model[\\\"value\\\"](val)) : undefined})\")]), ConnectionPool(Channel{Any}(sz_max:9223372036854775807,sz_curr:3), Set(AbstractConnection[]), Channel{AbstractConnection}(sz_max:32,sz_curr:0))), ##52#53{#dom#15{##dom#13#14{Dict{Any,Any},DOM}},typeof(scope)}(#dom#15{##dom#13#14{Dict{Any,Any},DOM}}(##dom#13#14{Dict{Any,Any},DOM}(Dict{Any,Any}(:className=>\"field\"), DOM(:html, :div))), scope))), Observable{Any} with 0 listeners. Value:\n", "Figure(PyObject
), nothing, getfield(Main, Symbol(\"##5#8\")){Observable{Any}}(Observable{Any} with 0 listeners. Value:\n", "Figure(PyObject
)))" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "using Reactive, Interact, PyPlot, Distributions\n", "f = figure()\n", "range_grid = range(0.0, stop=1.0, length=100)\n", "μ = 0.4\n", "samples = rand(192) .<= μ # Flip 192 coins\n", "@manipulate for N=0:1:192; withfig(f) do\n", " n = sum(samples[1:N]) # Count number of heads in first N flips\n", " posterior1 = Beta(1+n, 1+(N-n))\n", " posterior2 = Beta(5+n, 5+(N-n))\n", " plot(range_grid, pdf.(posterior1,range_grid), \"k-\")\n", " plot(range_grid, pdf.(posterior2,range_grid), \"k--\")\n", " xlabel(L\"\\mu\"); ylabel(L\"p(\\mu|\\mathcal{D})\"); grid()\n", " title(L\"p(\\mu|\\mathcal{D})\"*\" for N=$(N), n=$(n) (real \\$\\\\mu\\$=$(μ))\")\n", " legend([\"Based on uniform prior \"*L\"B(1,1)\",\"Based on prior \"*L\"B(5,5)\"], loc=4)\n", " end\n", "end" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$\\Rightarrow$ With more data, the relevance of the prior diminishes!\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### From Posterior to Point-Estimate\n", "\n", "- Sometimes we want just one 'best' parameter (vector), rather than a posterior distribution over parameters. Why?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Recall Bayesian prediction\n", "\n", "$$\n", "p(x|D) = \\int p(x|\\theta)p(\\theta|D)\\,\\mathrm{d}{\\theta}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- If we approximate posterior $p(\\theta|D)$ by a delta function for one 'best' value $\\hat\\theta$, then the predictive distribution collapses to\n", "\n", "$$\n", "p(x|D)= \\int p(x|\\theta)\\,\\delta(\\theta-\\hat\\theta)\\,\\mathrm{d}{\\theta} = p(x|\\hat\\theta)\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- This is the model $p(x|\\theta)$ evaluated at $\\theta=\\hat\\theta$." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Note that $p(x|\\hat\\theta)$ is much easier to evaluate than the integral for full Bayesian prediction." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Some Well-known Point-Estimates\n", "\n", "- **Bayes estimate**\n", "\n", "$$\n", "\\hat \\theta_{bayes} = \\int \\theta \\, p\\left( \\theta |D \\right)\n", "\\,\\mathrm{d}{\\theta}\n", "$$\n", " - (homework). Proof that the Bayes estimate minimizes the expected mean-square error, i.e., proof that\n", "\n", "$$\n", "\\hat \\theta_{bayes} = \\arg\\min_{\\hat \\theta} \\int_\\theta (\\hat \\theta -\\theta)^2 p \\left( \\theta |D \\right) \\,\\mathrm{d}{\\theta}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- **Maximum A Posteriori** (MAP) estimate \n", "$$\n", "\\hat \\theta_{\\text{map}}= \\arg\\max _{\\theta} p\\left( \\theta |D \\right) =\n", "\\arg \\max_{\\theta} p\\left(D |\\theta \\right) \\, p\\left(\\theta \\right)\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- **Maximum Likelihood** (ML) estimate\n", "$$\n", "\\hat \\theta_{ml} = \\arg \\max_{\\theta} p\\left(D |\\theta\\right)\n", "$$\n", " - Note that Maximum Likelihood is MAP with uniform prior" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Bayesian vs Maximum Likelihood Learning\n", "\n", "Consider the task: predict a datum $x$ from an observed data set $D$.\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Bayesian Maximum Likelihood
1. Model SpecificationChoose a model $m$ with data generating distribution $p(x|\\theta,m)$ and parameter prior $p(\\theta|m)$Choose a model $m$ with same data generating distribution $p(x|\\theta,m)$. No need for priors.
2. Learninguse Bayes rule to find the parameter posterior,\n", "$$\n", "p(\\theta|D) = \\propto p(D|\\theta) p(\\theta)\n", "$$ By Maximum Likelihood (ML) optimization,\n", "$$ \n", " \\hat \\theta = \\arg \\max_{\\theta} p(D |\\theta)\n", "$$
3. Prediction$$\n", "p(x|D) = \\int p(x|\\theta) p(\\theta|D) \\,\\mathrm{d}\\theta\n", "$$\n", "$$ \n", " p(x|D) = p(x|\\hat\\theta)\n", "$$
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Report Card on Maximum Likelihood Estimation\n", "\n", "- Maximum Likelihood (ML) is MAP with uniform prior, or MAP is 'penalized' ML\n", "$$\n", "\\hat \\theta_{map} = \\arg \\max _\\theta \\{ \\overbrace{\\log\n", "p\\left( D|\\theta \\right)}^{\\mbox{log-likelihood}} + \\overbrace{\\log\n", "p\\left( \\theta \\right)}^{\\mbox{penalty}} \\}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- (good!). Works rather well if we have a lot of data because the influence of the prior diminishes with more data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- (bad). Cannot be used for model comparison. E.g. best model does generally not correspond to largest likelihood (see part-2, Tjalkens)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- (good). Computationally often do-able. Useful fact (since $\\log$ is monotonously increasing):\n", "$$\\arg\\max_\\theta \\log p(D|\\theta) = \\arg\\max_\\theta p(D|\\theta)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$\\Rightarrow$ **ML estimation is an approximation to Bayesian learning**, but for good reason a very popular learning method when faced with lots of available data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "open(\"../../styles/aipstyle.html\") do f display(\"text/html\", read(f, String)) end" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "@webio": { "lastCommId": "0edceeb5b5d748f5987d5e6719f15198", "lastKernelId": "1e91b343-4cbb-4e56-94bb-edf332d47cc6" }, "anaconda-cloud": {}, "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Julia 1.1.0", "language": "julia", "name": "julia-1.1" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "1.1.0" } }, "nbformat": 4, "nbformat_minor": 1 }