{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%%capture\n", "%load_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline\n", "import sys\n", "sys.path.append(\"..\")\n", "import statnlpbook.util as util\n", "import statnlpbook.mle as smle\n", "from statnlpbook.util import safe_log as log\n", "util.execute_notebook('mle.ipynb')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "\n", "$$\n", "\\newcommand{\\Xs}{\\mathcal{X}}\n", "\\newcommand{\\Ys}{\\mathcal{Y}}\n", "\\newcommand{\\y}{\\mathbf{y}}\n", "\\newcommand{\\balpha}{\\boldsymbol{\\alpha}}\n", "\\newcommand{\\bbeta}{\\boldsymbol{\\beta}}\n", "\\newcommand{\\aligns}{\\mathbf{a}}\n", "\\newcommand{\\align}{a}\n", "\\newcommand{\\source}{\\mathbf{s}}\n", "\\newcommand{\\target}{\\mathbf{t}}\n", "\\newcommand{\\ssource}{s}\n", "\\newcommand{\\starget}{t}\n", "\\newcommand{\\repr}{\\mathbf{f}}\n", "\\newcommand{\\repry}{\\mathbf{g}}\n", "\\newcommand{\\x}{\\mathbf{x}}\n", "\\newcommand{\\prob}{p}\n", "\\newcommand{\\a}{\\alpha}\n", "\\newcommand{\\b}{\\beta}\n", "\\newcommand{\\vocab}{V}\n", "\\newcommand{\\params}{\\boldsymbol{\\theta}}\n", "\\newcommand{\\param}{\\theta}\n", "\\DeclareMathOperator{\\perplexity}{PP}\n", "\\DeclareMathOperator{\\argmax}{argmax}\n", "\\DeclareMathOperator{\\argmin}{argmin}\n", "\\newcommand{\\train}{\\mathcal{D}}\n", "\\newcommand{\\counts}[2]{\\#_{#1}(#2) }\n", "\\newcommand{\\length}[1]{\\text{length}(#1) }\n", "\\newcommand{\\indi}{\\mathbb{I}}\n", "\\newcommand{\\china}{\\text{China}}\n", "\\newcommand{\\mexico}{\\text{Mexico}}\n", "\\newcommand{\\paramc}{\\param_\\china}\n", "\\newcommand{\\paramm}{\\param_\\mexico}\n", "\\newcommand{\\countc}{\\counts{\\train}{\\china}}\n", "\\newcommand{\\countm}{\\counts{\\train}{\\mexico}}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, 
"source": [ "# Maximum Likelihood Estimation\n", "for **ShallowDrumpf**!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "What does \n", "$$\n", "\\argmax_\\params \\sum_{(\\x,\\y) \\in \\train} \\log \\prob_\\params(\\x,\\y)\n", "$$\n", "have to do with counting?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Application: ShallowDrumpf\n", "\n", "Develop **unigram language model** for generating simplified Trump speeches\n", "\n", "> China, China, China, Mexico, China, Mexico ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Model\n", "\n", "$$\n", "\\prob_\\params(w) = \\params_w\n", "$$\n", "\n", "$$\n", "\\prob_\\params(\\text{China}) = \\params_\\text{China} \\qquad \\prob_\\params(\\text{Mexico}) = \\params_\\text{Mexico} \n", "$$\n", "\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.7" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "m = \"Mexico\"\n", "c = \"China\"\n", "def prob(th_china, th_mexico, word):\n", " return th_china if word == 'China' else th_mexico\n", "\n", "prob(0.7, 0.3, 'China')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Maximum Likelihood Objective\n", "\n", "$$\n", "l(\\params) = \\sum_{w \\in \\train} \\log \\prob_\\params(w)\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\n", "l(\\params) = \\countc \\log \\paramc + \\countm \\log \\paramm\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Solution is **counting**:\n", "\n", "$$\n", "\\paramc = \\frac{\\countc}{\\countc + \\countm}\n", "$$" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { 
"slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(0.75, 0.25)" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def mle(data):\n", " theta_china = len([w for w in data if w == 'China']) / len(data)\n", " return theta_china, 1.0 - theta_china \n", "\n", "mle([c,c,m,c])" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Loss Surface" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "scrolled": true, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Log-likelihood of the data under the two-parameter unigram model\n", "def ll(th_china, th_mexico, data):\n", " return sum([log(prob(th_china, th_mexico, w)) for w in data])\n", "\n", "data = [c,c,m,c] # how does this graph look with all Cs?\n", "smle.plot_mle_graph(lambda x,y: ll(x,y, data), mle(data), \n", " x_label='China',y_label='Mexico')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Without **constraints** the solution is trivial (and useless): $l(\\params)$ grows without bound as $\\paramc, \\paramm \\to \\infty$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Constraints:\n", "\n", "* $0 \\leq \\paramc \\leq 1 $\n", "* $0 \\leq \\paramm \\leq 1 $\n", "* $\\paramc + \\paramm = 1$\n", " * Isoline $g(\\paramc,\\paramm)=1$ of $g(\\paramc,\\paramm)=\\paramc + \\paramm$ " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "" ], "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "smle.plot_mle_graph(lambda x,y: ll(x,y, data), mle(data), \n", " show_constraint=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Gradients at Optimum" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": false, "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "" ], "text/plain": [ "" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "smle.plot_mle_graph(lambda x,y: ll(x,y, data), mle(data), \n", " show_constraint=True, show_optimum=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\n", "\\nabla_\\params l(\\params) = \\alpha \\nabla_\\params g(\\params)\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "$$\n", "l(\\params) = \\countc \\log \\paramc + \\countm \\log \\paramm\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$\n", "\\frac{\\partial l(\\params)}{\\partial \\paramc} = \\frac{\\countc}{\\paramc}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\n", "g(\\params) = \\paramc + \\paramm\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "-" } }, "source": [ "$$\n", "\\frac{\\partial g(\\params)}{\\partial \\paramc} = 1\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "$$\n", "\\frac{\\partial l(\\params)}{\\partial \\paramc} = \\alpha \\frac{\\partial g(\\params)}{\\partial \\paramc}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\n", "\\frac{\\countc}{\\paramc} = \\alpha \n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "$$\n", "\\paramc = \\frac{\\countc}{\\alpha} = \\ldots\n", "$$\n", "$$\n", "\\paramm = \\frac{\\countm}{\\alpha} = \\ldots\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Substituting both into $\\paramc + \\paramm = 1$ gives $\\alpha = \\countc + \\countm$, so\n", "\n", "$$\n", "\\paramc = \\frac{\\countc}{\\countc + \\countm}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 
Summary\n", "\n", "* Derive the MLE by \n", " * setting the log-likelihood gradient equal to a multiple $\\alpha$ of the constraint gradient\n", " * using the constraint equation to solve for $\\alpha$\n", "* Easy to extend to any discrete generative model with conditional probability tables\n", "* Learning goal: be able to derive the estimator for new models " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Background Material\n", "* Introduction to MLE in [Michael Collins's notes](http://www.cs.columbia.edu/~mcollins/em.pdf)" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" } }, "nbformat": 4, "nbformat_minor": 1 }