{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "%%capture\n", "%load_ext autoreload\n", "%autoreload 2\n", "# %cd ..\n", "import sys\n", "sys.path.append(\"..\")\n", "import statnlpbook.util as util\n", "util.execute_notebook('language_models.ipynb')\n", "# import tikzmagic\n", "%load_ext tikzmagic\n", "matplotlib.rcParams['figure.figsize'] = (10.0, 6.0)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "\n", "$$\n", "\\newcommand{\\prob}{p}\n", "\\newcommand{\\x}{\\mathbf{x}}\n", "\\newcommand{\\vocab}{V}\n", "\\newcommand{\\params}{\\boldsymbol{\\theta}}\n", "\\newcommand{\\param}{\\theta}\n", "\\DeclareMathOperator{\\perplexity}{PP}\n", "\\DeclareMathOperator{\\argmax}{argmax}\n", "\\newcommand{\\train}{\\mathcal{D}}\n", "\\newcommand{\\counts}[2]{\\#_{#1}(#2) }\n", "$$" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from IPython.display import Image\n", "import random" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/html": [ "\n", "
\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%html\n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "# Language Modelling\n" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Language Models\n", "\n", "calculate the **probability of seeing a sequence of words**. " ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "What is the most likely next word?\n", "\n", "> We're going to ..." ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "How about now?\n", "\n", "> We're going to win ..." ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "How likely is this sequence?\n", "\n", "> We're going to win bigly. " ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Is it more likely than this one?\n", "\n", "> We're going to win big league." ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Use Cases: Machine Translation\n", "\n", "> Vi skal vinne stort\n", "\n", "translates to?" 
] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "> We will win by a mile\n", "\n", "or \n", "\n", "> We will win bigly" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Use Cases: Speech Recognition\n", "\n", "What did he [say](https://www.theguardian.com/us-news/video/2016/may/04/donald-trump-we-are-going-to-win-bigly-believe-me-video)?\n", "\n", "> We're going to win bigly\n", "\n", "or\n", "\n", "> We're going to win big league" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Use Cases: Natural Language Generation\n", "\n", "\n", "\n", "https://twitter.com/deepdrumpf" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Other applications of language models?" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Outlook: Importance of Language Models\n", "\n", "- State-of-the-art NLP models utilise **transfer learning**\n", "- Good (neural) language models are excellent starting points for transfer learning\n", "  * Step 1: unsupervised language model **pre-training**\n", "  * Step 2: supervised language model **fine-tuning**" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Outlook: How to build good language models?\n", "\n", "- Train them on large datasets\n", "- Use a large number of features/parameters" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "... but first, the basics" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Overview\n", "\n", "* Language Modelling from scratch\n", "* Evaluation\n", "* Dealing with Out-Of-Vocabulary words (OOVs)\n", "* Training\n", "* Smoothing" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Formally\n", "Models the probability \n", "\n", "$$\\prob(w_1,\\ldots,w_d)$$ \n", "\n", "of observing sequences of words \\\\(w_1,\\ldots,w_d\\\\). " ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "By the chain rule of probability (exact, hence without loss of generality):\n", "\n", "\\begin{align}\n", "\\prob(w_1,\\ldots,w_d) &= \\prob(w_1) \\prob(w_2|w_1) \\prob(w_3|w_1, w_2) \\ldots \\\\\n", " &= \\prob(w_1) \\prod_{i = 2}^d \\prob(w_i|w_1,\\ldots,w_{i-1})\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Structured Prediction\n", "\n", "Predict the word $y=w_i$\n", "* conditioned on the history $\\x=w_1,\\ldots,w_{i-1}$." 
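] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "A minimal sketch of the chain-rule decomposition above. The helper `chain_rule_probability` and the toy conditional model are hypothetical, not part of the course code: any function returning $\\prob(w_i|w_1,\\ldots,w_{i-1})$ can be plugged in, and the joint probability is the running product of these conditionals over all positions." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "def chain_rule_probability(cond_prob, words):\n", "    # p(w_1, ..., w_d) = p(w_1) * prod_{i=2}^d p(w_i | w_1, ..., w_{i-1})\n", "    prob = 1.0\n", "    for i, word in enumerate(words):\n", "        history = words[:i]  # the full history w_1, ..., w_{i-1} (empty for the first word)\n", "        prob *= cond_prob(word, history)\n", "    return prob\n", "\n", "# hypothetical toy model: probability 1/4 for every word, whatever the history\n", "print(chain_rule_probability(lambda word, history: 0.25, ['we', 'will', 'win']))  # 0.015625"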
] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## N-Gram Language Models\n", "\n", "It is impossible to estimate a sensible probability for every possible history, since most long histories never occur in the training data\n", "\n", "$$\n", "\\x=w_1,\\ldots,w_{i-1}\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Change **representation**\n", "truncate the history to the last $n-1$ words:\n", "\n", "$$\n", "\\mathbf{f}(\\x)=w_{i-(n-1)},\\ldots,w_{i-1}\n", "$$\n", "\n", "$\\prob(\\text{bigly}|\\text{...,blah, blah, blah, we, will, win}) \n", "= \\prob(\\text{bigly}|\\text{we, will, win})$ (here $n=4$)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Unigram LM\n", "\n", "Set $n=1$:\n", "$$\n", "\\prob(w_i|w_1,\\ldots,w_{i-1}) = \\prob(w_i).\n", "$$\n", "\n", "$\\prob(\\text{bigly}|\\text{we, will, win}) = \\prob(\\text{bigly})$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Bigram LM\n", "\n", "Set $n=2$:\n", "$$\n", "\\prob(w_i|w_1,\\ldots,w_{i-1}) = \\prob(w_i|w_{i-1}).\n", "$$\n", "\n", "$\\prob(\\text{bigly}|\\text{we, will, win}) = \\prob(\\text{bigly}|\\text{win})$" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### *Uniform* LM\n", "Set $n=0$:\n", "\n", "Same probability for each word in the vocabulary \\\\(\\vocab\\\\):\n", "\n", "$$\n", "\\prob(w_i|w_1,\\ldots,w_{i-1}) = \\frac{1}{|\\vocab|}.\n", "$$\n", "\n", "$\\prob(\\text{big}) = \\prob(\\text{bigly}) = \\frac{1}{|\\vocab|}$" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Let us look at a training set and create a uniform LM from it." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['Can', \"'t\", 'even', 'call', 'this', 'a', 'blues', 'song', 'It']" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train[:9]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.9999999999999635" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "vocab = set(train)\n", "baseline = UniformLM(vocab)\n", "sum([baseline.probability(w) for w in vocab])" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "What about words outside the vocabulary? What is their probability?" 
] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Sampling\n", "* Sampling from an LM is easy and instructive\n", "* Usually, the better the LM, the better the samples" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "Sample **incrementally**, one word at a time" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import numpy as np\n", "\n", "def sample_once(lm, history, words):\n", "    # probability of each candidate word given the history\n", "    probs = [lm.probability(word, *history) for word in words]\n", "    # draw one word according to these probabilities\n", "    return np.random.choice(words, p=probs)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "'sunrise'" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample_once(baseline, [], list(baseline.vocab))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "outputs": [], "source": [ "def sample(lm, initial_history, amount_to_sample):\n", "    words = list(lm.vocab)\n", "    result = list(initial_history)\n", "    for _ in range(amount_to_sample):\n", "        # condition only on the last lm.order - 1 words; an order-1 model has no history\n", "        # (without the guard, result[-0:] would return the whole list)\n", "        history = result[-(lm.order - 1):] if lm.order > 1 else []\n", "        result.append(sample_once(lm, history, words))\n", "    return result" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "['dummies',\n", " 'find',\n", " 'being',\n", " 'bars',\n", " 'clap',\n", " 'rapping',\n", " 'droppin',\n", " 'fender',\n", " 'hated',\n", " 'Recognize']" ] }, 
"execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sample(baseline, [], 10)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Evaluation\n", "* **Extrinsic**: how does it improve a downstream task?\n", "* **Intrinsic**: how well does it model language?" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Intrinsic Evaluation\n", "**Shannon Game**: predict the next word; win if the prediction matches the word in the actual corpus\n", "\n", "> Our horrible trade agreements with [???]\n", "\n", "The expected reward is the probability of the corpus." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Formalised by\n", "\n", "\\begin{align}\n", "\\prob(w_1) \\prob(w_2|w_1) \\ldots \\prob(w_T|w_1,\\ldots,w_{T-1}) &= \\prod_{i=1}^T \\prob(w_i|w_1,\\ldots,w_{i-1})\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "But then the longer the sequence, the lower the probability...\n", "\n", "$\\to$ normalise by the length" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Perplexity \n", "Given a test sequence \\\\(w_1,\\ldots,w_T\\\\), the perplexity \\\\(\\perplexity\\\\) is the **geometric mean of the inverse probabilities** or, put differently, the **inverse probability of the test set, normalised by the number of words**:\n", "\n", "\\begin{align}\n", "\\perplexity(w_1,\\ldots,w_T) &= \\sqrt[T]{\\frac{1}{\\prob(w_1)} \\frac{1}{\\prob(w_2|w_1)} \\ldots} \\\\\n", "&= \\sqrt[T]{\\prod_{i=1}^T \\frac{1}{\\prob(w_i|w_1,\\ldots,w_{i-1})}}\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { 
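"slide_type": "skip" } }, "source": [ "A minimal sketch of the perplexity formula above. The helper `toy_perplexity` is hypothetical (it is not the `perplexity` function used later in this notebook): it takes the conditional probabilities $\\prob(w_i|w_1,\\ldots,w_{i-1})$ that a model assigns to the $T$ test words and evaluates $\\sqrt[T]{\\prod_{i=1}^T 1/\\prob(w_i|w_1,\\ldots,w_{i-1})}$ in log space, which avoids numerical underflow when multiplying many small probabilities." ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "import math\n", "\n", "def toy_perplexity(cond_probs):\n", "    # cond_probs: the model's p(w_i | w_1..w_{i-1}) for each of the T test words\n", "    if any(p == 0.0 for p in cond_probs):\n", "        return math.inf  # a single zero-probability word makes the perplexity infinite\n", "    T = len(cond_probs)\n", "    # PP = exp(-(1/T) * sum_i log p(w_i | history))\n", "    log_sum = sum(math.log(p) for p in cond_probs)\n", "    return math.exp(-log_sum / T)\n", "\n", "# uniform LM with |V| = 4: every conditional probability is 1/4, so PP is 4 (up to rounding)\n", "print(toy_perplexity([0.25] * 10))\n", "# exactly two equally likely words at every position: PP is 2 (up to rounding)\n", "print(toy_perplexity([0.5] * 7))\n", "# one out-of-vocabulary word with probability 0: PP is inf\n", "print(toy_perplexity([0.25, 0.0, 0.25]))" ] },
{ "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { 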
"slide_type": "subslide" } }, "source": [ "Perplexity for a bigram language model:\n", "\n", "\\begin{align}\n", "\\perplexity(w_1,\\ldots,w_T) &= \\sqrt[T]{\\prod_{i=1}^T \\frac{1}{\\prob(w_i|w_{i-1})}}\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Perplexity for a unigram language model:\n", "\n", "\\begin{align}\n", "\\perplexity(w_1,\\ldots,w_T) &= \\sqrt[T]{\\prod_{i=1}^T \\frac{1}{\\prob(w_i)}}\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "Perplexity for a uniform language model:\n", "\n", "\\begin{align}\n", "\\perplexity(w_1,\\ldots,w_T) &= \\sqrt[T]{\\prod_{i=1}^T \\frac{1}{1/|\\vocab|}} = |\\vocab|\n", "\\end{align}" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "### Brief note on inverse functions\n", "\n", "- $T$ is the number of words in the test sequence $w_1,\\ldots,w_T$, not the n-gram order: even for a bigram language model, $T$ is the length of the test set.\n", "- For simplicity, let $a = \\frac{1}{\\prob(w_1,\\ldots,w_T)}$\n", "- $\\sqrt[T]{a}$ is the inverse of the function $x \\mapsto x^T$, i.e. the number $x$ with $x^T = a$\n", "- So the perplexity $\\perplexity$ is the number which, multiplied by itself $T$ times, gives $a$" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Interpretation\n", "\n", "Consider an LM where \n", "* at each position there are exactly **2** possible words with $\\frac{1}{2}$ probability each\n", "* in the test sequence, one of these is always the true word" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Then \n", "\n", "* $\\perplexity(w_1,\\ldots,w_T) = \\sqrt[T]{2 \\cdot 2 \\cdot\\ldots} = 2$\n", "* Whenever the model has to guess the next word, it is unsure which of the 2 to pick\n", "* Perplexity $\\approx$ average number of choices\n", "* The lower the average number of choices, i.e. the lower the perplexity, the better" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "Perplexity of the uniform LM on an **unseen** test set?" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "inf" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perplexity(baseline, test)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Problem: the model assigns **zero probability** to words not in the vocabulary." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[('does', 0.0),\n", " ('Ceremonies', 0.0),\n", " ('Masquerading', 0.0),\n", " ('also', 0.0),\n", " ('Creativity', 0.0)]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "[(w,baseline.probability(w)) for w in test if w not in vocab][:5]" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## The Long Tail\n", "Unseen words are not a peculiarity of our corpus:\n", "* there is a long **tail** of words that appear only a few times\n", "* each individual one has low probability, but the probability of seeing *some* long-tail word is high\n" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "Let us plot word frequency rank (x-axis) against frequency (y-axis)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAz4AAAH+CAYAAABdvNtFAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABBm0lEQVR4nO3deXxU9aH///eZyb7MQBIIZGFHIKwSgrIKWKFYta61t16Liq20tl5LbX/avd5+S+9ttbY6uNS2Lt2oVqj1oojKoqAiqyyyryEJ2SCThWwz5/dHSISyZZLJnJkzr+fjwUNy5syZtzweB/LO53M+H8M0TVMAAAAAYGMOqwMAAAAAQFej+AAAAACwPYoPAAAAANuj+AAAAACwPYoPAAAAANuj+AAAAACwPYoPAAAAANuj+AAAAACwvRirAwTK7/erqKhIqampMgzD6jgAAAAALGKapqqrq5WVlSWH48JjOhFTfDwejzwejxobG7Vv3z6r4wAAAAAIE0eOHFFOTs4FzzFM0zRDlCcoqqqq1K1bNx05ckQul8vqOAAAAAAs4vV6lZubqxMnTsjtdl/w3IgZ8WnVOr3N5XJRfAAAAAC06xEYFjcAAAAAYHsUHwAAAAC2R/EBAAAAYHsUHwAAAAC2R/EBAAAAYHsRU3w8Ho/y8vJUUFBgdRQAAAAAESbi9vHxer1yu92qqqpiOWsAAAAgigXSDSJmxAcAAAAAOoriAwAAAMD2KD4AAAAAbI/iAwAAAMD2KD4AAAAAbI/iAwAAAMD2Iqb4sI8PAAAAgI5iHx8AAAAAEYl9fAAAAADgNBSfTnh3T5n+saFQzT6/1VEAAAAAXECM1QEild9v6udLd+qTYq8ef2eP7p0+SDdcmq0YJ10SAAAACDd8l95BPtPU9WOylJYcp4MVdfrOyx9rxiOr9Pf1R9TECBAAAAAQVljcoJPqGpv1pw8O6elV+1VR2yhJyk1L1DenD9YNY7MVywgQAAAA0CUC6QYUnyCpa2zWnz84rKdX71N5zacF6N5pg3RTfg4FCAAAAAgyio+FTjb69OcPD+mpVftVXtMgScrpnqh7pw/STWNzFBdDAQIAAACCgeITBloL0NOr96usuqUAZXdrKUA351OAAAAAgM6yZfHxeDzyeDzy+XzavXt32BefVvVNPv3lw8N6atU+lZ5WgL42baBuGZej+BinxQkBAACAyGTL4tMqUkZ8/l19k09/XXdYT678tABluRP0temD9AUKEAAAABAwik8Yq2/y6W/rDuvJVft0zNtSgHq7E/T1aQP1hYJcChAAAADQThSfCFDf5NPf1x/RwhX7VOKtlyT1ciXoa9MG6taCXCXEUoAAAACAC6H4RJD6Jp9eWn9EC1fuU3FVSwHKdMXra1cM1BfH96EAAQAAAOdB8YlADc0+/X19oZ5csVdFpxWgeVcM1H9QgAAAAICzUHwiWEOzTy+tL9TC0wpQj9SWEaAvXUYBAgAAAFpRfGygsdmvlzcUyrNir46eOCmppQDdM3WAbrusrxLjKEAAAACIbhQfG2ls9usfGwv1xDufFqCMlHjNu4ICBAAAgOhG8bGhxma/XtlYqCdW7FXh8dYCFKd7pg7UbZf3UVJcjMUJAQAAgNCi+NhYk8+vxRuP6vEVe3SksqUApSfH6atTB+j2CX0pQAAAAIgaFJ8o0OTza/Gmo3rinb06XFknqaUAfWXqAN1+eV8lx1OAAAAAYG+2LD4ej0cej0c+n0+7d++O+uLTqsnn15JNR/XEir06VNFSgNKS4/SVKQP05QkUIAAAANiXLYtPK0Z8zq3Z59eSzUV64p09OniqAHVPitXdUwZozsR+SqEAAQAAwGYoPlGs2efXq1uK9Pg7e3WgvFaS1C0ptm0EKDUh1uKEAAAAQHBQfKBmn1//+rhIj7+9V/tPK0B3T+6vORP7UYAAAAAQ8Sg+aOPzm/rXliL99p092l/WUoDcibGaO7m/7pjUTy4KEAAAACIUxQdn8fl
NvfZxkX779h7tO1WAXAkxmjt5gO6cTAECAABA5KH44Lx8flP/t7VYv317j/aW1khqKUB3Te6vOyf1lzuRAgQAAIDIQPHBRfn8ppaeKkB7ThWg1IQY3TWpv+6aTAECAABA+KP4oN38flNLt7UUoN3HThWg+BjdOamf5k4eIHcSBQgAAADhieKDgPn9pt7YXqLfvLVHu45VS2opQHdM6qe5k/urW1KcxQkBAACAM1F80GF+v6ll20v0m7f3aGdJSwFKiY/RHRNbClD3ZAoQAAAAwgPFB53m95t6c0eJfvP2Xn1S7JUkJcc5dcu4XOV0T5QrMVauhFi5EmPkbvt9rFLjY+RwGBanBwAAQDSg+CBo/H5Tyz85pt+8tUc7ThWgCzGMlilyrsTY0wrRmeXIndhyzJXQ+vvYtt8nxDpkGBQnAAAAXBzFB0Fnmqbe+qRU7+4pU9XJJnlPNrX8t7657fcNzf5Of06s02grQamJsXIlxJxVjlpLU4/UeI3t011xMY4g/B8CAAAg0gTSDWJClKnTPB6PPB6PfD6f1VGikmEYuiovU1flZZ73nPomn6rrm08VorPLkbft+LnP8flNNflMVdQ2qqK2sV25UhNidNWwTH12RC9NvaSHEmKdwfpfBgAAgI0w4oOwYJqm6hp9pxWi5raRJW/9qXJ0svm03zdpX1mtymsa2q6RHOfUjGGZunpEL10xpIeS4iKm1wMAAKADmOqGqOD3m9pw+LiWbi3WG9tKVFxV3/ZaQqxD04f01OyRvTVjaE+lxFOCAAAA7Ibig6jj95vaUnhCr28r0dKtxSo8frLttbgYh6YO7qGrR/bSlcMy5U5kU1YAAAA7oPggqpmmqe1FXi3dWqzXt5XoQHlt22uxTkOTB2Vo9ojeuiovk32JAAAAIhjFBzjFNE3tOlatpVtL9PrWYu0prWl7zekwNHFguj47opdmDe+ljJR4C5MCAAAgUBQf4Dz2llbr9a0lWrqtpG1jVklyGNL4/mm6emRvzRreS5muBAtTAgAAoD0oPkA7HCiv1evbWhZG+Liwqu24YUj5fbpr9sjemj2il7K6JVqYEgAAAOdD8QECdKSyTm9sK9Hr24q18fCJtuNOh6G7J/fXf31mMMtjAwAAhBmKD9AJxVUn9cap1eE+OnhckpTTPVE/u36Epg3paXE6AAAAtKL4AEHy9ifH9MMl21R0ao+ga0dn6UfX5KlHKgshAAAAWC2QbuAIUSYgIl05LFPL51+huZP7y2FI/9pSpCsfWam/rjssvz+ifmYAAAAQ1Sg+wEUkx8foh9fk6Z/3TtaIbJe89c166JWtuvWZ97W3tNrqeAAAAGgHig/QTiNz3Fry9Un6weeGKSnOqY8OHtfs37yrR9/cpfomn9XxAAAAcAEUHyAAMU6H7p4yQG9+a6quHNpTTT5Tv31nr67+zbtau6/c6ngAAAA4D4oP0AE53ZP07JxxWnjbWPVMjdf+8lp96Xcf6oGXtuh4baPV8QAAAPBvKD5ABxmGoatH9tZb375C/3l5HxmG9PKGQl356Cq9srFQEbZgIgAAgK1RfIBOciXE6mfXj9TL8yZqSGaqKmsbNf/vW3T779fpYHmt1fEAAAAgig8QNPl9u+tf35ys78waovgYh97bW65Zj62WZ8VeNTb7rY4HAAAQ1SKm+Hg8HuXl5amgoMDqKMB5xcU4dO/0QVp2/1RNHpShhma/frlsl659/D1tOFRpdTwAAICoZZgR9iBCILuzAlYyTVNLNh/Vf7/2iSprG2UY0s1jc/TArCHKdCVYHQ8AACDiBdINImbEB4g0hmHohktz9Pb8K3RLfo5MU3ppQ6Gm/XKlfr18t+oam62OCAAAEDUY8QFCZOPh4/rZazu08fAJSVLP1Hg9MHOIbsrPkdNhWBsOAAAgAgXSDSg+QAiZpqmlW0v0izc+0ZHKk5Kkob1S9YPP5Wny4AyL0wEAAEQWig8Q5hqafXph7SE9/s4eeetbprxNH9JD37t
6mAZnplqcDgAAIDJQfIAIcby2Ub99Z49efP+Qmv2mnA5DXyzI1f2fuUQ9UuOtjgcAABDWKD5AhDlQXqtfvP6Jlm0/JklKiY/R16YN1NzJ/ZUQ67Q4HQAAQHii+AAR6sP9Ffp/Sz/Rx4VVkqQsd4K+89kh+vzobDlYAAEAAOAMFB8ggvn9pl7dUqT/fWOniqrqJUmjctz63tXDdPmAdIvTAQAAhA+KD2AD9U0+/f69A3py5T7VNLQsgDCst0u35Ofo82OylJ7CM0AAACC6UXwAGymvadCvl+/WS+sL1ejzS5JinYZmDO2pW/JzdcWQHop1shcxAACIPhQfwIZO1DXq1S1FenlDYdszQJKUkRKvGy7N0s35uRrSi6WwAQBA9KD4ADa3s8Srf2wo1OJNR1Ve09h2fFSOWzfn5+i60VnqlhRnYUIAAICuR/EBokSTz6+Vu8r08oYjevuTUjX7W27nOKdDVw3P1M1jc9Q/I7nd14uPdSgzNYEV5AAAQESg+ABRqKKmQUs2F+ml9Ue0s6S6w9eJczqUk5aovmlJ6puerD5pSeqb3vIrp3sS+woBAICwQfEBoty2o1V6eUOhXt9WrNoGX7vfV9/kaxs1OhfDkHq5EjS0V6p+fO1w9QtgNAkAACDYKD4AOqTZ51dxVb0OVdTpcGWdDlXW6nBFXdvXrctqSy3PE73ytYmKYUU5AABgkUC6QUyIMgGIADFOh3LTkpSblnTWa6ZpqrK2UXtLa/SVF9br48Iq/WHNAX116kALkgIAAASGH9UCaBfDMJSeEq/LBqTr+58bJkl6dPluHaqotTgZAADAxVF8AATsC+NyNXFguuqb/HrwH1sVYTNmAQBAFKL4AAiYYRj6xY2jlBDr0Pv7K7TooyNWRwIAALggig+ADumTnqQHZg6RJP2/pZ/omLfe4kQAAADnR/EB0GF3Tuqv0TluVdc364dLtjHlDQAAhC2KD4AOczoM/eKmUYpxGHpzxzG9vq3E6kgAAADnRPEB0CnDerv09WktS1r/6J/bdKKu0eJEAAAAZ4uY4uPxeJSXl6eCggKrowD4N/fOGKRBPVNUXtOon/3fJ1bHAQAAOIthRtik/EB2ZwUQOhsOVermp96XaUovzh2vKYN7WB0JAADYXCDdIGJGfACEt/y+aZozoZ8k6aFXtqq2odnaQAAAAKeh+AAImu/MGqLsbokqPH5Sj7y52+o4AAAAbWKsDgDAPpLjY/TzG0dqzh/W6Y9rD6jZ71emK0HpyXFKT4lXWnKcMlJafp8c55RhGFZHBgAAUYLiAyCorrikh24cm61XNh7VC+8fOu95CbEO3XBpth7+/AjFOhl8BgAAXYviAyDofnHjKI3vl6ZDlXWqqGlQRU2jymsbVVHToMraRtU1+lTf5Ndf1x1RWXWDnvjSWCXEOq2ODQAAbIxV3QCEXF1js97dU677/rpJDc1+TRqUrmduH6fkeH4WAwAA2o9V3QCEtaS4GM0a3kvP3TleyXFOrdlboS//YZ2qTjZZHQ0AANgUxQeAZSYMTNef7r5MroQYbTh0XF/63QeqrG20OhYAALAhig8AS13ap7v+9tUJSk+O0/Yir259+n0d89ZbHQsAANgMxQeA5fKyXFp0zwT1ciVoT2mNvvD0+yo8Xmd1LAAAYCMUHwBhYVDPFL00b4L6pCXpUEWdbnnqfe0vq7E6FgAAsAmKD4CwkZuWpL/fM0EDeySruKpeX3j6A+0s8VodCwAA2ADFB0BY6eVO0N/vmaC83i6V1zTo1qc/0OYjJ6yOBQAAIhzFB0DYSU+J11+/erku7dNNVSebdNvvPtCH+yusjgUAACIYxQdAWHInxupPcy/ThAHpqm30ac4f12nV7jKrYwEAgAhF8QEQtpLjY/THOws0Y2hP1Tf5dffzH+mNbSVWxwIAABGI4gMgrCXEOvXUf+brcyN7q8ln6t6/bNTLGwrl95tWRwMAABHEME0zor578Hq9crvdqqqqksvlsjoOgBB
p9vn1//1jq/6xsVCSlBofoxHZbo3KdWt0TjeNynEru1uiDMOwOCkAAAiVQLpBTIgyAUCnxDgd+uXNo5SRGqfn1x5UdUOz3t9fofdPW/QgLTlO7sTYs97rdBj65oxB+vyY7FBGBgAAYYQRHwARp9nn1+5jNfq48IS2FFbp48IT2lVSreYLTH/rlhSr1d+dLlfC2cUIAABEJkZ8ANhajNOhvCyX8rJc+uL4lmP1TT7tOVajhmbfWec/+MpW7S2t0bPvHtD8qy4JcVoAABAOKD4AbCEh1qmROe5zvvbAzEs0708b9ft392vOhL5KT4kPcToAAGA1VnUDYHuzhvfSyGy3aht9enLlPqvjAAAAC1B8ANieYRj6zqwhkqQXPjik4qqTFicCAAChRvEBEBWmDM7QZf3T1Njs12/f3mN1HAAAEGIUHwBR4fRRn7+vL9SB8lqLEwEAgFCi+ACIGuP6pWnG0J7y+U39evluq+MAAIAQovgAiCrfntmynPWrW4q0o8hrcRoAABAqFB8AUWV4llvXjOotSXp0+S6L0wAAgFCh+ACIOvOvukROh6G3PinV1/+8QbuPVVsdCQAAdDGKD4CoM6BHiv7rysEyDGnp1hLNemy1/utvm7S/rMbqaAAAoIsYpmmaVocIhNfrldvtVlVVlVwul9VxAESwXSXV+vXy3Xpje4kkyWFIN47N0feuHqa05DiL0wEAgIsJpBsw4gMgag3plaqnbs/Xa9+crCuH9pTflF7eUKjbnv1QJ+oarY4HAACCiOIDIOqNyHbr93cU6B9fm6geqfH6pNirOX9YJ299k9XRAABAkFB8AOCU/L7d9ee7L1P3pFhtKazSXX/8SHWNzVbHAgAAQUDxAYDTXJKZqhfnXqbUhBitP3Rcdz+/XvVNPqtjAQCATqL4AMC/GZHt1vN3jVdynFNr91Vo3p82UH4AAIhwIS8+R44c0bRp05SXl6dRo0bppZdeCnUEALiosX266/d3FCgh1qGVu8p0zePvacOh41bHAgAAHRTy5ayLi4t17NgxjRkzRqWlpRo7dqx27dql5OTkdr2f5awBhNLaveW672+bVV7TIMOQ7pjYT9+ZNURJcTFWRwMAIOqF9XLWvXv31pgxYyRJPXv2VFpamiorK0MdAwDaZeKgDL01f6puGpsj05T+uOagZj22Wmv2llsdDQAABCDg4rN69Wpde+21ysrKkmEYWrJkyVnnLFy4UP3791dCQoLy8/P17rvvnvNa69evl9/vV25ubsDBASBUuiXF6ZEvjNbzd41XdrdEHak8qdue/VA/e22HGpp59gcAgEgQcPGpra3V6NGj9cQTT5zz9UWLFun+++/X97//fW3atElTpkzR7Nmzdfjw4TPOq6io0Je//GU988wzHUsOACF2xSU9tOxbU/Wfl/eRJD373gHd4FmrvaXVFicDAAAX06lnfAzD0OLFi3X99de3Hbvssss0duxYPfnkk23Hhg0bpuuvv14LFiyQJDU0NOiqq67SV77yFd1+++0X/IyGhgY1NDS0fe31epWbm8szPgAs9daOY/rOy1t0vK5JCbEO/fCaPN06LlcxThbLBAAgVCx7xqexsVEbNmzQzJkzzzg+c+ZMrV27VpJkmqbuuOMOzZgx46KlR5IWLFggt9vd9otpcQDCwWfyMvXG/VM1eVCG6pv8+v7ibRr2ozd01aOrNO/FDfrlsp0qPF5ndUwAAHBKUItPeXm5fD6fMjMzzziemZmpkpISSdKaNWu0aNEiLVmyRGPGjNGYMWO0devW817zoYceUlVVVduvI0eOBDMyAHRYpitBL9w1Xt+7eqhS4mPU5DO1p7RGb2wvkWfFPt3y1Puqa2y2OiYAAJDUJeuxGoZxxtemabYdmzx5svx+f7uvFR8fr/j4+KDmA4BgcTgMfXXqQN09eYCKqk5qX1mt9pfV6Her96uoql5Pr9qvb111idUxAQCIekEd8cnIyJDT6Wwb3WlVWlp61igQANiJw2E
op3uSrrikh+6c1F8/uCZPkvTUqn06euKkxekAAEBQi09cXJzy8/O1fPnyM44vX75cEydODOZHAUBYmz2ily7rn6aGZr8WLP3E6jgAAES9gItPTU2NNm/erM2bN0uSDhw4oM2bN7ctVz1//nw9++yz+sMf/qBPPvlE3/rWt3T48GHNmzcvqMEBIJwZhqEfXZsnhyG99nGx1h1go2YAAKwU8DM+69ev1/Tp09u+nj9/viRpzpw5eu6553TrrbeqoqJCDz/8sIqLizVixAgtXbpUffv2DV5qAIgAw7Pc+uL4PvrLh4f1039t16vfmCynw7j4GwEAQNB1ah+fUPJ4PPJ4PPL5fNq9ezf7+ACICBU1DZr2q5Wqrm/Wf39+uG6f0M/qSAAA2EYg+/hETPFpFcj/HACEg2ff3a+f/V/Lcz6fG9Vb37t6mLK7JVqcCgCAyGfZBqYAgLPdMbGf7pjYTw5D+r+Pi3XlIyv1m7f26GSjz+poAABEDUZ8ACBEdhR59ZN/bW9b6MCdGKsvjs/Vlyf0YwQIAIAOYKobAIQp0zT12sfF+uWyXTpcWSdJcjoMzRqeqTsm9ldBv+5nbQINAADOjeIDAGHO5zf1zs5S/XHNAa3dV9F2fHiWS3dO6q9rR/dWfIzTwoQAAIQ/ig8ARJCdJV49t+agFm86qoZmvySpZ2q8vjJlgL50WR8lxwe88wAAAFGB4gMAEaiytlF/++iwXlh7SCXeeklSt6RY3XhpjlITzi4/DsPQsN6pmjQog3IEAIhKtiw+7OMDIFo0Nvu1ZNNRPblqnw6U1170/FinoYJ+aZo2pIemDempwT1TeE4IABAVbFl8WjHiAyBa+Pym3thWog8PVOhcf1PXN/n04YHKtkUSWmW5E3TFkJ6aNqSHJg3KUAqjQQAAm6L4AECUME1TB8prtWp3mVbuKtMH+yvanhNq1Tr4k5maoEX3XK6+6ckWJAUAIPgoPgAQpU42+vTBgQqt2lWmlbtKdbDizNGgaUN66I93FDAVDgBgC4F0A+Y/AICNJMY5NX1IT00f0lPScFXWNqrZ71fxiXrd/NRardxVpjd3HNOs4b2sjgoAQEg5rA4AAOg6aclx6pmaoNG53fTVqQMkSQ//a4fqGpstTgYAQGgx4gMAUeLe6YO0ZFORjp44qXte3KB+6cmKi3HoiktaFkFwOpj+BgCwL57xAYAosmx7ie55ccNZxzNd8frq1IGaO7m/BakAAOgYWz7jc/o+PgCAjpmZl6nffHGM9pe17A9UVtOgpVuLdczboP9+bYf6ZyRpxtBMi1MCABB8jPgAQJRrbPbrv1/boRc/OKQsd4LenH8Fe/8AACJCIN2AxQ0AIMrFxTj00NVDlZuWqKKqev1q2S6rIwEAEHQUHwCAkuJi9PMbRkqSnn//oFbsLLU4EQAAwUXxAQBIkqYM7qFb8nNkmtLdL6zXC+8ftDoSAABBwzM+AIA2Dc0+fe+VbfrHxkJJUk73RMU4DPV2J2rOxL66Kq8Xy14DAMKGLVd1AwB0vfgYp351yygN6pmi/122U4XHT0qSDlbU6f39FRqQkaxn54zTgB4pFicFACAwjPgAAM6p8HidjnnrZZrSqt1levGDQzpR16SR2W698vWJinUyWxoAYC1WdQMAdFpO9yTl903TuH5p+vbMIVp2/1R1S4rV1qNVevztPVbHAwAgIBQfAEC7ZLoS9LPrR0iSPCv3adPh4xYnAgCg/SKm+Hg8HuXl5amgoMDqKAAQta4ZlaXPj8mSz29qwes7rY4DAEC78YwPACAgxVUnNfV/V6jJZ+qleRNU0C/N6kgAgCjFMz4AgC7T252om/NzJEkLV+y1OA0AAO1D8QEABOyeqQPlMKQVu8r0xrYSbS2sUn2Tz+pYAACcF8UHABCwfhnJunZ0liRp3p826Non3tOdf/xIETZ7GgAQRSg+AIAOmX/VJcrr7VJ2t0TFOAy9v79CS7eWWB0LAIBzovgAADqkb3qylv7XFK1
5cIa+MWOQJOnnSz9hyhsAICxRfAAAnXbP1IHq7U7Q0RMn9dhbe+T3M+UNABBeKD4AgE5LjHPqwdlDJUlPrdqnLz7zgfaV1VicCgCAT1F8AABBcd3oLP3wmjwlxjq17mClvvjMByqrbrA6FgAAkig+AIAgMQxDcyf311vfvkKDe6aorLpB9/11k3xMewMAhAHDjLC1RwPZnRUAYI29pTW67on3VNfoU293guJiWn7Olhjr1N1TBuimsdkyDMPilACASBdIN4iYER+Px6O8vDwVFBRYHQUAcBGDeqbof24aJafDUHFVvQ5V1OlQRZ12llTrgZe2aP7ft6imodnqmACAKMKIDwCgyxSdOKniqvq2r9fsLddv3t4jn99Uv/QkPf4fYzUyx21hQgBAJAukG1B8AAAhtf5gpe776yYVVdUrIdah1745RYN6plgdCwAQgWw51Q0AYA/j+qVp6X9N0fj+aapv8ut7r2xl3x8AQJej+AAAQq5bUpweuWV029LXP/jnNj235oDKa1j+GgDQNSg+AABL5KYl6dszL5Ek/eXDw/rJv3bo2sff09bCKtU3+VTf5FOTz29xSgCAXcRYHQAAEL3unNRfNQ3NOlxZp42HjutgRZ2ufeK9ttfjYhz6xvRB+uaMQSx/DQDoFBY3AACEhaqTTfrWos16Z2fpWa99Zlim+qUn6ab8HA3rzd/9AIAWrOoGAIhYdY3Nal3r4JWNhfrxq9vV+i9VcpxTf7ijQJcNSLcuIAAgbFB8AAC28cH+Cq3dW67391foo4PHlRjr1Bv3T1Hf9GSrowEALMZy1gAA27h8QLrmzxyiF+depoJ+3XWyySfPir1WxwIARBiKDwAgIiTEOvXg7GGSpFc2HlXh8TqLEwEAIgnFBwAQMfL7dtekQelq9pv6nzd2KcJmawMALETxAQBElPlXXSKHIf1rS5F+8cZOq+MAACIExQcAEFHy+6bpFzeOkiQ9vWq/Pi48YW0gAEBEiJji4/F4lJeXp4KCAqujAAAs9oWCXM0e0UuS9NYnZ+/7AwDAv4uY4nPvvfdqx44d+uijj6yOAgAIAzOG9pQkrdxF8QEAXFzEFB8AAE53xZAekqSPC6tUVt1gcRoAQLij+AAAIlLP1ASNyG7ZrG717jKL0wAAwh3FBwAQsaYPaZnu9vz7B1Xf5LM4DQAgnFF8AAAR64vj+8idGKuPC6t013Mf6dHlu1VRw7Q3AMDZKD4AgIiV3S1RT3zpUjkMae2+Cv327T366osb1OTzWx0NABBmKD4AgIg2ZXAPLbpngr4xfZBS42O04dBxPbp8t9WxAABhhuIDAIh4Bf3S9MCsIfqfm1s2Nn1y5T6tYsEDAMBpKD4AANu4emRv3X55X0nSPS+u14xHVmrZ9hKLUwEAwgHFBwBgK9//3DCNzu2m+ia/9pfV6oGXtqi46qTVsQAAFqP4AABsJSHWqVe+NlGvfmOSRud2U3V9s36weJvVsQAAFqP4AABsx+kwNCqnmx65ZbQMQ3p7Z6lKquqtjgUAsBDFBwBgW4N6pmhsn+6SpDd38KwPAEQzig8AwNY+O7yXJOmNbRQfAIhmFB8AgK3NOlV8PjxQqbLqBovTAACsQvEBANhan/QkjcntJp/f1KPLd1kdBwBgEYoPAMD2vv+5YZKkv647oi88/b5e3VJkcSIAQKhFTPHxeDzKy8tTQUGB1VEAABGmoF+absnPkSStO1CpB//xsUq9rPIGANHEME3TtDpEILxer9xut6qqquRyuayOAwCIEPVNPq3ZW67fvL1HHxdWqaBfd43Idre9bsjQ50b1Un7fNAtTAgACEUg3oPgAAKLKhkPHddOTa8/5WnpynFZ/d7qS42NCnAoA0BGBdAP+ZgcARJX8vt218Lax2na06ozj/9xcpKMnTuobf9moH1yTp4E9UixKCADoCoz4AAAgafGmQn1r0RZJUlyMQ8u/NVV905MtTgUAuJBAukHELG4AAEBXum50tm4dlytJamz
266lV+yxOBAAIJooPAACSnA5D/3PzKL00b4KklqWvJy54W3uOVVucDAAQDBQfAABOU9AvTdOG9JAkFVXV67fv7LU4EQAgGFjcAACAf/Psl8dp2fZjuvcvG7V0a7EGZCQrxmGcdZ5hSFfl9dKQXqkWpAQABILFDQAAOI/bnv1Aa/ZWXPCcnO6JWvWd6XKeoxgBALoWy1kDABAEP79hpP645qAamv3nfP21j4tUePyk3ttbrisu6RHidACAQFB8AAA4j77pyfrJdcPP+3p8jEPPrT2orzy/XvGxDs2d3F/3f+aSECYEALQXixsAANBB/3l5X8XHONTo86u6vllPrdqnmoZmq2MBAM6BER8AADpoUM8Urfv+Z3S8tlF3Pf+R9pfVauGKvcrv211xMQ6N75+m+Bin1TEBAKL4AADQKe7EWLkTY3Xjpdn61Zu7tXDlpxuffnXqAH3v6mEWpgMAtGKqGwAAQXDbZX115dCeGp3bTUNPLW+9eNNR+fwRtXgqANgWy1kDABBkjc1+jfvZcnnrmzVtSA95vjRWyfFMsgCAYAukGzDiAwBAkMXFOPTZEb0kSSt3len37x2wOBEAgOIDAEAXeGDWEA3okSxJWra9xOI0AACKDwAAXaBnaoJeumeCHIa0vcir+/66Sc2+c2+ECgDoehQfAAC6SHpKvCYOzJAkvbqlSMt3HLM4EQBEL4oPAABd6Fe3jFZ6cpwk6Z2dpRanAYDoxRIzAAB0oV7uBP32Py7Vbc9+qDe2l6jR55fTMHTb5X2U3zfN6ngAEDUoPgAAdLGCfmnqlhSrE3VN+ufmIknSrmPV+r/7plicDACiB8UHAIAuFhfj0J/mXqYPD1Sq2efXgtd3anuRV+U1DcpIibc6HgBEBZ7xAQAgBEZkuzV3cn/dc8VADe2VKkn6+p82qr7JZ3EyAIgOFB8AAELsikt6SJLWHazUnz44ZHEaAIgOEVN8PB6P8vLyVFBQYHUUAAA65atTB8iV0DLb/L295RanAYDoYJimaVodIhBer1dut1tVVVVyuVxWxwEAoEO2Ha3SNY+/p+Q4pxbfO0kO49PXMl0JSk2ItS4cAESIQLoBixsAAGCBvN4uuRNjVXWySTN/vfqM17olxWrVd6bLnUj5AYBgiZipbgAA2InDYeirUwcoPTlO3ZNi237FOAydqGvShkOVVkcEAFthxAcAAIvcO32Q7p0+6IxjD7y0RS9vKNSGQ8c1Y2imRckAwH4oPgAAhJGxfbrr5Q2FemXjUR2qqDvr9ZnDe+m60VkWJAOAyEbxAQAgjFw+IE2SVFxVr9c+Lj7r9bc+OaarR/RSjJPZ6gAQCIoPAABhZECPFL1w13jtL6s567VfLtul2kaf9pXVasipTVABAO1D8QEAIMxMvaSHpp7a5PR0S7eVaN2BSm07WkXxAYAAUXwAAIgQI7LcWnegUg++8rF++M9tbccTY5365S2jWAwBAC6ACcIAAESIK4f1lMOQmnym6hp9bb8qahu1eFOR1fEAIKwx4gMAQISYNChDG35wlWoamtuOfXigUg+8tEW7SrwWJgOA8EfxAQAggnRPjlP35Li2rx0OQ5K0v6xWtQ3NSox1th0DAHyKqW4AAESwLHeCUuNj1Ow3NfzHyzThF2+rrLrB6lgAEHYoPgAARDDDMHTN6N5tXx/zNuj9/RUWJgKA8ETxAQAgwi24cZS2/mSmbrw0W5K0r/TsPYAAINpRfAAAsIHUhNi2vX32nWPzUwCIdixuAACATQzqmSJJeuuTY5rxyEpJLXv/PHbrGBY8ABD1GPEBAMAmRua4FRfjUH2TX/vLarW/rFavbinSrmPVVkcDAMsx4gMAgE30TE3Qygem6UhlnSTpR//crl3HqnWoolbDerssTgcA1qL4AABgI1ndEpXVLVGSNLR3qnYdq9bBijqLUwGA9Sg+AADYVN/0ZEnSu3vKlJESf97zYp2Gpg3pKXdibKiiAUDIUXwAALCpARktxWfN3gqt2Xv
hvX1uyc/RL28ZHYpYAGAJig8AADY1c3imbsnPUVlNw3nPOV7bqC2FVSyAAMD2KD4AANhUUlzMRUdxthdV6XO/fU9FJ06GKBUAWIPlrAEAiGLZpxZCKK9pVH2Tz+I0ANB1GPEBACCKuRNjlRTnVF2jTz9Ysk0p8Rf+1sAwpOtGZ+nSPt1DlBAAgoPiAwBAFDMMQ/0zkrW9yKuXNxS26z0fHazUa9+c0sXJACC4KD4AAES5X948Wm9sL5Fpmhc873hdo/70wWEdPc7zQAAiD8UHAIAol5flUl6W66LnVda2FJ/jdU1qbPYrLoZHhQFEDv7GAgAA7dItMVaxTkOSVH6BJbIBIBwx4gMAANrF4TCUkRKv4qp6/e2jI8o5tSKcJMXHOvSZYZlKvsjiCABgFf52AgAA7dbLnaDiqnr99u09Z7127/SB+s6soRakAoCLo/gAAIB2+/ZVQ/T8+wfl93+6EMLREye1s6RaB8prLUwGABdG8QEAAO02eXCGJg/OOOPYq1uKdN9fN6miptGiVABwcSxuAAAAOiU9OU5Sy6pvABCuKD4AAKBT0ig+ACIAU90AAECntI341DXqi8+83673fH5Mtv5jfJ+ujAUAZ6D4AACATumeHKfuSbE6XtekD/ZXtus9u0qqKT4AQsqS4nPDDTdo5cqVuvLKK/Xyyy9bEQEAAARJrNOhxV+fpK1Hqy56bk1Dsx56ZatOnGyS32/K4TBCkBAALCo+9913n+666y49//zzVnw8AAAIsn4ZyeqXkXzR8xqafXrola0yTam6oVnuxNgQpAMAixY3mD59ulJTU634aAAAYKH4GKcSYlu+/fCebLI4DYBoEnDxWb16ta699lplZWXJMAwtWbLkrHMWLlyo/v37KyEhQfn5+Xr33XeDkRUAANhA6yhPFcUHQAgFPNWttrZWo0eP1p133qmbbrrprNcXLVqk+++/XwsXLtSkSZP09NNPa/bs2dqxY4f69An8IcaGhgY1NDS0fe31egO+BgAACB+uhFgd8zboiXf2qpc74aLnZ6TE6e4pA5QQ6wxBOgB2FXDxmT17tmbPnn3e1x999FHNnTtXd999tyTpscce07Jly/Tkk09qwYIFAQdcsGCBfvrTnwb8PgAAEJ56uRO0p7RGb2wvafd7+qYn69rRWV2YCoDdBXVxg8bGRm3YsEEPPvjgGcdnzpyptWvXduiaDz30kObPn9/2tdfrVW5ubqdyAgAA6/zkuuF6dXORfH7zoue+uaNEu4/VqKKm4aLnAsCFBLX4lJeXy+fzKTMz84zjmZmZKin59Kc6s2bN0saNG1VbW6ucnBwtXrxYBQUF57xmfHy84uPjgxkTAABYaGCPFH3rqkvadW5ZdYN2H6tRTUNzF6cCYHddspy1YZy5Jr9pmmccW7ZsWVd8LAAAsJmUhJZvVWoafBYnARDpgrqcdUZGhpxO5xmjO5JUWlp61igQAADAxSTHtxSfWkZ8AHRSUItPXFyc8vPztXz58jOOL1++XBMnTgzmRwEAgCiQEt+ykhtT3QB0VsBT3WpqarR37962rw8cOKDNmzcrLS1Nffr00fz583X77bdr3LhxmjBhgp555hkdPnxY8+bNC2pwAABgfynxLXv+rNpdpv945oNznpMU59SDs4dqcCabowM4v4CLz/r16zV9+vS2r1tXXJszZ46ee+453XrrraqoqNDDDz+s4uJijRgxQkuXLlXfvn07FdTj8cjj8cjnY44vAADRom96kiSpsrZR7++vOO95/TKS9cNr8kIVC0AEMkzTvPhakmHE6/XK7XarqqpKLpfL6jgAAKALmaapDw9Uqqz63MtZv7njmP61pUhfLMjVL24aFeJ0AKwWSDfoklXdAAAAgsEwDF0+IP28r5dWN+hfW4pU18iMEAAXFtTFDQAAAEIpKa5l8QOKD4CLofgAAICIlRjbUnxONrHqG4ALo/gAAICIlciID4B24hkfAAAQsVqnunlPNqm46uR5z0uMdap
bUlyoYgEIQxQfAAAQsVqLz76yWk1Y8M55z3MY0sLb8vXZEb1CFQ1AmImYqW4ej0d5eXkqKCiwOgoAAAgTw3q7NLRXquKcjvP+chiS35Q2HzlhdVwAFmIfHwAAYGu/XLZTnhX7dMfEfvrJdcOtjgMgiALpBhEz4gMAANAR8TEt0+Eamv0WJwFgJYoPAACwtYTYlm93GppY+Q2IZhQfAABgawmn9vqpb6b4ANGM4gMAAGwtPqZ1xIepbkA0o/gAAABbY8QHgMQ+PgAAwOZaFzc4VFGnp1ftu+C5/TKSNWs4e/0AdkTxAQAAtuZKbPl2p/D4SS14fedFz3/721doYI+Uro4FIMQipvh4PB55PB75fAxTAwCA9hvfL033zRikoyfqL3jem9tLVN3QrIqaRg3sEaJwAEKGDUwBAAAkffax1dpZUq0/zb1MkwdnWB0HQDuwgSkAAECA4k6t/tbI7BLAlig+AAAAkmKdp4pPc0RNhgHQThQfAAAASXGtxcfHfj+AHVF8AAAAdNpUt2aKD2BHFB8AAAB9OtWtiREfwJYoPgAAAJLiGfEBbC1i9vEBAADoSq1T3VbuKlVtY3PA789MTdD1l2bL6TCCHQ1AEFB8AAAAJKUmtHxbtGJXmVbsKuvQNXqkxmvqJex+CoSjiCk+Ho9HHo9HPtbWBwAAXeDuyQNkSDrZFPj3Gqt2l+mYt0HH6xqDHwxAUBimaUbUYvWB7M4KAAAQCnP+sE6rdpfpV7eM1s35OVbHAaJGIN2AxQ0AAAA6KdbZ8lxPMyvCAWGL4gMAANBJMY5TS2H7I2oiDRBVKD4AAACdFMOIDxD2KD4AAACd1Lr5abOPER8gXFF8AAAAOql1755mproBYYviAwAA0EksbgCEP4oPAABAJ7G4ARD+KD4AAACdxOIGQPiLsToAAABApGtd3GDZ9hIdOX4y4Pc7DOmW/FxNHpwR7GgATomY4uPxeOTxeOTz+ayOAgAAcIaMlDhJ0r6yWu0rq+3QNfaX1Wry4MnBjAXgNIZpmhE1GdXr9crtdquqqkoul8vqOAAAAKprbNZrHxertqE54PceqqjTc2sPamCPZL397WnBDwfYWCDdIGJGfAAAAMJVUlyMvjAut0Pv/ehgpZ5be1CsiwB0LRY3AAAAsJDDaFkYwUfzAboUxQcAAMBCrZufUnyArkXxAQAAsJCTER8gJCg+AAAAFmob8Yms9aaAiEPxAQAAsFBr8fEz4gN0KYoPAACAhU7tfcqID9DFKD4AAAAWcjpavh3z+Sg+QFei+AAAAFiobXEDRnyALkXxAQAAsJCjdaobz/gAXYriAwAAYCH28QFCI8bqAAAAANHs9OWstx2tCso101Pi1NudGJRrAXZB8QEAALBQzKm5bqYpXfP4e0G5pmFIi78+SWNyuwXleoAdREzx8Xg88ng88vl8VkcBAAAImu5JsbpudJbWHagMyvUqaxvV6PNrf1kNxQc4jWGakbWEiNfrldvtVlVVlVwul9VxAAAAwsqcP6zTqt1l+tUto3Vzfo7VcYAuFUg3YHEDAAAAGzn1yJD8kfWzbaDLUXwAAABsxHFqX6AIm9QDdDmKDwAAgI0Yp4oPq2MDZ6L4AAAA2AhT3YBzo/gAAADYiIMRH+CcKD4AAAA2cmpbIJ7xAf4NxQcAAMBG2p7xYcgHOAPFBwAAwEaY6gacG8UHAADARljcADg3ig8AAICNfLqPj8VBgDBD8QEAALARgxEf4JwoPgAAADbCMz7AuVF8AAAAbIRnfIBzo/gAAADYyKfP+FB8gNNRfAAAAGzEYKobcE4xVgcAAABA8LROdVuxq1RVJ5tC8plOh6Hrx2QrL8sVks8DOiJiio/H45HH45HP57M6CgAAQNhyJcZKkjYdPqFNh0+E7HO3Ha3SX75yecg+DwiUYUbYBFCv1yu3262qqiq5XPxUAQAA4HRl1Q3667rDqmsMzQ+LD1XU6vVtJRqV49ar35gcks8EWgXSDSJmxAcAAAA
X1yM1XvddOThkn7diZ6le31bCKnIIeyxuAAAAgA5znHqoyO+3OAhwERQfAAAAdBj7BiFSUHwAAADQYZ/uG2RxEOAiKD4AAADoMIMRH0QIig8AAAA6zNG2YSrFB+GN4gMAAIAOY6obIgXFBwAAAB3G4gaIFBQfAAAAdJjRNtXN4iDARVB8AAAA0GGM+CBSUHwAAADQYTzjg0hB8QEAAECHsaobIgXFBwAAAB3GPj6IFBQfAAAAdJiDxQ0QISg+AAAA6DDHqe8m/TQfhDmKDwAAADqMZ3wQKSg+AAAA6LBPl7O2NgdwMRQfAAAAdJjBiA8iBMUHAAAAHeZkHx9ECIoPAAAAOoxnfBApKD4AAADoMPbxQaSg+AAAAKDDHA728UFkoPgAAACgw1pXdTMZ8UGYi7E6AAAAACJX6zM+zX5Tq3aXWZzm3IZnuZSREm91DFiM4gMAAIAOi3F8uqrbnD+sszjNufV2J2jtgzPalt5GdIqY4uPxeOTxeOTz+ayOAgAAgFPSU+I1Z0JfrT903OooZ/H5Te0sqVZxVb1M89OFGBCdDDPCJmR6vV653W5VVVXJ5XJZHQcAAABh6kRdo8Y8vFyStO/nV8vpoPnYTSDdgMUNAAAAYHsR9rN+dAGKDwAAAGzJ0KcjPNQeUHwAAABgT6fNbGPABxQfAAAA2BKLGeB0FB8AAADYnslkt6hH8QEAAIAtnT7gw1Q3UHwAAABgS2xYitNRfAAAAGBL1B6cjuIDAAAA22OqGyg+AAAAsKXTZ7qxuAEoPgAAALClMzYwpfdEPYoPAAAAbOnMER9EO4oPAAAAANuj+AAAAMD2TOa6RT2KDwAAAGyJqW44HcUHAAAAtsTiBjgdxQcAAACA7VF8AAAAYEunT3VjrhsoPgAAALClM3sPzSfaUXwAAABgS4bBMz74FMUHAAAAtkfvAcUHAAAAtmRc/BREEYoPAAAAbOmMfXyY6xb1KD4AAACwpTOe8bEwB8IDxQcAAAC2x4APKD4AAAAAbI/iAwAAANtqne3GPj6g+AAAAMC22p7yofdEPYoPAAAAbI/eA4oPAAAAbOv0ld0Q3Sg+AAAAsK3W2sOqbqD4AAAAwLZY3ACtKD4AAACwPUZ8QPEBAACAbRmnJrvRe0DxAQAAgH2xtgFOofgAAADAtj5d3IAxn2hH8QEAAIDt0XtgSfF57bXXNGTIEA0ePFjPPvusFREAAAAQBdjGB61iQv2Bzc3Nmj9/vlasWCGXy6WxY8fqxhtvVFpaWqijAAAAwOYMHvLBKSEf8Vm3bp2GDx+u7Oxspaam6uqrr9ayZctCHQMAAABRoG0fH6a6Rb2Ai8/q1at17bXXKisrS4ZhaMmSJWeds3DhQvXv318JCQnKz8/Xu+++2/ZaUVGRsrOz277OycnR0aNHO5YeAAAAaAc2MEXAU91qa2s1evRo3XnnnbrpppvOen3RokW6//77tXDhQk2aNElPP/20Zs+erR07dqhPnz7nXFHDYPIlAAAAukDrd5nLtpcoIyXe0ix2khIfo5nDe1kdIyABF5/Zs2dr9uzZ53390Ucf1dy5c3X33XdLkh577DEtW7ZMTz75pBYsWKDs7OwzRngKCwt12WWXnfd6DQ0NamhoaPva6/UGGhkAAABRKi7GodpGn36+dKfVUWxlQEay/YvPhTQ2NmrDhg168MEHzzg+c+ZMrV27VpI0fvx4bdu2TUePHpXL5dLSpUv1ox/96LzXXLBggX76058GMyYAAACixEOzh+m1rcVWx7Cd3q4EqyMELKjFp7y8XD6fT5mZmWccz8zMVElJScsHxsTokUce0fTp0+X3+/Xd735X6enp573mQw89pPnz57d97fV6lZubG8zYAAAAsKkvFOTqCwV874guWs7635/ZMU3zjGPXXXedrrvuunZdKz4+XvHxzMcEAAAA0HFBXc46IyNDTqezbXSnVWlp6VmjQAAAAAAQKkEtPnFxccrPz9fy5cvPOL58+XJNnDgxmB8
FAAAAAO0W8FS3mpoa7d27t+3rAwcOaPPmzUpLS1OfPn00f/583X777Ro3bpwmTJigZ555RocPH9a8efOCGhwAAAAA2ivg4rN+/XpNnz697evWhQfmzJmj5557TrfeeqsqKir08MMPq7i4WCNGjNDSpUvVt2/f4KUGAAAAgAAY5rl2FA1DHo9HHo9HPp9Pu3fvVlVVlVwul9WxAAAAAFjE6/XK7Xa3qxtETPFpFcj/HAAAAAD7CqQbBHVxAwAAAAAIRxQfAAAAALZH8QEAAABgexQfAAAAALZH8QEAAABgexFTfDwej/Ly8lRQUGB1FAAAAAARhuWsAQAAAEQklrMGAAAAgNNQfAAAAADYHsUHAAAAgO1RfAAAAADYXozVAQLVuhaD1+u1OAkAAAAAK7V2gvas1xZxxae6ulqSlJuba3ESAAAAAOGgurpabrf7gudE3HLWfr9fRUVFSk1NlWEYkqSCggJ99NFH7b5Ge8+/2Hler1e5ubk6cuRI1CytHeifdVcKRZZgf0ZnrtfR9wbyvmDdGxL3h9Ui7f7o7LU68n7+7Qgd7g3rrsW9Ed64Nzp/LdM0VV1draysLDkcF36KJ+JGfBwOh3Jycs445nQ6A7pB2nt+e89zuVxRc4MG+mfdlUKRJdif0ZnrdfS9gbwv2PeGxP1hlUi7Pzp7rY68n387Qod7w7prcW+EN+6N4FzrYiM9rWyxuMG9997bJecHet1oEE5/JqHIEuzP6Mz1OvreQN7HvdE54fTnEmn3R2ev1ZH3829H6ITTnwn3RvDfw73RceH0ZxIN90bETXULJ4HsFAtEG+4P4Ny4N4Bz495AV7PFiI9V4uPj9eMf/1jx8fFWRwHCDvcHcG7cG8C5cW+gqzHiAwAAAMD2GPEBAAAAYHsUHwAAAAC2R/EBAAAAYHsUHwAAAAC2R/HpQq+99pqGDBmiwYMH69lnn7U6DhA2brjhBnXv3l0333yz1VGAsHHkyBFNmzZNeXl5GjVqlF566SWrIwFho7q6WgUFBRozZoxGjhyp3/3ud1ZHQgRiVbcu0tzcrLy8PK1YsUIul0tjx47Vhx9+qLS0NKujAZZbsWKFampq9Pzzz+vll1+2Og4QFoqLi3Xs2DGNGTNGpaWlGjt2rHbt2qXk5GSrowGW8/l8amhoUFJSkurq6jRixAh99NFHSk9PtzoaIggjPl1k3bp1Gj58uLKzs5Wamqqrr75ay5YtszoWEBamT5+u1NRUq2MAYaV3794aM2aMJKlnz55KS0tTZWWltaGAMOF0OpWUlCRJqq+vl8/nEz+7R6AoPuexevVqXXvttcrKypJhGFqyZMlZ5yxcuFD9+/dXQkKC8vPz9e6777a9VlRUpOzs7Lavc3JydPTo0VBEB7pUZ+8NwK6CeW+sX79efr9fubm5XZwaCI1g3B8nTpzQ6NGjlZOTo+9+97vKyMgIUXrYBcXnPGprazV69Gg98cQT53x90aJFuv/++/X9739fmzZt0pQpUzR79mwdPnxYks75UwjDMLo0MxAKnb03ALsK1r1RUVGhL3/5y3rmmWdCERsIiWDcH926ddOWLVt04MAB/eUvf9GxY8dCFR92YeKiJJmLFy8+49j48ePNefPmnXFs6NCh5oMPPmiapmmuWbPGvP7669teu++++8w///nPXZ4VCKWO3ButVqxYYd50001dHRGwREfvjfr6enPKlCnmCy+8EIqYgCU6829Hq3nz5pl///vfuyoibIoRnw5obGzUhg0bNHPmzDOOz5w5U2vXrpUkjR8/Xtu2bdPRo0dVXV2tpUuXatasWVbEBUKmPfcGEI3ac2+Ypqk77rhDM2bM0O23325FTMAS7bk/jh07Jq/XK0nyer1avXq1hgwZEvKsiGwxVgeIROXl5fL5fMrMzDzjeGZmpkpKSiRJMTExeuSRRzR9+nT5/X5997vfZeUR2F577g1JmjVrljZu3Kja2lrl5ORo8eLFKigoCHVcIGTac2+sWbNGixY
t0qhRo9qef3jxxRc1cuTIUMcFQqo990dhYaHmzp0r0zRlmqa+8Y1vaNSoUVbERQSj+HTCvz+zY5rmGceuu+46XXfddaGOBVjuYvcGKxwiWl3o3pg8ebL8fr8VsYCwcKH7Iz8/X5s3b7YgFeyEqW4dkJGRIafTecZPsCWptLT0rJ9WANGEewM4N+4N4Py4PxAqFJ8OiIuLU35+vpYvX37G8eXLl2vixIkWpQKsx70BnBv3BnB+3B8IFaa6nUdNTY327t3b9vWBAwe0efNmpaWlqU+fPpo/f75uv/12jRs3ThMmTNAzzzyjw4cPa968eRamBroe9wZwbtwbwPlxfyAsWLiiXFhbsWKFKemsX3PmzGk7x+PxmH379jXj4uLMsWPHmqtWrbIuMBAi3BvAuXFvAOfH/YFwYJjmOXbaBAAAAAAb4RkfAAAAALZH8QEAAABgexQfAAAAALZH8QEAAABgexQfAAAAALZH8QEAAABgexQfAAAAALZH8QEAAABgexQfAAAAALZH8QEAAABgexQfAAAAALZH8QEAAABgexQfAAAAALb3/wPnnG/POxhdPAAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.xscale('log')\n", "plt.yscale('log') \n", "plt.plot(ranks, sorted_counts)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "In log-log space, such rank-versus-frequency graphs are approximately **linear**\n", "\n", "* Known as **Zipf's Law**\n", "\n", "Let $r_w$ be the rank of a word \\\\(w\\\\), and \\\\(f_w\\\\) its frequency:\n", "\n", "$$\n", " f_w \\propto \\frac{1}{r_w}.\n", "$$" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Out-of-Vocabulary (OOV) Tokens\n", "In your test set, there will virtually always be words with zero training counts.\n", "\n", "Why is this a problem?" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "* If the probability of a word in the test set is 0, the probability of the entire test set is 0\n", "  * Perplexity is based on the inverse probability of the test set\n", "  * Since we cannot divide by 0, perplexity is undefined\n", "* The probability of unseen words is underestimated\n", "  * Downstream application performance suffers" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Solutions:\n", "1. Remove unseen words from the test set (pretend there is no problem)?\n", "2. Use subword tokenisation to ensure there are no `OOV` tokens?\n", "3. **Replace unseen words with an out-of-vocabulary token and estimate its probability (`OOV` injection)?**\n", "4. Move probability mass to unseen words (smoothing)?" 
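] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "To see concretely why a single unseen word breaks evaluation, here is a minimal sketch that computes perplexity, $\\perplexity = \\prob(w_1,\\ldots,w_N)^{-1/N}$, in log space from a list of per-token probabilities. The helper `perplexity_from_probs` is hypothetical and is not the `perplexity` function used elsewhere in this notebook.\n", "
```python
import math

def perplexity_from_probs(probs):
    # PP = exp(-(1/N) * sum_i log p_i), with one probability p_i per test token
    if min(probs) == 0.0:
        # log(0) is undefined: a single zero makes the whole test-set probability 0
        return math.inf
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

print(perplexity_from_probs([0.1, 0.2, 0.05]))  # about 10, the inverse geometric mean
print(perplexity_from_probs([0.1, 0.2, 0.0]))   # inf
```
", "\n", "Returning `math.inf` is a convention chosen for this sketch; the underlying point is that no finite perplexity exists once any test token has probability 0." 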
] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### `OOV` Injection Procedures\n", "\n", "* For the test set: mark all words not in the training vocabulary as `OOV`\n", "* For the training set: when a word occurs for the first time, mark it as `OOV`" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Replacing Words with OOV Tokens" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['with', 'the', 'lyrics', 'of', 'the', 'year', 'Than', 'the', 'gimmick', 'with', 'the', 'gear', 'and', 'the', 'right', 'puppeteer', 'Now', 'you', 'can', 'be', 'the', 'next', 'rock', 'Shakespear', 'you', \"'\", 're', 'still', '10', 'steps', 'away', 'from', 'having', 'a', 'career', 'You', 'step', 'up', 'the', 'plate']\n", "['with', 'the', 'lyrics', 'of', 'the', '[OOV]', '[OOV]', 'the', '[OOV]', 'with', 'the', '[OOV]', 'and', 'the', 'right', '[OOV]', 'Now', 'you', 'can', 'be', 'the', 'next', 'rock', '[OOV]', 'you', \"'\", 're', 'still', '10', 'steps', 'away', 'from', '[OOV]', 'a', 'career', 'You', '[OOV]', 'up', 'the', 'plate']\n" ] } ], "source": [ "print(test[60:100])\n", "\n", "# Replace every word not within the vocabulary with the `OOV` symbol\n", "# [word if word in vocab else OOV for word in data]\n", "print(replace_OOVs(baseline.vocab, test[60:100]))" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Injecting OOV Tokens for New Word Events\n", "\n", "Consider the \"words\"\n", "\n", "> AA AA BB BB AA\n", "\n", "Going left to right, how often do I see new words?" 
] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Inject `OOV` tokens to mark these \"new word events\"" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['[OOV]', 'AA', '[OOV]', 'BB', 'AA']" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inject_OOVs([\"AA\",\"AA\",\"BB\",\"BB\",\"AA\"])" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Estimate `OOV` Probability\n", "What is the probability of seeing a word you haven't seen before?" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Train on the `OOV`-injected training data and evaluate on the `OOV`-replaced test data..." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "1287.9999999984573" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "oov_train = inject_OOVs(train)\n", "oov_vocab = set(oov_train)\n", "oov_test = replace_OOVs(oov_vocab, test)\n", "oov_baseline = UniformLM(oov_vocab)\n", "perplexity(oov_baseline,oov_test)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "
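" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "The helpers `replace_OOVs` and `inject_OOVs` come from the accompanying code and are not shown on these slides. As a rough sketch of the two procedures (our own reimplementation, which may differ in details from the notebook's versions), assuming the `OOV` symbol is the string `[OOV]` seen in the outputs above:\n", "
```python
OOV = '[OOV]'  # assumed OOV symbol, matching the printed outputs above

def replace_OOVs(vocab, data):
    # Test set: every word outside the training vocabulary becomes OOV
    return [word if word in vocab else OOV for word in data]

def inject_OOVs(data):
    # Training set: the first occurrence of each word is a new-word event, marked as OOV
    seen = set()
    result = []
    for word in data:
        result.append(word if word in seen else OOV)
        seen.add(word)
    return result

print(inject_OOVs(['AA', 'AA', 'BB', 'BB', 'AA']))  # ['[OOV]', 'AA', '[OOV]', 'BB', 'AA']
print(replace_OOVs({'AA'}, ['AA', 'CC']))           # ['AA', '[OOV]']
```
", "\n", "Because every word type contributes exactly one `OOV` token during injection, a unigram MLE trained on the injected data estimates the `OOV` probability as the number of distinct word types divided by the number of training tokens.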
\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "## [tinyurl.com/diku-nlp-oov](https://tinyurl.com/diku-nlp-oov)\n", "\n", "([Responses](https://docs.google.com/forms/d/1D9p_ej-puRhwGEl_P-h9RWxwvW9KK427qx63-9NB-20/edit#responses))" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### `OOV` and Perplexity\n", "\n", "* An LM can achieve low perplexity by choosing a small vocabulary and assigning high probability to unknown words\n", "  * Perplexities are vocabulary-dependent, so they are only comparable between models that share a vocabulary" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Training N-Gram Language Models" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "N-gram language models condition on a limited history: \n", "\n", "$$\n", "\\prob(w_i|w_1,\\ldots,w_{i-1}) = \\prob(w_i|w_{i-(n-1)},\\ldots,w_{i-1}).\n", "$$\n", "\n", "What are their parameters (continuous values that control their behaviour)?" 
] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "One parameter $\\param_{w,h}$ for each word $w$ and history $h=w_{i-(n-1)},\\ldots,w_{i-1}$ pair:\n", "\n", "$$\n", "\\prob_\\params(w|h) = \\param_{w,h}\n", "$$\n", "\n", "$\\prob_\\params(\\text{bigly}|\\text{win}) = \\param_{\\text{bigly, win}}$" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "### Maximum Likelihood Estimate\n", "\n", "Assume a training set \\\\(\\train=(w_1,\\ldots,w_d)\\\\)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "Find \\\\(\\params\\\\) that maximises the log-likelihood of \\\\(\\train\\\\):\n", "\n", "$$\n", "\\params^* = \\argmax_\\params \\log \\prob_\\params(\\train)\n", "$$\n", "\n", "where\n", "\n", "$$\n", "\\prob_\\params(\\train) = \\prod_{i=1}^{d} \\prob_\\params(w_i|w_{i-(n-1)},\\ldots,w_{i-1})\n", "$$\n", "\n", "**Structured Prediction**: this is your continuous optimisation problem!" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "The maximum-likelihood estimate (MLE) can be calculated in **closed form**:\n", "$$\n", "\\prob_{\\params^*}(w|h) = \\param^*_{w,h} = \\frac{\\counts{\\train}{h,w}}{\\counts{\\train}{h}} \n", "$$\n", "\n", "where \n", "\n", "$$\n", "\\counts{D}{e} = \\text{Count of } e \\text{ in } D \n", "$$\n", "\n", "Event $h$ means seeing the history $h$, and event $h,w$ means seeing the history $h$ followed by the word $w$. \n", "\n", "Many LM variants differ in how they estimate these counts. 
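" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "For a bigram model ($n=2$, so each history $h$ is a single word), the closed-form MLE reduces to counting. The sketch below is a minimal illustration under that assumption, not the notebook's `NGramLM`; the toy corpus is invented from the example sentences used earlier.\n", "
```python
from collections import Counter

def bigram_mle(tokens):
    # Events h: every token that is followed by another token
    history_counts = Counter(tokens[:-1])
    # Events (h, w): history h immediately followed by word w
    pair_counts = Counter(zip(tokens[:-1], tokens[1:]))
    # theta*_{w,h} = #(h, w) / #(h), stored under the key (h, w)
    return {(h, w): c / history_counts[h] for (h, w), c in pair_counts.items()}

toy_corpus = 'we are going to win bigly we are going to win big league'.split()
theta = bigram_mle(toy_corpus)
print(theta[('win', 'bigly')])  # 0.5
print(theta[('going', 'to')])   # 1.0
```
", "\n", "Counting histories over `tokens[:-1]` rather than all tokens ensures that the estimates for each history sum to 1. A pair never seen in training gets no entry, i.e. probability 0, which is the same zero-count problem discussed for `OOV` words.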
" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Training a Unigram Model\n", "Let us train a unigram model..." ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "source": [ "What do you think the most probable words are? \n", "\n", "Remember our training set looks like this ..." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "['bird', 'I', 'know', 'this', '[OOV]', '[OOV]', '[OOV]', 'is', 'this', '[OOV]']" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "oov_train[1000:1010]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAA0MAAAH5CAYAAABDDuXVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAnwUlEQVR4nO3de3BW5Z3A8V+4JVhIVLCAa+RivUAdQIJKoFAcBcXLateVqG0sFqy03QVEOhWxXti1qPUSbQWlVSndFakF640quCriQtkVE+tWxnFUDIthKKhEbQWFs38wvGsMpATBF30+n5kzQ06ec/KcM+RNvjnv+56CLMuyAAAASEyLfE8AAAAgH8QQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACSpVb4nsKds3bo13nzzzWjfvn0UFBTkezoAAECeZFkW7777bhx88MHRosXOr/98YWLozTffjNLS0nxPAwAA2EesXr06DjnkkJ1+/gsTQ+3bt4+IbQdcXFyc59kAAAD5Ul9fH6WlpblG2JkvTAxtf2pccXGxGAIAAP7my2e8gQIAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASWqV7wl8UXW77NF8T2GvW3XdafmeAgAA7DZXhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAk7VYMTZ8+Pbp37x5FRUVRVlYWS5Ys2enY+fPnx7Bhw+Kggw6K4uLiKC8vj8cff7zRuHnz5kWvXr2isLAwevXqFQ888MDuTA0AAGCXNDuG5s6dGxMmTIgpU6ZEdXV1DB48OEaMGBG1tbU7HP/MM8/EsGHDYsGCBbFixYo44YQT4owzzojq6urcmGXLlkVFRUVUVlbGCy+8EJWVlTFy5MhYvnz57h8ZAABAEwqyLMuas8Hxxx8f/fr1ixkzZuTW9ezZM84666yYNm3aLu3jq1/9alRUVMSVV14ZEREVFRVRX18fv//973NjTjnllDjggANizpw5u7TP+vr6KCkpiY0bN0ZxcXEzjmjv6HbZo/mewl636rrT8j0FAABoZFfboFlXhjZv3hwrVqyI4cOHN1g/fPjwWLp06S7tY+vWrfHuu+/GgQcemFu3bNmyRvs8+eSTm9znpk2bor6+vsECAACwq5oVQ+vXr48tW7ZEp06dGqzv1KlTrF27dpf2cdNNN8X7778fI0eOzK1bu3Zts/c5bdq0KCkpyS2lpaXNOBIAACB1u/UGCgUFBQ0+zrKs0bodmTNnTlx99dUxd+7c+PKXv/yp9jl58uTYuHFjblm9enUzjgAAAEhdq+YM7tixY7Rs2bLRFZt169Y1urLzSXPnzo3Ro0fH/fffHyeddFKDz3Xu3LnZ+ywsLIzCwsLmTB8AACCnWVeG2rRpE2VlZbFo0aIG6xctWhQDBw7c6XZz5syJUaNGxb333hunndb4Rffl5eWN9rlw4cIm9wkAAPBpNOvKUETExIkTo7KyMvr37x/l5eUxc+bMqK2tjbFjx0bEtqevrVmzJmbPnh0R20LoggsuiFtvvTUGDBiQuwLUtm3bKCkpiYiI8ePHx5AhQ+L666+PM888Mx588MF44okn4tlnn91TxwkAANBAs18zVFFREVVVVTF16tTo27dvPPPMM7FgwYLo2rVrRETU1dU1uOfQnXfeGR999FH84Ac/iC5
duuSW8ePH58YMHDgw7rvvvrjnnnuid+/eMWvWrJg7d24cf/zxe+AQAQAAGmv2fYb2Ve4z9NlznyEAAPZFe+U+QwAAAF8UYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASNJuxdD06dOje/fuUVRUFGVlZbFkyZKdjq2rq4vzzz8/jjzyyGjRokVMmDCh0ZhZs2ZFQUFBo+WDDz7YnekBAAD8Tc2Ooblz58aECRNiypQpUV1dHYMHD44RI0ZEbW3tDsdv2rQpDjrooJgyZUr06dNnp/stLi6Ourq6BktRUVFzpwcAALBLmh1DN998c4wePTrGjBkTPXv2jKqqqigtLY0ZM2bscHy3bt3i1ltvjQsuuCBKSkp2ut+CgoLo3Llzg6UpmzZtivr6+gYLAADArmpWDG3evDlWrFgRw4cPb7B++PDhsXTp0k81kffeey+6du0ahxxySJx++ulRXV3d5Php06ZFSUlJbiktLf1UXx8AAEhLs2Jo/fr1sWXLlujUqVOD9Z06dYq1a9fu9iSOOuqomDVrVjz00EMxZ86cKCoqikGDBsUrr7yy020mT54cGzduzC2rV6/e7a8PAACkp9XubFRQUNDg4yzLGq1rjgEDBsSAAQNyHw8aNCj69esXP/vZz+K2227b4TaFhYVRWFi4218TAABIW7OuDHXs2DFatmzZ6CrQunXrGl0t+lSTatEijj322CavDAEAAHwazYqhNm3aRFlZWSxatKjB+kWLFsXAgQP32KSyLIuampro0qXLHtsnAADAxzX7aXITJ06MysrK6N+/f5SXl8fMmTOjtrY2xo4dGxHbXsuzZs2amD17dm6bmpqaiNj2Jgl//vOfo6amJtq0aRO9evWKiIhrrrkmBgwYEIcffnjU19fHbbfdFjU1NXH77bfvgUMEAABorNkxVFFRERs2bIipU6dGXV1dHH300bFgwYLo2rVrRGy7yeon7zl0zDHH5P69YsWKuPfee6Nr166xatWqiIh455134rvf/W6sXbs2SkpK4phjjolnnnkmjjvuuE9xaAAAADtXkGVZlu9J7An19fVRUlISGzdujOLi4nxPJ7pd9mi+p7DXrbrutHxPAQAAGtnVNmj2TVcBAAC+CMQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACR
JDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSdiuGpk+fHt27d4+ioqIoKyuLJUuW7HRsXV1dnH/++XHkkUdGixYtYsKECTscN2/evOjVq1cUFhZGr1694oEHHtidqQEAAOySZsfQ3LlzY8KECTFlypSorq6OwYMHx4gRI6K2tnaH4zdt2hQHHXRQTJkyJfr06bPDMcuWLYuKioqorKyMF154ISorK2PkyJGxfPny5k4PAABglxRkWZY1Z4Pjjz8++vXrFzNmzMit69mzZ5x11lkxbdq0JrcdOnRo9O3bN6qqqhqsr6ioiPr6+vj973+fW3fKKafEAQccEHPmzNmledXX10dJSUls3LgxiouLd/2A9pJulz2a7ynsdauuOy3fUwAAgEZ2tQ2adWVo8+bNsWLFihg+fHiD9cOHD4+lS5fu3kxj25WhT+7z5JNPbnKfmzZtivr6+gYLAADArmpWDK1fvz62bNkSnTp1arC+U6dOsXbt2t2exNq1a5u9z2nTpkVJSUluKS0t3e2vDwAApGe33kChoKCgwcdZljVat7f3OXny5Ni4cWNuWb169af6+gAAQFpaNWdwx44do2XLlo2u2Kxbt67RlZ3m6Ny5c7P3WVhYGIWFhbv9NQEAgLQ168pQmzZtoqysLBYtWtRg/aJFi2LgwIG7PYny8vJG+1y4cOGn2icAAEBTmnVlKCJi4sSJUVlZGf3794/y8vKYOXNm1NbWxtixYyNi29PX1qxZE7Nnz85tU1NTExER7733Xvz5z3+OmpqaaNOmTfTq1SsiIsaPHx9DhgyJ66+/Ps4888x48MEH44knnohnn312DxwiAABAY82OoYqKitiwYUNMnTo16urq4uijj44FCxZE165dI2LbTVY/ec+hY445JvfvFStWxL333htdu3aNVatWRUTEwIED47777osrrrgifvzjH8dhhx0Wc+fOjeOPP/5THBoAAMDONfs+Q/sq9xn67LnPEAAA+6K9cp8hAACALwoxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDE
EAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkabdiaPr06dG9e/coKiqKsrKyWLJkSZPjFy9eHGVlZVFUVBQ9evSIO+64o8HnZ82aFQUFBY2WDz74YHemBwAA8Dc1O4bmzp0bEyZMiClTpkR1dXUMHjw4RowYEbW1tTsc//rrr8epp54agwcPjurq6rj88stj3LhxMW/evAbjiouLo66ursFSVFS0e0cFAADwN7Rq7gY333xzjB49OsaMGRMREVVVVfH444/HjBkzYtq0aY3G33HHHXHooYdGVVVVRET07Nkznnvuubjxxhvj7LPPzo0rKCiIzp077+ZhAAAANE+zrgxt3rw5VqxYEcOHD2+wfvjw4bF06dIdbrNs2bJG408++eR47rnn4sMPP8yte++996Jr165xyCGHxOmnnx7V1dVNzmXTpk1RX1/fYAEAANhVzYqh9evXx5YtW6JTp04N1nfq1CnWrl27w23Wrl27w/EfffRRrF+/PiIijjrqqJg1a1Y89NBDMWfOnCgqKopBgwbFK6+8stO5TJs2LUpKSnJLaWlpcw4FAABI3G69gUJBQUGDj7Msa7Tub43/+PoBAwbEt771rejTp08MHjw4fvOb38QRRxwRP/vZz3a6z8mTJ8fGjRtzy+rVq3fnUAAAgEQ16zVDHTt2jJYtWza6CrRu3bpGV3+269y58w7Ht2rVKjp06LDDbVq0aBHHHntsk1eGCgsLo7CwsDnTBwAAyGnWlaE2bdpEWVlZLFq0qMH6RYsWxcCBA3e4TXl5eaPxCxcujP79+0fr1q13uE2WZVFTUxNdunRpzvQAAAB2WbOfJjdx4sT45S9/GXfffXesXLkyLrnkkqitrY2xY8dGxLanr11wwQW58WPHjo033ngjJk6cGCtXroy777477rrrrpg0aVJuzDXXXBOPP/54vPbaa1FTUxOjR4+Ompqa3D4BAAD2tGa/tXZFRUVs2LAhpk6dGnV1dXH00UfHggULomvXrhERUVdX1+CeQ927d48FCxbEJZdcErfffnscfPDBcdtttzV4W+133nknvvvd78batWujpKQkjjnmmHjmmWfiuOOO2wOHCAAA0FhBtv3dDD7n6uvro6SkJDZu3BjFxcX5nk50u+zRfE9hr1t13Wn5ngIAADSyq22wW+8mBwAA8HknhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAktcr3BEhTt8sezfcU9rpV152W7ykAANAEV4YAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEmt8j0BoLFulz2a7ynsdauuOy3fUwAAEieGgM8dsQgA7AmeJgcAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJMm7yQF8wXi3PQDYNa4MAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAk9xkCIBkp3IMpwn2YAHaVK0MAAECSxBAAAJAkMQQAACRJDAEAAEnyBgoAQER4gwkgPa4MAQAASXJlCABgF7hy1jTnh88jMQQAAHtZCrH4eQxFT5MDAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIk
hAAAgSWIIAABIkhgCAACStFsxNH369OjevXsUFRVFWVlZLFmypMnxixcvjrKysigqKooePXrEHXfc0WjMvHnzolevXlFYWBi9evWKBx54YHemBgAAsEuaHUNz586NCRMmxJQpU6K6ujoGDx4cI0aMiNra2h2Of/311+PUU0+NwYMHR3V1dVx++eUxbty4mDdvXm7MsmXLoqKiIiorK+OFF16IysrKGDlyZCxfvnz3jwwAAKAJrZq7wc033xyjR4+OMWPGREREVVVVPP744zFjxoyYNm1ao/F33HFHHHrooVFVVRURET179oznnnsubrzxxjj77LNz+xg2bFhMnjw5IiImT54cixcvjqqqqpgzZ84O57Fp06bYtGlT7uONGzdGRER9fX1zD2mv2LrpL/mewl73ac6189M056dpzk/TnJ+dS+HcRDg/f4vz0zTnp2nOz87tK7+HR/z/XLIsa3pg1gybNm3KWrZsmc2fP7/B+nHjxmVDhgzZ4TaDBw/Oxo0b12Dd/Pnzs1atWmWbN2/OsizLSktLs5tvvrnBmJtvvjk79NBDdzqXq666KosIi8VisVgsFovFYtnhsnr16ib7pllXhtavXx9btmyJTp06NVjfqVOnWLt27Q63Wbt27Q7Hf/TRR7F+/fro0qXLTsfsbJ8R264eTZw4Mffx1q1b46233ooOHTpEQUFBcw7rC6G+vj5KS0tj9erVUVxcnO/p7HOcn51zbprm/DTN+Wma89M056dpzk/TnJ+mpX5+siyLd999Nw4++OAmxzX7aXIR0Sg2sixrMkB2NP6T65u7z8LCwigsLGywbv/9929y3ikoLi5O8j/8rnJ+ds65aZrz0zTnp2nOT9Ocn6Y5P01zfpqW8vkpKSn5m2Oa9QYKHTt2jJYtWza6YrNu3bpGV3a269y58w7Ht2rVKjp06NDkmJ3tEwAA4NNqVgy1adMmysrKYtGiRQ3WL1q0KAYOHLjDbcrLyxuNX7hwYfTv3z9at27d5Jid7RMAAODTavbT5CZOnBiVlZXRv3//KC8vj5kzZ0ZtbW2MHTs2Ira9lmfNmjUxe/bsiIgYO3Zs/PznP4+JEyfGRRddFMuWLYu77rqrwbvEjR8/PoYMGRLXX399nHnmmfHggw/GE088Ec8+++weOswvvsLCwrjqqqsaPXWQbZyfnXNumub8NM35aZrz0zTnp2nOT9Ocn6Y5P7umIMv+1vvNNTZ9+vS44YYboq6uLo4++ui45ZZbYsiQIRERMWrUqFi1alU8/fTTufGLFy+OSy65JP70pz/FwQcfHD/60Y9y8bTdb3/727jiiivitddei8MOOyyuvfba+Id/+IdPd3QAAAA7sVsxBAAA8HnXrNcMAQAAfFGIIQAAIEliCAAASJIYyqOhQ4dGQUFBFBQURE1NTV7nsmrVqtxc+vbtm9e5fFpPP/10FBQUxDvvvJPvqQCwA0OHDo0JEybkexpARMyaNSv233//fE8jb8RQnl100UW5d+Xb7le/+lUcd9xx8aUvfSnat28fQ4YMiUceeaTRtlu2bIlbbrklevfuHUVFRbH//vvHiBEj4j//8z9zY84444w46aSTdvi1ly1bFgUFBfH8889HaWlp1NXVxaWXXrrnD3Iv80P10xs1alScddZZ+Z4GnwO+39gT5s+fH//yL/8SERHdunWLqqqq/E5oHzJq1Ki4+uqr8z2NzwXnij1BDOXZfvvtF507d45Wrbbd8mnSpElx8cUXx8iRI+OFF16I//qv/4rBgwfHmWeeGT//+c9z22VZFueee25MnTo1xo0bFytXrozFixdHaWlpDB06NH73u99FRMTo0aPjySefjDfeeKPR17777rujb9++0a9fv2jZsmV07tw52rVr95kcNwDpOvDAA6N9+/b5ngaAGNqX/OEPf4ibbropfvrTn8akSZPiK1/5SvTs2TOuvfbamDBhQkycODFWr14dERG/+c1v4re//W3Mnj07xowZE927d48+ffr
EzJkz4+///u9jzJgx8f7778fpp58eX/7yl2PWrFkNvtZf/vKXmDt3bowePToPR7rnjBo1KhYvXhy33npr7ml+q1atioiIFStWRP/+/WO//faLgQMHxssvv9xg24cffjjKysqiqKgoevToEddcc0189NFHeTgKPmuzZ8+ODh06xKZNmxqsP/vss+OCCy6IiIgZM2bEYYcdFm3atIkjjzwyfv3rX+fGbX9a6cef3vrOO+9EQUFBg3usfdHs7Ptt8eLFcdxxx0VhYWF06dIlLrvsMt9LEfHYY4/F1772tdh///2jQ4cOcfrpp8err76a72ntE7ZfYRw6dGi88cYbcckll+T+T/H/pk+fHocffngUFRVFp06d4h//8R/zPSXyrKnHle0/m+bPnx8nnHBC7LffftGnT59YtmxZg33MmjUrDj300Nhvv/3iG9/4RmzYsCEfh7LPEEP7kDlz5kS7du3i4osvbvS5Sy+9ND788MOYN29eRETce++9ccQRR8QZZ5yxw7EbNmyIRYsWRatWreKCCy6IWbNmxcdvKXX//ffH5s2b45vf/ObeO6DPwK233hrl5eW5pxvW1dVFaWlpRERMmTIlbrrppnjuueeiVatW8Z3vfCe33eOPPx7f+ta3Yty4cfHSSy/FnXfeGbNmzYprr702X4fCZ+icc86JLVu2xEMPPZRbt379+njkkUfiwgsvjAceeCDGjx8fl156afzP//xPXHzxxXHhhRfGU089lcdZ59+Ovt9at24dp556ahx77LHxwgsvxIwZM+Kuu+6Kf/3Xf833dPPu/fffj4kTJ8Z///d/x3/8x39EixYt4hvf+EZs3bo131PbZ8yfPz8OOeSQmDp1au7/FNs899xzMW7cuJg6dWq8/PLL8dhjj+VucE+6duVxZcqUKTFp0qSoqamJI444Is4777zcH6iWL18e3/nOd+L73/9+1NTUxAknnODxOiNvvv71r2fjx4/PfXzKKadkffr02en4kpKS7Hvf+16WZVl21FFHZWeeeeYOx7311ltZRGTXX399lmVZtnLlyiwisieffDI3ZsiQIdl5553XaNurrrqqyTnsiz55Hp966qksIrInnngit+7RRx/NIiL761//mmVZlg0ePDj7yU9+0mA/v/71r7MuXbp8JnPe13z729/e6f+nL6rvfe972YgRI3IfV1VVZT169Mi2bt2aDRw4MLvooosajD/nnHOyU089NcuyLHv99deziMiqq6tzn3/77beziMieeuqpz2L6efPJ77fLL788O/LII7OtW7fm1t1+++1Zu3btsi1btuRhhvuudevWZRGRvfjii/meSt59/P9R165ds1tuuSWv89kXzZs3LysuLs7q6+vzPRX2YR9/XNn+s+mXv/xl7vN/+tOfsojIVq5cmWVZlp133nnZKaec0mAfFRUVWUlJyWc57X2KK0OfI1mWNespBNvHHnXUUTFw4MC4++67IyLi1VdfjSVLljS4UvJF1Lt379y/u3TpEhER69ati4htT6GbOnVqtGvXLrds/2v3X/7yl7zMl8/WRRddFAsXLow1a9ZERMQ999wTo0aNioKCgli5cmUMGjSowfhBgwbFypUr8zHVfdrKlSujvLy8wWPToEGD4r333ov//d//zePM8u/VV1+N888/P3r06BHFxcXRvXv3iIiora3N88z4PBg2bFh07do1evToEZWVlfHv//7vfj6xS48rTf3+s/0x++M++XFqxNA+5IgjjohXX301Nm/e3Ohzb775ZtTX18fhhx+eG/vSSy/tcD/bf2HbPjZi2xspzJs3L+rr6+Oee+6Jrl27xoknnrgXjmLf0bp169y/t/+itv0y8tatW+Oaa66Jmpqa3PLiiy/GK6+8EkVFRXmZL5+tY445Jvr06ROzZ8+O559/Pl588cUYNWpU7vOf/MPDx/8Y0aJFi9y67T788MO9P+l90I7+SLP9vKT++o8zzjgjNmzYEL/4xS9i+fLlsXz58oiIHT7
Gwye1b98+nn/++ZgzZ0506dIlrrzyyujTp4/bRiRuVx5Xmvr95+M/t9hGDO1Dzj333HjvvffizjvvbPS5G2+8MVq3bh1nn312buwrr7wSDz/8cKOxN910U3To0CGGDRuWWzdy5Mho2bJl3HvvvfGrX/0qLrzwwi/MLypt2rSJLVu2NGubfv36xcsvvxxf+cpXGi3bf9Hli2/MmDFxzz33xN133x0nnXRS7vVmPXv2jGeffbbB2KVLl0bPnj0jIuKggw6KiGjw+oZ83yvss/LJ77devXrF0qVLG/yAXbp0abRv3z7+7u/+Lh9T3Cds2LAhVq5cGVdccUWceOKJ0bNnz3j77bfzPa190u48hqeiVatWcdJJJ8UNN9wQf/zjH2PVqlXx5JNP5nta5MmeeFzp1atX/OEPf2iw7pMfp6ZVvifA/ysvL4/x48fHD3/4w9i8eXOcddZZ8eGHH8a//du/xa233hpVVVW5X9bOPffcuP/+++Pb3/52/PSnP40TTzwx6uvr4/bbb4+HHnoo7r///vjSl76U23e7du2ioqIiLr/88ti4cWODv4B/3nXr1i2WL18eq1atinbt2u3Si5OvvPLKOP3006O0tDTOOeecaNGiRfzxj3+MF1980QsJE/LNb34zJk2aFL/4xS9i9uzZufU//OEPY+TIkdGvX7848cQT4+GHH4758+fHE088ERERbdu2jQEDBsR1110X3bp1i/Xr18cVV1yRr8P4TH3y++373/9+VFVVxT//8z/HP/3TP8XLL78cV111VUycODHpPywccMAB0aFDh5g5c2Z06dIlamtr47LLLsv3tPZJ3bp1i2eeeSbOPffcKCwsjI4dO+Z7SvuERx55JF577bUYMmRIHHDAAbFgwYLYunVrHHnkkfmeGnmyJx5Xxo0bFwMHDowbbrghzjrrrFi4cGE89thje2nGnxN5fL1S8j75QuTt7rrrrqx///5Z27Zts/322y/72te+lj300EONxn344YfZjTfemH31q1/NCgsLs+Li4uzkk0/OlixZssOvt3Tp0iwisuHDh+90Tp/HN1B4+eWXswEDBmRt27bNIiK75557sojI3n777dyY6urqLCKy119/PbfuscceywYOHJi1bds2Ky4uzo477rhs5syZn/0B7ANSfAOF7SorK7MDDzww++CDDxqsnz59etajR4+sdevW2RFHHJHNnj27wedfeuml3P+7vn37ZgsXLkziDRQ++f32+uuvZ08//XR27LHHZm3atMk6d+6c/ehHP8o+/PDDfE817xYtWpT17NkzKywszHr37p09/fTTWURkDzzwQL6nlncf//m3bNmyrHfv3llhYWHm15L/t2TJkuzrX/96dsABB2Rt27bNevfunc2dOzff0yLPmnpc2dU397nrrruyQw45JGvbtm12xhlnZDfeeGPSb6BQkGWePJgvQ4cOjb59++5Td96++uqr43e/+10yT/mBYcOGRc+ePeO2227L91QAgM9Yus9h2EdMnz492rVrFy+++GJe51FbWxvt2rWLn/zkJ3mdB3xW3nrrrbjvvvviySefjB/84Af5ng4AkAeuDOXRmjVr4q9//WtERBx66KHRpk2bvM3lo48+ilWrVkVERGFhYe61SfBF1a1bt3j77bfjxz/+cUyaNCnf0wEA8kAMAQAASfI0OQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAk/R9hLAt9Ca081wAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "unigram = NGramLM(oov_train,1)\n", "plot_probabilities(unigram)\n", "# sum([unigram.probability(w) for w in unigram.vocab])" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "The unigram LM has substantially lower (and hence better) perplexity than the uniform LM:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "(1287.9999999984573, 128.9093846843014)" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perplexity(oov_baseline,oov_test), perplexity(unigram,oov_test)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "Its samples look (a little) more reasonable:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['hands', 'play', 'below', 'never', 'around', 'type', 'grows', 'about', 'debate', 'himself'] \n", "\n", "['the', '[OOV]', 'to', 'in', 'Singing', '[OOV]', '[OOV]', \"'m\", 'live', 'to']\n" ] } ], "source": [ "print(sample(oov_baseline, [], 10), \"\\n\")\n", "print(sample(unigram, [], 10))" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Bigram Model\n", "We can do better by setting $n=2$" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, 
"slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0MAAAH5CAYAAABDDuXVAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAsCUlEQVR4nO3dfZBXdb3A8c/ysLusy64GumAuCz6gkAq6JIKBdE0QzfDWCFouWWhx750RRC2JTHMy1HgyE5+uRtYVURE1tdE1A1HJRty1WzJpJi1jyyBULFoBwrl/MPyu6y4LuwIrfF+vmTPDnt/3d/Z7vvNb2Dfn95CXZVkWAAAAienQ3hMAAABoD2IIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJLUqb0nsLts3bo1/vKXv0TXrl0jLy+vvacDAAC0kyzLYsOGDXHooYdGhw47vv6z38TQX/7ylygvL2/vaQAAAB8Rq1atisMOO2yHt+83MdS1a9eI2HbCJSUl7TwbAACgvTQ0NER5eXmuEXZkv4mh7U+NKykpEUMAAMBOXz7jDRQAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASFKn9p7A/qr3lY+39xT2uJXXn9XeUwAAgDZzZQgAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJHVq7wmQpt5XPt7eU9jjVl5/VntPAQCAFrgyBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJEkMAAECSxBAAAJAkMQQAACRJDAEAAEkSQwAAQJLEEAAAkCQxBAAAJEkMAQAASRJDAABAksQQAACQJDEEAAAkSQwBAABJalMMzZ07N/r06ROFhYVRWVkZS5cu3eHYhx56KE4//fQ4+OCDo6SkJIYMGRJPPvlkk3ELFy6M/v37R0FBQfTv3z8WLVrUlqkBAADsklbH0IIFC2Ly5Mkxbdq0qKmpiWHDhsXo0aOjrq6u2fHPPvtsnH766fHEE0/E8uXL49Of/nScffbZUVNTkxuzbNmyGDduXFRVVcUrr7wSVVVVMXbs2HjxxRfbfmYAAAAtyMuyLGvNHQYPHhwnnnhi3Hrrrbl9/fr1i3POOSemT5++S8f4xCc+EePGjYvvfOc7ERExbty4aGhoiF/84he5MWeccUYcdNBBMX/+/F06ZkNDQ5SWlsb69eujpKSkFWe0Z/S+8vH2nsIet/L6s9p8X+sDAMCesqtt0KorQ5s2bYrly5fHyJEjG+0fOXJkvPDCC7t0jK1bt8aGDRviYx/7WG7fsmXLmhxz1KhRLR5z48aN0dDQ0GgDAADYVa2KobVr18aWLVuirKys0f6ysrJYvXr1Lh1j5syZ8e6778bYsWNz+1avXt3qY06fPj1KS0tzW3l5eSvOBAAASF2b3kAhLy+v0ddZljXZ15z58+fHNddcEwsWLIhDDjnkQx1z6tSpsX79+ty2atWqVpwBAACQuk6tGdy9e/fo2LFjkys2a9asaXJl54MWLFgQEyZMiAceeCA+85nPNLqtR48erT5mQUFBFBQUtGb6AAAAOa26MpSfnx+VlZVRXV3daH91dXUMHTp0h/ebP39+XHjhhXHvvffGWWc1fVH5kCFDmhzzqaeeavGYAAAAH0arrgxFREyZMiWqqqpi0KBBMWTIkLjjjjuirq4uJk6cGBHbnr721ltvxT333BMR
20Jo/PjxcdNNN8XJJ5+cuwLUpUuXKC0tjYiISZMmxfDhw+OGG26IMWPGxCOPPBJPP/10PPfcc7vrPAEAABpp9WuGxo0bF3PmzIlrr702Bg4cGM8++2w88cQTUVFRERER9fX1jT5z6Pbbb4/33nsv/uu//it69uyZ2yZNmpQbM3To0Ljvvvvixz/+cRx//PExb968WLBgQQwePHg3nCIAAEBTrf6coY8qnzO09/mcoZb5nCEAgPaxRz5nCAAAYH8hhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASFKbYmju3LnRp0+fKCwsjMrKyli6dOkOx9bX18cXv/jFOProo6NDhw4xefLkJmPmzZsXeXl5TbZ//etfbZkeAADATrU6hhYsWBCTJ0+OadOmRU1NTQwbNixGjx4ddXV1zY7fuHFjHHzwwTFt2rQYMGDADo9bUlIS9fX1jbbCwsLWTg8AAGCXtDqGZs2aFRMmTIiLLroo+vXrF3PmzIny8vK49dZbmx3fu3fvuOmmm2L8+PFRWlq6w+Pm5eVFjx49Gm0AAAB7SqtiaNOmTbF8+fIYOXJko/0jR46MF1544UNN5J133omKioo47LDD4rOf/WzU1NS0OH7jxo3R0NDQaAMAANhVrYqhtWvXxpYtW6KsrKzR/rKysli9enWbJ3HMMcfEvHnz4tFHH4358+dHYWFhnHLKKfH666/v8D7Tp0+P0tLS3FZeXt7m7w8AAKSnTW+gkJeX1+jrLMua7GuNk08+OS644IIYMGBADBs2LO6///7o27dv3HzzzTu8z9SpU2P9+vW5bdWqVW3+/gAAQHo6tWZw9+7do2PHjk2uAq1Zs6bJ1aIPo0OHDvHJT36yxStDBQUFUVBQsNu+JwAAkJZWXRnKz8+PysrKqK6ubrS/uro6hg4dutsmlWVZ1NbWRs+ePXfbMQEAAN6vVVeGIiKmTJkSVVVVMWjQoBgyZEjccccdUVdXFxMnToyIbU9fe+utt+Kee+7J3ae2tjYitr1Jwttvvx21tbWRn58f/fv3j4iI7373u3HyySfHUUcdFQ0NDfHDH/4wamtr45ZbbtkNpwgAANBUq2No3LhxsW7durj22mujvr4+jj322HjiiSeioqIiIrZ9yOoHP3PohBNOyP15+fLlce+990ZFRUWsXLkyIiL+/ve/x9e+9rVYvXp1lJaWxgknnBDPPvtsnHTSSR/i1AAAAHYsL8uyrL0nsTs0NDREaWlprF+/PkpKStp7OtH7ysfbewp73Mrrz2rzfa0PAAB7yq62QZveTQ4AAGBfJ4YAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkh
AAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJLUphiaO3du9OnTJwoLC6OysjKWLl26w7H19fXxxS9+MY4++ujo0KFDTJ48udlxCxcujP79+0dBQUH0798/Fi1a1JapAQAA7JJWx9CCBQti8uTJMW3atKipqYlhw4bF6NGjo66urtnxGzdujIMPPjimTZsWAwYMaHbMsmXLYty4cVFVVRWvvPJKVFVVxdixY+PFF19s7fQAAAB2SV6WZVlr7jB48OA48cQT49Zbb83t69evX5xzzjkxffr0Fu87YsSIGDhwYMyZM6fR/nHjxkVDQ0P84he/yO0744wz4qCDDor58+fv0rwaGhqitLQ01q9fHyUlJbt+QntI7ysfb+8p7HErrz+rzfe1PgAA7Cm72gatujK0adOmWL58eYwcObLR/pEjR8YLL7zQtpnGtitDHzzmqFGjWjzmxo0bo6GhodEGAACwq1oVQ2vXro0tW7ZEWVlZo/1lZWWxevXqNk9i9erVrT7m9OnTo7S0NLeVl5e3+fsDAADpadMbKOTl5TX6OsuyJvv29DGnTp0a69evz22rVq36UN8fAABIS6fWDO7evXt07NixyRWbNWvWNLmy0xo9evRo9TELCgqioKCgzd8TAABIW6uuDOXn50dlZWVUV1c32l9dXR1Dhw5t8ySGDBnS5JhPPfXUhzomAABAS1p1ZSgiYsqUKVFVVRWDBg2KIUOGxB133BF1dXUxceLEiNj29LW33nor7rnnntx9amtrIyLinXfeibfffjtqa2sjPz8/+vfvHxERkyZNiuHDh8cNN9wQY8aMiUceeSSefvrpeO6553bDKQIAADTV6hgaN25crFu3Lq699tqor6+PY489Np544omoqKiIiG0fsvrBzxw64YQTcn9evnx53HvvvVFRURErV66MiIihQ4fGfffdF9/+9rfjqquuiiOOOCIWLFgQgwcP/hCnBgAAsGOt/pyhjyqfM7T3+ZyhlvmcIQCA9rFHPmcIAABgfyGGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggA
AEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIUptiaO7cudGnT58oLCyMysrKWLp0aYvjlyxZEpWVlVFYWBiHH3543HbbbY1unzdvXuTl5TXZ/vWvf7VlegAAADvV6hhasGBBTJ48OaZNmxY1NTUxbNiwGD16dNTV1TU7/s0334wzzzwzhg0bFjU1NfGtb30rLrnkkli4cGGjcSUlJVFfX99oKywsbNtZAQAA7ESn1t5h1qxZMWHChLjooosiImLOnDnx5JNPxq233hrTp09vMv62226LXr16xZw5cyIiol+/fvHSSy/FjBkz4gtf+EJuXF5eXvTo0aONpwEAANA6rboytGnTpli+fHmMHDmy0f6RI0fGCy+80Ox9li1b1mT8qFGj4qWXXorNmzfn9r3zzjtRUVERhx12WHz2s5+NmpqaFueycePGaGhoaLQBAADsqlbF0Nq1a2PLli1RVlbWaH9ZWVmsXr262fusXr262fHvvfderF27NiIijjnmmJg3b148+uijMX/+/CgsLIxTTjklXn/99R3OZfr06VFaWprbysvLW3MqAABA4tr0Bgp5eXmNvs6yrMm+nY1///6TTz45LrjgghgwYEAMGzYs7r///ujbt2/cfPPNOzzm1KlTY/369blt1apVbTkVAAAgUa16zVD37t2jY8eOTa4CrVmzpsnVn+169OjR7PhOnTpFt27dmr1Phw4d4pOf/GSLV4YKCgqioKCgNdMHAADIadWVofz8/KisrIzq6upG+6urq2Po0KHN3mfIkCFNxj/11FMxaNCg6Ny5c7P3ybIsamtro2fPnq2ZHgAAwC5r9dPkpkyZEv/93/8dd999d6xYsSIuvfTSqKuri4kTJ0bEtqevjR8/Pjd+4sSJ8ec//zmmTJkSK1asiLvvvjvuuuuuuPzyy3Njvvvd78aTTz4Zf/rTn6K2tjYmTJgQtbW1uWMCAADsbq1+a+1x48bFunXr4tprr436+vo49thj44knnoiKioqIiKivr2/0mUN9+vSJJ554Ii699NK45ZZb4tBDD40f/vCHjd5W++9//3t87Wtfi9WrV0dpaWmccMIJ8eyzz8ZJJ520G04RAACgqbxs+7sZ7OMaGhqitLQ01q9fHyUlJe09neh95ePtPYU9buX1Z7X5vtYHAIA9ZVfboE3vJgcAALCvE0MAAECSxBAAAJAkMQQAACRJDAEAAElq9VtrA3ued9sDANjzXBkCAACSJIYAAIAkiSEAACBJYggAAEiSN1AA9jneYAIA2B1cGQIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJHVq7wkAsHv1vvLx9p7CHrfy+rPaewoA7AdcGQIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACS1Km9JwAAe0vvKx9v7ynsFSuvP6u9pwCwT3BlCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEhSp/aeAADw0dD7ysfbewp7xcrrz2rT/awP7H9cGQIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkg9dBQDgQ/OhtC1LYX32xQ/sdWUIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIk
hgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABIkhgCAACSJIYAAIAkiSEAACBJYggAAEiSGAIAAJLUphiaO3du9OnTJwoLC6OysjKWLl3a4vglS5ZEZWVlFBYWxuGHHx633XZbkzELFy6M/v37R0FBQfTv3z8WLVrUlqkBAADsklbH0IIFC2Ly5Mkxbdq0qKmpiWHDhsXo0aOjrq6u2fFvvvlmnHnmmTFs2LCoqamJb33rW3HJJZfEwoULc2OWLVsW48aNi6qqqnjllVeiqqoqxo4dGy+++GLbzwwAAKAFnVp7h1mzZsWECRPioosuioiIOXPmxJNPPhm33nprTJ8+vcn42267LXr16hVz5syJiIh+/frFSy+9FDNmzIgvfOELuWOcfvrpMXXq1IiImDp1aixZsiTmzJkT8+fPb3YeGzdujI0bN+a+Xr9+fURENDQ0tPaU9oitG//R3lPY4z7MWlufllmfllmfllmfHUthbSKsz85Yn5ZZn5ZZnx37qPweHvH/c8myrOWBWSts3Lgx69ixY/bQQw812n/JJZdkw4cPb/Y+w4YNyy655JJG+x566KGsU6dO2aZNm7Isy7Ly8vJs1qxZjcbMmjUr69Wr1w7ncvXVV2cRYbPZbDabzWaz2WzNbqtWrWqxb1p1ZWjt2rWxZcuWKCsra7S/rKwsVq9e3ex9Vq9e3ez49957L9auXRs9e/bc4ZgdHTNi29WjKVOm5L7eunVr/PWvf41u3bpFXl5ea05rv9DQ0BDl5eWxatWqKCkpae/pfORYnx2zNi2zPi2zPi2zPi2zPi2zPi2zPi1LfX2yLIsNGzbEoYce2uK4Vj9NLiKaxEaWZS0GSHPjP7i/tccsKCiIgoKCRvsOPPDAFuedgpKSkiQf8LvK+uyYtWmZ9WmZ9WmZ9WmZ9WmZ9WmZ9WlZyutTWlq60zGtegOF7t27R8eOHZtcsVmzZk2TKzvb9ejRo9nxnTp1im7durU4ZkfHBAAA+LBaFUP5+flRWVkZ1dXVjfZXV1fH0KFDm73PkCFDmox/6qmnYtCgQdG5c+cWx+zomAAAAB9Wq58mN2XKlKiqqopBgwbFkCFD4o477oi6urqYOHFiRGx7Lc9bb70V99xzT0RETJw4MX70ox/FlClT4uKLL45ly5bFXXfd1ehd4iZNmhTDhw+PG264IcaMGROPPPJIPP300/Hcc8/tptPc/xUUFMTVV1/d5KmDbGN9dszatMz6tMz6tMz6tMz6tMz6tMz6tMz67Jq8LNvZ+801NXfu3Ljxxhujvr4+jj322Jg9e3YMHz48IiIuvPDCWLlyZSxevDg3fsmSJXHppZfG73//+zj00EPjm9/8Zi6etnvwwQfj29/+dvzpT3+KI444Iq677rr4/Oc//+HODgAAYAfaFEMAAAD7ula9ZggAAGB/IYYAAIAkiSEAACBJYoiPrBEjRkReXl7k5eVFbW1tu85l5cqVubkMHDiwXecCwL7twgsvjHPOOae9p7FP6927d8yZMyf3dV5eXjz88MPtNp/2NGLEiJg8eXJ7T2OfJYb2IRdeeGFcc8017T2Nveriiy/OvWvhdj/5yU/ipJNOigMOOCC6du0aw4cPj8cee6zJfbds2RKzZ8+O448/PgoLC+PAAw+M0aNHx/PPP58bc/bZZ8dnPvOZZr/3smXLIi8vL15++eUoLy+P+vr6uOyyy3b/SbJPmzdvXhx44IHtPY2PLOtDS3b0+PjgL7r7m5tuuinmzZu3W461v6/V+6X4exB7nhjiI62oqCh69OgRnTpt+0isyy+/PL7+9a/H2LFj45VXXonf/OY3MWzYsBgzZkz86Ec/yt0vy7I477zz4tprr41LLrkkVqxYEUuWLIny8vIYMWJE7n+PJkyYEM88
80z8+c9/bvK977777hg4cGCceOKJ0bFjx+jRo0cUFxfvlfMGYP9VWlrqPwngI0IM7aN69+4d3/ve92L8+PFRXFwcFRUV8cgjj8Tbb78dY8aMieLi4jjuuOPipZdeau+p7ja//vWvY+bMmfGDH/wgLr/88jjyyCOjX79+cd1118XkyZNjypQpsWrVqoiIuP/+++PBBx+Me+65Jy666KLo06dPDBgwIO6444743Oc+FxdddFG8++678dnPfjYOOeSQJv9D949//CMWLFgQEyZMaIcz3X22bt0aN9xwQxx55JFRUFAQvXr1iuuuuy4iIr75zW9G3759o6ioKA4//PC46qqrYvPmzbn7XnPNNTFw4MD46U9/Gr17947S0tI477zzYsOGDe11OnvEhg0b4ktf+lIccMAB0bNnz5g9e3ajpxxs2rQpvvGNb8THP/7xOOCAA2Lw4MG5z1FbvHhxfOUrX4n169fnnka5v/2v5c7W529/+1uMHz8+DjrooCgqKorRo0fH66+/HhH75/r8/Oc/jwMPPDC2bt0aERG1tbWRl5cXV1xxRW7M17/+9Tj//PNj3bp1cf7558dhhx0WRUVFcdxxxzX6wPGIbZ+xd9xxx0WXLl2iW7du8ZnPfCbefffdvXpOH8aeeHyMGDEi/vznP8ell16a2x8Ru7Se+4r3P02uuSs7AwcObPSzcs0110SvXr2ioKAgDj300LjkkksiIna4Vuy/3n333dzvfj179oyZM2c2ur2ln7mI/78a++STT0a/fv2iuLg4zjjjjKivr9/bp/KRIYb2YbNnz45TTjklampq4qyzzoqqqqoYP358XHDBBfHyyy/HkUceGePHj4/95aOk5s+fH8XFxfH1r3+9yW2XXXZZbN68ORYuXBgREffee2/07ds3zj777GbHrlu3Lqqrq6NTp04xfvz4mDdvXqN1euCBB2LTpk3xpS99ac+d0F4wderUuOGGG+Kqq66KV199Ne69994oKyuLiIiuXbvGvHnz4tVXX42bbrop7rzzzpg9e3aj+7/xxhvx8MMPx2OPPRaPPfZYLFmyJK6//vr2OJU9ZsqUKfH888/Ho48+GtXV1bF06dJ4+eWXc7d/5Stfieeffz7uu++++O1vfxvnnntunHHGGfH666/H0KFDY86cOVFSUhL19fVRX18fl19+eTueze63s/W58MIL46WXXopHH300li1bFlmWxZlnnhmbN2/eL9dn+PDhsWHDhqipqYmIbR8q3r1791iyZEluzOLFi+PUU0+Nf/3rX1FZWRmPPfZY/O53v4uvfe1rUVVVFS+++GJERNTX18f5558fX/3qV2PFihWxePHi+PznP79P/Z29Jx4fDz30UBx22GFx7bXX5vZHxE7Xc3/14IMPxuzZs+P222+P119/PR5++OE47rjjIiJ2uFbsv6644or41a9+FYsWLYqnnnoqFi9eHMuXL8/d3tLP3Hb/+Mc/YsaMGfHTn/40nn322airq9vn/27+UDL2SRUVFdkFF1yQ+7q+vj6LiOyqq67K7Vu2bFkWEVl9fX17TPFDO/XUU7NJkyblvj7jjDOyAQMG7HB8aWlp9h//8R9ZlmXZMccck40ZM6bZcX/961+ziMhuuOGGLMuybMWKFVlEZM8880xuzPDhw7Pzzz+/yX2vvvrqFufwUdLQ0JAVFBRkd9555y6Nv/HGG7PKysrc11dffXVWVFSUNTQ05PZdccUV2eDBg3f7XNtLQ0ND1rlz5+yBBx7I7fv73/+eFRUVZZMmTcr++Mc/Znl5edlbb73V6H6nnXZaNnXq1CzLsuzHP/5xVlpaujenvdfsbH1ee+21LCKy559/Pnf72rVrsy5dumT3339/lmX75/qceOKJ2YwZM7Isy7Jzzjknu+6667L8/PysoaEh93fxihUrmr3vmWeemV122WVZlmXZ8uXLs4jIVq5cudfmvjvtycdHRUVFNnv27J3O4f3ruS/58pe/nPs3qrlzHTBgQHb1
1VdnWZZlM2fOzPr27Ztt2rSp2WPt6lrtbz543hGRLVq0qN3mszds2LAhy8/Pz+67777cvnXr1mVdunRp1c9cRGR//OMfc2NuueWWrKysbO+dyEeMK0P7sOOPPz735+3/27/9f4vev2/NmjV7d2LtJMuyVj1FYPvYY445JoYOHRp33313RGy7GrJ06dL46le/ukfmubesWLEiNm7cGKeddlqztz/44IPxqU99KvdaqKuuuirq6uoajendu3d07do193XPnj33q8fTn/70p9i8eXOcdNJJuX2lpaVx9NFHR0TEyy+/HFmWRd++faO4uDi3LVmyJN544432mvZes7P1WbFiRXTq1CkGDx6cu71bt25x9NFHx4oVK/b6fPeWESNGxOLFiyPLsli6dGmMGTMmjj322HjuuefiV7/6VZSVlcUxxxwTW7Zsieuuuy6OP/746NatWxQXF8dTTz2V+zkbMGBAnHbaaXHcccfFueeeG3feeWf87W9/a+ez23V7+/Gxs/XcX5177rnxz3/+Mw4//PC4+OKLY9GiRfHee++197RoB2+88UZs2rQphgwZktv3sY99rNU/c0VFRXHEEUfkvt7f/m1vLTG0D+vcuXPuz9t/sW9u3/bntu/r+vbtm/uL4IP+8pe/RENDQxx11FG5sa+++mqzx9n+F8L2sRHb3khh4cKF0dDQED/+8Y+joqJihxGxr+jSpcsOb/v1r38d5513XowePToee+yxqKmpiWnTpjVZ2/c/niK2Pab2l8dTROSejvTBiN6+f+vWrdGxY8dYvnx51NbW5rYVK1bETTfdtNfnu7ftbH2yHTydq7X/MbGvGTFiRCxdujReeeWV6NChQ/Tv3z9OPfXUWLJkSe4pchERM2fOjNmzZ8c3vvGNeOaZZ6K2tjZGjRqV+znr2LFjVFdXxy9+8Yvo379/3HzzzXH00UfHm2++2Z6nt8v29uNjZ+u5r+rQoUOTtXr/U5rKy8vjD3/4Q9xyyy3RpUuX+M///M8YPnx4ozGkYUc/Uzu7/YM/c839276zY+/PxBD7jPPOOy/eeeeduP3225vcNmPGjOjcuXN84QtfyI19/fXX4+c//3mTsTNnzoxu3brF6aefnts3duzY6NixY9x7773xk5/8JL7yla/s87/MHXXUUdGlS5f45S9/2eS2559/PioqKmLatGkxaNCgOOqoo5p9R7393RFHHBGdO3eO3/zmN7l9DQ0NuRebnnDCCbFly5ZYs2ZNHHnkkY22Hj16REREfn5+bNmypV3mv6ftbH369+8f7733XqPXbKxbty5ee+216NevX0Tsn+uz/XVDc+bMiVNPPTXy8vLi1FNPjcWLFzeKoe1XjS644IIYMGBAHH744Y1eyByx7ZeQU045Jb773e9GTU1N5Ofnx6JFi9rjtFptTz4+mtu/K+u5Lzr44IMbvdanoaGhSRB36dIlPve5z8UPf/jDWLx4cSxbtiz+93//NyL2z58xmnfkkUdG586d49e//nVu39/+9rd47bXXImLXfuZoqlN7TwB21ZAhQ2LSpElxxRVXxKZNm+Kcc86JzZs3x89+9rO46aabYs6cOVFeXh4R22LogQceiC9/+cvxgx/8IE477bRoaGiIW265JR599NF44IEH4oADDsgdu7i4OMaNGxff+ta3Yv369XHhhRe201nuPoWFhfHNb34zvvGNb0R+fn6ccsop8fbbb8fvf//7OPLII6Ouri7uu++++OQnPxmPP/74PvML2O7UtWvX+PKXvxxXXHFFfOxjH4tDDjkkrr766ujQoUPk5eVF375940tf+lKMHz8+Zs6cGSeccEKsXbs2nnnmmTjuuOPizDPPjN69e8c777wTv/zlL2PAgAFRVFQURUVF7X1qu8XO1ueoo46KMWPGxMUXXxy33357dO3aNa688sr4+Mc/HmPGjImI2C/Xp7S0NAYOHJj7uydiWyCde+65sXnz5hgxYkREbPvFZeHChfHCCy/EQQcdFLNm
zYrVq1fnfil58cUX45e//GWMHDkyDjnkkHjxxRfj7bff3md+admTj4/evXvHs88+G+edd14UFBRE9+7dd7qe+6p/+7d/i3nz5sXZZ58dBx10UFx11VXRsWPH3O3z5s2LLVu2xODBg6OoqCh++tOfRpcuXaKioiIiotm1Yv9UXFwcEyZMiCuuuCK6desWZWVlMW3atOjQYdu1jV35maMZe/tFSuwezb1gMj7w4sE333wzi4ispqZmr85td/ngGyhsd9ddd2WDBg3KunTpkhUVFWWf+tSnskcffbTJuM2bN2czZszIPvGJT2QFBQVZSUlJNmrUqGzp0qXNfr8XXnghi4hs5MiRO5zTvvQGClmWZVu2bMm+973vZRUVFVnnzp2zXr16Zd///vezLNv2ZgjdunXLiouLs3HjxmWzZ89u9ELm5s519uzZWUVFxd47gb2goaEh++IXv5gVFRVlPXr0yGbNmpWddNJJ2ZVXXpllWZZt2rQp+853vpP17t0769y5c9ajR4/s3//937Pf/va3uWNMnDgx69atWxYRuRc97y92tj5//etfs6qqqqy0tDTr0qVLNmrUqOy1115rdIz9cX0uu+yyLCKy3/3ud7l9AwYMyA4++OBs69atWZZte2HzmDFjsuLi4uyQQw7Jvv3tb2fjx4/PvXD+1VdfzUaNGpUdfPDBWUFBQda3b9/s5ptvbo/TabM99fhYtmxZdvzxx2cFBQXZ9l9Vdrae+5L3v4HC+vXrs7Fjx2YlJSVZeXl5Nm/evEZvoLBo0aJs8ODBWUlJSXbAAQdkJ598cvb000/njtXcWqUgxTdQyLJtb6JwwQUXZEVFRVlZWVl24403Nvp9aWc/c829acmiRYuSeux8UF6WJfwkQT7SRowYEQMHDvxIfbL2NddcEw8//HDU1ta291TYQ9599934+Mc/HjNnztznP2dqT7A+tMTjY9ecf/750bFjx/jZz37W3lOB5HnNEB9pc+fOjeLi4txzo9tLXV1dFBcXx/e///12nQe7X01NTcyfPz/eeOONePnll3OfLeUpBdtYH1ri8dE67733Xrz66quxbNmy+MQnPtHe0wHCa4b4CPuf//mf+Oc//xkREb169WrXuRx66KG5q0EFBQXtOhd2vxkzZsQf/vCHyM/Pj8rKyli6dKnn3b+P9aElHh+77ne/+10MHTo0Pv3pT8fEiRPbezpARHiaHAAAkCRPkwMAAJIkhgAAgCSJIQAAIEliCAAASJIYAgAAkiSGAACAJIkhAAAgSWIIAABI0v8BK9t0kjmg7TMAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "bigram = NGramLM(oov_train,2)\n", "plot_probabilities(bigram, (\"I\",)) # bigrams starting with \"I\"" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "Samples should look (slightly) more fluent:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "\"I set em Yo [OOV] enemies [OOV] [OOV] is yours what the [OOV] it So it There 's mind [OOV] Recognize your [OOV] up [OOV] [OOV] wanna get up [OOV] and\"" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\" \".join(sample(bigram, ['I'], 30)) # try: I, FIND, [OOV]" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "How about perplexity?" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "inf" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perplexity(bigram,oov_test)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "Some contexts where OOV word (and others) haven't been seen, hence 0 probability..." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0.0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bigram.probability(\"[OOV]\",\"money\")" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "subslide" } }, "source": [ "## Out-of-Vocabulary (OOV) Tokens\n", "In your test set, there will virtually always be words with zero training counts.\n", "\n", "Solutions:\n", "1. Remove unseen words from the test set (pretend there is no problem)?\n", "2. Use subword tokenisation to ensure there are no `OOV` tokens?\n", "3. Replace unseen words with an out-of-vocabulary token and estimate its probability (`OOV` injection)?\n", "4. **Move probability mass to unseen words (smoothing)?**" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Smoothing\n", "\n", "Maximum likelihood estimation \n", "* **underestimates** the true probability of some words (unseen words get 0)\n", "* **overestimates** the probabilities of others\n", "\n", "Solution: _smooth_ the probabilities and **move mass** from seen to unseen events."
] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "### Laplace Smoothing / Additive Smoothing\n", "\n", "Add **pseudo counts** to each event in the dataset \n", "\n", "$$\n", "\\param^{\\alpha}_{w,h} = \\frac{\\counts{\\train}{h,w} + \\alpha}{\\counts{\\train}{h} + \\alpha \\lvert V \\rvert } \n", "$$" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "0.0007704160246533128" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "laplace_bigram = LaplaceLM(bigram, 0.1) \n", "laplace_bigram.probability(\"[OOV]\",\"money\")" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "Perplexity should look better now:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "255.11837473847797" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "perplexity(LaplaceLM(bigram, 0.001),oov_test)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "### Example\n", "Consider three events:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
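<table><tr><th>word</th><th>train count</th><th>MLE</th><th>Laplace</th><th>Same Denominator</th></tr><tr><td>smally</td><td>0</td><td>0/3</td><td>1/6</td><td>0.5/3</td></tr><tr><td>bigly</td><td>1</td><td>1/3</td><td>2/6</td><td>1/3</td></tr><tr><td>tremendously</td><td>2</td><td>2/3</td><td>3/6</td><td>1.5/3</td></tr></table>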
" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = [\"word\", \"train count\", \"MLE\", \"Laplace\", \"Same Denominator\"]\n", "r1 = [\"smally\", \"0\", \"0/3\", \"1/6\", \"0.5/3\"]\n", "r2 = [\"bigly\", \"1\", \"1/3\", \"2/6\", \"1/3\"]\n", "r3 = [\"tremendously\", \"2\", \"2/3\", \"3/6\", \"1.5/3\"]\n", "util.Table([r1,r2,r3], column_names=c)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "### Interpolation / Jelinek-Mercer Smoothing\n", "* Laplace Smoothing assigns mass **uniformly** to the words that haven't been seen in a context." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "(0.0005656108597285067, 0.0005656108597285067)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "laplace_bigram.probability('rhyme','man'), \\\n", "laplace_bigram.probability('of','man') \n", "# also try: 'skies','skies' vs. '[/BAR]','skies'" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "Problem: not all unseen words (in a context) are equal" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "With **interpolation** we can do better: \n", "* give more mass to words likely under the $n-1$-gram model. 
\n", " * Use $\\prob(\\text{of})$ for estimating $\\prob(\\text{of} | \\text{man})$" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "* Combine $n$-gram model \\\\(p'\\\\) and a back-off \\\\(n-1\\\\) model \\\\(p''\\\\): \n", "\n", "$$\n", "\\prob_{\\alpha}(w_i|w_{i-n+1},\\ldots,w_{i-1}) = \\alpha \\cdot \\prob'(w_i|w_{i-n+1},\\ldots,w_{i-1}) + \\\\ (1 - \\alpha) \\cdot \\prob''(w_i|w_{i-n+2},\\ldots,w_{i-1})\n", "$$\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "(0.0014514278429372768, 0.009276517083120857)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interpolated = InterpolatedLM(bigram,unigram,0.01)\n", "interpolated.probability('rhyme','man'), \\\n", "interpolated.probability('of','man')" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "Can we find a good $\\alpha$ parameter? Tune on some **development set**!" 
] }, { "cell_type": "code", "execution_count": 31, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAz8AAAH5CAYAAACve4DDAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAABWFUlEQVR4nO3dd3yV9d3/8fdZ2ckhiwwSCHuGAAFUnAiCyKir0Npa25u2jjqoaK0dWu/bW6r9Wfdqb1tat7ViAXGACxBXAmGPAAESSAghe57knOv3R4ZEhiQkuc54PR+P85Ccc534jl5A3vl8r+9lMQzDEAAAAAD4OavZAQAAAACgJ1B+AAAAAAQEyg8AAACAgED5AQAAABAQKD8AAAAAAgLlBwAAAEBAoPwAAAAACAh2swN0hsfj0aFDhxQZGSmLxWJ2HAAAAAAmMQxDVVVVSk5OltV66tmOT5afQ4cOKTU11ewYAAAAALxEfn6+UlJSTnmMT5afyMhISc1fYFRUlMlpAAAAAJilsrJSqampbR3hVHyy/LQudYuKiqL8AAAAADity2HY8AAAAABAQKD8AAAAAAgIlB8AAAAAAYHyAwAAACAgUH4AAAAABATKDwAAAICAQPkBAAAAEBAoPwAAAAACAuUHAAAAQECg/AAAAAAICJQfAAAAAAGB8gMAAAAgIFB+AAAAAAQEyg8AAACAgED5AQAAABAQKD8AAAAAAgLlBwAAAECHGIZhdoROsZsdAAAAAIBv+dk/syUZumP6UA1LjDI7zmlj8gMAAADgtB2tbtBHO4u1anuxgu02s+N0COUHAAAAwGlbsaVIbo+h9D5O9Y8LNztOh1B+AAAAAJy2ZTmHJElzMpJNTtJxlB8AAAAAp+VQeZ2+3FcqSZo5OsnkNB1H+QEAAABwWpZvap76TEyLUXKvUJPTdBzlBwAAAMBpWbqxufzMHuN7S94kyg8AAACA07D3SLW2HKyUzWrRZaMSzY7TKZQfAAAAAN9q2cZCSdJ5g+IUGxFscprOofwAAAAAOCXDMLR040FJvrnLWyvKDwAAAIBT2lZYqT1HahRkt2rayASz43Qa5QcAAADAKbVudHDx0N6KDHGYnKbzKD8AAAAATsrjMbS85XqfOT66y1sryg8AAACAk1p/oEwHy+sUEWzXxcN6mx3njFB+AAAAAJxU65K3aSMSFOKwmZzmzFB+AAAAAJxQk9ujFZubl7z56o1Nj0X5AQAAAHBCn+09qpJql6LDHDpvUJzZcc4Y5QcAAADACS3NaV7ydll6khw2368Ovv8VAAAAAOhyDU1uvbu1SJI024dvbHosyg8AAACA43y884iq6puUGBWiiWkxZsfpEpQfAAAAAMdp3eVt1ugkWa0Wk9N0DcoPAAAAgHZqGpr0wfbDknz/xqbHovwAAAAAaGfV9sOqb/QoLTZM6X2cZsfpMpQfAAAAAO207vI2JyNZFot/LHmTKD8AAAAAjlFe69Lq3COS/GeXt1aUHwAAAABt3tlSpEa3oWGJkRqcEGl2nC5F+QEAAADQpm3Jmx9tdNCK8gMAAABAknS4sl6f5x2VJM0eTfkBAAAA4KeWbyqUYUjj+vZSakyY2XG6HOUHAAAAgCRp2cavd3nzR5QfAAAAADpwtFY5+eWyWqS
ZfrjkTaL8AAAAAJC0bFPz1GfSwDjFRwabnKZ7UH4AAAAAtLuxqb+i/AAAAAABbmdRlXYerpLDZtH0kYlmx+k2lB8AAAAgwC3deFCSdOGQ3nKGOUxO030oPwAAAEAAMwxDyzYWSvLPG5sei/IDAAAABLCc/HIdKK1VqMOmqcN7mx2nW1F+AAAAgADWOvW5ZESCwoLsJqfpXh0uP6tXr9bs2bOVnJwsi8Wit956q93rP/7xj2WxWNo9zj777HbHNDQ06JZbblFcXJzCw8M1Z84cFRQUnNEXAgAAAKBj3B5Dyzf5/y5vrTpcfmpqapSRkaEnn3zypMdceumlKiwsbHusWLGi3esLFizQkiVL9Oqrr2rt2rWqrq7WrFmz5Ha7O/4VAAAAAOiUL/KOqriqQVEhdl0wJN7sON2uw3OtGTNmaMaMGac8Jjg4WImJJ94ir6KiQs8//7xeeOEFTZ06VZL04osvKjU1VatWrdL06dM7GgkAAABAJyzb2Dz1mTEqSUF2/78iplu+wo8//li9e/fWkCFD9LOf/UzFxcVtr2VnZ6uxsVHTpk1rey45OVmjRo3SunXrTvj5GhoaVFlZ2e4BAAAAoPNcTR6t2Fwkyf93eWvV5eVnxowZeumll/Thhx/q4Ycf1ldffaWLL75YDQ0NkqSioiIFBQUpOjq63fsSEhJUVFR0ws+5aNEiOZ3OtkdqampXxwYAAAACyprcI6qoa1R8ZLDOHhBrdpwe0eXbOcybN6/t16NGjdL48ePVr18/vf3227ryyitP+j7DMGSxWE742t13363bb7+97ePKykoKEAAAAHAGlrYseZuZniSb9cTfh/ubbl/Yl5SUpH79+ik3N1eSlJiYKJfLpbKysnbHFRcXKyEh4YSfIzg4WFFRUe0eAAAAADqnzuXWym2HJQXOkjepB8rP0aNHlZ+fr6SkJElSZmamHA6HVq5c2XZMYWGhtmzZokmTJnV3HAAAACDgfbDjsGpdbqXGhGpsai+z4/SYDi97q66u1u7du9s+zsvLU05OjmJiYhQTE6M//OEPuuqqq5SUlKR9+/bpN7/5jeLi4nTFFVdIkpxOp+bPn6+FCxcqNjZWMTExuuOOO5Sent62+xsAAACA7rM0p3nJ2+zRySe99MQfdbj8ZGVlafLkyW0ft16Lc9111+mZZ57R5s2b9c9//lPl5eVKSkrS5MmT9dprrykyMrLtPY888ojsdrvmzp2ruro6TZkyRYsXL5bNZuuCLwkAAADAyVTUNerjnUckSbMD4Mamx7IYhmGYHaKjKisr5XQ6VVFRwfU/AAAAQAe8npWvX72xSYN7R+j9X17g85OfjnQD/7+TEQAAAIA2rTc2nZMRWEveJMoPAAAAEDCOVDXo090lkgJvyZtE+QEAAAACxjtbCuUxpIwUp9Liws2O0+MoPwAAAECAaNvlLQCnPhLlBwAAAAgIB8vrlLW/TBaLNGs05QcAAACAn2rd6GBiWowSnSEmpzEH5QcAAAAIAK1L3uaMCcypj0T5AQAAAPze7uJqbSuslN1q0WWjksyOYxrKDwAAAODnlrYseTt/cJyiw4NMTmMeyg8AAADgxwzD0PKNLHmTKD8AAACAX9t6qFJ7S2oUbLfqkhGJZscxFeUHAAAA8GOtS96mDk9QRLDd5DTmovwAAAAAfsrjMdq2uA7UG5sei/IDAAAA+Kms/WUqrKhXZLBdFw2NNzuO6Sg/AAAAgJ9auvGgJGnayESFOGwmpzEf5QcAAADwQ41uj1ZsLpLELm+tKD8AAACAH1q356hKa1yKDQ/SuQNjzY7jFSg/AAAAgB9amtO80cFl6Umy2/i2X6L8AAAAAH6nvtGt97ey5O2bKD8AAACAn/l4Z7GqGpqU7AxRZt9os+N4DcoPAAAA4Gdab2w6KyNZVqvF5DTeg/IDAAAA+JGq+kZ9sL1YkjSHG5u2Q/kBAAAA/MjKbYfV0OTRgLhwjUyOMjuOV6H8AAAAAH5kWcuSt9kZybJ
YWPJ2LMoPAAAA4CfKalxak1siiV3eToTyAwAAAPiJFVsK1eQxNDI5SgPjI8yO43UoPwAAAICfaL2xKRsdnBjlBwAAAPADRRX1+nJfqaTmLa5xPMoPAAAA4AeWbzokw5DG94tWn16hZsfxSpQfAAAAwA+03tiUjQ5OjvIDAAAA+Lh9JTXaVFAhm9Wiy9KTzI7jtSg/AAAAgI9rvbfPpIGxiosINjmN96L8AAAAAD7MMIyvl7yx0cEpUX4AAAAAH7ajqEq5xdUKslk1fVSi2XG8GuUHAAAA8GGtU5+LhsYrKsRhchrvRvkBAAAAfJRhGG3X+7DL27ej/AAAAAA+av2BchWU1Sk8yKYpwxLMjuP1KD8AAACAj2qd+lwyIkGhQTaT03g/yg8AAADgg9weQ8s3FUpiydvpovwAAAAAPujzvUdVUt2gXmEOnTco3uw4PoHyAwAAAPigpTnNS95mjEpSkJ1v608H/5UAAAAAH9PQ5NY7W1qWvHFj09NG+QEAAAB8zOpdJaqsb1LvyGBN7B9jdhyfQfkBAAAAfEzrjU1njU6WzWoxOY3voPwAAAAAPqTW1aRV2w5LYpe3jqL8AAAAAD5k1fZi1TW61S82TBkpTrPj+BTKDwAAAOBDWnd5mz06WRYLS946gvIDAAAA+IiK2kZ9sqtYEkveOoPyAwAAAPiId7cWqtFtaFhipIYkRJodx+dQfgAAAAAf0brL22zu7dMplB8AAADABxRX1euzPUclNV/vg46j/AAAAAA+4O1NhfIY0pjUXuobG2Z2HJ9E+QEAAAB8wLKWJW9zWPLWaZQfAAAAwMvll9Zq/YFyWS3SrNFJZsfxWZQfAAAAwMst29Q89Tl7QKx6R4WYnMZ3UX4AAAAAL9d6Y1OWvJ0Zyg8AAADgxXIPV2lHUZUcNosuHZVodhyfRvkBAAAAvFjrvX0uGByvXmFBJqfxbZQfAAAAwEsZhtFWfuaMYcnbmaL8AAAAAF5q88EK7T9aqxCHVVOHJ5gdx+dRfgAAAAAv1brRwdThCQoPtpucxvdRfgAAAAAv5PEYWr6pUBK7vHUVyg8AAADghb7cV6qiynpFhth14dB4s+P4hQ6Xn9WrV2v27NlKTk6WxWLRW2+9ddJjr7/+elksFj366KPtnm9oaNAtt9yiuLg4hYeHa86cOSooKOhoFAAAAMBvtW50cOnIRAXbbSan8Q8dLj81NTXKyMjQk08+ecrj3nrrLX3xxRdKTj5+RLdgwQItWbJEr776qtauXavq6mrNmjVLbre7o3EAAAAAv9Po9uidzS1L3tjlrct0+KqpGTNmaMaMGac85uDBg7r55pv13nvvaebMme1eq6io0PPPP68XXnhBU6dOlSS9+OKLSk1N1apVqzR9+vSORgIAAAD8ytrcEpXVNiouIkjnDIg1O47f6PJrfjwej6699lrdeeedGjly5HGvZ2dnq7GxUdOmTWt7Ljk5WaNGjdK6detO+DkbGhpUWVnZ7gEAAAD4q2UtS95mpifJbuMy/a7S5f8lH3zwQdntdt16660nfL2oqEhBQUGKjo5u93xCQoKKiopO+J5FixbJ6XS2PVJTU7s6NgAAAOAV6hvdem9r8/fFLHnrWl1afrKzs/XYY49p8eLFslgsHXqvYRgnfc/dd9+tioqKtkd+fn5XxAUAAAC8zoc7ilXjcqtPr1CN6xv97W/AaevS8rNmzRoVFxerb9++stvtstvt2r9/vxYuXKi0tDRJUmJiolwul8rKytq9t7i4WAkJJ75rbXBwsKKioto9AAAAAH/UemPT2RnJHR4o4NS6tPxce+212rRpk3JyctoeycnJuvPOO/Xee+9JkjIzM+VwOLRy5cq29xUWFmrLli2aNGlSV8YBAAAAfEplfaM+3FksiRubdocO7/ZWXV2t3bt3t32cl5ennJwcxcTEqG/fvoqNbb8bhcPhUGJiooYOHSpJcjqdmj9/vhYuXKjY2FjFxMTojjvuUHp6etvubwAAAEA
gen/rYbmaPBoYH67hSZFmx/E7HS4/WVlZmjx5ctvHt99+uyTpuuuu0+LFi0/rczzyyCOy2+2aO3eu6urqNGXKFC1evFg2GzdvAgAAQOBqvbHpnIw+LHnrBhbDMAyzQ3RUZWWlnE6nKioquP4HAAAAfuFodYMmPvCB3B5DH91xkfrHhZsdySd0pBuwaTgAAADgBVZsKZLbYyi9j5Pi000oPwAAAIAXWJbTuuSNjQ66C+UHAAAAMNmh8jp9ua9UFos0KyPJ7Dh+i/IDAAAAmGz5puapz4S0GCU5Q01O478oPwAAAIDJWnd5m82St25F+QEAAABMtPdItbYcrJTNatFloxLNjuPXKD8AAACAiVqnPucNilNsRLDJafwb5QcAAAAwiWEYWraRXd56CuUHAAAAMMm2wkrtOVKjYLtV00YmmB3H71F+AAAAAJO0Lnm7eFhvRYY4TE7j/yg/AAAAgAk8HkPLNxZKYslbT6H8AAAAACZYf6BMB8vrFBFs1+Rhvc2OExAoPwAAAIAJWpe8TRuRoBCHzeQ0gYHyAwAAAPSwJrdHKzY3L3mbPYYlbz2F8gMAAAD0sM/2HlVJtUvRYQ6dNyjO7DgBg/IDAAAA9LClOc1L3i5LT5LDxrfkPYX/0gAAAEAPamhy692tRZLY5a2nUX4AAACAHvTxziOqqm9SkjNEE9JizI4TUCg/AAAAQA9q3eVt1ugkWa0Wk9MEFsoPAAAA0ENqGpr0wfbDkqTZLHnrcZQfAAAAoIes3HZY9Y0epcWGKb2P0+w4AYfyAwAAAPSQZS1L3uZkJMtiYclbT6P8AAAAAD2gvNal1blHJElzuLGpKSg/AAAAQA94Z0uRGt2GhidFaVDvSLPjBCTKDwAAANADWm9syr19zEP5AQAAALrZ4cp6fZ53VFLzFtcwB+UHAAAA6GbLNxXKMKRxfXspNSbM7DgBi/IDAAAAdLOlG1ny5g0oPwAAAEA3OnC0Vhvzy2W1SDNHU37MRPkBAAAAutGyTc1Tn0kD4xQfGWxymsBG+QEAAAC6Ebu8eQ/KDwAAANBNdhZVaefhKgXZrJo+KtHsOAGP8gMAAAB0k6UbD0qSLhwaL2eow+Q0oPwAAAAA3cAwDC3bWChJms2SN69A+QEAAAC6QU5+uQ6U1irUYdPU4b3NjgNRfgAAAIBu0Tr1uWREgsKC7CangUT5AQAAALqc22No+SZ2efM2lB8AAACgi32Rd1TFVQ1yhjp0wZB4s+OgBeUHAAAA6GLLNjZPfWaMSlSQnW+5vQX/JwAAAIAu5GryaMXmIkksefM2lB8AAACgC63JPaKKukbFRwbrrAGxZsfBMSg/AAAAQBda2rLkbWZ6kmxWi8lpcCzKDwAAANBF6lxurdx2WJI0ZwxL3rwN5QcAAADoIh/sOKxal1upMaEam9rL7Dj4BsoPAAAA0EWW5jQveZs9OlkWC0vevA3lBwAAAOgCFXWN+njnEUksefNWlB8AAACgC7y3tUgut0dDEiI0LDHK7Dg4AcoPAAAA0AVab2zKvX28F+UHAAAAOENHqhr06e4SSdKs0ZQfb0X5AQAAAM7Qis2F8hhSRopTaXHhZsfBSVB+AAAAgDPUuuRtNkvevBrlBwAAADgDB8vrlLW/TBYL5cfbUX4AAACAM9A69Tmrf4wSokJMToNTofwAAAAAZ6D1xqZzMvqYnATfhvIDAAAAdNLu4mptK6yU3WrRjFGJZsfBt6D8AAAAAJ20tGXJ2/mD4xQdHmRyGnwbyg8AAADQCYZhfH1j0zFsdOALKD8AAABAJ2w9VKm8khoF2626ZARL3nwB5QcAAADohNYlb1OHJygi2G5yGpwOyg8AAADQQR6PwY1NfRDlBwAAAOigrP1lKqyoV2SwXRcNjTc7Dk4T5QcAAADooKUbD0qSpo9KVIjDZnIanC7KDwAAANABjW6PVmwuksSSN1/T4fKzevVqzZ49W8nJybJYLHrrrbfavf6HP/xBw4YNU3h4uKKjozV
16lR98cUX7Y5paGjQLbfcori4OIWHh2vOnDkqKCg4oy8EAAAA6Amf7i5RaY1LseFBOndgrNlx0AEdLj81NTXKyMjQk08+ecLXhwwZoieffFKbN2/W2rVrlZaWpmnTpunIkSNtxyxYsEBLlizRq6++qrVr16q6ulqzZs2S2+3u/FcCAAAA9IBlGwslSZelJ8luYyGVL7EYhmF0+s0Wi5YsWaLLL7/8pMdUVlbK6XRq1apVmjJliioqKhQfH68XXnhB8+bNkyQdOnRIqampWrFihaZPn/6t/97Wz1lRUaGoqKjOxgcAAAA6pL7RrQn3r1JVQ5P+dcM5mpAWY3akgNeRbtCtVdXlcukvf/mLnE6nMjIyJEnZ2dlqbGzUtGnT2o5LTk7WqFGjtG7duhN+noaGBlVWVrZ7AAAAAD3t453FqmpoUrIzRJl9o82Ogw7qlvKzfPlyRUREKCQkRI888ohWrlypuLg4SVJRUZGCgoIUHd3+ZElISFBRUdEJP9+iRYvkdDrbHqmpqd0RGwAAADilpcfc28dqtZicBh3VLeVn8uTJysnJ0bp163TppZdq7ty5Ki4uPuV7DMOQxXLiE+juu+9WRUVF2yM/P787YgMAAAAnVVXfqA+2N39Pyy5vvqlbyk94eLgGDRqks88+W88//7zsdruef/55SVJiYqJcLpfKysravae4uFgJCQkn/HzBwcGKiopq9wAAAAB60spth9XQ5NGAuHCNTOb7UV/UI9tTGIahhoYGSVJmZqYcDodWrlzZ9nphYaG2bNmiSZMm9UQcAAAAoMOOXfJ2shVL8G72jr6hurpau3fvbvs4Ly9POTk5iomJUWxsrP73f/9Xc+bMUVJSko4ePaqnn35aBQUF+u53vytJcjqdmj9/vhYuXKjY2FjFxMTojjvuUHp6uqZOndp1XxkAAADQRUprXFqbWyJJmjOGJW++qsPlJysrS5MnT277+Pbbb5ckXXfddXr22We1Y8cO/eMf/1BJSYliY2M1YcIErVmzRiNHjmx7zyOPPCK73a65c+eqrq5OU6ZM0eLFi2Wz2brgSwIAAAC61jtbCtXkMTQyOUoD4yPMjoNOOqP7/JiF+/wAAACgJ8177jN9kVequ2cM0/UXDjQ7Do7hNff5CQTZ+8t07fNf6KMdp97NDgAAAL6pqKJeX+4rlSTNYpc3n9bhZW9o772tRVqTW6LK+iZdNDSei98AAAD8zPJNh2QY0oS0aPXpFWp2HJwBJj9n6GfnD1CIw6qN+eX6ZNcRs+MAAACgi7Xu8jaHqY/Po/ycofjIYP3wrH6SpMc+yJUPXkIFAACAk8grqdGmggrZrBbNSE8yOw7OEOWnC/z8wgEKtlu14UC51rRsgQgAAADft7xl6jNpYKziIoJNToMzRfnpAr0jQ/QDpj8AAAB+xTAMlrz5GcpPF7mhZfqTvb9Mn+4+anYcAAAAnKEdRVXKLa5WkN2q6aMSzY6DLkD56SK9o0J0zVl9JUmPfbCL6Q8AAICPa536TB4ar6gQh8lp0BUoP13ohgsHKshu1Vf7yvTZHqY/AAAAvsowDC1rW/LWx+Q06CqUny6UEBWiayY2T38eXcW1PwAAAL5q/YFyFZTVKTzIpouH9TY7DroI5aeL3XDhQAXZrPpyX6k+28v0BwAAwBe1Tn0uGZGg0CCbyWnQVSg/XSzRGaLvTUyVJD22KtfkNAAAAOgot8fQ8k2FkqQ5Y9jlzZ9QfrrBjRc1T3++yCvV50x/AAAAfMrne4+qpLpBvcIcOm9QvNlx0IUoP90gyRmquRNSJDH9AQAA8DVLc5qXvM0YlaQgO98u+xP+b3aTGy8aJIfNos/2HtWXeaVmxwEAAMBpaGhy650tLUveuLGp36H8dJM+vUI1d3zLtT8f7DI5DQAAAE7H6l0lqqxvUkJUsCb2jzE7DroY5acb3TS5efrz6e6jytrH9AcAAMDbtd7YdNboZNmsFpPToKtRfrpRn16hujqzdfrDtT8AAAD
erNbVpFXbDkuSZrPkzS9RfrrZTRcNlN1q0ZrcEmXvZ/oDAADgrVZuO6y6Rrf6xYYpI8Vpdhx0A8pPN0uNCdPVmc07vz3Kzm8AAABea9nG5o0OZo9OlsXCkjd/RPnpAb+YPKht+rP+QJnZcQAAAPANFbWN+mRXsSRubOrPKD89IDUmTFeO6yOJ+/4AAAB4o3e3FqrRbWhYYqSGJESaHQfdhPLTQ26ePFg2q0Wf7DqinPxys+MAAADgGK27vLHRgX+j/PSQvrFhumJs6/SH+/4AAAB4i+Kqen2256gkbmzq7yg/PejmyYNks1r00c4j2sj0BwAAwCu8valQHkMak9pLqTFhZsdBN6L89KC0uHBdPqZ5+vM49/0BAADwCq1L3pj6+D/KTw+7+eJBslqkD3YUa3NBhdlxAAAAAlp+aa02HCiX1SLNGp1kdhx0M8pPD+t/zPTnsQ+49gcAAMBMyzY1T33OHhCr3lEhJqdBd6P8mKB1+rNqe7G2HGT6AwAAYJalOSx5CySUHxMMiI9o+w32GNf+AAAAmGL1riPaUVQlh82iGaNY8hYIKD8mufniwbJYpJXbDmvrIaY/AAAAPSm/tFa3vrpBkjRvQqqcYQ6TE6EnUH5MMqh3hGaPbp7+sPMbAABAz6lvdOuGF7NVXtuojBSnfjdzhNmR0EMoPya6dcogWSzSe1sPa3thpdlxAAAA/J5hGPrNm5u19VClYsOD9MwPMxXisJkdCz2E8mOiQb0jNTO9eX0p0x8AAIDu9491+/TmhoOyWS168ppxSu4VanYk9CDKj8lundJ87c87W4q0o4jpDwAAQHf5Mq9U97+9XZJ094xhOmdgrMmJ0NMoPyYbkhCpy1qmP098sNvkNAAAAP6pqKJeN720Xk0eQ98Zk6z55/U3OxJMQPnxArdePFiS9PbmQu0sqjI5DQAAgH9paHLrxpeyVVLdoGGJkVp0ZbosFovZsWACyo8XGJoYqcvSEyVJj3/ItT8AAABd6b5l27ThQLmiQux67tpMhQXZzY4Ek1B+vMStU5qnPys2Fyr3MNMfAACArvDaVwf08hcHZLFIj39/rPrFhpsdCSai/HiJYYlRunRkogxDevxDrv0BAAA4Uzn55fr9W1slSQsvGaKLhvY2ORHMRvnxIq3Tn+WbDml3MdMfAACAziqpbtCNL2bL5fZo2ogE3XTRILMjwQtQfrzIiOQoTRuRIMOQnmD6AwAA0CmNbo9+8dJ6FVbUa2B8uB6emyGrlQ0OQPnxOq3Tn2UbD2nPkWqT0wAAAPieRSt26Iu8UkUE2/XcteMVGeIwOxK8BOXHy4zq49QlIxLkMaQnmf4AAAB0yH9yDupvn+ZJkh6em6FBvSNMTgRvQvnxQre1TH/+k3NQe5n+AAAAnJZthyp11783SZJunjxI00cmmpwI3oby44VG9XFq6vDeTH8AAABOU3mtS9e/mKX6Ro8uGBKvX14yxOxI8EKUHy9125Tm37Bv5RxUXkmNyWkAAAC8l9tj6NZXc5RfWqe+MWF6/HtjZGODA5wA5cdLpac4dfEwpj8AAADf5s8rd2r1riMKcVj17A8z1SssyOxI8FKUHy/Weu3PWzkHtf8o0x8AAIBvendLkZ76aI8k6cGrRmtEcpTJieDNKD9eLCO1ly4aGi+3x2D6AwAA8A27i6u08PUcSdL88/rrO2P6mBsIXo/y4+Vapz9vbjioA0drTU4DAADgHarqG/XzF7JV43Lr7AExunvGMLMjwQdQfrzc2L7RumBI8/TnqY+Y/gAAAHg8hm5/faP2HqlRkjNET14zTnYb39bi23GW+IDW6c+/1xcov5TpDwAACGxPf7xbK7cdVpCteYODuIhgsyPBR1B+fEBmv2idPzhOTR5DT3/M9AcAAASuj3YW6+GVuyRJ/3P5SGWk9jI3EHwK5cdHLJjaPP35VxbTHwAAEJj2H63Rba9skGFI15zVV/Mm9DU7EnwM5cdHZPaL0XmDWqc/e8yOAwAA0KN
qXU26/oVsVdY3aWzfXrp39gizI8EHUX58yG0t0583svN1sLzO5DQAAAA9wzAM3fXvzdpRVKW4iGA9+8NMBdttZseCD6L8+JAJaTGaNDBWjW5DT7PzGwAACBDPr83Tso2HZLda9MwPxykhKsTsSPBRlB8f07rz2+tZ+TrE9AcAAPi5dXtKtOidHZKk388aoQlpMSYngi+j/PiYswbE6uwBMWp0G3qGa38AAIAfO1hep5tf3iC3x9CV4/roR+f0MzsSfBzlxwfdNmWIJOm1r/JVWMH0BwAA+J/6RrdufDFbpTUujUyO0gNXpMtisZgdCz6O8uODzhkYq7P6x8jl9uhZpj8AAMDPGIahe/6zRZsKKhQd5tCzP8xUiIMNDnDmKD8+qnXnt1e+yldRRb3JaQAAALrOS18c0OtZBbJapCe+P06pMWFmR4KfoPz4qHMGxGpiWoxcTR49+wnTHwAA4B+y95fpvmVbJUm/unSYzhscZ3Ii+JMOl5/Vq1dr9uzZSk5OlsVi0VtvvdX2WmNjo+666y6lp6crPDxcycnJ+tGPfqRDhw61+xwNDQ265ZZbFBcXp/DwcM2ZM0cFBQVn/MUEEovF0jb9efnLAzpcyfQHAAD4tuLKet34YrYa3YZmpifp+gsGmB0JfqbD5aempkYZGRl68sknj3uttrZW69ev1+9//3utX79eb775pnbt2qU5c+a0O27BggVasmSJXn31Va1du1bV1dWaNWuW3G5357+SADRpYKzG94tm+gMAAHyeq8mjm15ar+KqBg1JiNBDV49mgwN0OYthGEan32yxaMmSJbr88stPesxXX32liRMnav/+/erbt68qKioUHx+vF154QfPmzZMkHTp0SKmpqVqxYoWmT59+3OdoaGhQQ0ND28eVlZVKTU1VRUWFoqKiOhvfL6zJPaJrn/9SwXar1vxqsnpz0y8AAOCD7vnPFv3zs/2KDLFr6c3nqX9cuNmR4CMqKyvldDpPqxt0+zU/FRUVslgs6tWrlyQpOztbjY2NmjZtWtsxycnJGjVqlNatW3fCz7Fo0SI5nc62R2pqanfH9hnnDYrTuL691NDk0XOr95odBwAAoMPeyC7QPz/bL0l6dN4Yig+6TbeWn/r6ev3617/WNddc09bCioqKFBQUpOjo6HbHJiQkqKio6ISf5+6771ZFRUXbIz8/vztj+5Tma3+a7/vz0hf7daSq4VveAQAA4D02F1ToN0s2S5IWTB2sKcMTTE4Ef9Zt5aexsVHf+9735PF49PTTT3/r8YZhnHRdZ3BwsKKioto98LULBsdpTGov1Td69JfVXPsDAAB8Q2mNSze8mC1Xk0dThvXWrRcPNjsS/Fy3lJ/GxkbNnTtXeXl5WrlyZbuykpiYKJfLpbKysnbvKS4uVkICTb8zjt357YXP96ukmukPAADwbk1uj255Zb0Oltepf1y4/jxvjKxWNjhA9+ry8tNafHJzc7Vq1SrFxsa2ez0zM1MOh0MrV65se66wsFBbtmzRpEmTujpOwLhoSLwyWqY/f+XaHwAA4OX+9N5Ofbr7qMKCbHru2kw5Qx1mR0IA6HD5qa6uVk5OjnJyciRJeXl5ysnJ0YEDB9TU1KSrr75aWVlZeumll+R2u1VUVKSioiK5XC5JktPp1Pz587Vw4UJ98MEH2rBhg374wx8qPT1dU6dO7dIvLpBYLBYtmNI8/fnnZ0x/AACA91q+6VDbRk1/ujpDQxIiTU6EQNHh8pOVlaWxY8dq7NixkqTbb79dY8eO1T333KOCggItXbpUBQUFGjNmjJKSktoex+7k9sgjj+jyyy/X3Llzde655yosLEzLli2TzWbruq8sAF00NF6jU5yqa3Trr2uY/gAAAO+zs6hKv3pjkyTp+gsHaOboJJMTIZCc0X1+zNKRvbwDzQfbD2v+P7IUFmTT2rsuVkx4kNmRAAAAJEkVdY36zpNrte9orc4bFKfFP5kgu63b77wCP+dV9/lBz7p4WG+l93G
q1sX0BwAAeA+Px9CCVzdo39Fa9ekVqie+P5bigx7HGednLBaLbm299mfdPpXVuExOBAAAID36Qa4+2nlEwXarnrs2U9GsToEJKD9+aOrw3hqZHKUal1v/t5bpDwAAMNeqbYf1+Ae5kqQHrkjXqD5OkxMhUFF+/NCx059/rNuv8lqmPwAAwBx7j1Trl6/lSJKuO6efrspMMTcQAhrlx09NG5Gg4UlRqm5o0vNr88yOAwAAAlB1Q5OufyFbVQ1NmpAWrd/NGmF2JAQ4yo+fslgsuq1l+vP3T/cx/QEAAD3KMAz96o2Nyi2uVkJUsJ76wTg52OAAJuMM9GPTRiRoWGKkqhua9DemPwAAoAc9+8lerdhcJIfNoqd/kKnekSFmRwIoP/7Mam0//amobTQ5EQAACARrco/oT+/tkCT9Yc5IZfaLNjkR0Izy4+emj0zU0IRIVTU06W+fMv0BAADdK7+0Vre8skEeQ5o3PlXXTOxrdiSgDeXHz1mtX+/89rdP81RRx/QHAAB0jzqXW9e/kK3y2kZlpDh133dGymKxmB0LaEP5CQAzRiVqSEKEquqbtPjTfWbHAQAAfsgwDP1myWZtK6xUbHiQnvlhpkIcNrNjAe1QfgKA1WrRLRc3T3+eX7tXlfVMfwAAQNf6x7p9WrLhoGxWi568ZpySe4WaHQk4DuUnQFyWnqRBvSNUWd+kfzD9AQAAXejLvFLd//Z2SdLdM4bpnIGxJicCTozyEyBsVotuuXiQJOn/1uapiukPAADoAkUV9brppWw1eQx9Z0yy5p/X3+xIwElRfgLIrNHJGhgfroq6Rv1j3T6z4wAAAB/X0OTWDS9mq6TapWGJkfrjlaPZ4ABejfITQGzH7Pz2f2vzVN3QZHIiAADgy/6wdJty8svlDHXoL9eOV2gQGxzAu1F+Asys0ckaEB+u8lqmPwAAoPNe/fKAXvnygCwW6bHvjVHf2DCzIwHfivITYNpd+7Nmr2qY/gAAgA7acKBM9/xnqyRp4SVDdNHQ3iYnAk4P5ScAzR6drP5x4SqrbdQ/P9tvdhwAAOBDjlQ16MYX18vl9mjaiATddNEgsyMBp43yE4DsNqtuntz8B9Vfmf4AAIDT1Oj26OaX16uosl4D48P18NwMWa1scADfQfkJUN8Zk6y02DCV1rj04udMfwAAwLdbtGKHvsgrVUSwXc9dO16RIQ6zIwEdQvkJUHabVb9omf78ZfVe1bqY/gAAgJP7T85B/e3TPEnSw3MzNKh3hMmJgI6j/ASwK8b2Ud+YMB2tcemlzw+YHQcAAHiprYcqdNe/N0mSbp48SNNHJpqcCOgcyk8As9usurll57fnVu9RncttciIAAOBtymtduuHFbNU3enThkHj98pIhZkcCOo3yE+CuGNtHqTGhKql26aUvuPYHAAB8ze0xdMsrG5RfWqe+MWF67HtjZGODA/gwyk+Acxyz89uzn+xl+gMAANo8/P5OrcktUYjDqmd/mKleYUFmRwLOCOUHunJcilKiQ1VS3aCXv+TaHwAAIL27pVBPf7xHkvTgVaM1IjnK5ETAmaP8QI5jdn579pM9qm9k+gMAQCDLPVylha9vlCTNP6+/vjOmj8mJgK5B+YEk6apxKerTK1RHqhr0CtMfAAACVmV9o65/IVs1LrfOHhCju2cMMzsS0GUoP5AkBdmtumnyQElMfwAACFQej6GFr2/U3pIaJTlD9OQ142S38e0i/AdnM9p8NzNVyc4QHa5s0Gtf5ZsdBwAA9LCnPtqtldsOK8jevMFBXESw2ZGALkX5QZsgu1U3tlz788zHTH8AAAgkH+0s1p9X7ZIk3f+dUcpI7WVuIKAbUH7QztzxKUpyhqiosl6vZzH9AQAgEOwrqdFtr2yQYUg/OKuv5k5INTsS0C0oP2gn2G7TTRc1X/vzzMd71NDE9AcAAH9W62rSDS9mq7K+SWP79tI9s0eYHQnoNpQfHGfuhFQlRoWosKJer2cVmB0HAAB0E8Mw9Ks
3NmlHUZXiIoL17A8zFWy3mR0L6DaUHxwn2G7Tja3Tn492M/0BAMBP/d+aPC3fVCi71aJnfjhOCVEhZkcCuhXlByc0b0KqEqKCdaiiXm9kM/0BAMDfrNtdokXvbJck/X7WCE1IizE5EdD9KD84oRCHTTdc2Dz9efqjPXI1eUxOBAAAusrB8jrd/MoGeQzpynF99KNz+pkdCegRlB+c1Pcn9lV8ZLAOltfp3+uZ/gAA4A/qG9268cVslda4NKpPlB64Il0Wi8XsWECPoPzgpI6d/jz10W41upn+AADgywzD0O/f2qJNBRWKDnPo2R9mKsTBBgcIHJQfnNIPzuqruIhgFZTV6U2mPwAA+LQXvzigf2UXyGqRnvj+OKVEh5kdCehRlB+cUvP0Z4Ak6YkPmf4AAOCrsveX6r+XbZUk/erSYTpvcJzJiYCeR/nBt/rBWf0UFxGkgrI6LVl/0Ow4AACgg4or63XDi+vV6DY0Mz1J118wwOxIgCkoP/hWoUE2XX9B87U/T3LtDwAAPsXV5NGNL63XkaoGDUmI0ENXj2aDAwQsyg9Oyw/O7qvY8CAdKK3VWxuY/gAA4Cvuf3ubsveXKTLErueuHa/wYLvZkQDTUH5wWsKC7Pp5y4j8yY92q4npDwAAXu+N7AL987P9kqRH541R/7hwkxMB5qL84LRde04/xYQHaf/RWv0n55DZcQAAwClsLqjQb5ZsliQtmDpYU4YnmJwIMB/lB6ctLMiun53P9AcAAG93tLpBN7yYLVeTR1OH99atFw82OxLgFSg/6JAfndNP0WEO5ZXUaNkmpj8AAHibJrdHt7yyQQfL69Q/Llx/njdGVisbHAAS5QcdFB5s109bpj9PfLBbbo9hciIAAHCsh97bqXV7jiosyKbnrs1UVIjD7EiA16D8oMOum5SmXmEO7S2p0bKNTH8AAPAWyzYe0l9W75Uk/enqDA1JiDQ5EeBdKD/osIjgr6/9efzDXKY/AAB4gR1FlfrVG5skSddfOEAzRyeZnAjwPpQfdMqPzuknZ6hDe4/UaDnX/gAAYKqyGpeufyFbdY1unTcoTndOG2p2JMArUX7QKZEhDv30vP6SpCc+5NofAADM8vHOYl362GrtP1qrPr1C9cT3x8pu41s84ET4nYFOu+7cNEWF2LW7uForNheaHQcAgIBS09Ck3yzZrB///SsdrmzQgPhw/f0nExQdHmR2NMBrUX7QaVEhDs0/r2Xntw9z5WH6AwBAj/gyr1SXPrZaL39xQJL0k3PTtOLW89ngAPgWlB+ckR+fm6bIELt2Ha7WO1uKzI4DAIBfq29064EV2zXvL58pv7ROfXqF6uWfnaV7Z49UiMNmdjzA61F+cEacoQ7917nN1/48/gHTHwAAusvmggrNfmKt/rJ6rwxDmjs+Re8uOF+TBsaZHQ3wGZQfnLH/Ore/IoPt2nm4Su9tZfoDAEBXanR79OiqXbri6U+VW1ytuIhg/d+PxuuhqzMUyQ1MgQ6h/OCMOcMc+sm5aZKkx5j+AADQZXIPV+mqZ9bp0VW5avIYmjk6SSt/eYGmjkgwOxrgkyg/6BL/dV5/RQTbtaOoSu9vY/oDAMCZ8HgM/d+avZr5xFptKqiQM9Shx78/Vk9dM47d3IAzQPlBl+gVFnTM9Gc30x8AADopv7RW3/vr57r/7e1yNXl00dB4vf/LCzQnI9nsaIDPo/ygy8xvmf5sL6zUyu2HzY4DAIBPMQxDr3x5QJc+ulpf5pUqPMimRVem6+8/nqCEqBCz4wF+ocPlZ/Xq1Zo9e7aSk5NlsVj01ltvtXv9zTff1PTp0xUXFyeLxaKcnJzjPkdDQ4NuueUWxcXFKTw8XHPmzFFBQUFnvwZ4iV5hQbpuUj9JzTu/GQbTHwAATsfhynr9ZPFXuvvNzapxuTWxf4zeue0CfX9iX1ksFrPjAX6jw+WnpqZGGRkZevLJJ0/6+rnnnqs//vGPJ/0cCxYs0JIlS/Tqq69q7dq1qq6u1qx
Zs+R2uzsaB17mp+cNUHiQTVsPVWrV9mKz4wAA4PWWbjykaY+s1sc7jyjIbtXvZg7Xqz87W31jw8yOBvgde0ffMGPGDM2YMeOkr1977bWSpH379p3w9YqKCj3//PN64YUXNHXqVEnSiy++qNTUVK1atUrTp0/vaCR4kejwIP1oUpqe+XiPHvtgl6YO781PrAAAOIGyGpd+958tentToSQpvY9Tf56bocEJkSYnA/xXj1/zk52drcbGRk2bNq3tueTkZI0aNUrr1q074XsaGhpUWVnZ7gHv9bPzBygsyKYtByv14Q6mPwAAfNOHOw5r2qOr9famQtmtFi2YOlhv3jSJ4gN0sx4vP0VFRQoKClJ0dHS75xMSElRUdOItkhctWiSn09n2SE1N7Ymo6KSY8CBde07ztT+Pce0PAABtquobddcbm/Rfi7N0pKpBg3tHaMlN52rB1CFy2NiHCuhuXvO7zDCMky6Puvvuu1VRUdH2yM/P7+F06Kifnz9AoQ6bNhVU6KOdTH8AAPhsz1Fd+ugavZaVL4tF+tn5/bXslvOUnuI0OxoQMHq8/CQmJsrlcqmsrKzd88XFxUpIOPHdioODgxUVFdXuAe8WGxH89fRnFdMfAEDgqm90675lW/X9v36ug+V1So0J1as/O1u/nTlCIQ6b2fGAgNLj5SczM1MOh0MrV65se66wsFBbtmzRpEmTejoOutHPLxigEIdVGwsq9PGuI2bHAQCgx+Xkl+uyx9fo75/ukyR9f2JfvXPbBTprQKy5wYAA1eHd3qqrq7V79+62j/Py8pSTk6OYmBj17dtXpaWlOnDggA4dOiRJ2rlzp6TmiU9iYqKcTqfmz5+vhQsXKjY2VjExMbrjjjuUnp7etvsb/ENcRLCuPbuf/romT4+tytVFQ+LZ+Q0AEBBcTR498WGunv54j9weQ70jg/Xg1aM1eWhvs6MBAa3Dk5+srCyNHTtWY8eOlSTdfvvtGjt2rO655x5J0tKlSzV27FjNnDlTkvS9731PY8eO1bPPPtv2OR555BFdfvnlmjt3rs4991yFhYVp2bJlstkY/fqbn18wUCEOq3Lyy7U6t8TsOAAAdLudRVW64ulP9cSHu+X2GJqTkaz3f3kBxQfwAhbDBy/GqKyslNPpVEVFBdf/+ID/Wb5Nz6/N07i+vfTvGycx/QEA+CW3x9Bf1+zVn9/fJZfbo+gwh+6/PF0zRyeZHQ3wax3pBl6z2xv81/UXDlCw3ar1B8q1djfTHwCA/9l/tEbznvtMf3xnh1xuj6YM6633fnkBxQfwMpQfdLvekSG65qy+ktj5DQDgXwzD0Auf79elj65R1v4yRQTb9dBVo/V/141X78gQs+MB+AbKD3rEDRcOVJDdqqz9ZVq356jZcQAAOGOFFXX60d++1O/f2qK6RrfOHhCjdxecr7kTUlniDXgpyg96REJUiK6Z2Dz9eXTVLqY/AACfZRiGlmwo0LRHVmtNbomC7VbdO3uEXv7p2UqJDjM7HoBToPygx9xw4UAF2az6al+ZPmP6AwDwQUerG3Tji+v1y9c2qqq+SRmpvbTitvP1k3P7y2pl2gN4O8oPekyiM0Tfn5gqSXr0g1yT0wAA0DHvby3S9EdX692tRbJbLbpj2hD9+4ZzNDA+wuxoAE4T5Qc96oaLmqc/X+aVMv0BAPiEyvpGLXx9o37+QrZKql0amhCp/9x8rm6+eLDsNr6VAnwJv2PRo5KcoZo3oXn689gHu0xOAwDAqX26u0SXPrJa/15fIKuleQn30lvO1chkp9nRAHSC3ewACDw3XjRQr351QJ/vLdUXe4/qrAGxZkcCAKCdOpdbf3xnu/7x2X5JUr/YMP15boYy+8WYnAzAmWDygx6X3CtUc8c3T39+/58t2nKwwuREAAB8LXt/mS57fE1b8bn27H5657bzKT6AH2DyA1P8YvIgLd9UqF2HqzX7ybWaNz5Vd0wfqriIYLOjAQACVEOTW4+tytWzn+yRx5ASo0L
0p++O1vmD482OBqCLWAwfvOFKZWWlnE6nKioqFBUVZXYcdNKh8jr98Z0dWrrxkCQpMtiuW6cM1nWT0hRkZygJAOg52w5V6vbXc7SjqEqSdOXYPrp3zkg5Qx0mJwPwbTrSDSg/MN1X+0p137Kt2nKwUpI0IC5cv581QpOH9TY5GQDA3zW5PXpu9V49umqXGt2GYsOD9L9XpOvSUYlmRwNwmig/8Dluj6E3svP1p/d2qqTaJUm6aGi8fjdzhAb15v4JAICut/dItRb+a6M2HCiXJE0bkaAHrkxnCTbgYyg/8FmV9Y168sPd+vuneWp0G7JbLfrxpDTdOnWwokJYegAAOHMej6F/frZPf3x3h+obPYoMtusPc0bqynF9ZLFYzI4HoIMoP/B5e49U6/63t+vDHcWSpNjwIN05fai+Oz5VNit/MQEAOudgeZ3u/NdGrWu50fZ5g+L00NWjldwr1ORkADqL8gO/8dHOYv3P8m3ae6RGkjQyOUr3zh6pif3ZbhQAcPoMw9Ab2QX672XbVNXQpFCHTb+5bJh+cFY/WfmhGuDTKD/wK41uj/6xbp8e+yBXVfVNkqTZGcn69Yxh6sNP6gAA3+JIVYPufnOzVm0/LEka17eXHp47Rv3jwk1OBqArUH7gl0qqG/Tw+7v06lcHZBhSiMOqGy4cqOsvGKjQIJvZ8QAAXuidzYX67VtbVFrjUpDNql9eMkQ/v2AAS6gBP0L5gV/bcrBC/71sm77cVypJ6tMrVHdfNkwz05O4UBUAIEmqqG3UvUu36K2c5nvJDU+K0iPzMjQske8bAH9D+YHfMwxDb28u1ANvb9ehinpJ0sS0GN0ze4RG9XGanA4AYKZPdh3RXW9sUlFlvawW6aaLBunWKYO5gTbgpyg/CBh1LreeW71Hz36yR/WNHlks0vcmpOqOaUMVy30aACCg1DQ06YEV2/XSFwckNd80++G5GRrbN9rkZAC6E+UHAedgeZ0Wrdiu5ZsKJUmRIXbdNmWwrpuUJoeNn/QBgL/7al+pFr6+UQdKayVJP56UprsuHcY1oUAAoPwgYH2ZV6r7lm3V1kOVkqSB8eH6/awRumhob5OTAQC6Q32jW4+s3KW/rNkrw2i+DvRPV4/WpEFxZkcD0EMoPwhobo+hf2Xl60/v7dTRGpck6eJhvfW7mcM1ID7C5HQAgK6y5WCFbn89R7sOV0uSvpuZot/PHqGoEIfJyQD0JMoPIKmirlFPfJCrxev2qcljyGGz6Cfn9tfNFw/iL0YA8GFNbo+e/niPHv8gV00eQ3ERwVp0ZbouGZFgdjQAJqD8AMfYc6Ra/7N8mz7eeUSSFBcRpDunD9V3M1O5qzcA+JjdxVVa+PpGbSyokCRdlp6o+y9PV0x4kMnJAJiF8gOcwEc7ivU/y7dpb0mNJCm9j1P3zh6h8WkxJicDAHwbj8fQ39ft00Pv7lBDk0dRIXb9z+WjNCcjmXu8AQGO8gOchKvJo3+s26fHP8hVVUOTJGlORrLuvmyYkpyhJqcDAJxIfmmt7vjXRn2R13xz6wuHxOvBq0Yr0RlicjIA3oDyA3yLI1UNevj9nXotK1+GIYU6bLrxooH6+QUDFOJgW1QA8AaGYej1rHz997JtqnG5FRZk029nDtc1E/sy7QHQhvIDnKbNBRW6b9lWZe0vk9S8RepvZw7XjFGJ/MUKACYqrqzXr9/crA93FEuSJqRF6/99N0P9YsNNTgbA21B+gA4wDEPLNhVq0YrtKqyolySdPSBG98waqRHJnF8A0NOWbzqk3721ReW1jQqyWXXH9CGaf94A2dikBsAJUH6ATqh1NenZT/bquU/2qKHJI6tF+v7Evlo4bSi7CAFAN6t1NemdzUV6PSu/7dqeUX2i9Oe5YzQkIdLkdAC8GeUHOAMFZbVa9M4Ovb2pUJIUFWLXgqlDdO05/eSwWU1OBwD+wzAMfZlXqjeyC7Ric6FqXG5Jks1q0S8mD9ItFw/iz10A34ryA3SBz/ce1X3
Ltml7YaUkaVDvCN0za4QuGBJvcjIA8G0FZbX6d/ZB/Xt9gQ6U1rY93y82TFePS9GVmSnq04sdOAGcHsoP0EXcHkOvfZWv//f+TpXWuCRJU4f31m9njlD/OC66BYDTVetq0rtbivRGdoHW7Tna9nxEsF0z05N09fgUje8XzWYzADqM8gN0sYraRj32Qa7++dk+NXkMOWwW/dd5/XXz5EGKDHGYHQ8AvJJhGPpqX5neyM7X25u+XtZmsUiTBsbq6swUTR+ZqLAgu8lJAfgyyg/QTXYXV+m+Zdu0JrdEkhQXEaxfXTpUV49LkZVdiABAUvOytjfXNy9r23/0+GVtV4zro5ToMBMTAvAnlB+gGxmGoQ93FOt/lm/Tvpa/1EenOHXv7JHK7BdtcjoAMMfJlrWFB9k0a3Qyy9oAdBvKD9ADGprcWvzpPj3x4W5VNzRJkq4Y20d3XTpMic4Qk9MBQPczDENZ+8v0RlaB3t5c2PZnofT1srZLR7GsDUD3ovwAPai4ql7/772d+ld2gQxDCnXY9IvJA/XT8wcoxGEzOx4AdLmD5XV6M7tAb3xjWVvfmDBdnZmiK1nWBqAHUX4AE2wqKNd9y7Ype3+ZJCklOlS/mzlc00cmsswDgM+rc7n17tbCtmVtrd89hAfZNHN0kq7OTNWENJa1Aeh5lB/AJIZhaOnGQ1q0YoeKKuslSecMiNU9s0doeBLnKgDfYhiGsveX6V8sawPgxSg/gMlqXU165uM9em71XrmaPLJapB+c1U+3XzJE0eFBZscDgFM6WF6nJesL9EZ2QdvGLpKUGhOqq8el6spxfZQaw7I2AN6B8gN4ifzSWi16Z7tWbC6SJDlDHfrl1MH6wdn95LBZTU4HAF+rc7n13tbm3do+3VPStqwtLMjWfBPSzBRNSIthW38AXofyA3iZdXtK9N/LtmlHUZUkaXDvCN07e6TOGxxncjIAgax1Wdsb2QVavqn9srZzBny9rC08mGVtALwX5QfwQk1uj179Kl8Pv79TZbWNkqRLRiTodzOHq19suMnpAASSQ+V1enN9gf69/qDySmranmdZGwBfRPkBvFh5rUuPrsrVC5/vl9tjKMhm1fzz++sXkwcpgp+uAugmdS633t/WvKxt7e72y9oua1nWNpFlbQB8EOUH8AG5h6v038u3aU1uiSSpd2Sw7rp0mK4Y24dvPgB0CcMwtP5Ay7K2jYWqOmZZ29kDYnR1ZqpmsKwNgI+j/AA+wjAMrdperPvf3tZ2o8CM1F76w+wRGts32uR0AHzVofI6LdlwUG9kF7Rb1pYSHaqrM1N01bgUlrUB8BuUH8DHNDS59fdP9+mJD3JV43JLkq4c20d3zRimhKgQk9MB8AX1jV/v1sayNgCBhPID+Kjiyno99N5OvZFdIKn5m5ZfTB6k+ef1V4jDZnI6AN6meVlbecuytkPtlrWd1T9GV2em6LL0JJa1AfBrlB/Ax23ML9cflm3VhgPlkpp3YPrtZSM0fWSCLBZ+agsEusKKOr25/qD+nV2gvd9Y1nbVuOZlbX1jWdYGIDBQfgA/4PEY+s/Gg/rjOzt0uLJBknTuoFjdM2ukhiZGmpwOQE872bK2UMfXy9rO6s+yNgCBh/ID+JGahiY9/fFu/XVNnlxNHtmsFl01ro8uGBKvCWkxXBME+DHDMLQhv1z/yirQ8k2HVFV//LK2GelJbJMPIKBRfgA/dOBorR5YsV3vbi1q93xKdKgmpMUos1+0JqTFaHDvCH7yC/i4oop6vbmhQG9kF2jvka+XtfXpFaqrMlN01bg+3BwZAFpQfgA/9vneo3pnc6G+2lemHUWV8nzjd3BUiF2Z/aI1Pi1G4/tFKyO1F5slAD6gvtGt97cdbl7Wlnuk7fd2qMOmGemJujozRWf3j+WHGwDwDZQfIEBU1Tdqw4FyZe0rVdb+Mm04UK66Rne7Yxw2i0b1cbZNh8b3i1ZsRLBJiQEcq3VZ2xvZBVq2sf2ytonH7NbGsjYAODn
KDxCgGt0ebS+s1Ff7ypS9v1Rf7SvTkaqG444bEBeu8WlfT4f6x4WzixzQg1jWBgBdh/IDQFLzT5XzS+v0VctkKGtfqXKLq487LjY8qO2aocy0aI1KdirIbjUhMeC/6hvdWtmyrG3NMcvaQhxWXTaqebe2swewrA0AOoryA+Ckymtdyt5f1jYd2lhQIVeTp90xwXarMlJ7aUJatMb3i9G4ftFyhjpMSgz4LsMwlHPMsrbKY5e1pbXu1paoyBB+fwFAZ1F+AJy2hia3thys0Ff7ypTVUojKahvbHWOxSEMTIr+eDvWLVkp0KEvlgG+ob3Rr75Ea5RZXadfhKr27pUh7vrmsbVwfXTkuRWlxLGsDgK5A+QHQaYZhaM+RmrZNFLL2lWrf0drjjkuMClFmWrQmtOwsNywxUnYbS+UQGGpdTdpT3FxycourlXu4WruLq3SgtPa4HRhZ1gYA3atby8/q1av1pz/9SdnZ2SosLNSSJUt0+eWXt71uGIbuu+8+/eUvf1FZWZnOOussPfXUUxo5cmTbMQ0NDbrjjjv0yiuvqK6uTlOmTNHTTz+tlJSULv8CAZy5I1UNbRsoZO0v09aDFWr6xnd44UE2je0brfFpzdOhMam9FM4OVfBxlfWN2l1crd2Hq9sVnYPldSd9jzPUocG9IzQ4IUJjU6NZ1gYA3awj3aDD35nU1NQoIyNDP/nJT3TVVVcd9/pDDz2kP//5z1q8eLGGDBmi+++/X5dccol27typyMhISdKCBQu0bNkyvfrqq4qNjdXChQs1a9YsZWdny2bjfiSAt4mPDNalo5J06agkSVKdy62c/K+32F6/v0xVDU1au7tEa3eXSJJsVotGJEW1LZUbnxathKgQM78M4KTKalzNxaa4qrnstJScosr6k74nLiJIg3pHaHDvSA1OiNCg+AgNSohQfEQwS0IBwEud0bI3i8XSbvJjGIaSk5O1YMEC3XXXXZKapzwJCQl68MEHdf3116uiokLx8fF64YUXNG/ePEnSoUOHlJqaqhUrVmj69Onf+u9l8gN4F7fH0K7DVcraV9qykULZCX8ynhoTqvH9movQ+H4xGtw7giVA6DGGYaik2tVWcHIPf112SqpdJ31fYlSIBvWOaC46Cc1lZ1DvCMWEB/VgegDAyXTr5OdU8vLyVFRUpGnTprU9FxwcrAsvvFDr1q3T9ddfr+zsbDU2NrY7Jjk5WaNGjdK6detOWH4aGhrU0PD1vUoqKyu7MjaAM2SzWjQ8KUrDk6J07TlpkqRD5XVt1wxl7SvTjqJK5ZfWKb/0oJZsOChJigqxa3zLBgoT0mI0OsWpEAfTX5wZwzBUVFnfUm6ar8Vp/XVFXeNJ39enV2hLuWkpOAnNhSeKJWsA4De6tPwUFRVJkhISEto9n5CQoP3797cdExQUpOjo6OOOaX3/Ny1atEj33XdfV0YF0M2Se4VqTq9QzclIliRV1Tdqw4Gvl8ptOFCuyvomfbijWB/uKJYkOWwWpfdxtt18NbNftGIjgs38MuDFPB5DB8vrmq/FOdyyVK1lyVp1Q9MJ32OxSP1iwjSoZalaa9EZEB/ONWoAEAC65U/6b651NgzjW9c/n+qYu+++W7fffnvbx5WVlUpNTT3zoAB6TGSIQxcMidcFQ+IlSY1uj7YXVrbdb+irfWU6UtWg9QfKtf5Auf7S8r4B8eEa37Kj3IS0GKXFhnE9RYBpcnuUX1an3MNVbeWmdblafaPnhO+xWy1KiwvX4JblaoOOKTlMFwEgcHVp+UlMTJTUPN1JSkpqe764uLhtGpSYmCiXy6WysrJ205/i4mJNmjTphJ83ODhYwcH89BfwJw6bVaNTeml0Si/NP6+/DMPQgdJaZe0rU9b+5qVyucXV2nukRnuP1Oj1rAJJUmx4UNs1Q+PTojUy2akgO1ts+wNXk0f7j9a07ajWWnD2ltQcdyPeVkE2qwbEh7fbeGBw7wj1iw3nvAAAHKdLy0///v2VmJiolStXauz
YsZIkl8ulTz75RA8++KAkKTMzUw6HQytXrtTcuXMlSYWFhdqyZYseeuihrowDwIdYLBb1iw1Xv9hwXZXZvO19WY1L6w+UtU2HNuZX6GiNS+9tPaz3th6WJAXbrRqT2qu5EKXFaFzfaDlDuUbDmx17I9C2ndWKq7WvpOa4LdRbhTisbQWn+Z/N05y+MWHcXwoAcNo6XH6qq6u1e/futo/z8vKUk5OjmJgY9e3bVwsWLNADDzygwYMHa/DgwXrggQcUFhama665RpLkdDo1f/58LVy4ULGxsYqJidEdd9yh9PR0TZ06teu+MgA+Lzo8SFOGJ2jK8ObJcUOTW1sOVjTfb6ilEJXVNuqLvFJ9kVcqaY8sFmloQmS7Lbb79AplqZwJOnIj0FYRwfZ25aZ1d7U+vULZGRAAcMY6vNX1xx9/rMmTJx/3/HXXXafFixe33eT0ueeea3eT01GjRrUdW19frzvvvFMvv/xyu5ucnu51PGx1DUBqvlZwz5GaY7bYLtW+o7XHHZcYFaLxadFtN1512Kxy2CyyW5v/2fyxVXabpe1ju9WqIHvzMXabRUE2q+w26zGvW2SzWihVOvMbgQ7qHdn268SoEP6bAgA6pCPd4Izu82MWyg+Akymuqtf6/c1L5bL2l2nrwYqTLqU6UxaL5GgpUO2Kkc3S8nxroTqmbNmtclgtx79mszYXLGvrr1s/Z+t7LS3vtcphb1/c7LaWz2lvfv+xZS7o2H+Ptf2/s6Ml49gbgeYertaeI524EWjLr+Migig5AIAuYdp9fgDAbL0jQ3TpqCRdOqp505U6l1s5+c1bbO8oqlJDk1uNbkONbo+a3IYaPZ6vf+32qNFtqMntkcttqMnT/LzL3XzMN39UZBiSy+2Ryy1J7h7/Ws9Uc9FqKWrtilP74mWxWHSwrPZbbwTaWm6OvTaHG4ECALwJ5QeAXwsNsumcgbE6Z2DsGX8ut6elNHkMNTZ5WopTc1k6rlA1NR/nanmuuVAdU7I8re870Xubi9exZaztGE9rSTvxa64mT7vS1nTMa9/U5DHU5DFUL4/UcIIv+AS4ESgAwJdRfgDgNNmsFtmsLfeI8bHd9w3DOKY4Gd+YdnnavXZsUWtqKXjJzlBuBAoA8Hn8LQYAAcBi+XozBwAAAhV/CwIAAAAICJQfAAAAAAGB8gMAAAAgIFB+AAAAAAQEyg8AAACAgED5AQAAABAQKD8AAAAAAgLlBwAAAEBAoPwAAAAACAiUHwAAAAABgfIDAAAAICBQfgAAAAAEBMoPAAAAgIBA+QEAAAAQECg/AAAAAAIC5QcAAABAQKD8AAAAAAgIdrMDdIZhGJKkyspKk5MAAAAAMFNrJ2jtCKfik+WnqqpKkpSammpyEgAAAADeoKqqSk6n85THWIzTqUhexuPx6NChQ4qMjJTFYjE7jiorK5Wamqr8/HxFRUWZHQd+jvMNPY1zDj2J8w09jXPO9xmGoaqqKiUnJ8tqPfVVPT45+bFarUpJSTE7xnGioqL4TYMew/mGnsY5h57E+Yaexjnn275t4tOKDQ8AAAAABATKDwAAAICAQPnpAsHBwbr33nsVHBxsdhQEAM439DTOOfQkzjf0NM65wOKTGx4AAAAAQEcx+QEAAAAQECg/AAAAAAIC5QcAAABAQKD8AAAAAAgIlB8AAAAAAYHycxqefvpp9e/fXyEhIcrMzNSaNWtOefwnn3yizMxMhYSEaMCAAXr22Wd7KCn8RUfOuTfffFOXXHKJ4uPjFRUVpXPOOUfvvfdeD6aFP+jon3OtPv30U9ntdo0ZM6Z7A8KvdPR8a2ho0G9/+1v169dPwcHBGjhwoP72t7/1UFr4g46ecy+99JIyMjIUFhampKQk/eQnP9HRo0d7KC26lYFTevXVVw2Hw2H89a9/NbZt22bcdtttRnh4uLF///4THr93714jLCzMuO2224xt27YZf/3rXw2Hw2G88cYbPZwcvqqj59xtt91mPPjgg8aXX35p7Nq
1y7j77rsNh8NhrF+/voeTw1d19JxrVV5ebgwYMMCYNm2akZGR0TNh4fM6c77NmTPHOOuss4yVK1caeXl5xhdffGF8+umnPZgavqyj59yaNWsMq9VqPPbYY8bevXuNNWvWGCNHjjQuv/zyHk6O7kD5+RYTJ040brjhhnbPDRs2zPj1r399wuN/9atfGcOGDWv33PXXX2+cffbZ3ZYR/qWj59yJjBgxwrjvvvu6Ohr8VGfPuXnz5hm/+93vjHvvvZfyg9PW0fPtnXfeMZxOp3H06NGeiAc/1NFz7k9/+pMxYMCAds89/vjjRkpKSrdlRM9h2dspuFwuZWdna9q0ae2enzZtmtatW3fC93z22WfHHT99+nRlZWWpsbGx27LCP3TmnPsmj8ejqqoqxcTEdEdE+JnOnnN///vftWfPHt17773dHRF+pDPn29KlSzV+/Hg99NBD6tOnj4YMGaI77rhDdXV1PREZPq4z59ykSZNUUFCgFStWyDAMHT58WG+88YZmzpzZE5HRzexmB/BmJSUlcrvdSkhIaPd8QkKCioqKTvieoqKiEx7f1NSkkpISJSUldVte+L7OnHPf9PDDD6umpkZz587tjojwM50553Jzc/XrX/9aa9askd3OXyM4fZ053/bu3au1a9cqJCRES5YsUUlJiW666SaVlpZy3Q++VWfOuUmTJumll17SvHnzVF9fr6amJs2ZM0dPPPFET0RGN2PycxosFku7jw3DOO65bzv+RM8DJ9PRc67VK6+8oj/84Q967bXX1Lt37+6KBz90uuec2+3WNddco/vuu09DhgzpqXjwMx35M87j8chiseill17SxIkTddlll+nPf/6zFi9ezPQHp60j59y2bdt066236p577lF2drbeffdd5eXl6YYbbuiJqOhm/MjuFOLi4mSz2Y77yUBxcfFxP0FolZiYeMLj7Xa7YmNjuy0r/ENnzrlWr732mubPn69//etfmjp1anfGhB/p6DlXVVWlrKwsbdiwQTfffLOk5m9ODcOQ3W7X+++/r4svvrhHssP3dObPuKSkJPXp00dOp7PtueHDh8swDBUUFGjw4MHdmhm+rTPn3KJFi3TuuefqzjvvlCSNHj1a4eHhOv/883X//fezisfHMfk5haCgIGVmZmrlypXtnl+5cqUmTZp0wvecc845xx3//vvva/z48XI4HN2WFf6hM+ec1Dzx+fGPf6yXX36ZNcnokI6ec1FRUdq8ebNycnLaHjfccIOGDh2qnJwcnXXWWT0VHT6oM3/GnXvuuTp06JCqq6vbntu1a5esVqtSUlK6NS98X2fOudraWlmt7b9Fttlskr5ezQMfZtZOC76idXvE559/3ti2bZuxYMECIzw83Ni3b59hGIbx61//2rj22mvbjm/d6vqXv/ylsW3bNuP5559nq2t0SEfPuZdfftmw2+3GU089ZRQWFrY9ysvLzfoS4GM6es59E7u9oSM6er5VVVUZKSkpxtVXX21s3brV+OSTT4zBgwcbP/3pT836EuBjOnrO/f3vfzfsdrvx9NNPG3v27DHWrl1rjB8/3pg4caJZXwK6EOXnNDz11FNGv379jKCgIGPcuHHGJ5980vbaddddZ1x44YXtjv/444+NsWPHGkFBQUZaWprxzDPP9HBi+LqOnHMXXnihIem4x3XXXdfzweGzOvrn3LEoP+iojp5v27dvN6ZOnWqEhoYaKSkpxu23327U1tb2cGr4so6ec48//rgxYsQIIzQ01EhKSjJ+8IMfGAUFBT2cGt3BYhjM7wAAAAD4P675AQAAABAQKD8AAAAAAgLlBwAAAEBAoPwAAAAACAiUHwAAAAABgfIDAAAAICBQfgAAAAAEBMoPAAAAgIBA+QEAAAAQECg/AAAAAAIC5QcAAABAQPj/UN04f44T4pAAAAAASUVORK5CYII=", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "alphas = np.arange(0,1.1,0.1)\n", "perplexities = [perplexity(InterpolatedLM(bigram,unigram,alpha),oov_test) \n", " for alpha in alphas]\n", "plt.plot(alphas,perplexities)" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "### Backoff \n", "* When we have counts for an event, trust these counts and not the simpler model\n", " * use $\\prob(\\text{bigly}|\\text{win})$ if you have seen $(\\text{win, bigly})$, not $\\prob(\\text{bigly})$\n", "* **back-off** only when no counts for a given event are available." ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "### Stupid Backoff\n", "Let \\\\(w\\\\) be a word, \\\\(h_{m}\\\\) a history (n-gram) of length \\\\(m\\\\), and \\\\(h_{m-1}\\\\) the same history with its earliest word dropped: \n", "\n", "$$\n", "\\prob_{\\mbox{Stupid}}(w|h_{m}) = \n", "\\begin{cases}\n", "\\frac{\\counts{\\train}{h_{m},w}}{\\counts{\\train}{h_{m}}} & \\mbox{if }\\counts{\\train}{h_{m},w} > 0 \\\\\\\\\n", "\\alpha\\cdot\\prob_{\\mbox{Stupid}}(w|h_{m-1}) & \\mbox{otherwise}\n", "\\end{cases}\n", "$$\n", "\n", "where $\\alpha$ is a fixed backoff weight." ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "skip" } }, "source": [ "What is the problem with this model?"
] }, { "cell_type": "code", "execution_count": 32, "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "outputs": [ { "data": { "text/plain": [ "1.0684727180010114" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stupid = StupidBackoff(bigram, unigram, 0.1)\n", "sum([stupid.probability(word, 'the') for word in stupid.vocab])" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "source": [ "Discuss with your neighbour and enter your answer here:\n", "## [tinyurl.com/diku-nlp-backoff](https://tinyurl.com/diku-nlp-backoff)\n", "([Responses](https://docs.google.com/forms/d/1UMmtDqpzf7pXxWqqsgK7pH3sA5EVXXWzfRaXHe9LICM/edit#responses))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Solution\n", "\n", "The scores do not form a probability distribution: summed over the vocabulary for the history *the*, they total about 1.068 rather than 1. Sampling therefore requires further normalisation." ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "run_control": { "frozen": false, "read_only": false }, "slideshow": { "slide_type": "skip" } }, "source": [ "### Exercise\n", "\n", "How can we check whether a language model provides a valid probability distribution? Solve [Task 2](../exercises/language_models.ipynb)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Absolute Discounting\n", "Recall that in test data, a constant probability mass is taken away for each non-zero count event. Can this be captured in a smoothing algorithm?"
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Yes: subtract a (tunable) constant $d$ from each non-zero count:\n", "\n", "$$\n", "\\prob_{\\mbox{Absolute}}(w|h_{m}) = \n", "\\begin{cases}\n", "\\frac{\\counts{\\train}{h_{m},w}-d}{\\counts{\\train}{h_{m}}} & \\mbox{if }\\counts{\\train}{h_{m},w} > 0 \\\\\\\\\n", "\\alpha(h_{m})\\cdot\\prob_{\\mbox{Absolute}}(w|h_{m-1}) & \\mbox{otherwise}\n", "\\end{cases}\n", "$$\n", "\n", "$\\alpha(h_{m})$ is a normaliser that redistributes the mass removed by discounting, so that the probabilities for history $h_{m}$ sum to 1." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Unigram Backoff\n", "\n", "Assume, for example:\n", "* *Mos Def* is a rapper's name that appears often in the data\n", "* *glasses* appears slightly less often\n", "* neither *Def* nor *glasses* has been seen in the context of the word *reading*" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "Then the final-backoff unigram model might assign a higher probability to\n", "\n", "> I can't see without my reading Def\n", "\n", "than\n", "\n", "> I can't see without my reading glasses\n", "\n", "because $\\prob(\\text{Def}) > \\prob(\\text{glasses})$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "But *Def* never follows anything but *Mos*, and we can determine this by looking at the training data!"
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Kneser Ney Smoothing\n", "\n", "Absolute Discounting, but as the final backoff probability, use the number of distinct words that a word follows in the training set, normalised over all words: \n", "\n", "$$\n", "\\prob_{\\mbox{KN}}(w) = \\frac{\\left|\\{w_{-1}:\\counts{\\train}{w_{-1},w} > 0\\}\\right|}\n", "{\\sum_{w'}\\left|\\{w_{-1}:\\counts{\\train}{w_{-1},w'} > 0\\}\\right|} \n", "$$\n", "\n", "This is the *continuation probability*. *Def* follows only one distinct word (*Mos*), so its continuation probability is low even though *Def* itself is frequent." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Modified Kneser Ney Smoothing\n", "\n", "Rather than using a single discount $d$:\n", " use three different discounts $d_1$, $d_2$, $d_3$\n", " for n-grams with count 1, count 2, and count 3 or more\n", "\n", "See [Chen and Goodman 1998, p. 19](https://dash.harvard.edu/bitstream/handle/1/25104739/tr-10-98.pdf?sequence=1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "### Interpolation vs. Backoff\n", "\n", "* Both combine information from higher- and lower-order models, e.g.
2-gram and 1-gram\n", "* Both use lower-order models to determine the probability of n-grams with zero counts\n", "* Difference: \n", " * Interpolated models use lower-order models also for n-grams with **non-zero** counts\n", " * Backoff models only do it for n-grams with **zero** counts" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Summary\n", "\n", "* LMs model probability of sequences of words \n", "* Defined in terms of \"next-word\" distributions conditioned on history\n", "* N-gram models truncate history representation\n", "* Often trained by maximising log-likelihood of training data and ...\n", "* smoothing to deal with sparsity" ] }, { "cell_type": "markdown", "metadata": { "hideCode": false, "hidePrompt": false, "slideshow": { "slide_type": "slide" } }, "source": [ "## Background Reading\n", "\n", "* Jurafsky & Martin, [Speech and Language Processing (Third Edition)](https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf): Chapter 3, N-Gram Language Models.\n", "* Bill MacCartney, Stanford NLP Lunch Tutorial: [Smoothing](http://nlp.stanford.edu/~wcmac/papers/20050421-smoothing-tutorial.pdf)\n", "* Chen, Stanley F. and Joshua Goodman. 1998.
[An Empirical Study of Smoothing Techniques for Language Modeling.](https://dash.harvard.edu/bitstream/handle/1/25104739/tr-10-98.pdf?sequence=1) Harvard Computer Science Group Technical Report TR-10-98.\n", "* [Sampling from language models](https://towardsdatascience.com/how-to-sample-from-language-models-682bceb97277), a practical guide\n", "* [Interpretation of Perplexity](https://towardsdatascience.com/perplexity-in-language-models-87a196019a94)" ] } ], "metadata": { "celltoolbar": "Slideshow", "hide_code_all_hidden": false, "hide_input": false, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 1 }