{ "metadata": { "name": "", "signature": "sha256:4cd9c2afcb5d64f0442d17f6ea5000cfe0b31403d2c1f195177cf4ffb08be612" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "code", "collapsed": false, "input": [ "# Utils\n", "\n", "from IPython.display import Image" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Tutorial Derivation: Gibbs Sampling for Latent Dirichlet Allocation

\n", "\n", "## Introduction\n", "\n", "This notebook describes in painful detail the Gibbs sampling algorithm for Latent Dirichlet Allocation. It is based on \"Finding scientific topics\", the PNAS paper by Griffiths and Steyvers ( [link](http://psiexp.ss.uci.edu/research/papers/sciencetopics.pdf) ), and not the original paper by Blei et al.\n", "\n", "## Probabilistic Model and Generative Story\n", "\n", "We assume the following generative story for document creation: \n", "\n", "A number of $D$ documents are generated independently. There are a number of $T$ topics that characterize the documents. There are $W$ unique words in the documents. Each document has a length of $L_d$ words.\n", "\n", "\n", "\n", "Each document $d$ has a topic distribution $\\theta_d$, a categorical distribution with $T$ categories (and there are $D$ such distributions).\n", "\n", "Each topic $t$ has a word distribution $\\phi_t$, a categorical distribution with $W$ categories (and there are $T$ such distributions).\n", "\n", "To generate a document, we generate each word independently. For each document $d$ and word position $l$, a topic corresponding to that position is sampled from the topic distribution:\n", "\n", "$P(z_{dl} = t) = \\theta_d(t)$\n", "\n", "According to the sampled topic, the word is generated according to:\n", "\n", "$P(w_{dl} = w) = \\phi_{z_{dl}}(w)$\n", "\n", "We place Dirichlet conjugate priors on $\\theta$:\n", "\n", "$\\theta \\sim Dirichlet(\\alpha)$ \n", "\n", "and $\\phi$: \n", "\n", "$\\phi \\sim Dirichlet(\\beta)$.\n", "\n", "Furthermore, to simplify things, we assume $\\alpha$ and $\\beta$ are symmetric priors, and in experiments we initalize them to $1$.\n", "\n", "The probabilistic model in plate model notation is illustrated below:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "lda_pm = Image(filename='images/lda_pm.png')\n", "lda_pm" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "png": "iVBORw0KGgoAAAANSUhEUgAAAZsAAAE3CAYAAAB8RuxtAAAABmJLR0QA/wD/AP+gvaeTAAAgAElE\nQVR4nO3deVxO6f8/8NfdIimUkrEMFYpQ1oQsRTXWhrSYwtBYIvtg8BmGYox9RMzYiSjrMBItyBhS\nmWEoS0mWGQwm2rTdvz98u39jbKX73OdeXs/Hw8NM3fe53jnV65z3uc51JFKpVAoiIiLhRGqJXQER\nEak/hg0REQmOYUNERIJj2BARkeAYNkREJDiGDRERCY5hQ0REgmPYEBGR4Bg2REQkOIYNEREJjmFD\nRESCY9gQEZHgGDZERCQ4hg0REQlOR+wCNMXEiRORmpoqdhlERBVWtWpVrF27Fg0bNvzgbUj4PBvF\nkEgkcHBwwMcffyx2KUREFRIZGYk9e/bAy8vrgzfBMxsFmjJlSmV2FhGRKCQSSaW3wWs2REQkOIYN\nEREJjmFDRESCY9gQEZHgGDZERCQ4hg0REQmOYUNERIJj2BARkeAYNqSxZsyYAYlEwj/8o7F/ZsyY\nobCfN64gQBorMzMTDg4OmDp1qtilECncihUrkJmZqbDxGDak0T7++GN4enqKXQaRwkVGRip0PLbR\niIhIcAwbIiISHMOGiIgEx7AhIiLBMWyIiEhwDBsiIhIcw4aIiATH+2yIiCqosLAQqampuH79Om7c\nuIGsrCwUFhYiJycHwMvHKBsZGUFPTw9NmzaFlZUVrKysYGFhIXLl4mHYEBGVQ1paGg4fPoyYmBgk\nJCQgPz8fEokE9evXR7169VClShVUq1YNACCVSnHnzh3k5eVh+/btyM7OBgDUq1cPvXr1Qq9evdC/\nf38YGRmJ+SUpFMOGSMk8ffoUBQUFyM/PBwDo6+ujatWqMDY2FrkyzfPPP/9gx44d2LFjBy5cuABT\nU1O0b98e06ZNg52dHSwsLKCnp/fe7Tx9+hQ3btxAcnIyLly4gD179gAABgwYgOHDh6N3797Q0lLv\nqxoMGyKRlJSUIDk5GbGxsUhMTER6ejrS09ORl5f3xtdXq1YNjRs3RuPGjdGxY0f07NkTbdu2hba2\ntoIrV3+PHj3CqlWrEBISAolEAhcXF4wbNw5t27b9oFAwNjaGvb097O3tERAQgNzcXMTExODo0aNw\nd3eHtbU1Zs+eDW9vb+joqOevZfX8qoiU2NmzZ7Ft2zZERkbi6dOnMDMzQ+fOndG7d280bdoUDRs2\nhL6+vqwlk5eXh7y8PGRlZeHmzZu4du0aVqxYgVmzZqFWrVrw9vbGsGHD4ODgIPJXpvqKioqwfPly\nBAUFoUaNGggICMDgwYOhr68v13EMDAzg7u4Od3d3ZGVlYfPmzRgxYgSCgoIQGhoKZ2dnuY6nDBg2\nGuLJkydITU3FtWvXcOvWLeTm5iI3NxfAy298AwMDWFhYwNraGjY2NmzZyFlRURHCw8OxZMkSXLly\nBW3atMH//vc/uLm5wcbGBhKJpELbk0qluHLlCo4dO4awsDCsW7cOrVq1wowZM+Dj46O2R8dCSkhI\nwOjRo3H37l0EBgbCx8cHurq6go/bsGFDfPPNNwgICMDy5cvRq1cv+Pj44Pvvv0ft2rUFH19R+B2p\npkpKSnD8+HEcOHAAv/zyC1JTUyGVSmFmZgZra+vX+swvXrzAjz/+iIcPH0IikaB58+ZwdHTEoEGD\n4OLiovb9ZCGFh4dj1qxZuH//Pvz8/LB79260bNmyUtuUSCRo2bIlWrZsiS+//BKXL1/G8uXLMWLE\nCHz99ddYvHgxvL295fQVqLeSkhIEBQUhODgYzs7OWLt2LczMzBReR506dbBkyRIMGjQIwcHBsLOz\nw86dO+Hk5KTwWoTwWthkZWVh/PjxKCgoEKMeuWjevDlWr14tdhmiePz4MUJCQrBp0ybcvXsXLVq0\nQO/evbFy5Uq0b98etWrVeuf7nzx5gqSkJERHR+PYsWP48ccf0aBBA4waNQqBgYHvfT/9f9evX0dA\nQADi4+Ph5+eH+fPnCzb1tVWrVti6dSu+/vprzJ07F0OGDMGmTZsQGhqKJk2aCDKmOnj8+DE8PDzw\n66+/Yvbs2fDy8hK7JDg4OGD37t2YN28eXFxcMH/+fMyZM0fssirttbA5d+4cjhw5orLP+Lhz5w5C\nQkI0LmyePXuGRYsWITQ0FAYGBhg9ejS8vb1hY2NToe3UqlULrq6ucHV1xfLly3HlyhXs3r0boaGh\nWLZsGQIDAzFr1ixUr15doK9EPWzfvh3jx49Ho0aNcPLkSXTr1k0h4zZu3Bg7d+7EqFGjZBe0169f\nj88++0wh46uSu3fvwtXVFU+fPsWOHTsq/LMiJENDQyxbtgw7duzA3LlzcffuXaxZs0alJ4O8tY0W\nERGhyDrkJiIiQuPaB7t27cKXX34JiUSCb7/9Fv7+/qhatapctt2iRQsEBQVh9uzZ2LhxIxYtWoRt\n27ZhxYoVGvfvXB7FxcUICAjAxo0bMW7cOCxfvlxu+6IievTogeTkZEyZMgW+vr5ISEhQ+V9W8pSe\nng4nJydUqVIFO3bswEcffSR2Sa+RSCQYNmwY6tati1mzZuHx48cIDw9X2X3IRrwKe/bsGby9vTF8\n+HB4e3sjNTUV48ePF+SXm76+PiZMmIC0tDR4eHjA19cXn332GZ4/fy73sVRVQUEBBg8ejJ07d2LP\nnj1Yu3atKEFTRl9fH+vXr8euXbuwdetWeHl54cWLF6LVoywePHgAV1dXGBkZYdu2bUoZNP/m4uKC\n9evX4+eff8bYsWMhlUrFLumDMGxUVFpaGtq1a4ezZ88iPj4eK1euRI0aNQQft2bNmli9ejXi4uJw\n6tQptG/fHtevXxd8XGX34sUL9O/fH6dOnUJ0dLRS9P7LDBkyBMeOHUNsbCwGDBiAwsJCsUsSzfPn\nz9G7d29IJBKsXbsWNWvWFLukcmnfvj2WL1+OrVu3Yt68eWKX80EYNirowoUL6Nq1K+rXr4+LFy/C\n0dFR4TV069YNFy9ehJmZGRwdHZGUlKTwGpRFSUkJfH19kZSUhJMnT6Jr165il/Sa7t27Iz4+HufP\nn8ewYcNQWloqdkmi+OKLL3Dv3j2sX79e5ab3Ozo6Ijg4GMHBwThw4IDY5VQYw0bFJCYmomfPnujU\nqROioqJgamoqWi1mZmaIjo6Gvb09nJ2dceHCBdFqEdPUqVPx888/49ChQ7CzsxO7nLdq06YNDhw4\ngIMHD2LGjBlil6NwK1aswP79+7Fq1Sqlb529Td++feHr64uhQ4ciNTVV7HIqhGGjQq5evYo+ffqg\nW7du2Ldvn9zvav4Q1apVw4EDB+Do6Ii+ffvi2rVrYpekUHv37kVISAi2bNmisBlnleHk5IQNGzZg\nxYoVOHTokNjlKExqaipmz54Nf39/tGrVSuxyKmXy5MmoX78+Pv/8c5U6Q2XYqIjs7Gz0798f1tbW\niIiIUMidzeWlq6uLyMhIWFpaol+/fnj27JnYJSlEZmYmRo0ahYCAAPj4+IhdTrkNHToU/v7+GDly\nJO7cuSN2OQoxYcIEmJubIyAgQOxSKk1PTw/BwcFITk7Gli1bxC6n3Bg2KkAqlcLX1xclJSU4fPiw\nbM0sZWJgYICffvoJ+fn58PX1VdkZMxURGBiIBg0aYPny5WKXUmGrV6+GmZkZJk6cKHYpgouMjER8\nfDy+/vprlZ02/F/NmzfHkCFD8NVXX+Hp06dil1MuDBsVsGvXLhw7dgxbt25V6jv4zczMsHXrVhw9\nelS2hLq6ioiIQFRUFDZv3izq9OYPpa+vj40bN+LQoUM4ePCg2OUIpqSkBHPmzIG7u7tSX0/7EOPH\njwcAlTnYYdgoub///huTJk3CxIkT0aNHD7HLea9evXph3LhxmDx5Mv755x+xyxFEcXEx5syZg+HD\nh6NDhw5il/PBunTpgs8++wyzZs1Sqd5/RezcuROZmZkYN26c2KXInaGhIT7//HOsXLkSf//9t9jl\nvBfDRsl999130NXVVam59fPnz0dpaSmWLl0qdimC2LlzJ7KysrBgwQKxS6m0oKAg3Lx5E+Hh4WKX\nIojvvvsObm5uKjv77H08PDygq6uL9evXi13KezFslNjDhw8RGhqKOXPmqMzNZ8DL9dVmz56N77//\nHo8ePRK7HLmSSqVYunQpvLy80KBBA7HLqTQLCwt4enpiyZIlYpcid+fOncPVq1fh5+cndimCMTQ0\nxMCBA7Fp0yalv07KsFFioaGh0NfXx4gRI8QupcL8/f2hp6enEkdcFXHu3DlcuXIFkydPFrsUuZk8\neTIuXbqExMREsUuRq61bt8La2hotWrQQuxRBffrpp8jMzMTp06fFLuWdGDZKqrS0FJs3b8aoUaNg\nYGAgdjkVVr16dfj7+2PTpk1qdT1gy5YtsLOzQ7t27cQuRW7s7e1ljyhQF8XFxYiMjETv3r3FLkVw\njRs3hrW1tdJPymHYKKnjx4/j7t278Pf3F7uUD/bFF18gKysLsbGxYpciF6Wlpdi/f79aLtfv4+OD\nffv2KX0rprySkpLw5MkT9OzZU+xSFMLZ2RlRUVFil/FODBsl9dNPP6Fdu3ZyefDVX3/9hT59+qBa\ntWpwdHREenq6HCp8PysrK9jZ2eHw4cMKGU9oKSkpePz4Mfr06SN2KXLXt29fPHz4EL///rvYpchF\nfHw86tSpA3Nzc7FLUQh7e3tkZmYiMzNT7FLeimGjpGJjY+Hi4lLp7RQXF6Nv376QSCTIyMhArVq1\n0KdPH4Wt/Ovq6oqYmBiFjCW0+Ph4fPTRR2p5DcDW1ha1a9dGXFyc2KXIRUJCAtq2bSt2GQrTqlUr\n6OnpKfV1G4aNgIqLi3H8+HHk5+dX6H0PHjzAjRs30L1790rXsGfPHqSkpGDq1Kn46KOPMGnSJFy/\nfl1hq8Z2794daWlpSnUfQGJiIu7evVvh9yUlJaFjx46QSCQCVCUuiUQCBwcHpVu9++zZs/jrr78q\n/L5Lly4p1ZM3haanp4cmTZrg8uXLYpfyVgwbAZ08eRJubm4wMTEBAPz2228oLi5+7/v++OMPSKVS\nuRyZRUZGAnh55ApAtgjhjh07Kr3t8mjbti2kUin++OMPhYxXHq6urmjYsCHi4uKQkZGBx48fl+t9\nqampav0LrHnz5kq1krBUKpU9SqNHjx7YvHlzuW4UzsnJwf379zWmhVamUaNGSrX//kthYbNjxw40\nbNgQJiYmWLduHQCgWbNmkEgkarsYYFmwlJ3ZfPvttzA1NUVAQADOnDnz1oux165dQ61atVC7du1K\n15CSkgIAsvt0yv6+ceNGpbddHh999BGMjIyUajXooqIiSKVSPH78GCkpKahTpw7c3Nywa9cu5Obm\nvvE9paWluHnzJqysrORSg6mpKUaNGoXJkydj8uTJqFOnDrS0tESdfmxlZYUbN24o1SSB0tJSlJaW\n4vTp0xg1ahRq166N/v37IzIy8q0dg7KvQR5hc/78ebRp0wb29vZITk7G8+fP8dVXX6FVq1YYOXIk\nbt68CeDlgYiTk5Ps4E4M5ubmCvu5/hAKCZuzZ89i+PDh6NSpE9LS0pCYmIjExETcuHED1tbW+Pjj\njxVRhlLIzs7Gxo0b0bVrVxgbG2PMmDGvBc/t27dhYWEhl/EePHgA4OVptkQika3jde/evTe+vqCg\nAFFRUeU+2i8Pc3Nz3L59W27bkyepVIqSkhLExsZi2LBhMDIyQp8+fbB9+3bk5OTIXvfPP/8gPz9f\nLt+rxcXFCAwMxIYNG7Bq1Sp88sknePjwIaZPnw57e/s3vudN+yUjIwNr166tdD1lPv74Y+Tm5iI7\nO1tu25QXqVSK0tJSFBcXIyoqCj4+PqhZsyb69u2L7du3Iy8vT/bashuJ5fGsp44dO8LDwwMlJSWw\ntrZG9erVMX/+fNnBYNkEHisrKzRs2BCDBw9+bRsvXrzAmTNnXjkru3v3rtxXbTA1NcXDhw/luk15\nUkjYLFmyBFKpFFOmTEHt2rWxbNky2XpMvXr1UkQJSqXsjCc7OxtbtmxB165d0aBBA0yaNAkpKSl4\n9uyZ3FYMKLvHpaSkBFKpVDZ2SUnJa6/Nzc3FggUL0KdPH7ne+V+zZk2l/AX2byUlJSgpKUFxcTFO\nnDiBzz//HCYmJvD09MThw4dlK+tWr1690mNpa2tj5syZAICnT5/C398fLVu2fOvyN2/aL7dv38bK\nlSsRGBhY6XrKlD1W/Pnz53LbphBKSkpQWlqKoqIi2b4yMzPD0KFDcfjwYdn3mrxWRx8yZAgKCwtx\n5MgRAC8P3GxtbREXFyc7IDl9+jT69ev32vW8/Px8rF+/HgEBAbLvofv372P79u1YtGiRXOorY2Bg\n8MoBkrLRUcQgFy9eBPCyJwwAJiYmsusY8phx9SZCbbciynN2UFRUBODlN+DatWuxevVq1KpVC23a\ntJFLDbVq1cLDhw+Rn58PAwMDWZvIzMzstdcaGBggODgY3377rVzGLlOjRg2cP39eKfYJgPdeNyv7\nfGFhIQ4dOoS9e/fKHiFsaGhY6fElEonswXcTJ07Eo0eP8PPPP0NPT++Nr3/TfmnUqBHGjh2LNWvW\nVLqeMmVB6u3trTI3Epf9/OTm5mLPnj0ICwuDkZERdHV1oaUln2Ppxo0bw97eHnv37oW3tzfu378v\nOzCJioqCp6cnjhw5guDg4Nfeq6+vjwkTJmDjxo2yj9WrVw9eXl5yP7OpVq0aiouLkZ+frxQPVvwv\nhYRNWSvn3z+oWlpa0NHRgZOTkyBjKsPzxV+8eFGh15cdFcmzZ25ra4uYmBj8+eefaNKkiWxftG7d\n+o2vl9cP6H9pa2srxT6pqP/uC3nORNu/fz/CwsKwaNGit+6PMm/aL0I9m6VGjRqysxxVUravhLjm\nNGTIEEyZMgV//PEHDh48iKlTp+KHH37A3r170aZNG9StW/etv+AVte+UfZakQsLGzMwMd+7cQX5+\nvixwfv31V3Tt2lWwb+qIiAhBtlsRx44de+9yGTo6OigpKUG7du3g5+eHwYMHIzg4WG4X1AcNGoSY\nmBikpKSgSZMmspv2vLy8ZK/Zv3+/bMpkRadpl8ezZ8/Qvn17uV5fqAwDA4N33meko6OD4uJimJiY\nwN/fH8OGDYOhoSHMzc3l1mJ6+PAhxo4di06dOmH69Omyj//222+y4BF6v/xX2RNWN2zYoBTXUaVS\n6XsvuJftq/r16+OLL76Ap6cnrl27JrvOIq9f6k5OTqhTpw5CQ0NhbGyMJk2awMvLCyNHjsTChQsR\nFBT0yutjYmJkF+sLCgrkUsP75ObmQldXVynPagAFhY2trS3u3LmDmzdvonXr1vj999+RlZWFrl27\nKmJ4pVP2A2Jubo6xY8fC29v7lZkzNWrUkNujlUeOHInVq1djzZo16NWrF0JDQ9GhQwfZY4zj4uLw\n888/Y9OmTQBezqr57rvv5DJ2mezsbKU/UtbW1kZpaSlq1KiBYcOGwcvLC507d5YdlT558gSAfK5n\nSKVSBAQEIDc3F9u2bYOOjo5sjN27d6N169YK2S//Vfa1yeO6lJDKfn6MjY3h5+f32r66f/8+ACAv\nL09uX4u2tjY8PT2xdu1a7N+/HwDQvn17mJubw8DA4JUVwM+fP4/Tp0/LrsFlZGRg8+bNcqnjXXJy\ncuTS5hWKQiYITJs2DcDLR9E+evQI06dPR4cOHaCtrY2kpKQ3Xj9QV8bGxhg9ejQSEhKQkZGBmTNn\nvjZF09LSUjalsrL09PRw7NgxaGtro0GDBqhatSoOHz4sO+ILDg5+Za2vsutq8iKVSnHz5k1YWlrK\ndbvyIJFIIJFIoK2tDTc3N4SFheHevXtYvXo1HB0dX2l/GBsbw9DQUC6z6vbt24f9+/fDwsICa9eu\nxeTJk+Hv7w97e3vZz4LQ++VNbt++jRo1asDIyEjwsSpKW1sb2tra0NXVRZ8+fRAREfHWfVU2C02e\nMyqBl92AoUOHymagSSQS+Pv7Y/To0a+87scff0Tfvn1l/6+o7/3Hjx/L5XYJoSjkzMbJyQnr16/H\n/PnzsXfvXri4uCAsLAznzp1Dz549MX/+fEWUoXBlv9D19fWRn5+PWbNmYcGCBbIj2bextrZGdnY2\nHjx4gDp16lS6jkaNGiE+Pv6Nn7t48aKg11L+/PNP5OTkwNraWrAxKkpHRwcSiQS1atWCubk5oqOj\nZRNW3kYikaBJkya4fv16pccfPHjwe68rCL1f3uT69eto2rSpQsd8n7IQcXR0xLBhwzBo0KD3hqG1\ntTW0tLSQkZEh1xs7jY2NX2l5Ai+X9/+v1NRUUc7kb926pZCDkg+lsJs6x4wZg/v37+PZs2fYt28f\nzMzMMGDAAGRnZ6vVs0H+rXv37oiOjpYdYbVu3fq9QQMANjY2kEgksll8QqpSpQouXbok2PYvXrwI\niUSiVHfeR0dHIysrC87OzrC0tHxv0JSxsrJCWlqawNW9JPR+eZNr167J7aZVeZBIJDh16hTu3buH\nkydPYuTIkeU666pWrRrq168v2qKUurq6cjkoqajMzEylOqj7Ly5XI6AqVarA1dW1whfszMzM0LJl\nS4UsYOnu7o6goCDZD2bZ3/Kar3/ixAnY2dnJ5QY7eXFwcPigp2w6ODggISFBIc/nKc9+KbtXSh71\nlJSUICEhAQ4ODpXeljw5Ojp+0COdW7ZsqbADg/9ydnbG+vXrZdeO/n0NqYw89x3wcuZrRkaGUi8S\ny7BRUj169FBI2CxduhRdunRBhw4d4OTkhOjoaNjY2ODKlStvvPGzomJiYtCjR4/KF6oEnJ2d8eTJ\nE4Wccbxvv/z555+yZZ++//77ty6zU14XL17EP//8A2dnZ3mULzpnZ2ckJiaKsvTOtGnT0KZNG/j4\n+GDkyJH45Zdf0LhxY9y8eROlpaV49OiRbLZsWFiYXGYaXrx4ES9evFDqm+QVcs2GKs7T0xMhISG4\ndOmSbBFNIdSsWRPbt29/5WNjxoyRy7YvXryIK1euYMOGDXLZnthatWoFExMTHD9+/L33xVTW+/ZL\n3bp1sWbNGrnd1Hn8+HGYmZkp9ZFxRTg5OWH69OlIT0+XyzOhKsLQ0PC11QE8PT1l/127dm3Mnj0b\ns2fPltuYiYmJaNKkCerVqye3bcobz2yUVJcuXdCwYUPs2rVL7FI+2K5du2BhYaF0rZkPpaWlBR8f\nH9l0ZHUhlUqxZcsWDBkyROlvDCwvOzs71KpVC6dOnRK7FIU4ffq00qzQ8TYMGyWlpaWFESNGYOPG\njUq93tHbPHv2DJs2bcLIkSPV5hcYAPj6+uL69etITk4WuxS5uXDhAm7evKlWj7vW0dGBr68vDh48\nqFSrWAshNTUV165dw9ChQ8Uu5Z0YNkpsypQpKC4uxurVq8UupcJWrVoF4OXaX+rEwcEBLVq0kH19\n6mDVqlWwtbV964rTquqzzz5DZmYmrl69KnYpgjp69CjMzc2VvoPAsFFiNWvWxKhRo7By5UrZHeyq\n4NGjR1i1ahUCAgKUfuWAipJIJPj6668RHh6u1M8OKa8rV65gz549mDt3rtilyJ2DgwNsbGwQFhYm\ndimCycnJwYEDB+Dv76/0HQSGjZKbO3cu9PT0ZEvSq4Lp06ejZs2amDNnjtilCMLDwwMWFhZyXx1b\nDN9++y2srKwwcOBAsUsRxLx58xAVFaW0z1OqrF27dkEikahEB4Fho+SqV6+O7777Dlu2bMHJkyfF\nLue9YmNjsWPHDixevFhuzxNRNjo6OggJCcHWrVsRFxcndjkf7OjRo9i5cydWr14t2GrfYvPw8ICl\npaXaTeoAXp7VhIeHY/z48SrRQVDP7zA14+vriyFDhsDLy+utT9hUBpmZmfD09MTQoUPh7e0tdjmC\n+uSTT9CnTx9MmjRJYav6ylN+fj6mTZuGTz/9VOlnMVWGtrY2Fi5ciEOHDslWPFcXZauol609qewY\nNioiJCQEhoaG8PLyeuVOZGWRm5sLHx8fmJiYqOSEhg+xZs0a3L17V2V+2P9t4sSJePjwoUbsK09P\nTzg5OSEoKEguNyorg9TUVISHh2Px4sUq85woho2KMDIywpEjR3Dt2jV4eXnJnlCoDIqKiuDp6YmM\njAwcOXJEJU7p5cHc3BwbNmzAunXrsHv3brHLKbft27dj06ZN2Lx5s1I8t0YRQkJCkJmZqTTPVKqM\ngoICzJkzB+3atcOIESPELqfcGDYqxMbGBkePHsXp06fh4eGhkAdqvU9eXh4GDhyIX375BUePHlXq\nhQCFMHjwYEyYMAEjRozA6dOnxS7nveLj4zF69GhMmTIF7u7uYpejMM2bN8fq1auxceNGlb7OBgBf\nf/01njx5gr1796rUtTbVqZQAAPb29oiNjcWvv/6K3r174++//xatlgcPHsDNzQ0XLlxAXFwc2rdv\nL1otYlqxYgX69u0Ld3d3pb4ucPHiRQwcOBCffvopli5dKnY5Cjd69Gh4enpi/vz5uHPnjtjlfJDI\nyEhER0crzdNUK4Jho4I6dOiAhIQE3Lt3D23atMGZM2cUXkNUVBTs7Ozw8OFDJCQkoF27dgqvQVlo\na2tj586d6NChA3r06CHK/nifU6dOwcnJCZ06dcL27dtV6ohYnjZu3IhGjRohICBApe5dA4CTJ09i\n0aJF+Prrr1VyqrpmfsepgWbNmiE5ORmdO3eGk5MTxo8fL/cnE77JX3/9haFDh6Jv375wdnZGUlKS\nUj0DRSx6eno4fPgwXF1d4erqisjISLFLkgkPD8cnn3yCvn374tChQ6hSpYrYJYmmevXqiIqKgra2\nNsaPH4/s7GyxSyqX5ORkzJgxAyNGjFDZh00ybFRYjRo1sGfPHmzbtg0HDhxA06ZNsXjx4kovN/8m\nOTk5WLRoEZo1a4a4uDiEh4dj165dSv+8ekXS09NDeHg4AgMD4e3tjcDAQFGnRefn52Ps2LHw9fXF\n5MmTERYWptFBU6ZOnTo4fvw4srOz8fnnn+PPP/8Uu6R3OnHiBMaMGYMBAwbIHiuhit76iAFVnXv/\n4MEDsUtQuM8++wz9+vXDokWLsHDhQqxcuRJjxoyBj49PpZ+QeeXKFezevfqO028AABz2SURBVBs/\n/PADCgoKEBgYiFmzZjFk3kJLSwtLlixB165dMWLECJw8eRKhoaHo1q2bQus4efIkxo0bh0ePHuHw\n4cPo27evQsdXdo0bN8bZs2fh6uqKoUOH4vvvv1e6xytIpVLs2LEDy5cvx5gxY7BmzRqVbn++FjYd\nO3aEj4+Pys5HNzY2VtmgrIwaNWpg8eLFmD59uuzu9qCgILRo0QKffPIJXF1d0bp1a5iZmb1zOw8f\nPsRvv/2G6OhoHDt2DFevXoWFhQUCAwMRGBiIWrVqKegrUm39+/fH77//jkmTJqFHjx7w8/PD/Pnz\nYWFhIei46enpmDt3LsLDw+Hp6YmVK1cq9TNOxNSgQQMkJCTAw8MDQ4cOxdixYzF69GixywLwspMw\nb948xMbGYsGCBWqx9NNrYdOoUSOEh4eLUQvJgYmJCb755hvMmzcPv/zyCw4ePIi4uDisXLkSpaWl\nMDQ0hLm5OQwNDWFoaAjg5Td2Tk4Obt26hdzcXGhpaaF169bo06cPNmzYgE6dOin9In/KqH79+ti7\ndy+OHTuGadOmwdraGn5+fpg6dSpatmwp17EuX76M5cuXY+fOnWjWrBmOHTsGV1dXuY6hjkxMTDBz\n5kwMGjQIa9asQWpqKmbMmIG6deuKVtOZM2ewePFiFBYWqtWTbvmkTjUlkUjg6OgIR0dHAMDTp09x\n9epVXLt2TRYqZdd2DAwMYGBgAAsLC1hbW8PGxkZl7kpWBWVnlpGRkQgODkarVq3Qpk0b+Pn5wc3N\nDTY2NhUOc6lUiitXruDYsWMICwvD77//DltbW+zatQseHh4q3W5RlMLCQnz55ZcICQkB8PJJqKdP\nn4a7uztGjBiB4cOHK3R9v8zMTCxbtgynTp2Cr68vVq5cidq1aytsfKExbDSEsbExunTpgi5duohd\nikbS0tKCt7c3vL29cfbsWWzbtg3BwcGYNm0azMzM0LlzZ1hZWaFp06Zo2LAh9PX1Zb/o8vLykJeX\nh6ysLNy8eRPXrl3D2bNn8ejRI5iYmMDLywvr169X+ueZKJO//voLnp6eOHv2rOxjn3/+OUJCQrB8\n+XIEBQUhPDwcn3/+OXx8fAQNndu3b2P9+vWIiopCkyZNEBsbC2dnZ8HGEwvDhkjBOnfujM6dO2Pd\nunX4448/cPr0aSQlJSE+Ph7r1q3D8+fP3/i+GjVqyM48582bh27duqFFixY8i6mgo0eP4rPPPkNe\nXh5KS0sBAIaGhujQoQO0tbXx1Vdfwd/fH6tWrUJISAg2bdoEFxcX9OvXD23btpXLv3dubi5iYmJw\n9OhRnD9/HtbW1ti2bRt8fHygra1d6e0rI4YNkUi0tLRga2sLW1vbVz7+9OlTFBQUyJYj0tfXR9Wq\nVdnarKSioiJZ20wikciCRktLC926dXvll3zt2rWxcOFCTJ8+HTt27MCOHTswYsQImJqaon379mjf\nvj3s7OxgYWEBPT2994799OlT3LhxA8nJyUhKSsLvv/8OiUSCAQMGYPbs2ejdu7faHzQwbIiUDENF\n/u7fvw9PT0+cP38eUqkUUqlU9jltbW306tXrje8zMjLChAkTMGHCBKSlpeHw4cOIiYnBihUrkJeX\nB4lEgvr166NevXqoUqWKrN0mlUqRk5ODvLw8ZGZmym4erVevHnr16oXAwED0798fRkZGwn/xSoJh\nQ0RqLTo6Gj4+PsjNzX3jLR1FRUVwcnJ673aaNWuGZs2aYfr06SgsLERqaiquX7+OGzduICsrC4WF\nhcjJyQHwcoJO2VmPv78/rKysYGVlJfjUd2XGsCEitVRSUoI5c+ZgyZIlr7TN/qtmzZqws7Or0Lar\nVKkCOzu7Cr9PkzFsiEgtjRgxAjt27ACAV9pm/6atrY1u3brxPjIFUO8rUkSksTp06ABdXV3o6uq+\n9TVaWloaueKIGBg2RKSWyi7q29vbv3WmV1FREXr27KngyjQTw4aI1JalpSUSEhLwv//9D9ra2q+d\n5ZiYmKB58+YiVadZGDZEpNaKiopw8OBBtGnTBs2bN5fdT6Ojo4OePXvyeo2CMGyISK0tXLgQGRkZ\niIiIQFJSEubOnQsdHR0UFxer5bIwyophQ0RqKykpCYsWLcK3334LCwsL6OrqYu7cuUhKSoKfnx/c\n3d3FLlFjcOozEamlwsJC+Pv7o0uXLhg3btwrn7Ozs5NNiybFYNgQkVoqa59dunRJ7dcdUwUMGyJS\nO2XtsxUrVmj0EjHKhHFPRGrl3+2z8ePHi10O/R+e2RCRWmH7TDkxbIhIbSQnJ7N9pqQY+0SkFgoL\nCzFy5Ei2z5QUz2yISC2wfabcuEeISOWVtc8WLVqkUu2zX375Bf7+/pBIJJBIJPjqq6+QmJgodlmC\n4JkNEak0VW6fdenSBW3btsXmzZvRsGFDLF68WOySBMOwISKVpurtM319/Vf+VlcMGyJSWZx9pjpU\n7zCAiAiq3T7TRDyzISKVtGjRIpVun2ka7iEiUjnJyclYuHChys0+k7eCggJERUXh8ePHYpfyXgwb\nIlIpbJ+9lJubiwULFqBPnz549OiR2OW8F8OGiFRKWftsy5YtGtM+e/jwIYqKil75mIGBAYKDg0Wq\nqOI0Y08RkVpQx/aZVCp97+fHjRsHbW3t1z6nSmHLCQJEpBLUtX2Wm5sLAHj+/DlKS0tfCZDs7GxM\nnz4dVatWlX18//79uHz5MgAgPz9f8QV/IIYNEakEdZx9dvbsWfzwww8AgPv378PGxgZ169aV/f+t\nW7dQVFSErVu3AgDi4uLw888/Y9OmTQCA1NRUfPfdd6LUXlEMGyJSemXtM3W7ebNz587o3Lkztm3b\nVq7XBwcHY86cObL/b968uVClyZ16HB4QkdpS1/bZh7h48SKMjY3FLuOD8MyGiJSaOrbPPlSVKlVw\n6dIltG3bVuxSKkyz9xwRKTV1nH1WGe7u7ggKCkJmZiYAyP7OyckRr6hyYtgQkVJi++x1S5cuRZcu\nXdChQwc4OTkhOjoaNjY2uHLlCkpKSsQu753YRiMipcT22etq1qyJ7du3v/KxMWPGiFRNxTBsiEjp\nlLXPli9fzvaZmuDhAhEplX+3zwIDA8Uuh+SEZzZEpFTYPlNPDBsiUhpsn6kvHjYQkVIoLCyEv78/\n22dqimc2pNFOnz4NFxcXscsgABkZGbh9+zYcHBzg5uYmdjlq7/Lly+jWrZvCxmPYkMby9PQUuwT6\nP0+fPsWtW7dgZ2eHevXqiV2ORujWrRu8vLwUNh7DhjSWp6cnA0cJFBYWwt7eHt26dUNcXBwnBagp\nhg0RiWrRokVIT0/n7DM1x7AhItFw9pnm4GEEEYmCs880C89siEgUbJ9pFoYNESkc22eah4cTRKRQ\nbJ9pJp7ZEJFCsX2mmRg2RKQwbJ9pLh5WEJFCsH2m2XhmQ0QKwfaZZmPYEJHg2D4jHl4QkaDYPiOA\nZzZEJLBvv/2W7TNi2BCRcJKTkxEcHMz2GbGNRkTCYPuM/o1nNkQkCLbP6N8YNkQkd2Xts2XLlrF9\nRgDYRiMiOft3+2zChAlil6MxTpw4gX79+kEikUAikcDJyQndu3dHmzZt0Lt3b4SGhiI/P1+0+nhm\nQ0RyxfaZOFxcXNClSxcYGBjA3Nwc8fHxAIDS0lIcOXIEU6ZMwdKlS3Ho0CHY2toqvD5+JxCR3JS1\nz4KDg9k+E0G1atUAAHp6erKPaWlpYcCAAThz5gwKCgrg7u6OvLw8hdfGsCEiuWD7TLnVrVsXQUFB\nyMzMxLJlyxQ+PsOGiOSirH22ZcsWts+U1ODBg6GlpYWYmBiFj81rNkRUaZx9phqMjIxgZmaGK1eu\nKHxsHn4QUaWwfaZadHR0UFRUpPhxFT4iEakVzj5THYWFhXjw4AFatWql8LEZNkT0wdg+Uy1xcXEo\nKirCoEGDFD42D0OI6IOwfaZaXrx4gdmzZ6NBgwairFXHMxsi+iBsnymfsvtnCgoKXvl4SkoKJk+e\njKdPn+Lo0aOoWbOmwmtj2BBRhbF9pnzOnDmDLVu2AABu376NHj16QE9PD3p6etDV1YW3tzeGDx8O\nQ0NDUepj2BBRhbB9ppwcHR3h6OiITZs2iV3KGzFsiKhC2D6jD8GwIaJyY/uMPhQPS4ioXMraZ507\nd2b7jCqMZzZEVC5sn1FlMGyI6L3YPqPK4uEJEb1TUVER22dUaTyzIaJ3YvuM5IFhQ0RvlZycjKCg\nILbPqNIYNgoUGxuLf/75R+wyiMqlpKQEixYtgoWFBapWrYoff/xR7JJIhUmkUqlU7CI0QcOGDXHn\nzh2xyyAiqjBtbW2cOnUKXbp0+dBNRDJsiOg1ycnJcHBwwNKlSzF58mSxyyHVx7AholcVFRWhQ4cO\nqFmzJuLj4zkpgOQhktdsiOgVnH1GQmDYEJFMSkoKgoKCsHTpUs4+I7liG42IALB9RoJiG42IXmL7\njITEsCEits9IcGyjEWk4ts9IAdhGI9J0bJ+RIvDMhuj/7N27Fz/88IPYZSjU8+fPkZiYiKZNm6Jh\nw4bvff2YMWMwePBgBVRGaoZnNkRlIiIikJaWhk6dOoldikKUlpYiKSkJpqamsLW1hUQieefrf/31\nV0RERDBs6IMwbIj+pVOnToiIiBC7DIVYsGABoqOjkZKSUq5JAV5eXgqoitQVw4ZIA3H2GSkarwYS\naZiioiKMHDkSnTt3xsSJE8UuhzQEz2yINAxnn5EYGDZEGoTtMxILD2uINATbZyQmntkQaYiy9tnv\nv//O9hkpHMOGSAP8u31maWkpdjmkgXh4Q6Tm2D4jZcAzGyIlkJeXh6tXryItLQ3Xr19HYWHhK5/X\n09ODlZUVrK2t0aJFC+jr65d724sXL2b7jETHsFFi6rBWF9fSerusrCzs3r0bx48fx5kzZ/DixQvo\n6uqicePGMDAweOW1ubm5SE9PR1FREapWrQpHR0e4ubnBx8cHDRo0eOsYKSkpWLBgAdtnJDqGjRJT\n9bW6uJbW60pLS3Ho0CGsWbMGJ0+ehImJCVxcXLB27Vo4ODjAysoKurq6b3xvUVERrl+/jl9//RVx\ncXH47rvvMHPmTDg7O2PChAno37//K+ubsX1GyoRho+RUea0urqX1qn379mHu3Lm4du0aBg0ahJ9+\n+glubm7Q0Snfj6Guri5atGiBFi1a4IsvvkBRURGio6Oxbds2DBo0CDY2NggKCoK7uzsAts9IufA7\nkEhgN27cgKurK7y8vNCpUydcu3YNERER6Nu3b7mD5k10dXXRr18/REZGIjU1Fe3atcPAgQPRp08f\nHD58GAsWLMCCBQvYPiOlwLAhElBYWBjatm2L+/fv48yZM9i4cSMaN24s93GaNm2KLVu24NSpU8jM\nzIS3tzeaNGnC9hkpDYYNkQBKS0sxceJEDBs2DF988QWSk5MVcu2ta9euSE5Ohq+vL9LS0jBz5kzw\n+YikDHjNhkjOioqKMHz4cBw4cAD79+/Hp59+qtDx9fX1sWHDBnTu3BljxozBgwcPsHnz5rdOPCBS\nBIYNkRyVlpZi+PDhOHLkCKKiotCjRw/RahkxYgTq168PDw8PAMD27dvf+zROIqGwjUYkR1OmTMH+\n/ftx8OBBUYOmjKurK/bv34+IiAjMmDFD7HJIgzFsiOQkPDwcISEh2LZtG5ydncUuR8bFxQWbNm3C\nsmXLsHfvXrHLIQ3FsCGSg4yMDIwdOxYTJkyAt7e32OW8xs/PDwEBARg1ahQyMzPFLoc0EMOGSA7G\njh0LCwsLLF26VOxS3mrFihWoV68exo8fL3YppIEYNkSVFB4ejpiYGISGhqJKlSpil/NWVatWRWho\nKKKiorBv3z6xyyENw7AhqoSSkhLMmzcPfn5+6Ny5s9jlvFf37t3h5eWFuXPnorS0VOxySIMwbIgq\nYc+ePbh16xa++eYbsUspt2+++QZpaWmcLEAKxbAhqoT169djwIABKrX+WLNmzdCvXz+sW7dO7FJI\ngzBsiD5QWloaEhISMHr0aLFLqbDRo0fj1KlTSE9PF7sU0hAMGw0UEhICKysr6Ovro2nTpggKCsKL\nFy/ELkvl7Nu3Dx999BF69uwpdikV5urqClNTU04UIIVh2GiYZcuWYeLEiejVqxcePnwIDw8PzJ07\nF2PHjhW7NJVz4sQJfPLJJ5V6TIBYdHV14ebmhhMnTohdCmkIho2G2bhxI4CXF4mrV6+O6dOnAwAi\nIyPFLEs0xcXFOH78OPLz8yv0vhcvXiAxMRHdu3cXqDLhde/eHWfPnkVhYaHYpZAGYNhomKysLACA\nqakpAMiOyuvUqSNaTWKKiYmBm5sbTE1NkZiYiL/++gvFxcXvfV9aWhry8/Nhb29f6RoyMjLg4+OD\nWrVqISIiAiYmJrC0tMRPP/1U6W2/S8eOHZGXl4fr168LOg4RwLDROGVH8GWPCT558iR0dXWxcuVK\nMcsSTVmw5OXlISsrCwkJCTA1NcW4cePwyy+/vPVZMGlpadDR0UGTJk0qNX5GRgbs7e1x5swZHD16\nFNOnT0dcXBzMzc3h4eGBS5cuVWr772JlZQVtbW2kpaUJNgZRGYaNBmvbti0mTZqEpUuXYsCAAWKX\nI7qyYMnOzsaGDRvg6OgIY2NjjBkzBmfOnHkleG7duoWPP/640isGzJo1C48fP8bKlSvh4OCAe/fu\nwc7ODtOnT0dxcTHmzZtXqe2/i56eHurXr49bt24JNgZRGYaNBktJSUFsbCyCgoIwYMAAPH/+XOyS\nlEbZGU92dja2bNmCrl27okGDBpg0aRJSUlLw7NkzGBkZVXqcsgv0Tk5OAF6uSAAAHTp0AAAkJSVV\neox3MTIyQnZ2tqBjEAF8eJrGa9y4MRYsWIDx48dj6tSp2LBhg1y3f/XqVbi4uMh1m/L0999/v/c1\nRUVFAID79+9j7dq1WL16NczMzNCoUaNKj5+TkwMAqFGjxisfr1mzJgDgn3/+eeXje/fuha2tLays\nrCo9dtm4PMggRWDYEAYNGoTx48dj//79cg8bHR0dGBsby3Wb8lRQUFCh15c96fJt13IqqmHDhkhP\nT8eDBw/w8ccfyz5eFoL/viaUkJAAPz8//Pbbb3IZu4y8vhaid2HYEKpXrw4Agjwy2MrKChEREXLf\nrrwcOXIEZ86ceedrdHR0UFJSgnbt2sHPzw+DBw/G2rVrERUVVenxBw4ciGXLliEqKuqVlQhiY2MB\nAEOGDJF97OLFi3jx4gXWrFkDU1NTuazH9uzZs9fOqoiEwLAhXLhwAcD/v25ALwOmuLgYFhYWGDNm\nDHx8fF5pm9WoUUMu1zrmzJmDQ4cOYdasWbLW2Llz5zBjxgxYWVm98uyZiRMnYtKkSQgMDESzZs0q\nPTbwsk1X1rIjEhInCGiomTNn4tGjR0hLS8OUKVNgYGCA+fPni12WqMqmg9eqVUs2Ay09PR0zZ858\n7fpM48aNkZWVVellfoyMjHD27Fl4e3vDx8cHAODu7o6+ffsiISEBBgYGldr+u+Tl5eHu3bto3Lix\nYGMQleGZjYaysLBA+/bt8fjxY3To0AHx8fGwsbERuyyF09bWBgBUq1YNtWvXRsOGDREXF/feJWis\nrKxQUlKC9PT0Sv+7mZqaIjQ0FKGhoZBIJHjw4EGltlde6enpKC0tldtkA6J34ZmNhho7dixu376N\nnJwcxMfHy6baapqePXsiOjoaf//9N+zt7fHRRx+Va62zZs2aQV9fH4mJiQqo8lXyuqB//vx5VKtW\njWFDCsGw0SBlU3iFbM2omipVqsDV1RX6+voVep+enh66dOkiu5CvKHXq1EFqaiqePXtW6W3Fxsai\nW7duSv0oa1IfDBsNUjadtnbt2iJXoh569OiBmJiYcq2lJi+enp6YOHEi4uPjK7WdoqIixMbGclII\nKQzDRoOkpqYCANq1aydyJerBz88PDx8+lMsU6DLva5GFhITg7t27cHd3r9Q4R44cwZMnT+Dr61up\n7RCVFycIaBBnZ2fewCdHjRo1QpcuXbB161b0799f7HIqZOvWrejatSvq168vdimkIXhmQ1QJU6dO\nxYEDB/DHH3+IXUq5paSk4PDhw5g2bZrYpZAGYdgQVcKAAQPQvHlzLFmyROxSym3p0qWwtbVF3759\nxS6FNAjDhqgStLS0sHDhQoSFheHs2bNil/Nep0+fxp49e7Bw4UJBliciehuGDVElffrpp/jkk08w\nbtw4pX7EckFBAcaNG4f+/fvzrIYUjmFDJAdr1qzBrVu3MH36dLFLeaupU6fi3r17WL16tdilkAZi\n2BDJgaWlJdavX4+QkBDs2bNH7HJes2PHDqxbtw4bNmyQy3N4iCqKU5+J5GTIkCE4d+4chg8fjtq1\na8PZ2VnskgC8fBroF198gWnTpmHw4MFil0Maimc2RHK0cuVKDBo0CJ9++qnCl7J5k+PHj2PQoEHw\n9vbG0qVLxS6HNBjDhkiOtLS0sG3bNvTr1w99+vRBWFiYaLVs2bIF/fr1w8CBA7Fp0ybOPiNRMWyI\n5ExXVxdhYWEYO3Yshg4dioCAgAo/froy8vPzMWrUKIwcORITJkzAtm3boKurq7Dxid6EYUMkAC0t\nLXz//fcICwtDWFgY2rVrV+nFM8sjJiYGbdq0QUREBHbv3o3ly5fzjIaUAsOGSEC+vr5ISUlB/fr1\n4ezsDE9PT1y5ckXu41y+fBmDBg2Ci4sLLC0tcfHiRXh7e8t9HKIPxbAhEljTpk1x/Phx7N27F1ev\nXoWtrS08PT3x888/V+rxBEVFRThy5Ag8PDzQunVr3Lx5EwcPHsTRo0dhaWkpx6+AqPI49VnJnT59\nGi4uLmKX8UEuX76Mbt26iV2G0vDw8MDAgQNx6NAhrFmzBgMGDICJiQlcXFzg7OyMDh06oEmTJqhW\nrdob35+bm4v09HQkJiYiLi4OJ06cwJMnT+Dk5IQDBw6gf//+bJmR0mLYKDFPT0+xS6iUbt26wcvL\nS+wylIqWlhYGDhyIgQMHIisrC7t378bx48cxfvx4vHjxAgBQt25dVK9eHdWrVwcAPH/+HM+ePcNf\nf/0FAKhatSocHR0xc+ZM+Pj4oEGDBqJ9PUTlJZHyASdEACALxoiICIWPnZeXh+vXr+PWrVvIzMxE\nbm4ucnJyAACGhoYwNDREo0aNYGlpCSsrqwo/xloexPz3IZUXyTMbIiVQrVo1tG7dGq1btxa7FCJB\ncIIAEREJjmFDRESCY9gQEZHgGDZERCQ4hg0REQmOYUNERIJj2BARkeAYNkREJDiGDRERCY5hQ0RE\ngmPYEBGR4Bg2REQkOIYNEREJjmFDRESCY9gQEZHgGDZERCQ4hg0REQmOYUOkZuLj4yGRSFCjRg3Y\n2tqiY8eOkEgkqFq1Kjp27IiWLVuiatWqkEgk+PPPP8UulzQEHwtNpGby8vLQo0cPHDlyBAYGBgAA\niUQCc3NznD9/HgDw+PFjODg4ID8/X8xSSYPwzIZIzeTn52PGjBmyoHkTExMTjBs3jmFDCsMzGyI1\n06dPH1SpUuW9rwsICICWFo83STEYNkRqplq1auV6XdWqVQWuhOj/42ENEREJjmFDRESCY9gQEZHg\nGDZERCQ4hg2RmpNKpWKXQMSwIVJ3ubm5AMB7akhUDBsiNRYdHY0JEyYAALKysvDVV1/h7NmzIldF\nmoj32RCpMTc3N7i5uWHLli1il0Iajmc2REQkOIYNEREJjmFDRESCY9gQEZHgGDZERCQ4hg0REQmO\nYUNERIJj2BARkeAYNkREJDiuIED0f7S1tbF7925IJBKxS1FaPj4+YpdAKkoi5ZKwRACAjIwMJCcn\ni12GUmvXrh0sLS3FLoNUTyTDhoiIhBbJazZERCQ4hg0REQmOYUNERIJj2BARkeAYNkREJDiGDRER\nCY5hQ0REgmPYEBGR4Bg2REQkOIYNEREJjmFDRESCY9gQEZHgGDZERCQ4HQCRYhdBRERq7dz/Azo8\ntEOpbtBKAAAAAElFTkSuQmCC\n", "prompt_number": 3, "text": [ "" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The joint distribution is:\n", "\n", "$P(w,z, \\theta, \\phi ; \\alpha, \\beta) = P(\\theta ; \\alpha) P(z | \\theta) P(\\phi ; \\beta) P(w | z,\\phi)$\n", "\n", "\n", "## Integrating out $\\theta$ and $\\phi$\n", "\n", "\\begin{align*}\n", "P(w,z ; \\alpha, \\beta) &= P(\\theta ; \\alpha) P(z | \\theta) P(\\phi ; \\beta) P(w | z,\\phi)\\\\\n", "&= \\int_{\\phi} \\int_{\\theta} P(\\theta ; \\alpha) P(z | \\theta) P(\\phi ; \\beta) P(w | z,\\phi)\\\\\n", "&= \\int_{\\phi} P(\\phi ; \\beta) P(w | z,\\phi) \\int_{\\theta} P(\\theta ; \\alpha) P(z | \\theta)\\\\\n", "&= f_1 f_2\n", "\\end{align*}\n", "\n", "\\begin{align*}\n", "f_1 &= \\int_{\\phi} (\\prod_t P(\\phi_t ; \\beta)) (\\prod_d \\prod_l P(w_{dl} | z_{dl}, \\phi))\\\\\n", "&= \\int_{\\phi} \\prod_t (\\frac{\\Gamma(\\beta W)}{\\prod_w \\Gamma(\\beta)} \\prod_w \\phi_t(w)^{\\beta-1})\\prod_d \\prod_l \\phi_{z_{dl}}(w_{dl})\\\\\n", "&= k_1 \\int_{\\phi} \\prod_t \\prod_w \\phi_t(w)^{\\beta-1} \\prod_t \\prod_w \\phi_t(w)^{n(w,t)}\\\\\n", "&= k_1 \\int_{\\phi} \\prod_t \\prod_w \\phi_t(w)^{n(w,t)+\\beta-1}\\\\\n", "&= k_1 \\prod_t \\int_{\\phi_t} \\prod_w \\phi_t(w)^{n(w,t)+\\beta-1}\\\\\n", "&= k_1 \\prod_t \\frac{\\prod_w \\Gamma(n(w,t)+\\beta)}{\\Gamma(\\sum_w (n(w,t)+\\beta))}\\\\\n", "&= k_1 \\prod_t \\frac{\\prod_w \\Gamma(n(w,t)+\\beta)}{\\Gamma(n(t)+\\beta W)}\n", "\\end{align*}\n", "\n", "Where:\n", "\n", "$k_1 = \\left(\\frac{\\Gamma(\\beta W)}{\\Gamma(\\beta)^W}\\right)^T$ \n", "\n", "$n(w,t)$ is the number of times word $w$ has been assigned to topic $t$ in all documents\n", "\n", "and\n", "\n", "$n(t)$ is the number of words assigned to topic $t$ in all documents." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The exact same steps can be applied to compute $f_2$ as:\n", "\n", "$$ f_2 = k_2 \\prod_d \\frac{\\prod_t \\Gamma(n(d,t)+\\alpha)}{\\Gamma(n(d)+\\alpha T)} $$\n", "\n", "Where:\n", "\n", "$k_2 = \\left(\\frac{\\Gamma(\\alpha T)}{\\Gamma(\\alpha)^T}\\right)^D$ \n", "\n", "$n(d,t)$ is the number of times a word from document $d$ has been assigned to topic $t$\n", "\n", "and\n", "\n", "$n(d)$ is the number of words in document $d$, i.e. $L_d$ from the plate notation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gibbs sampling\n", "\n", "$$f_1 = P(w|z)$$\n", "$$f_2 = P(z)$$\n", "\n", "The goal of the sampling procedure is to estimate $\\phi$ and $\\theta$. $\\phi$ informs us what the topics might be, $\\theta$ tells us how important each topic is in a document. But we have just integrated them out! Instead of sampling the values directly, we will estimate them based on sampling $P(z|w)$.\n", "\n", "$P(z|w) \\varpropto P(w|z)P(z)$\n", "\n", "Concretely, we update:\n", "\n", "$$P(z_{ij}=k | z_{-ij}, w)$$\n", "\n", "Where $i$ is a document index, $j$ is a position in document $i$, and $k$ is a topic. $z_{ij}$ is the topic of word $w_{ij}$. All the other assignments $z_{-ij}$ are assumed to be known.\n", "\n", "$$P(z_{ij}=k | z_{-ij}, w) = \\frac{P(z_{ij}=k, z_{-ij}, w)}{P(z_{-ij}, w)} = \\frac{f_1}{g_1} \\frac{f_2}{g_2}$$\n", "\n", "$$g_1 = p(w|z_{-ij})$$\n", "\n", "$$g_2 = p(z_{-ij})$$\n", "\n", "$g_1$, resp. $g_2$ are similar to $f_1$, resp. $f_2$ except that we don't count the assignment $z_{ij}=k$, i.e.\n", "\n", "$$g_1 = k_1 (\\prod_{t\\neq k} \\frac{\\prod_w \\Gamma(n(w,t)+\\beta)}{\\Gamma(n(t)+\\beta W)}) \\frac{(\\prod_{w\\neq w_{ij}} \\Gamma(n(w,k)+\\beta))\\Gamma(n(w_{ij},k)-1 +\\beta)}{\\Gamma(n(k)-1+\\beta W)}$$\n", "\n", "Consequently:\n", "\n", "$$ \\frac{f_1}{g_1} = \\frac{\\Gamma(n(w_{ij},k)+\\beta)}{\\Gamma(n(k)+\\beta W)} \\frac{\\Gamma(n(k)-1+\\beta W)}{\\Gamma(n(w_{ij},k)-1 +\\beta)} = \\frac{n(w_{ij},k)-1+ \\beta}{n(k)-1+\\beta W} $$\n", "\n", "Similarly:\n", "\n", "$$g_2 = k_2 ( \\prod_{d\\neq i} \\frac{\\prod_t \\Gamma(n(d,t)+\\alpha)}{\\Gamma(n(d)+\\alpha T)} ) \\frac{ (\\prod_{t \\neq k} \\Gamma(n(i,t)+\\alpha)) \\Gamma(n(i,k)-1+\\alpha)}{\\Gamma(n(i)-1+\\alpha T)}$$\n", "\n", "Then:\n", "\n", "$$ \\frac{f_2}{g_2} = \\frac{\\Gamma(n(i,k)+\\alpha)}{\\Gamma(n(i)+\\alpha T)} \\frac{\\Gamma(n(i)-1+\\alpha T)}{\\Gamma(n(i,k)-1+\\alpha)} = \\frac{n(i,k)-1+\\alpha}{n(i)-1+\\alpha T} $$\n", "\n", "Finally, the update is:\n", "\n", "$$P(z_{ij}=k | z_{-ij}, w) \\varpropto \\frac{n(w_{ij},k)-1+ \\beta}{n(k)-1+\\beta W}\\frac{n(i,k)-1+\\alpha}{n(i)-1+\\alpha T}$$\n", "\n", "We compute this value for all $k = \\lbrace 1, \\dots, T\\rbrace$, and sample:\n", "\n", "$$z^{(next)}_{ij} \\sim Cat(P'(z_{ij}| z_{-ij}, w))$$\n", "\n", "We use $P'$ to indicate that the parameters of the conditional should be normalized.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Remark on Notation\n", "\n", "This notation is not used in presentations of LDA. Instead, it is often preferred to use a notation closer to the data structures of the algorithm. For instance, let us convert our notation to match that of [Probabilistic Topic Models](http://psiexp.ss.uci.edu/research/papers/SteyversGriffithsLSABookFormatted.pdf). They use two matrices:\n", "\n", "$C^{WT}$ is a $W \\times T$ matrix and an element $C^{WT}_{wk}$ counts the number of times word $w$ has been assigned to topic $k$ excluding the current instance $w_{ij}$. \n", "\n", "$n(w_{ij},k)-1 = C^{WT}_{w_{ij}k}$. \n", "\n", "$n(k)-1 = \\sum_{w} C^{WT}_{wk}$\n", "\n", "$C^{DT}$ is a $D \\times T$ matrix and an element $C^{DT}_{dt}$ counts the number of times topic $t$ has been assigned to a word in document $d$ excluding the current instance $w_{ij}$. \n", "\n", "$n(i,k)-1 = C^{DT}_{ik}$\n", "\n", "$n(i)-1 = \\sum_{k} C^{DT}_{ik}$\n", "\n", "Rewriting the equation for the conditional we obtain the same result as Griffiths and Steyvers:\n", "\n", "$$P(z_{ij}=k | z_{-ij}, w) \\varpropto \\frac{C^{WT}_{w_{ij}k}+ \\beta}{\\sum_{w} C^{WT}_{wk}+\\beta W}\\frac{C^{DT}_{ik}+\\alpha}{\\sum_{k} C^{DT}_{ik}+\\alpha T}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##The Sampling Algorithm\n", "\n", "$z$ = $initialize()$\n", "\n", "for $it =1, \\dots, maxIt$:\n", "\n", "  for $i =1, \\dots, D$:\n", "\n", "    for $j =1, \\dots, L_i$:\n", "\n", "      $C^{WT}, C^{DT} = update(z, i, j)$\n", "\n", "      sample new $z^{(new)}_{ij} \\sim Cat(P'(z_{ij}| z_{-ij}, w))$\n", "\n", "      $C^{WT}, C^{DT} = update(z, z^{(new)}_{ij})$\n", "\n", "  yield $z$\n", "\n", "$initialize$ simply assigns random topics to the words (or sample $\\theta$ and then $z$). $update$ updates the count matrices based on the assignment $z$, and the current instance $i,j$. The update before the sample will decrement $C^{WT}_{w_{ij},z^{(old)}_{ij}}$ and $C^{DT}_{i,z^{(old)}_{ij}}$ and after the sample we will increment $C^{WT}_{w_{ij},z^{(new)}_{ij}}$ and $C^{DT}_{i,z^{(new)}_{ij}}$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Uncollapsing $\\phi$ and $\\theta$\n", "\n", "The sampler returns a set of assignments of topics to the words. However, we are interested in $\\phi$ and possibly $\\theta$, which we have integrated out (or collapsed). We can recover them using:\n", "\n", "$$\\phi'_t(w) = \\frac{C^{WT}_{wt}+ \\beta}{\\sum_{w} C^{WT}_{wt}+\\beta W}$$\n", "\n", "$$\\theta'_d(t) = \\frac{C^{DT}_{dt}+\\alpha}{\\sum_{t} C^{DT}_{dt}+\\alpha T}$$" ] } ], "metadata": {} } ] }