{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Automatic autoencoding variational Bayes for latent dirichlet allocation with PyMC3\n", "\n", "For probabilistic models with latent variables, autoencoding variational Bayes (AEVB; Kingma and Welling, 2014) is an algorithm which allows us to perform inference efficiently for large datasets with an encoder. In AEVB, the encoder is used to infer variational parameters of approximate posterior on latent variables from given samples. By using tunable and flexible encoders such as multilayer perceptrons (MLPs), AEVB approximates complex variational posterior based on mean-field approximation, which does not utilize analytic representations of the true posterior. Combining AEVB with ADVI (Kucukelbir et al., 2015), we can perform posterior inference on almost arbitrary probabilistic models involving continuous latent variables. \n", "\n", "I have implemented AEVB for ADVI with mini-batch on PyMC3. To demonstrate flexibility of this approach, we will apply this to latent dirichlet allocation (LDA; Blei et al., 2003) for modeling documents. In the LDA model, each document is assumed to be generated from a multinomial distribution, whose parameters are treated as latent variables. By using AEVB with an MLP as an encoder, we will fit the LDA model to the 20-newsgroups dataset. \n", "\n", "In this example, extracted topics by AEVB seem to be qualitatively comparable to those with a standard LDA implementation, i.e., online VB implemented on scikit-learn. Unfortunately, the predictive accuracy of unseen words is less than the standard implementation of LDA, it might be due to the mean-field approximation. However, the combination of AEVB and ADVI allows us to quickly apply more complex probabilistic models than LDA to big data with the help of mini-batches. I hope this notebook will attract readers, especially practitioners working on a variety of machine learning tasks, to probabilistic programming and PyMC3. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "env: THEANO_FLAGS=device=cpu,floatX=float64\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/anaconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n", " from ._conv import register_converters as _register_converters\n" ] } ], "source": [ "%matplotlib inline\n", "import sys, os\n", "# unfortunately I was not able to run it on GPU due to overflow problems\n", "%env THEANO_FLAGS=device=cpu,floatX=float64\n", "import theano\n", "\n", "from collections import OrderedDict\n", "from copy import deepcopy\n", "import numpy as np\n", "from time import time\n", "from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer\n", "from sklearn.datasets import fetch_20newsgroups\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from theano import shared\n", "import theano.tensor as tt\n", "from theano.sandbox.rng_mrg import MRG_RandomStreams\n", "\n", "import pymc3 as pm\n", "from pymc3 import math as pmmath\n", "from pymc3 import Dirichlet\n", "from pymc3.distributions.transforms import t_stick_breaking\n", "plt.style.use('seaborn-darkgrid')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset\n", "Here, we will use the 20-newsgroups dataset. This dataset can be obtained by using functions of scikit-learn. The below code is partially adopted from an example of scikit-learn (http://scikit-learn.org/stable/auto_examples/applications/topics_extraction_with_nmf_lda.html). We set the number of words in the vocabulary to 1000. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Downloading 20news dataset. This may take a few minutes.\n", "Downloading dataset from https://ndownloader.figshare.com/files/5975967 (14 MB)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Loading dataset...\n", "done in 30.579s.\n", "Extracting tf features for LDA...\n", "done in 3.830s.\n" ] } ], "source": [ "# The number of words in the vocabulary\n", "n_words = 1000\n", "\n", "print(\"Loading dataset...\")\n", "t0 = time()\n", "dataset = fetch_20newsgroups(shuffle=True, random_state=1,\n", " remove=('headers', 'footers', 'quotes'))\n", "data_samples = dataset.data\n", "print(\"done in %0.3fs.\" % (time() - t0))\n", "\n", "# Use tf (raw term count) features for LDA.\n", "print(\"Extracting tf features for LDA...\")\n", "tf_vectorizer = CountVectorizer(max_df=0.95, min_df=2, max_features=n_words,\n", " stop_words='english')\n", "\n", "t0 = time()\n", "tf = tf_vectorizer.fit_transform(data_samples)\n", "feature_names = tf_vectorizer.get_feature_names()\n", "print(\"done in %0.3fs.\" % (time() - t0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each document is represented by 1000-dimensional term-frequency vector. Let's check the data. " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAD0CAYAAAC7KMweAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvDW2N/gAAIABJREFUeJztnXmcJVV96L91l96mu+cyTLMpArIcwKeyuODCquITQY0vz+QRXAgRSTSPBJ/6MGLyEs2iiGjEj2EMLrwYX1QwgCIIQXaUNbIeZNhkmWGGmV7v7b5L1fuj7lL7rbpbd3X/vn/M9K2qc87vLPWr3/mdzbAsC0EQBCE9ZJZbAEEQBCEZorgFQRBShihuQRCElCGKWxAEIWWI4hYEQUgZorgFQRBSRm4QiWzbNtfxnMPx8WHm55d6Kc6KR/K8NpA8r366ye/U1IQRdm/FW9y5XHa5RRg4kue1geR59dOv/K54xS0IgiC4EcUtCIKQMkRxC4IgpAxR3IIgCClDFLcgCELKiDUdUCm1G3A38Dat9SOO66cAnwWqwCVa6019kVIQBEFo0tbiVkrlgX8CSgHXvwycCBwLnKmU2qMfQgqCIAgt4rhKzge+ATznuX4I8JjWeqfWugzcAhzdY/kEYVXw7V8+zWu/dNNyiyGsEiJdJUqpDwHbtNbXKKXO9dyeBGYcv+eA9UHxjI8PdzwRPZvNUCiMdRQ2rUieVx8X3fIkgCuPqz3PQay1PPcrv+183H8IWEqptwKHAd9VSr1La70FmAUmHM9OANNBkXSzxLVQGGN6uthx+DQieV69OPO4VvLsZK3luZv8Tk1NhN6LVNxa62MafyulfgGcVVfaAA8DByqlNgDzwDHYbhVBEAShjyTeZEopdSowrrW+WCl1DnANtq/8Eq31s70WUBAEQXATW3FrrY+r//mI49qVwJU9lkkQBEGIQBbgCIIgpAxR3IIgCClDFLcgDBDL6vhMEUFoIopbEAQhZYjiFoQBIva20AtEcQuCIKQMUdyCIAgpQxS3IAwQGZsUeoEobkEQhJQhilsQBogY3EIvEMUtCIKQMkRxC4IgpAxR3IIwSGR0UugBorgFQRBShihuQRggYm8LvUAUtyAIQspoe5CCUioLbAIUUANO11pvdtw/BzgD2Fa/9BGtte6DrIIgCALxTsA5BUBr/Sal1HHABcC7HfePAD6gtb679+IJwupCxiaFXtDWVaK1/jFwZv3nPsBWzyNHAucqpW5RSp3bY/kEQRAED7HOnNRaV5VS3wF+B/hdz+3vAxcBs8DlSqmTtdZXOR8YHx8ml8t2JGA2m6FQGOsobFqRPK9e1q8fZThvvwtrJc9O1lqe+5XfJIcFf1Ap9Sngl0qpQ7XWC0opA7hQaz0DoJT6CXA44FLc8/NLHQtYKIwxPV3sOHwakTyvXqZnSgzn7I7uWsmzk7WW527yOzU1EXovzuDk+4GXaq3/DigCJvYgJcAk8IBS6hBgATgBuKQjKQVBEIRYxJkOeBlwuFLqJuAa4M+A9yqlzqxb2p8GbgBuBh7UWv+0b9IKQsqRMyeFXtDW4tZaLwDvi7h/KXBpL4USBEEQwpEFOIIgCClDFLcgCELKEMUtCIKQMkRxC8IAkaFJoReI4hYEQUgZorgFQRBShihuQRggMo1b6AWiuAVBEFKGKG5BGCCWDE8KPUAUtyAIQsoQxS0IgpAyRHELwgCRwUmhF4jiFgRBSBmiuAVBEFKGKG5BEISUIYpbEAQhZYjiFoQBIoOTQi+Ic+ZkFtgEKOyzJk/XWm923D8F+CxQBS7RWm/qk6yCIAgC8SzuUwC01m/CVtAXNG4opfLAl4ETgWOBM5VSe/RBTkFYFcjKSaEXtFXcWusfA2fWf+4DbHXcPgR4TGu9U2tdBm4Bju65lILQY27feisPTz9E7rlfkXvqBl7Y9m1Ms7TcYq0qlkyTS7btpLIG/EPff3GG7ZXqwNJr6yoB0FpXlVLfAX4H+F3HrUlgxvF7DljvDT8+Pkwul+1IwGw2Q6Ew1lHYtCJ57j9/8dNPAHD/E0+zdeMQWw6dJGNs54ADPtvXdNevH2P9aB5Y/fV84dNb+fLWHew6PsIH99wVWJ15fqK0xOef3861C0Uue9X+rnv9ym8sxQ2gtf6gUupTwC+VUofWT3+fBSYcj00A096w8/NLHQtYKIwxPV3sOHwakTwPFjNrAFAqTfddhunpItaSrbhXez2/WLTf+x3zi818rsY871wsA7B9qeLLWzf5nZqaCL0XZ3Dy/cBLtdZ/BxQBE3uQEuBh4ECl1AZgHjgGOL8jKQVBEIRYxBmcvAw4XCl1E3AN8GfAe5VSZ2qtK8A59eu3Y88qebZv0gpCyln93l5hELS1uOsukfdF3L8SuLKXQgmCIAjhyAIcQRCElCGKWxAGifhKhB4gilsQBCFliOIWhAEiKyeFXiCKWxAEIWWI4hYEQUgZorgFYYCIo0ToBaK4BUEQUoYobkEYIGtgozxhAIjiFgRBSBmiuAVBEFKGKG5BGCDiKRF6gShuQRCElCGKWxAGiYxOCj1AFLcgCELKEMUtrHnEBu4va6V8B9mZEsUtCANkrSgxob9EnoCjlMoDlwD7AsPA57TWVzjunwOcAWyrX/qI1lr3R1RB6A/Gcguwylkr5WsMMKPtji47DXhRa/1+pdSuwL3AFY77RwAf0Frf3S8BBWE1IWOTQi9op7h/APzQ8bvquX8kcK5Sag/gJ/WT4AVBEIQ+Eqm4tdbzAEqpCWwF/hnPI98HLgJmgcuVUidrra/yxjM+Pkwul+1IwGw2Q6Ew1lHYtCJ5Xh7yQ7m+yzC5fpTC5AiwMvLcT4Z35gEYHR1q5nM15nmiaA8VZjP+vPUrv21PeVdK7Q1cDnxda/09x3UDuFBrPVP//RPgcMCnuOfnlzoWsFAYY3q62HH4NCJ5Xh4q5WrfZZiZKTFimsDKyHM/WVqqAFAqlZv5XI15nlssA1AzTV/eusnv1NRE6L12g5O7A9cCH9NaX++5PQk8oJQ6BFgATsAeyBQEQRD6SDuL+9PALsB5Sqnz6tc2Aeu01hcrpT4N3AAsAddrrX/aP1EFQRAEaO/jPhs4O+L+pcClvRZKEFYrlkwrEXqALMARBEFIGaK4BUEQUoYobkEQhJQhilsQBCFliOIWhAEiQ5NCLxDFLQiCkDJEcQvCAJHZgEIvEMUtrHlEl/aXtVK+cpCCIAiCEIoobkEYINaasT+FfiKKWxAEoQuW41MsiltY8wzyaK21ODgpR5f1HlHcgiAIXSAWtyAIgtAWUdyCIAhdsBwDzqK4BUEQUka7o8vy2MeR7QsMA5/TWl/huH8K8Fns098v0Vpv6p+ogpB+1uLg5GpnJfq4TwNe1FofDbwD+FrjRl2pfxk4ETgWOFMptUe/BBUEQRBs2inuHwDnOX5XHX8fAjymtd6ptS4DtwBH91g+IQCrWsUqlfoUt4lVNV3XarW55EduWSZGeb6HkjkFqkClP/mvVqtUq1Xf9fJilbnKEptnnkkcZ7lcxohplxnlebDM9g+ucipLMyxVFgaaZq0211G4FWdxa63ntdZzSqkJ4IfAZxy3J4EZx+85YH3vRRS8zH7yz3nxxGP7Enf5Gw9Q/sqvm78XFx/nwYeOZufOHyeKZ91tn2fjpoOh3PuXb/0Vv8/UxQf2PF6Ab37zIjZt+prr2vOPTvPjz9/LqQ/cxnt+u8Rzs0/Gjq9arbJp0z/y2tzTQPRLbizNsnHTway74wsdSL7ymJmx1cPs7EybJ/187Oq38I6fv63XIoWyc/pqHnzoaEqlR5IHXgbN3e6Ud5RSewOXA1/XWn/PcWsWmHD8ngCmg+IYHx8ml8t2JGA2m6FQGOsobFppl+ftd/4SoC/lsnXJdMW9bfuzACwu3k6h8Aex48n9xlb0hdEaTLSXM0k955/rT/7zQzlqtaovbv38FgCezL8MgHlzB4XCobHiLNV7RvtnX+RX1X2YnBxpxu3L8/Q2AEY3X0H+HX/dXWZWAKXSAuTWsbi4EJ7nEH4zNAT0p40HsWWr3aaMzJMUCkckCjuRs1feZDP+vPVLf7UbnNwduBb4mNb6es/th4EDlVIbgHngGOD8oHjm55c6FrBQGGN6uthx+DQSN8/9LJdG3MUFu+7KlVqi9DZYFllgdraEWWsfLkk9T3lk7BWVcstF4ox7cbHiem6hWI6d9uLiouv3zEyJ6az9onvznJktsStgmtaqaPO1mgk5qDjaTtL3eVDl0Kj7YoK6bTBbst+Rmmn6wnajv6amJkLvtbO4Pw3sApynlGr4ujcB67TWFyulzgGuwXa5XKK1frYjCQUhRcjEkGRIefWeSMWttT4bODvi/pXAlb0WShCE9DPIvTuWkxU3OCkIgp9uXlSxPoVeIIpbEBIiyjcudZN7lReYWNyCsAwM9MVb5UosiLVyeIQcXSYIK5i1oYaEuIjFLQiCILRFFLcgJMTq4kwXsdZXI7KtqyAMnKRqWJRvPIw1c2iZjRxdJgirBsvzaw2q/VW+l+1yZE8Ud4pJvGOf0Buk2IVlRhR3mhHFvSwkKfU1/XFdI54SmVUiJGMtK4VlpKuVk2uwytake6jPiOJOMwPVAvLyNTHloIPVS/J2Lha3kIxUmG9pkLF/eKtoLZVGx56SVLTr5UUUd5oZqOXX4Wu4Cl9CK5PktVl9+U9M4iJYzjJL3s7F4haSkQqlmAYZk5FscLJvYqx8Op3YvKYLLR6iuNNMGhp4CkRMjJlIdUf+FIJIVyGJxS0kwhroaeDpepn6iSllkYh0TYlMh6xtDwsGUEq9HvgHrfVxnuvnAGcA2+qXPqK11j2VUAgnFS9EGmQcHJFT41JRnwNAyqEtcU55/yTwfmAh4PYRwAe01nf3WjAhBom67N0i/soGFvF7OsmszdVXVp2RssHJZWjjcVwlm4H3htw7EjhXKXWLUurc3oklxGKgrpJOWfnKKKmEZr9m86zCjxyAlVgXprMcBll9bS1urfWPlFL7htz+PnARMAtcrpQ6WWt9lfeh8fFhcrlsRwJmsxkKhbGOwqaVdnneXv9//eQo2R6Xzdb6/430K9VhAIby2UT1kKnPKJicHIEY4Tqp5163i/xQ63Vwxj0yknc9Nzyaj512JlN1/R4fH2mG9eXZHK2HMVZFm8/X3/mcI5+x6rna0hWDKoctW+26HxsbSpzmOsPW2EF565f+iuXjDkIpZQAXaq1n6r9/AhwO+BT3/PxSxwIWCmNMTxc7Dp9G4uZ5ZucCGYb6IkMj/eKCXXflSjVRPWywLLLA7GwJk/bhktTzlEfGXlEpt5SsM+7FxYrruVJpKXbac3NFz+9Fpkft186b5+xskQ2AaVqros1XayZkoVqtNfMTq56ri80/B1UOjbovFuPXbYP5BVveWs30he1Gf01NTYTe61hxA5PAA0qpQ7D93ycAl3QRn5CYFHQpV2n3v1PWYmkkbwLpKqXlkDax4lZKnQqMa60vVkp9GrgBWAKu11r/tNcCChGkYeXkCnwJux1MMhMMCicbm1x5ZbUsLGsxpGNLw1iKW2v9JHBU/e/vOa5fClzaF8mE9qThPV+FyijZbncyqyQ56SoHWYAjJGKwC3A6w1iBL6FX8Sa1scwE0wEDEl9DpMN67RVydJkQj4HO4xb6zirrnXSsyFJWDmJxCwlJVwNfKXS7sX8SvSILcCBpvlZiL22lIYo7zaRhQ/9+Wk/LZJl1s1eJLHmPQ8rKQQ4LFhKRihe9nzJ2GHe35ZYgvFjcJM9WKtr18iKKO83I0WXLQjeuFinFlY4cXSb0mzRYJivQVbJyS23lStYZ9uhkOlRhuhDFnWbW/AKc5XnB++b+SMOHuCMS5mtZy6GTo8tW5u6AwgolFRvUr0iLu8uVkwnC+w4LTkGV9YrO5zWvoULqEFHcaSYVWiANMiYjFR9MYWCIj1tIxpo/uqxDi7vbvUq6WMYeGXK1fhBSla10CCuKO82koo2tRB93AmXaJUl08WpdeJJ8c8B0lUND2kGKLYo7zaRhcHIF+rg7SsqVbL/STZfCas9KHNBuRzr2VxHFnWZSZpn0nuWZDphEcfuejQq75uuzTsrKQXzcQjLS0MDTIGMMkp+bKLRmlayONrCSEMWdZtb64GSnK947WRLiUNxmF+UenfIKLOMekK4TcDpIW/YqEZKQhmlp/R1wWx4fd99CpqA+k9BxdlJWDivWVaKUer1S6hcB109RSt2plLpdKfXhnksnRDPQ/bhX4EBTpwtwvOFiZM1pcSex2JOJmC6F1Z7O8rO8s2vS4RNrq7iVUp8EvgmMeK7ngS8DJwLHAmcqpfboh5BCCGmwTPoo43K94MkcJZ6ph1Eip6E+B0KrHNLQq1wOCeOcObkZeC/+syUPAR7TWu8EUErdAhwN/KCnEgJ3L5T41vZp/nKvKTZk4Be/uI6jjnoz+VKWzz++hTM2FthrW5nc63cHoHrbFjJ7j5PZe5zqr7aS2W0UY+9xqtc9Q+6Ne2BMDDXjHnr8arKzz5DdoSm+9hxqY7uz8OUvUnrLi0xaMHzEF3nh8Ud59qH7OPzk92EtLTF/wRdYd+YfY+5S4KxbT2fn/DP83m5v4fjn/oD/m3mUDQc+wwcPOoPafdvJFTUjw/dSOvJjWEs1qjc8Q+74l2IMZwHQ9/0afcc93HHwEXwss4Hdj96Tb/7bJg6+82D22sui8PZDyOy1rinvYqXWKhiHr7X2yE4o18i+aiPbSi/wnd/8M2c9ui/53fZg+JjjXeU5veVZHrr+Ksxajde89/2MjE8AcNXt97FH9mleWXsZAGbN5K7rnueagz7EGY//gsqOe6i++jDO089Rna9w2g3/yJEnnsIN5XE2z8zx4bedQP6ZW8lvubuZ1tgvv8jsKZdiVU27/I/eE2sUfvvbc1m//m0UCm8PrfcvfOUiyrVRTs++jMvecA8f2H4Qw1bL1rj818/zntccGBq+Vlvguef+nr32+gTZ7KTjjsX+M/szulDlS8Uhjspt45/4KO954Ql+cdBhHP2b/2RubpaJiUkeu+tZ7sn/X+583euADXabuftOqtu2c+f247hw+Fvssf8Z/NVTGQpv2ZutT8zx8OZLKc08zHHrT2bbBlu+YaNGbdch/r1U5JW0ZPnnhzULC0v8SXEjuYPsa1dOH8E1l/6QI169G49umefPC0fATIXMjs08//wLFN5/POO7TvFQaYmrZ+Y4Z/ddKT/+GDdddimvfelx5Hdqho4/lvwhhwLw1PyTnHHTaZx3+F/z8hcP4xvPb2OfzbdwwMnv4KHb7uDEa2/lhZPnOSD7OnZ7ze+T2XMdDz71IBfe/Z/8yZH7MP8v17JjZA/uHS/wsd1fy4bj9+b/vLiTn173BJ954248Uv1X/tvS7/LYPQ/y0gNfx35P/ZrxN7wUyANglLeF1hGA+dwC5uOz5N68J7X7X8SslcCCV+14Fdu3b2NqajduvfeLPLTlt2zc+8957MbPcVLmNbzij/+Y6VKFC298nE+95QBG81ksy+KyH/yAB3bZwMhL9mb4N5tZLOXZ7eBXcOD+69m8VOH1tSw/f2Qbf3rMfsy+sMjjd73ALgct8T3ez7uKWd64S0u20o/+jc3rTb60C3y0OMLY+M848L/8Dfn8VPOZoYe+D+uOZXqxyF0PfZLD1Xnce9X1/PzJp/nbPz0TMiMBue6Otopba/0jpdS+AbcmgRnH7zlgfVAc4+PD5HLZjgTMZjN8e3qWG+eK/DYLlS2/5ZFHHsQwTMZ27M9lB+d44pkX2HRniY1v3w+ArbdvoXY77P43b2Trzc9TA9afejAzD+yAikXhtEOa8eevbnl4hsvbWXjlX/Liv1/G828v8zxwTGGM717wlwAcf9qHmL3yOpZ+eiV5w+TJs9/D43ObAfjZU/fx7idO5qThGn9s/DNnv+5P2Xr9M1RZx8aRv2f4LZ9k/vqnKT+4k/xu6xg/wVaO19/6cx7fdU+u2biO2a3zfKL6OKfc/wq7f7MDKv/6G3b/mzc2ZfzhPc9wWKNc1w0xWhiz8/yT+wDY9ZiX8dn7zueW527hQ1+psgjsfv8DrjL98V+fz+wLW+w41k9w3Ac/AsBT91zPU3l4Ze1lFApj7HhkB5/Z/7XMG3neMXcnu372LO649kauqS3BKFx/0ke55ZP/g7O/8K8wtoFPFMbIX/R7AFjr7I/o8NM3UCiMUbr3BWYf3EEml2XonXlmZq9jbv5m9t33d5r1XKjnBcA0TS59y38F4BPXzHHwfbtSufRvqQD8vv3MV27czIfe+urQtvP0099h5/SVjE/sxX77frx5fagCh+2ol+JTd/G5Nx/JvcYJ3DQM7An7bX+em2++nlNP/QMeu+9HXHDU+1zxZp+e4ZbNI7xQ2cELJ+zHQxWTgxZKfHzrEjd951EOft83YS8o/smd3PvWf4CNdrjKa6b4f6UFzq/nM5vNcGEtByM5PnLjC4wfYRsUZ51wDgCXA+wFZ19j15VJgSkK3P1v3+Okc8/l/Q8+Ttmy+IuDXsojX/gcj77qVcxve5K3PrLAzL9+iAPq9f6Bmz6Jicn/ufcznHX7V7ji9zbAnsfBdAkOfTVvem6Iyf0/wwv8kl2+dyS7/80b+epPnuCOg17PXv/xL+y2fhIoMlUpMvf404zmcly2Tw1et5Ev3vMNhna9lY2Pr2fRKLN4y+4cOP4SFn9aJnfSMNRgdJdfUSj898B6Btj6Jbvtbjx5f7Zea/89dNAQB84eyBVX/ICPf/wTnPf85QAs3f0t/v379wP3Uzj343z1lof4yYNbOXyfXfjgG/bFLBb5q0OPrDciYH9VT6UET5UAGP/5c1RNi//9zkP56QX3szC9xKsyT/CTvc/iZztMnnxFS77tF57P1/7uY9w1/CauW/wWJ+XuYPv2L3PIIRc2n5l+6mdw6LGY1naGatcyN38Qp+9/POz/Sv7XRV9j6rzPhrbRTol1ynsIs8CE4/cEMB304Pz8UseJFApjlMq2lTm/sMTCgh1XpVLDqi9AaXilpqeLrrDO341w1WrNdX3K8XylajI/vxgax/R0kcWFcj19MzBfDf9nkCzVxQoAi4tVqvX7zm6WAb70vXEVi+Xm3/NziywFpFNxWuUBspiOhTtLS1Xf/UaYBUf+Gm6JYqnse9YZplGepmWRdVyv1eUuV6oszlbrciw10y4UxlxymNWqK+6g7qgRkDcnpUU7zaVFdx6L1QVfPO7flq+dtMMAFor+sgmS25lnJ3NzS3WbPpqqaTI9XWy2tdnpkmMtltFMtJFOLeFCrenpYtNFYQaMo9Sq9fbVxkdQqZiQcdeTt5696XqxrPA6np4usrRkt5PFUsWWu1SKFsoh9sxMsfku1Bz5DK33+iPlitl6xunKqf+9tNhqu2ZUfG2YmpoIvdeN4n4YOFAptQGYB44Bzu8ivrbI2bge0rByMoA4p9NbtVrbZ9KxRDyBjB028OXYVnQ10BP3uRW9c02/aiax4lZKnQqMa60vVkqdA1yDPch5idb62V4L6ESap4fUFkgcwfuXua5PLsusrJkHg24GfVvw3wtNmmRVq+vv/syaWlbFrbV+Ejiq/vf3HNevBK7si2QBpOBo3IFgYpDBimW59o5eNsHeWNzL9eUyjT4l3aHiGvR7ESWlW/0Nvn6STdV0zF7pdPNwy4wMa3a+KXkkKVuAs7wm5oqbmrRMmywNhBhuoM5tJHdukr9bfbK4O63PAVdO35LrietiZVnc/SJVinvZfdwrRHE3pRioPF0oK4+csXzcsRT3yqiPXmGkxeKOaApdWdy9qM4ESsJ0Wtwd735pEvVuiMXN8rtKVozF3WgMAxycNLtpKj5FvdyDk93VY6KX0ej/kvdkBzvEJGoTw7hRJM2P43mjqQyTxhH/nXBNCEmWSnAkAZh96p2lSnEvv9ocgAQxGvuyWNxdtb8ONnXqY958rpKkEaysscmBvxdRqjGoaAYqX6Itdx1/d7HffDNsn6zrINKluOMotX6+8IPw1cRIotlQBurj7sZV4n7VrTgWtxnH4l4eOh7Iakfc9uXxVfSjGUQN8vVtcNLxuNHpProduko6dWkYbdqyuEpIl8U9ELdKagYnPaHjyB3jBezUVbJypwPGE8yb+sCnA0ZlP4lryB9zF2GTx9Flv6seSfQ8ln7VTaoUdyzvVR9bcaKTTzrfLLo9jTY2QB93Ly3uWD7uFWxx943Y1qI7533xcUcQ7SpxytaFxd1p7SZ4J5zvc8e+6DY+9X71zlKluGM1g163YefI8yAUdwL6b9U7R927GZzsJEz/ZpV0WzfJPmIDaAd9TyF+egMeCukqjp5Y3FiR7UEsbmJ2cXuuzJwjGP13lcTpaZqD8nE7s96N5eD1ccda8r7cc4jCsTJGf97ImHH6XCUrah63FfJ3nIgds0o69HEnWZTm7OB0PGtKZpW0J1aXcIVY3B0fbxUniw0l2ueVk91Ml3LOSfYP4PTGVdJ5ZXfZSPrU/Y3fvLyuksES1+JuvS8xy2vA87hdBd6xwW1Si2gP/TqrNFWKe1kGJ53KMeLN8vvkVsYMlO7i72aBQrxyCw8eZ3ByeejWbxlqAMQsp+UenDSjFuB0NTjpiGcAKxmdTayWsr1K0qW425SCZVld9xv9L1U8i9vb0DoeMEoSrM+Dk65vVtKG7QqcfDqg2ceVk956TP7Kxg8RJGFnE+3C0+9+rCNZ+MinXaJZ9X8HaXEnGJx0/epwOqBlRk75k+mAJK/Xjhq0T2+3GkKk4vbNre1gChzxmk/rRVjBPm5Xfjsoi4FuoJWMbru/obkfwLc+bgydzuPulRwdW9wdjkN1vjJYLO5QGo3I+SpbVuu3heOPMH3RqKR2Jen9YgdslO77G8jUG1qjwfmsSpcicodtKALLH60fy6FEQ9wJbRtLzNZkmWbzS5LcVeIpqyhFHhTaMzgZlHpci9urhPxKKWHeHB+xVn0TUnn+a6GfpLgfd09TNl2/rOS+lCj3RsAH23lmgzctI+AFjK28AppI0Ccl6FdLB8QxCur6xDWG46jHMJp5dQY0fW4WZ/vq18BxKhR3g7bTuLwKopNCs0xPOs5KCLcCfa4Sn8Udz4KMs7lS0jh7waBdJYOdx53QVWBk3MYQvGACAAAbqElEQVRC0tTCwnR8an1HwZwx9OzpXk0HHMg8bue73bGLO3o64Jp2lTStGmfFGq1G4voQulqVJ4Dr4RAs9zNGmHL0VEjL8vLZAAG/PWGteKJ5Hwpz3bSNJ25b6uYb6LWwXeUV3yqKChHX4vYqAZ+93ZXmM1r/Br2kAZdCDZDYLm5Hms1gjrehyw6FW6YAm9eZsL8wA8LG9XHHKQDve+e5GmdbDMOvTxrrFCIlbT7vegl91r6zvckCHPxdTN/odpSrJC6W6ak9x1c54mvuFaVziztGw2uk1u+9UxzxW0b7puIcUHR+8Hwfv5Rv62oZLeXYyWsZlrO4H5CeDYS3Eo79aJCujn46AT2wuBMtkuuyc24TPTjZL8Xd9gQcpVQG+DrwamAJ+COt9WOO+18F3oR9yjvAu7XWM76IekByq6+TROIrXKflZFiZ0HtJhLG8Jn/gM40/+qu4LIcocVKyQv3YnfgT+ukGcsuTdAZb3wYnly2+ZDEk3x2wX/PeA+SO80407Z7Ws50veY9Or19vaJyjy94DjGit36CUOgr4EvBux/0jgLdrrbf3Q0AnXmXoa0AdDIL5MC13PyRyAY5DcftEiRjkjMC2NKOt234vwAny8cd5+dxT+NwDOO74Y66czEY/M8BdNF3Y5e/rpIc97b8S1hRi9qB8rh+ru4+Jlcjibves431JbHE7DaEYGQpU3J1NB+zmIIUopb+cKyffDPwMQGt9B/Caxo26NX4gcLFS6lal1B/2Rco6PmM4+TvTPo0IS9mruJ1fbL8f1StsH769fXaVON02sSxu14+IfmjPpgN2OpjnDud9Cdq9xF0vwOkqdFh8ncsUJE+U0o1cgBMz/p4QqLg7C97pXjxt3XV9Mi7iWNyTgNP1UVNK5bTWVWAd8I/ABdj20Q1Kqbu01r92RjA+Pkwu18Z8CiGbzZDL2YU6PDrEuvIwAPl8Djx+1/WTo1hlk+2O39vqf68bG2IGyOezFApjgWnlc1nGx4aYdhT2+onh5t+TE8Owboh5YGgoy9i6fPNexuMqmZgcwTkvorB+mPmRPAvAyEiecYcMzg/Q6Ki/Spzyjo0NOf7OM1m/t9XxbD7vLmtvfjOZlqzDw3nffQuL9etHWRzJQ6VxzRZybHTI9ayzYU46ysqpeCcnhymNDTELDA3lyKxrxdFIO5vNuORYGm2lHYaBFVqXALOzdjrDI+48lheG3Q/6DAIjsp24ZHBU3rqxoYgnW0ysH2V9Lks2624zo96yDSGXtWWzjQWLdRMjLAZsNdss20y7Hpy7AFpxg5HxGySZXBYInvXjXDmZzbZ6JWH1DK22OzExyovNEPX0DX/7bco5OcLwsP2+jI4NUSiMsbRuCIrlsKw24wRbLzXfBUcZOdMLciMMDTnaRm24ZVXXIx4ZaekFyzBitaOkxFHcs8CE43emrrQBisBXtNZFAKXUf2D7wl2Ke35+qWMBC4UxqlVbCRRLZRYW7LgqlarvizszXYSK6f5dpxWuxrTj+pQjfKVaY36u5DLBZmaKrr/zC3ajKJdrkfmanllwFdrM9AKVRVsTLS5WqDpkcGq/hflFYMQdl+PZYrHcVKLF+UVMVzz2s5VKLTQ8uF0aS0sV3/1GmGKxDPU22BicLJY8L4WjCnbudOa5dWN2ZoFK0Za5XK5SXCj5ZCsUxlxyzM+VYGTcJ1eQnGGUFm1ZlxbdeZwplVzPBRlF3nbiJMzibrSxdkxPF7Fy/g9DsVj21HwwtZppx1Ev45m5UmDnqyF/rc1Ar1dxO+P2DpZbQLUaZx8ZqNVqzXdp584FjLoSCyvX2ZmS75plhdfx9M4FlpZsVVQqlpmeLlKdLdHOx2bVP3gzc4vNd6Hm0CVRbQrsd7/xTHam6OuBLC62LI5ajPjCmJqaCL0Xp39wK3ASQN3Hfb/j3kHALUqprFIqj+1WuacjKWPg63G3m1USFTg0kYjBSc8t0+GrzfhcJW3iDXk2lksizsLJLlwzjZBuV0n7Pp/LBRGxWClOLuPs8tZpL9Q748fr4m2bV5fidk4Ni5d+6GNx68zb8K3u3BGBYRsXoxbgxI4/+XTAWLNKuhycdLbXzrctbjeZYJlmlQCXA29TSt1Wl+J0pdQ5wGNa6yuUUv8C3IHdsf2u1vrBvkiK/cK5C9tDlH6I27IdqwXB7cPyK5MIH3fEs16c72G849Ea81DjT09MghWgCWJ9UJyWXcQCnHgn4Kzk6YBhN+Jd7HrlZFB8fRqc9L9SVpsTcCIiSjImFavBdTk4GTEMY9+PNxYTlWK/5ka1VdxaaxM4y3P5Ecf9LwBf6LFcgcSy20JrI8FL7lorEmFxR4yCJ5tVknBxSpw4e6HTXEUZY1aJ60MW/Lcv4rCk+3l0meej4Ou4tRl8dM5pb9PPCg4f9lyHdRbV0YwTb1R+g+o9bktutE8rhgz+eDqzuGMp26bF7QgXlJ63nQSK1OYghT4NTqZiAU7T30arYiyrNbrdLH/LaykGKPF29eqxuJ0BbGVsNdNyzeNurqCrW8NexRO1V4nj/7YNz/LkN/gRMpFulLZJ+LRBvAU4YQrJcskaZzqg13IKav8G8V5U/wwf7wPeOfhRD3svG62fPldaSP2EuvOSabdWu/GYqp7CirYJiZzI7lXqFpZ7QaS3YgyPLM0/41ivcVwfzk3fTNe7Y9+On457ybujHkPSbilhd758zcnV1mXlJKblLhT/PG7Cu1sJXCWur6R3HrfT7+tcHehdzZZgtaDz5WiniOyuamMFQYSrpAuLO2hgKpaPO0wey/TURbITcKKy0smMSL8i8ygnp6ssqCCdyiyqjYUUWbfdZ2/vzvR7tly0a1NmlOIO+OYlXYATuAw/MDFniLCYHA8FVX6HJ+AEtu8YH5vAbV2j3Lk9IhWK27UDW+Oayw/dItw7YvkfDkjJ8kXoeTMbvw13R701XtjoEbirLPhpb+re9PzYnYqAvp4nnjbZbP0Z2k22v4JxxkFbsoWakjFCe3Er7qAYjHCng+c5/2Ce+wGf2RgZn603LdeTAfYalhHszAmVOa7FHbRXieGQxKdsOyl/p1ntic2ZsC9qp9LqgdryJG949kIxvI/FKcOMv7YsK2CvEsvyX/NdafdRFIsbE/fgpH+vEneXPPlmBFaAq8ShQEzL9UV3Wdze7naSpfOOyvXtceLBFUuEjy+uxR2kbJuDkw5LLt7KyTC3gtvijnWQQi2inusYnvYQF5/F7R2fwFEuRoCsYSsnY1rc3X7fggYnI3eoa2OFBlnczXoP+OZF+W2DblkYiV0lYSsnXZ/CoHwlGtwPamP++JtXAubK0+YgBXGV1Gk7q6Rbo8+ywgcncbtKIqcDdjAFDmi/0VDkNDtnPLGSC06i8UfASxD1XkRb3An9VnE35erGmGzidZU4XuyAggxdOej9VmeMwPDhlmjwdf/VgAHDrmaVRN31f9Q6PWOmN48Ht/9EzomGp9FlTASowoA57H5xgtw1DmlkcLJRjgmqyN0XipmYV3E70rM8CihSicWfVeL8KrczGEyLaB93w1Dstpfq6byYQYM33iBhytY7aBxnQNGRt9Dd9LDa9lCC8FqgoTMYQqNu2X2usF5Zkvq4w4ov7PlGsHa9tDYfwajpgP6QbRwggXHF9HG7QoRZ3A6JgrZkiDGNtNmbaKdgffUZYnEHuJOa4izjPO5lx3L87zI427lKOkrMHd49j9vyKBSnxd3mGxgll8tfHy2/89MRbAH4okxM89V0JWDHGPVaRB6C67qVzOLul9XSjD/A4m40JSPUVRIj3qCuNcG5j9qatZ3FHeMzGPlsLTKGADdSoqcbf8ap8/aPhPXcmm0vvqek7XRAnyESOh0wXMQ1bXE3PqxOH7dlxTm6LMA6blexlhm6O6Drw+CdDthUmHUF5/vyR00HbFmz7Q7JtfMfZXHXN4aNNI0jk3DF1aB1Jl944FrIy2lgerq1cfbjdiuboE6QQczZX97y9gUK9szGc12ETwdM4uOuhVwH/8vfcKc1HvfuveifDtimkJIuwGmGIyCP/o9EYNkE4aiXoMNTwPMhNR3TAZuvZZyppu7ys/82XP/7HsBpPbt1gn9WiXNgfQ37uN2V41AmQT2U4A9yfGVlelaGeQ8L9rlObHwrJyPncXtuxR+kdiuqoDgbSq2LjodVL8ig6YCRPu5eTgd0HF0WNTjZiauknYa1nEojyEftOnMyPFozxOIOUqSWYYWOb/hLyzueEm3ZWW16L5HTAQMUT0c+7qSKO8bgpGtwsfF3rKPLGm3Z375d+Czu4Gd8HzeZDmjj/HI7ZuM1cX8HwxpLHB9C0Oh3iMVtuCejeY8ui97WNXj6mYG70oOwgHZ7lRi0UdzOsgvr9tcLu3nX8FspviBhNwMWKbTF0/iDqzJerG3PA/XNOXOmE6y4m0o2SDBXPAHhA2SMdEH5psS12osdnxV7OmBQ2lH9Cp8bCSKnAwbZIPH3KvFf8jZP96ySDu3ZgA+q1XwHPe87jjz5/6hPSTU8zzvGrNbydMDGi2bi+cp6H/T4Ut2bHsVJybKtkx5Y3P6Vf+Hmv/tOG8VtORpKiKvETi8eodMBwe591K81XCVR0oUOglnJXSVmTB93pMUdNVjqwrty0rDL2bKCF+DgVEZGS0avwgy1uP3UDCu0cNs13fbTAR1lH2Q0BiyzDVW6RpLBSb8SjCTqfQqK3znDq9lQkyzAcfbeg7oibmMlcKAx0OKOt3isG1KhuMPVXOSD8e95nwubDmhZrobhGpz07VUSYXF7bzm73m3anfvbFJ6pXkwHdLt7/N1Lv2yhJneb34GRhT/t6DxF64Mwpdvm42jYz3hdc877gSklaWOBl0Lk9bkEg4YAO2/8kU3Oa48QMR0yNNWYAcI9fyEPBRhRcT4QDbvH9V2J4yoJiszCv9dL9IeyF6RKcdu7A9avWVAz3Pd9P1x/B3VtA/AtwHFG4bC4Le/ij0ZXy2g96wpstk3b7nm2WSzhGpwM1gAWPRictHC9BA1rIzLauLNK4kwHdCx5N43gLriBFRlVoyx9R955rDLL8r4GDndDmyXvrlW9HmGSWNymEXIj4LJ3cNIuXof/wjs42c7ijrAW/K6S6L1KgtxXcT9qgT1YTzjDcOwFbgbsVZLEsncq7uZ0V8P/XCtxb2pgWb6DFJwKfzmPLlt2TMf/0QtwLHdhJ3aVgPeUd8PjKnGn3/rbtwDH22VzugsiLO5243authQ4OGk/ELnJVBua83qDBicjwsVdORlreCtqZN5q+SMjZ0xYvtc64Bc+R6r9Aa27SgIVkfN5+xUyA10lwWIFyVwLchg34vG9+x6fPdFbrTqNgeDByaiwQf7gKEJG2uMoVLP9+xrkg7aDNtpsjLbVsHvwt+9QeYjvKkFcJTZufWf/G7hXSXjbb11uOzjpbdweE945OBlgcVsuKUPi8VpmnhiisGjJF3zKdYyOaazBSbswmy9KDMMhzOI2nOWG3wJuF1dYicZ2lUS5rZoxOW4bRqstBc4qCU7J/0Eg6Gp47tu13Ua0DQPPeT9qcLLNLIckc8hd+3EHZtqRVvO99cvULrGmxe0ta2fvwDE42fpGx0iouTK2damhlF2zVhq6piWU9w/HU06xnB/KNWxxB23rCkmnA8b89kUNTpqW64tuRfm4Ay3uALlwf+3b7uTmam3BituWJzKaSFq7AzqSijEdMNT/7ZsOGMf6crtKXNG5voFRSsd0/R+efJBVGe7jdj/vHJz0fJDDFuA0jcOWXLUAi71BYDv3xOe65HWVOJVRgCKJWjUYNKvEVZo+xRpxrx2xesjuedzeoPH2rvEr7jj7cYdNGfSPQfTfVdJ25WT9JPevY58luQT8kdb6Mcf9DwMfAarA57TWV/VaSOfHNLpuPZq7E+VlWW4ry+2bcEXqvOV/t8KVqg/Xh6K90C3xIp6N+50KnFUSJEsXPu4Auy2JXCGGL/Fs96BQ7TaZcvSm2m3rGquf5E3f/X/j77il1/5+G02f6G7EFyUgdLAtasT8WDvjCVZ43t0BW2VZ/9AkGJx0T5MMG3F2Je6/YVk09jlp3XV+KNuL0wlxLO73ACNa6zcA/xv4UuOGUmoP4H8CbwLeDvydUmo4MJYuaPm4PYqzrcXdgRI3rejpgCGbTPkamsc0NVw+bo9l5rK424nnyGTEdMBuLO66UK7pgC0fd3jE4dMBPV/cOD5u0+2XDfJe2QtwIuJoyOPbN8b70+sqqf9veRuU+779d71cAn3c0YOTzpWmZoTfxyufb166N2mPVd1uaqUZ0ViCDlIwgyoj6HdY7zc0sdZD4SfgBI85NVdXd7ofd8jiGtfPgAFTo62Puz+a22j3hVJKXQD8Smv9/frvZ7XWL6n//S7gJK31WfXflwN/q7W+0xnHtm1zHamR//ejb/BsvsqPX/Imdg5PcvDM0xyx4wEWcxY5E14YP5Rf7Lofuy3Nc+rzDzJvzgIwnpkEYN6cbf69aBUZMcaoWhUWzLlmGhmzdSKzZRhgZshlS4y8/EkA5h4/lFrVfiaTzZE1IVM1sbIG5ZxBrd6QRowRJjKT1KwaO8wXGTKGWJ8p2HFUtmJmcoxk1jFsjLBkLbJo2ic/zwyZPDrxch4oHMK+xWlO2HInk3WZG8zUdjT/Ni3Il6sYgJnNYObsb+/67AYAZs1pylYFLIvR+oHs1WF3x6pWbZ3UnslkMDK5piwAe5oFStV5csYwF7/8OABOXLyOfR4vceceB3PfhgOb4T/80JVsOvQUAH7n6ZvZsLTTV49mJsdwZowRY5SyVaY68RQju26pl+8r7Ie8iq9W41sHnQzAGc/cy1BllmrJrl8rb39QKmRt+X0p2gxv2MpQYTvlmV1ZenGPVtQGzOftvI4Xqzy0997cNXZE8/6rdj7EQXNPMF7JMrR+Jxdv/B+ueI+dvYP9thaBDL+c2peHC/ty5MzzvHbHIyxWDSYPuAuA4mP7U8xOsJi3T3+/bO93AnDKM7excWkGM5PhO/u93S7H396FUdlJrVbinw98lyu9M397D2O1VjtdqM5SNap8a/93AHDClnvYa+45iiMGw1aODUvDlCtzzXovU3XkfQPff/mbXPH/7rYb2Wuj/cpmnjqamdoO/mO3w3lqYg+O3PYQ+yw+0Xx2F3Mdz43uztW7HQLAaY9dhWVYlPIZMKBQW8ckowD8bLcNPDJ2IEdYv+LwJ3baSjXgA9dou0VznrHMOABbrG2Uc3bNri9nmnkwLIPRsh1BdShX3yrAImMYZOqG/bcPeAdRnPz0zeyyNEc2Y4Bpy7p1rzJXrzsRgNM3X20/aEGuXOWmAw5CDyleZ97GUcZtVOYnWXxhb1seTPTES7h1t8MYM4v8kfF1ytO78o1dTgfg1P+8iXP/4PRIecKYmpoI1fpxFPc3gR9pra+u/34aeLnWuqqUOg14pdb6U/V738U+MPg6ZxylUtnK5bKJBT/qxqt5NvvSxOEEQRBWAnvUtnDnsW/rKGw+nw1V3HF2B5wFJhy/M1rrasi9CWDaG8H8/FKMZPz8/ZLJ5uevp1o1maiWmcsNAXY3qtGJz1lVqkYOez1ha2ZHu79bOEbjG92wTAZqhr1/ROOakWl1gTKZZlc+S8ZedWVZmNTquwQadT+qVY8304w7SIZM1iBjVqlgYFgZcrksVapY1eAuo4GBmctiVKuOqw5/ImCQsaf1hQ2wNc6QdHTrMoaBYVjUzIbcQNYiX8tSydDMc75uOZWzeYxaDQPIYlFtTuxtlGXDE+fPu1UbxsiWm7LlchmqVXfXNJM1yOZzVMoVDDPTmqZoNUrBvdFXIFYWnHN/62Qby8YzWaq1GjnsMh/GomZkqboWXmUZyZTJVE0yZJnP5+wsGXY4w6xSNoxm3gzDwCCLSQ0si0wm0+yXZ6wa1XrZ53IZzGqNjAUVo1HXdlvJZ7PUalUMDKr1DrdlAJkMhuOQibxlUmnUZTaLZVYxjKzPjZZr5skih4VpZLAsi6wBtUwG08qAWXG1t9FMlpJZg2yWjGFh1CyqlomBQd4ysTJZavW6sDAha4Fp2LVcNwiHLYMlw6TRnoLq2a7SRtto+MGyZLEc7iTDngFlWWQNu0xaq4Td209kDIOcaWFiYhpZshl7INjEIGuZVI2MO4xhgWWQszJUvXPa6zOM7HxY9fZUdT9DhiGrZteDlcUyqmAY5DMmh/2X1zM9XfTlNw5TUxOh9+Io7luBU4B/U0odBdzvuPcr4PNKqRFgGDgEeKAjKQM47HXHc1zhnR1nPK0UCmOS5zWA5Hn106/8xlHclwNvU0rdhv3ZPF0pdQ7wmNb6CqXUV4GbsU2Fv9BaL/ZcSkEQBKFJW8WttTaBszyXH3Hc3wRs6rFcgiAIQgipWIAjCIIgtBDFLQiCkDJEcQuCIKQMUdyCIAgpQxS3IAhCymi7clIQBEFYWYjFLQiCkDJEcQuCIKQMUdyCIAgpI86S94HT7vCGtKOUygOXAPti7/HyOeAh4NvYu+w8AHxUa20qpf4SeCf2QRV/prX+1XLI3AuUUrsBdwNvw87Pt1nF+QVQSp0LvAsYwm7TN7KK811v29/Bbts14MOs4rpWSr0e+Aet9XFKqQOImc+wZ+Omu1It7tDDG1YJpwEvaq2PBt4BfA24APhM/ZoBvFspdQRwLPB64PeBi5ZJ3q6pv9D/BJTql1Z1fgGUUscBb8Q+aORYYG9Wf75PAnJa6zcCfw18nlWaZ6XUJ4FvAiP1S0ny6Xs2SdorVXG/GfgZgNb6DuA1yytOz/kBcJ7jdxU4EtsaA7gaeCt2OVyrtba01k8DOaXU1EAl7R3nA98Anqv/Xu35BftUqPuxN2q7EriK1Z/vR7HlzwCTQIXVm+fNwHsdv5PkM+jZ2KxUxT0JzDh+15RSK9Kt0wla63mt9ZxSagL4IfAZwNBaN+ZmzgHr8ZdD43qqUEp9CNimtb7GcXnV5tfBRmyj479jb9T2L9j72a/mfM9ju0kewd587qus0rrWWv8I+8PUIEk+g56NzUpV3FGHN6wKlFJ7AzcAl2qtv4f7IMbGgRSxDqpIAX+IvTXwL4DDgO8Cuznur7b8NngRuEZrXdZaa2AR9wu6GvP959h5Pgh7jOo72P79Bqsxzw2SvMNBz8ZmpSruW7F9ZQQc3pB6lFK7A9cCn9JaX1K/fG/dJwq23/tm7HJ4u1Iqo5R6GfYHbPvABe4SrfUxWutjtdbHAfcBHwCuXq35dXAL8F+VUoZSai9gHXD9Ks/3TloW5g4gzypu2x6S5DPo2disVPeD7/CGZZan13wa2AU4TynV8HWfDXxVKTUEPAz8UGtdU0rdDNyO/ZH96LJI2x8+DmxazfnVWl+llDoG+6SoRn6eYHXn+8vAJfX8DGG39btY3XlukKRN+55NkpAseRcEQUgZK9VVIgiCIIQgilsQBCFliOIWBEFIGaK4BUEQUoYobkEQhJQhilsQBCFliOIWBEFIGaK4BUEQUsb/B857JJVCNdUaAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(tf[:10, :].toarray().T);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We split the whole documents into training and test sets. The number of tokens in the training set is 480K. Sparsity of the term-frequency document matrix is 0.025%, which implies almost all components in the term-frequency matrix is zero. " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of docs for training = 10000\n", "Number of docs for test = 1314\n", "Number of tokens in training set = 480263\n", "Sparsity = 0.0253936\n" ] } ], "source": [ "n_samples_tr = 10000\n", "n_samples_te = tf.shape[0] - n_samples_tr\n", "docs_tr = tf[:n_samples_tr, :]\n", "docs_te = tf[n_samples_tr:, :]\n", "print('Number of docs for training = {}'.format(docs_tr.shape[0]))\n", "print('Number of docs for test = {}'.format(docs_te.shape[0]))\n", "\n", "n_tokens = np.sum(docs_tr[docs_tr.nonzero()])\n", "print('Number of tokens in training set = {}'.format(n_tokens))\n", "print('Sparsity = {}'.format(\n", " len(docs_tr.nonzero()[0]) / float(docs_tr.shape[0] * docs_tr.shape[1])))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Log-likelihood of documents for LDA\n", "For a document $d$ consisting of tokens $w$, the log-likelihood of the LDA model with $K$ topics is given as\n", "\\begin{eqnarray}\n", " \\log p\\left(d|\\theta_{d},\\beta\\right) & = & \\sum_{w\\in d}\\log\\left[\\sum_{k=1}^{K}\\exp\\left(\\log\\theta_{d,k} + \\log \\beta_{k,w}\\right)\\right]+const, \n", "\\end{eqnarray}\n", "where $\\theta_{d}$ is the topic distribution for document $d$ and $\\beta$ is the word distribution for the $K$ topics. We define a function that returns a tensor of the log-likelihood of documents given $\\theta_{d}$ and $\\beta$. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "def logp_lda_doc(beta, theta):\n", " \"\"\"Returns the log-likelihood function for given documents. \n", " \n", " K : number of topics in the model\n", " V : number of words (size of vocabulary)\n", " D : number of documents (in a mini-batch)\n", " \n", " Parameters\n", " ----------\n", " beta : tensor (K x V)\n", " Word distributions. \n", " theta : tensor (D x K)\n", " Topic distributions for documents. \n", " \"\"\"\n", " def ll_docs_f(docs):\n", " dixs, vixs = docs.nonzero()\n", " vfreqs = docs[dixs, vixs]\n", " ll_docs = vfreqs * pmmath.logsumexp(\n", " tt.log(theta[dixs]) + tt.log(beta.T[vixs]), axis=1).ravel()\n", " \n", " # Per-word log-likelihood times num of tokens in the whole dataset\n", " return tt.sum(ll_docs) / (tt.sum(vfreqs)+1e-9) * n_tokens\n", " \n", " return ll_docs_f" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the inner function, the log-likelihood is scaled for mini-batches by the number of tokens in the dataset. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## LDA model\n", "With the log-likelihood function, we can construct the probabilistic model for LDA. `doc_t` works as a placeholder to which documents in a mini-batch are set. \n", "\n", "For ADVI, each of random variables $\\theta$ and $\\beta$, drawn from Dirichlet distributions, is transformed into unconstrained real coordinate space. To do this, by default, PyMC3 uses a centered stick-breaking transformation. Since these random variables are on a simplex, the dimension of the unconstrained coordinate space is the original dimension minus 1. For example, the dimension of $\\theta_{d}$ is the number of topics (`n_topics`) in the LDA model, thus the transformed space has dimension `(n_topics - 1)`. It shuold be noted that, in this example, we use `t_stick_breaking`, which is a numerically stable version of `stick_breaking` used by default. This is required to work ADVI for the LDA model. \n", "\n", "The variational posterior on these transformed parameters is represented by a spherical Gaussian distributions (meanfield approximation). Thus, the number of variational parameters of $\\theta_{d}$, the latent variable for each document, is `2 * (n_topics - 1)` for means and standard deviations. \n", "\n", "In the last line of the below cell, `DensityDist` class is used to define the log-likelihood function of the model. The second argument is a Python function which takes observations (a document matrix in this example) and returns the log-likelihood value. This function is given as a return value of `logp_lda_doc(beta, theta)`, which has been defined above. " ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "n_topics = 10\n", "# we have sparse dataset. It's better to have dence batch so that all words accure there\n", "minibatch_size = 128\n", "\n", "# defining minibatch\n", "doc_t_minibatch = pm.Minibatch(docs_tr.toarray(), minibatch_size)\n", "doc_t = shared(docs_tr.toarray()[:minibatch_size])\n", "with pm.Model() as model:\n", " theta = Dirichlet('theta', a=pm.floatX((1.0 / n_topics) * np.ones((minibatch_size, n_topics))), \n", " shape=(minibatch_size, n_topics), transform=t_stick_breaking(1e-9),\n", " # do not forget scaling\n", " total_size=n_samples_tr)\n", " beta = Dirichlet('beta', a=pm.floatX((1.0 / n_topics) * np.ones((n_topics, n_words))), \n", " shape=(n_topics, n_words), transform=t_stick_breaking(1e-9))\n", " # Note, that we defined likelihood with scaling, so here we need no additional `total_size` kwarg\n", " doc = pm.DensityDist('doc', logp_lda_doc(beta, theta), observed=doc_t)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Encoder\n", "Given a document, the encoder calculates variational parameters of the (transformed) latent variables, more specifically, parameters of Gaussian distributions in the unconstrained real coordinate space. The `encode()` method is required to output variational means and stds as a tuple, as shown in the following code. As explained above, the number of variational parameters is `2 * (n_topics) - 1`. Specifically, the shape of `zs_mean` (or `zs_std`) in the method is `(minibatch_size, n_topics - 1)`. It should be noted that `zs_std` is defined as $\\rho = log(exp(std) - 1)$ in `ADVI` and bounded to be positive. The inverse parametrization is $std = log(1+exp(\\rho))$ and considered to be numericaly stable.\n", "\n", "To enhance generalization ability to unseen words, a bernoulli corruption process is applied to the inputted documents. Unfortunately, I have never see any significant improvement with this. " ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "class LDAEncoder:\n", " \"\"\"Encode (term-frequency) document vectors to variational means and (log-transformed) stds. \n", " \"\"\"\n", " def __init__(self, n_words, n_hidden, n_topics, p_corruption=0, random_seed=1):\n", " rng = np.random.RandomState(random_seed)\n", " self.n_words = n_words\n", " self.n_hidden = n_hidden\n", " self.n_topics = n_topics\n", " self.w0 = shared(0.01 * rng.randn(n_words, n_hidden).ravel(), name='w0')\n", " self.b0 = shared(0.01 * rng.randn(n_hidden), name='b0')\n", " self.w1 = shared(0.01 * rng.randn(n_hidden, 2 * (n_topics - 1)).ravel(), name='w1')\n", " self.b1 = shared(0.01 * rng.randn(2 * (n_topics - 1)), name='b1')\n", " self.rng = MRG_RandomStreams(seed=random_seed)\n", " self.p_corruption = p_corruption\n", " \n", " def encode(self, xs):\n", " if 0 < self.p_corruption:\n", " dixs, vixs = xs.nonzero()\n", " mask = tt.set_subtensor(\n", " tt.zeros_like(xs)[dixs, vixs], \n", " self.rng.binomial(size=dixs.shape, n=1, p=1-self.p_corruption)\n", " )\n", " xs_ = xs * mask\n", " else:\n", " xs_ = xs\n", "\n", " w0 = self.w0.reshape((self.n_words, self.n_hidden))\n", " w1 = self.w1.reshape((self.n_hidden, 2 * (self.n_topics - 1)))\n", " hs = tt.tanh(xs_.dot(w0) + self.b0)\n", " zs = hs.dot(w1) + self.b1\n", " zs_mean = zs[:, :(self.n_topics - 1)]\n", " zs_rho = zs[:, (self.n_topics - 1):]\n", " return {'mu': zs_mean, 'rho':zs_rho}\n", " \n", " def get_params(self):\n", " return [self.w0, self.b0, self.w1, self.b1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To feed the output of the encoder to the variational parameters of $\\theta$, we set an OrderedDict of tuples as below. " ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "OrderedDict([(theta,\n", " {'mu': Subtensor{::, :int64:}.0,\n", " 'rho': Subtensor{::, int64::}.0})])" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "encoder = LDAEncoder(n_words=n_words, n_hidden=100, n_topics=n_topics, p_corruption=0.0)\n", "local_RVs = OrderedDict([(theta, encoder.encode(doc_t))])\n", "local_RVs" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`theta` is the random variable defined in the model creation and is a key of an entry of the `OrderedDict`. The value `(encoder.encode(doc_t), n_samples_tr / minibatch_size)` is a tuple of a theano expression and a scalar. The theano expression `encoder.encode(doc_t)` is the output of the encoder given inputs (documents). The scalar `n_samples_tr / minibatch_size` specifies the scaling factor for mini-batches. \n", "\n", "ADVI optimizes the parameters of the encoder. They are passed to the function for ADVI. " ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[w0, b0, w1, b1]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "encoder_params = encoder.get_params()\n", "encoder_params" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## AEVB with ADVI" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Average Loss = 2.9855e+06: 100%|██████████| 10000/10000 [06:20<00:00, 25.59it/s]\n", "Finished [100%]: Average Loss = 2.9886e+06\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "η = .1\n", "s = shared(η)\n", "def reduce_rate(a, h, i):\n", " s.set_value(η/((i/minibatch_size)+1)**.7)\n", "\n", "with model:\n", " approx = pm.MeanField(local_rv=local_RVs)\n", " approx.scale_cost_to_minibatch = False\n", " inference = pm.KLqp(approx)\n", "inference.fit(10000, callbacks=[reduce_rate], obj_optimizer=pm.sgd(learning_rate=s), \n", " more_obj_params=encoder_params, total_grad_norm_constraint=200, \n", " more_replacements={doc_t: doc_t_minibatch})" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Approximation{MeanFieldGroup[None, 9] & MeanFieldGroup[9990]}\n" ] } ], "source": [ "print(approx)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAD0CAYAAABw3+qlAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvDW2N/gAAIABJREFUeJzt3XeYVNX5wPHvzM72Nruw9KW5cECqSxOkWRARFbuYKIo1hiRqSEww8IMY0yMmlqjBEGMMxogSS0SwgwQBsYHiARRFRGApy+6yvfz+mJnd2al3+szO+3mefZ6dO+feOWfKfe+p19TS0oIQQgjhzBzrDAghhIg/EhyEEEK4keAghBDCjQQHIYQQbiQ4CCGEcCPBQQghhBtLrDMQDmVllSGNx83JSaeqqi5c2Yl7yVZekDInCylzYIqKck3enpOaA2CxpMQ6C1GVbOUFKXOykDKHjwQHIYQQbiQ4CCGEcCPBQQghhBsJDkIIIdwYGq2klOoCbAWmAVnAC8Au+9MPaa2fUkotBmYCjcBtWuvNSqkS4DGgBdgOzNNaN4eaNgzlFkII4YPf4KCUSgUeAWrsm0qBpVrre5zSlAJTgHFAMfAMMAZYCizUWr+plHoYmKWU+jIMaYUQQkSQkZrDH4CHgQX2x6MApZSaha32cBswEVirtW4B9iqlLEqpInvat+z7rQbOBnSoabXWZaEVWwghhC8+g4NS6lqgTGu9RinlCA6bgUe11luVUj8DFgPlwBGnXSuBfMBkP7E7b8sLQ9p2wSEnJz2osb5llXVc+shGHr9uLH0KswLeP1GlpJixWpOnvCBlThZS5vDxV3O4DmhRSp0FjAQeBy7QWh+wP78KuB94Dsh12i8XW8Bo9rCtIgxp2wl2duDOA5XsP17L52VV5CdR17zVmkV5eXWssxFVUubkIGUOTFFRrtfnfJ4StdaTtdZTtNZTgQ+AOcBzSqmx9iRnYuuo3gBMV0qZlVK9AbPW+jDwvlJqqj3tDGB9mNKGldwLTwgh2gtmbaVbgAeUUvXAAeAmrXWFUmo9sBFbwJlnTzsfWKaUSgN2ACu11k1hSBsWVXWNAKzYtJfSC04O56GFECKhmTrCPaSDXXjvhe0HuGvNTgC2zJ8c1jzFM6l6Jwcpc3IIsVlJFt7zxOT1bRFCiOSW1MFBCCGEZ0kdHExI1UEIITxJ7uAgsUEIITxK6uAghBDCMwkOQggh3EhwEEII4Sapg4P0OQghhGfJHRxktJIQQniU5MFBCCGEJ0kdHJzVNjTx4dfHY50NIYSIC0kdHJz7HH7z2m5u+NeH7Cuv8b6DEEIkiaQODs52HaoC4ERdU4xzIoQQsSfBwUWL3N1BCCGSOziYnNqVTDKuVQghWiV3cIh1BoQQIk4ldXAQQgjhWVIHB2lJEkIIz5I7OMQ6A0IIEacsRhIppboAW4FpQCPwGNACbAfmaa2blVKLgZn252/TWm9WSpVEIm1YSu6FjFUSQggDNQelVCrwCOCYHbYUWKi1noTt4nuWUqoUmAKMA2YDD0Y4bXg4j1YykHzTl8f46phMkhNCdHxGmpX+ADwM7Lc/HgW8Zf9/NXAWMBFYq7Vu0VrvBSxKqaIIpg2L5mb3ekKLj6rD91Zu4+LlW8L18kIIEbd8Nisppa4FyrTWa5RSC+ybTVprxym0EsgH8oAjTrs6tkcqbZlzPnNy0rFYUnyX1ANLelvxG+xRITc3A6s1y+d+/p6Pdykp5oQvQ6CkzMlByhw+/vocrgNalFJnASOBx4EuTs/nAuVAhf1/1+3NEUrbTlVVnZ9ieFZdXd/6/2dlJ+zHqqW8PNXnfuXl1UG9XrywWrMSvgyBkjInBylzYIqKcr0+57NZSWs9WWs9RWs9FfgAmAOsVkpNtSeZAawHNgDTlVJmpVRvwKy1Pgy8H6G0EeOrWUkIIZKFodFKLuYDy5RSacAOYKXWukkptR7YiC3gzItw2rCQeQ5CCOGZqaUDXCqXlVUGVYiXdxxi0Uufttv22LdPYUg3z1WtMfesA2DL/MnBvFzckKp3cpAyJ4cQm5W8XiIn9SS4kqLsWGdBCCHiUlIHh9z0YFrVhBCi40vq4CCEEMKzpA4O0h8thBCeJXVwSPyueCGEiIykDg4edYDRW0IIEaqkDg6ehvFeu+IDNn1xLAa5EUKI+JHUwcGbP637HIB95TXc++ZnNEttQgiRZJI6OFgzfa+h9NMXdrBi69fstq+7JIQQySKpg0NGqueVXB0VBakxCCGSVVIHB38kNgghkpUEBw9aXAa5ygJ9QohkI8HBA0eNwTVItE/TwvSHNvLctm+ilCshhIgeCQ4euIaEL4+63ze6uQWOVjfwq1d2RSdTQggRRRIcPNh7tJqP9le01iAWvLiDJpf7TUt3hBCiI5Pg4EFTC1z/5AfttnkLBtIdIYToiCQ4+OAcEDrCTZGEEMIoCQ4GucUGCRZCiA5MgoMvLR7/FUKIDs/vrdCUUinAMkABTcBcIB94AXAM1XlIa/2UUmoxMBNoBG7TWm9WSpUAj2E7v24H5mmtm0NNG5bSB8C1WUmChRCiIzNSczgfQGt9GvB/wFKgFFiqtZ5q/3tKKVUKTAHGAbOBB+37LwUWaq0nYeu/nRWmtBF3or6x9X+vrUoyQ04I0QH5rTlorf+jlHrR/rAPcBAYBSil1CxstYfbgInAWq11C7BXKWVRShXZ075l3381cDagQ02rtS4LufR+HKqqb/1fuhiEEMnEb3AA0Fo3KqX+DlwEXAr0BB7VWm9VSv0MWAyUA0ecdqvE1vxksp/YnbflhSFta3DIyUnHYvG8iF64fHWiofV/qzWLuoYmwFa9sVqzIvra4ZaSYk64PIdKypwcpMzhYyg4AGitr1FK/QTYBEzQWn9tf2oVcD/wHJDrtEsutoDR7GFbRRjStqqqqjNajKBdtbytm6O8vJpae3BwPA7Eq7qMsX2s5GX4XjI8UqzWrIDzHKjjNQ2c9eeN3HvRECb27xTR1zIiGmWON1Lm5BBKmYuKcr0+57fPQSl1tVJqgf1hNbYT+LNKqbH2bWcCW4ENwHSllFkp1Rswa60PA+8rpaba084A1ocpbViM7W0N16EM2X+8lgUv7uBnL34a1deNtl32e2D8Y8u+GOdECBEMIx3SzwKnKKXWAWuw9S/cAvxRKfUmcBpwt9Z6K7aT+UbgGWCeff/5wM+VUhuBNGBlmNKGxSm98kPav7G5hQ+/Pm44fX2TrXK0v6I2pNcN1Za9x5h6/waq6hr9Jw6Co58+El01x6rreX77gQgcWQjhYKRD+gRwuYenJnhIuwRY4rJtJ7bRRmFNGy5zx/Xmkf99GfB+e462VeNu+NeHbJk/2dB+ZvtZ03Vo7PPbDtAlN41T+xYGnJdg/OV/X3KivoldZSdCDpDRdsfzn/DB1xWMLrbSIz8j1tkRokMy3OfQUaWYAx+K+uPnPubN3Uf8J/TA8XIu6/jxi7U7AQwHmURUVddIbWMznbPTQjrOkRO2UWQNTc1+UopYOv8vm+jXKYv7LhkW66yIIMgMaWBIj7yA0gcbGCCyzS3B8HXPinC7ZPkWZjz8TsjHMcnckoRwoLKOjV8ci3U2RJAkOACPXTM6aq/13le2/on9x2vZ8PnRqL2uq2BOrw1NzW5Ll/vl1Hx2tLrBR8LAxUuAFaIjkuAAWLNCa+aA9rOpm5pbeObD/TR6aPa4a83O1v9vW7Xd8PG/OlZDZa3nzuO6xmbK7SfeQ5V1NAZwAg9kct+EP77N91Z+ZCitXNwLkdgkOITJVf94j+3fVPD6zjJWbN3Hb17dzfg/vs22/RU8uH5PyEt+X7x8C1f9Y6vH52b//V2mPbSRsqo6Zv5lE3966/OQXsuXd78yPjILInx1L1WHdsqrG9hzJLnG+IvISfoOaQezyb2TOBD7ymuZu+IDt+3X2W8a1D0/g2kDiwwf77WdZXTPy+Dkbm2TVPZXeJ7st6/cNiz2uW224Z1vf36E+aefZPi1IsGEY1RWJI6dGGoamrjr5Z3MP70/nXPSI/56sx/fypET9R1iUMNH+yv49GAVl5/SI9ZZSVpSc7Db9MPI/qB+/couznjwf37TOWZe//SFHVzzz/fdnj9UWceGPUd56ZODbs85zsOGglyE230cRz98op6yqjpWbA3/ZLh4rzis/fQQr+4s46ENX0Tl9RyjuDqC65/8gN+/vjvW2UhqEhxibMw96xhzz7rWx3et2Yk+WNX62HmZDoCZf9nEbc9uZ/FqDcDXx2tan/uLfb7G/uO1LPzvDhqamrn8b++2O364vbnrMGPuWdeuKevWZ7e1juj6+ngt5z6yiXvfbHv+X+997XacQHiKa89t+4afv6xDOq4vNQ1NvLu3bdWWlpYWnv5gv9vnI0RHIc1KceYVXcYrTgvO/ua13fz3Y/dagsOi/3pehmPNp2XsK69tnaz38YFKhnTLbe2sdpxfnZt9dh8+QUVtA4O75pKZalvI8Muj1XTKTiMnve2rsqusigFFOewrr+HHz38CwBPv7uPWKf0B+N+eY/xvj/chjPe88RmzS3t6fd7h4wOVDO6a0zpx0OFEve2E7DwM9+61tluLLD5HtW47Vl1PbkYqliDmsjj89IVPKKuqp3N2Gq/vOswLN47ljd1HSDGZ+P3ru9l7rMZvE15VXRO1DU1kpEZ2cchwOl7TQG6Gxe29F4Gprm8iKy1xPndnEhzinK/AsOCFHWz7ptLr8x8faHvuWqcmqn9eP5Zq+wnW8dvftr+itX/kzIGd+dV5g/nnu/u4b90eVJccnri6tHX/bz3+HmcNLOLVne1XTR9zzzq+Ncr/SR/grd1HGNvH2hqEGptb+PUrO5l8UmemlHTi9Ac2UFXXxJwxveiZn8EFw7q3nuTL7Eupt7TA3zbtdeuDWL3jIL95ZTfVDU1cOKwbPzlrgD19Cz94djuzS3tSmJVKdX0To4p9r6312s72y3h9cbSapW981vr47c+PUNfYxJ3TBno9xuu7DrOzrIpTeubz308OemzC3Hushie37uNHZ5SQYjaxZschHnx7D6uuH0uK2cS+8hoWvfQpf7p4qKEFG+sbm4NeGuXIiXrOefgdbhzfm5sm9AVsi0UO75FHl9zI9J00NrfQ2NQc8QC6ePWnWDNTuX2q94D+8YFKCjJTQ5p9X1XXyHv7jjP/Px/zy5mDGNw1l+KCzKCPFwsSHBKY68nZqB/++0MOVto6t7/zb9vQ1MzUthbG13Ye5qZ/fciH+ysA0IeqmPPEe4Zee8VWY01GP3ruY07qnEVZVT3DuufRIz+D57cf5PntB7lxfG+q6mzB63H7wn3vfFnOt0f1ZKlT89RvX93F+19XuB37t6/aAgPAf7Yd4D/bDrDrF+fw2ZFq3vniGO84Tcxa/4PT+Pf7+zla3cB5Q7ty5d+3cve5gyjISvXYme7an7OvvJZ95Qe4enSxzx+/LZ1t4MCYe9bx72tH069T2zLLd764A32oiguHdUd1zeEXa3dS19jMsep6ZjyyqTXdG7sOc1q/QjJSU1prc4er6lj1Ufu1pub88z0+O1ztt3P6rd1HGNo9lyMn6qmub2Jkr3wO2/su3tx9hJsm9KWxuYUFL+4A4JXvjsea2Rac3ttXTlF2esgnvnlPf8R7+4675ddxwXH71JNobG7h/nWfc+3YYgo8DD+vbWiivtH3rPmXPjkEwMY9x/j33Lb5TVV1jWSnpWAymVovpILt2H9j12HueP4Thna3DSb5mb12v2X+ZJpbWnhVl3GWKgqpVtbc0sKf3vqcy0b2iNgS5RIckpAjMDiraWj/o3IEBocdTv0g4fLZYVuT14Y97ScDLtu41y3tG7sO88au9lfxngLDRX/d3Nrs5GzAopc95uF3r+3mBXvt7J/2TvOFL3lfMddbv8aOg5Ucq2lg61flzB3Xm+3fVLQ2dXly+WPvMmdMMY9v+YoF0wagD9ne308OVvLMR/ups5/kPncZmmoxm5nxyCYKs1JZc8t4ahua+O7Kbe2GsB6uqmt9b5/96BvOVkXc8fwnmIAJ/Qr59uhefPxNBSaTiR8993G74/s7IV7+t3d58LJhpFtS6JGfwc1PfeS2X3lNA5u/bAvANzz5AWcM7My9b37emu/q+iaWb9rLmQM7M7hrLu/tsw2RPl7TwMoP97d7zRVbv+b2qScx/t71rY8XnFVCn8IsctIsPPrOl4zvV8ivX2l7vzfcOpHDJ2zNgWkW24WP8+z8PUerW5v6HDWl707sy9xxvVvT7CuvoZe1LejpQ1Vc9Y/3+OuVIxneI4+lb3zGyF75nDGgc7v8OvqmPjnQvlb/g2e2MbhrDss3fUVVXSMXj2gbibV6x0EKs9IY0SOPbyrqsJhNrQG3qbmF5pYWUlPMtLS0YDKZ2HmoihVbv+aDryt4bt5pPj+zYJlCHX8fD8rKKkMqhGM99Eh23AoRK+kWc2uwAXjllvFMe2ijoX033j6Jf767jwfW7/GbNis1hZqGpoBHkd1xZgm/e833yKQ+BZl8eazGZxpnvz5vMAte3MEpPfN48LLh/PWdvfz1HfeLDoCZJ3fhv/YahbMBRdnUNTZzw/jeDOuex5LVuvWiadHZA1vXQ3PYMn8yr+gy7rTXsnyZ2L+QC4d1Y1DXXNItZqb92f3z2Hj7JCxmE7ev2s7bnx9l0fSB/GLNTjJTzSyZMYif2Pv79M+nU1Fh/L1xVlSU67X6IsEBCQ5CiMR17fg+zJvQJ6h9fQUHGcoqhBAJ7LGNgd9ywAgJDi7yM6QbRgghJDg4efa6MaycOyZix3eMXhBCiHgnwcFJcUEm1iz/Y8iD5W9MvRBCxAsJDlEUzF3nhBAiFvw2sCulUoBlgAKagLnYVl94DNvaZ9uBeVrrZqXUYmAm0AjcprXerJQqiUTaMJVfCCGEB0ZqDucDaK1PA/4PWGr/W6i1noQtUMxSSpUCU4BxwGzgQfv+kUorhBAiQvwGB631f4Cb7A/7AAeBUcBb9m2rgbOAicBarXWL1novYFFKFUUwbeLpAHNKhBDJwdC4Ta11o1Lq78BFwKXAeVprx5muEsgH8oAjTrs5tpsilLZ1cZ+cnHQsluAX7EpJMUdsfRJn6QYWTBNCiEBF4vxleFC/1voapdRPgE2A8ypbuUA5UGH/33V7c4TStqqq8nyHNKMcM6SN6F2Qyd4ApvE7q6ttCGo/IYTwxej5y1VRkffh9X6blZRSVyulFtgfVmM7gb+rlJpq3zYDWA9sAKYrpcxKqd6AWWt9GHg/QmljoiAz+Kt/aVQSQiQKIzWHZ4G/KaXWAanAbcAOYJlSKs3+/0qtdZNSaj2wEVvQmWfff36E0gohhIgQWXgP92YlXwvwjeiR57actVFzxxXzt01fBbWvEEJ4E+y9J2ThvTjRAeJwwhnbW2alCxEMCQ6iQ5N4LERwJDhEkZyohBCJQoJDgEK47asQQiQMCQ5CCCHcSHCIIumQFkIYZYnxKs4SHAIkJ3ghRDTEuglbgoMQQgg3EhwMunh49zAcRaodQojEIMHBoBmDu8Q6C0KIBDaxf2FA6WPdhC3BIYouGdEj1lkQQiSIWLczSHCIoh75GbHOghBCGCLBQQgh4lCs59tKcDAo1lU8IYSIJgkOAXIee/y7C06OXUaEECKCJDiE4PQBnWOdBSGEiAgJDkIIIdxIcBBCiDgky2cIIYSIOxZfTyqlUoHlQF8gHbgb2Ae8AOyyJ3tIa/2UUmoxMBNoBG7TWm9WSpUAj2Eb7LMdmKe1bg41bbgKb1SXnLTW/2M9a1EIIaLBX83hKuCI1noSMAN4ACgFlmqtp9r/nlJKlQJTgHHAbOBB+/5LgYX2/U3ArDCljar/3nxqzMccCyGSS6wvRH3WHICngZVOjxuBUYBSSs3CVnu4DZgIrNVatwB7lVIWpVSRPe1b9n1XA2cDOtS0Wuuy0IothBDxLdaNFD6Dg9a6CkAplYstSCzE1rz0qNZ6q1LqZ8BioBw44rRrJZAPmOwndudteWFI2y445OSkY7GkGCmvRykpZqzWLK/PW61Z5ByvA8BiMbfbHohA04vQOX9eQsRSoOcoswmaDKaNxLnFX80BpVQxsAr4s9Z6hVLKqrUutz+9CrgfeA7IddotF1vAaPawrSIMadupqqrzVwyfrNYsysurvT5fXl5NVWUtAI2Nze22ByLQ9CJ0DY3N/hMJEQWNjUZP9TaBNCsFe24pKsr1+pzPyyqlVFdgLfATrfVy++Y1Sqmx9v/PBLYCG4DpSimzUqo3YNZaHwbeV0pNtaedAawPU1ohhBAR5K/mcCdQACxSSi2yb/sh8EelVD1wALhJa12hlFoPbMQWcObZ084Hliml0oAdwEqtdVMY0kZdrNv/RHBkIIEQwfHX53ArcKuHpyZ4SLsEWOKybSe20UZhTRtLsZ6YIoRIDrE+10hvnRBCCDcSHIQQQriR4CCEEMKNBAchhBBuJDgIIYRwI8FBdGihDkHulJ3mP5EQERDrtZUkOAQo1h+YEEJEg9/lM5LZdaf25nhNAyCTqYQQyUWCgw/XjCkmKy34Bf1E4pOLApGspFnJh1jPUBQiEd06pX+ssyDCQIKDQdLVkJzkcw9M74JMrhrdK9bZ6BBifXEqwSFAsf7AYqHYmhHrLAQtCT8uIcJCgoMQQgg3EhyEX9K0IkTykeAghBBxKNZzqiQ4CCHCpiXWZzQRNhIchF+J3Kkb6qkqN13muYjYiPXgF5kE50MinxTDyRTrb2kMnNavkPF9CyjMTuPOF3fEOjsiCcW6EiY1ByE8MJngitKepFvkJxKI4oLMWGdBhInPmoNSKhVYDvQF0oG7gU+Ax7DV2LcD87TWzUqpxcBMoBG4TWu9WSlVEom04St+4GIdzYWIZ9bM1FhnQYSJv8uiq4AjWutJwAzgAWApsNC+zQTMUkqVAlOAccBs4EH7/pFKG3XJ17AiQD73QMn71XH4Cw5PA4ucHjcCo4C37I9XA2cBE4G1WusWrfVewKKUKopgWiEMG9ItN9ZZSB5J2D8VKbF+K302K2mtqwCUUrnASmAh8AettaNxpRLIB/KAI067OrabIpS2zDmfOTnpWCzBjypJSTFjtWa5bc/PzyLTviprdnktABanNmhP+/gSaPp4kWJO3B98qsXM3NP6Me/J9wPaLyczFas1i+ycqgjlrGNKS01J2O95pIVyjvInEu+539FKSqliYBXwZ631CqXU75yezgXKgQr7/67bmyOUtp2qqjp/xfDJas2ivLzabfvx49XUpaa0e42mprZsetrHl0DTx4um5sTtaGlsbKamOvDvR/+CTMrLq0P+biWb+vrGhP2eR1pjY1PEjh3se15U5L1W7bNZSSnVFVgL/ERrvdy++X2l1FT7/zOA9cAGYLpSyqyU6g2YtdaHI5hWCCFEBPmrOdwJFACLlFKOvodbgfuUUmnADmCl1rpJKbUe2Igt4Myzp50PLItAWiGEEBHkr8/hVmzBwNUUD2mXAEtctu2MRFohjLI1iCVun4kQsSIzfMLMYu+8vetc5fH5H59REs3sCCFEUCQ4hOiGU3u3e/zSzeNYdf0YZgzu6jF997z0aGRLiJhI3KEL8SfWE24lOITo5tP6tntckJVGL2vklhCYclKniB1buIv1D1SIWJHgkGB+P+vkqL9m4rfYB36GT/wyCxEaCQ5RFuqsx2RcITV0gb9nUmEQyU6CQ4BCbWYwJeA1aaKfKEOJpxKLAyNvV/jE+rsnwcGgePjS52fI7TeEENEhwcGgeLh6fnXeBABG9MijU3ZajHMjhOjIJDgEKOSqXhiqII9eOZKXv3Nq6+O544pDP6gP8VBrihUZrSTiXaTuoSHBoQMw+uVYdsUIn89/e1SvDlcjCTSwnTO4S0TyIUSkvHzrpIgcV4JDB2Fkct3IXvk+n0+zmCjwEGgi0TE2tSQ68zUCvfDvkmN7H5O5tiQSS6Qu6CQ4eBDJ+wZH6qRz6YgeYTlOtEZIRLO5Rk70QgROgkOQhnXPi3UWhBAiYmRsZBC2zJ8c6yxEVSLOzRAi0RmpXTsPTAk3qTlEWULWOCQ2BMV1UcZkIIO7oiuSA0gkOERZboYlIiuzBvqj/OHpJxlOm1yxwfWdDO50N3dcMSN7+h4A4MtlI8PThyQSl8yQFlHVu8C2YmyLhzqrt+9irL+kkfT7C6K/kKERM2RIrYgxCQ5h9LdvjTSULpYTq1LM3s/0HaVJ4MyBnQ2nVV1zXLa4vj+xiYyJGpBjne3rXZbQF8GT4OBDoCfLoV76E07xM7/A1RWnJE6Twn2XDGXdD04LeL9InvwuHNYtcgePklifZBNNaoq8Y+FmaLSSUmoc8Fut9VSlVCnwArDL/vRDWuunlFKLgZlAI3Cb1nqzUqoEeAzbeXY7ME9r3Rxq2rCU3AdPX7MBRdmkW8zccGqfgI/3yOXDA0p/+9STeOr9/QG/TiyM71sY6yx0TIladTBgbG8rm/eWxzobUZGbbqGyrjGofUs6Z7PjYFWYc2Sc35qDUuoO4FEgw76pFFiqtZ5q/3vKHjCmAOOA2cCD9rRLgYVa60nYzrmzwpQ2ojzVGHLSLbx960TG9S0I+Hgmkymg+zCkmE2RG4XgpzrkaQJgIg9l9TTjO1p8NeH5E8KucS/emi+funZUUPsZGdIeSoxfePbA4HcOAyPNSp8BFzs9HgXMVEqtU0r9VSmVC0wE1mqtW7TWewGLUqrInvYt+36rgbPClDYq4uH3+fdvn+L1uQFF2a3/d8sNzwioX5832G1btC9ifzC5X9iOdee0ASEeIfhTWWlxPjeN78Ok/oHXruLhuycCk5WaEtbjZYb5eIHy26yktX5GKdXXadNm4FGt9Val1M+AxUA5cMQpTSWQD5i01i0u2/LCkLbMOY85OelYLMG/kSkpZqzWrNbHjpNhfn4WmWnGj+t8DF/bzV4uC68Y3YvM1BSs1izM9kyc1COfgqxUjlU3uB3ntJLO7Co7QWZmGpef2ocvKup49O09PvNntrfNZma610wy0lMZ1Nv9ROYtv57KZlSqjy/+haOLuW+d93L4k52T0fp/r6557Cqv9Zo2L6/9/b7T020/iczMNKzWLLKzgwu6GempFBZk8+NzB/PAG7tZ//nRgPbPzc3wnyioMfJ3AAAT/ElEQVQOpaVZ/H4nUkP4rfoTzJ0Sc3ODu+e7azlHFOez0eVzds6Pr++8J3l5/r8DVmuW2/krXIKZIb1Ka+1oMFwF3A88B+Q6pcnFFjCaPWyrCEPadqqq6oIoRhurNYvy8urWx47RROXHq6kL4AN1PgbAG9+b4HH7wKJs9h93P2H9aEr/1vTN9kxUVtTwr2tGcfREg9tx6uxtmTU19Rw/XsNIt5E37vlrbrIdt7a2we352jr31wBoaLR9NKN7W3nXpa3YU3ojGhqavD5XUVET1DEdqqra3tvy8mpOnPD+/aioqOFv3xrJ3BUfAFBXZ3tfamsb/O7ri/N7OXtEd/70+u6A9q+qDO07HSv19Y1+vxMNjd4/+1B5GqLtT2VlcN8313I6fife8uPrO+9JRYX3ixrnPLievwJRVJTr9blgRiutUUqNtf9/JrAV2ABMV0qZlVK9AbPW+jDwvlJqqj3tDGB9mNJGVKDXHg9cOozHPAxjzUm3kJPuHn/vOncQf73S2LBXgMKsNEqcmpCizRGo+hWG7+rEHEcdrt5GmYWLJZgOhBi9PWcNjHyrbbz1OURSsN/zLjmxXzo/mJrDLcADSql64ABwk9a6Qim1HtiILeDMs6edDyxTSqUBO4CVWuumMKSNK+P6BNZJnZmawvAeeaguOVTXN/KVj2aPeNBs/zWHs5N05pCuvL4r4nE+YcVP6AxOj7x09lckZu0nnLrmplNe415L92fSSZ1iHkQNBQet9RfAqfb/3wMmeEizBFjism0nttFGYU3bUTxxdSkAY+5Z5/bcT88s4d63Po/YXZ4C4agahzL6xlVako1L75SdxpET9YbTh6NmteS8k1ny4ichHycY/547hol/ejtqrxfKxFK5259nMgkuTk0d0JnnbhiLJcX7RzShn63GMsK+hk9pcfBr+TjLz2h/zdDH3pw0sMh3n0a4hHvo7KAu7fN9XQAL4gXTwelJdgADG2wvHPprFsbgrn6O86yve6LMn2p8Xa9o+csVI/jjRUPDftxEvgSS4ODBPRcOYfJJnciI4E1/wuHUvoVsvH0SQ7rZOpVSnQLJCzeO9bhPSxCV1Yn9C3lyzijOPTk66/0Ecz72NZ+hc077EUdnlHhfXqOn1TZyxbE44oR+wU3yi4NbjQfFx7WIIUby3Sk7OrXhogDa7U/plc9pQQw5dhbKnJrxQcyfirT4PvvFyNg+Bdxz4ZCwXTX6k2ExM2dMcVD7euvs7JaXwZvfd2v982mAl5pBZmoKJUXZUXs/gqnmO++Savb+tT7Lz7pLFw3rxkOXDWeafTqNxWziliDW63Etgq93zlOevL3V/QqzmDOmV8D5MWp+AKv1euSScW81iLG9raG9ju+XBeA7p/XljjNLwvo6voQ+p6ZNPNQ4JDjEgfW3TuT7YZr49fS1o3nmujEAZKe5dyl5a7J5+trRrSdEZ7dO6c/0Qf5rDJmp8fNVGuWheW2KfRTOzCFdfe5rMpkY3dsacCA8qXPwI7mKC9zH2ZtdPifHYoI9rRl8f3L/oF/Ln4Kstqvt527wXPv0xTnXvzh3EE/O8Tz7eFaU17/y97kH69HZI1ov0DyNTDTKU39erGfJx88vWoRF305ZrctyB7qfJ1eN7uW1I/o8px/cW98PfPG9cPnxGe2vDk0mEw9fPpz/3DAmanl45PIR5KR771fw1QY/qpeHq2iXt9zbZ3r3uYMM5S8YPfIDn4jnnO1zBnehuCCTR64YzhNXlYYvY0GIVBPxiJ75vHTzOB6dPcLj80bvR/8zD7WOnkG8/+EkwUEEbWj3tgk0gV5p++p0DuRQK+aUctnI7m7bRxVb6ZnvP0ga/fH6k5+ZyqCu3icU+RpY4Gm9Lue34OHLh3u9g+D0OLvvg6dRVqW9rB6WRnf31LWjfAZYIy4tbWtyC+eF9y98BOGCrLTWQSGu7p45iIFOc5S+N8m9hWB4jzy3fjEI32CIYElw6OCevS56V8++jA6gjTktgF7RAUU5hn5EN060/SiHdmt/ks010BTgrVYVCNe+oX5+jtnL2hbYYnkFeee0AVwzNoD+MAPnswwvqw7075RNjoem0EAUu0zU9NZ/9d+bxvHgpcMotma0e6+9OX2A8XuEOOuSm87NTn1WAb2XMSbBoYPz1J7tylvTxB9mDQn59QcWZTPz5C7cf8mwkI/larCBq1GHcf0K2TJ/Mtas9iNKUg0EojOCPDE4u3tm23tc0jmbFV7a4h3COackFBcN7x7QidFIrmOxoJwJ6Ow0tLdLbjpj+xTw7PVjw1Z79Pa64RDwUOgwkOCQZBxDWXPsX7bzh3T12jQxomfoy0o8eNlwlswY5HMJCdd5CEY9cGlbwAnk7m+x0D0vgxduHMu5J3fhgUuHBbykRqLM0zLaEuKt8/Y3F5wc8Gd5cjfPzXmueXFMOg2G0U9r4dkDPC6l44vrvCLwXqZokuCQpIZ2z+OXMwfx4wgP9YvU9a/qkkNeRlst4EdnRG/IYrC65WXw8xmDInevjjhgdGb3+L4FLJo+kBtcJiQO6ZbLb84/2W/fQ0nntnb8ZVd47gx2Fcr7bjQ4zxrWnSEBrNX14zNK+L9zlNv281xGV8ViLTIJDkns7EFdQqriu35d771oCL85fzCLprfdpCRaX+qgFrcLwfc9dCx62hYKx0Q8CH+QnT6o/bDlmyf0Cct7aPQIJpOJC4Z2a9ce78zoMNqRPfNIi/PJqr5cfkoPj0vkOPrRHr58ONMHFbWrxUdrZGDivqsi7kzs34kzBxZxwdBurW2kzrFhy/zJrRO+LCkmTjHQbHXT+LbbsgYztNKTXgWBH8dxY6UHLx3GI1cMZ3A396awk7vlBjVhzpv7Lh7GnDG96BqmGzk5u3vmYB66rO32tTeM78PG2ycFdayHnW6D62twwKvfHc8r3x1v6JjOtUJv1v/gtHZlcJWoayZtuHVi6/+jiq3cPXNw68rIAFlR6n8IbWiA6JC652VwvLaKFA8/9C3zJ/OrV3ay6qMDPo/h7Ye5YNoAhvXIo7RXPvdfOpzq+kZ+8Mz2cGTbME+TA/1ZfuVIahqa2k0Si7S+nbIiOuEtXEYVGxuJlu9leYnvTuxLnYd7ITg8OWcUJUXZbgtUOo96ykpN4XiT7f4mpb3yeW/fccwmU1zcitsxQdSxvEb/Tll8fsT9/gtPXF1KZW2jx5pQU3P0I50EB+HmT5cM5YN9x8n10FFmlKPj2/XHmZeRyrdG2caip1tMpFvan2xf/e54GmPwQ/AnIzXF6xDMcDhbFTGqOJ9fvxrYTYH8eeHGsZy/bHNIx+gVQI0tmFFWc8f5XgjR071MXL9Xj145kg2fHyXdYqabvTmuhRbOH9KVnYeq+E6QNbq0FBP1TaF9H0cXW1kwbQDn2FcaeOSKEUz780a3dMrHwIwQsxAUaVYSbgqz0jgjxJu+tLTeA8L/ycLitHy369XlhcOdlllI1HYCA3553mAuHtEj7MftZuBWk56WG3GWn5nKlvmTDb1etMbxu34V+hZm8e3RtosO529cRmoKPzt7oNdaiy/zTz8p5FFiJpMJk8nExcO7tzYHBbMMf3MMLpgkOIiIcHyVjVxH/uq8wR63F2alUuRh5igEdzvIZNEtAn0URsViPL6rUL4Zjk751+dNYHZpz9btsW6dao7B912alZLMnDHF3LVmJ11yg287d7S7+1poLJCTd3eXq1vHDGnn4YqA30H0a285lVi3SA2yT8wbEuZx6kaLNaRbLt3y0jkQwj2oJ/UvZGfZiaD3jwYjfQnB3BfkH1eXsvnLYyE1qUZCk/cumYiJr3dARNz5Q7tx9cT+Qd+QHOCGU3vTIy/d4yquDq01hyB6BHMzLDx8+XCfbbCeRLOz2JsJ/Qp58aZxQY8wevGmcVTWNQa0z/cm9eOB9Xv8putdkMneYzV+0y2NwE1vEkVJ52z3i5I4cPqATmz7piKqr2koOCilxgG/1VpPVUqVAI9h+/1vB+ZprZuVUouBmUAjcJvWenOk0oap7CJIqSlmZg1zX+zOWVufQ3Cv4TwC5rkbxnKwso4lL+vgDhZloQw97ZqbHvD+14wtZlRxPnNXfOAz3cKzB3LTUx8yMgwz31WXHPShqtbHGRYztT5GHAXjvkuGog9W+U8YZU9cVUrnAG4kFA5Xje7FfevaLgD6FWax52jwF3hG+A0OSqk7gKsBRz1zKbBQa/2mUuphYJZS6kts938eBxQDzwBjIphWxLkBRdnsOFgVlpUle+RnuM1xSKQFzKLB0ck5pFsuR6rb36v6spE9KOmc1TqKZ2L/TiG/Xt/CzHbBIRLG9y1kfF/Pd2czOnw2EoysMOvPQ5cNN7Too4Pr7+ivV47kcAD3JA+Gkdx9BlwM/MP+eBTwlv3/1cDZgAbWaq1bgL1KKYtSqihSabXWZcEXWUTD/ZcM47MjJyIyc3nV9WMMraQZadeNK2b5pq+C3n9Y91y2fVMZ1L5zxvRqN8SylzWTJ64upX+nLBa99Gm7tM53Q3vllvHkZ4bemjzj5K6s+TQ2P8NV148xNAor3H593mCOGDwhn9Irn/f3Hff6fCCrFHuSm2GJeL+I36NrrZ9RSvV12mSyn6wBKoF8IA844pTGsT1SaSU4BOC6ccVYo9wen5+ZSqmnm9h0ILdM7BdScPjzZcOpbQiuKcbT5DgjfTTOq9L2LbQF2POHBH5Xtp4xODk7RPPCwDYU2zZr5ywffWyu7r1oCAcqgh8UEA+CCT3O3+ZcoByosP/vuj1SadvJyUnHYgl+CF1KihmrNfQ1++PVgvPaL70dj+XNSDWT6idfZnstJC8vM+D8eyuz67Zg35dIv5/Z2bYmnNTUFL+vlWqfrGc2m3ymtVqz2PWLcwLOy7ofTaG7042UrNas1maP/PxMskK8J4M3WxacQWqKmWwfzTEpKWbSUm3PZ2WleS2/0c/r6ZtOZfXHB+jaObCmJCvQM8z3YXrjh5NJs5ix5rYPzJH6PQfzKb6vlJqqtX4TmAG8AewGfqeU+gPQCzBrrQ8rpSKS1jVDVVWhRWirNSuk0TuJJh7L+/q8CZjAZ76a7eP5KipqKA9who63MrtuC/Z9ifT7eeKE7Tve0NDk97VuPrU35VX1TCrpFNZ8/fD0kxjUJYfMlpZ2xy0vr24dunz8eA31EZxJ3gCU13hv2rFas6irt432qq6u91p+o+9L90wL143uFRe/lxwT0NTslpdQfs9FRT7uXhjE8eYDy5RSacAOYKXWukkptR7YiG1i3bwIpxUdjJGb7jhEYr2cif0Lwz7aJpwcC//NMHBb0F7WTO6/dBhZaRbqq8PXaXml06SwRBAP6yolMkPBQWv9BXCq/f+d2EYQuaZZAixx2RaRtEKE271xPra/e16G4SUsYqFfpyzb6LRYZ0SEjUyCE0npgqFd0YfiexZwIrnvkmHog1URXZxQRJcEB5GUFk13v/uWCJ41M5VxfQtinQ0RRhIchAjB1JJOAS93IUQikOAgRAh+P2uI/0QdlOsSGomgT0EmXxpYX0pIcBBCBGn5lSNpaI7fEV6eLJs9wtDig0Lu5yASyHn2mbz5Bu4vLCIvzWIO6parkTahn209Jk+rqxZkpTGip++bGwmb+PtkhfDihvG9uWZsscd77ArhcM7gLkzsX+jzfiPCP3n3RMIwmUykWWQkvfBPAkPo5BJMCCGEGwkOQggh3EhwEEII4UaCgxBCCDcSHIQQQriR4CCEEMKNBAchhBBuTI47OAkhhBAOUnMQQgjhRoKDEEIINxIchBBCuEnaBUiUUmbgz8AIoA64QWu9O7a5Co1SKhVYDvQF0oG7gU+Ax4AWYDswT2vdrJRaDMwEGoHbtNablVIlntJGuRgBU0p1AbYC07CV5zE6cHkBlFILgAuANGzf47fowOW2f7f/ju273QTcSAf+rJVS44Dfaq2nest7IOX0lNZfHpK55nAhkKG1Hg/8FLgnxvkJh6uAI1rrScAM4AFgKbDQvs0EzFJKlQJTgHHAbOBB+/5uaaOc/4DZTxqPAI5F+jt0eQGUUlOBCcBp2MpVTMcv97mARWs9AbgL+CUdtMxKqTuAR4EM+6aQyukjrU/JHBwmAi8DaK3fAUbHNjth8TSwyOlxIzAK21UlwGrgLGxlX6u1btFa7wUsSqkiL2nj3R+Ah4H99scdvbwA04FtwCrgBeBFOn65d2LLvxnIAxrouGX+DLjY6XGo5fSW1qdkDg55wHGnx01KqYRuZtNaV2mtK5VSucBKYCFg0lo7xitXAvm4l92x3VPauKWUuhYo01qvcdrcYcvrpDO2i5nLgO8A/wTMHbzcVdialD4FlgH30UE/a631M9iCn0Oo5fSW1qdkDg4VQK7TY7PWOuHvFK+UKgbeAP6htV4BOLer5gLluJfdsd1T2nh2HTBNKfUmMBJ4HOji9HxHK6/DEWCN1rpea62BWtr/2DtiuW/HVuaB2PoJ/46tv8WhI5bZIdTfsLe0PiVzcNiArR0TpdSp2KrpCU0p1RVYC/xEa73cvvl9exs12Poh1mMr+3SllFkp1RtbYDzsJW3c0lpP1lpP0VpPBT4A5gCrO2p5nbwNnKOUMimlegDZwGsdvNzHaLv6PQqk0oG/2y5CLae3tD4ldDNKiFZhu+r8H7aOm7kxzk843AkUAIuUUo6+h1uB+5RSacAOYKXWukkptR7YiO0CYZ497XxgmXPaqOY+PNzK0NHKq7V+USk1GdhMW3n20LHLfS+w3F6eNGzf9Xfp2GV2COk77SOtT7J8hhBCCDfJ3KwkhBDCCwkOQggh3EhwEEII4UaCgxBCCDcSHIQQQriR4CCEEMKNBAchhBBuJDgIIYRw8//laz1whmxI2wAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plt.plot(approx.hist[10:]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Extraction of characteristic words of topics based on posterior samples\n", "By using estimated variational parameters, we can draw samples from the variational posterior. To do this, we use function `sample_vp()`. Here we use this function to obtain posterior mean of the word-topic distribution $\\beta$ and show top-10 words frequently appeared in the 10 topics. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To apply the above function for the LDA model, we redefine the probabilistic model because the number of documents to be tested changes. Since variational parameters have already been obtained, we can reuse them for sampling from the approximate posterior distribution. " ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Topic #0: people think god don just said know say time like\n", "Topic #1: file use windows drive program using scsi does like software\n", "Topic #2: ax max g9v b8f 75u a86 34u bhj pl 1d9\n", "Topic #3: space key use information chip new encryption data public government\n", "Topic #4: just like don good year time car think team better\n", "Topic #5: know don like just does thanks think good ve need\n", "Topic #6: 00 10 25 11 15 20 17 12 16 14\n", "Topic #7: know like thanks new mail good just does don price\n", "Topic #8: edu com mail cs like send just know don list\n", "Topic #9: know like just don does thanks good edu mail new\n" ] } ], "source": [ "def print_top_words(beta, feature_names, n_top_words=10):\n", " for i in range(len(beta)):\n", " print((\"Topic #%d: \" % i) + \" \".join([feature_names[j]\n", " for j in beta[i].argsort()[:-n_top_words - 1:-1]]))\n", "\n", "\n", "doc_t.set_value(docs_tr.toarray())\n", "samples = pm.sample_approx(approx, draws=100)\n", "beta_pymc3 = samples['beta'].mean(axis=0)\n", "\n", "print_top_words(beta_pymc3, feature_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We compare these topics to those obtained by a standard LDA implementation on scikit-learn, which is based on an online stochastic variational inference (Hoffman et al., 2013). We can see that estimated words in the topics are qualitatively similar. " ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/anaconda3/lib/python3.6/site-packages/sklearn/decomposition/online_lda.py:314: DeprecationWarning: n_topics has been renamed to n_components in version 0.19 and will be removed in 0.21\n", " DeprecationWarning)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 49.7 s, sys: 538 ms, total: 50.3 s\n", "Wall time: 37.9 s\n", "Topic #0: people gun armenian war armenians turkish states said state 000\n", "Topic #1: government people law mr president use don think right public\n", "Topic #2: space science nasa program data research center output earth launch\n", "Topic #3: key car chip used keys bit bike clipper use number\n", "Topic #4: edu file com mail available ftp image files information list\n", "Topic #5: god people does jesus think believe don say just know\n", "Topic #6: windows drive use thanks does card know problem like db\n", "Topic #7: ax max g9v pl b8f a86 cx 34u 145 1t\n", "Topic #8: just don like know think good time ve people year\n", "Topic #9: 00 10 25 15 20 12 11 16 14 17\n" ] } ], "source": [ "from sklearn.decomposition import LatentDirichletAllocation\n", "\n", "lda = LatentDirichletAllocation(n_topics=n_topics, max_iter=5,\n", " learning_method='online', learning_offset=50.,\n", " random_state=0)\n", "%time lda.fit(docs_tr)\n", "beta_sklearn = lda.components_ / lda.components_.sum(axis=1)[:, np.newaxis]\n", "\n", "print_top_words(beta_sklearn, feature_names)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Predictive distribution\n", "In some papers (e.g., Hoffman et al. 2013), the predictive distribution of held-out words was proposed as a quantitative measure for goodness of the model fitness. The log-likelihood function for tokens of the held-out word can be calculated with posterior means of $\\theta$ and $\\beta$. The validity of this is explained in (Hoffman et al. 2013). " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def calc_pp(ws, thetas, beta, wix):\n", " \"\"\"\n", " Parameters\n", " ----------\n", " ws: ndarray (N,)\n", " Number of times the held-out word appeared in N documents. \n", " thetas: ndarray, shape=(N, K)\n", " Topic distributions for N documents. \n", " beta: ndarray, shape=(K, V)\n", " Word distributions for K topics. \n", " wix: int\n", " Index of the held-out word\n", " \n", " Return\n", " ------\n", " Log probability of held-out words.\n", " \"\"\"\n", " return ws * np.log(thetas.dot(beta[:, wix]))\n", "\n", "def eval_lda(transform, beta, docs_te, wixs):\n", " \"\"\"Evaluate LDA model by log predictive probability. \n", " \n", " Parameters\n", " ----------\n", " transform: Python function\n", " Transform document vectors to posterior mean of topic proportions. \n", " wixs: iterable of int\n", " Word indices to be held-out. \n", " \"\"\"\n", " lpss = []\n", " docs_ = deepcopy(docs_te)\n", " thetass = []\n", " wss = []\n", " total_words = 0\n", " for wix in wixs:\n", " ws = docs_te[:, wix].ravel()\n", " if 0 < ws.sum():\n", " # Hold-out\n", " docs_[:, wix] = 0\n", " \n", " # Topic distributions\n", " thetas = transform(docs_)\n", " \n", " # Predictive log probability\n", " lpss.append(calc_pp(ws, thetas, beta, wix))\n", " \n", " docs_[:, wix] = ws\n", " thetass.append(thetas)\n", " wss.append(ws)\n", " total_words += ws.sum()\n", " else:\n", " thetass.append(None)\n", " wss.append(None)\n", " \n", " # Log-probability\n", " lp = np.sum(np.hstack(lpss)) / total_words\n", " \n", " return {\n", " 'lp': lp, \n", " 'thetass': thetass, \n", " 'beta': beta, \n", " 'wss': wss\n", " }" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`transform()` function is defined with `sample_vp()` function. This function is an argument to the function for calculating log predictive probabilities. " ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "inp = tt.matrix(dtype='int64')\n", "sample_vi_theta = theano.function(\n", " [inp],\n", " approx.sample_node(approx.model.theta, 100, more_replacements={doc_t: inp}).mean(0)\n", ")\n", "def transform_pymc3(docs):\n", " return sample_vi_theta(docs)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 50.6 s, sys: 740 ms, total: 51.3 s\n", "Wall time: 40.8 s\n", "Predictive log prob (pm3) = -6.184892268516339\n" ] } ], "source": [ "%time result_pymc3 = eval_lda(transform_pymc3, beta_pymc3, docs_te.toarray(), np.arange(100))\n", "print('Predictive log prob (pm3) = {}'.format(result_pymc3['lp']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We compare the result with the scikit-learn LDA implemented. The log predictive probability is comparable with AEVB-ADVI, and it shows good set of words in the estimated topics." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 1min 40s, sys: 1.26 s, total: 1min 41s\n", "Wall time: 1min 17s\n", "Predictive log prob (sklearn) = -6.014771065227896\n" ] } ], "source": [ "def transform_sklearn(docs):\n", " thetas = lda.transform(docs)\n", " return thetas / thetas.sum(axis=1)[:, np.newaxis]\n", "\n", "%time result_sklearn = eval_lda(transform_sklearn, beta_sklearn, docs_te.toarray(), np.arange(100))\n", "print('Predictive log prob (sklearn) = {}'.format(result_sklearn['lp']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "We have seen that PyMC3 allows us to estimate random variables of LDA, a probabilistic model with latent variables, based on automatic variational inference. Variational parameters of the local latent variables in the probabilistic model are encoded from observations. The parameters of the encoding model, MLP in this example, are optimized with variational parameters of the global latent variables. Once the probabilistic and the encoding models are defined, parameter optimization is done just by invoking an inference (`ADVI()`) without need to derive complex update equations. \n", "\n", "This notebook shows that even mean field approximation can perform as well as sklearn implementation, which is based on the conjugate priors and thus not relying on the mean field approximation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "* Kingma, D. P., & Welling, M. (2014). Auto-Encoding Variational Bayes. stat, 1050, 1.\n", "* Kucukelbir, A., Ranganath, R., Gelman, A., & Blei, D. (2015). Automatic variational inference in Stan. In Advances in neural information processing systems (pp. 568-576).\n", "* Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of machine Learning research, 3(Jan), 993-1022.\n", "* Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. W. (2013). Stochastic variational inference. Journal of Machine Learning Research, 14(1), 1303-1347.\n", "* Rezende, D. J., & Mohamed, S. (2015). Variational inference with normalizing flows. arXiv preprint arXiv:1505.05770.\n", "* Salimans, T., Kingma, D. P., & Welling, M. (2015). Markov chain Monte Carlo and variational inference: Bridging the gap. In International Conference on Machine Learning (pp. 1218-1226)." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "pymc3 3.8\n", "arviz 0.8.3\n", "numpy 1.17.5\n", "last updated: Thu Jun 11 2020 \n", "\n", "CPython 3.8.2\n", "IPython 7.11.0\n", "watermark 2.0.2\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -n -u -v -iv -w" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2" }, "latex_envs": { "bibliofile": "biblio.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": true, "eqNumInitial": 0 }, "nav_menu": {}, "toc": { "navigate_menu": true, "number_sections": true, "sideBar": true, "threshold": 6, "toc_cell": false, "toc_section_display": "block", "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }