{ "cells": [ { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "## A Hacker's Guide to Having Your NYTimes Article Comments Noticed\n", "\n", "

\n", "Gavril Bilev\n", "allattention at gmail.com\n", "June 14, 2018\n", "

\n", "\n", "### The problem: predicting the number of upvotes ('recommendations') that a comment posted to an article on the [New York Times](https://www.nytimes.com) will receive. \n", "\n", "![Imgur](https://i.imgur.com/Vm9tAsj.png?1)\n", "\n", "\n", "### **TL;DR**\n", "If you want upvotes:\n", "* **Time is of the essence: the sooner you comment, the better!**\n", "* **Commenting on comments doesn't bring in the upvotes**\n", "* **There is an optimal article length for getting upvoted comments - at about 800 words.**\n", "* **Print page - most of the important big-headline articles appear in the first 30 pages of the paper, where they attract more readers and more comments.**\n", "* **Getting the coveted 'NYTimes Pick' helps a lot. So does being a trusted user or a NYTimes reporter.**\n", "* **Hot button issues bring about reactions, but not always upvotes** \n", "* **Effort pays off:**\n", " * say more\n", " * use a rich vocabulary \n", " * refer to people, places and organizations, but sparingly \n", " * spell-check\n", "* **Don't be too negative or too positive, be slightly positive.**\n", "\n", "\n", "\n", "\n", "\n", "Readers of the [Gray Lady](https://www.nytimes.com) are able to post comments on articles and react to the comments of others by either upvoting ('Recommend' button) or replying. For a comment-author, recommendations are desirable because they bring about more visibility - recommended comments float up to the top where they are seen by more readers and can receive even more upvotes. Once a comment 'snowballs' it can be seen by potentially millions of readers. Presumably we write comments because we want others to see them.\n", "\n", "I got curious about what makes some comments rise to the top while most others are completely ignored. [Aashita Kesarwani](https://www.kaggle.com/aashita) posted a cool dataset on [Kaggle](https://www.kaggle.com) of more than 2 million comments geared toward addressing this and similar questions. 
Be sure to check out the dataset [here](https://www.kaggle.com/aashita/nyt-comments) as well as her wonderful exploratory data analysis of it [here](https://www.kaggle.com/aashita/exploratory-data-analysis-of-comments-on-nyt/data). The data come from two time periods: Jan-May 2017 and Jan-April 2018 and contain features on both the comments (with the full raw text body of each comment) and the more than 9 thousand NYTimes articles the comments were responding to.\n", "\n", "I like this as a prediction task because it is challenging. Presumably what makes people upvote comments or posts, not just at the Times but also in other social media settings (Reddit, Twitter, FB, etc) is the _meaning_ of the comment - they find it helpful, or funny or simply agree with it and want others to see it too. Since we can't easily quantify 'meaning,' we'll have to rely on some feature engineering in order to get any traction with our predictions. The task is also challenging because we don't know much about the readers who are responding to comments and do not have the full text of the articles. Additionally, some comments are pretty short, so they are not easily amenable to a more complicated language model. After extracting what we can from the features already present in the dataset, we will have to wrangle useful information out of noisy and messy raw text - the body of the comments.\n", "\n", "The main point to consider here is that we need to set our expectations low, because this is no handwritten digit recognition exercise with 99%+ accuracy. Even a human scorer would probably not do well in terms of predicting upvotes. Consider these two comments, both made in response to the same article:\n", "\n", "> A. '_Everyone should have walked out. Spicer could have talked to himself._'\n", "\n", "> B. '_If people are \"alarmed\" and \"appalled\" that Trump did this, followed by his cronies justification of it, then they haven't been paying attention. 
I'm just surprised he took this long to do it. The story here, the one that people should actually be alarmed and appalled by is that the rest of the press stayed. Brietbart, One America, not so surprising but there is absolutely no excuse for rest to have shown so little respect for the one of the most important and defining features of America... The First Amendment, a free press. To echo Mr Kahn's question to President Trump...have you even read the constitution? Here, let me give you a head start... Amendment I: Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances._'\n", "\n", "Without any contextual knowledge, I would have guessed that the two comments would be roughly on par or the first one would receive fewer upvotes. They are generally similar in terms of the reaction to the article, though the second comment is longer and contains more information. The reality is quite different: comment A received over 10,000 upvotes and is the most upvoted comment in the entire dataset, while comment B got ... zero upvotes. \n", "\n", "If you are interested in the takeaways (and one very plausible explanation about the popularity of these two comments, it is not simply brevity!), skip right to the last section. In the sections that follow we will go through the traditional steps of classification, with an emphasis on feature engineering out of the numerical and text data.\n", "\n", "The original target variable here is a count variable of recommendations. For the artificial purpose of using classification algos and metrics, I will discretize it to four meaningful and roughly equal-sized categories. 
Since the largest of them is about 29%, this is our prediction baseline: if we always picked that category, our classification would be accurate 29% of the time.\n", "\n", "Categories:\n", "1. None ~18%\n", "2. One or two ~23%\n", "3. Three to eight ~29%\n", "4. More than eight (up to about ten thousand) ~29%\n", "\n", "Naturally, you could also choose to keep the original target variable (though cropping the right tail of the distribution might be a good idea due to how skewed it is) and rely on regression and metrics such as mean squared error. (In that case, it would probably be most appropriate to use a negative binomial model for the first simple fit, due to the count nature of the data and the fact that zero has meaning.)\n", "\n", "\n", "### Table of contents:\n", "1. [Data Loading & Preparation](#data)\n", " 1. [Load data files & combine](#dataloading)\n", "\t2. [Discretize **recommendations** (target)](#cut)\n", "\t3. [Turn categorical to dummies](#todummies)\n", "2. [Initial prediction](#initpredict)\n", "\t1. [Simple Multinomial Logistic](#logit)\n", "\t2. [A bag of classifiers](#classifiers)\n", "3. [Feature Engineering](#featureengineering)\n", " 1. [Features based on original variables](#originalvars)\n", " 1. [Replyupvotes](#reply)\n", " 2. [byline](#byline)\n", " 3. [Time](#time)\n", " 2. [Features based on raw text data](#textdata)\n", " 1. [A NYTimes vocabulary & IDF profile](#vocab)\n", "\t\t2. [Basic stats, sentiment analysis, spelling errors & part of speech](#kitchensink)\n", "\t\t3. [Text token features](#texttokens)\n", "4. [Training & Re-evaluation](#training)\n", "\t1. [Simple Multinomial Logistic](#finallogit)\n", "\t2. [A bag of classifiers](#finalclassifiers)\n", "5. [Takeaways](#tldr)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Data loading \n", "\n", "First, let's prepare the data. 
We'll load almost all features present in the .csv files in their original shapes into a large dataframe, drop features that cannot be useful (too many missing cases) and convert the categorical features to dummies. This will enable us to do a first-cut prediction without feature engineering - almost as if we know nothing about the dataset and are doing a blind prediction. 'Almost' because we will deliberately exclude one feature that is surely related to the target - the number of replies a comment has received (**replyCount**). We would expect this to be an effect (consequence), rather than a cause of the number of upvotes - upvoted comments are highly visible and therefore much more likely to receive replies.\n", "\n", "We will initially ignore the raw text features (body of comments and keywords of the article) but keep them for later. Finally, to speed up our work here, we will take a random sample of about 5% of the cases (sometimes called a 'dev set'). \n", "\n", "Load libraries and already-downloaded data first. There are two separate sets of files, the first containing comments and comment-features and the second containing features about the articles. We read in all of the data, merge them into one Pandas DataFrame where each row is a comment and eliminate all duplicates. 
\n", " \n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "import glob\n", "import pandas as pd\n", "import numpy as np\n", "import calendar\n", "import warnings\n", "import matplotlib as mpl\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "from sklearn.feature_extraction.text import CountVectorizer\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "# set a few options:\n", "sns.set(style='whitegrid')\n", "warnings.filterwarnings(\"ignore\")\n", "pd.options.display.max_colwidth = 100\n", "%matplotlib inline\n", "# Kernel to predict upvotes ('recommendations' as target); will cut into intervals to turn this into a classification exercise\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "1. Load data files & combine " ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 9335 entries, 0 to 1323\n", "Data columns (total 16 columns):\n", "abstract 167 non-null object\n", "articleID 9335 non-null object\n", "articleWordCount 9335 non-null int64\n", "byline 9335 non-null object\n", "documentType 9335 non-null object\n", "headline 9335 non-null object\n", "keywords 9335 non-null object\n", "multimedia 9335 non-null int64\n", "newDesk 9335 non-null object\n", "printPage 9335 non-null int64\n", "pubDate 9335 non-null object\n", "sectionName 9335 non-null object\n", "snippet 9335 non-null object\n", "source 9335 non-null object\n", "typeOfMaterial 9335 non-null object\n", "webURL 9335 non-null object\n", "dtypes: int64(3), object(13)\n", "memory usage: 1.2+ MB\n", "None\n", "\n", "Int64Index: 9335 entries, 0 to 
1323\n", "Data columns (total 7 columns):\n", "articleID 9335 non-null object\n", "byline 9335 non-null object\n", "headline 9335 non-null object\n", "keywords 9335 non-null object\n", "pubDate 9335 non-null object\n", "snippet 9335 non-null object\n", "documentType 9335 non-null object\n", "dtypes: object(7)\n", "memory usage: 583.4+ KB\n", "None\n" ] } ], "source": [ "# small function to read all zipped .csv files whose names start with a string\n", "def read_files(string='Articles', path='./Data/'):\n", "    '''Read zipped .csv files starting with string and concat them into a pd.DataFrame'''\n", "    files = glob.glob(path + string + '*.csv.zip')\n", "    list_dfs = []\n", "    for csv in files:\n", "        df_ = pd.read_csv(csv)\n", "        list_dfs.append(df_)\n", "    return pd.concat(list_dfs)\n", "\n", "\n", "# read in the articles files\n", "df_articles = read_files('Articles')\n", "\n", "# brief look inside:\n", "print(df_articles.info())\n", "\n", "# Leave only features which are not already present in the comments data, do not have too many missing values and could potentially be useful (poked around a bit)\n", "useful_columns = ['articleID',\n", " 'byline',\n", " 'headline',\n", " 'keywords',\n", " 'pubDate',\n", " 'snippet',\n", " 'documentType'\n", " ]\n", "\n", "df_articles = df_articles[useful_columns]\n", "print(df_articles.info())" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Do the same for the comments files, merge the two into one DataFrame and drop all duplicates:\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 2176364 entries, 0 to 233406\n", "Data columns (total 34 columns):\n", "approveDate int64\n", "articleID object\n", "articleWordCount float64\n", "commentBody object\n", "commentID float64\n", 
"commentSequence float64\n", "commentTitle object\n", "commentType object\n", "createDate float64\n", "depth float64\n", "editorsSelection int64\n", "inReplyTo float64\n", "newDesk object\n", "parentID float64\n", "parentUserDisplayName object\n", "permID object\n", "picURL object\n", "printPage float64\n", "recommendations float64\n", "recommendedFlag float64\n", "replyCount float64\n", "reportAbuseFlag float64\n", "sectionName object\n", "sharing int64\n", "status object\n", "timespeople float64\n", "trusted float64\n", "typeOfMaterial object\n", "updateDate int64\n", "userDisplayName object\n", "userID float64\n", "userLocation object\n", "userTitle object\n", "userURL object\n", "dtypes: float64(15), int64(4), object(15)\n", "memory usage: 581.2+ MB\n", "None\n", "(2086862, 40)\n" ] } ], "source": [ "# read in the Comments files (large-ish, 500M+)\n", "df_comments = read_files('Comments')\n", "print(df_comments.info())\n", "# join the two by articleID; drop the comments that are not associated with articles\n", "df = pd.merge(df_comments, df_articles, how='inner',\n", " on='articleID')\n", "# drop duplicate comments\n", "df.drop_duplicates(subset='commentID', inplace=True)\n", "print(df.shape)\n", "df.reset_index(drop=True, inplace=True)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Count missing values and drop the features if there are too many (> 1 million) missing. We can come back later and try to extract useful information from them." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "approveDate 0\n", "articleID 0\n", "articleWordCount 0\n", "commentBody 0\n", "commentID 0\n", "commentSequence 0\n", "commentTitle 73979\n", "commentType 0\n", "createDate 0\n", "depth 0\n", "editorsSelection 0\n", "inReplyTo 0\n", "newDesk 0\n", "parentID 0\n", "parentUserDisplayName 1528830\n", "permID 22\n", "picURL 0\n", "printPage 0\n", "recommendations 0\n", "recommendedFlag 2086862\n", "replyCount 0\n", "reportAbuseFlag 2086862\n", "sectionName 149613\n", "sharing 0\n", "status 0\n", "timespeople 0\n", "trusted 0\n", "typeOfMaterial 0\n", "updateDate 0\n", "userDisplayName 641\n", "userID 0\n", "userLocation 480\n", "userTitle 2086552\n", "userURL 2086841\n", "byline 0\n", "headline 0\n", "keywords 0\n", "pubDate 0\n", "snippet 0\n", "documentType 0\n", "dtype: int64\n", "(2086862, 35)\n" ] } ], "source": [ "print(df.isnull().sum())\n", "\n", "# several features are missing for most or all cases, drop them\n", "df = df.loc[:, df.isnull().sum() < 10**6]\n", "print(df.shape)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Discretizing the target variable \n", "\n", "Next, look at the target (**Recommendations**) and turn it into four categories for classification. The original variable is a count heavily slanted toward zero:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Target is recommendations, heavily slanted distribution\n", "sns.distplot(df.recommendations.loc[df.recommendations < 30], kde = False)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Take a look at the bottom of the distribution, dominated by zeros:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.0 385013\n", "1.0 278544\n", "2.0 209162\n", "3.0 162014\n", "4.0 127792\n", "5.0 103044\n", "6.0 84626\n", "7.0 71178\n", "8.0 60258\n", "9.0 51066\n", "Name: recommendations, dtype: int64\n" ] } ], "source": [ "print(df.recommendations.value_counts().head(10))" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "and the top:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "504511 7709.0\n", "77679 7938.0\n", "696412 8124.0\n", "1027939 8125.0\n", "1987659 8160.0\n", "696432 8514.0\n", "2062146 8639.0\n", "1738646 8713.0\n", "915605 9279.0\n", "2062163 10472.0\n", "Name: recommendations, dtype: float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.recommendations.sort_values().tail(10)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "So the vast majority have fewer than 15 upvotes and there are only a handful with over 6000. Let's slice the variable into four roughly equal sized categories, and rename it to **recs**. 
We also change it to an _int_ code that ranges from 0 to 3 (one code per category, in increasing order of upvotes) in order to make throwing it into a variety of models a bit easier (including multinomial logistic regression). (The alternative is a labeled categorical variable.)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3, 9) 0.292\n", "[9, 100000) 0.290\n", "[1, 3) 0.234\n", "[0, 1) 0.184\n", "Name: reclabels, dtype: float64\n" ] } ], "source": [ "# cut the interval into 4 bins\n", "df['reclabels'] = pd.cut(df.recommendations, bins=(0, 1, 3, 9, 100000),\n", " include_lowest=True, right=False)\n", "print(df.reclabels.value_counts(normalize=True).round(3))\n", "# change it to integer category codes (0 to 3)\n", "df['recs'] = df.reclabels.cat.codes\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "The distribution of the new categorical variable looks much more balanced:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZcAAAEGCAYAAACpXNjrAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFdNJREFUeJzt3X+Q3fVd7/HnEijFHzRpq5FJMgO28d2JGaEsknjr3FtB0wUZg/dSBLUJyKV/QFs66i20FydzKfTGX20zShk7kLJxaiGltERn6TZDq73eaWhYbkVbeGuMMEkEoiRAK7ZM8Nw/ziewruecHLKfc07OyfMxc2a/3/f3x+ez38ny4vv5fva7Y41GA0mSajph0B2QJI0ew0WSVJ3hIkmqznCRJFVnuEiSqjtx0B04VszMzDhtTpKOwvj4+NjcmuEyy/j4+KC7IElDZWZmpmXdYTFJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnU9/Q39iFgI3A6sBBrArwEJ3A2cDjwOXJqZByNiDNgEXAi8AFyRmQ+X86wHbiynvTkzJ0t9HLgTOAWYAq7LzEZEvL5VG738XqVBmlp35aC7cMy4cMunBt0F0fvXv2wCvpiZl0TEa4DvAz4EPJCZGyPiBuAG4HrgAmB5+awCbgNWlaDYAJxDM6BmImJbCYvbgKuBB2mGywRwfzlnqzYkqaOP/M/PDroLx4wP3fLOoz62Z8NiEfE64D8DdwBk5ouZ+SywFpgsu00CF5fltcCWzGxk5g5gYUScBrwD2J6ZB0qgbAcmyrZTM3NHZjaALXPO1aoNSVIf9PLO5Qzgn4BPRcSZwAxwHbA4M58s+zwFLC7LS4A9s47fW2qd6ntb1OnQRkftXsAmaXj4c1zPfK5lL8PlROBs4L2Z+WBEbKI5PPWy8nykp6+6fzVt+FZkDaupQXfgGDLfn+Ppe3dX6snw6+ZaDuKtyHuBvZn5YFm/h2bYPF2GtChf95ft+4Bls45fWmqd6ktb1OnQhiSpD3oWLpn5FLAnIqKUzge+BWwD1pfaeuC+srwNWBcRYxGxGniuDG1NA2siYlFELALWANNl2/MRsbrMNFs351yt2pAk9UGvZ4u9F/h0mSm2G7iSZqBtjYirgCeAS8u+UzSnIe+iORX5SoDMPBARHwZ2lv1uyswDZfkaXpmKfH/5AGxs04YkqQ96Gi6Z+Q2aU4jnOr/Fvg3g2jbn2QxsblF/iObv0MytP9OqDUlSf/gb+pKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6k7s5ckj4nHg28BLwKHMPCciXg/cDZwOPA5cmpkHI2IM2ARcCLwAXJGZD5fzrAduLKe9OTMnS30cuBM4BZgCrsvMRrs2evm9SpJe0Y87l5/JzLMy85yyfgPwQGYuBx4o6wAXAMvL593AbQAlKDYAq4BzgQ0Rsagccxtw9azjJo7QhiSpDwYxLLYWmCzLk8DFs+pbMrORmTuAhRFxGvAOYHtmHih3H9uBibLt1MzckZkNYMucc7VqQ5LUBz0dFgMawJciogH8UWZ+ElicmU+W7U8Bi8vyEmDPrGP3llqn+t4WdTq00dHMzEw3u0k6hvlzXM98rmWvw+WnM3NfRPwwsD0iHpu9sTwfafSyA6+mjfHx8V52ReqZqUF34Bgy35/j6Xt3V+rJ8OvmWrYLoJ4Oi2XmvvJ1P/B5ms9Mni5DWpSv+8vu+4Blsw5fWmqd6ktb1OnQhiSpD3oWLhHx/RHxg4eXgTXA3wDbgPVlt/XAfWV5G7AuIsYiYjXwXBnamgbWRMSi8iB/DTBdtj0fEavLTLN1c87Vqg1JUh/0clh
sMfD5iDjczp9k5hcjYiewNSKuAp4ALi37T9GchryL5lTkKwEy80BEfBjYWfa7KTMPlOVreGUq8v3lA7CxTRs6RlzxqesG3YVjxp1Xbhp0F6TqehYumbkbOLNF/Rng/Bb1BnBtm3NtBja3qD8ErOy2DUlSf/gb+pKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6k7sdQMRsQB4CNiXmRdFxBnAXcAbgBngXZn5YkScDGwBxoFngF/KzMfLOT4IXAW8BLwvM6dLfQLYBCwAbs/MjaXeso1ef6+SpKZ+3LlcBzw6a/23gY9l5puBgzRDg/L1YKl/rOxHRKwALgN+HJgAPhERC0po3QpcAKwALi/7dmpDktQHPQ2XiFgK/Dxwe1kfA84D7im7TAIXl+W1ZZ2y/fyy/1rgrsz8Xmb+A7ALOLd8dmXm7nJXchew9ghtSJL6oNfDYh8HPgD8YFl/A/BsZh4q63uBJWV5CbAHIDMPRcRzZf8lwI5Z55x9zJ459VVHaKOjmZmZ7r4rqSL/3dXl9axnPteyZ+ESERcB+zNzJiLe3qt2ahofHx90F44fj2wZdA+OGTX+3U1V6MeomO/1nL53d6WeDL9urmW7AOrlsNjbgF+IiMdpDlmdR/Ph+8KIOBxqS4F9ZXkfsAygbH8dzQf7L9fnHNOu/kyHNiRJfdCzcMnMD2bm0sw8neYD+S9n5q8AXwEuKbutB+4ry9vKOmX7lzOzUeqXRcTJZRbYcuDrwE5geUScERGvKW1sK8e0a0OS1AeD+D2X64Ffj4hdNJ+P3FHqdwBvKPVfB24AyMxvAluBbwFfBK7NzJfKM5X3ANM0Z6NtLft2akOS1Ac9/z0XgMz8c+DPy/JumjO95u7zXeCdbY6/BbilRX2KFsPN7dqYj1/+wKdrnm6o/cnv/MqguyDpGOdv6EuSqjNcJEnVGS6SpOoMF0lSdYaLJKk6w0WSVJ3hIkmqznCRJFXXVbhExNZuapIkQfd3Lm9uUXtLzY5IkkZHx9e/RMTVwLuBH4uIr8/a9Doge9kxSdLwOtK7xb4E/B3wh8D/mFV/HnikV52SJA23juGSmU8ATwAr+9MdSdIo6OqtyBERwI3Am2Yfk5lV3zwsSRoN3b5y/y7gs8CngJd61x1J0ijoNlxOyMyP9LQnkqSR0e1U5K9FxE/0tCeSpJHR7Z3LKuDKiEjgu4eLPnORJLXSbbi8v6e9kCSNlK7CJTP/otcdkSSNjm6nIu8EGnPrDotJklrpdljsN2ctvxa4HPjH+t2RJI2CoxoWi4gvAX/Zkx5Jkobe0f49l1OBH6nZEUnS6DiaZy4nAD8K/H6vOiVJGm5H88zlELA7M5/sdEBEvBb4KnByaeeezNwQEWfQfJ3MG4AZ4F2Z+WJEnAxsAcaBZ4BfyszHy7k+CFxF89Uz78vM6VKfADYBC4DbM3Njqbdso8vvVZI0T10Ni5VnLv8X+GfgWeCfujjse8B5mXkmcBYwERGrgd8GPpaZbwYO0gwNyteDpf6xsh8RsQK4DPhxYAL4REQsiIgFwK3ABcAK4PKyLx3akCT1Qbd/5vgc4O+BzwNfAP4uIs7udExmNjLzO2X1pPJpAOcB95T6JHBxWV5b1inbz4+IsVK/KzO/l5n/AOwCzi2fXZm5u9yV3AWsLce0a0OS1AfdDottAn4tMx8AiIjzgD8A3tbpoHJ3MUPzzyTfSjOgns3MQ2WXvcCSsrwE2AOQmYci4jmaw1pLgB2zTjv7mD1z6qvKMe3a6GhmZqab3Y57Xqe6vJ51eT3rmc+17DZcvv9wsABk5pcj4qNHOigzXwLOioiFNO963nJ03eyP8fHx9hvvfqx/HTnGdbxO3Xpky/zPMSJqXM+pCv0YFfO
9ntP37q7Uk+HXzbVsF0DdTkV+ISLefnglIv4L8EKXx5KZzwJfAX4KWBgRh0NtKbCvLO8DlpXznwi8juaD/Zfrc45pV3+mQxuSpD7oNlzeB0xGxN9GxN/SfI7x3k4HRMQPlTsWIuIU4OeAR2mGzCVlt/XAfWV5W1mnbP9yZjZK/bKIOLnMAlsOfB3YCSyPiDMi4jU0H/pvK8e0a0OS1AfdDostBH4S+OGyvh9YeYRjTqMZSAtohtjWzPyziPgWcFdE3Az8P+COsv8dwB9HxC7gAM2wIDO/GRFbgW/RnAZ9bRluIyLeA0zTnIq8OTO/Wc51fZs2JEl90G24/C5wdmbuB4iIE4DfA9rOGMvMR4C3tqjvpjnTa279u8A725zrFuCWFvUpWgw3t2tDktQf3Q6LjZXhJgAy899o3i1IkvQfdBsu346IVYdXyvK/9KZLkqRh1+2w2AeAL0TE4WcaK4D/2psuSZKGXbev3P9aebXKT5XS1zLzYO+6JUkaZt3euVDCxN/VkiQd0dH+PRdJktoyXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnUn9urEEbEM2AIsBhrAJzNzU0S8HrgbOB14HLg0Mw9GxBiwCbgQeAG4IjMfLudaD9xYTn1zZk6W+jhwJ3AKMAVcl5mNdm306nuVJP17vbxzOQT8RmauAFYD10bECuAG4IHMXA48UNYBLgCWl8+7gdsASlBsAFYB5wIbImJROeY24OpZx02Uers2JEl90LNwycwnD995ZOa3gUeBJcBaYLLsNglcXJbXAlsys5GZO4CFEXEa8A5ge2YeKHcf24GJsu3UzNyRmQ2ad0mzz9WqDUlSH/RsWGy2iDgdeCvwILA4M58sm56iOWwGzeDZM+uwvaXWqb63RZ0ObXQ0MzPTzW7HPa9TXV7Purye9cznWvY8XCLiB4DPAe/PzOcj4uVt5flIo5ftv5o2xsfH22+8+7FaXRp6Ha9Ttx7ZMv9zjIga13OqQj9GxXyv5/S9uyv1ZPh1cy3bBVBPZ4tFxEk0g+XTmXlvKT9dhrQoX/eX+j5g2azDl5Zap/rSFvVObUiS+qBn4VJmf90BPJqZH521aRuwviyvB+6bVV8XEWMRsRp4rgxtTQNrImJReZC/Bpgu256PiNWlrXVzztWqDUlSH/RyWOxtwLuAv46Ib5Tah4CNwNaIuAp4Ari0bJuiOQ15F82pyFcCZOaBiPgwsLPsd1NmHijL1/DKVOT7y4cObUiS+qBn4ZKZfwmMtdl8fov9G8C1bc61Gdjcov4QsLJF/ZlWbUiS+sPf0JckVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVXdir04cEZuBi4D9mbmy1F4P3A2cDjwOXJqZByNiDNgEXAi8AFyRmQ+XY9YDN5bT3pyZk6U+DtwJnAJMAddlZqNdG736PiVJ/1Ev71zuBCbm1G4AHsjM5cADZR3gAmB5+bwbuA1eDqMNwCrgXGBDRCwqx9wGXD3ruIkjtCFJ6pOehUtmfhU4MKe8Fpgsy5PAxbPqWzKzkZk7gIURcRrwDmB7Zh4odx/bgYmy7dTM3JGZDWDLnHO1akOS1Cc9GxZrY3FmPlmWnwIWl+UlwJ5Z++0ttU71vS3qndo4opmZmW53Pa55neryetbl9axnPtey3+HysvJ8pHEstTE+Pt5+492P1ejSSOh4nbr1yJb5n2NE1LieUxX6MSrmez2n791dqSfDr5tr2S6A+j1b7OkypEX5ur/U9wHLZu23tNQ61Ze2qHdqQ5LUJ/0Ol23A+rK8Hrh
vVn1dRIxFxGrguTK0NQ2siYhF5UH+GmC6bHs+IlaXmWbr5pyrVRuSpD7p5VTkzwBvB94YEXtpzvraCGyNiKuAJ4BLy+5TNKch76I5FflKgMw8EBEfBnaW/W7KzMOTBK7hlanI95cPHdqQJPVJz8IlMy9vs+n8Fvs2gGvbnGczsLlF/SFgZYv6M63akCT1j7+hL0mqznCRJFVnuEiSqjNcJEnVGS6SpOoMF0lSdYaLJKk6w0WSVJ3hIkmqznCRJFVnuEiSqjNcJEnVGS6SpOoMF0lSdYaLJKk6w0WSVJ3hIkmqznCRJFVnuEiSqjNcJEnVGS6SpOoMF0lSdYaLJKk6w0WSVJ3hIkmq7sRBd6BXImIC2AQsAG7PzI0D7pIkHTdG8s4lIhYAtwIXACuAyyNixWB7JUnHj5EMF+BcYFdm7s7MF4G7gLUD7pMkHTfGGo3GoPtQXURcAkxk5n8v6+8CVmXme9odMzMzM3oXQpL6YHx8fGxubWSfubxarS6OJOnojOqw2D5g2az1paUmSeqDUb1z2Qksj4gzaIbKZcAvD7ZLknT8GMk7l8w8BLwHmAYeBbZm5jcH2ytJOn6M5AN9SdJgjeSdiyRpsAwXSVJ1o/pAf+T4Opt6ImIzcBGwPzNXDro/wy4ilgFbgMVAA/hkZm4abK+GU0S8FvgqcDLN/z7fk5kbBturo+OdyxDwdTbV3QlMDLoTI+QQ8BuZuQJYDVzrv8+j9j3gvMw8EzgLmIiI1QPu01ExXIaDr7OpKDO/ChwYdD9GRWY+mZkPl+Vv05yhuWSwvRpOmdnIzO+U1ZPKZyhnXTksNhyWAHtmre8FVg2oL1JbEXE68FbgwQF3ZWiVkYoZ4M3ArZk5lNfSOxdJVUTEDwCfA96fmc8Puj/DKjNfysyzaL5Z5NyIGMrngobLcPB1NjqmRcRJNIPl05l576D7Mwoy81ngKwzp80HDZTi8/DqbiHgNzdfZbBtwnyQAImIMuAN4NDM/Ouj+DLOI+KGIWFiWTwF+DnhssL06Ov6G/pCIiAuBj9Ocirw5M28ZcJeGVkR8Bng78EbgaWBDZt4x0E4NsYj4aeD/AH8N/FspfygzpwbXq+EUET8BTNL8OT+B5qurbhpsr46O4SJJqs5hMUlSdYaLJKk6w0WSVJ3hIkmqznCRJFVnuEiSqjNcpGNARPieP40U/0FLAxIRDeB/AT8PfBH4rYi4HvhvNH829wFXZ+ZT5c0MH6H5KpCXgN2Z+YsR8Z+AP6T5P4onATdn5mf6/91I/553LtJg/Wtm/mRm/lZE/CrwJmB1Zp4NTAG/X/b7IPCjwNnlb31cXerXA79bXnS4Eri/v92XWvPORRqsyVnLvwCcAzwcEdD8+XyubLuI5h/kehEgM/+51L8C3BgRbwK2D+vr2TV6DBdpsL4za3mM5rDW5m4PzsyPR8SfAj8L/EFEfCkzb6zdSenVclhMOnZsA66JiEUAEXFyRJxZtv0Z8P7y7IWIeGP5+mOZ+feZ+UfAJpp/tVQaOO9cpGNEZv5xCY2/KMNiJwCfAP4K2Aj8b+AbEfEisAu4BHhfRPwM8CLNv7/+3kH0XZrLtyJLkqpzWEySVJ3hIkmqznCRJFVnuEiSqjNcJEnVGS6SpOoMF0lSdf8fe+AUm7GC4N4AAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.countplot(df.recs)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Turn categorical to dummies \n", "\n", "We'll do it using a wrapper around the useful Pandas function **get_dummies**. (There is an analogous function in Scikit-learn.) Note the fact that we drop one (usually the first) category in order to prevent colinearity issues (at least in the case of Logistic Regression for example). Look at non-numeric features:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 2086862 entries, 0 to 2086861\n", "Data columns (total 18 columns):\n", "articleID object\n", "commentBody object\n", "commentTitle object\n", "commentType object\n", "newDesk object\n", "permID object\n", "picURL object\n", "sectionName object\n", "status object\n", "typeOfMaterial object\n", "userDisplayName object\n", "userLocation object\n", "byline object\n", "headline object\n", "keywords object\n", "pubDate object\n", "snippet object\n", "documentType object\n", "dtypes: object(18)\n", "memory usage: 286.6+ MB\n" ] } ], "source": [ "df.select_dtypes(include='object').info()" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Of these we'll pick the potentially useful ones and leave the raw text features (**byline**, **headline**, **keywords**, and **commentBody**) for now. One of the categorical variables in the dataset (whether or not the comment has been recommended by the NYTimes staff, **editorsSelection**) is really coded as _int_, so we include it here. 
We'll also ignore several features (such as **userLocation** and **userDisplayName**) for now, which contain quite a bit of noise, though they could be utilized later." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2086862, 121)\n", "\n", "RangeIndex: 2086862 entries, 0 to 2086861\n", "Columns: 121 entries, editorsSelection_1 to documentType_blogpost\n", "dtypes: uint8(121)\n", "memory usage: 240.8 MB\n", "None\n" ] } ], "source": [ "# one hot encoding function for categorical\n", "def OneHotEncoding(list_of_variables, data=df):\n", "    '''One-hot-encodes a list of string (categorical) variables from a dataframe\n", "    into binary features for modeling. It drops the first category (the reference category) to prevent collinearity problems.\n", "    Outputs a dataframe with the same number of rows as the original df.'''\n", "    new_df = []\n", "    for var in list_of_variables:\n", "        new_df.append(pd.get_dummies(\n", "            data[var],\n", "            drop_first=True,\n", "            prefix=(var)))\n", "    return pd.concat(new_df, axis=1)\n", "\n", "# categorical features to encode\n", "list_of_categorical = ['editorsSelection',\n", " 'newDesk',\n", " 'sectionName',\n", " 'typeOfMaterial',\n", " 'commentType',\n", " 'documentType'\n", "]\n", "\n", "\n", "df_temp = OneHotEncoding(list_of_categorical, data=df)\n", "print(df_temp.shape)\n", "print(df_temp.info())\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "And let's grab the numeric features which could be meaningful and combine into a new dataframe. These are most if not all of the numerical features which could be related to the target: length measured in words of the article, page of the newspaper it appeared on, whether the author of the comments is a member of the NYTimes staff and so on. 
Again, we will not include the number of responses a comment has received as that should be a consequence, rather than a predictor of upvotes." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2086862, 129)\n" ] } ], "source": [ "list_of_numeric = ['approveDate',\n", " 'articleWordCount',\n", " 'depth',\n", " 'printPage',\n", " 'timespeople',\n", " 'trusted',\n", " 'sharing',\n", " 'recs'\n", "]\n", "# combine the two types of features\n", "X_full = pd.concat([df_temp, df[list_of_numeric]], axis=1)\n", "X_full.reset_index(drop=True, inplace=True)\n", "del df_temp\n", "print(X_full.shape)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "One last housekeeping chore: the raw text of the comment bodies, which we will use later, contains some HTML markup which we should remove (as suggested by the original author of the dataset). We do that with a simple wrapper:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# This list of artefacts to remove is from the original dataset exploration post by Aashita Kesarwani https://www.kaggle.com/aashita/exploratory-data-analysis-of-comments-on-nyt/code\n", "replacements = {\"(
)\": \"\",\n", " \"(
)\": \"\",\n", " '().*()': '',\n", " '(&)': '',\n", " '(>)': '',\n", " '(>)': '',\n", " '(<)': '',\n", " '(\\xa0)': ' ',\n", " }\n", "\n", "\n", "def preprocess(commentBody):\n", " '''Function to clean up the body of comments by removing artefacts'''\n", " for pattern, replacement in replacements.items():\n", " commentBody = commentBody.str.replace(pattern, replacement)\n", " return commentBody\n", "\n", "\n", "# clean up the comments\n", "df['commentBody'] = preprocess(df.commentBody)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "To wrap up the data loading, let's create a smaller dev set (only 5% or about 100K comments) that we can play with without waiting for long executions. We'll keep the index so that we may pull more features from the full dataset later (not run here)." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Int64Index: 104343 entries, 454762 to 846424\n", "Columns: 129 entries, editorsSelection_1 to recs\n", "dtypes: float64(5), int64(2), int8(1), uint8(121)\n", "memory usage: 18.5 MB\n", "None\n", "(104343, 129)\n" ] } ], "source": [ "dev_set_size = .05\n", "dev_set_index = X_full.sample(frac=dev_set_size, random_state=12).index\n", "X = X_full.loc[dev_set_index, :]\n", "print(X.info())\n", "print(X.shape)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Initial prediction \n", "\n", "First, let's load the libraries we will use for this and try to fit a simple multinomial logistic model. 
To fit the logistic regression we will need to remove features that have no variance (rare categories that end up with no examples because of our reduced sample, for example) or are extremely highly correlated with other features. Both of these pose problems for the simple logistic model (extreme or perfect multicollinearity) and are generally not going to contribute much to the prediction anyway.\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "from scipy import stats\n", "from sklearn.linear_model import LogisticRegression, RidgeClassifier\n", "from sklearn.ensemble import RandomForestClassifier\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.preprocessing import StandardScaler, label_binarize\n", "from sklearn.pipeline import Pipeline, make_pipeline, FeatureUnion\n", "from sklearn.decomposition import PCA\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score\n", "import statsmodels.api as st\n", "from xgboost.sklearn import XGBClassifier\n", "import xgboost\n", "scaler_standard = StandardScaler()\n", "################################################################################\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Next, let's create a simple function that reports the highest correlations within a dataset. We need it in order to eliminate features that are highly correlated and we'll be using it during the feature creation phase." 
] }, { "cell_type": "code", "execution_count": 16, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "depth commentType_userReply 0.989546\n", "sectionName_Book Review newDesk_BookReview 0.984232\n", "typeOfMaterial_Op-Ed newDesk_OpEd 0.969793\n", "typeOfMaterial_Letter newDesk_Letters 0.951944\n", "typeOfMaterial_Editorial newDesk_Editorial 0.945840\n", "typeOfMaterial_Op-Ed typeOfMaterial_News -0.812191\n", "typeOfMaterial_News newDesk_OpEd -0.806177\n", "sectionName_The Daily newDesk_Podcasts 0.774582\n", "sectionName_Retirement newDesk_SpecialSections 0.752679\n", "documentType_blogpost newDesk_Unknown 0.698441\n", "typeOfMaterial_Blog newDesk_Unknown 0.698441\n", "typeOfMaterial_Obituary (... newDesk_Obits 0.674114\n", "sectionName_Television newDesk_Culture 0.574249\n", "articleWordCount newDesk_Magazine 0.567521\n", "sectionName_Unknown sectionName_Politics -0.560913\n", "newDesk_Well sectionName_Family 0.552346\n", "sectionName_Politics newDesk_National 0.530280\n", "newDesk_Washington sectionName_Politics 0.528095\n", "sectionName_Europe newDesk_Foreign 0.479791\n", "typeOfMaterial_News sectionName_Politics 0.468780\n", "dtype: float64" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def top_correlations(df, n=20, abbreviate=True):\n", "    '''Display top n correlations less than 1 from a pd.DataFrame'''\n", "    original_names = df.columns\n", "    if abbreviate:\n", "        # shorten long column names so the output stays readable\n", "        df.columns = [str(x)[:25] + '...' if len(x) > 25 else x\n", "                      for x in original_names]\n", "    corrs = df.corr().unstack()\n", "    # sort by absolute value, strongest correlations first\n", "    corrs = corrs.reindex(corrs.abs().sort_values(ascending=False).index)\n", "    corrs = corrs[corrs != 1]\n", "    df.columns = original_names\n", "    # every pair appears twice (a-b and b-a), so keep every other row\n", "    return corrs.iloc[::2].head(n)\n", "\n", "top_correlations(X)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Evidently, many of the section and news desk categories overlap to a large degree. Also, the feature **depth**, which measures whether a comment is a reply to another comment or a reply to a reply, etc. (level 1 is an original comment, level 2 is a response and so on), unsurprisingly overlaps with the 'userReply' value of **commentType**. For now we'll solve this by eliminating one of each pair, preferring to keep the more informative feature **depth** and eliminating all **newDesk** features, as the original variable has more missing values." 
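, "\n", "\n",
"Choosing which member of a correlated pair to drop can also be automated. Below is a minimal sketch, assuming a plain threshold rule that keeps the first column of each pair; the helper `drop_correlated`, the 0.9 cutoff and the toy data are illustrative assumptions, not part of this analysis, and the rule knows nothing about which feature is more informative.\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"def drop_correlated(frame, threshold=0.9):\n",
"    '''Drop one column from every pair whose |correlation| exceeds threshold.'''\n",
"    corr = frame.corr().abs()\n",
"    # keep only the upper triangle so each pair is inspected once\n",
"    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))\n",
"    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]\n",
"    return frame.drop(columns=to_drop), to_drop\n",
"\n",
"# hypothetical toy data: b is almost a multiple of a, c is unrelated\n",
"toy = pd.DataFrame({'a': [1, 2, 3, 4, 5],\n",
"                    'b': [2, 4, 6, 8, 11],\n",
"                    'c': [5, 1, 4, 2, 3]})\n",
"pruned, dropped = drop_correlated(toy, threshold=0.9)\n",
"print(dropped)\n",
"```\n",
"\n",
"A rule like this is convenient for a first pass, but a hand-picked choice is still preferable when one feature of a pair is clearly more informative or has fewer missing values."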
0.299045\n", "typeOfMaterial_Review sectionName_Television 0.298029\n", "typeOfMaterial_News sectionName_Sunday Review -0.291467\n", "printPage sectionName_Politics -0.284387\n", "sectionName_Sunday Review sectionName_Unknown -0.275152\n", "typeOfMaterial_Review sectionName_Book Review 0.270038\n", "sectionName_Book Review typeOfMaterial_Interview 0.237462\n", "timespeople approveDate -0.215452\n", "sectionName_Unknown typeOfMaterial_Editorial 0.181141\n", "dtype: float64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# do not use inplace here, we want a copy! \n", "X = X.drop(['commentType_userReply',\n", " 'typeOfMaterial_Op-Ed'], \n", " axis=1)\n", "X = X.drop(list(X.filter(regex='newDesk')), axis=1)\n", "top_correlations(X)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "First fit, Multinomial Logistic: " ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "We separate the target and predictors, split into a train and test set, which we'll consider a hold-out set and not train on, and remove all features with zero variance within a category (otherwise our logistic regression optimization will not converge with the default settings)." 
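, "\n", "\n",
"The within-category variance check is easier to see on a toy example. This is a minimal sketch with made-up data; `toy_X`, `toy_y` and the column names are hypothetical.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"toy_X = pd.DataFrame({'rare_flag': [0, 0, 0, 1, 0, 1],\n",
"                      'length': [3, 9, 4, 7, 5, 8]})\n",
"toy_y = pd.Series([0, 0, 0, 1, 1, 1])\n",
"\n",
"# variance of every feature within each class of the target\n",
"per_class_var = toy_X.groupby(toy_y).var()\n",
"# a feature is flagged if it is constant inside at least one class\n",
"constant_somewhere = (per_class_var == 0).any()\n",
"print(constant_somewhere)\n",
"```\n",
"\n",
"Here `rare_flag` never varies among the class-0 rows, so it is flagged even though it does vary over the data as a whole."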
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# separate out the target\n", "predictors = X.columns[X.columns != 'recs']\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X[predictors], X['recs'], test_size=0.25, random_state=12)\n", "\n", "# check for no variance within category of the target (problem for fitting regression)\n", "variance_per_category = X_train.groupby(y_train).var() == 0\n", "novariance = variance_per_category.sum() > 0\n", "X_train.drop(X_train.columns[novariance],\n", " axis=1,\n", " inplace=True)\n", "X_test.drop(X_test.columns[novariance],\n", " axis=1,\n", " inplace=True)\n", "\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 1.261488\n", " Iterations 7\n", " precision recall f1-score support\n", "\n", " 0 0.41 0.39 0.40 4893\n", " 1 0.35 0.15 0.21 6025\n", " 2 0.33 0.33 0.33 7506\n", " 3 0.43 0.64 0.51 7662\n", "\n", "avg / total 0.38 0.39 0.37 26086\n", "\n", "MLogit accuracy is about 0.39\n" ] } ], "source": [ "# fit \n", "# This is equivalent to using SKlearn's MLogit with 'newton-cg' solver\n", "# LogisticRegression(multi_class='multinomial', solver='newton-cg'),\n", "X_logit = st.add_constant(X_train, prepend=False)\n", "MLogit = st.MNLogit(y_train, X_logit)\n", "logit_fit = MLogit.fit()\n", "# this is necessary for the summary:\n", "stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)\n", "# print(logit_fit.summary())\n", "logit_y_hat = logit_fit.predict(exog=st.add_constant(X_test, prepend=False))\n", "# need to convert it to actual predictions as these are probabilities\n", "logit_y_hat = 
logit_y_hat.idxmax(axis=1)\n", "print(classification_report(y_test, logit_y_hat))\n", "print('MLogit accuracy is about %.2f' % accuracy_score(y_test, logit_y_hat))\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "OK, now we have our first (simple) fit and the first accuracy score - the model classified about 39% of cases correctly, which is an improvement of about 10 percentage points over the baseline (constant) model, which would always predict the largest class and be right about 29% of the time. The classification report shows that the model has the greatest amount of trouble guessing the second category (one or two upvotes), and is considerably better at guessing the fourth (most popular) category (the recall column reports the share of cases of each category the classifier was able to correctly identify). Of all of the comments which received one or two upvotes, our logistic regression classifier was able to recall only about 15%, in contrast to about 64% for the fourth category (more than eight upvotes).\n", " \n", "Without delving too deeply into the interpretation of the coefficients (uncomment the summary line if you want to take a look, long output...), it seems like the numeric features generally had the greatest impact. We'll say more about them in the next section, but these are features such as the length of the original article, the **depth** of the comment, whether the comment was recommended by the NYTimes staff (**editorsSelection**), and whether it was posted by NYTimes staff members or trusted users. While the various categories derived from the **sectionName** variable generally weren't that important, there were a few very popular categories there - for example 'Family' and 'Sunday Review.' Of course, all of the coefficients here are to be interpreted relative to the omitted reference category, but our focus here is on prediction so we won't dwell on the coefficients." 
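, "\n", "\n",
"The baseline and the overall accuracy can be recomputed from the classification report printed by the cell above. A small sketch, with the supports and rounded recalls copied by hand from that report (so the result is only approximate):\n",
"\n",
"```python\n",
"# class supports and recalls copied from the MLogit classification report\n",
"support = {'0': 4893, '1': 6025, '2': 7506, '3': 7662}\n",
"recall = {'0': 0.39, '1': 0.15, '2': 0.33, '3': 0.64}\n",
"\n",
"total = sum(support.values())\n",
"# a constant model that always predicts the largest class\n",
"baseline_accuracy = max(support.values()) / total\n",
"# overall accuracy is the support-weighted average of per-class recall\n",
"model_accuracy = sum(support[k] * recall[k] for k in support) / total\n",
"\n",
"print('baseline: %.3f' % baseline_accuracy)\n",
"print('model:    %.3f' % model_accuracy)\n",
"```\n",
"\n",
"The gap between the two numbers is the improvement in percentage points that the model buys over always guessing the most common category."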
] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "A bag of classifiers \n", "\n", "Next, let's create a bag of classifiers. We are somewhat agnostic as to the most appropriate one at this stage, so we'll include several. In addition to logistic regression, we'll do Ridge regression (essentially a penalized regression), nearest neighbors and two ensemble models (Random Forest and Extreme Gradient Boosting). Feel free to drop your favorite classifier into the mix. I like XGB both for speed (run on a GPU it is quite fast), which allows more experiments, and for the fact that it can report feature importance - which features were present in more trees and how much gain each provided. We won't do much model tuning at this stage, relying on the rather conservative default parameter values baked into Scikit-learn." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "models = [('MLogit', LogisticRegression(n_jobs=-1)),\n", " ('Ridge', RidgeClassifier()),\n", " ('KNN', KNeighborsClassifier(n_jobs=-1)),\n", " ('RandomForest', RandomForestClassifier(n_jobs=-1)),\n", " ('XGB', XGBClassifier(tree_method='gpu_hist',\n", " n_estimators=50)),\n", " ]" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Let's put together a pipeline to report the scores for each of these classifiers. It will allow us to compare classification performance after the feature engineering stage. We re-scale the data with Scikit-learn's standard scaler (center at zero and divide by the standard deviation). This matters for the distance-based and penalized models (KNN, Ridge and the L2-penalized logistic regression), while the tree ensembles are insensitive to it." 
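, "\n", "\n",
"To make the re-scaling step concrete, here is a minimal sketch of what the standard scaler does to each column. The toy matrix is hypothetical (think of an article word count next to a print page number):\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# two made-up features on very different scales\n",
"toy = np.array([[800.0, 1.0],\n",
"                [1200.0, 23.0],\n",
"                [400.0, 12.0]])\n",
"scaled = StandardScaler().fit_transform(toy)\n",
"\n",
"# after scaling, every column has mean 0 and standard deviation 1\n",
"print(scaled.mean(axis=0).round(6))\n",
"print(scaled.std(axis=0).round(6))\n",
"```\n",
"\n",
"Placing the scaler inside a pipeline means it is fitted on the training data only and then applied unchanged to the test data."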
] }, { "cell_type": "code", "execution_count": 21, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Fitting MLogit\n", "Accuracy of MLogit: 0.3893\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.41 0.40 0.40 4893\n", " 1 or 2 0.35 0.14 0.20 6025\n", " 3 to 8 0.34 0.30 0.32 7506\n", " 9 or more 0.42 0.67 0.51 7662\n", "\n", "avg / total 0.38 0.39 0.36 26086\n", "\n", "ROC_AUC_score: 0.5650\n", "\n", "\n", "Fitting Ridge\n", "Accuracy of Ridge: 0.3823\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.40 0.43 0.42 4893\n", " 1 or 2 0.36 0.10 0.16 6025\n", " 3 to 8 0.34 0.18 0.23 7506\n", " 9 or more 0.39 0.77 0.52 7662\n", "\n", "avg / total 0.37 0.38 0.33 26086\n", "\n", "ROC_AUC_score: 0.5622\n", "\n", "\n", "Fitting KNN\n", "Accuracy of KNN: 0.4086\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.41 0.49 0.44 4893\n", " 1 or 2 0.31 0.34 0.32 6025\n", " 3 to 8 0.36 0.35 0.35 7506\n", " 9 or more 0.58 0.47 0.52 7662\n", "\n", "avg / total 0.42 0.41 0.41 26086\n", "\n", "ROC_AUC_score: 0.5879\n", "\n", "\n", "Fitting RandomForest\n", "Accuracy of RandomForest: 0.4270\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.42 0.44 0.43 4893\n", " 1 or 2 0.32 0.34 0.33 6025\n", " 3 to 8 0.37 0.37 0.37 7506\n", " 9 or more 0.59 0.54 0.56 7662\n", "\n", "avg / total 0.43 0.43 0.43 26086\n", "\n", "ROC_AUC_score: 0.5905\n", "\n", "\n", "Fitting XGB\n", "Accuracy of XGB: 0.3948\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.44 0.40 0.42 4893\n", " 1 or 2 0.36 0.18 0.24 6025\n", " 3 to 8 0.33 0.20 0.25 7506\n", " 9 or more 0.41 0.75 0.53 7662\n", "\n", "avg / total 0.38 0.39 0.36 
26086\n", "\n", "ROC_AUC_score: 0.5675\n" ] } ], "source": [ "def predict_recs(X_train=X_train, \n", " y_train=y_train,\n", " X_test=X_test,\n", " y_test=y_test,\n", " models=models,\n", " target_names=['None', '1 or 2', '3 to 8', '9 or more']):\n", " '''Fits and scores models'''\n", " Accuracy = {}\n", " for name, model in models:\n", " pipe = make_pipeline(scaler_standard, model)\n", " print()\n", " print('\\nFitting ' + name)\n", " pipe.fit(X_train, y_train)\n", " y_hat = pipe.predict(X_test)\n", " Accuracy[name] = np.round(accuracy_score(\n", " y_test, y_hat), 4)\n", " print('Accuracy of %s: %.4f' % (name, Accuracy[name]))\n", " print('\\nClassification report:\\n')\n", " print(classification_report(\n", " y_test, y_hat, target_names=target_names))\n", " y_hat_bin = label_binarize(y_hat, range(3))\n", " y_true_bin = label_binarize(y_test, range(3))\n", " ROC_AUC_score = roc_auc_score(y_true_bin, y_hat_bin)\n", " print('ROC_AUC_score: %.4f' % ROC_AUC_score)\n", " # print(confusion_matrix(\n", " # y_test, y_hat))\n", " return Accuracy\n", "\n", "\n", "# predict with only existing features\n", "minimal_features = predict_recs(X_train, y_train)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Some of our models did a little better than others, notably Random Forest seems to give the best predictions, at least at this stage. All of them had trouble with the second category, perhaps because it is difficult in principle to identify comments which received one-to-two upvotes as opposed to zero or more than two. " ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Feature Engineering \n", "\n", "Time to create some meaningful features! 
We'll start with the numeric features that are already in the dataset and which showed much promise in the simple logistic model.\n", "\n", "We will create features only on the dev set for now, which takes quite a bit less time. Eventually we would like to utilize the full data that we have. In order to speed things up a bit, let's use some multiprocessing (i.e., all of the cores we have available). I wrote a little function wrapper that spreads the task over as many cores as the system makes available and displays a nice progress bar so we know how long it will take. For small inputs the overhead of multiprocessing usually outweighs the gains, but with a dataset of 2 million rows it will tend to be faster. (We won't run it )\n" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "from multiprocessing import cpu_count, Pool\n", "from tqdm import tqdm_notebook\n", "\n", "def multip(func, iterable):\n", "    '''Simple wrapper for multiprocessing a function over an iterable. Preserves order.'''\n", "    with Pool(cpu_count()) as pool:\n", "        # imap yields results in input order, so the progress bar tracks real progress\n", "        out = list(tqdm_notebook(pool.imap(func, iterable),\n", "                                 total=len(iterable),\n", "                                 mininterval=1))\n", "    return out\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Features based on the original variables \n", "\n", "The original numeric features in the data contain information on some potentially important attributes of both comments and articles. We care about the article features simply because some articles are much more likely to garner more attention. 
Consequently, the comments on these articles will tend to get more upvotes.\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "reply \n", "\n", "**inReplyTo** - an original variable which simply tells us which comment the current comment is responding to (based on the ID of the comment) or holds a value of zero if it is not a response.\n", "One way to extract a useful feature from this would be to create a variable which takes the number of recommendations/upvotes that the original comment obtained, and a value of zero if this is not a reply at all. The expectation is that responding to a popular comment should get you more visibility and, by extension, more upvotes. We can think of this as the Reddit top-comment hijacking strategy, which will be familiar to anyone who has used Reddit, though replies on the NYTimes site are nowhere near as visible as they are on Reddit." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a31311dbcc0444968b45e8d77f7aff94", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=104343), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "count 104343.000000\n", "mean 30.265471\n", "std 175.474726\n", "min 0.000000\n", "25% 0.000000\n", "50% 0.000000\n", "75% 1.000000\n", "max 8160.000000\n", "Name: reply, dtype: float64\n" ] } ], "source": [ "# feature is zero if not a reply and number of upvotes of original comment if it is a reply\n", "def replycounter(x):\n", "    if x == 0:\n", "        return 0\n", "    elif x != 0:\n", "        # look up the parent comment by its ID and return its upvote count\n", "        return df.recommendations.at[int(np.where(df.commentID == x)[0])]\n", "    else:\n", "        return np.nan\n", "\n", "# 
run only on the dev set for speed\n", "X['reply'] = multip(replycounter, df.inReplyTo[dev_set_index])\n", "\n", "# uncomment this for full set\n", "# X_full['reply'] = multip(replycounter, df.inReplyTo)\n", "\n", "print(X.reply.describe())\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Another 'object'-type variable which we could utilize is **byline**, which reports the authors of an article. \n", "Here's a quick look:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "By THE EDITORIAL BOARD 140540\n", "By PAUL KRUGMAN 66631\n", "By DAVID BROOKS 55968\n", "By CHARLES M. BLOW 50041\n", "By FRANK BRUNI 44710\n", "By NICHOLAS KRISTOF 40030\n", "By ROSS DOUTHAT 36244\n", "By GAIL COLLINS 32452\n", "By DAVID LEONHARDT 25004\n", "By ROGER COHEN 23732\n", "By THE LEARNING NETWORK 20692\n", "By THOMAS L. FRIEDMAN 20481\n", "By PETER BAKER 19019\n", "By DEB AMLEN 18411\n", "By JULIE HIRSCHFELD DAVIS 17310\n", "By MICHELLE GOLDBERG 17250\n", "By MAUREEN DOWD 17225\n", "By BRET STEPHENS 16457\n", "By THOMAS B. EDSALL 15487\n", "By MICHAEL D. SHEAR 14170\n", "Name: byline, dtype: int64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.byline.value_counts().head(20)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Unsurprisingly, some authors tend to generate many more comments than others (the top being the Editorial Board of course, followed by many of the op-ed contributors). One quick and easy way to make use of this is to simply get dummies for each of the top authors, in our case we can do it for the top 20, although obviously one could go for more." 
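, "\n", "\n",
"The same lumping of rare bylines can also be written without a Python-level loop. This is a sketch under assumptions: the toy bylines are made up, and `Series.where` with `isin` is swapped in as an alternative to a per-row function.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"toy_bylines = pd.Series(['By A', 'By B', 'By A', 'By C', 'By A', 'By B'])\n",
"top_n = 2  # this notebook uses the top 20\n",
"\n",
"top = toy_bylines.value_counts().head(top_n).index\n",
"# keep the frequent bylines, lump everything else into one category\n",
"lumped = toy_bylines.where(toy_bylines.isin(top), 'Another Author')\n",
"# 'Another Author' sorts first, so drop_first makes it the reference category\n",
"dummies = pd.get_dummies(lumped, prefix='author', drop_first=True)\n",
"print(dummies.columns.tolist())\n",
"```\n",
"\n",
"Because the replacement label sorts ahead of every byline that starts with 'By', it ends up as the dropped reference category, leaving one dummy per top author."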
] }, { "cell_type": "code", "execution_count": 25, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "30e27d4dd4dd44a8a6664dc2229a8d6e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=2086862), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# get column names of top 20 authors\n", "top_authors = list(df.byline.value_counts().head(20).index)\n", "\n", "# deliberately picked 'Another Author' as the replacement string: it sorts first,\n", "# so it is the category dropped as the reference by OneHotEncoding\n", "def author_pick(x):\n", "    if x in top_authors:\n", "        return x\n", "    else:\n", "        return 'Another Author'\n", "\n", "\n", "series_temp = multip(author_pick, df.byline)\n", "df_temp = OneHotEncoding(['author'], pd.DataFrame(\n", "    series_temp, columns=['author']))\n", "\n", "# add them to the full set\n", "X_full = pd.concat([X_full, df_temp], axis=1)\n", "# and place them in the dev set \n", "X = pd.concat([X, df_temp.loc[dev_set_index, :]], axis=1)\n", "del df_temp\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Next, I noticed in the MLogit output that one of the time variables seemed to have a large impact - **approveDate**. This variable reports the exact time at which a comment was approved, and therefore appeared online - kind of like the publication time of a comment. 
Let's take a peek at what it looks like:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "0 1491245186\n", "1 1491188619\n", "2 1491188617\n", "3 1491167820\n", "4 1491167815\n", "Name: approveDate, dtype: int64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.approveDate.head()" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Ooh, that's ugly - and meaningless at first glance. After some digging around we figure out that this variable is in POSIX (Unix time) format: it reports the number of seconds elapsed since 1970-01-01 00:00 UTC. Let's create a few features that can be meaningful for us: \n", "\n", "1. **approveHour** - Hour of the day in which a comment was published. We expect, for example, that people are reading and responding to online content more during the day than at night. Categorical.\n", "\n", "2. **approveDay** - Same for day of the week, perhaps weekends are more procrastination-prone than weekdays. Categorical.\n", "\n", "3. **hoursAfterArticle** - The number of hours elapsed between an article's publication and a comment's approval (and therefore publication). We can use the **pubDate** original variable from the articles dataset for the former. That one is coded as a string, so we'll have to convert it to a time format as well. We expect that a comment posted long after an article is published will simply get too few eyeballs. 
Continuous.\n", " " ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 2017-04-01 00:15:41\n", "1 2017-04-01 00:15:41\n", "2 2017-04-01 00:15:41\n", "3 2017-04-01 00:15:41\n", "4 2017-04-01 00:15:41\n", "Name: pubDate, dtype: object\n", "0 2017-04-01 00:15:41\n", "1 2017-04-01 00:15:41\n", "2 2017-04-01 00:15:41\n", "3 2017-04-01 00:15:41\n", "4 2017-04-01 00:15:41\n", "Name: pubDate, dtype: datetime64[ns]\n" ] } ], "source": [ "# fix article publication date\n", "print(df.pubDate.head())\n", "df.pubDate = pd.to_datetime(df.pubDate)\n", "print(df.pubDate.head())" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 1491245186\n", "1 1491188619\n", "2 1491188617\n", "3 1491167820\n", "4 1491167815\n", "Name: approveDate, dtype: int64\n", "0 2017-04-03 18:46:26\n", "1 2017-04-03 03:03:39\n", "2 2017-04-03 03:03:37\n", "3 2017-04-02 21:17:00\n", "4 2017-04-02 21:16:55\n", "Name: approveTime, dtype: datetime64[ns]\n", "count 2086862\n", "unique 1233915\n", "top 2017-04-26 15:53:49\n", "freq 42\n", "first 2017-01-02 00:51:07\n", "last 2018-05-02 03:50:35\n", "Name: approveTime, dtype: object\n" ] } ], "source": [ "# convert all time columns from posix to regular format (we'll only use one of them)\n", "print(df.approveDate.head())\n", "timecolumns = ['approveDate', 'createDate', 'updateDate']\n", "df[['approveTime', 'createTime', 'updateTime']] = df[timecolumns].apply(\n", " lambda t: pd.to_datetime(t, unit='s'))\n", "print(df.approveTime.head())\n", "print(df.approveTime.describe())" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "autoscroll": false, "ein.hycell": false, 
"ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 6PM\n", "1 3AM\n", "2 3AM\n", "3 9PM\n", "4 9PM\n", "Name: approveHour, dtype: object\n", "(104343, 133)\n" ] } ], "source": [ "# change the index to approveTime\n", "df.index = df.approveTime\n", "# sample by hour, (to the next hour)\n", "df['approveHour'] = [\n", " str(x) + 'AM' if x < 12 else str(x - 12) + 'PM' for x in df.index.hour]\n", "# sample by day\n", "df['approveDay'] = df.index.weekday_name\n", "df.reset_index(drop=True, inplace=True)\n", "print(df.approveHour.head())\n", "# add dummies for both to df using the function we built earlier\n", "df_temp = OneHotEncoding(['approveHour',\n", " 'approveDay'])\n", "# combine the two types of features and remove the old one\n", "X_full = pd.concat([X_full, df_temp], axis=1)\n", "del X_full['approveDate']\n", "# and place them in the dev set as well\n", "X = pd.concat([X, df_temp.loc[dev_set_index, :]], axis=1)\n", "del X['approveDate']\n", "print(X.shape)\n", "del df_temp\n" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 66.0\n", "1 50.0\n", "2 50.0\n", "3 45.0\n", "4 45.0\n", "Name: hoursAfterArticle, dtype: float64\n", "count 2.086862e+06\n", "mean 2.956882e+01\n", "std 2.166681e+02\n", "min -8.000000e+00\n", "25% 7.000000e+00\n", "50% 1.300000e+01\n", "75% 2.000000e+01\n", "max 1.030800e+04\n", "Name: hoursAfterArticle, dtype: float64\n" ] } ], "source": [ "# create a feature for time of posting of comment since article published\n", "df['timeDelta'] = df.approveTime - df.pubDate\n", "# convert this timedelta object to hours (downsample)\n", "df['hoursAfterArticle'] = df.timeDelta.astype(\n", " 'timedelta64[h]')\n", "\n", "print(df.hoursAfterArticle.head())\n", 
"print(df.hoursAfterArticle.describe())" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Ooops! Negative hours... Hmm, the minimum value of the new variable is -8. Since we haven't yet discovered time travel, this means that either we are experiencing issues with time zones or there's simply some errors in the data - some articles appeared before their publication time or some comments' approval time was misrecorded. Either way, the issue affects very few cases, so we'll simply set all of the negative values to zero, noting the fact that we may be introducing some noise and even bias here. \n" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1373\n", "count 104343.000000\n", "mean 30.577068\n", "std 230.122268\n", "min 0.000000\n", "25% 7.000000\n", "50% 13.000000\n", "75% 20.000000\n", "max 10182.000000\n", "Name: hoursAfterArticle, dtype: float64\n" ] } ], "source": [ "print(df[df.hoursAfterArticle < 0].shape[0])\n", "df.loc[df.hoursAfterArticle < 0, 'hoursAfterArticle'] = 0\n", "# And assign to the large DataFrame:\n", "X_full['hoursAfterArticle'] = df['hoursAfterArticle']\n", "# and the small dev set\n", "X['hoursAfterArticle'] = X_full.loc[dev_set_index, 'hoursAfterArticle']\n", "print(X.hoursAfterArticle.describe())" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Features based on raw text data " ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "OK, here comes the messy raw text manipulation. We have so far not utilized the actual body of comments as well as some of the raw text data we possess about the articles. 
So let's do that.\n", "My intuition about comment upvotes is that generally comments will be more popular when they read as if their author put in time and effort to write them and made them more informative, polished and balanced. This could mean quite a few different things, but here's one way to approach the quantification problem. We expect that popular comments will not be too short (low effort!), will use more sophisticated language (i.e., rarer words, longer words, longer sentences, etc.), will have few or no spelling mistakes (no excuses now that most browsers offer spell-check out of the box), and will not be too negative-sounding (positive or neutral sentiment). These are of course little more than hunches at this stage.\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "A NYTimes vocabulary \n", "\n", "To get measures of all of these, we'll use a few external libraries and one more dataset. I would like to measure the extent to which comments use _the language of the NYTimes_. To that end, it makes sense to create a vocabulary based on the text of NYTimes articles. I found another wonderful dataset on Kaggle - available \n", "[here](https://www.kaggle.com/nzalake52/new-york-times-articles/data). It contains over 8,000 articles and, while not very well documented, should more than suffice for our purposes here.\n", "\n", "For both speed and ease of use, we can use the built-in tokenizer in scikit-learn, tuning it a bit in order to get only words. We tokenize the raw text of the articles' bodies immediately upon reading them, append the results to a list and transform it into a Pandas Series (for convenience). Then, we fit the vectorizer to them as a first step toward learning the frequency of each word's usage. Frequency is important here, as we want to get a sense of whether a comment uses rare English words." 
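, "\n", "\n", "As a minimal sketch of what this tokenizer does (the sample sentence is made up for illustration; the settings mirror the ones used in the next cell), `build_analyzer()` returns a function that lowercases the text, strips accents, drops English stop words and keeps only tokens of two or more word characters:\n", "\n",
"```python\n",
"from sklearn.feature_extraction.text import CountVectorizer\n",
"\n",
"# same settings as the vectorizer used for the articles\n",
"demo_counter = CountVectorizer(stop_words='english',\n",
"                               strip_accents='unicode',\n",
"                               analyzer='word')\n",
"demo_tokenize = demo_counter.build_analyzer()\n",
"\n",
"# hypothetical sample text, not taken from the dataset\n",
"sample = 'The café reopened in 2016, and readers were delighted!'\n",
"print(demo_tokenize(sample))\n",
"```"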
] }, { "cell_type": "code", "execution_count": 32, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2846 years revival glass menagerie return broadway starring time academy award winner sally field ama...\n", "dtype: object\n" ] }, { "data": { "text/plain": [ "CountVectorizer(analyzer='word', binary=False, decode_error='strict',\n", " dtype=, encoding='utf-8', input='content',\n", " lowercase=True, max_df=1.0, max_features=None, min_df=1,\n", " ngram_range=(1, 1), preprocessor=None, stop_words='english',\n", " strip_accents='unicode', token_pattern='(?u)\\\\b\\\\w\\\\w+\\\\b',\n", " tokenizer=None, vocabulary=None)" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get some NYTimes articles to train on\n", "counter = CountVectorizer(stop_words='english',\n", "                          strip_accents='unicode',\n", "                          analyzer='word',\n", "                          min_df=1)\n", "# first get the tokenizer tool from CountVectorizer\n", "tokenize = counter.build_analyzer()\n", "docs = []\n", "doc = ''\n", "\n", "# a line starting with 'URL' marks the beginning of a new article\n", "with open('./Data/nytimes_news_articles.txt', encoding='utf-8') as file:\n", "    for line in file:\n", "        if line.startswith('URL'):\n", "            if doc != '':\n", "                docs.append(' '.join(tokenize(doc)))\n", "                doc = ''\n", "        else:\n", "            doc += line\n", "# the loop only stores an article once the next 'URL' line is reached,\n", "# so store the last article as well\n", "if doc != '':\n", "    docs.append(' '.join(tokenize(doc)))\n", "\n", "docs = pd.Series(docs)\n", "print(docs.sample())\n", "# fit the CountVectorizer to the data\n", "counter.fit(docs)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Whenever we tokenize, it is a good idea to take a look at the beginning and end of the token lists, remembering that raw text is a messy business. 
A useful little built-in method, `get_feature_names`, allows us to explore the dictionary we are building:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['힘들어지고', '힘을', '힘이', '发烧一次', '富二代', '百毒', '素质', '聪明一次', '贤二机器僧', '高洪波']\n" ] } ], "source": [ "token_names = counter.get_feature_names()\n", "# last 10 tokens\n", "print(token_names[-10:])\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Uh-oh! This doesn't look like English!\n", "After some digging around, the culprit turns out to be several articles written in non-Latin scripts (the tokens above are Korean and Chinese). \n", "Let's identify and remove all of the articles in which a large portion of the text is not in English:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6109 12.66\n", "8606 13.01\n", "3632 13.01\n", "4550 13.17\n", "2671 13.53\n", "7998 13.62\n", "1097 13.84\n", "7590 14.94\n", "8506 99.68\n", "7693 99.85\n", "dtype: float64\n" ] } ], "source": [ "def score_English(article):\n", "    '''Report the percent of characters in a document that are ASCII (a rough proxy for English text).'''\n", "    if not article:\n", "        return np.nan\n", "    else:\n", "        n_english = 0\n", "        # article is a string, so this loop runs over its characters\n", "        for token in article:\n", "            try:\n", "                token.encode('ascii')\n", "            except UnicodeEncodeError:\n", "                continue\n", "            else:\n", "                n_english += 1\n", "        return np.round(n_english / len(article) * 100, 2)\n", "\n", "\n", "percent_english = docs.apply(score_English)\n", "print(percent_english.sort_values().head(10))" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "We remove the non-English articles and refit:" ] }, 
{ "cell_type": "code", "execution_count": 35, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['zuther', 'zuwarah', 'zuyu', 'zuyus', 'zverev', 'zvi', 'zvolen', 'zwd', 'zweden', 'zweibel', 'zweig', 'zwick', 'zwicker', 'zwilling', 'zwirner', 'zwolski', 'zyaratgah', 'zych', 'zydeco', 'zyklon', 'zylinski', 'zynga', 'zytiga', 'zytkow', 'zywicki', 'zz', '发烧一次', '百毒', '聪明一次', '贤二机器僧']\n" ] } ], "source": [ "# remove them:\n", "docs = docs[percent_english > 20]\n", "# and try again:\n", "counter.fit(docs)\n", "# get the token names again\n", "feature_names = counter.get_feature_names()\n", "# the end looks much better now (a few stray non-English tokens remain):\n", "print(feature_names[-30:])\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "The beginning of the list is full of numbers, so let's get rid of those as well:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['00', '000', '0000', '0000364334', '0001', '000s', '000th', '001', '0017', '002']\n", "['____________________________________________________', '_idkmatilda', 'a1', 'a10', 'a16', 'a1c', 'a2', 'a20', 'a24', 'a2z']\n" ] } ], "source": [ "print(feature_names[:10])\n", "# 2330 is a hard-coded offset found by inspection: it skips the leading tokens that start with a digit\n", "vocab = feature_names[2330:]\n", "print(vocab[:10])" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Obviously we are not quite done yet - the tokenizer picked up quite a few non-words. Let's run each token by a proper English dictionary and eliminate the non-words." 
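, "\n", "\n", "A minimal sketch of this kind of filtering on a toy token list; the tokens and the small word set are made up for illustration, and the set is only a stand-in for a real dictionary lookup:\n", "\n",
"```python\n",
"# hypothetical tokens, similar to what the vectorizer picked up\n",
"toy_tokens = ['000', 'a1', '_idkmatilda', 'abandon', 'zwd', 'zydeco']\n",
"# stand-in for a dictionary; a real spell-checking dictionary is used in the next cell\n",
"toy_dictionary = {'abandon', 'zydeco'}\n",
"\n",
"# keep purely alphabetic tokens, then only those the dictionary recognizes\n",
"alphabetic = [t for t in toy_tokens if t.isalpha()]\n",
"toy_vocab = [t for t in alphabetic if t in toy_dictionary]\n",
"print(alphabetic)\n",
"print(toy_vocab)\n",
"```"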
] }, { "cell_type": "code", "execution_count": 37, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['ab', 'aback', 'abacus', 'abandon', 'abandoned', 'abandoning', 'abandonment', 'abandons', 'abasement', 'abashed'] ['zoological', 'zoologist', 'zoology', 'zoom', 'zoomed', 'zooming', 'zooms', 'zoos', 'zucchini', 'zydeco']\n" ] } ], "source": [ "import enchant\n", "us_english = enchant.Dict(\"en_US\")\n", "# create a clean copy of the dictionary only with recognized English words\n", "\n", "vocab = [word for word in vocab if us_english.check(word)]\n", "print(vocab[:10], vocab[-10:])" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "OK, this looks much better: we have a proper English vocabulary based on the NYTimes. This is why we did all of the work: using an IDF counter, we can measure how rare each recognized word is and use that to build several features based on the diversity of a comment's vocabulary.\n", "\n", "All of this will be built on a measure commonly used for document classification in Natural Language Processing, called 'IDF' or 'inverse document frequency.' It measures how infrequently a word is used in a body of documents (how rare the word is). In the typical document classification task it is multiplied by the term frequency to get 'TFIDF.' 
This is not what we need here though - we simply want IDF, so here's what it looks like and how to get it.\n", "First, we fit a TF-IDF vectorizer to the docs using only our cleaned-up vocabulary:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',\n", " dtype=, encoding='utf-8', input='content',\n", " lowercase=True, max_df=1.0, max_features=None, min_df=1,\n", " ngram_range=(1, 1), norm=None, preprocessor=None, smooth_idf=True,\n", " stop_words=None, strip_accents=None, sublinear_tf=False,\n", " token_pattern='(?u)\\\\b\\\\w\\\\w+\\\\b', tokenizer=None, use_idf=True,\n", " vocabulary=['ab', 'aback', 'abacus', 'abandon', 'abandoned', 'abandoning', 'abandonment', 'abandons', 'abasement', 'abashed', 'abate', 'abated', 'abatement', 'abatements', 'abating', 'abbey', 'abbot', 'abbreviated', 'abbreviation', 'abbreviations', 'abdicate', 'abdicated', 'abdicating', 'abdication'...ogical', 'zoologist', 'zoology', 'zoom', 'zoomed', 'zooming', 'zooms', 'zoos', 'zucchini', 'zydeco'])" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tfidf = TfidfVectorizer(norm=None,\n", "                        vocabulary=vocab,\n", "                        min_df=1,\n", "                        )\n", "\n", "tfidf.fit(docs)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Now, let's see if it is behaving as expected. 
Here's a very simple example that shows us the IDF value for a word that appears in a given percentage of documents, using the smoothed IDF formula that scikit-learn applies by default, ln((1 + n) / (1 + df)) + 1, where n is the number of documents and df is the number of documents containing the word (see also [wiki](https://en.wikipedia.org/wiki/Tf-idf)):" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "A term which appears in 0.01% of documents has an IDF of 9.52.\n", "\n", "A term which appears in 0.05% of documents has an IDF of 8.42.\n", "\n", "A term which appears in 0.10% of documents has an IDF of 7.81.\n", "\n", "A term which appears in 0.50% of documents has an IDF of 6.28.\n", "\n", "A term which appears in 1.00% of documents has an IDF of 5.60.\n", "\n", "A term which appears in 5.00% of documents has an IDF of 3.99.\n", "\n", "A term which appears in 10.00% of documents has an IDF of 3.30.\n", "\n", "A term which appears in 50.00% of documents has an IDF of 1.69.\n", "\n", "A term which appears in 100.00% of documents has an IDF of 1.00.\n" ] } ], "source": [ "# IDF as a function of document frequency (smoothed, as in scikit-learn):\n", "num_docs = 10000\n", "doc_freq = [1, 5, 10, 50, 100, 500, 1000, 5000, 10000]\n", "idf = [np.log((1 + num_docs)/(1 + x)) + 1 for x in doc_freq]\n", "for x, y in zip(doc_freq, idf):\n", "    print('\\nA term which appears in %.2f%% of documents has an IDF of %.2f.' 
%\n", "          (x/num_docs*100, y))" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "OK, now let's run some English words, ordered from rare to common, by our trained vocabulary:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "'zealot' has an IDF of 8.70.\n", "\n", "'edifice' has an IDF of 7.89.\n", "\n", "'prolific' has an IDF of 5.86.\n", "\n", "'simple' has an IDF of 3.87.\n", "\n", "'good' has an IDF of 2.23.\n", "\n", "'like' has an IDF of 1.53.\n" ] } ], "source": [ "def word_idf(word, tfidf=tfidf):\n", "    idf = np.round(tfidf.idf_[tfidf.vocabulary_[word]], 2)\n", "    print('\\n\\'' + word + '\\' has an IDF of %.2f.' % idf)\n", "\n", "\n", "word_idf('zealot')\n", "word_idf('edifice')\n", "word_idf('prolific')\n", "word_idf('simple')\n", "word_idf('good')\n", "word_idf('like')\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "That will do. \n", "Let's get back to the feature engineering.\n", "\n", "Using our trained IDF counter, we can transform all comments into a matrix of IDF scores (each row is an array of the IDF scores of the tokens present in a comment). Then we can use that matrix (and some additional wizardry) to create a bin count - for each integer value, the share of a comment's recognized words whose IDF, truncated to an integer, equals that value. Think about this as the vocabulary rarity profile of a comment. While we could just rely on the mean IDF (or one of the other basic stats), doing a bin count allows us to probe a little deeper - two comments might have a very different IDF profile yet have the same mean IDF.\n", "For additional convenience, we create a simple summarizer function which reports a count, minimum, mean, standard deviation and maximum value given an array. 
In case a comment contains no recognizable words, we simply assign a value of zero to all measures. A safer choice here would be to exclude the comment altogether - but very few comments fall in this category, so we are not too worried about bias (especially given the size of the full dataset)." ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# grab only the dev set comments:\n", "comments = df.commentBody[dev_set_index]\n", "# uncomment next line for full set:\n", "# comments = df.commentBody\n", "\n", "def summarizer(array):\n", "    '''Simple summary from an array: count, min, mean, stand dev, max'''\n", "    return np.array([array.size,\n", "                     np.min(array),\n", "                     np.mean(array),\n", "                     np.std(array),\n", "                     np.max(array)])\n", "\n", "\n", "def comment_idf_profile(row):\n", "    '''Summarize idf scores for each recognized token in a comment and provide a bin count percent of idf scores.'''\n", "    tokens = comment_tf[row, :].data\n", "    if tokens.size != 0:\n", "        # tf-idf divided by the term counts gives the idf of each recognized word\n", "        idfs = comment_matrix[row, :].data / tokens\n", "        out_summary = summarizer(idfs)\n", "        out_bin = np.bincount(idfs.astype(int), minlength=10)[1:]\n", "        out_bin_percent = out_bin / len(tokens)\n", "        output = np.hstack([out_summary, out_bin_percent])\n", "    else:\n", "        output = np.zeros(14)\n", "    return output\n", "\n", "comment_matrix = tfidf.transform(comments)\n", "comment_matrix.sort_indices() # it's inplace\n", "counter = CountVectorizer(vocabulary=vocab,\n", "                          min_df=1)\n", "comment_tf = counter.transform(comments)\n", "# comment_tokens = list(map(tokenize, comments))\n" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7399dadc02b0434a950b0ff4b34877b4", 
"version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=104343), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "RangeIndex: 104343 entries, 0 to 104342\n", "Data columns (total 14 columns):\n", "recognizedWordCount 104343 non-null float64\n", "MinIdf 104343 non-null float64\n", "MeanIdf 104343 non-null float64\n", "StdDevIdf 104343 non-null float64\n", "MaxIdf 104343 non-null float64\n", "Idf1Percent 104343 non-null float64\n", "Idf2Percent 104343 non-null float64\n", "Idf3Percent 104343 non-null float64\n", "Idf4Percent 104343 non-null float64\n", "Idf5Percent 104343 non-null float64\n", "Idf6Percent 104343 non-null float64\n", "Idf7Percent 104343 non-null float64\n", "Idf8Percent 104343 non-null float64\n", "Idf9Percent 104343 non-null float64\n", "dtypes: float64(14)\n", "memory usage: 11.1 MB\n", "None\n", " recognizedWordCount MinIdf MeanIdf StdDevIdf \\\n", "count 104343.000000 104343.000000 104343.000000 104343.000000 \n", "mean 26.594597 2.036913 4.319403 1.524829 \n", "std 22.324581 0.733856 0.640760 0.422875 \n", "min 0.000000 0.000000 0.000000 0.000000 \n", "25% 10.000000 1.533347 3.992844 1.339523 \n", "50% 20.000000 1.861595 4.313917 1.575367 \n", "75% 37.000000 2.278323 4.635546 1.777224 \n", "max 130.000000 9.397959 9.397959 4.083889 \n", "\n", " MaxIdf Idf1Percent Idf2Percent Idf3Percent \\\n", "count 104343.000000 104343.000000 104343.000000 104343.000000 \n", "mean 7.645723 0.043301 0.180081 0.268976 \n", "std 1.482213 0.063563 0.122437 0.138241 \n", "min 0.000000 0.000000 0.000000 0.000000 \n", "25% 6.833010 0.000000 0.111111 0.200000 \n", "50% 7.893882 0.027778 0.166667 0.263158 \n", "75% 8.704812 0.063158 0.238095 0.333333 \n", "max 9.397959 1.000000 1.000000 1.000000 \n", "\n", " Idf4Percent Idf5Percent Idf6Percent Idf7Percent \\\n", "count 104343.000000 104343.000000 104343.000000 104343.000000 \n", "mean 
0.199408 0.135141 0.082615 0.050508 \n", "std 0.122862 0.107708 0.086615 0.070343 \n", "min 0.000000 0.000000 0.000000 0.000000 \n", "25% 0.133333 0.071429 0.000000 0.000000 \n", "50% 0.200000 0.125000 0.071429 0.034483 \n", "75% 0.255814 0.181818 0.119403 0.076923 \n", "max 1.000000 1.000000 1.000000 1.000000 \n", "\n", " Idf8Percent Idf9Percent \n", "count 104343.000000 104343.000000 \n", "mean 0.030566 0.006184 \n", "std 0.058250 0.024845 \n", "min 0.000000 0.000000 \n", "25% 0.000000 0.000000 \n", "50% 0.000000 0.000000 \n", "75% 0.045455 0.000000 \n", "max 1.000000 1.000000 \n" ] } ], "source": [ "scored_df = multip(comment_idf_profile, range(comment_matrix.shape[0]))\n", "scored_df = pd.DataFrame(scored_df)\n", "idf_labs_percent = ['Idf' + str(x) + 'Percent' for x in range(1, 10)]\n", "scored_df.columns = ['recognizedWordCount',\n", " 'MinIdf',\n", " 'MeanIdf',\n", " 'StdDevIdf',\n", " 'MaxIdf'] + idf_labs_percent\n", "# quick sanity check\n", "print(scored_df.info())\n", "print(scored_df.describe())" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "All in all, we ended up creating 14 new features, we won't worry about the correlations between them just yet, but we'll take a look at those before prediction. " ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Time to throw in the kitchen sink! \n", "\n", "Let's create more features based on more of the indicators we are looking for. Relying on two NLP libraries (TextBlob and the conventional NLTK), we obtain a host of features: basic stats on words and word length, basic stats on sentences and sentence length, sentiment polarity (positive-negative) and subjectivity (subjective-objective) scores, spelling (percent of correctly spelled words), and even part-of-speech breakdown (how many nouns, adjectives, adverbs and so on as a percentage of all words.) 
The full list of part-of-speech tags is available \n", "[here](https://www.clips.uantwerpen.be/pages/mbsp-tags).\n", "\n", "For most of these features, our expectation is that generally comments should fall in an ideal range - neither too short nor too long in terms of sentence length and comment length; not too many adjectives but also not too few and so on. Of course, in terms of spelling we expect a lower error rate to be better.\n", "\n", "A few additional small considerations. We put all of these indicators in one large function simply to economize on the overhead associated with multiprocessing. Also, there is a little regular expression magic at the beginning of the function to fix a very common error - no space after end-of-sentence punctuation (.?!) or after a comma. We fix that because it leads to more noise in all of our features. We spell-check all words which are not 'proper nouns', meaning names, as that should generate more accurate counts. Finally, comments shorter than three characters are coded as all zeroes - they are simply too short to carry much information. Again, one could also eliminate them from the sample if worried about bias." 
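, "\n", "\n", "A minimal sketch of the spacing fix, using the same regular expression as the function in the next cell on made-up comments (the helper name `fix_spacing` is ours, for illustration only). Note that the pattern is blunt: it also inserts a space inside decimals and abbreviations, as the second example shows:\n", "\n",
"```python\n",
"import re\n",
"\n",
"\n",
"def fix_spacing(text):\n",
"    # add a space after . , ? ! whenever the next character is not whitespace\n",
"    return re.sub(r'(?<=[.,?!])(?=[^\\s])', ' ', text)\n",
"\n",
"\n",
"# hypothetical comments, not taken from the dataset\n",
"print(fix_spacing('Great point.I agree,mostly!Right?'))\n",
"print(fix_spacing('It rose 3.5 percent, e.g. last year.'))\n",
"```"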
] }, { "cell_type": "code", "execution_count": 43, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "03ac8ac072b0416194ee7a7b715847cc", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=104343), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# get all the POS tags to create an array\n", "from nltk.data import load\n", "from nltk import FreqDist\n", "from nltk.tokenize import WhitespaceTokenizer\n", "from enchant.checker import SpellChecker\n", "import re\n", "from textblob import TextBlob\n", "spell_checker = SpellChecker(us_english)\n", "universal_tags = load('help/tagsets/upenn_tagset.pickle')\n", "tags = universal_tags.keys()\n", "tag_labels_percent = [label + '_Percent' for label in tags]\n", "\n", "\n", "def comment_stats(comment):\n", "    '''Report stats for a comment. Stats on sentences, words, part-of-speech tags as portion of the total, spelling and sentiment scores. 
Comments shorter than 3 characters output zeroes.'''\n", "    if len(comment) < 3:\n", "        return np.zeros(58)\n", "    ################################################################################\n", "    # insert a space after . , ? ! if there isn't one already (common error)\n", "    comment = re.sub(r'(?<=[.,?!])(?=[^\\s])', r' ', comment)\n", "    ################################################################################\n", "    # sentence stats\n", "    blob = TextBlob(comment)\n", "    sentences = blob.sentences\n", "    tokenized_sents = [sentence.words for sentence in sentences]\n", "    sent_lengths = np.array([len(sentence)\n", "                             for sentence in tokenized_sents])\n", "    out_sent = summarizer(sent_lengths)\n", "    ################################################################################\n", "    # word stats\n", "    word_lengths = np.array([len(word) for word in blob.words])\n", "    if word_lengths.size == 0:\n", "        return np.zeros(58)\n", "    out_word = summarizer(word_lengths)\n", "    ################################################################################\n", "    # sentiment analysis: polarity and subjectivity\n", "    # (the second score is stored later under the column name commentObjectivity)\n", "    out_polarity = list(blob.sentiment)\n", "    ################################################################################\n", "    # spellcheck stats\n", "    # replace most punctuation by space for whitespacetokenizer, leave apostrophes for contractions\n", "    # note: ':-=' inside the character class is a range (':' through '='), so '-' itself is not replaced\n", "    comment_nopunct = re.sub(r'[,.?!#():-=\"“”%$]', ' ', comment)\n", "    blob_nopunct = TextBlob(comment_nopunct, tokenizer=WhitespaceTokenizer())\n", "    non_NNP = [word for word, tag in blob_nopunct.tags if tag !=\n", "               'NNP']\n", "    if not non_NNP:\n", "        out_spell = np.zeros(1)\n", "    else:\n", "        spell_errors = 0\n", "        for word in non_NNP:\n", "            try:\n", "                spell_errors += not spell_checker.check(word.strip('\\'\\`'))\n", "            except Exception:\n", "                # count words the checker cannot process as errors\n", "                spell_errors += 1\n", "        out_spell = np.array(spell_errors / len(non_NNP))\n", "    ################################################################################\n", "    # part of 
speech stats\n", "    fd = FreqDist(tag for (word, tag) in blob.tags)\n", "    ordered_freqs = np.array(\n", "        [fd[key] if key in fd.keys() else 0 for key in tags])\n", "    out_pos_percent = np.squeeze(ordered_freqs / len(word_lengths))\n", "    ################################################################################\n", "    return np.hstack([out_word, out_sent, out_polarity, out_spell,\n", "                      out_pos_percent])\n", "\n", "\n", "comment_stats_df = multip(comment_stats, comments)\n" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 104343 entries, 0 to 104342\n", "Data columns (total 58 columns):\n", "WordCount 104343 non-null float64\n", "minWordLength 104343 non-null float64\n", "meanWordLength 104343 non-null float64\n", "stdDevWordLength 104343 non-null float64\n", "maxWordLength 104343 non-null float64\n", "SentCount 104343 non-null float64\n", "minSentLength 104343 non-null float64\n", "meanSentLength 104343 non-null float64\n", "stdDevSentLength 104343 non-null float64\n", "maxSentLength 104343 non-null float64\n", "commentPolarity 104343 non-null float64\n", "commentObjectivity 104343 non-null float64\n", "commentSpellErrorsPercent 104343 non-null float64\n", "LS_Percent 104343 non-null float64\n", "TO_Percent 104343 non-null float64\n", "VBN_Percent 104343 non-null float64\n", "''_Percent 104343 non-null float64\n", "WP_Percent 104343 non-null float64\n", "UH_Percent 104343 non-null float64\n", "VBG_Percent 104343 non-null float64\n", "JJ_Percent 104343 non-null float64\n", "VBZ_Percent 104343 non-null float64\n", "--_Percent 104343 non-null float64\n", "VBP_Percent 104343 non-null float64\n", "NN_Percent 104343 non-null float64\n", "DT_Percent 104343 non-null float64\n", "PRP_Percent 104343 non-null float64\n", ":_Percent 104343 non-null float64\n", 
"WP$_Percent 104343 non-null float64\n", "NNPS_Percent 104343 non-null float64\n", "PRP$_Percent 104343 non-null float64\n", "WDT_Percent 104343 non-null float64\n", "(_Percent 104343 non-null float64\n", ")_Percent 104343 non-null float64\n", "._Percent 104343 non-null float64\n", ",_Percent 104343 non-null float64\n", "``_Percent 104343 non-null float64\n", "$_Percent 104343 non-null float64\n", "RB_Percent 104343 non-null float64\n", "RBR_Percent 104343 non-null float64\n", "RBS_Percent 104343 non-null float64\n", "VBD_Percent 104343 non-null float64\n", "IN_Percent 104343 non-null float64\n", "FW_Percent 104343 non-null float64\n", "RP_Percent 104343 non-null float64\n", "JJR_Percent 104343 non-null float64\n", "JJS_Percent 104343 non-null float64\n", "PDT_Percent 104343 non-null float64\n", "MD_Percent 104343 non-null float64\n", "VB_Percent 104343 non-null float64\n", "WRB_Percent 104343 non-null float64\n", "NNP_Percent 104343 non-null float64\n", "EX_Percent 104343 non-null float64\n", "NNS_Percent 104343 non-null float64\n", "SYM_Percent 104343 non-null float64\n", "CC_Percent 104343 non-null float64\n", "CD_Percent 104343 non-null float64\n", "POS_Percent 104343 non-null float64\n", "dtypes: float64(58)\n", "memory usage: 46.2 MB\n" ] } ], "source": [ "comment_stats_df = pd.DataFrame(comment_stats_df)\n", "comment_stats_df.columns = ['WordCount',\n", " 'minWordLength',\n", " 'meanWordLength',\n", " 'stdDevWordLength',\n", " 'maxWordLength',\n", " 'SentCount',\n", " 'minSentLength',\n", " 'meanSentLength',\n", " 'stdDevSentLength',\n", " 'maxSentLength',\n", " 'commentPolarity',\n", " 'commentObjectivity',\n", " 'commentSpellErrorsPercent'] + tag_labels_percent\n", "# quick sanity check\n", "comment_stats_df.describe()\n", "comment_stats_df.info()\n" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# add the 
both sets of new features to the dev set\n", "X = pd.concat([X.reset_index(drop=True),\n", " scored_df,\n", " comment_stats_df], axis=1)\n", "\n", "# uncomment for full set\n", "# X_full = pd.concat([X_full,\n", "# scored_df,\n", "# comment_stats_df], axis=1)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "OK, before we continue with the feature engineering, let's do a small unit test: let's look at the body of a comment and see if the features we just created match." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "On top of the confusion and conflict, boys (and girls) must contend with bullying, and more importantly a tacit acceptance of bullying, porn viewing, isolation that comes with technology and dubious nutrition.Band-aid proposals until these kids do the work for us and get the AR-15s out of the classroom could be requirements to attend 1-2 hours a week (on a rotation basis or as elective) in middle and high school dedicated to:Drama exercises/public speaking/verbal articulationCivics (no brainer)Group talks/debates and outward bound/team building exercises where boys and girls are separated (transgenders get to choose which group)Meditation or a martial art\n", "\n", "reply 0.000000\n", "author_By BRET STEPHENS 0.000000\n", "author_By CHARLES M. BLOW 0.000000\n", "author_By DAVID BROOKS 0.000000\n", "author_By DAVID LEONHARDT 0.000000\n", "author_By DEB AMLEN 0.000000\n", "author_By FRANK BRUNI 0.000000\n", "author_By GAIL COLLINS 0.000000\n", "author_By JULIE HIRSCHFELD DAVIS 0.000000\n", "author_By MAUREEN DOWD 0.000000\n", "author_By MICHAEL D. 
SHEAR 0.000000\n", "author_By MICHELLE GOLDBERG 0.000000\n", "author_By NICHOLAS KRISTOF 0.000000\n", "author_By PAUL KRUGMAN 0.000000\n", "author_By PETER BAKER 0.000000\n", "author_By ROGER COHEN 0.000000\n", "author_By ROSS DOUTHAT 0.000000\n", "author_By THE EDITORIAL BOARD 0.000000\n", "author_By THE LEARNING NETWORK 0.000000\n", "author_By THOMAS B. EDSALL 0.000000\n", "author_By THOMAS L. FRIEDMAN 0.000000\n", "approveHour_0PM 0.000000\n", "approveHour_10AM 0.000000\n", "approveHour_10PM 0.000000\n", "approveHour_11AM 0.000000\n", "approveHour_11PM 0.000000\n", "approveHour_1AM 0.000000\n", "approveHour_1PM 0.000000\n", "approveHour_2AM 0.000000\n", "approveHour_2PM 0.000000\n", "approveHour_3AM 0.000000\n", "approveHour_3PM 0.000000\n", "approveHour_4AM 0.000000\n", "approveHour_4PM 0.000000\n", "approveHour_5AM 0.000000\n", "approveHour_5PM 1.000000\n", "approveHour_6AM 0.000000\n", "approveHour_6PM 0.000000\n", "approveHour_7AM 0.000000\n", "approveHour_7PM 0.000000\n", "approveHour_8AM 0.000000\n", "approveHour_8PM 0.000000\n", "approveHour_9AM 0.000000\n", "approveHour_9PM 0.000000\n", "approveDay_Monday 0.000000\n", "approveDay_Saturday 0.000000\n", "approveDay_Sunday 0.000000\n", "approveDay_Thursday 1.000000\n", "approveDay_Tuesday 0.000000\n", "approveDay_Wednesday 0.000000\n", "hoursAfterArticle 19.000000\n", "recognizedWordCount 50.000000\n", "MinIdf 2.127298\n", "MeanIdf 4.758959\n", "StdDevIdf 1.507268\n", "MaxIdf 8.145196\n", "Idf1Percent 0.000000\n", "Idf2Percent 0.160000\n", "Idf3Percent 0.120000\n", "Idf4Percent 0.280000\n", "Name: 88888, dtype: float64\n", "Idf5Percent 0.180000\n", "Idf6Percent 0.220000\n", "Idf7Percent 0.020000\n", "Idf8Percent 0.020000\n", "Idf9Percent 0.000000\n", "WordCount 103.000000\n", "minWordLength 1.000000\n", "meanWordLength 5.349515\n", "stdDevWordLength 3.406757\n", "maxWordLength 18.000000\n", "SentCount 2.000000\n", "minSentLength 32.000000\n", "meanSentLength 51.500000\n", "stdDevSentLength 19.500000\n", 
"maxSentLength 71.000000\n", "commentPolarity 0.260000\n", "commentObjectivity 0.423333\n", "commentSpellErrorsPercent 0.071429\n", "LS_Percent 0.000000\n", "TO_Percent 0.029126\n", "VBN_Percent 0.019417\n", "''_Percent 0.000000\n", "WP_Percent 0.000000\n", "UH_Percent 0.000000\n", "VBG_Percent 0.000000\n", "JJ_Percent 0.087379\n", "VBZ_Percent 0.038835\n", "--_Percent 0.000000\n", "VBP_Percent 0.029126\n", "NN_Percent 0.223301\n", " ... \n", "WP$_Percent 0.000000\n", "NNPS_Percent 0.000000\n", "PRP$_Percent 0.000000\n", "WDT_Percent 0.009709\n", "(_Percent 0.000000\n", ")_Percent 0.000000\n", "._Percent 0.000000\n", ",_Percent 0.000000\n", "``_Percent 0.000000\n", "$_Percent 0.000000\n", "RB_Percent 0.009709\n", "RBR_Percent 0.009709\n", "RBS_Percent 0.000000\n", "VBD_Percent 0.000000\n", "IN_Percent 0.126214\n", "FW_Percent 0.000000\n", "RP_Percent 0.000000\n", "JJR_Percent 0.000000\n", "JJS_Percent 0.000000\n", "PDT_Percent 0.000000\n", "MD_Percent 0.019417\n", "VB_Percent 0.048544\n", "WRB_Percent 0.009709\n", "NNP_Percent 0.038835\n", "EX_Percent 0.000000\n", "NNS_Percent 0.097087\n", "SYM_Percent 0.000000\n", "CC_Percent 0.097087\n", "CD_Percent 0.000000\n", "POS_Percent 0.000000\n", "Name: 88888, Length: 63, dtype: float64\n" ] } ], "source": [ "print(comments.iloc[88888])\n", "print()\n", "print(X.iloc[88888, 83:143])\n", "print(X.iloc[88888, 143:])\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Great, it's a match: number of words, sentences, spelling, etc.\n", "\n", "Final step in the feature engineering: Token features \n", "\n", "Basically, these are simply binary features of whether a certain token (usually word) is included or not. To economize on time and memory, we'll limit ourselves to the top (most frequent) 50 tokens. 
Obviously, we could easily increase the count or eliminate the limitation altogether, but we would end up with many more features.\n", "The first one will be based on the list of keywords that came with each article, **keywords**. Quite simply, our expectation here is that some keywords describing articles may be associated with more comments and upvotes in general (hot topics).\n", "This is what they look like:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "54478 ['Biological and Chemical Warfare', 'Reality Television', 'Trump, Donald J', 'Syria', 'Russia']\n", "2077425 ['United States Politics and Government', 'Federal Budget (US)', 'Ryan, Paul D Jr', 'Trump, Dona...\n", "1648858 ['Gun Control', 'Demonstrations, Protests and Riots', 'Voting and Voters', 'Parkland (Fla)', 'Na...\n", "1651376 ['United States Politics and Government', 'Veterans', 'Appointments and Executive Changes', 'Vet...\n", "1347153 ['Democracy (Theory and Philosophy)', 'Books and Literature', 'Berry, Wendell']\n", "Name: keywords, dtype: object" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.keywords.sample(5)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "We simply tokenize the keyword lists, this time tuning the tokenizer to remove the artefacts of the lists and to split on commas. Essentially, we are building a word list of the most commonly occurring keywords and recording whether an article (and therefore all the comments attached to it) contains them in the description or not. We fit (train) on the full set but transform (count) only the dev set. The result is a sparse matrix, which we will convert to dense before the prediction. 
" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Keyword:\"\"', 'Keyword:\"appointments and executive changes\"', 'Keyword:\"blacks\"', 'Keyword:\"china\"', 'Keyword:\"classified information and state secrets\"', 'Keyword:\"clinton, hillary rodham\"', 'Keyword:\"comey, james b\"', 'Keyword:\"cyberwarfare and defense\"', 'Keyword:\"democratic party\"', 'Keyword:\"demonstrations, protests and riots\"', 'Keyword:\"discrimination\"', 'Keyword:\"espionage and intelligence services\"', 'Keyword:\"executive orders and memorandums\"', 'Keyword:\"federal budget (us)\"', 'Keyword:\"federal bureau of investigation\"', 'Keyword:\"global warming\"', 'Keyword:\"gun control\"', 'Keyword:\"health insurance and managed care\"', 'Keyword:\"house of representatives\"', 'Keyword:\"illegal immigration\"', 'Keyword:\"immigration and emigration\"', 'Keyword:\"international trade and world market\"', 'Keyword:\"justice department\"', 'Keyword:\"labor and jobs\"', 'Keyword:\"law and legislation\"', 'Keyword:\"mueller, robert s iii\"', 'Keyword:\"news and news media\"', 'Keyword:\"obama, barack\"', 'Keyword:\"parkland, fla, shooting (2018)\"', 'Keyword:\"patient protection and affordable care act (2010)\"', 'Keyword:\"politics and government\"', 'Keyword:\"presidential election of 2016\"', 'Keyword:\"putin, vladimir v\"', 'Keyword:\"refugees and displaced persons\"', 'Keyword:\"republican party\"', 'Keyword:\"russia\"', 'Keyword:\"russian interference in 2016 us elections and ties to trump associates\"', 'Keyword:\"ryan, paul d jr\"', 'Keyword:\"school shootings and armed attacks\"', 'Keyword:\"senate\"', 'Keyword:\"special prosecutors (independent counsel)\"', 'Keyword:\"supreme court (us)\"', 'Keyword:\"syria\"', 'Keyword:\"trump, donald j\"', 'Keyword:\"united states\"', 'Keyword:\"united states 
defense and military forces\"', 'Keyword:\"united states economy\"', 'Keyword:\"united states international relations\"', 'Keyword:\"united states politics and government\"', 'Keyword:\"women and girls\"']\n" ] } ], "source": [ "keywords = df.keywords.iloc[dev_set_index]\n", "\n", "# # uncomment for full set\n", "# keywords = df.keywords\n", "\n", "def keyword_tokenizer(keyword):\n", " '''Split into keywords (phrases) and remove all quotes and brackets'''\n", " return [phrase.strip('\\'\\[\\] ') for phrase in keyword.split('\\',')]\n", "\n", "\n", "counter_keywords = CountVectorizer(stop_words='english',\n", " analyzer='word',\n", " tokenizer=keyword_tokenizer,\n", " max_features=50)\n", "counter_keywords.fit(df.keywords)\n", "common_keywords = counter_keywords.transform(keywords)\n", "keyword_features = counter_keywords.get_feature_names()\n", "keyword_features = ['Keyword:\"' + feature +\n", " '\"' for feature in keyword_features]\n", "print(keyword_features)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Yep, this does seem like a list of hot topics in the news for the two periods the articles covered." ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Finally, the last feature to be engineered is the same counter for the most commonly occurring words, but this time applied to the comments themselves. How many times does a comment include the name 'Trump', for example? We'll be a little more generous and allow the top 100 such tokens to be counted (and again you could easily increase the number of tokens here). We will also exclude extremely common tokens - ones that appear in more than 30% of the comments - and look for either single tokens or bigrams of tokens. Warning: this will take a few minutes... 
" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['CommentWord:\"actually\"', 'CommentWord:\"administration\"', 'CommentWord:\"america\"', 'CommentWord:\"american\"', 'CommentWord:\"americans\"', 'CommentWord:\"article\"', 'CommentWord:\"believe\"', 'CommentWord:\"best\"', 'CommentWord:\"better\"', 'CommentWord:\"care\"', 'CommentWord:\"change\"', 'CommentWord:\"children\"', 'CommentWord:\"come\"', 'CommentWord:\"congress\"', 'CommentWord:\"country\"', 'CommentWord:\"day\"', 'CommentWord:\"democrats\"', 'CommentWord:\"did\"', 'CommentWord:\"didn\"', 'CommentWord:\"does\"', 'CommentWord:\"doesn\"', 'CommentWord:\"doing\"', 'CommentWord:\"don\"', 'CommentWord:\"donald\"', 'CommentWord:\"election\"', 'CommentWord:\"fact\"', 'CommentWord:\"far\"', 'CommentWord:\"going\"', 'CommentWord:\"good\"', 'CommentWord:\"gop\"', 'CommentWord:\"government\"', 'CommentWord:\"great\"', 'CommentWord:\"health\"', 'CommentWord:\"help\"', 'CommentWord:\"hope\"', 'CommentWord:\"house\"', 'CommentWord:\"just\"', 'CommentWord:\"know\"', 'CommentWord:\"law\"', 'CommentWord:\"left\"', 'CommentWord:\"let\"', 'CommentWord:\"life\"', 'CommentWord:\"like\"', 'CommentWord:\"little\"', 'CommentWord:\"long\"', 'CommentWord:\"look\"', 'CommentWord:\"make\"', 'CommentWord:\"man\"', 'CommentWord:\"maybe\"', 'CommentWord:\"media\"', 'CommentWord:\"money\"', 'CommentWord:\"mr\"', 'CommentWord:\"need\"', 'CommentWord:\"new\"', 'CommentWord:\"news\"', 'CommentWord:\"obama\"', 'CommentWord:\"office\"', 'CommentWord:\"party\"', 'CommentWord:\"pay\"', 'CommentWord:\"people\"', 'CommentWord:\"person\"', 'CommentWord:\"point\"', 'CommentWord:\"political\"', 'CommentWord:\"power\"', 'CommentWord:\"president\"', 'CommentWord:\"problem\"', 'CommentWord:\"public\"', 'CommentWord:\"read\"', 'CommentWord:\"real\"', 
'CommentWord:\"really\"', 'CommentWord:\"republican\"', 'CommentWord:\"republicans\"', 'CommentWord:\"right\"', 'CommentWord:\"russia\"', 'CommentWord:\"said\"', 'CommentWord:\"say\"', 'CommentWord:\"school\"', 'CommentWord:\"state\"', 'CommentWord:\"states\"', 'CommentWord:\"support\"', 'CommentWord:\"tax\"', 'CommentWord:\"thing\"', 'CommentWord:\"things\"', 'CommentWord:\"think\"', 'CommentWord:\"time\"', 'CommentWord:\"times\"', 'CommentWord:\"trump\"', 'CommentWord:\"use\"', 'CommentWord:\"ve\"', 'CommentWord:\"vote\"', 'CommentWord:\"want\"', 'CommentWord:\"war\"', 'CommentWord:\"way\"', 'CommentWord:\"white\"', 'CommentWord:\"women\"', 'CommentWord:\"won\"', 'CommentWord:\"work\"', 'CommentWord:\"world\"', 'CommentWord:\"year\"', 'CommentWord:\"years\"']\n" ] } ], "source": [ "\n", "# exclude very common words (appear in more than 30% of the comments)\n", "counter = CountVectorizer(stop_words='english',\n", " analyzer='word',\n", " ngram_range=(1, 2),\n", " max_df=.3,\n", " max_features=100,\n", " )\n", "\n", "counter.fit(df.commentBody)\n", "common_tokens = counter.transform(comments)\n", "token_names = counter.get_feature_names()\n", "token_names = ['CommentWord:\"' + feature + '\"' for feature in token_names]\n", "print(token_names)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Add them to the dev set. We also downgrade a few of the data types to save on memory (this is not strictly necessary, but it is useful with the full set, which easily runs over memory limits)." 
] }, { "cell_type": "code", "execution_count": 50, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 104343 entries, 0 to 104342\n", "Columns: 356 entries, editorsSelection_1 to CommentWord:\"years\"\n", "dtypes: float64(79), int64(151), int8(1), uint8(125)\n", "memory usage: 195.6 MB\n", "None\n", "\n", "RangeIndex: 104343 entries, 0 to 104342\n", "Columns: 356 entries, editorsSelection_1 to CommentWord:\"years\"\n", "dtypes: float32(79), int8(152), uint8(125)\n", "memory usage: 59.0 MB\n", "None\n" ] } ], "source": [ "X = pd.concat([X,\n", " pd.DataFrame(common_keywords.toarray(),\n", " columns=keyword_features),\n", " pd.DataFrame(common_tokens.toarray(),\n", " columns=token_names)],\n", " axis=1)\n", "\n", "# # uncomment for full set\n", "# X_full = pd.concat([X_full.reset_index(drop=True),\n", "# pd.DataFrame(common_keywords.toarray(),\n", "# columns=keyword_features),\n", "# pd.DataFrame(common_tokens.toarray(),\n", "# columns=token_names)],\n", "# axis=1)\n", "\n", "# Memory saving type conversion\n", "# this is dangerous when storing very large numbers, but we don't have those here\n", "print(X.info())\n", "# ints\n", "for column in X.columns[X.dtypes.eq('int')]:\n", " X[column] = X[column].astype('int8')\n", "# floats \n", "for column in X.columns[X.dtypes.eq('float')]:\n", " X[column] = X[column].astype('float32')\n", "print(X.info())\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Before putting our newly created features to good use, let's make sure the correlations between them are not too high. 
We can do it manually, we can do it using a dimension-reduction technique such as Principal Component Analysis, and we can write a function to eliminate the one feature in a highly correlated pair that has a lower correlation to the target. Why don't we try all three:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "recognizedWordCount WordCount 0.975403\n", "author_By THE EDITORIAL B... typeOfMaterial_Editorial 0.973378\n", "stdDevSentLength maxSentLength 0.814365\n", "MaxIdf StdDevIdf 0.790523\n", "meanSentLength minSentLength 0.761150\n", "WordCount SentCount 0.757449\n", "Keyword:\"health insurance... Keyword:\"patient protecti... 0.756432\n", "stdDevWordLength maxWordLength 0.741531\n", "meanSentLength maxSentLength 0.739466\n", "recognizedWordCount SentCount 0.733457\n", "Keyword:\"executive orders... Keyword:\"refugees and dis... 0.699262\n", "Keyword:\"\" author_By THE LEARNING NE... 0.690874\n", "Keyword:\"parkland, fla, s... Keyword:\"school shootings... 0.689578\n", "Keyword:\"school shootings... Keyword:\"gun control\" 0.686446\n", "Keyword:\"comey, james b\" Keyword:\"federal bureau o... 0.672945\n", "Keyword:\"special prosecut... Keyword:\"mueller, robert ... 0.671814\n", "recognizedWordCount maxSentLength 0.670206\n", "WordCount maxSentLength 0.669039\n", "Keyword:\"trump, donald j\" Keyword:\"united states po... 0.629903\n", "Keyword:\"gun control\" Keyword:\"parkland, fla, s... 0.579318\n", "dtype: float64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top_correlations(X)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "OK, several issues here. It seems our two number-of-words counters basically report the same measure, even though they use different tokenizers. 
Same for editorials - author and type of material. Additionally, too many of our standard deviation and maximum features are highly correlated (which makes a lot of sense given how a standard deviation is calculated.) Let's remove the worst offenders for now using a small function, setting a rather arbitrary limit of Pearson's r less than 0.7 (or greater than -0.7):" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "List of dropped: \n", "['WordCount', 'author_By THE EDITORIAL BOARD', 'stdDevSentLength', 'StdDevIdf', 'minSentLength', 'Keyword:\"patient protection and affordable care act (2010)\"', 'stdDevWordLength', 'meanSentLength', 'SentCount']\n", "(104343, 347)\n" ] }, { "data": { "text/plain": [ "Keyword:\"refugees and dis... Keyword:\"executive orders... 0.699262\n", "Keyword:\"\" author_By THE LEARNING NE... 0.690874\n", "Keyword:\"parkland, fla, s... Keyword:\"school shootings... 0.689578\n", "Keyword:\"school shootings... Keyword:\"gun control\" 0.686446\n", "Keyword:\"comey, james b\" Keyword:\"federal bureau o... 0.672945\n", "Keyword:\"special prosecut... Keyword:\"mueller, robert ... 0.671814\n", "maxSentLength recognizedWordCount 0.670206\n", "Keyword:\"united states po... Keyword:\"trump, donald j\" 0.629903\n", "Keyword:\"parkland, fla, s... Keyword:\"gun control\" 0.579318\n", "MeanIdf MaxIdf 0.576268\n", "CommentWord:\"health\" CommentWord:\"care\" 0.572974\n", "sectionName_Politics sectionName_Unknown -0.560913\n", "Keyword:\"refugees and dis... Keyword:\"immigration and ... 0.548510\n", "Keyword:\"mueller, robert ... Keyword:\"russian interfer... 0.546594\n", "approveDay_Sunday sectionName_Sunday Review 0.533464\n", "Keyword:\"house of represe... 
Keyword:\"senate\" 0.530111\n", "MaxIdf recognizedWordCount 0.520249\n", "recognizedWordCount maxWordLength 0.511606\n", "Keyword:\"cyberwarfare and... Keyword:\"russia\" 0.498118\n", "VB_Percent MD_Percent 0.496314\n", "dtype: float64" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def feature_pruner(df=X, y='recs', max_corr=.7):\n", " '''Removes features with higher corrcoef than max_corr. Picks based on corr with y (higher absolute is better). Returns DataFrame'''\n", " top = top_correlations(df, n=df.shape[1] ** 2, abbreviate=False)\n", " top = top[np.abs(top) >= max_corr]\n", " if top.size == 0:\n", " return df\n", " left = top.index.get_level_values(0)\n", " right = top.index.get_level_values(1)\n", " pairs = [[left, right] for left, right in zip(left, right)]\n", " to_drop = []\n", " for a, b in pairs:\n", " if a in to_drop or b in to_drop:\n", " continue\n", " if np.abs(df[y].corr(df[a])) >= np.abs(df[y].corr(df[b])):\n", " to_drop.append(b)\n", " else:\n", " to_drop.append(a)\n", " print('List of dropped: ')\n", " print(to_drop)\n", " df = df.drop(columns=to_drop)\n", " return df\n", "\n", "\n", "X_pruned = feature_pruner(X)\n", "print(X_pruned.shape)\n", "top_correlations(X_pruned)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "A quick look at the correlations with the target. 
Obviously this can only tell us about linear relationships, but it is still a good sanity check before modeling:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "recs 1.000000\n", "depth -0.378531\n", "Keyword:\"\" -0.184677\n", "editorsSelection_1 0.154576\n", "author_By THE LEARNING NETWORK -0.152353\n", "recognizedWordCount 0.130134\n", "MaxIdf 0.112667\n", "reply -0.107110\n", "maxSentLength 0.104036\n", "maxWordLength 0.098088\n", "trusted 0.088921\n", "CommentWord:\"trump\" 0.082718\n", "MinIdf -0.082413\n", "minWordLength -0.081297\n", "hoursAfterArticle -0.078762\n", "author_By DEB AMLEN -0.054650\n", "sectionName_Unknown -0.054377\n", "timespeople 0.054288\n", "commentObjectivity 0.052938\n", "approveHour_10AM 0.047911\n", "dtype: float64" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "correlations = X_pruned.corrwith(X_pruned.recs)\n", "correlations.reindex(correlations.abs().sort_values(\n", " ascending=False).index).head(20)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "We'll have to take a closer look at the empty keyword feature, as well as The Learning Network as author, if they show up again; something funny is going on with them (perhaps commenting was disabled for some types of articles?). The rest are more or less to be expected. Quite a few of our engineered features are coming up; we'll see whether the classifier we use for feature importance picks them up as well." ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Training & Re-evaluation \n", "At last, time to see whether the feature engineering paid off. 
\n", "As usual, we split the data into training and test and remove zero-variance features within a category of the target." ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# separate out the target\n", "predictors = X_pruned.columns[X_pruned.columns != 'recs']\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(\n", " X_pruned[predictors], X_pruned['recs'], test_size=0.25, random_state=12)\n", "\n", "# check for no variance within category of the target (problem for fitting regression)\n", "variance_per_category = X_train.groupby(y_train).var() == 0\n", "novariance = variance_per_category.sum() > 0\n", "X_train.drop(X_train.columns[novariance],\n", " axis=1,\n", " inplace=True)\n", "X_test.drop(X_test.columns[novariance],\n", " axis=1,\n", " inplace=True)\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "And fit the logistic regression classifier again: " ] }, { "cell_type": "code", "execution_count": 55, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimization terminated successfully.\n", " Current function value: 1.198865\n", " Iterations 9\n", "\n", " precision recall f1-score support\n", "\n", " 0 0.48 0.40 0.44 4893\n", " 1 0.35 0.22 0.27 6025\n", " 2 0.35 0.41 0.38 7506\n", " 3 0.50 0.62 0.55 7662\n", "\n", "avg / total 0.42 0.43 0.42 26086\n", "\n", "MLogit accuracy is about 0.43\n" ] } ], "source": [ "# MLogit fit\n", "X_logit = st.add_constant(X_train, prepend=False)\n", "MLogit = st.MNLogit(y_train, X_logit)\n", "logit_fit = MLogit.fit()\n", "# print(logit_fit.summary())\n", "logit_y_hat = logit_fit.predict(exog=st.add_constant(X_test, prepend=False))\n", "# need to 
convert it to actual predictions as these are probabilities\n", "logit_y_hat = logit_y_hat.idxmax(axis=1)\n", "print()\n", "print(classification_report(y_test, logit_y_hat))\n", "print('MLogit accuracy is about %.2f' % accuracy_score(y_test, logit_y_hat))\n" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# print(logit_fit.summary())" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Yay! Accuracy went up by four percentage points relative to the minimal feature set, and the classifier now correctly recalls the second class 22% of the time, an improvement of 7 percentage points. Uncomment the summary command to see the significance of individual features (long output...).\n", "\n", "Next, let's use our bag of classifiers: \n" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Fitting MLogit\n", "Accuracy of MLogit: 0.4233\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.47 0.41 0.44 4893\n", " 1 or 2 0.35 0.20 0.25 6025\n", " 3 to 8 0.35 0.40 0.37 7506\n", " 9 or more 0.49 0.63 0.55 7662\n", "\n", "avg / total 0.41 0.42 0.41 26086\n", "\n", "ROC_AUC_score: 0.5821\n", "\n", "\n", "Fitting Ridge\n", "Accuracy of Ridge: 0.4173\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.45 0.42 0.44 4893\n", " 1 or 2 0.35 0.18 0.23 6025\n", " 3 to 8 0.35 0.37 0.36 7506\n", " 9 or more 0.47 0.65 0.55 7662\n", "\n", "avg / total 0.40 0.42 0.40 26086\n", "\n", "ROC_AUC_score: 0.5796\n", "\n", "\n", "Fitting KNN\n", "Accuracy of KNN: 0.3211\n", "\n", "Classification report:\n", "\n", " precision 
recall f1-score support\n", "\n", " None 0.31 0.33 0.32 4893\n", " 1 or 2 0.26 0.31 0.28 6025\n", " 3 to 8 0.31 0.35 0.33 7506\n", " 9 or more 0.43 0.29 0.35 7662\n", "\n", "avg / total 0.34 0.32 0.32 26086\n", "\n", "ROC_AUC_score: 0.5410\n", "\n", "\n", "Fitting RandomForest\n", "Accuracy of RandomForest: 0.3681\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.38 0.38 0.38 4893\n", " 1 or 2 0.28 0.28 0.28 6025\n", " 3 to 8 0.32 0.34 0.33 7506\n", " 9 or more 0.47 0.46 0.47 7662\n", "\n", "avg / total 0.37 0.37 0.37 26086\n", "\n", "ROC_AUC_score: 0.5602\n", "\n", "\n", "Fitting XGB\n", "Accuracy of XGB: 0.4241\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.49 0.40 0.44 4893\n", " 1 or 2 0.35 0.19 0.25 6025\n", " 3 to 8 0.35 0.41 0.38 7506\n", " 9 or more 0.48 0.63 0.55 7662\n", "\n", "avg / total 0.42 0.42 0.41 26086\n", "\n", "ROC_AUC_score: 0.5831\n" ] }, { "data": { "text/plain": [ "{'MLogit': 0.4233,\n", " 'Ridge': 0.4173,\n", " 'KNN': 0.3211,\n", " 'RandomForest': 0.3681,\n", " 'XGB': 0.4241}" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "predict_recs(X_train, y_train, X_test, y_test)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "So increasing the number of predictors posed a problem for some of our classifiers, let's increase their capacity to handle more features by upgrading the key parameter value for each one that can be tuned quickly." 
] }, { "cell_type": "code", "execution_count": 58, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "Fitting MLogit\n", "Accuracy of MLogit: 0.4233\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.47 0.41 0.44 4893\n", " 1 or 2 0.35 0.20 0.25 6025\n", " 3 to 8 0.35 0.40 0.37 7506\n", " 9 or more 0.49 0.63 0.55 7662\n", "\n", "avg / total 0.41 0.42 0.41 26086\n", "\n", "ROC_AUC_score: 0.5821\n", "\n", "\n", "Fitting Ridge\n", "Accuracy of Ridge: 0.4173\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.45 0.42 0.44 4893\n", " 1 or 2 0.35 0.18 0.23 6025\n", " 3 to 8 0.35 0.37 0.36 7506\n", " 9 or more 0.47 0.65 0.55 7662\n", "\n", "avg / total 0.40 0.42 0.40 26086\n", "\n", "ROC_AUC_score: 0.5796\n", "\n", "\n", "Fitting KNN\n", "Accuracy of KNN: 0.3570\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.46 0.20 0.28 4893\n", " 1 or 2 0.30 0.22 0.25 6025\n", " 3 to 8 0.31 0.50 0.39 7506\n", " 9 or more 0.43 0.42 0.42 7662\n", "\n", "avg / total 0.37 0.36 0.35 26086\n", "\n", "ROC_AUC_score: 0.5459\n", "\n", "\n", "Fitting RandomForest\n", "Accuracy of RandomForest: 0.4257\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.48 0.39 0.43 4893\n", " 1 or 2 0.35 0.21 0.27 6025\n", " 3 to 8 0.35 0.39 0.37 7506\n", " 9 or more 0.49 0.65 0.56 7662\n", "\n", "avg / total 0.42 0.43 0.41 26086\n", "\n", "ROC_AUC_score: 0.5815\n", "\n", "\n", "Fitting XGB\n", "Accuracy of XGB: 0.4431\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.52 0.41 0.46 4893\n", " 1 or 2 0.36 0.22 0.27 6025\n", " 3 to 8 0.36 0.46 0.41 7506\n", " 9 or more 0.53 0.62 0.57 7662\n", "\n", "avg / total 0.44 0.44 0.43 
26086\n", "\n", "ROC_AUC_score: 0.5932\n" ] } ], "source": [ "models = [('MLogit', LogisticRegression(n_jobs=-1)),\n", " ('Ridge', RidgeClassifier()),\n", " ('KNN', KNeighborsClassifier(n_neighbors=50, n_jobs=-1)),\n", " ('RandomForest', RandomForestClassifier(n_estimators=200, n_jobs=-1)),\n", " ('XGB', XGBClassifier(tree_method='gpu_hist',\n", " n_estimators=300)),\n", " ]\n", "all_features = predict_recs(X_train, y_train, X_test, y_test, models)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "All of the classifiers that can easily be tuned did better with some tuning. In the end, we will use XGB with a cross-validated number of trees, so we should get good performance on the full set.\n", "An alternative and popular method of feature reduction is PCA. Since some classifiers tend to do better with a reduced number of dimensions, we could also try it. Obviously we could grid-search the optimal number of components to reduce down to, but let's take a quick look at what happens to our predictive accuracy with 100." 
] }, { "cell_type": "code", "execution_count": 59, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(78257, 100)\n", "(26086, 100)\n", "\n", "\n", "Fitting MLogit\n", "Accuracy of MLogit: 0.3983\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.42 0.37 0.39 4893\n", " 1 or 2 0.35 0.17 0.23 6025\n", " 3 to 8 0.34 0.24 0.28 7506\n", " 9 or more 0.43 0.75 0.54 7662\n", "\n", "avg / total 0.38 0.40 0.37 26086\n", "\n", "ROC_AUC_score: 0.5629\n", "\n", "\n", "Fitting Ridge\n", "Accuracy of Ridge: 0.3914\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.41 0.39 0.40 4893\n", " 1 or 2 0.35 0.14 0.20 6025\n", " 3 to 8 0.34 0.21 0.26 7506\n", " 9 or more 0.41 0.77 0.53 7662\n", "\n", "avg / total 0.38 0.39 0.35 26086\n", "\n", "ROC_AUC_score: 0.5611\n", "\n", "\n", "Fitting KNN\n", "Accuracy of KNN: 0.3577\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.42 0.24 0.30 4893\n", " 1 or 2 0.32 0.23 0.26 6025\n", " 3 to 8 0.31 0.40 0.35 7506\n", " 9 or more 0.41 0.50 0.45 7662\n", "\n", "avg / total 0.36 0.36 0.35 26086\n", "\n", "ROC_AUC_score: 0.5470\n", "\n", "\n", "Fitting RandomForest\n", "Accuracy of RandomForest: 0.4115\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.46 0.40 0.43 4893\n", " 1 or 2 0.34 0.20 0.25 6025\n", " 3 to 8 0.35 0.35 0.35 7506\n", " 9 or more 0.46 0.65 0.54 7662\n", "\n", "avg / total 0.40 0.41 0.40 26086\n", "\n", "ROC_AUC_score: 0.5765\n", "\n", "\n", "Fitting XGB\n", "Accuracy of XGB: 0.4217\n", "\n", "Classification report:\n", "\n", " precision recall f1-score support\n", "\n", " None 0.48 0.41 0.44 4893\n", " 1 or 2 0.34 0.20 0.25 6025\n", " 3 to 8 0.35 0.38 0.36 7506\n", " 9 or more 0.48 0.65 0.55 7662\n", 
"\n", "avg / total 0.41 0.42 0.41 26086\n", "\n", "ROC_AUC_score: 0.5811\n" ] } ], "source": [ "dim_reduce = PCA(n_components=100)\n", "X_train_reduced = dim_reduce.fit_transform(X_train)\n", "X_test_reduced = dim_reduce.transform(X_test)\n", "print(X_train_reduced.shape)\n", "print(X_test_reduced.shape)\n", "\n", "reduced_features = predict_recs(X_train_reduced, y_train, X_test_reduced, y_test, models)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "That didn't really do much for us (though perhaps it would on the full set). For the two regression classifiers extra features are not really a problem. In principle the same should be true for the ensemble methods (Random Forest and XGB), however it often is not, especially if two features related to the target and highly correlated with each other are run in a model without a sufficiently high number of cases relative to the number of features. In this case, however, we ended up losing useful data and the ensembles classifiers ended up doing worse with the reduced set. The only model that did benefit is KNN - in this case the reduction in variance helped (remember we also increased the parameter K, the number of neighbors a case is compared to). Even that improvement is marginal, however.\n", "\n", "Let's fine-tune the Extreme Gradient Boost using the original data a little bit - there is a way to find the optimal number of trees (estimators) by iteratively adding and comparing accuracy on an evaluation set. Using the non-reduced data also allows us to examine feature importance, which we are interested in. We'll allow the number to go up to 2000." 
] }, { "cell_type": "code", "execution_count": 60, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Optimal number of trees: 171\n", "Accuracy of score: 0.4464\n", " precision recall f1-score support\n", "\n", " None 0.53 0.41 0.46 4893\n", " 1 or 2 0.36 0.24 0.28 6025\n", " 3 to 8 0.36 0.45 0.40 7506\n", " 9 or more 0.54 0.63 0.58 7662\n", "\n", "avg / total 0.44 0.45 0.44 26086\n", "\n" ] } ], "source": [ "XGB = XGBClassifier(tree_method='gpu_hist',\n", " max_bin=512,\n", " max_depth=7,\n", " n_estimators=2000)\n", "\n", "trained = XGB.fit(X_train,\n", " y_train,\n", " eval_set=[(X_test, y_test)],\n", " early_stopping_rounds=200,\n", " verbose=False)\n", "\n", "optimal_n_trees = trained.best_ntree_limit\n", "print('Optimal number of trees: %d' % (optimal_n_trees))\n", "XGB.set_params(n_estimators=optimal_n_trees)\n", "XGB.fit(X_train, y_train)\n", "y_hat = XGB.predict(X_test)\n", "print('Accuracy of score: %.4f' % (accuracy_score(y_test, y_hat)))\n", "print(classification_report(y_test, y_hat, \n", " target_names = ['None', '1 or 2', '3 to 8', '9 or more']))\n", "\n", "# this is how we can get the feature importance attributes saved by the model. Let's use the ones from the full set, not the dev set \n", "# df_features = pd.DataFrame({'Weight': XGB.get_booster().get_score(importance_type='weight'),\n", "# 'Gain': XGB.get_booster().get_score(importance_type='gain'),\n", "# 'Cover': XGB.get_booster().get_score(importance_type='cover')},\n", "# index=X_train.columns)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "This increased the accuracy of XGB to about 45%. Now that we have a decently performing classifier, why don't we try training it on the full set? 
I ran this separately on my DIY machine learning server at home (with a GTX 1080 Ti that runs XGB fast), as it would take too long in this notebook and requires a few tricks to fit in memory, given the more than 2 million cases and the high number of trees. \n", "\n", "In the end, it reached an accuracy of almost 49% on the hold-out set, which is pretty good - an improvement of 20 percentage points over the constant baseline model and of 11 points over the XGB model with no feature engineering. It also managed to recall 29% of the problematic second category, which is a solid improvement of more than 11 points over the initial run.\n", "Let's take a look at the feature importance attributes of the model: " ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "df_features = pd.read_pickle('./Data/Feature_importance_full.pkl')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Here we are using the feature importance functionality to see which features displayed the highest gain (improved the model's fit the most, on average, when used in a split) and had the greatest weight (were used in the greatest number of splits across all trees, which means they mattered when classifying a large number of cases). " ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WeightGainCover
Keyword:\"\"497.0682.196571151838.109084
depth3042.0412.020744136746.635392
editorsSelection_1716.0341.794156316714.079275
author_By THE LEARNING NETWORK430.0112.460529224883.973088
trusted1154.066.84917284256.032967
typeOfMaterial_News2553.057.14610938005.193426
author_By DEB AMLEN626.056.199554185494.032612
author_By GAIL COLLINS500.044.423127112118.262682
Keyword:\"syria\"661.034.882420134829.897287
sectionName_Family428.032.341839212454.863198
author_By DAVID BROOKS683.031.381635224897.240937
approveHour_3AM603.027.642708108925.055124
approveDay_Sunday1440.025.43400939614.574964
approveHour_6AM302.025.242134234501.792347
hoursAfterArticle28067.024.09587629920.784922
author_By ROSS DOUTHAT581.024.012026212154.869081
Keyword:\"global warming\"633.023.019341177123.471271
Keyword:\"school shootings and armed attacks\"505.022.711046119801.236389
Keyword:\"gun control\"615.021.96987980282.095568
approveHour_5AM434.021.439328155686.510859
approveHour_4AM474.021.305978152005.843738
approveDay_Saturday2021.019.98463225385.068124
sectionName_Sunday Review967.019.30637459144.668091
Keyword:\"russia\"896.018.88355224850.877874
sectionName_Live273.018.882208185213.265113
approveHour_3PM1613.018.55180650622.438727
sectionName_Politics1636.017.78170336545.528892
Keyword:\"united states international relations\"1127.017.76637127295.655350
Keyword:\"comey, james b\"879.017.759434142679.586790
sectionName_Baseball288.017.516210363076.870441
\n", "
" ], "text/plain": [ " Weight Gain \\\n", "Keyword:\"\" 497.0 682.196571 \n", "depth 3042.0 412.020744 \n", "editorsSelection_1 716.0 341.794156 \n", "author_By THE LEARNING NETWORK 430.0 112.460529 \n", "trusted 1154.0 66.849172 \n", "typeOfMaterial_News 2553.0 57.146109 \n", "author_By DEB AMLEN 626.0 56.199554 \n", "author_By GAIL COLLINS 500.0 44.423127 \n", "Keyword:\"syria\" 661.0 34.882420 \n", "sectionName_Family 428.0 32.341839 \n", "author_By DAVID BROOKS 683.0 31.381635 \n", "approveHour_3AM 603.0 27.642708 \n", "approveDay_Sunday 1440.0 25.434009 \n", "approveHour_6AM 302.0 25.242134 \n", "hoursAfterArticle 28067.0 24.095876 \n", "author_By ROSS DOUTHAT 581.0 24.012026 \n", "Keyword:\"global warming\" 633.0 23.019341 \n", "Keyword:\"school shootings and armed attacks\" 505.0 22.711046 \n", "Keyword:\"gun control\" 615.0 21.969879 \n", "approveHour_5AM 434.0 21.439328 \n", "approveHour_4AM 474.0 21.305978 \n", "approveDay_Saturday 2021.0 19.984632 \n", "sectionName_Sunday Review 967.0 19.306374 \n", "Keyword:\"russia\" 896.0 18.883552 \n", "sectionName_Live 273.0 18.882208 \n", "approveHour_3PM 1613.0 18.551806 \n", "sectionName_Politics 1636.0 17.781703 \n", "Keyword:\"united states international relations\" 1127.0 17.766371 \n", "Keyword:\"comey, james b\" 879.0 17.759434 \n", "sectionName_Baseball 288.0 17.516210 \n", "\n", " Cover \n", "Keyword:\"\" 151838.109084 \n", "depth 136746.635392 \n", "editorsSelection_1 316714.079275 \n", "author_By THE LEARNING NETWORK 224883.973088 \n", "trusted 84256.032967 \n", "typeOfMaterial_News 38005.193426 \n", "author_By DEB AMLEN 185494.032612 \n", "author_By GAIL COLLINS 112118.262682 \n", "Keyword:\"syria\" 134829.897287 \n", "sectionName_Family 212454.863198 \n", "author_By DAVID BROOKS 224897.240937 \n", "approveHour_3AM 108925.055124 \n", "approveDay_Sunday 39614.574964 \n", "approveHour_6AM 234501.792347 \n", "hoursAfterArticle 29920.784922 \n", "author_By ROSS DOUTHAT 212154.869081 \n", "Keyword:\"global 
warming\" 177123.471271 \n", "Keyword:\"school shootings and armed attacks\" 119801.236389 \n", "Keyword:\"gun control\" 80282.095568 \n", "approveHour_5AM 155686.510859 \n", "approveHour_4AM 152005.843738 \n", "approveDay_Saturday 25385.068124 \n", "sectionName_Sunday Review 59144.668091 \n", "Keyword:\"russia\" 24850.877874 \n", "sectionName_Live 185213.265113 \n", "approveHour_3PM 50622.438727 \n", "sectionName_Politics 36545.528892 \n", "Keyword:\"united states international relations\" 27295.655350 \n", "Keyword:\"comey, james b\" 142679.586790 \n", "sectionName_Baseball 363076.870441 " ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_features.sort_values('Gain', ascending=False).head(30)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "OK, some of these (an empty keyword, the Learning Network or other authors) are what we could call 'structural' factors - meaning they pertain to the article rather than to the comments themselves. Others are about the comments (depth, selected by an editor, written by a trusted user, approved at certain hours of the day, or posted a certain number of hours after the original article) and will be of greater interest to us.\n", "\n", "As the simple averages below indicate, comments on articles with an empty keyword or with the Learning Network as author received extremely few upvotes - perhaps upvoting was disabled on them for a portion of the time? Additional detective work would be necessary to ascertain what happened here." 
] }, { "cell_type": "code", "execution_count": 63, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "author_By THE LEARNING NETWORK\n", "0 19.933151\n", "1 0.034280\n", "Name: recommendations, dtype: float64\n", "Keyword:\"\"\n", "0 20.098355\n", "1 0.720330\n", "Name: recommendations, dtype: float64\n" ] } ], "source": [ "# bring in the original unmodified recommendations variable\n", "recommendations = df.loc[dev_set_index, 'recommendations']\n", "X['recommendations'] = recommendations.reset_index(drop=True)\n", "\n", "print(X.groupby('author_By THE LEARNING NETWORK').recommendations.mean())\n", "\n", "print(X.groupby('Keyword:\"\"').recommendations.mean())\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "If we sort our important features by 'Weight' (how many times they are used in a split across all trees), we see quite a few surprising factors - the hours elapsed since the article was published (hoursAfterArticle), articleWordCount, and the percentage of names (proper nouns, NNP_Percent). I'll say more about them in the next section - there is plenty of intriguing information here." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
WeightGainCover
hoursAfterArticle28067.024.09587629920.784922
articleWordCount20748.011.75355227152.405073
printPage12070.015.04509230600.598564
reply9793.09.45721720839.824937
commentPolarity8195.03.45919711807.220214
meanWordLength8133.03.35857613323.139771
NNP_Percent8077.04.15344110649.774066
NN_Percent7708.03.34862514371.441559
DT_Percent7424.03.3845029474.644469
RB_Percent7282.03.40215511412.144736
commentObjectivity7256.03.36313011736.138772
IN_Percent7185.03.38232213826.117528
Idf3Percent7053.03.3405059648.121062
JJ_Percent7042.03.33851910733.782276
MeanIdf6835.03.5444449056.971679
Idf4Percent6774.03.2701827689.664120
maxSentLength6728.03.97515916023.769561
PRP_Percent6636.04.14371125704.084771
NNS_Percent6531.03.47585011027.535167
CC_Percent6466.03.4232828732.754822
Idf5Percent6464.03.3008319638.631578
recognizedWordCount6333.08.57369813550.883542
VB_Percent6332.03.63928411144.892127
Idf2Percent6153.03.29226212727.000114
VBZ_Percent6100.03.44471114305.933591
VBP_Percent5917.03.41599014272.319094
TO_Percent5890.03.37231513012.840049
VBN_Percent5560.03.33065211091.542413
VBG_Percent5473.03.42843610130.894194
MinIdf5468.03.21039516043.595959
Idf6Percent5438.03.4211146735.352357
MaxIdf5348.03.50503710320.106724
VBD_Percent5342.03.93010610834.886340
MD_Percent5189.04.56300325085.857509
PRP$_Percent5157.06.18486820498.521030
Idf7Percent5072.03.62108213139.742682
maxWordLength4701.03.50034720085.722387
commentSpellErrorsPercent4466.04.37388923221.690001
CD_Percent4242.03.66140910338.213885
Idf1Percent4220.03.4252748179.586358
\n", "
" ], "text/plain": [ " Weight Gain Cover\n", "hoursAfterArticle 28067.0 24.095876 29920.784922\n", "articleWordCount 20748.0 11.753552 27152.405073\n", "printPage 12070.0 15.045092 30600.598564\n", "reply 9793.0 9.457217 20839.824937\n", "commentPolarity 8195.0 3.459197 11807.220214\n", "meanWordLength 8133.0 3.358576 13323.139771\n", "NNP_Percent 8077.0 4.153441 10649.774066\n", "NN_Percent 7708.0 3.348625 14371.441559\n", "DT_Percent 7424.0 3.384502 9474.644469\n", "RB_Percent 7282.0 3.402155 11412.144736\n", "commentObjectivity 7256.0 3.363130 11736.138772\n", "IN_Percent 7185.0 3.382322 13826.117528\n", "Idf3Percent 7053.0 3.340505 9648.121062\n", "JJ_Percent 7042.0 3.338519 10733.782276\n", "MeanIdf 6835.0 3.544444 9056.971679\n", "Idf4Percent 6774.0 3.270182 7689.664120\n", "maxSentLength 6728.0 3.975159 16023.769561\n", "PRP_Percent 6636.0 4.143711 25704.084771\n", "NNS_Percent 6531.0 3.475850 11027.535167\n", "CC_Percent 6466.0 3.423282 8732.754822\n", "Idf5Percent 6464.0 3.300831 9638.631578\n", "recognizedWordCount 6333.0 8.573698 13550.883542\n", "VB_Percent 6332.0 3.639284 11144.892127\n", "Idf2Percent 6153.0 3.292262 12727.000114\n", "VBZ_Percent 6100.0 3.444711 14305.933591\n", "VBP_Percent 5917.0 3.415990 14272.319094\n", "TO_Percent 5890.0 3.372315 13012.840049\n", "VBN_Percent 5560.0 3.330652 11091.542413\n", "VBG_Percent 5473.0 3.428436 10130.894194\n", "MinIdf 5468.0 3.210395 16043.595959\n", "Idf6Percent 5438.0 3.421114 6735.352357\n", "MaxIdf 5348.0 3.505037 10320.106724\n", "VBD_Percent 5342.0 3.930106 10834.886340\n", "MD_Percent 5189.0 4.563003 25085.857509\n", "PRP$_Percent 5157.0 6.184868 20498.521030\n", "Idf7Percent 5072.0 3.621082 13139.742682\n", "maxWordLength 4701.0 3.500347 20085.722387\n", "commentSpellErrorsPercent 4466.0 4.373889 23221.690001\n", "CD_Percent 4242.0 3.661409 10338.213885\n", "Idf1Percent 4220.0 3.425274 8179.586358" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"df_features.sort_values('Weight', ascending=False).head(40)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "#### Takeaways \n", "So what are the most important factors in predicting why some comments are upvoted and others are not? If we take a look at the top features from our full set model and examine them using the output from the full MLogit model, we get an idea of which features mattered and how.\n", "Let's start with the most surprising finding:\n", "\n", "**Time is of the essence!**\n", " \n", "Commenting on an article within a few hours after it is published is one of the most important factors in predicting popularity. Early comments have a much greater chance of snowballing in upvotes and the window of opportunity closes rather quickly with every passing hour. This is the solution to the puzzle posed in the intro: comment A appeared within an hour following the publication of the article, while comment B clocked in at 24 hours. If you wish for your comment to go the distance, better post early! 
\n", "Here is a visual of the percentage breakdown of the four categories across hours after article split in 20 equal-sized (meaning equal number of comments in each) bins:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZYAAAEfCAYAAABiR+CGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzs3Xl4TGf7wPHvTPZEBNkTWWwRxBalgpYWJZZa2lQkSLWUUPVWWxr72mot/VFU6cKrtgpVtUVqSe0kUbRoCCGSyL6QkPX8/nDlvFJLJJJIuD/X5brMmXnmuedkZu55nnPOc2sURVEQQgghyoj2aQcghBDi2SKJRQghRJmSxCKEEKJMSWIRQghRpiSxCCGEKFOSWIQQQpQpSSxCCCHKlCQWIYQQZUoSixBCiDIliUUIIUSZksQihBCiTOk+7QDKUu+Pfi3X5/9tQZ/HelzDhg154403+Oyzz9Rtx48fZ8mSJaxZs6a8wis3b230L9fn/3nAN4/92NzcXBYsWMCPP/5ISEgINjY2ZRpLZGQk06dPJykpCV1dXcaMGcNrr71Wpn08yOE+b5Tr87f/dfNjPS4oKIhly5aRnZ1NzZo1mTFjBi4uLiXq68qVK0ybNo2EhAT09PR455136NevX2nCLpGZH/1Wrs8/dUHvx37s1q1bWblyJZmZmbRu3Zo5c+agr69fjtFVLjJiKScnT57k3LlzTzuMZ86oUaMwNjYut+cfO3Ysffv2ZdeuXcyfP58JEyZw8+bNcuuvMomNjWXatGksW7aM3bt30717dyZOnFji5wkICKBbt27s3r2b77//nrlz53LlypVyiLhyioiI4PPPP+e7775j//79FBQUsHLlyqcdVoWSxFJOxo0bV2TEcq+CggK++uorunfvTvfu3fn000/JysoCYPDgwfz4448MHDiQl156iXHjxlG4AHVYWBhvvPEGXbt25a233iI6OrrCXk9lMWrUKD744INiH3f8+HH69etH9+7d8fLy4uzZswBs2bKF999/Hz8/P7788ssibfLz8xk1ahR9+twdmTZs2BA9PT2uX79e9i+kEtLV1WXBggXY29sD4OHhUaqEEBERgYeHBwBWVlbUqVOHyMjIMo21Mjt27Bht27bF1tYWjUaDn58fe/bseeBjd+3aRa9evejevTtDhgzh2rVrAHz99ddMnjyZN998k1WrVt3XrmHDhvz888/07t2bjh07cvToUcaNG8crr7zCsGHDyMvLA0r2Odi4cSPdu3fn1VdfZdy4cdy5c6fU+0ASSznx9PREURR279593327du3ijz/+YMuWLezYsYOMjIwib559+/bx448/EhQUxLFjxwgPD+fWrVv4+/szbtw4goODGTJkCGPHjq3AV1Q5tGzZstjHZGZmMnbsWCZPnszu3bsZNmwYH3/8MQUFBQAcPnyYGTNmMH78+CLtdHR06NGjB7q6d2eIT58+DYCzs3PZvohKysrKivbt2wOQl5fHL7/8QufOnUv8PB4eHuzcuZOCggIuX77M9evXad68eVmHW2lpNBr1vQZgbGysJox7xcbGMmXKFJYuXcru3bvp1KkTU6dOVe8PCQlhxY
oVvP322w/sJzU1ld9++40ePXrwwQcfMGbMGIKCgoiIiODkyZMl+hyEhoayaNEiVq9ezb59+6hWrRqLFi0q9T6QxFKOJk6cyPz588nOzi6y/cCBA/Tt2xdjY2N0dHTo378/hw8fVu/v3r07hoaGGBsb4+zsTFxcHGFhYVhbW6sf/F69enHt2jViY2Mr9DVVBWfOnMHGxoZWrVoB0K1bN1JTU4mJiQHuJorikkVcXBwfffQRkydPxsjIqLxDrlRWr15N+/btCQ0N5eOPPy5x+4kTJxIYGEjbtm3p2bMn/v7+WFpalkOklZOHhweHDx8mIiKCvLw81q5de993ANz9Yn/xxRdxcnICwMvLi+PHj6ujjebNm1OrVq2H9tOlSxcAXFxccHBwoE6dOujr6+Pk5ER8fHyJPgf79u2jR48eWFtbAzBw4MCHjrIexzN18L6yadKkCa1bt+bHH38s8ks7JSUFMzMz9baZmRnJycnq7WrVqqn/19HRIT8/n4yMDKKjo+nevbt6n76+PikpKdjZ2ZXzK6laUlJSqF69epFtpqam6j6+d98/yOXLl3nvvfcYMWIEr7/+ernFWVn5+fkxZMgQduzYgbe3Nzt37sTQ0FC9Pzg4mAULFgAwaNAgBg0aVKT9mDFj+OCDD+jfvz83btzA19eXxo0bP9Zo81lQv359pkyZwrhx49DX1+eNN97A1NT0vselpqYWeZ+ampqiKAqpqalA8e9TExMTALRarfp/uPudUVBQUKLPwc2bNwkODubQoUMAKIpCbm5uSV52EZJYytmHH35I//79qV27trrNwsKCtLQ09XZaWhoWFhaPfB4rKyvq1q3Lli1byi3WZ4W5uXmR/asoCunp6Zibm3P58uVHto2Pj2fYsGF88skneHp6lneolUpkZCTx8fG0a9cOjUZDr169mDVrFleuXKFRo0bq47p27UrXrl0f+BwpKSn8/fff9O599wwqGxsbWrZsSVhY2HOTWAD69eunngl38uTJB55ZZ25uzqlTp9Tb6enpaLVaatasWSYxlORzYGVlRb9+/ZgwYUKZ9C1TYeXMysoKX19fvv76a3Vbp06d2LZtG7dv3yYvL4/AwEA6duz4yOdp3rw5iYmJ6rx/dHQ0n3zyCVJZ+n7NmjUjKSlJ/dDu2LEDGxubIsn9YaZNm4afn99zl1TgblIYP3488fHxwN2TRXJzc3FwcHjs56hRowa1atVi//79wN0vy1OnTtGgQYNyibkyunr1Kn369CEjI4Pc3FyWL19O//7973tc4XRj4Uk4GzZsoH379uoxvidVks/Bq6++yp49e0hJSQHg999/Z8WKFaXuW0YsJVCSs4PufWy3bt1Yv3492dnZXL9+HTc3N1q2bMnrr7+Ooii0aNGCzp07c/36dbKzs0lJSVHbF95OSkpi8uTJTJkyhdu3b6Orq8vQoUPV+dKq7HH3a0pKCuPGjVNvDxw4EB0dHebNm3ffHP6kSZOYMmUKd+7cwczMjAkTJhATE0NKSor6d/i3pKQk9u/fT0RERJHrjd577z3atWtXyldXOTzOPra1tcXb25tBgwahKAp6enpMmjSJtLS0Ir98izNlyhSWLFnCF198gaIodOnShXr16lX5s+seN34dHR1at25Nz5490Wg0vPrqq7Ru3fqB7T/88EOGDx9OXl4eNjY2jBs3juvXr5ORkUFmZuYj+4yLiyMvL+++93Thd0ZKSspjfw7MzMx466238Pb2Jjc3F1tbW2bNmlWKvXSXRnkGf/KGhobi6+v7tMMQQogqae3atbzwwgulbv9MjlgKf72uXbu2zK/MFkKIZ1XhyRZPehbfM5lYdHR0AB57Xl0IIcT/FH6HlpYcvBdCCFGmJLEIIYQoU+WaWCIiIujSpQs//fQTcPcshsGDB+Pj48PYsWPJyckBYNu2bbzxxht4eXmxadMm4O6pjl5eXgwfPly9UCc6Ovqx1okSQgjx9JRbYsnKymLWrFnqYnQAixcvxsfHh3
Xr1uHk5ERgYCBZWVksXbqUVatWsWbNGlavXk1aWhr//e9/WbJkCc2bN+fo0aMALFq0iP/85z/lFbIQQogyUG6JRV9fn5UrV2JlZaVuO378uLqo3SuvvMLRo0c5ffo0TZs2xdTUFENDQ9zd3QkPDycjIwMLCwusrKxIT0/n9OnT1KhRg7p165ZXyEIIIcpAuSUWXV3dIusLAdy+fVstdmNubk5iYiJJSUlFFlqrVasWiYmJ2NjYcO3aNa5cuYK9vT3ffPMNffv2ZeLEicyZM0edRhNCCFG5PLXTjR92XWbh9qFDhzJt2jQcHBxITEykbdu2bNy4EX9/f44dO0ZQUJC6HlFZK00lunury5W0/dNq+zT7/nc1Ptln5dv2afZdVeN+mn2XZdxPQ4WeFWZsbKwWj4mPj8fKygorKyuSkpLUxyQkJGBlZYWLiwv//e9/mT59OoGBgfj6+nL9+nXs7Oyws7Or8stDCCHEs6pCE0u7du0ICgoCYM+ePbz00ks0b96cs2fPqmvjhIeHF1lKYN26dQwYMAA9PT3Mzc2JjY0lLi6uyLEbIYQQlUe5TYX99ddffPHFF8TExKCrq0tQUBDz58/n008/ZePGjdjZ2dG3b1/09PT46KOPePfdd9FoNIwePVqtXZCRkUFYWBh+fn7A3doP48aNw8jIiCVLlpRX6EIIIZ5AuSUWNze3IivEFvrxxx/v21ZY+/3fqlevzuLFi9XbLVq0UK9zEUIIUTnJlfdCCCHKlCQWIYQQZUoSixBCiDIliUUIIUSZksQihBCiTEliEUIIUaYksQghhChTkliEEEKUKUksQgghypQkFiGEEGWqQhNLQUEBU6ZMwdvbm8GDBxMZGfnAcsU5OTkMHz4cLy8vwsPD1fb+/v7ExcVVZMhCCCFKqELrsezdu5ebN2+yYcMGrl27xpw5c6hVqxY+Pj54enqycOFCAgMDsbe3x93dnT59+jBv3jzc3d0JCQmhYcOG2NraVmTIQgghSqhCRyxRUVE0a9YMAEdHR2JjYx9Yrjg9PR0LCwssLS1JT08nPz+f1atXM3z48IoMVwghRClUaGJxcXHh0KFD5Ofnc/nyZaKjo4mJibmvXLGtrS3R0dFERUVhb2/P5s2b6dGjBytWrCAgIIBz585VZNhCCCFKoEITS8eOHWnatCm+vr6sXr2aunXroqenp95fWJa4VatWJCQkMGvWLAYMGEBwcDDOzs5otVqmTp1aZCl9IYQQlUuF17z/8MMP1f936dIFa2tr7ty5g6GhoVquWKvVMnfuXAC+/vpr3n33XWJjY7Gzs8PIyIjMzMyKDlsIIcRjqtARy4ULFwgICADgjz/+oHHjxg8sV1woPj6eqKgo2rZti4WFBXFxcdy+fVudOhNCCFH5VOiIxcXFBUVRePPNNzEwMGD+/Pno6OgwYcKEIuWKCy1btowxY8YA0Lp1a1atWsWQIUPw9/evyLCFEEKUQIUmlnunuO71oHLFADNmzFD/r6enx4oVK8otNiGEEGWjwo+xCCGEeHydL60qRaveZR1GiciSLkIIIcqUJBYhhBBlShKLEEKIMiWJRQghRJmSxCKEEKJMSWIRQghRpiSxCCGEKFOSWIQQQpSpCk0smZmZvP/++wwePBhvb28OHjzIhQsX8Pb2xtvbm2nTpgGQkpKiPiYqKgqAvLw8/Pz8ZAFKIYSo5Cr0yvtffvmFOnXq8NFHHxEfH4+fnx+WlpZMnDiRZs2a8dFHHxESEsL169fx8vLC1taWwMBAPv74Y37++Wd69eqFiYlJRYYshBCihCp0xFKzZk3S0tIAyMjIoEaNGsTExKhVJQsrSGZkZGBpaalWkLx16xZ79+7ljTfeqMhwhRBClEKFJpaePXsSGxtL165dGTRoEOPHj6d69erq/fdWkLx27ZpaQXLlypUMGTKEzz77jEmTJnH9+vWKDFsIIUQJVOhU2K+//oqdnR3ff/89Fy5cYPTo0Ziamqr3F1aQ7Nq1KxMmTC
A7O5uxY8fyww8/UL9+fVxdXWnTpg3ffvsts2bNqsjQhXimlHxhw6e7qKGoWio0sYSHh9OhQwcAXF1dyc7OJi8vT72/sIKkiYkJS5YsAWDy5MmMGTOG3bt306JFC2xtbWXEIoSocJKMH1+FJhYnJydOnz5Nt27diImJwcTEBHt7e0JDQ3nhhRfYs2cPgwcPVh9//vx5TExMqFOnDhYWFsTGxnLjxg2srKwqMmwhRBl5mkvAV62+q3ZSqtDEMmDAACZOnMigQYPIy8tj+vTpWFpaMnXqVAoKCmjevDnt2rVTH798+XK12FfXrl0ZM2YMGzduZPLkyRUZthDiHs/bl6QouQpNLCYmJixatOi+7evWrXvg4+99bI0aNVizZk25xSaEqPwkqVUNUkFSiKfkSb4kq2JVQfH8kCVdhBBClClJLEIIIcqUJBYhhBBlShKLEEKIMiWJRQghRJmSxCKEEKJMFZtYrl+/TlhYGAA///wzEydOJDIystwDE0IIUTUVm1gCAgLQ09Pj3LlzbNq0iW7dujF79uyKiE0IIUQVVGxi0Wg0NGvWjODgYHx9fenYsaO6CrEQQgjxb8UmlqysLM6cOUNQUBAvv/wyOTk5ZGRklKqzTZs2MXjwYPVfy5YtpTSxEEI8Y4pd0uWdd95hypQpDBgwgFq1arFgwQJ69epVqs68vLzw8vIC4MSJE+zatYs5c+ZIaeJKQtZhEkKUhWITS48ePejWrRspKSkAfPjhh2i1T34y2dKlS/n8888ZNGjQfaWJzczMqFu37n2liVeuXPnE/QohhChfxSaWo0ePMmnSJPT19dm9ezdz587Fw8ODV155pdSdnjlzBltbW3R0dB5YmtjV1ZVr166RnZ19X2ni27dv4+/vT+3atUvdvyh7z+to53l93UI8SrFDj6+++oqff/4ZS0tLAEaOHMk333zzRJ0GBgbSr1+/+7bfW5r44MGDrF27lg4dOhAdHU1GRgaurq6MGDGCb7/99on6F0IIUX6KHbEYGxtjYWGh3q5VqxZ6enpP1Onx48eZPHkyGo2GtLQ0dbuUJn5+yS9/IZ4dxY5YDA0NOXHiBADp6emsW7cOAwODUncYHx+PiYkJ+vr66OnpUbduXUJDQwHYs2cPL730kvpYKU0shBBVT7GJZdq0aXz//fecPXtWnaKaOXNmqTtMTEykVq1a6u2JEyeycOFCvL29cXR0vK80sb+/P3B3emzr1q189NFH+Pr6lrp/IYQQ5avYqTBbW9sixzQKCgqe6KwwNzc3vvvuO/V2/fr1pTSxEEI8Q4pNLFu2bOH27dt4e3szaNAgbty4wfDhw/Hx8amI+IQoV1LiV4iyV+zQY+PGjXh5eREcHEyDBg3Yu3cvu3btqojYhBBCVEHFjlgMDAzQ19cnJCSE119/vUwujhSiLMkZZUJULo+VJWbMmEF4eDht2rTh1KlT5OTklHdcQgghqqhiRyzz589n586dDB48GB0dHWJiYpgxY0ZFxCZKSI4XCPHsWeRT8ssr2pdDHCVR7IjFysoKLy8vDA0NiY2NpUmTJkyePLkiYhNCCFEFFTtiWblyJd9++y05OTkYGxuTnZ1N797yK1cIIcSDFTtiCQoK4siRIzRv3pxjx44xf/58GjRoUBGxCSGEqIKKTSyFy6/k5uYC0LlzZ/bu3VvugQkhhKiaip0KMzMzY9u2bbi4uBAQEEC9evVISEioiNiEEEJUQcWOWL744gvc3d0JCAjAycmJ+Ph4Fi5cWOoOt23bxuuvv07//v05cOAAcXFxDB48GB8fH8aOHUtOTg45OTkMHz4cLy8vwsPD1bb+/v7ExcWVum8hhBDlr9jEoigKZ86cwcjIiJEjR1KnTh0cHBxK1VlqaipLly5l3bp1LF++nL1797J48WJ8fHxYt24dTk5OBAYGcvToUdzd3Vm0aJG6PlhISAgNGzbE1ta2VH0LIYSoGMUmlgkTJpCUlKTezs7OZvz48aXq7OjRo3h4eFCtWj
WsrKyYNWsWx48fp3PnzsD/ShOnp6djYWGhlibOz89n9erVDB8+vFT9CiGEqDjFJpa0tDSGDBmi3h46dCgZGRml6uz69evcuXOHkSNH4uPjw9GjR7l9+zb6+vrA/0oT29raEh0dTVRUFPb29mzevJkePXqwYsUKAgICOHfuXKn6F0IIUf6KTSy5ublERkaqt//66y/1DLHSSEtLY8mSJcydO5eAgAC1HDH8rzRxq1atSEhIYNasWQwYMIDg4GCcnZ3RarVMnTqVxYsXl7p/IYQQ5avYs8ICAgIYNWoUN2/epKCggJo1a/Lll1+WqjNzc3NatmyJrq4ujo6OmJiYoKOjw507dzA0NFRLE2u1WubOnQvA119/zbvvvktsbCx2dnYYGRmRmZlZqv6rAllQUQhR1RU7YmnevDlBQUHs2LGD3bt3s2vXLpo2bVqqzjp06MCxY8coKCggNTWVrKws2rVrR1BQEHB/aeL4+HiioqJo27YtFhYWxMXFFZk6E0IIUfkUO2IpVLNmzSfuzNramm7duvHWW28BMHnyZJo2bcqECRPYuHEjdnZ29O3bV338smXLGDNmDACtW7dm1apVDBkyRC1XLIQQovJ57MRSVry9vfH29i6y7ccff3zgY+9dRVlPT48VK1aUa2xCCCGe3EOnwkJCQgDYv39/hQUjhBCi6nvoiOXzzz9Hq9WyaNEiDA0N77vfw8OjXAMTQghRNT00sQwcOJDvv/+emJgYli1bVuQ+jUYjiUUIIcQDPTSx+Pn54efnx9q1a/H19a3ImIQQQlRhxR6879OnD0uXLuXs2bNoNBpatGiBn5/fA6fHhBBCiGITy9SpU7G2tsbb2xtFUThy5AiTJ09m/vz5FRFflSQXOQohnmfFJpakpKQiy+S/8sorDB48uFyDEkIIUXUVe+X97du3uX37tno7KyuL7Ozscg1KCCFE1VXsiGXAgAF4enri5uYGwN9//83YsWPLPTAhhBBVU7GJ5c0336R9+/b8/fffaDQapkyZgrW1dak6O378OGPHjqVBgwYAuLi4MGzYMMaPH09+fj6WlpbMmzcPgNGjR5OWlkZAQADu7u7A3QqSU6dOlWJfQghRiT3Wki62trZl9mXepk2bIsveBwQE4OPjg6enJwsXLiQwMBB7e3vc3d3p06cP8+bNw93dXSpICiFEFVHha4X92/Hjx9U1wV555RV++OEHunbt+sAKkl9//fVTjlYI8bxa5GNVose3L6c4qoJiD94/yJ07d0rd4aVLlxg5ciQDBw7k8OHDUkFSCCGeMcWOWN59912+//77Itt8fX3ZvHlziTtzdnbm/fffx9PTk+joaIYMGUJ+fr56/70VJDdv3sysWbMYP348ixYtYsSIEcTExDB16lQ+/PBDli9fXuL+hRDPr5KOOOD5HnU8iYcmlm3btrF06VJiY2Pp1KmTuj03NxcLC4tSdWZtbU2PHj0AcHR0xMLCgrNnz0oFSSEq2JNM68iUkCjOQxPL66+/Ts+ePZk0aZJabAtAq9ViZVXyzA93k1ViYiLvvvsuiYmJJCcn079/f4KCgujTp89DK0iOGTOGI0eOEBoaKhUkhRCiknvkVJiOjg5z587lwoULpKWlqVNVUVFRpVrd+NVXX+Xjjz9m79695ObmMn36dBo1aiQVJIUQz7TnbZRX7DGWDz74gPPnz2NjY6NuK+2y+dWqVXvgsRGpICmEEM+OYhPL9evXCQ4OrohYhHiuVORxjn+3f1qqatyiZIo93bhOnTrk5ORURCxCCCGeAcWOWLRaLT179qRZs2bo6Oio27/88styDUwIIf7teTtWUVUVm1jatWtHu3btKiIWIYQQz4BiE0u/fv2IiIjg2rVrdOnShYyMDKpXr14RsT01JS/UBVKsSwgh7io2saxatYrt27eTk5NDly5dWLZsGdWrV2fUqFEVEZ8QQogqptiD99u3b+fnn3/GzMwMgPHjx3PgwIHyjksIIUQVVWxiMTExQav938
O0Wm2R20IIIcS9ip0Kc3R0ZMmSJWRkZLBnzx527txJvXr1KiI2IYQQVVCxQ4+pU6diZGSEtbU127Zto0WLFkybNq0iYhNCCFEFFZtYdHR0aN68OStWrGDJkiU4Ojqiq/tk9cHu3LlDly5d2LJlC3FxcQwePBgfHx/Gjh1LTk4OOTk5DB8+HC8vL8LDw9V2/v7+xMXFPVHfQgghytdjjVhCQkLU2ydOnGDSpElP1Ok333yjngywePFifHx8WLduHU5OTgQGBnL06FHc3d1ZtGgRa9asAZDSxEIIUUUUO/SIiopi9uzZ6u1PP/2UwYMHl7rDyMhILl26pNZ4kdLElYfU6Ci55/V1C/EoxY5Y7ty5Q1pamno7Pj6e7OzsUnf4xRdf8Omnn6q3pTSxEEI8W4odsYwePZpevXpha2tLfn4+CQkJzJkzp1Sdbd26lRYtWuDg4PDA+6U0sRBCFHX7RPeSNxpQ9nGURLGJpVOnTvz+++9cunQJjUZD3bp1MTIyKlVnBw4cIDo6mgMHDnDjxg309fUxNjaW0sRCCPEMKTaxDBkyhDVr1uDm5vbEnf3f//2f+v+vv/4ae3t7Tp06JaWJhRyrEOIZUmxiadSoEYsWLaJly5bo6emp20tTQfJBxowZI6WJxVMjhaeEKHvFJpbz588DEBoaqm4rbWniexUmDJDSxEII8SwpNrEUXkeiKAoajabcAxJCCFG1FXu68YULF+jfvz+enp4ALF26lNOnT5d7YEIIIaqmYhPLzJkz+eyzz7C0tASgR48efP755+UemBBCiKqp2MSiq6uLq6urertOnTpPvFaYEEKIZ1exGUJXV5fo6Gj1+EpISIh6IaMQlYGcqixE5VJsYhk/fjyjRo3iypUrtGrVCnt7e7788suKiE2UkJw6K4SoDIpNLK6urvz222+kpKSgr69PtWrVKiIuIYQQVdRDE8utW7dYtmwZly9fpnXr1vj5+cmxFSGEEMV66MH76dOnAzBgwAAuXbrEkiVLKiomIYQQVdhDhyAxMTHMnz8fgJdffpm33367omISQghRhT10xHLvtJeOjk6ZdHb79m3Gjh3LoEGD8PLyYv/+/VKaWAghnjEPHbH8e/mWsljOZf/+/bi5uTF8+HBiYmJ45513cHd3x8fHB09PTxYuXEhgYCD29va4u7vTp08f5s2bh7u7+3NTmlhOnRVCVHUPTSynTp1SywcDJCcn06lTJ3XNsAMHDpS4sx49eqj/j4uLw9raWkoTCyHEM+ahiWX37t3l1qm3tzc3btxg+fLlDB069IGliQ8ePPjA0sQJCQkMHjyYxo0bl1t8QgghSu+hx1js7e0f+e9JbNiwgW+++YZPPvmkyFX895YmTkhIYNasWQwYMIDg4GCcnZ3RarVMnTqVxYsXP1H/Qgghyk+FXpjy119/YW5ujq2tLY0aNSI/Px8TExMpTSyEEM+QYhehLEuhoaH88MMPACQlJZGVlUW7du0ICgoCeGhp4rZt22JhYUF/IUcxAAAgAElEQVRcXJyUJhZCiEquQhOLt7c3KSkp+Pj48N577zF16lTGjBnD1q1b8fHxIS0t7ZGlif/66y+GDBmCr69vRYYthBCiBCp0KszQ0JAFCxbct11KEwshxLOjQkcsQgghnn2yqmQ5kIschRDPMxmxCCGEKFOSWIQQQpQpmQoTQojHcPtE95I1GFA+cVQFkliEEKKcPW9JSabChBBClClJLEIIIcqUTIUJIZ4LJZ6Ogio/JfW0VHhi+fLLLwkLCyMvL48RI0bQtGlTxo8fT35+PpaWlsybNw+A0aNHk5aWRkBAAO7u7sDdCpJTp0595ot9CVHenrc5f1GxKjSxHDt2jIsXL7Jx40ZSU1Pp168fHh4eUkFSCCGeIRWaWFq3bk2zZs0AqF69Ordv366UFSRLeuU8yNXz4vkhox1RnApNLDo6OhgbGwMQGBjIyy+/zKFDh6SCpBAlVFWPF1TVuEXJPJWD97///juBgYH88M
MPvPbaa+r2eytIbt68mVmzZjF+/HgWLVrEiBEjiImJYerUqXz44YcsX778aYQuhHiKZLRUNVR4Yjl48CDLly/nu+++w9TUFGNjY6kgKZ5L8iUpnlUVeh3LzZs3+fLLL/n222+pUaMGgFSQFEKIZ0yFjlh27txJamoq//nPf9Rtc+fOZfLkyWzcuBE7O7tHVpBctWoVQ4YMwd/fvyLDFkIIUQIVmlgGDBjAgAH3j+elgqQQQjw75Mp7USbkeIEQopAkFqGS5CCEKAuSWESlIElNiGeHJBbxXHvSC/YkIQpxP1k2XwghRJmSEcsz5HldLkNGDUJULjJiEUIIUaZkxFLJyK9vIURVJyMWIYQQZUoSixBCiDL1VBJLREQEXbp04aeffgIgLi6OwYMH4+Pjw9ixY8nJySEnJ4fhw4fj5eVFeHi42tbf35+4uLinEbYQQojHUOGJJSsri1mzZuHh4aFuW7x4MT4+Pqxbtw4nJycCAwM5evQo7u7uLFq0iDVr1gBIeWIhhKgCKvzgvb6+PitXrmTlypXqtspYnvhJyAF4IcTzrMJHLLq6uhgaGhbZdm+NlXvLE0dHRz+wPHFAQADnzp2r6NCFEEI8hkp38P7e8sQJCQnMmjWLAQMGEBwcjLOzM1qtlqlTp7J48eKnHKkQQogHqRTXsUh5YiGEeHZUihGLlCcWQohnR4WPWP766y+++OILYmJi0NXVJSgoiPnz5/Ppp59WmvLEz+uaW0IIURYqPLG4ubmppw/fS8oTCyHEs6FSTIUJIYR4dkhiEUIIUaYksQghhChTkliEEEKUKUksQgghypQkFiGEEGVKEosQQogyJYlFCCFEmZLEIoQQokxVikUoAT777DNOnz6NRqNh4sSJnDp1il27dtGyZUsmTJgAwLZt20hKSuKdd955ytEKIYR4mEqRWE6cOMHVq1fZuHEjkZGRTJw4EY1Gw4YNGxg6dChZWVno6OiwefPmIgXChBBCVD6VYirs6NGjdOnSBYB69eqRnp6Onp4eALVq1eLmzZusXr0aX19fWdVYCCEquUoxYklKSqJJkybq7Vq1anHlyhVyc3NJSEhAq9USHh5O48aNCQgIoGHDhrz99tsPfb78/HwAbty4Uap4crNSStzm+vXrpW7/tNo+zb7vbfs0+66q+0zifrb7Lsu4S6LwO7PwO7S0NEphycanaMqUKXTs2FEdtQwcOJD27dtz+PBhevbsSVRUFH379mXhwoV89913BAQE8OGHH2JjY/PA5wsNDcXX17ciX4IQQjwz1q5dywsvvFDq9pVixGJlZUVSUpJ6OyEhgbfffpv333+fqKgoLly4gJubG7m5uWi1WmxsbIiJiXloYnFzc2Pt2rVYWlqio6NTUS9DCCGqtPz8fBITE3Fzc3ui56kUiaV9+/Z8/fXXeHt78/fff2NlZUW1atUAWLJkCZ988gkAubm5KIpCXFwcVlZWD30+Q0PDJ8q2QgjxvHJycnri56gUicXd3Z0mTZrg7e2NRqNh2rRpwN0pLWdnZ6ytrQHo3bs33t7e1K1bFwcHh6cZshBCiIeoFMdYhBBCPDsqxenGQgghnh2SWIQQQpQpSSxPSUpKCvHx8cU+rqCgoNR9PEnbsmhf1ckssRCl+x6QxPIU7Ny5k7feeov/+7//e+hjcnJy+Oijj5g6dSpQsi+5J2n7pO2zs7MJCAjgs88+K1Gfhe7cucPs2bNZsWJFids+ad/3Wr16Nd9+++0TP8++ffs4ffp0hbd9mn2HhIQ81o+m8mj/PO5vePJ9/jArV65k+PDhJW4niaUChYWFcfPmTRwcHJg/fz5arZbg4OD7HpeTk0NGRgY1a9bk8OHDnDlzBo1G81h9PEnbJ22vKAqXL1+mWrVq/Prrr1y4cOGx+y107Ngxbt26xa5du7h8+fJjtyuLvguFhIQQFhbGqVOnOHr0aKmfZ/369UyaNImdO3eWeBWIJ2n7NPteunQpI0aMYOPGjeTl5ZWo7ZO2fx
73Nzz5Pn+Q3NxckpKSyM7O5urVq6xatapE7XWmT58+vUwiEQ8VHR3NjBkz+P3334mMjCQzM5Pu3buTmprK/v376dixI7q6uhw6dIgZM2Zw8uRJatSoweDBgzExMWHVqlX069fvkX08SdsnbV/YNiIiAhsbGwYOHIienh4//PADb7zxRrF9HzlyhNmzZ5OcnEydOnUYNGgQqamp7Nmzh9dee61c+y4UGRnJ7NmzycjIwMLCgmHDhpGdnc2BAwfUFSEeR0pKCmvWrCEnJwcPDw8GDx7MgQMHMDY2xsHB4ZEX7D5J26fZd0pKCr/++itarZZu3brh5eXF+vXradiwIZaWlo+1z0rb/nnc30+6zx4lOjqaiRMncvbsWVJTU3nnnXdo3749U6dOxdfXF13dx7tCRRJLBQgJCSEpKYklS5bg7OysLmHTrFkzwsLCiIuLw8bGhuXLl+Pv70/t2rVZv3491atXp0ePHmzYsAE9PT1cXV1RFOW+EUR8fHyp2z5p+4iICJYvX86oUaPQ19dn5cqVtGrVildeeYXvv/+eGjVq4OLi8tB9c+LECVasWMGwYcO4desWy5Yto1+/ftSvX5/AwEDMzc1xdnZ+YNsn7bvQ+fPnmTlzJp07d8bY2JgFCxbg6elJ7dq1OXbsGDdv3qRRo0bFPk9ERAT+/v44ODhw7NgxoqOjadmyJTk5OYSFheHo6EitWrXKvO3T7Pvs2bOMGjUKS0tLNm7ciEaj4cUXXyQ2NpaTJ0/y4osvqgvKlnX753F/l8U+f5i0tDQWLFhA+/bt6dq1K/PmzcPe3p6WLVvyzz//8Mcff9C5c+fHei5JLOXk3i/hiIgIABo3boyFhQW3b99my5YtvPHGG+jo6BAcHEzt2rXZtm0b48aNw9nZmaysLM6ePUvdunVxc3Nj4cKFeHl5PfAXQ1JSEmvXri1V29K2LygoQKPRcOPGDfbs2cOYMWNo2LAh165d48KFC7Rp0wZHR0fmzp2Ln5/fQ/dPdHQ0ly5dYtiwYbi5uXH8+HGioqLo2LEjWq2WDRs20Ldv3wfGXdq+/y05OZmoqCg++OADXFxciIyM5Pfff6dfv37k5uYSHBxM27ZtMTIyeuTzhIeHY29vj7+/P/Xq1ePq1aucPHmSIUOG8Pvvv1NQUICzs/MDV+h+krZPs+/g4GBatWrFsGHDsLOz49y5c8TFxeHt7c2aNWuwsrJ66A+DJ23/PO7vstjnj7Jq1Sree+89HB0dMTY25o8//qBBgwZ069aN2bNn8/LLL2Nubl7s88gxljJWOLd/7y97Q0NDYmJiyMrKAmDUqFFERUVx4MABXn75ZerUqcOhQ4dwd3dn586dALz66qvo6uryxx9/0KZNG5ydnfnmm2+Auwe4CymKgpOT02O3/feB+JK2P3z4MABa7d23jrm5Oc2aNePEiRMADBgwgNjYWA4dOkTHjh1xcHBg0aJF9+2nwv2jq6tL7dq1iYyMBGD06NEcOXKEy5cv07dvX4yNjVmzZs0D93Vp+/631NRUjI2N1YOfEyZM4M8//+TPP/+ka9euWFtbs3Hjxoe2LzxrJjMzkz179gB3yz906NCBa9euce7cOfr06UN4eLj6Osui7dPoOycnp0hbjUbDvn37AGjbti2NGzfmr7/+IjU1lbfeeovNmzeTknL/6rxP0v552t9luc8fpaCgAK1WS7t27dTPU9++fTEyMmL//v2Ympri4+PDzJkzH+v5ZMRSRq5evcqsWbPYvn07165do6CgAEdHR+Dum2fz5s3o6emRkZHBnDlzsLa25vDhw/Tv3x8jIyP++OMPrKysiI2NpUmTJpibm5OYmEhYWBhdunTB1dWVefPmERQURHp6Ok2bNkVHRweNRkN2djbp6emcO3fukW1tbGyYO3eu+kVqbm7+2O3nzJlDeHg4q1evpk2bNuoyO7m5uURERJCRkUH9+vWxsLDg6tWrHD9+nK5du9KkSRNmzZpFw4YNWbx4MXfu3EFXV1
cd6ms0Gg4ePIiZmRn29vZYWlpy7tw5jh8/TufOnalVqxbLli3jr7/+wtTUFD09PXUducftu3AfR0dHs27dOvLy8rC2tlbnsa2srNi0aRPm5uY4Ojqip6eHVqtl7dq1vPHGGxgYGLB3717q1auHhYUFt27dYs+ePWRlZWFjY6OOvlxdXdmyZQvGxsY0aNAAAwMDbt26xcWLF3n99dcJDQ0lISGBqKgoCgoKsLa2Vj/Qj9M2NTWV2rVrs3XrVu7cuYO9vX2J+o6Pj+fKlSsoioKVlVWJ2iYmJhIYGMi+ffvo2rWrGre9vT2HDh2ievXqODo6YmhoyNWrV7l9+zY9e/Zk+/bt6o+Xv//+GzMzM/T09ErU/s6dO6SlpaHRaDAzMyvRPouPjycyMpK8vLwS/63S0tKwsbFhxYoVJCQk4Orqqv4gety/dUJCAoaGhiWOu7Dv+fPnExwcXKp93rhx44dOfX/++edcvHiRO3fu4OjoiEajQUdHh8jISG7cuIGVlRU1a9ZET0+PTZs20a9fP9q0acOyZcuwsbGhXr16j/w+lBFLGdm9ezdNmzZl9erVmJubEx4eTnZ2tvorw9vbm/Xr1zNr1ix69eqFpaWluqJzixYtaNasGRcuXEBfX5/t27cD0KdPH/7++2/i4+NxdnbGy8uLqKgoIiIiuHTpktq3oaEhTZo0eWjbevXq8dprrzFr1ixef/11XF1di5yb/qj2UVFRbN26FSMjI+Lj4xkxYgS1a9cG7o52atasiZubGwkJCeoZVIMGDeLixYukpKTQoEED3Nzc+OSTT3jxxRdJSEhg3bp1at+2tra4ubkRFhbGP//8A8CwYcNISkoiMzOT1NRUbt68SVxcHBcvXmT+/Plq28fp28vLi+nTp3Ps2DH8/f3Jz89n06ZN/PTTT6SlpQFgYGCAp6cnO3bsIC4uDrj7a83CwoK0tDQaNWqEh4cHmzZtIjw8HB8fH06fPs2ECRM4dOgQWq1WrV/x9ttvqyMsMzMzDA0NycjIAMDS0pI1a9YQEhLCmDFjOHPmDDo6OuqZPI9q6+npybVr1zhx4gRLly7lwIEDpKWlPXbfTZs25fvvv+fAgQN8++23pKSkoNVq1RHso9rq6uqyfv16srKySExMJCMjQ03KpqamdOjQgV9//RWA2rVrk5eXp/7i9vHxYcOGDfj6+rJ9+3YmTJhAZGSk2r5atWqPbN+qVSsWLlzI9u3bGT16NKGhoUUObD8q7rp167Jq1SoOHTpU4r+Vp6cne/bsYezYsQB069ZN7bPws/Oo9k5OTvz444/s2LGjxHF7enqyd+9ePvjgAwwMDEhKSirxPt+7d+8Da7JcvXqVDz/8kAYNGlCvXj2WLFnC1atX1ftbt25Ndna2Ompp164d+fn5nDx5EoA5c+YU+Qw+jCSWMpCXl8exY8d46aWX0NfXx8LCgosXL2JgYIBGo6GgoICXXnoJR0dHzM3NOXHiBAcPHsTV1RW4W5Tn0KFDtGjRgo4dO7Jt2zZ27tzJqlWraNKkCYaGhuqbKjk5mczMTMLCwtQvRoAGDRrQqVOnB7aFu0kgMTGR1q1b8+KLL2JmZkZ2dnax7XNycmjSpAlvvvkmZ8+e5cCBA+p0X+GXUrt27XBxcWHjxo388ccfrFu3Dnd3d6pVq4aiKOTl5ZGZmUmXLl2oV6/efb92evfujampKevXr+fMmTNs3rwZV1dXTExMuHnzJg0aNCA+Pp7evXuTnp5e5NTHR/UNd7+4QkJC+Ouvvxg5ciTvv/8+Q4cOJSMjo8j01uuvv64eDA0PD2fnzp3o6+tTo0YNqlevzq1btzh+/DiBgYH4+voyadIk3nnnHX788UcA9e/z2muvYW5uzty5c4G7X8qFU6BRUVE4OztjaGhI7969OXbsmPqY4to2b96czMxMtm/fTtOmTalRo4Z6qvrj9J2dnU3t2rUxNjZm9uzZRUaMD2qro6NDVlYWx44d4/bt27i6uqKrq4ubmxvVq1
dXv1z19fXp2LEjGo2GJUuWAGBnZ6e+N9q3b6+enDJ58mTatm3L7Nmz1f1uYGDwyPYpKSlYWFhgZmaGn58fmzZtKnIq+aNe8+XLl9Vf9AMHDizR38rBwYG///4buPtDx8jISJ2SKpwGflT7mJgYnJycMDY2ZsCAAY8d961bt7h9+zbJycmYm5szYsQI3N3dS7zPMzIy+O9//6tuS05OBiAxMZFGjRrx9ttv07lzZ9zc3Lhy5Yoal4uLCy+88AKnTp1ixYoVhISEYGZmRv369QEwMjLi2rVrHD9+nEeRqbAnVFBQgI6ODu3bt1d/ySckJJCeno6HhwdarVb98NaqVQtPT09iYmIIDw/H2tqakJAQWrVqRe/evenUqRN2dnbUrVuXM2fOEB0dzX/+8x/Mzc0pKCggOTkZe3t7/Pz82Lt3L87OziiKgrGxMTo6Og9tWxhTrVq1KCgo4IsvvuDUqVNs2bIFDw8PTExM0Gq1D2zv6OhI/fr1yczM5OOPP0ar1WJiYoKzs7OaNHV1dXF1daV69eqEhISQkpLC6NGjqV69OgA7duzghRdeICoqilWrVpGUlER8fDz169fH0NAQPT09XFxcyM/PZ8uWLeTl5fHee+9hbGzMzp07SU9PZ86cOdja2nL+/Hl27NhB9+7d1df9oL4jIyP5559/MDAw4NNPP+XMmTOcOHECT09PatWqhYGBAfv378fZ2RkLCwvg7ocqLy+PdevWkZCQwLvvvsudO3e4fPkyqampvP322+jp6WFvb4+DgwO2traEhITw6quvoqOjo05VtGrVin379vHzzz+rIyUTExMSExNJTk7m5ZdfJjg4GDc3NwoKCrCxsVFPhihs+9tvv3Ho0CF69+5NgwYNiI2N5dixY7zyyitcunRJ3Z/16tXDxMREfT8+qG9ra2tCQ0M5fPgwvr6+rF+/nrCwMPLz89X30L19//zzzxw9ehR/f39cXV1p1qwZoaGhdOvWje3bt+Ph4YGZmZnarlq1ajRs2JDVq1dz8OBBDh48SP/+/XFycuLSpUvs2rWLVq1a8dJLL9GsWTPmz5+Pubk5rq6u5OfnY2pqqrYPCgrit99+44UXXsDFxYVLly5x+vRp/Pz88PT0JCwsjKSkJOrWrav+aCqMe+PGjWzbtg0PDw/q1atHSkoKUVFRDBw4kA4dOjzyb1W4vw0NDcnPz8fAwIDz58+jp6enjvb//PNP8vLyihwYv7fv3377jXbt2tG4cWMuXrzImTNnePfdd+ndu/cj4y7sOycnB41Gg4uLC1evXsXLywsHBweWLFlS7D4/fvw4w4cPx8LCgtjYWEJCQujTpw9ZWVlMmTKFI0eOkJubi5OTE66urlhYWKDRaFi/fj2dO3fG0tKS/Px8tFotzs7O1KtXjxMnTnD69GmGDBlC3bp1gbs/jgYMGFB8WRJFlJn8/HxFURTlyy+/VDZs2FDkvoULFypnzpxRFEVRbt68qURERCj5+fnKzJkzlREjRiiKoijR0dHK2bNnFUVRlIKCArVtbm6u+n9/f3+1j5deekmZO3euoiiKcv369Ue2TU1NVWbMmKEEBAQohw8fVhRFUWbOnKmMHDnysftWFEWZO3eusmXLliL3Xb16Vbl+/bqiKIpy586d+9revHlTuXjxotK/f3/lyJEjyo0bN5Q5c+Yon3/+uaIoinLx4kUlKytLURRFSU9PLxJDbGysMn78eGXSpEnKiBEjlJ9++kn5/PPPleXLlyuKoihRUVFKTEzMfX37+voqY8aMUf7++2/1vj59+ijh4eGKoihKYmKismLFCmX16tWKoihKcnKycuPGDfW+wm19+/ZVevbsqTxIUFCQMmbMmCLbcnJylJs3byrJycmKp6en0qNHj/vabd++Xfnmm2+UX375RXnzzTeVc+fOKYqiKNnZ2cqtW7cURVGU8PDwB/Z97Ngx5aefflIKCgqUxYsXK6NGjVJ27NhRpH1h3/e2vXz5sjJ58mTl/fffV3bs2K
H89ttvipeX1319PyzunJwcRVEUZdmyZcrJkycfeF92drZy4MABpVevXsrw4cPV+zdu3KiMHz9e2bp1q7JhwwblP//5j+Lj46Pen52drSiKohw+fFjp3r278s033yhz5sxRFi5cqGzevFn5+uuvlfPnzyuKoihnz55VxowZo0RFRRWJ+8SJE4qnp6fadsGCBcrNmzcf62+lKIry66+/Kn379i3S/saNG8rQoUOVfv36Kdu3b1d27Nih+Pr6KqGhoUX6/v3335UePXooixYtUj7++GPll19+UXbt2qV89dVXyoULFx4Zd0FBgTJ79mxl0KBByqpVq4rEV7hfitvnhX3826VLl5T33ntPOXjwoPLPP/8oHTp0UNLS0hRFufv5SkhIUEaPHq3ExsaqbW7evKl+DxT2X/j4kpCpsMeQl5fHmTNngEfXgs7MzCQiIoKzZ8/SuXNnFEVh7dq1REdH8+abb9K0aVMAjI2NqV+/PlqtlokTJ5KRkUFcXBwHDx6872Bb4YhAURTi4+Oxs7Nj+vTphIWF4eTkRNu2bQE4c+YMMTExwN3pDUVRirStUaMGHh4epKamqsPjyZMnP1bfhf8H6NSpExs2bAD+N4Vz6NAh9fiIgYFBkb7h7nSUrq4uLi4ueHh4YG1tTa9evdSh/9atW9VjRqampsDdaTZFUbC1tWXChAn07duXAQMG4OvrS/PmzdVTf0+cOKGeAWNgYADcnUeuVq0apqam/Pnnn6Snp2NgYED//v1Zvnw5gDpKKfx7/vHHH4SGhha5LyoqijfffBONRsPatWvv+/ufP3+e7t27q7fT0tIIDg7mzJkzXLt2DV9fX7RaLT/99BPwv7N7evbsyciRI+nbty9NmzZl27ZtAOzatUtd1kOj0dzXN9ydj79y5Qrbtm0jODiYmJgYatSoAdxdKuj06dNq3xqNRp3HNzU1xcnJiRs3btCjRw969epF06ZN1WNqhX0/KO68vDz09PTIy8sjPj5efR35+flkZGQQGBjIuXPn1NNje/fuTV5envq6XnvtNXr16kVYWBiRkZF89dVX1KxZk61bt5KXl8fmzZs5d+4cZ86cwd/fn5EjR9KpUycyMzNp1KgRt2/f5vTp0+Tk5ODm5oZWq1Xj3r17N6dPnyYsLIyRI0cycuRIXn31VZKTk9Fqter79lF/K4DY2FiGDh3K8OHD6dy5M8nJydSsWZNRo0bRp08fevbsSY8ePWjatKk6Bblr1y5OnTrFsWPH+Oyzz/jggw+oX78+ubm5tGnThry8PE6dOkV2dvZD487IyODo0aP4+fkxaNAgAPXsRH19fXJzc7lx48Yj93nDhg3ve2/C3Sm7jIwMOnTogIuLC+3atVOv7NdoNFy5cgV9fX1sbW2BuxdH3rhxg2vXrqn9w/8uLSgJmQorRn5+Ph9//DGrV6+md+/emJiYPHBHb9iwgS+//JILFy6QkpJCWloaK1aswMjIiE6dOhU59/v06dNcuHABExMTNm/eTEREBEFBQdy6dYvExEQaNGig9lM4n6vRaDA0NGTNmjU0aNCAL774AlNTU0JCQrhy5QpbtmwhMTGRc+fOqe2Ve66l0Wg0ODs7ExMTQ0xMDIaGhgQFBXHx4kV27979yL4L2wPY2Nhw9uxZbt26RcOGDfnvf//L7t27iY6OJj09HQcHBwwNDe+7mNLY2Jhly5ZhZ2eHs7MzGzZs4J9//mHbtm3k5eVhaGioti3su7C9kZERdnZ26Orqoq+vz6ZNm6hZsybNmzenSZMm91UT1dfXp3fv3lhbW7N3715sbGyws7OjefPmrF27llu3btGiRQvCw8PJzMzkxRdfxNXV9b6LKWNiYmjfvj0vvPACM2bMYMiQIejq6pKXl6cux9OvXz/Onz/PvHnzqFu3Li1btqRevXpcv379vrZ6enrk5OSQlJTEjRs3qFWrFikpKWRmZtK2bVscHR3VKYcHtddqtURFRfHLL7+QmprKiBEjcHBw4OLFi7i7u1
O3bl3q1q1bpO3MmTMZMmQI1apVw9DQkNTUVGJjY3FzcyM9PZ2bN28W6ftB/erq6pKbm4uenh43btwgMDCQPn36oNVqMTAwwNHRUS28Fx0dTevWrXFyclJXbTA2NsbJyYlXXnmFl156CYB//vmH1q1bY2Njg4ODAw4ODly7do1GjRphYWGBhYUFK1asUKcjL168SFJSEo0aNSI1NZXq1avTuHFjateuTb169Yq0NTc354cffqBnz57qD5BH/a0Atb2lpaXa3tPTk7p169KiRQuysrLQ09MjKSmJO3fu0KZNGxwcHKhfvz4pKSm4urqSnJzM4sWLKSgowNTUlNzcXG7dusX169dp3LjxfXHXqVMHIyMj8vLy+PPPPzE2Nmbu3LkcPHiQ5ORkzMzMMDc3JzY2li1btjx0nxe6fPkyAwcOpFOnTpiZmWFsbKxOHTVbjvoAACAASURBVM+cORNDQ0P27NmjXudy8uRJjIyMMDU15ZNPPiE5OZmePXvSoEGDIs9b0qQCkliKlZCQQHR0NObm5vz555+8/PLLQNGdnZOTw4oVK5gwYQKdOnVi8eLFREdH88UXX9C7d+/7roKNiYnh4MGDrF69mvz8fHJycpg0aRL9+vUjNDSUHTt20K1btyJ95Ofno6urS7du3fDw8ADA2dkZNzc31q9fz4QJEx7avnAEo9Vq1S/PX3/9lYSEhMfq+16FX4z5+fnUrl2bVatWMWPGDJo3b05oaCihoaG0b9/+vtj19PQwMzPjyJEjLF++XD1YOXv27Ee2LXTt2jV2797NZ599Rs2aNdUlZx5EV1cXHR0drK2tOX36tDrSq169Oi4uLuzfv58NGzYQFxfHO++8Q82aNR+4qoCdnR0GBgZYWVnx559/EhoaSqdOndSEO2fOHM6dO8fJkyfx9PSkQ4cO6q+8h7XNz8/nyJEjrFixgqNHj7Jv3z6GDh2KtbV1kffJv9ufPHmSTp06YWRkRNOmTfHz88PBwYH8/HwaNmxYpP3D2tasWRMnJye+++47Tpw4we7du3n33Xcf2bYwbo1Gg0ajUS+YdXFxUU+nLzxmAHfPhjIzM6N27dqEhoYSGRlJ69atURSFyMhIpk2bxr59+/j/9s4zIKpra8PPIL1XaRJEsCAWrChYsURIYoslRk0xXtu1xRRNopIqYowx0cSuGFARUOxYY0cFUVEUFBAUBAFRlCoMnO8H3+zL0E3M9cbM80tnznvWOmcPZ5+99tprX7t2jaFDh2JgYIC2tjYymUx0DADnzp0jIyMDb29vbG1t0dTUZNWqVVy+fJkTJ07w7rvvYmpqioaGRo3arKwsvL29ldrq+vXrXLx4UamtFO1ek97Lyws1NTWuXbvGL7/8wpEjRzhx4gTvvfeeuGeK1GV9fX2Sk5Pp2rUrTk5OREdH8/jxY3r27Fmj34rfiUwmo3379gQGBnLt2jXGjh2Lu7s7sbGxXL16lR49etCuXbs677mCK1eucOTIEXJzc+nTpw/q6ur06tWLzMxM2rVrx1dffYW2tjZHjhyhS5cuJCYmsnbtWm7fvs3kyZOfqQRSvTxT4OwfQFZWlvTNN99IW7dulVJSUqScnBzp9OnT0v3796VRo0ZJMTExkiRJklwuF5rk5GRpwYIFUlZWliRJknTjxg1pwIAB0qlTp6Ty8nKlWGVl0tLShPbBgweSJFXEMhVaSfpPLLUy5eXlYj4nOTlZmj9/vpSTk9NgvSRJ0pMnT/6QbUmSlOYBPvnkE3Hs3bt3lWLQNelLS0ul27dvC215eXmDtZJUMZ/z5MkTKTExUcR9a4r/Kj6Lj4+X5s6dK125ckXKzs6WHj58KElSxbyOJEnivtWG4j5nZmZK3bp1k548eSL+7+PjI61cubLBWkV8+86dO1JcXJy0bdu2Z7Kt0Ofm5ko5OTl1xr2rahVzV8XFxVJaWpr4HT/LNSti8VeuXJEkqf64e0xMjDR8+HCpqKhIkqSK+bMTJ05IoaGhtWoUf1erV6
8W85Tl5eVSXl6eFB8fL0VERDyTtrS0VLp//760ePFiacmSJXX6W5NekiTp7t270qlTp6SQkJA69ZU5cuSImP9MSkqSDh48WKfNmJgY6fDhw+LzU6dOSYsWLRL3/ubNm9W0irklxW84KChIunjxojRw4ECl+7Rx40bpu+++E/8fNWqUlJCQIB09elTas2eP0jkVbf9nUY1YKpGSksKcOXNo27YtGhoaBAQE4OrqSocOHdDX16egoICwsDA6deokMp6gYuX23r17RZjHwsICAwMDNm3axMiRI0UWiqJiqJ6eHjKZDENDQ4yMjPjtt9+ws7PD3t4emUwmtCNGjBBamUxGSUkJN2/epHHjxiKDw8jIiICAAJo0aVKnvrS0lOzsbPGWr6Wl9Uy2i4uL2bt3L7q6uqLIXePGjfnxxx9p1aoVdnZ2GBkZUVxczOHDh3n11Vdp1KgRkiRRWlpKTk6OyD4zMTHB0tKSZcuW4ezsXKdWMdpSjCbCw8P5/vvviY+PJy4uTil0V3nEofi3ubk5GRkZbNy4ke3bt4tspIiICPz8/IiIiMDY2JjGjRuL+aiq5yktLcXQ0BC5XM6CBQsIDw8nLCxMZAw1btxYKWRZlzYmJoaUlBT27t1LVlYWxsbGmJmZoaGhUa/thQsXcuDAAUJDQ7ly5coz+33gwAGCgoK4fv06LVu2pHHjxmIk0hC/b926hb+/P3fu3MHa2hoLC4s6wySWlpY8ePCAH374gbCwMMLCwrh37x7du3ev1bbiPh4/fpz27dtz9+5dZs6cSWhoKImJibRp0+aZtIsXL+bmzZskJiby9OlTjI2NMTc3r/F+V9XfuXOHmTNnivVN7u7utdqGiqobt2/fxtbWlhMnTpCZmUlhYSFr167lzp07Nba1wqalpSWOjo7k5OSgq6vL8ePHefjwoSjCqgilK3QnTpxg3rx5PHz4kG3bttG+fXvc3d2xtbVFS0uLzZs3M2LECCRJoqCggIsXL1JeXk5KSgqJiYn0798fV1dXpfmZyuHnP4uqY6FiBXejRo3IyckhKyuLDz/8kHbt2vHw4UOlKrlRUVHs37+fc+fOYWpqirq6OllZWfz666/069ePgIAARowYQXl5OY6Ojpw5cwZra2vU1dX56KOPCA4OJioqClNTU+zs7JTmMfz9/WvVBgQEYGtry08//cT27dsZPHiwmNhT5OTXpLeyskJdXZ1PPvmErVu3EhUVhbOzs0g7bqjtmzdvsmTJEvbs2cPbb7+NoaEhpaWlqKuro6GhwerVqxk9ejSSJKGnp0dsbCzNmjUjLy+POXPmsHfvXlE1WbFaXE1NrV7txo0b6d69u/ixVw45NjR0l5aWxpo1a7CyssLX15euXbuSk5PDxo0bmTt3LhYWFhw/fpxHjx7h4uJS43kaNWpEamoqhw4dorCwEG1tbfz8/IT24cOHtGnTpl6tXC5n/PjxXLhwgblz52Jubs7x48fJzc1tsG0dHR38/PyE9ln8rknbEL9LS0sZOnQo0dHRzJ8/n9LSUvbv34+hoWG1OH9lEhMT2bVrF0+fPkVTUxNfX98G2S4sLCQ0NJQTJ05w69YtZDJZg/2urL1z5w4jRowgOjqaefPmYWZmJhaW1nbPKusTEhKeyXZqaio+Pj6cPHmS9PR0Ro0axa5duxrc1snJycydO5eDBw+Snp7OBx98IMJzChS6jRs3MnToUCZNmkRubi7BwcF4enqiqamJi4sLoaGhPH36lHbt2tGkSROMjIw4ePAgKSkpzJo1SyllGlCaT30e/KOzwuLi4pg0aRLLli3j+PHjlJaW8vDhQ5GB8e6775Kfn8/u3btJT08nJiYGX19fMjIy8PPz48CBAzRv3pwBAwbQtGlTUddKkiQ0NTUxMTHhlVdeISkpiczMTGbOnEmXLl3Ytm2bkh+Va2JV1Zqbm2NoaEhOTo54e6+60Ks2vSLenZaWxrhx4/joo4+qTXTXZdvQ0JAjR47w+++/s27dOqZMmc
KNGzcAREx+9OjRAKLKqpmZGU+fPsXGxoaUlBTS0tKYNWsWnTt3JiQkROkPqi6tnZ0dpaWlJCQkiOPT09MxMzOjcePG6Orq8uGHHxIbG8vp06eBiheEqhQUFDB37lyWLVuGra0t5eXl3Lt3j7S0NBwcHBgwYAAeHh4kJCSIrLCa9rSIjY2lX79+fPnllxQUFODg4ED//v3x8PAgMTGxQdrt27djYmJCWloaTZs2fS62/2pt37592b59OxYWFuTk5GBnZ8fo0aNxcXEhKipKrPauKVvy0qVLeHt7M3/+fGG7IdesmOx/8803mTJlCvn5+Q32u7JWkX2muN8DBw5skP6VV155ZtuSJNGhQwcCAgKYPXs2q1evRldXt8G/M0mScHBwwM/Pj2nTprFq1SqxgBogIyNDZH3K5XIaN27M48ePAZgwYQJaWloi4wxg3rx5BAcHk5CQwPr162nbti1ff/01ixcvxtHR8S/fHfUfO2LJzMxk2bJlDBs2jNatW/PDDz/g6enJvn37MDAwEJkR165dIzw8HHt7e27evCnScxWL+CwtLbG2tsbAwABXV1eOHj3KxYsXOXjwINHR0WL1eXl5Oc7OztjY2HDv3j3atm2Lrq6u8Kdly5ZK2uzsbDIzM7l79y69e/cmKyuL5ORkpk6dSkhICB06dMDExESMWqrqL126hIaGBqamply8eJF+/fphaGhIdHQ02traGBoaiod8Tbbv379Pbm4uw4cPp2PHjjRt2pSIiAjatGmDpaUlkiSJ4XPLli3ZsmULDx484MCBA8THx6OhoUFZWRmJiYm4u7uLlcyurq5KGWtVtaWlpXh4eKCuro6lpSX5+flYWlqKOlENCftV7rzMzc1FXTO5XC4m9U+ePIlMJqN58+YYGBiIzqZ9+/ZKk+hFRUVoaGjg5OSEk5MTlpaWnDhxQiRC1KWVy+VIkkSLFi1wdHREJpNhbm7O6dOnUVNTq9f2gwcPRD2pZ7X9Z/0+efIkAwYMEH4bGBgQGxuLvr4+dnZ2GBsbExMTg0wmw8nJSalciSJrzsXFRSQWKGzXd82K37NioWHltqrP79LSUlHlwtnZGZlMhoWFBadOnWqQ7WvXrmFpaYmHh8cz21ZEAHR0dMTiQ0VbN+R3ptAbGBhgZWUF/Cc8lZqaypQpU4iOjuaNN94QCQVyuRwbGxv09fUxNTVlw4YNDBs2jEaNGmFlZcWqVas4cOAAPXr0oGPHjkpLB573CKUq/9iOpaioiK1bt/Lxxx9jZ2dHfn4+ycnJtGjRgpCQEF577TU0NTVJSkoiLy+PiRMnYmVlRV5eHkuXLhWVb/X09NDS0hI/qG7dulFcXIyJiQmurq7Y2trSv39/+vTpQ1FRET4+Pujr67Nv3z60tbVFemlV7bx587hz5w6Wlpa0adMGOzs7Dh48SMeOHTE2NhbzHYohbU227e3tad++PatWrUJLS4tDhw6Rnp7OoUOHxB9LbbZTUlIwNTUVKaEymYyIiAhSUlLo1q2bUnzYysqKzp07k5mZKWzb2NjQv39/goODSUtLY9GiRbRs2ZL9+/djaWkpqhRU1X7yyScEBwezZs0akpKS8PDwwNDQUCn+W1/oztHRER0dHX7//Xf8/Px48OAB+vr6mJubi45HkiR+//13+vbti6GhIQ8ePOD27dt069aNx48fs3DhQoKCgrh165aYE1I8MIE6tf7+/hQWFvLdd9+RkJCAiYkJjRs3Ri6Xi/mQuvSbNm1ixYoV5Obm0rp1a6Xspfpsz58/n6CgINLT00UhwcrU53dJSQmLFy/ml19+YdCgQSK2r8gGTEhIoHPnzpiZmZGUlERKSgo9evQgIyOD5cuXExISQklJCdra2piYmIiHY322Fy5cyM6dO3n06BH6+vqYmZkJbX33y9/fn7y8PBYtWkR8fLy432VlZWKuri79V199xY4dOygsLKRTp07CZkN+Jwrb33//vfidKSplNMS2v78/Xbp0qfFBr/jsypUrmJ
mZkZqaKl7G1NXVOXnyJObm5lhZWWFvb8+hQ4fIy8ujdevW+Pn50aJFC5YuXUqnTp2Uzvu85lHq4h8ZCpMkCR0dHbp168apU6cAGDNmDCkpKbRo0QInJyd++eUXkpKSiIiIIDMzEwMDA7p168bkyZPFwj4jIyPR+Hfv3uWnn35CX1+f1157jdGjRxMbG6tU/trGxobNmzfj6+vLgAEDxCZfKSkprFixQkkLFYseFQXiFLWDFAUkjxw5wt69e4GKScOq+tjYWLFocfr06YSFhTFjxgy+++47Bg4cyI0bN0hPT6/V9rVr14RtxZC9b9++yOVy8vPzxTUlJCSwbds2mjRpwujRoxk/fjyxsbEiPr169WoMDQ0JDw/H19cXNzc3zp49S15eXjXtuHHjuHnzJufPn2fx4sVi8eCZM2eEvbpCdxYWFhgaGnLv3j1OnTrFli1bmDhxIgALFixQelNT1F9ShBXd3d3FIrxbt24RExNDv379sLe356OPPgL+syC0Lq2ZmRmxsbH8+OOPzJgxA3NzcxYtWtRgvWKxY2FhIffu3VNaMFmf9unTp9y4cYNu3brh4eFRLdxRl1ZDQ4Pjx4+zbNky3n33XRYuXCgWqwIYGhri4uJCcXEx+/fvB2DQoEFERkZSUlLCuXPnOHXqFB07dqSsrAxfX1/gP+HaumzHxsYSExMjyoQo2qohWjMzM+7evcvKlSv59NNPRXvJZLJ673dBQQE//PADp0+fpnv37syYMUNoGnK/zczMSE5OZuXKlX/od2ZmZoYkSZw8eZK6UFNTw8vLi3feeYdt27Yhl8tp164dzZo14+LFi0RHRwMVJfTt7e3R0NBgypQpzJs3DyMjozoXdf9VvPQjluTkZH744QeKiopo1KgRpqam4keTmJhIdnY2dnZ2mJiYiHLoX375JRkZGQQFBYkKssOGDRNvzTt37qRRo0Z4enpy7do1oqOjadmyJQ4ODkr1fLS1tdm1axfe3t7i7UUul6OpqYmpqSm7d+9m8ODBIu+/shYqctXDwsLw8vLCwMCAdevWsWHDBvLy8vD29hZltg0NDavptbW12b17N15eXjg7O7N3715RG8nMzIw9e/bw+uuvY2xsXKdtb29vkXOfmppKUlIS7dq1E2G84uJi7O3tlbLktLS0hG19fX02bdqEkZERrVq1wsTEhLCwMAYPHkxZWRl2dnYYGhqyYcMG0tLSKCoq4sKFC7z99tu0bt2agwcPcufOHZo1a4aRkRFQc+iuX79+6OjoYGFhQW5uLsXFxWRkZPDOO+/g6urKiRMnyM7Opm3btqipqaGvr0+TJk1YsmQJNjY2HDhwAB0dHdzd3YmLi6OoqIhevXoxaNAgDh48iKOjowhRGBgYKGnDw8PR1tbG3d1dLEI0MzMTWzrr6enRvn178QZcVa+w7ebmhp6eHvb29oSHh+Ps7CxK81cuqlmb35mZmZw8eZJRo0bRrl07nj59iiRJYsFqXX4/fvwYR0dHPD09cXNzY/ny5bi6utK4cWMxylMkrKxcuRIHBwcOHDiAlZUV3bt3F1tFvPbaa/Tu3Zt169aRl5cn3par2laUdu/duzcZGRnk5ubSr18/vL29622r8PBwNDU1cXBwEO2dlpbGhAkTaNeuXb3ttW/fPmQyGX379hWVC3r27Im5uTnx8fFoaWmJdSK12XZychKjquTkZKZMmUKHDh0a5LuivbS1tWndujW5ubnY2NjUOpKoujYoISGBrl274uDgQG5uLoGBgURFRXH+/HlGjx6NsbGxWBiqSJT5b/NSdyxXrlxhwYIFuLu7U1BQQFBQEB4eHujo6Ii3IcVDpFWrVrRq1YpVq1bRp08fPDw86NKlC97e3qSmplJSUiLmXRITE9HV1eXw4cPs2LGDXr164ejoKB58ih+IpqYmCQkJlJSU4OTkRFpaGgsXLqRZs2bizaNXr17o6upW01bVN2/enMLCQgYOHMi0adPo1KkTMTExYoK/JtuKFMvmzZ
vToUMH1q1bh62tLaGhocjlcnr27Imenl69thWVTS0sLNiwYQPW1tY4ODgAFW+ylTsVqOhYKtvW1NRk48aNuLq6smPHDjQ1NXF3d8fU1FRob926haWlJT179mT79u2ioqqiiqpcLheTmZVDd8bGxnz22WccP36cdevWMXz4cGxsbEhLSyMvLw9ra2uMjY2xt7cnICCA7t27Y2BggEwmw8TEBGdnZ3bv3s358+dZuXIlBgYGGBgYiPmFhw8fEhkZydixY5XmEUxNTWnZsiWJiYlcvnwZNTU1hgwZIu7doEGDSElJ4d1330VfX5+tW7fSv39/UfG6JtuKbDtdXV0ePHhA7969iY+PR01NjUaNGqGvry/StWvyW11dnZSUFEpLS/npp5+Ii4tj7dq1vPHGG8JuVb9lMhlDhgzByMgIBwcHbG1tkclk5OXlkZ+fr7QHibq6Oq+88orY+OzixYssW7YMbW1tbt++TW5uLra2ttja2pKcnCz281GU2jE1NaVVq1YEBQVx5swZDAwMMDU1FXNRinnDmtpKoU1ISODcuXM8fvyYy5cvo6Ojg7OzM6NGjUJTU5OcnJxa26sm287Ozty8eZO4uDgx2b127VoGDhyIrq5ujbafPHlCTEyM2K+krKwMGxubWn9nlfW5ubl88sknYjSoqamJra2tmBOrmgJdGTU1NSwtLfH39xcvlHZ2dnTo0AEjIyO++OILMeJV8N8Ie9XES92x3L17F5lMxuTJk3F1deXGjRscOHBA1AyytramoKBAlFNJSkriyZMnvP7662hoaKCnp4euri7Z2dlcvnwZZ2dn9PX1xRyAt7c3Pj4+taZcKh4Qly9fpnXr1tja2pKRkUFkZCQGBgYsXLhQ/NHVpb906RLt2rXDzc1NPNABevfuXS2GXlV75coVWrVqhb29Pc2aNRObTC1YsKDG1bu1+a6npyf2Dqn8kK/PtouLC66urhQXF3Pq1CmMjIz47LPPqm27qsgMU6wZ2rVrF9u2bRNVhPPy8ujYsSOJiYls2bKFnj170qJFC9q0aQMgSu5raGjg4uKCXC4nIiICMzMzbG1tsbKyIioqihs3btCrVy9SUlKIj4/Hzc2NyMhIMjIy0NPTE6uoFW98OTk5nDhxQow6oWLkduPGDdzc3OjYsSMxMTFcu3ZN2DYzM0NdXR1jY2MGDRrEyJEjuXDhApGRkfTp04fk5ORqtvX19UVp+oKCAkJDQ3nvvfeIiIjg119/RZIkPDw86vQ7Pz+flJQUbty4wfTp0xk3bhwxMTFcvHiR3r17K9mt7LempiatW7dWmg85d+4cjRs3xtHRUcwtJSUlkZOTQ+fOnYmOjub+/fvo6uri7OyMgYEBWVlZBAUFsX//fpo3b05RURG3bt3C3d2dlJQU4uLisLOz48yZM6xZswZDQ0MuX76Mra0tsbGxdbZVXFwcbm5u4rslS5agp6fHhQsXaNeunUjQqKu9Kts2MDAgMjKS1q1bo62tTUREBPPmzWPMmDHcunWLqKioavespKSExMREfvjhBx49ekRERAQDBgwQcx31+d6xY0d69uwpNjqTyWTs2bOHJUuWcPnyZaytratlbVal8tqgyMhIrK2tadu2rXj5q9yGL5IX78FzpGpMOTMzU2mLzo8++oiYmBguXbokPhs4cCATJ07k5s2bnDlzhsmTJyvtba6mpoanpycGBgZim9uPP/6YY8eO8fbbbwO1F6ZUU1Ojb9++Stp//etfzJ8/X2wgVHnDrdr0hoaG/Pjjjw265ppsr1ixAoDOnTvz1ltvMWPGjAbbrqwHGDlyZK170NekXb58OVCxsdHChQuZOXNmjba9vLw4efIkJSUleHt7s3r1alauXMmcOXPo1q2b2LukSZMmYmSwYcMGQkJCSE5OxtDQkNmzZxMSEkJeXp7Y9+XixYvcunULqNhX4969e0iSxMaNG7l8+TIpKSno6+sze/ZsgoODKSwsVNoM6ty5czRp0gQNDQ3S09NJTU
0V2x+HhoaSnJws9CEhIeTn54uQl/T/OycCfPrpp0RHR1NSUoK/v38129u3bxd7eRQVFWFmZsb06dOJjo6mS5cuItFi8+bNtfrduHFj2rRpQ1lZmSg2+Nlnn3Hx4kVKSkrIysqq0e/g4GDy8/OVNh2zs7MTBSwV8w6rV6/m6NGjJCcnY2BgIK5ZEa4bP348CxcuZM6cOUyaNInJkyeL+biNGzdy/fp1rl27hpqaGqampvTr148LFy7g5ORE27ZtiYqKqrGtMjIyOHPmDHv27OHq1ati1DZgwABiYmLEfautvTZs2FCj7UuXLiFJEq+++irz5s0TEYmZM2eKe5aZmcnp06fZt28fDx8+xN7eHgMDAzp16oSenh42Nja0atWKK1euiP1WqvquyGpUoAhPJSUlcfDgQXx8fGjXrh0bN24UG9bVRmJiIteuXUNbW5tx48ZVK19feZT2InlpRiwbN24kKSkJFxcX8TbQtGlTVq5cSfPmzbGxsUFNTQ11dXUOHTrEq6++ysOHD0WBODc3N7y8vMSEWuUfgq6uLq6urqxfv16krCqqmKqpqdX5hlBZW1ZWRnl5OdbW1uLBWt/bhUK/bt06ysrKkCRJvJ1B3UPdqrYVWkVn9EdtK/TPYru8vBwbG5tar1sRulOE/eRyOYGBgchkMgIDA2nbti0dOnQQNcegInHA2tqa9u3b4+rqSosWLYiNjeXKlSv06NEDe3t74uLiOHv2LE2aNGHv3r1YW1uLnSydnJyqaa9evSomvtXU1Dh8+DDdunUjMjISX19fWrVqhYeHB2lpaTXaVujlcrmwraurS1hYGEZGRvTr169W2zExMXh4ePD06VMOHz5M9+7d8fHxQUdHh+vXr+Pk5EReXl6NWsU1m5ubI0kSZ8+excLCgv3792Nqakrfvn1p0qRJvX4rJp6bN2/Ovn37RFVqqKgw4ejoWKvfUBFyunHjBpIkERgYiL29PR06dCAzM5NmzZrh6elJ3759RT23s2fP8uqrr+Lk5FRrW9nZ2ZGamoqVlZXQa2ho0KhRIzFq0NfXRyaT1dheiuzLqrYjIiLo37+/qGBw48YN5HI5O3fuxMTEROmeWVhY0L9/f9zd3bl9+zY+Pj7o6uoSHh6Oo6MjRUVFnDt3Dhsbm2q+V40qKJ4vCQkJ/P7770ycOJE2bdpw7949kpOTMTc3x9TUtMbRx9GjR3F2dubzzz8XVYn/F3lpRiz6+vpiGKmmpkZJSQlaWlqMGTOGlStXijfQjh07is7j0qVLokS1IiRVW4loXV1dli5diomJCatXryY/P18UoasPhdbY2JhVq1aRn5//TOUTKut//fVXpaysP6KtqRzFX6F/luu2sLCgdevWXLhwgczMTDQ0NLC0F/pSIgAAEd9JREFUtOTo0aOYm5szadKkan9ksbGxYvc7Rfu98847REVFER8fj6mpKWPHjsXNzY1ff/2VgoICxo8fX6f2woUL3Lp1S7z5FRcX8+GHH5KcnMxvv/1G796969XfvHlTxN7T09OZP38+OTk5YpRamzYyMlL47evry1tvvQVA165dmTp1KhYWFvVes76+PsOGDcPb25t9+/Zx//59pk+fXu89U1y3urq66Py9vb3FIrz6/FaMNAoKCsjKymLhwoXIZDJRteL69evcvn1bSfvkyRMKCgrQ0NDAxMSEd999t8a2qkuvyM5U/KaKi4uZPXu2UnvVp4WKJJ/z588za9YscnJyar1n8J/sziVLluDh4cG9e/fw9PSkc+fOrFmzpprvCo4fP670fxcXFxwcHMQoRbEkIS4uTikrDv6TmTlq1CgGDx4M1L2FxwvnzxQa+19i/vz5YtOmygUiHz9+LE2dOlVavXq1lJaWJv3222/SwoUL/5Styuf/b2pfdtvZ2dnS0qVLpc8//1x8VrkoXtUCeWfPnpUmTJggClYqvt+0aZP02WefSXfu3JF+//13SZIkpU2fysvL69R+/vnn0p07d6Rjx45JR48elR
ITE6v5UJd+3rx50t27d6Vjx45JkiSJIp+K4xri97Fjx6oVL63P788++0xKSUkR11xZ3xC/K193WVlZtU3enuV+Vy7uWZPfkiRJW7duFZu9Xb16VWxYVrWtarJdk37fvn3SkSNHqrVXfdrY2FhhW7HZW1337OnTp8LH9PR0acqUKVJBQYEkSZIoGlnZ9zNnzkiTJ0+WWrZsKYqfSlLFs8nf319atmyZOLe/v7/k5+cnzh0cHCz9HXlpQmGV02PV1dVJSkpi2rRpaGpqMmzYMLKysti0aRM5OTlMmDBBhAz+SNbEn5kc+7MTay+z7ZpCd4pte2s6R9XMNUVbGhgY8N1333H+/Hk8PT1F2XX4z6rjurTffvstERERDBo0CA8PD0xNTcWcicKHhtpW7DNfWd8Qbf/+/UWJdAUymaxe7YULF+jXrx+2trbijbfyuor6rlthu0mTJkKj+Dt5lvuto6NT5zWDcnblzp076dWrF3Z2dtXaqia/q+pDQkLo378/PXv2rNZe9WlDQ0OFbUWljLraunJ2Z1BQEHK5nB49eqClpaUU+cjPz2fJkiWcPHmS999/n/bt24vkEKgYPcnlcm7dusWjR49o1aoV1tbWrFu3jqFDh1JeXk50dDSFhYViru5vwwvpzv4C7t+/L3311VdSeHi4JEmSFB0dLR04cEDpmIyMDPHvZ91qU8V/j+TkZOnAgQPS5MmTld5eq1JWViYFBwdLPj4+YsuC6OhoacyYMdKaNWvqtPFntC/S9svgt2JU8OWXX0ouLi7Sli1bnsnvZ9U/T9uZmZmSJEnS2rVrpa+++kpavnx5teMLCwulyMhIKT09Xal8/fvvvy/FxsaKc0qSJBUVFUmnTp2SXn/9dSkiIkJatmyZtHjxYjHazMnJEdtH/52QSdJfXI3sv0R5eTk7duzg+vXrTJs2TSltT1HaofKx/wspeSrqpmq71cSDBw/YvHkzjx494ttvv6W8vJzc3FxMTU3rPcef0b5I2393v3Nycli0aBEFBQWiFtyz+P1H9M/L9sOHD/nuu+8A5eeI4t+7du0iODiY7t27M336dKWISGBgIPr6+jVmVR47doyrV69SUFDAnDlzxALkkpKSaqn5fwdemlCYTCbD0tKSuLg4Tp06haenp/iuaifyohYNqXg2niV8tm7dOuRyOeXl5Tg4OFBWVqZUz+x5a1+k7b+7338ku/LP6p+n7ZqyO8vKyvj3v/9NbGwsfn5+9O/fXzxnpP8PJda3NsjNzY3evXsrrXP5X0kfflZemo4F6k/NVfFyoqGhQdeuXcnPz2fLli306dNHbHf7V2pfpO2XyW8dHZ0/5XdD9c/TdmBgIH369EFLS4v169dTWFiIkZER9+/fZ8yYMaSmpnL06FEMDAxEpY/MzEy2bt3K8OHDlRaiWllZYWxsrNQR/d0jKi9NKKwyitWuu3fvZunSpWKyTMXLT0PCZ3+F9kXaVvn9Ym0HBQVhZ2eHh4cHQ4YMwcXFhSdPnmBqakp2djZ9+/Zl1KhRALz//vuMGDGC11577Q/b/jvwUo1YFBgbG9O8eXO8vLzqLFui4uXjZc6a+yu0L9L239Xvqvrt27dTXl5Ohw4dsLa25uTJk3z55ZcMGTIEDQ0Nrl+/LgpJlpWVUVBQQNu2bf+U/f911F+0A38lf9f4pAoVKv4+eHl5sX79esaOHUufPn2ws7PD1tYWAFdXV/bs2SOSGwYPHqy0wdfLyt87kKdChQoVLxhHR0fs7e05evQoUFFnbcOGDSQmJrJp0yb09fXFJL+iU3kJZyCUUHUsKlSoUPEnUJQjioyMJCsrC01NTdTU1Fi/fj3a2tr4+flV21riZc9MfSkn71WoUKHiv0lN61xKS0vFCOWftnZO1bGoUKFCxXOgsLCQd955h+HDh+Pi4kL79u1FheKXfYRSlX9OF6pChQoVfyGVK6Arqnk3atToH9epgGrEokKFChXPnT+7TubvjqpjUaFChQoVzxVVKEyFCh
UqVDxXVB2LChUqVKh4rqg6FhUqVKhQ8VxRdSwqlLhw4QJjxoz5r9tdv349zs7OZGZmKn0+a9Yshg0bxv3799m7d69YwfwslJSU4ObmxsKFC+s87uTJk+Tm5gLw4YcfVvOlMp6enty5c+eZfWkI8+bNIyQkhOzsbGbOnAlAZmam2Bu9Mjt37iQkJKTB546MjGT06NGMGzeOcePGkZqaCkBMTAxvvfUWY8eOZeLEiTx8+PD5XMwzUrmNx48fT1lZGStWrODHH398If6o+GOoOhYV/xPs2LEDJycndu3apfT54cOH2bZtG1ZWVqxYseIPdSxHjhyhcePGhIeHU1xcXOtx/v7+PH78GIAff/zxhW+5YGFhwc8//wxUdPjnz5+vdszw4cMZOXJkg85XXl7OnDlz8PPzIzAwkIEDB/Lrr78CFZ3Z559/zpYtW/Dw8HhhD/LKbRwQEPCPzqz6O/NSF6FU8ccoLy/Hx8eHuLg4NDU1WbNmDXp6eoSGhhIUFISOjg5mZmZ8++236Ovr07JlS65fv466ujo7d+4kIiKCpUuX4unpiZeXF6mpqfj6+vLRRx/x5MkT5HI5ffv2ZerUqQBER0fz9OlTvvjiC7755hsmT54MwBdffEF5eTkTJ07Ezs6OO3fu8N5777Fy5Uri4+P55ZdfkCQJdXV1vvnmG+zs7JRsKh7KoaGhvPfee4SEhHDkyBHeeOMNoOKNuFWrVsTFxeHl5cXFixf5+OOP8fX1ZdKkSWzatAk7Ozu+/fZbYmNjgYqy515eXkr3a9myZVy6dIni4mK6dOnCp59+qrR2ITMzk48//hiA4uJiRo8ezYgRIxg/fjytW7cmISGB7OxsJk+ezOuvvy50aWlpvP3222zZsoXly5cjSRLGxsa8//774pgVK1Ygl8v58MMP6dSpE1OmTOH06dNkZ2ezfPlyWrZsKY5VU1MjPDwcAwMDAMzMzHj06BFpaWk8ffqUdu3aARVFFRVl3isTExPDggULMDIyonv37vzyyy/ExMSwatUq4QNUjOY2bdqEhYUFc+fOJTc3l4KCAgYNGsSkSZO4cOECa9euxcrKisTERNTV1Vm/fj3r1q1TamM3NzeuX7+u5MP58+drbPelS5dy/vx5NDU1sbS0xM/P72+58+LLgmrEoqIaSUlJzJgxg+DgYNTV1Tlz5gzp6emsWLECf39/AgICsLa2xt/fv95zNW3alJ9//pmIiAjkcjlbt24lKCgIXV1d8WYaGhrKsGHDcHd35+nTp0RHRwOI0hj+/v74+vqKf2tpaeHj48OKFSsIDAxk3LhxLFmypJpNqHg4X716lUGDBjF8+HB27typ5J+uri6BgYGMHTsWCwsLli5dipOTk/h+z549PHjwgODgYNavX09YWBhlZWXi+/DwcDIzMwkMDCQ0NJS7d+9y/PhxJRvh4eE0a9aMgIAAAgMDlUZNcrmcjRs3snLlShYtWlTjiMzOzo5hw4YxePBgpU6lKvn5+bRo0YLffvuN1157rcYQmaJTKSkpwd/fnzfffJOsrCzMzc3FMebm5mRnZ1fTLl68mNmzZxMQEICTkxNyubxWXwBycnLo168fAQEBBAUFsWbNGvLz8wG4cuUKc+bMYfv27aipqXHmzBkR9vP398fY2Lja+YqKimps98ePH7Nlyxa2b9/O1q1bGTBgAA8ePKjTNxV/LaoRi4pqNGvWTDxorKysePLkCTdu3MDFxUVsmta1a1eCgoLqPVeHDh0A6NixIz///DOzZs2id+/ejBw5EjU1NfLz8zl06BB79+5FTU2NoUOHsnPnTjp16lTrORVv+DNmzAAQ2+JWtQkVcxADBw5ET08Pb29vfH19SU9Px8bGRvhVF1evXsXNzQ0AQ0ND1q5dq/T9hQsXuHLlCuPHjwcgLy+PtLQ0pWN69uzJ1q1bmTdvHr1792b06NHiux49egBgb2+PTCYjJyenTn/qo1u3bgDY2NjUOgeUn5/PtGnT6NWrFwMGDODSpU
tK3yu20q1KfHw8Xbt2BcDDw6NeX8zMzIiOjiYoKAgNDQ2ePn0q5rAcHR0xMzMDwNbWVnxeF7W1u5GRET179mTcuHEMGDAAb29vrKys6j2fir8OVceiohoNiWvX9vApLS1V+r+iCJ+ZmRm7d+/m8uXLHDt2jDfffJOwsDDCw8ORJIlp06YBFW/SWVlZzJ8/Hx0dnRpta2pqYmNjQ0BAQI3fVy78FxYWhoaGBkOGDBHfhYWF8e9//1vp2NqQyWR1zutoamoyatQoPvjgg1qPcXR0ZP/+/URFRXHw4EE2b94sOuXK567tnj4LlduuprXPhYWFTJgwgSFDhjB27FgArK2tycrKEsdkZWXVO79UuaBiVZ9LSkoA2Lx5MyUlJWzbtg2ZTCY66Kp+NpS62v3nn38mKSmJkydPMm7cOFasWIGzs/Mz21DxfFCFwlQ0iDZt2nD9+nURyoiIiKB9+/YA6Ovrk5GRAVS8wdfEmTNnOHHiBJ06deLTTz9FV1eXnJwcQkNDWbRoEbt372b37t2Eh4fTvn17Dh06VO0cMpkMuVxO06ZNefToEbdu3QIgKiqK7du3Vzv+7Nmz6OrqcvjwYXH+1atXExYWVuNDV3H+ynTo0IHTp08DFW/6I0eOFA9OgE6dOnHkyBGhW7lyJSkpKUrn2Lt3L9euXcPd3R0fHx8yMjLE8YoJ+eTkZNTU1MSGUA3x7Y/w9ddfM3jwYNGpQEXHYmhoKEKQe/bswdPTs5q2efPm4pjK4T59fX3u378PVIwqFBllOTk5ODo6IpPJOHbsGMXFxUr3ribqus7a2j01NRV/f38cHR2ZMGECAwYMID4+vqG3RMVfgGrEoqJBWFlZMWvWLN5//300NTWxsrJizpw5AEyaNIkPPvgAe3t7WrVqJTqZyjg4ODBv3jzWr19Po0aN6NGjB0VFRdy7d4/+/fsrHTtmzBh+++03hg4dqvR5z549efPNN1m1ahXff/89X3zxBVpaWkDFA7MqoaGh1VKnO3bsiJ6eHlFRUdWO79GjB1OmTMHPz0985uXlxaVLl3jrrbcoKysT169g4MCBXLlyhbfeeotGjRrRunVr7OzslM7r5OSEj48PmpqaSJLEv/71L9TVK/705HI5U6dOJS0tjQULFtRaWr1z5858+OGHaGhoMHv27BqPqY8HDx6we/du0tLSRMdtYmLCzz//zOLFi/n6669FaKnyPVDw6aef8tVXX7F+/XoRcgMYNGgQO3bs4O2336ZNmzZijurNN99kzpw5nDlzhn79+vHGG2/w8ccfM3fu3Fp9rNzGVdHW1q6x3S0tLblx4wYjRoxAT08PIyMjpk+f/ofukYrng6pWmAoVL4jx48czdepU3N3dX7Qrf4jK2YAqVFRGFQpToUKFChXPFdWIRYUKFSpUPFdUIxYVKlSoUPFcUXUsKlSoUKHiuaLqWFSoUKFCxXNF1bGoUKFChYrniqpjUaFChQoVzxVVx6JChQoVKp4r/wcbR9cxcoBXJwAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# function to plot the categories across a quantized X variable\n", "def stacked_bar_plotter(x, y='recs', q=20, df=X):\n", " '''Create a percent-stacked-bars plot of y over the quantized x variable'''\n", " plt.style.use('seaborn-white')\n", " df_s = df[[x, y]]\n", " label = x + ' split in ' + str(q) + ' quantiles'\n", " df_s[label] = pd.qcut(df[x], q, duplicates='drop')\n", " # prepare percent categories for each bar\n", " grouped = df_s.groupby(label).recs.value_counts(normalize=True)\n", " df_s = grouped.unstack()\n", " catlabels = ['None', '1 or 2', '3 - 8', '9 or more']\n", " df_s.columns = catlabels\n", " df_s['labels'] = df_s.index\n", " df_s['xticks'] = df_s.labels.cat.codes\n", " fig = plt.figure()\n", " ax = plt.subplot(111)\n", " for i in range(4):\n", " bot = df_s.iloc[:, 0:i].sum(axis=1)\n", " ax.bar(df_s.xticks, df_s.iloc[:, i], bottom=bot)\n", " box = ax.get_position()\n", " ax.set_position([box.x0, box.y0 + box.height * .1,\n", " box.width, box.height * .9])\n", " plt.xticks(df_s.xticks, df_s.labels, rotation=30)\n", " plt.yticks(np.linspace(0, 1, 11),\n", " [str(x) + '%' for x in range(0, 101, 10)])\n", " plt.xlabel(label)\n", " plt.ylabel('Percent of cases')\n", " ax.legend(loc='upper center', bbox_to_anchor=(0.5, 1.1),\n", " labels=catlabels, ncol=4, fontsize='large',\n", " markerscale=1.5)\n", "\n", "stacked_bar_plotter('hoursAfterArticle')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "More than half the comments submitted in the first two hours ended up in the highest category with 9 or more upvotes! This is in contrast to less than a quarter for the ones submitted after just 15 hours. 
Here are the means for the original variable, **recommendations**, as well:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [], "source": [ "# quantile plotter \n", "def quantile_plotter(x, y='recs', q=20, df=X):\n", "    '''Create a pointplot of y over the quantized x variable'''\n", "    # copy so that adding the quantile column does not touch the caller's frame\n", "    df_small = df[[x, y]].copy()\n", "    label = x + ' split in ' + str(q) + ' quantiles'\n", "    df_small[label] = pd.qcut(df[x], q, duplicates='drop')\n", "    sns.pointplot(x=label,\n", "                  y=y,\n", "                  data=df_small,\n", "                  join=False)\n", "    plt.xticks(rotation=30)\n", " \n" ] }, { "cell_type": "code", "execution_count": 67, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('hoursAfterArticle', y='recommendations')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "It depicts the average number of upvotes (with error bars) per unit of time after the publication of an article. Each unit of time here is based on 5% of the original observations of hours after the article was published before the comment appeared. The trend is clear - the sooner, the better. \n", "The highest average category (ie highest number of upvotes) happens within the first two hours and then total upvotes gradually decline. There is an uptick around X hours - this is due to the fact that most articles appear online slightly before midnight US Eastern Time, while most people wake up and read articles about 9-11am, so it simply reflects peak readership volume, occuring 9-10 hours later. There is not much point in posting comments roughly two days after the publication of an article." ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "**Commenting on comments doesn't bring in the upvotes**\n", "The next one is about replying to other users' comments and it also surprised me. Unlike Reddit, where replying to the top comment is often the only way to be noticed, at the Times reply comments rarely attract many upvotes. The feature about replying that we used, **depth** ('1' is original comment, '2' is a response to a comment, '3' is a response to a response and so on) is the strongest predictor of a comment's popularity category. We even tried adding a feature based on the number of upvotes of the original comment, guessing that replying to popular comments should result in more upvotes, but it didn't really make any difference. 
A simple mean conditional on the level of depth speaks volumes here: original comments have a mean of about 26 upvotes, while level 4 responses average about 0.14 and level 5 responses are at zero:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "depth\n", "1.0 25.883516\n", "2.0 3.008920\n", "3.0 0.675810\n", "4.0 0.142857\n", "5.0 0.000000\n", "Name: recommendations, dtype: float64" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.groupby('depth').recommendations.mean()" ] }, { "cell_type": "code", "execution_count": 69, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('reply', 'recommendations')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "And a simple linear correlation between the number of upvotes of the original comment and the reply comment shows us there's nothing there (certainly no linear relationship):" ] }, { "cell_type": "code", "execution_count": 70, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "-0.03" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.reply.corr(X.recommendations).round(2)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "**There is an optimal article length for getting upvoted comments - at about 800 words.** \n", "Apparently the ideal zone between too short and too long for most readers is somewhere around 800 words. We should keep in mind, though, that this is where most articles are in terms of length anyway. Looking at the plot below, clearly articles below 500 words are just too short. I am surprised there is such a thing as too short an article, but anything less than 500 simply doesn't do it for many readers.\n", "\n", "There are several interpretations here. One is that readers who read longer articles tend to be more likely to comment and respond to comments - perhaps they have more time to spend on the site; the 'newspaper junkies'. Another is that articles below 500 words are simply too short to be worthy of much of a response - there isn't all that much to say about them. Finally, it is possible that longer articles simply attract more readers which translates into higher overall upvote counts. 
I would not have guessed that in general about online articles, but this is the NYTimes, so perhaps the attention span of the average reader is longer than usual. \n", "\n", "Also surprising is the fact that on the opposite end of the spectrum, comments made to articles over 2000 words rake in higher counts on average than anything in between. There is still an audience for the long newspaper article and it does not shy from participating in the communal conversation." ] }, { "cell_type": "code", "execution_count": 71, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('articleWordCount', 'recommendations', df=df)" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "And a jointplot to show the histogram of each along with the joint distribution. The darker the color inside the plot, the more cases that occur there. The outside plots simply show the histogram of each of the variables. There is a large number of articles at about 700-800 words, which is why we see the spike (dark vertical column) in the hex plot. " ] }, { "cell_type": "code", "execution_count": 72, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "X_plot = X[X.recommendations < 20]\n", "sns.jointplot(x='articleWordCount',\n", " y='recommendations',\n", " xlim=(0, 2500),\n", " ylim=(-1, 20),\n", " kind=\"hex\",\n", " data=X_plot,\n", " extent=[0, 2500, 0, 20],\n", " stat_func=None)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "**Print page - most of the important big-headline articles appear in the first 30 pages of the paper, where they attract more readers and more comments.**\n", "\n", "This is why the vast majority of comments are to articles in the first 30 pages. Generally, more important articles appear in the first few pages on the paper version of the newspaper, so naturally we expect upvotes to go down with every page. It is not quite so simple though - the very first few pages including the front page are a mixed bag and there is a peak around pages 19-25 perhaps because of the editorials and columnist sections. " ] }, { "cell_type": "code", "execution_count": 73, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('printPage', 'recommendations', df=df)" ] }, { "cell_type": "code", "execution_count": 74, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stacked_bar_plotter('printPage', 'recs')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "**Getting the coveted 'NYTimes Pick' helps a lot. So does being a trusted user.** \n", "\n", "Again, nothing unusual - the greater visibility that comes with a NYTimes Pick icon and being placed on top of a short list of comments in the NYTimes staff tab obviously results in more comments. Simple comparison of means (even without a t-test which given the sample size would only confirm the obivous) would lead us to conclude that on average getting picked by the staff brings about 250 upvotes!" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "editorsSelection_1\n", "0 14.282758\n", "1 301.788889\n", "Name: recommendations, dtype: float64" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.groupby('editorsSelection_1').recommendations.mean()" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "trusted\n", "0.0 17.691285\n", "1.0 75.878780\n", "Name: recommendations, dtype: float64" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.groupby('trusted').recommendations.mean()\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "**Hot button issues bring about reactions, but not always upvotes** \n", " \n", "Articles about things like 'school shootings', 'global warming', 'russia', are consistently associated with comments that receive _fewer_ upvotes on 
average. This could simply be because too many people react by writing comments, but I don't think it is that simple. It might have to do with the emotional reaction these articles evoke, or with a selection effect - who reads them and whether they tend to upvote. Intriguingly, some article keywords and even comment words (words that appear in a comment) are associated with more upvotes - for example, 'women' and 'illegal immigration'. This might also reflect the period the articles cover and might, in some sense, be over-fitting to one or a few articles; more probing would be necessary to make sense of these features. " ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Keyword:\"global warming\"\n", "0 19.799027\n", "1 16.575217\n", "Name: recommendations, dtype: float64\n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "print(X.groupby('Keyword:\"global warming\"').recommendations.mean())\n", "sns.barplot(x='Keyword:\"global warming\"',\n", " y='recommendations',\n", " data=X)\n", "\n" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.barplot(x='Keyword:\"women and girls\"',\n", " y='recommendations',\n", " data=X)\n" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "**Effort pays off: say more; use a rich vocabulary; refer to people, places and organizations sparingly; spell-check!**\n", "I was really curious whether comment length, or the use of sophisticated vocabulary (measured by IDF and word length), grammar complexity (max sentence length), spelling error rates, or the use of particular parts of speech more often leads to higher upvote counts. The classification model provides some evidence that all of these things matter. Starting with the easiest one, here is a look at comment word length:\n" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZAAAAEhCAYAAABRKfYcAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzs3XdUVGf6wPHvMDD0Il2UJiJFVEDFXrDEFktMjAajcWM0xiQbN3ETYxJTdGNiyhqjq9FETbHFsmosscTeiKJEwQJiQ0EEQZHe5veHP2YdhWEER9rzOcdznFve+8w7w33m3rdchVqtViOEEEI8JKPqDkAIIUTtJAlECCFEpUgCEUIIUSmSQIQQQlSKcXUHUJ68vDxiYmJwcnJCqVRWdzhCCFErFBcXk5qaSlBQEGZmZgY9Vo1NIDExMYwcObK6wxBCiFpp2bJltGnTxqDHqLEJxMnJCbhbCa6urtUcjRBC1A7Xr19n5MiRmnOoIdXYBFJ628rV1ZXGjRtXczRCCFG7PI5b/9KILoQQolIkgQghhKgUSSBCCCEqRRKIEEKISpEEIoQQolIkgQghhKgUSSBCCFGLLFh3koFvbWDBupPVHYokECGEqC1y84vYcugiAFsPXSQ3v6ha45EEIoQQtURhUQmljwAsUd99XZ0kgQghhKgUSSBCCCEqRRKIEEKISpEEIoQQolIkgQghhKgUSSBCCCEqRRKIEEKISpEEIoQQolIkgQghhKgUgyaQvLw8evXqxbp160hOTmbUqFFERETwxhtvUFBQYMhDCyGEMDCDJpD58+dja2sLwJw5c4iIiGD58uV4enqyZs0aQx5aCCGEgRksgSQkJHD+/Hm6d+8OQGRkJD179gQgPDycw4cPG+rQQgghHgODJZDPP/+cKVOmaF7n5uaiUqkAcHBwIDU11VCHFkII8RgYJIGsX7+e4OBg3N3dy1yvLp1OUgghRK1lbIhC9+zZQ2JiInv27OH69euoVCosLCzIy8vDzMyMlJQUnJ2dDXFoIYSos4pLataPb4MkkNmzZ2v+/+2339KoUSNOnDjBtm3bGDx4MNu3b6dLly6GOLQQQtQ5uflFrN0Vz9Yjl7SW7zqWyOCuTVAoFNUS12MbB/L666+zfv16IiIiuHXrFkOGDHlchxZCiForJ6+Qqf85wKqdcWRmaQ9/+GFjDHNX/1VtzQIGuQK51+uvv675/5IlSwx9OCGEqFN+3nKG81dvl7t+e+RlQvyc6Nyq0WOM6i4ZiS6EEDVUbn4Rfxy7UuF2mw5cfAzRPEgSiBBC1FCXr2eSm19c4XbnLqdXy20sSSBCCCEqRRKIEELUUJ6uNpibKivczt/Lvlp6YkkCEUKIGsrc1JiebT0q3O7JTk0eQzQPkgQihBA12Kh+ATR1tyt3fZ/2nnRs2fAxRvQ/kkCEEKIGszAz4dNXOjGitx+2ViqtdS8NDuLVZ1rV/YGEQgghKsfc1JiRff35dnIPreXhrd2rLXmAJBAhhKg1lEbVlyzKIglECCFEpUgCEUIIUSmSQIQQQlSKJBAhhBCVIglECCFEpUgCEUIIUSmSQIQQQlSKJBAhhBCVIglECCFEpUgCEUIIUSkGeyZ6bm4uU6ZM4ebNm+Tn5zNx4kS2bdtGbGwsdnZ3Z5YcO3Ys3bt3N1QIQgghDMhgCWT37t0EBQUxbtw4rl27xosvvkhISAhvvvkm4eHhhjqsEEKIx8RgCaR///6a/ycnJ+Pi4mKoQwkhhKgGBm8DGTFiBJMnT2bq1KkA/PLLL4wePZp//OMfpKenG/rwQgghDMTgCWTlypXMnz+ff/7znwwePJjJkyfz008/ERAQwNy5cw19eCGEEAZisAQSExNDcnIyAAEBARQXF9OsWTMCAgIA6NGjB3FxcYY6vBBCCAMzWAI5duwYixcvBiAtLY2cnBymTZtGYmIiAJGRkfj6+hrq8EIIUeeYGBtR+gBCI8Xd19XJYI3oI0a
M4L333iMiIoK8vDymTZuGhYUFkyZNwtzcHAsLC2bOnGmowwshaogF606y+eBFBnTyZsLQltUdTrWqal2YmxrTv6M3mw9epF9Hb8xNDXYK14vBjm5mZsZXX331wPK1a9ca6pBCiBomN7+ILYcuArD10EVeGBBY7Se96vKo6mLC0JY1JhHLSHQhhMEUFpWgVt/9f4n67uv6qi7WhSQQIYQQlSIJRAghRKVIAhFC1AsL1p1k4FsbWLDuZHWHUmdIAhFC1Hn3N2Dn5hdVc0R1gyQQIUSdVxcbsGsCSSBCCCEqRRKIEEKISpEEIoQQolIkgQghhKgUvRJIYWEh169fB+Ds2bOsX7+e3NxcgwYmhBCiZtMrgUyZMoXo6GhSUlJ4/fXXiYuLY8qUKYaOTQghRA2mVwJJSUmhb9++bNmyhYiICN5++21u375t6NiEEELUYHolkIKCAtRqNTt27KB79+4A5OTkGDIuIYQQNZxeCSQsLIzWrVvj5OSEt7c3S5cuxdvb29CxCSFEnXD9ZjbLt53VWnb2cno1RfPo6DUZ/eTJkxk/fjw2NjYA9OrVi+eff96ggQkhRF1w8K8kvlwWRVGx9uj36T9E8mQnb8Y/1QJF6WMGaxm9Ekh8fDyrV6/m9u3bqEvnAwBmzZplsMCEEKK2u5h0my+XHaOoWF3m+k0HL+LqaMngrj6PObJHQ68EMmnSJPr160dAQICh4xFCiDpjw76EcpNHqfV7zvNkJ2+Uyto3LE+vBOLo6Mhrr71m6FiEEKLGqszzzP+MvV7hNmm380i4dptmHg2qGuJjp1fK69q1KwcOHKCgoICSkhLNPyGEqA8qOx28vtvlFdTO6eX1ugKZP38+WVlZWssUCgVnzpwpd5/c3FymTJnCzZs3yc/PZ+LEifj7+/P2229TXFyMk5MTX3zxBSqVqmrvQAghDKys6eDNTSver6GjFYkpdyrcztXBsooRVg+9EsixY8ceuuDdu3cTFBTEuHHjuHbtGi+++CKhoaFERETQr18/vv76a9asWUNERMRDly2EELXBE+08+WFjjM5tQpo54dzA4jFF9GjpdQsrOzubefPmMWHCBF555RUWLlxIXl6ezn369+/PuHHjAEhOTsbFxYXIyEh69uwJQHh4OIcPH65i+EIIUXP17eCJT2Pbctebmxrz4qCgxxjRo6VXAvnggw/IyspixIgRPPvss6SmpvL+++/rdYARI0YwefJkpk6dSm5uruaWlYODA6mpqZWPXAhRo6nVauITM7SWFRYWP/YYTp1PY/7aaK3l+Y+pzcFMZcyMlzvSuZUb94/08GpozacTO+HV0OaxxGIIet3CSktL4+uvv9a8Dg8PZ9SoUXodYOXKlZw5c4Z//vOfWmNI7v2/EKJuSbuVy6yfj3HmkvZo69e+3M3rzwbTsaWbwWMoLCrmy2VRHDqZ/MC6d+YeYPqEjjRysjJ4HFYWKt4Z3ZYL127xxtd7NctnTOiErZUeDSk1mF5XILm5uVrTt+fk5JCfn69zn5iYGJKT735wAQEBFBcXY2lpqbn1lZKSgrOzc2XjFkLUUDl5hbw3/+ADyQMgK7eQz386yvGzNwwex6L1MWUmD4DUW7lMW3j4sfZ+crTTbueoraPP76VXAhk+fDj9+vXjtdde49VXX2XAgAEVNn4fO3aMxYsXA3evYHJycujYsSPbtm0DYPv27XTp0qWK4QshaprfD18iKS273PUlaliyKdagdyEyMvPYHnlZ5zY30nPYd+KawWKoD/S6hfXMM8/QqVMnYmNjUSgUTJs2DRcXF537jBgxgvfee4+IiAjy8vKYNm0aQUFBvPPOO6xatQo3NzeGDBnySN6EEKLm+ONYYoXbXErO5FJyJt5u5TcwV8Wfp69TXFJxgjp4Mokn2nkaJIb6QGcC2bt3L926dWPNmjVay/fv3w/cTSzlMTMz46uvvnpg+ZIlSyoTpxCilrh5S7+nlabdyjVYAsnO1e/WVE5uoUGOX1/oTCD
nzp2jW7duREVFlbleVwIRQtRP1pYqsvMqPoFbW+g/iPhhpxFxtjfXq9zaOv6iptCZQMaPHw9A586dGTBggNa6FStWGC4qIUSt1blVI9bsite5jXMDc3z1nPvp/mlEXhgQiLmp7rvvYYGuWFuYcCdH9xVGrzAPvWK4v9uvPrfH6gOdn8KZM2eIiYlh8eLFWr2wioqKmDdvHs8995zBAxRC1C5PdvZm25FLOk/ew3v7oTTSrxdSZaYRUZko+duTzZnza3S524QFuhLczElnOWq1mjW74ll7X0J8c/ZeXhoURKdWhu+OXJPpTCAqlYqbN29y584drdtYCoWCt99+2+DBCSFqHwdbcz4a14EZiyPJuPNgd/9R/QIeS8N17/8/xuLfYsm6r62jW0gjXh8eUmFX2h82xrJhX8IDy9Nu5fLZT0d5MyKU8Nbujy7oWkZnAvHx8cHHx4f27dsTHBysta60O64QQtyvmUcDFr7bi9+PXNaaC+rfk7rS1P3xTVveu50nXUMbszfqKt+u/t/VyPinWmJqotS57+XrmWUmj3st/O8pOrZ0q7CsukqvbrzOzs7MmjWLjIy70xIUFBQQGRlJnz59DBqcEKL2MjM1pkcbd60E4mz/+GedNTVR0r5FQ60Eoo+KxpHA3YGRh08m0b2eXoXoNZDw7bffxs7OjujoaIKCgsjIyJDH2Qoh6rSrN7Iq3ughtquL9EogSqWS8ePH4+joyMiRI5k/fz7Lli0zdGxCCFFtzFT63ZYy1XO7ukivBJKfn8/169dRKBQkJiZibGzMtWsyBYAQou5qG+Cq13ZhgfptVxfplUBeeuklDh06xNixYxk8eDDt27cnJCTE0LEJIUS16RLSCEc73QMSQ/2d8dRzOnYTYyNKO30ZKe6+ru30akTv1auX5v9//vkn2dnZ2NoaZgoCIYSoCUxNlHz0UnumLTxEeuaD3ZGbNrblrYjWepdnbmpM/47ebD54kX4dvSscDFkb6HwH7777rs6dZ86c+UiDEUKImsSzoQ1z/9mDjfsSWLkjTrN8/JAg+nbwwsT44do/JgxtqddULLWFzmuo0NBQQkNDMTIy4vbt2/j7+9OsWTNu3ryJubl+c80IIURtZm2hYmAXH61l3ULdHzp51EU6r0CGDRsGwI4dO1i4cKFm+ZgxY3j11VcNG5kQQogaTa9WnOTkZDIzMzWvs7OzSUyseM5/IYQQdZderTgjRoygd+/eNG7cGIVCwdWrV5kwYYKhYxNCCFGD6ZVARo4cyeDBg7l8+TJqtRoPDw9sbPTruiaEEJVRVFzCtsOX+O3gRa3lR2KS6R3mUSeeKV7b6ZVAUlNT2bJlC7dv39Z6jvEbb7xhsMCEEPVXYVEJ/1oSSdTZGw+s+/bXaC4m3Wb8kBaSRKqZXm0gL7/8MmfPnsXIyAilUqn5J4QQhrBmV3yZyaPUpgMXOXgy6TFGJMqi1xWIhYVFpcZ8zJo1i6ioKIqKinj55ZfZtWsXsbGx2NnZATB27Fi6d+/+0OUKIequouISzRMIdflt/wU6t2r0GCIS5dErgbRq1YqEhAR8fHwq3vj/HTlyhPj4eFatWkVGRgZPPfUU7du358033yQ8PLzSAQsh6rZrN7K4VcaDqO53+mI6RcUlGCsrvpFSOo2IWl13phGpCfRKIPv372fp0qU0aNAAY2Nj1Go1CoWCPXv2lLtP27Ztadny7ohLGxsbcnNzKS4ufiRBCyHqrhK1/s8bV+u5bV2cRqQm0KsW58+f/9AFK5VKLCwsAFizZg1du3ZFqVTyyy+/sGTJEhwcHPjggw+wt7d/6LKFEHVXQ0dLzE2Nyc0v0rmdV0ObhxoNXtemEakJ9LqOc3JyYs+ePaxYsYJGjRqRlpaGo6OjXgfYuXMna9asYdq0aQwePJjJkyfz008/ERAQwNy5c6sUvBCi7jFTGdMrzKPC7fp39DJ8MP+vLs6k+yjoVQsfffQRV65cITIyEoDY2FimTJlS4X779+9nwYI
FLFq0CGtrazp06EBAQAAAPXr0IC4uroIShBC1WWVPvBF9/PHSMU1620AXnmjn+ShC1EvpLTBAboHdQ69P88KFC7z77ruYmZkBEBERwY0b5XexA7hz5w6zZs3iu+++0/S6ev311zVToERGRuLr61uV2IUQNVxlT7xW5ibMfLUzg7o0wdxU+zbVsB6+TB0ThlKPxvNHacLQlvz21WC5DXYPvT5NY+O7m5UO2snJySEvL0/nPlu2bCEjI4NJkyZplg0dOpRJkyZhbm5e6a7BQojapbJtD1bmJowb0oIh3Xx4ccYOzfIh3Zvq1fNKGJ5eCaRv37688MILXL16lRkzZrBv3z4iIiJ07jN8+HCGDx/+wPKnnnqqcpEKIeolU5XcLqqp9Ppknn/+eVq2bMmff/6JSqXi66+/JigoyNCxCSGEqMH0vg5UqVQEBwcTEBBAbm4uR48eNWRcQgghaji9rkAmTJhAfHw8Li4ummUKhYJly5YZLDAhhBA1m96z8f7xxx+GjkUIIUQtotctrKCgIK5evWroWIQQQtQiel2BBAQE0LdvXxwdHVEqlZq5sOSqRAgh6i+9Esj333/P4sWLcXV1NXQ8Qgghagm9Eoifnx9hYWGGjkUIIUQtolcCcXR0ZNSoUYSEhGg9iVAeaStE2RasO8nmgxcZ0Mlbpr4QdZbes/G2a9cOlUolj7QVogK5+UWaJ+ptPXSxwmnJa7IF604y8K0NLFh3srpDETWQXlcgr732mqHjEKLOKCwqofQ5RyXqu6/NTas3psq4PxG+MCBQZqEVWvT6Nnz33Xd8//33ZGVlAWh6YZ05c8agwQkhqk9dSYTCcPRKIOvXr2f9+vXSC0sIIYSGXgnE19cXV1dXafcQQjx2pQ+lUqvlaYA1jV4JZMiQIQwaNIjmzZtrJRF5nocQhvMoenLVhd5gpQ+l2nzwojwNsIbR65OYOXMmgwcP1ppMUQhhOI+iAbsuNYJX9qFUwrD0+jZ5eHhITywhHqNH0YAtjeDC0PRKIK1atWLOnDmEhoZq3cLq0KGDwQITQghRs+mVQEofHnXvQ6QUCoUkECGEqMf0SiA///xzpQqfNWsWUVFRFBUV8fLLL9OiRQvefvttiouLcXJy4osvvkClUlWqbFE31YRG35oQgxC1gV794RISEhg9ejShoaG0bt2asWPHcuXKFZ37HDlyhPj4eFatWsX333/Pp59+ypw5c4iIiGD58uV4enqyZs2aR/ImRN1QE6YAqQkx1BRXb9zRep2WkVNNkYiaSq8EMn36dF588UUOHDjAvn37GDFiBB9++KHOfdq2bcs333wDgI2NDbm5uURGRtKzZ08AwsPDOXz4cBXDF3VJWY2+9TGG6pZXUMQXPx/jnbkHtJZP+vdeFq4/RXGJupoiEzWNXglErVbTvXt3LCwssLS0pHfv3hQXF+vcR6lUYmFhAcCaNWvo2rUrubm5mltWDg4OpKamVjF8IR4kEwBWnlqt5uvlx9kXfe3BdcBv+y+w+LeYxx+YqJH0SiCFhYXExsZqXp88ebLCBFJq586drFmzhmnTpmktV6vlV4x49OQW1F0FhcVExiQ/9H5xVzI4fEr3fpsOXCTtVm5lQxN1iF6N6O+88w5vvfUW6enpwN3p3T/77LMK99u/fz8LFizg+++/x9raGgsLC/Ly8jAzMyMlJQVnZ+eqRS/EfWrC2IeiIv1+XJUnIzOPTQcuaC3Lyi3AxlK/Did7ohJZtCGGzOwCreX/WRPNpBGhmOkYTLg76mqF5ZeUqNl34ipDw331ikfUXXqPA9myZQvZ2dkoFApMTU0xMTHRuc+dO3eYNWsWS5cuxc7ODoCOHTuybds2Bg8ezPbt2+nSpUvV34EQNURxiZo1u+LYsC9Ba/kXPx/j5aEtaOxsXWEZu6MS+fbX6AfaXt74ajf/fL4tYc11T2i678RVvlp+vMx1B08mk537Jx+N74DSSFHmNumZeRXGeHe7fL22E3WbXrewfv/9dyZ
OnIi1tTVWVlaMHDmS33//Xec+W7ZsISMjg0mTJjFq1ChGjRrFhAkTWL9+PREREdy6dYshQ4Y8kjchRHVTq9XMXnGcX7ae5U52oda66PhU3v52P4kpd8rZ+66/4lKZveJ4mQ33eQUlzPzxKOcTb5W7f1FxCT9s1N0+ER2fytHT18tdb2ul3+WarZV0vxd6XoEsXbqURYsWaV4vXryYsWPH0rdv33L3GT58OMOHD39g+ZIlSyoRpqgN6vP4iaOnU9hzvPzbP3dyClmw7iT/eqVTudus2hmHrg5ORcUlrNkdz5TRbctcf+LcDb2uDHb+eYX2QQ3LXNc1pBG/H76kc3+FAroEN6rwOKLu07sXlrX1/y6/raysUCjKvgQW9VN9b7zeevhShducPJ/GtdSsMtfdupPPqYS0Css4ciqZwnLaWG5k6NewnZJe/niOoCYOtPJ11Ll/zzYeuDpY6nUsUbfpdQUSFBTEpEmTCAsLQ61Ws3//foKCggwdm6hFakLjdXW6mHRbr+0uJWXSyMnqgeV3cgrK2PpBxSVqcvOLMTF+8Nk8lmb6zbRraV5++6VCoWDKC2F8/uNRouMf7GbfPbQxE5+pX1eXonx6fePef/99Nm7cyMmTJ1EoFAwcOJB+/foZOjZRT6jVao6dSWH9fY3PJ86l0DWk8WO72o06m8LPW05rLfto0WFeGBBIK18nnfvq+5Cj8rZrYGOGkZGCkgoG6ZmplOUmitYBLqiMjSioYPBjp5ZuOtdbmZvwycsdOHomhek/RGqWf/pKJ1o01X11IuoXvb71CoWCgIAAOnTowPvvv0+XLl0wMpKngomqU6vVLNoQwyc/RHIyXvsWzpfLjvP9hhi9xwwlpWWxauc5rWVFxfqNJN8TlcjH3x8h4Vqm1vL4xFtMW3iYQyeTdO4f3KziLukqYyMCvO3LXGdlbkK7CnpYAXRv7Y5SWfbfnrWFiv6dvHXu72BrRs+27hUeR6FQ4O+pHatnQ5sK9xP1i15ZYOnSpUydOpU5c+YA8J///If//Oc/Bg1M1A+7oxL5bf+Fctdv3H+B3VGJOstQq9X8tOU0Ez77g437tMt659v9JJXT7lAqK6eAuWv+orw8VVKiZs6qEzrbdZ7s7I1ROV1jS/Vo64G1Rfm9lyL6+GOmKv+x0dYWKp7poXvsxQsDAuke2rjMdfY2Znw8vgMWZrq74AuhL70SyKZNm/j111+xtbUF4O2332bPnj2GjEvUA2q1mvV7Eyrcbv3eBJ1XIev3JrD6j/gyE8D19Bw++O4QOXmFD678f7uOJZJfoHvwX3ZeEftOPDi9RylPVxv+/mww5eWQ5k0cGDuwuc5jeDW04ZPxHXFuYP7AuoaOlvzrlY642FvoLMNYacSbEaF8/lpnuoZo95T68u9d8HSVqwjx6OiVQCwtLbVuWRkZGcktLFFlmdkFXEzKrHC7i0mZD4yqLpVfWMzqP+J07n8jI5edf5Y/e3TCNf0awBOulT8GA6BnWw++eqMbnVpqd5F9cWBzpr/cUecI8FIB3vYsfLcXb0WEai2f9VoXvN1s9YpToVAQ6O3Ay09pN3abqmrn42xFzaVXFvDw8GDu3LlkZmayfft2Jk2ahI+Pj6FjE3Xcw8x0W15bxl9xqdzJKf/qolRZkwOWKm9U9v2My2l7uFdTdzsmPhOstaxnWw+9G9kBlEojQv1dtJZVdHtMiOqg17c6KCgIc3NzXFxc2LhxI61atapwOnchKtLA2lSv+Z1sLFXYlTNCOjNbvyk1yruCAWipZ8+iFj7SA0mIe+l1Tbt7925mzZrF2LFjDR2PqKTaOApcqTTiiXaerNkVr3O7J9p5ltvzyN72wfaCMrezMSt3XadWbizdfJqbt8ufB8rVwYKwQJdy1wtRH+l1BZKXl0fPnj159tlnGTlypOafeDSq+vyKmjAKvKCwcjPQPtPDFy8d3UO9Gtro7HnUqqmjzuRQqmeb8ruumhgrmTomrNwBdjaWKqaOCSs3iQl
RX+l1BTJx4kRDx1Fv3X/yf2FAIOZ6NLbeqzpHgReXqPl1xzk27tfuTfXJD0eYMLQlTRvb6dzf0tyEmRM7sfi3WPZEJVJY/L+uVN1CGjFhaEudI6eVSiNeGBDAv1ecKHcbbzcbupXTtbVUM48GfPNmd37deY7tkf9rcO/XwYthPZvhVEbPqLrOxNgIhQLUajBS6D9YUtQfep2pwsLCDB1HvVWbpwBRq9XMXnmcPWU8Q+Lc5QzenXeAf73SiWYeDXSWY2Wh4u/DQ3impy8vz/xDs3z8Uy2x0jFuolSPNh7kF96difb+7rj+Xg2YOiYMlUn54ytKudhb8MKA5loJ5Pl+AXo/h6OuMTc1pn9HbzYfvEi/jt4P/cNG1H3yk0JU2vFzN8pMHqXyCoqZv/YvvUeSW5lX/kTdr4MXP07rw4tPBmot/+DF9jSwrvgWV11UegUBlb+CmDC0Jb99NbjWtKuJx0sSiKi0bUcuV7jN+au3Sbiq3ziLqrI0N6FnmOdjOZahPYqTf+kVBCBXEMIg5BslKu1ycsWDAAEuX8+kqbvuthCh7VHdPpowtKVcPQiDkQQiKk2fdgUAVRlTj4uKyclf1HRyC6uKqtoFtzYL9at4BlpjpZFMAS5EHSUJpApqwviL6tS/kzeqCu7Nh7dujJ11LelWJoR4KAZNIHFxcfTq1YtffvkFgClTpjBw4EBGjRrFqFGjav2MvmV1wa3NHvZqysXegsnPtyl3jqjmTRwYP6TFowxRCFGDGKwNJCcnh+nTp9OhQwet5W+++Sbh4eGGOmy9olarOX0xnd8OaA/iy8zOf+ixC5Ud0NihRUO+ndydtbvi2Xn0f8/tGDc4iH4dvR/74LOqDn57FIPnZACeqC8M9s1WqVQsWrQIZ+eK75OLh1dcoubbX6OZMu8AB/9K1lr31uy9nEpIK2fPslXlaqqxszV/Gxiktax7a/dKn7yh+rquPoqur9J9VtQXBvtmGxsbY2z8YPG//PILS5YswcHBgQ8++AB7+7If8Sl0+3VnHDvKecZFTn4x03+I5D9v98DRrvZMwVFTuq4+it5P0oNK1AeP9dp68ODBTJ48mZ9++omAgADmzp37OA9fZ+QVFLFxn+4n+d3CtqZGAAAgAElEQVR7S6o2kZHPQtQejzWBdOjQgYCAAAB69OhBXJzuJ8mJssVeuElWbsUPUTp8KrnCbdRqNSfPp7Lwv9oN5yUl+k0/IoSovx5rAnn99ddJTLzb0BoZGYmvb/nTdNcGRUWVm8K8qnLy9OsuXNF2OXmFfLjwMO/NP8Te+573/a+lkXolKSFE/WWwNpCYmBg+//xzrl27hrGxMdu2beP5559n0qRJmJubY2FhwcyZMw11eIMqKVGzbs95/rvnvNbyjxYd5uWnWuDnqV+7TlZuIduOXNJapk8PqoYOlnqV39BR93ZfLz/OibjUMtedvZTB5z8e5ZOXO6BQyONUhRAPMlgCCQoK4ueff35geZ8+fQx1yMdm/rqT/H740gPL4xNvMfU/B5k+oSOB3g46yzh2JoUvfjn2wFXC37/aw6QRIXQNKf/5FT6NbfFqaMOlCuai6tXWo9x1CVdvERl7Xef+0fGpnL2UQYC3dHQQQjxIOqg/pNgLN8tMHqUKikqYuzpa5xTmCVdv8enSP8u8xVRYVMJXy6I4db78brgKhYLxQ1qgNCr/yiDAy17nQ5QO/JVU7jrt7a5VvJEQol6SBPKQfr/vllNZElOyOH0xvdz1a3bF6xxnUaKGVTvP6TxGi6aOfDSuPa4OFg+s69DClY/Gtdc5juJOToHO8ktl6rmdEKL+kQTykB5mCvOyFBYVcySm4t5Rf8WncTsrX+c2wc2c+W5KL6aMbqO1/LVhIViYlf8YWAAnPceH6LudEKL+kQTykPSdmry8SQZz8oooKtavi2xmdsW//o2MFLRo6qRXeffq3todfdrGw1u761XeoxhFLoSoXeSv/CGF6DGFuZECWvqWfVK3NDfBVFVxEjJ
SQAMDzmLrYm/BwM5NdG7Tp70n7i7WepUn03cIUf9IAnlIfTt4YlZBAujUqhHODR5sm4C7z8fopqOHVal2QQ2xsqj8M8L18eKgIJ7q3rTMxvgnwjx45SFHg8sociHqF0kgD8nB1px3Rrct92l8zTzsePWZVjrLGNbTFyvz8tsoTFVKnnvCr0px6kNppODFgc1Z8sETjOrvr7XuhSeboyxnmnYhhABJIJXSJsCFbyd3p097T63lLz4ZyMyJnbHUkRwAXB0smTGhY5kDAu2tTfl4XAe83Wwfacy6NLAxo29778d2PCFE3SAJpJLcHK0Y3T9Qa1nPME+9nxPu09iO+VN6MnlkqNbyf7/ZneZNdA9CFEKImkASSDVSGikI8XPRWlbe0/0MTXpRCSEelpwl6oCa8CAmIUT9I2eJOqCmPIhJCFG/SAKpI+TkL4R43OQWlhBCiEqRBCKEEKJSJIFUgfRcEkLUZ/X+jLdg3UkGvrWBBetOVrzxfR5FzyVJQkKI2qpen61y84vYcugiAFsPXSQ3X79njd+rqvM/SfdZIURtVW/PVmq1mrjLGZQ+OLBEffdpgOaGmwC3XNKDSghRGxn0CiQuLo5evXrxyy+/AJCcnMyoUaOIiIjgjTfeoKCgep5292fsdV79Yjfvf3dIa/ne44nVEo8QQtRGBksgOTk5TJ8+nQ4dOmiWzZkzh4iICJYvX46npydr1qwx1OHLtff4VWYsiSQx5c4D6xauj2Hd7vjHHpMQQtRGBksgKpWKRYsW4ez8vwcwRUZG0rNnTwDCw8M5fPiwoQ5fprz8Iuav/Utz26osP245Q2pG7uMLSgghaimDJRBjY2PMzMy0luXm5qJS3X1IkoODA6mpqVU6xsP2oDrw1zWy83Q3lJeUqNnx5+UqxSWEEPVBtfXCUuu6DNBDZXpQXb7+4G2rslzRczshhKjPHmsCsbCwIC8vD4CUlBSt21sPKzev8IEeVBXRd4yFiUm97t0shBB6eaxnyo4dO7Jt2zYAtm/fTpcuXR66jMzsAhauP8WEz//QWr4nKrHCq5rW/i4615cK9at8YhNCiPrCYONAYmJi+Pzzz7l27RrGxsZs27aNL7/8kilTprBq1Src3NwYMmTIQ5V5604+U+bt51pq9gPrFm2IIflmNuOHtEBROrT7PoHe9vi62xGfeKvcYzjamdO5ldtDxSWEEPWRwRJIUFAQP//88wPLlyxZUukyF204VWbyKLXpwEVa+7vQJqDsKw2FQsGU0W15f8Ehkm8+WI61pQkfvNgOE2P9HksrhBD1Wa252Z9xJ4+DfyVVuN3mgxd1rne2t+Df/+jG354MpLGzlda6zyZ2pkkj2yrFKYQQ9UWtSSAXrt2muKTinlvnLmdUuI2luQlDw335/DXtNhg7a7Ny9hBCCHG/WpNAymvXuJ/RQ7wjmQlXCCEqr9acMZs2ttPrBB/gZa93mTITrhBCVF6tOWPaWKroFtKYnUev6Nzuyc5NHqpcmQlXCCEqp9ZcgQCMHdQcbzebctc/26sZrXydHmNEQghRf9WqBGJloeKzVzszvHczbK1UWuveeDaEUf0CqikyIYSof2pVAgGwMDPh+b4BzJ3cQ2t5WJBrNUUkhBD1U61LIKWMjPTrlSWEEMIwam0CkS64QghRvWrtWVe64AohRPWq1Wdd6YIrhBDVp9ZegQghhKhekkCEEEJUiiQQIYQQlSIJRAghRKVIAhFCCFEpkkCEEEJUSo3txltcXAzA9evXqzkSIYSoPUrPmaXnUEOqsQkkNTUVgJEjR1ZzJEIIUfukpqbi6elp0GMo1Gp1xc+JrQZ5eXnExMTg5OSEUqms7nCEEKJWKC4uJjU1laCgIMzMDPuY7hqbQIQQQtRs0oguhBCiUiSBCCGEqBRJIEIIISpFEogQQohKkQRyj/T0dFJSUvTatqSkpMrHexRlCCFEdZEE8v+2bNnCs88+y+zZs3VuV1BQwFtvvcW0adMAqEwntqqWkZ+fz7v
vvsunn3760Mcuz8qVK1m/fv0jKWvXrl389ddf1VrGo4hh7969ev+gMGQZNaEuakoZVa3PmvB5PKoyHsV7KXX27FkKCgoeej/lRx999NEjiaCWioqKwtraGrVaTe/evTlz5gwlJSX4+Pg8sG1BQQGZmZnExsayd+9eQkNDcXV1fajjVbUMtVpNfHw8CQkJbNy4kS5duuDo6PhQMdxv7dq1/Pbbb2RlZeHq6oqzs3Oly1qxYgUzZ87EyMgIX19frKysHnsZjyKGefPm8cEHH2BhYUHr1q0xMnr431qPooyaUBc1pYyq1mdN+DweVRmP4r3A3TEjixYt4t133yU7O5tOnTo91P71NoEkJiby8ccfs3PnThISEsjOzqZv375kZGSwe/duunXrhrHx3YH6Bw4c4OOPP+bo0aPY2dkxatQoLC0tWbp0KU899ZRex6tqGaX7x8XF4erqynPPPYeJiQmLFy/m6aeffuj3n5CQwBdffAFA06ZNGT9+PBcvXuT8+fOEhYU9VFnp6en8/PPPFBQU0KFDB0aNGsWePXuwsLDA3d1dr4GgVS3jUcWwYcMGjIyM6NOnD8OGDWPFihX4+fnh5OSkd108ijJqQl3UlDKqUp814fOoKXVxL7VazZEjRzA1NcXX15ehQ4eybNkygoODsbe317uceptA9u7dS1paGnPnzsXLy4sPPviAbt260bJlS6KiokhOTqZVq1akpKSwYMECXnnlFRo3bsyKFSuwsbGhf//+rFy5EhMTE/z9/VGr1SgUijKPVdUy4uLiWLBgARMnTkSlUrFo0SJat25NeHg4P/zwA3Z2djRr1kzv9x4VFcWHH35I586dycjIYM2aNYSHh6NSqYiOjsbIyEjvKRDi4uJ45ZVXcHd358iRIyQmJhISEkJBQQFRUVF4eHhU+IWsahmPIoZTp04xceJEnJycWLVqFQqFgnbt2pGUlMTRo0dp164dJiYmBi+jJtRFTSmjqvVZEz6PmlIX9zpx4gRvvfUWiYmJbN68GXd3d0JCQrh+/Trbt2+nT58+epUD9SyB3HuCjouLAyAwMBBHR0dyc3NZt24dTz/9NEqlku3bt9OiRQsKCgpYtmwZb775Jl5eXuTk5HDq1CmaNGlCUFAQX3/9NcOGDdNcrZQlLS2tSmWUfrCvv/46fn5+XLlyhbNnzxIWFoaHhwefffYZL7zwgt71cPnyZUxMTBg/fjwhISFER0eze/dunn/+eS5dusTZs2cJDg5GpVJVWNbx48dp1KgRr7zyCj4+Ply+fJmjR48yevRodu7cSUlJCV5eXjrLqmoZjyKGHTt20Lp1a1566SXc3Nw4ffo0ycnJjBgxgp9//hlnZ2e8vLx01sWjKKMm1EVNKaOq9VkTPo+aUhf3Wr16Nf7+/kyZMgUzMzMWLVpE9+7dadmyJWvXrqVBgwZ4e3vrVVa9aEQvTRb3/ro3MzPj2rVr5OTkADBx4kQuXbrEypUr6dq1Kz4+PqxcuRJPT09CQ0PZsmULAD169MDY2Jh9+/YRFhaGl5cX8+fPB8puDFer1VUuw8HBgZYtW/Lnn38CMHz4cJKSkjhw4ADdunXD3d2db775Ru/6SElJIT09XfP67bff5tChQ5w9e5aePXtSVFTEzp07dZZR2oMsOzub7du3A+Dj40Pnzp25cuUKp0+fZvDgwRw/fpyEhASDlPEoY1AoFOzatQuA9u3bExgYSExMDBkZGTz77LOsXbtWq84MVUZ11kXpd6+6PtPSRtyq1uej+Dyquy4e5XspfT+l78nBwYHCwkKKioro06cPTZs2ZeXKldjb2/P000+zZMmScsu5X52+Arlw4QIzZsxg8+bNXLlyhQYNGmganH18fFi7di0mJiZ4eXkRGxvL4cOHiYqKYtiwYVhZWXHw4EFsbGywsLDg9OnTNG/eHAcHB1JTU4mKiqJXr174+/vzxRdf0Lt3b+Li4pg2bRoZGRlYWFjg4OCAQqEgPz+f27dvV1hGo0aN+Pe
//82NGzcANI3rhYWFxMXFkZmZSdOmTXF0dOTy5ctERkbSu3dvmjdvzvTp0xk6dCjm5uaa93/lyhV+//13SkpKcHFx0Sz38PDg22+/xd/fn4YNG2qufP744w+GDh3KzZs3iY2NxdvbG1tbWwCysrLYvn07OTk5uLq6aq7m/P39WbduHRYWFvj6+mJqakpWVhbx8fEMGjSIY8eOcevWLby8vCgpKeGPP/4gLy8PFxcXSkpKMDIy0rsMV1dXFi5cyI0bN/D399f8IHjYGPbs2UNBQQFOTk6aGBo1asSBAwewsbHBw8MDMzMzLl++TG5uLgMGDGDTpk2o1WoCAwPJyckhOjoac3NzzM3NK1VG6UmlsvX5qOoiKioKU1NTLC0tH/rz8PLyori4uMrfC4AZM2awY8cOevfu/dD16enpyYEDBygpKcHe3r7Sn2lV66KkpIQjR46gUCiwtbWtdH0ePHgQY2NjbG1tNfX5MO/l3h/K8fHxmvNQ6fL4+Hhu376Nm5sbtra2eHt7M3/+fMLDw2nTpg07d+7k5s2btGrVqsJzbJ1NIIWFhXz33Xe0adOGSZMmsWrVKpo0aYKnpyeFhYUolUpsbGzYvHkze/fuZeXKlQwYMIDAwEDCwsJwdXXl6tWr7Nu3j759+3LhwgWuXbtGq1at8Pf3Z86cOfTo0QN3d3dycnJYuXIlx48fZ+DAgXh7e6NSqTTJytjYGIVCobOM48eP8/PPP/PSSy+hUChYtmwZgwcPBsDc3JycnBzi4+MpKirCx8eHwMBAFi1aRO/evWncuDGZmZmsX7+efv36AXfbeKZMmYKLiwurV6/GysoKV1dXVCoVJiYmFBQUsHr1agYNGoRCocDY2JgLFy7QqVMnTE1NSU1N5dKlS4SEhHD8+HFef/11jI2N+eGHH/D29sbT05Pi4mKMjIywtrbmp59+4umnn8bMzIz4+HiSk5Pp1KkT1tbWHD58mEuXLvHNN99QUlLCvHnzCA0NpWHDhhQVFelVxtKlS9m6dSuNGzfWdCCAu7/QFAqFXjFcuHCBuXPnUlRUxLx58wgJCaFhw4aazygnJ4fdu3fTu3dvbGxsOHToELdu3aJdu3bY2tqyYcMG8vLy+OSTT0hPT2flypUEBgZqeq0plUpyc3PLLcPOzo7169djYmLClClTKl2fVakLKysrjhw5QlJSEp999hmFhYUsW7YMPz8/rR8Z+tRnWloa06dPr9L3Ijo6muXLl+Ps7MzVq1cJDw/X/AjS9ZmEhYVhZ2fHDz/8wMqVKzV/78HBwZofXvp+psXFxUybNq1KdXHjxg1mzJhBQUEBixYtolmzZjRu3PihysjIyGDq1KkUFBRw9OhRQkNDsbCw0Ou9lH63goODsbW1Zd++fUyfPp0DBw6Qnp6OhYWF5nxka2vLH3/8gY2NjabX5cmTJzlz5gydO3fGwcGBn376iV69emn9IC1Lnb2FpVarOX/+PP3798fS0hIzMzPy8/MpKCjAxMSEkpISunTpQkhICOfPn0elUrFjxw7s7e05d+4cCQkJHDhwAB8fH3x8fOjevTsbN25ky5YtLF26lObNm2umSjYyMtL0yR40aBBhYWHY2tqSn5+vicfX11dnGUVFRWRkZBASEkKrVq1o27at1vvp2LEjzZo1Y9WqVezbt4/ly5cTGhqq6QJoZWXF3r17SU9Pp6SkhOPHj/Pee+8xadIkxowZQ0xMjNZtqRdeeAFTU1MWLVpEUlISx44dQ61Wo1Qq8fHx4fr16xw8eJCUlBS2b9/OyJEjee+993jxxRc1l7ilPUeeeOIJHBwc+Oyzz4D/fdkBWrVqRXZ2Nv/973+JiIjg/fff55lnnuHIkSOabSsqo3HjxsTGxqJQKHjppZcwNzfX3O4o7b6oTwwbNmxg+PDhvPfeezz77LPs27dPUx8qlYpu3bqhUCiYO3cuAG5ubprL/s6dO5OZmcmPP/7IuHHjmDFjBj169ODf//43WVlZAJiamuoso1O
nTmRmZvLdd9/x3HPPVao+3d3diY2NBahUXQQHB5Odnc2aNWt49tlnmTp1KsOGDWP+/PlaYwr0qc8VK1bw1FNPVfp7cfnyZfbu3ctrr73G5MmTCQ0NxcbGRnPbRtdnolAo6NSpE9euXcPV1ZUPP/yQp59+WutWrr6f6ZIlSxgzZkyV6mL16tU888wzTJs2jTFjxrB69WrOnj2rVxn29vaaz+SZZ55hxowZ/Otf/6JBgwZ6v5fS79ZPP/2EWq1m27ZtDB8+nFmzZqFQKDS3yAEaNWpE69atiY6O5vDhwwC0bdtW0xGnffv2uLi4sGDBAipSJ69ASkpKMDY2Jjw8XNNV9syZMyQnJ/PXX39hZ2en+YUREhKCo6MjcXFxeHl5ERMTw6lTp9i4cSPvvfce/fr1Q6lU4ubmRpMmTTh58iSJiYlMmjQJBwcH4O49zpEjR5Kbm0taWhqff/45J06cYN26dXTo0AFLS0uMjIx0lnH48GH8/Pw4f/48c+fOJTMzk+joaFq0aIGFhQVKpRJ/f39sbGw0ieLVV1/FxsYGtVrN0aNHGTNmjOZ2xr59+zh79izh4eG4ublx8+ZNTp8+jZeXl+a2lL+/P9evX+eHH34gMzOTF198EXt7e6KiooiLi2PcuHF4eXlx48YNGjVqhLu7Ow0bNmTv3r306NEDpVKpuUxv3bo1u3bt4rfffuPo0aMMHjwYhUJBYWEh+/btIzg4mP79+3Pr1i1++OEHgoKCKCkpwdXVVfPL+d4yDhw4wMCBA/H19SUrK4s///wTMzMzevfuzfTp04mOjqaoqEir4VBXDHv37qVhw4akp6fj4+PDwoUL8fX1xdjYWPOL1crKCj8/P3788Uf2799PZGQkAwcO5LfffiMpKYm4uDiaNm2Kra0tLVq0ICQkhB07dpCbm0tQUBBqtbrMMgYNGsTy5cu5fPkyCQkJdOjQgYCAADw8PPSqzwMHDmBmZkZxcTGmpqacOXMGExMTnnjiCb3rYtCgQaxatYrExEQuX76Mn58fBQUFdOzYET8/P+bPn49KpaJZs2YYGRmV+ZkOGjSItWvXcvXqVS5fvkxwcDDBwcF6v4/SMtasWUNqaiqJiYlMnDiRdu3akZuby9y5c+nQoYPWrZv76/PQoUOYmJigUCgwMzNj69atDBgwgJCQEMzNzbly5Qpdu3ZFrVZjZGRU5ucxZswY5s2bx/nz50lOTsbDw4OioiI6dOjwUHWxatUqkpOTSUpKokmTJhQUFNCpUycCAwOJiooiLS2NJk2aaH4klvX9XLRoEb/99hvm5uZ4enqSlZVFQEAAM2fO1LSLuLm5UVJSgrW19QPvZdy4cTg6OpKQkMCmTZvo0aMHfn5+bNy4kYiICBwcHPDx8WH//v1cu3aNkJAQ4G7X/ezsbH799VeioqLYunUrTz/9NK6urpw9e5bNmzfTs2dP/Pz8dJ5r68XzQNLT07G3tyclJYVNmzZRVFTEyy+/zFdffUXfvn1p0qQJ//3vf9m1axfjxo0jLCyMTz/9lKtXrzJ//nyuXr3KrVu3NCeJ0nuJRUVFml/Qt27dYs6cOeTl5fHkk0/SsWNHpk+fTlJSks4yCgsLMTExIS8vj8zMTN566y3Gjh1L9+7d+fTTT7lx4wazZ8/W9Jxyc3MjPz8fU1NT4G6yzM3NZdiwYfj5+fHhhx9iZ2fHxYsX+fzzz5k0aRL+/v7ExcWxZcsWWrVqRXh4OBkZGajVauzt7UlKSsLNzU1TV2PHjqWwsJBNmzY9UJfbt29n06ZNzJkzR7OssLCQ/Px8rKysuHjxIra2towdO5aCggI2b96stf/mzZtJTEzE1dWVZcuW8cknnxAQEEBBQQGFhYVYWlpy4sQJPvnkE60YsrOzeeWVV8jKymLs2LEoFAqWL1/OP/7xD1q3bq21f3kxXLlyhY0bN7Jnzx569OiBm5sbS5cuZcGCBbi6ulJQUIBKpaKgoICLFy+SmprK559/Tq9evUhISKBr166
kpqZibm5Ov379cHFxITIyki+//JLly5drbg3eW0ZmZiaffvopffr0IT09HTMzM8aPH6+5cqyoPjdu3MiSJUu09h85ciTvvvsut27d0qsu0tLStGIwNzenefPmbNq0idDQUExMTDh69CglJSXMmDEDGxubB8q4cOEC8+bNo3v37iQmJtKpUycGDRqkuerR53tRVhmDBw+msLAQlUrF/Pnzadu2LW3atNEqo7ReL168iLGxMS+99BItWrRg9uzZWgPoVq5cycmTJ7VmaLh/fz8/P9LS0hgzZgwqlYo1a9YQHR3NwoULad++vV51UToTRM+ePVEqlQwZMoSYmBhiYmLo378/fn5+xMTEsHDhQt566y08PT21ykhISGDDhg2cOXOGoKAgzpw5w9y5czl27Bh//PEH6enpdOzYkfz8fH766SdWr16NtbX1A9+t0pP75s2b+eWXX2jWrBnHjh1j8eLFzJ49m4YNG/L3v/8duNsTbPbs2cyePVury/D58+eJi4vTunWYnZ2NiYmJXr0wa+0VSFFRETExMbi4uGjuuZYnPT2d1NRUTVuDubk5rVq1wtnZGX9/f0xMTHByciIoKIiWLVuiVCrp2LEjy5Yto1+/fpw6dQqFQkFeXp6m8ouLizXJQ61Wa24lnDhxgpYtW+Lh4UHXrl1ZsWIFffr04c8//yQrK4umTZuiUCg0vSJKyzAyMqKgoIC4uDieeOIJbGxsaNGiBb///jtPPPEEe/bswdLSEmdnZ80+pb/yEhISSE5OJiMjA1tbW5o2bappx4iMjKRbt244ODiwadMm7OzsCAwMZOvWrdy5cwcvLy+sra015cXFxeHk5ERsbCxqtZqWLVtq1e+mTZto164dvr6+wN3EuXfvXm7fvo27uzsNGjTg3LlzODg4cPr0aU0ZBQUFKJVKmjVrRps2bfD39+fs2bMkJCTQuXNnNm3axJ07d3B3dyclJeWBGFQqFZ6enjRo0IBhw4bh6+urGZF///73x1BSUkLLli2xtbXFycmJEydO8PHHH+Pv78/Ro0e5fPkyHTp0YOXKlZorEkdHR5YuXcrIkSMZPnw4+fn5xMfHM2DAALZv346TkxOurq54enqybds27ty5Q6tWrVi1ahVKpVJTxm+//Ubnzp15/vnnMTU15cSJE3Tr1g2lUolCoaiwPnfv3k3Xrl0ZMWIEZmZmREdH07dvX83YAX3q4t4YTExMiImJ4bnnnsPLy4uzZ8+SkZHBjBkz2LlzJ+np6QQHB7N582ZNGba2tqxcuZKxY8cyZMgQrly5QnFxMcHBwZofQxW9j/LKaNWqFUqlkqKiIrZs2YK3tzfu7u4UFxeTlZXFhg0btOrT2NiYjIwM4O7Jr02bNpo2zQ0bNtC7d288PDwAyMzM1Ay8a9iwoaYNoLQROzMzkzNnzjBs2DAaNmxIXFwc6enpOuuiQYMGJCYmUlBQwD/+8Q/atm2LtbU1dnZ2REdHk52dja+vL25ubmzbto2bN28SFhamVYZareb48eP885//JDw8nLNnz+Lm5oaDgwMnT54kIyODyZMn06JFC824kXbt2j3w3So9Dy1atIjRo0fzwgsvkJSUhJGREU888QSffPIJgwcPxsLCAlNTUy5evIiZmRkNGzbkk08+oV27dri4uODr64uJiYmmPVKlUun9FNhamUCKi4uZPHkyP/74IwMHDtT0mihrEN7KlSuZOnUqO3fu5MiRI5w6dYrWrVvTrFkz7O3tNftYWVnh4uLCtWvXUKvVrF69GoVCQd++fYmMjGTJkiWcO3eO06dPa6YfuPeYCoUCLy8vrl27xrVr1zAzM2Pbtm2UlJRw69YtVq1aRWpqqmZ/S0tLrSsRhUKBiYkJ69atw8TEBFdXV9auXYtCoaBHjx5ajbWlSvfNz8/n6aefxs7OjrVr12oa1ZydndmyZQt5eXkEBgYSGxuLiYkJLVu2xN/f/4F+4wqFgqtXr9KpUyfatGnDxx9/zOjRozE2NtZ8uXb
s2MFTTz3FmTNn+OKLL2jSpAkhISFaU79cvXqVzp07a5VR+iswLS2N69evY7zyyjUAACAASURBVG9vT3p6OtnZ2bRv3x4PDw+aNGmi2f/+GEpPAsHBweTk5GBiYkJaWhp5eXma8TCl+5cXg5GREUlJSaSkpKBSqWjcuDFZWVmUlJTQunVrPD09NX/gCoWCS5cu4efnh6OjI46OjixZsoTx48dz9epVLl68iJGREe7u7ty4cQMfHx+8vLxwd3fH3d1dE8eVK1c0Y43s7e1ZvHgxAwYM0Pzaq6g+r1y5QkBAAE5OTjg4OLB48WL69etHkyZN9K6Le2NwcHDg+++/p3///nh5edG+fXvatGmDsbExV65c0byPxo0b4+Pjo7kVlJ6eTkBAADdv3uSbb75BqVRSWFiIlZUVVlZW/P777wwdOrTM96FPGTY2NiQlJbFu3ToGDx6MkZERpqameHh4aNXn6dOnOXLkCOPGjWPFihVaDb1bt25l9OjRREVF8dVXX9G6dWuaN2+utT/AuXPnOHz4MB9++CGfffYZAwcOpEmTJrRv356QkBBUKlWZdVHq2LFjxMbGEhISwltvvcXmzZs17ao5OTlcvXqVwMBAMjIysLGxITAwUKs+LSws6NSpExYWFqSnp7N9+3Z69eqFs7Mz+fn5ZGdnk5eXR5MmTcjKykKlUml+lLq7u5OcnExmZiY2NjbcuHGDoqIiOnfuzLVr1/jxxx8JDAwkMDCQ4uJiNmzYQN++fTE3N2fr1q107NgRV1dXGjVq9MAUSpWZDqVWJpAbN26QmJiIg4MD0dHRdO3aFeCBBFJQUMDChQt57bXXsLOz49ixYzRo0IB33nlHq1tbqfj4eDZt2sTcuXNRKpW8/PLLmJmZsXDhQt555x2eeuopjh07xubNm+nTp4/Wyb/0j6S0IWrDhg2kpaXx0ksvsXLlSp37w91f/0qlEnt7e06fPs3333+PUqlk/Pjxmrm6yhvpbm1trUlgf/zxB7dv36ZFixY0aNAAJycnli1bxrZt20hKSuJvf/sbNjY25Zbl5uaGqakpzs7OREdHc+zYMbp37675cv3rX//i9OnTHD16lH79+tG5c+cHLnXLK6O4uJhDhw6xcOFCDh8+zK5du/jb3/6Gi4uL1ija8vZXq9XExMQwb948duzYwZ49exgzZswD+5dVxtGjR+nevTsmJiYkJiaydetWDhw4wL59+xg9ejROTk5az49WKBQEBwdrfumVtnX06NGDJk2akJ6ezi+//MLRo0c5cuQIw4cPx87OTlNG6ecVEBCgKaO0t07//v216jM2NpZjx45p1aeu/fv164eRkRGnTp3SWRf6lJGWlsarr77KgQMHOHToEM899xwNGjTQ+kxLu+WW3ooKCwujWbNmREZGaqa+mTlzJrGxseV+L3SVce7cOdq3b0/Lli35+uuvadasmeYq4v5neufn55OTk0PPnj2JjY1l3rx5mr+7zz77jLNnz3L48GH69etH69aty3wmuLm5OdHR0fTs2ZOkpCQ+/fRTsrOzadq0Ka+99hoHDx7k4MGDZdYFgJ+fH1999RWXLl1i9OjRhIaGEhMTQ2ZmJl26dGH+/PmcOHGCPXv28MILL2Bvb68p496/u+LiYiwtLTl06BApKSkEBwfj7OyMjY0NS5cu5ciRI+zYsYMxY8bg6OiImZkZiYmJTJgwgaioKAYOHIiVlRXNmzfH1NSULVu2YGNjg7GxMZ999hlTp05l06ZNJCYmsm/fPhISEujevTsODg6attcqU9cC/9fetwfEnL3/v6aaqakp0/2mCyURKUWp2IQWaz8u65LFWn2su4/r2lDZdQsfrM+6i1wiRCzFila5lUqXSReki4pKF6kkNc35/dH3fbapqaawa387r79q5v285jnP+7zPeZ/nPOd5Xr58STZs2ECCgoJIbm4uKSsrI3fu3CFFRUVk8uTJRCAQEEIIEQqFYnI5OTnE29ublJWVEUIIEYlEZMSIEeT
WrVtEJBKRuro6ib+Xm5tL/87OziY+Pj4tOG7fvk0IIa1yVFZWtqlDe/J5eXliXAUFBfR/kUjU4vqGhgZCCCFJSUnk22+/JWVlZaSiooLU1NSQhoYGkp6eTq9ldGkNDFdxcTFxdHSkbSkuLiZ+fn5kz549HeaoqKgghBDy7NkzkpGRQU6fPt0h+devXxNCGu2SlJREzp0712kdCgoKyMOHD8nFixfJmzdvCCGSbUrIH31qy5YtJDIykn5eVVVFMjMzybVr19rVg+E4cOAAOXPmDCGEkPr6elJUVES2bNlCtm3b1mF5QhptIRAIpLKFJA6RSESqqqpITk4OCQ8Pb5ejOSIiIsjmzZtJTU0N8fb2Jvv27esUx9atW0lVVRUhhNBnubX7ER8fT5YsWULOnDlDxo8fT0aOHEni4uIIIYQsWrSIHDp0qF0dMjIyyPz588mZM2fIlClTyJAhQ0hUVBT9/Rs3brTKwdgxJCSEODo60s9v3rxJ72NWVhaJjo5uVw+mjZGRkWTPnj2kpqaGfpeXl0diYmJayERFRZGDBw8ST09PEhoaSgiRPIbMmTOHHD9+nNTU1JCoqCiyc+dO+gx9SHzyK5Dc3FwsX74cffv2BZvNRmBgIGxsbGBrawsej0fDM0eMGIHy8nK6OZmbmwsTExOcOHECXbt2hYmJCY3qOHbsGCZNmkSjRZi3gpqaGjx48AB9+vQB0LgqUFdXR2BgoBiHqqoqjh49iokTJ4px1NXV4eXLl1BVVQWHw6HXnjx5Uip54I+3RiZS6vjx4/jll1+QnJxM/clKSkotXHbMKojx5R4+fBhXrlyBlpYWzMzMoK2tjbCwMGzZsgUxMTHUxaWgoNBidcNELqmpqUEoFMLHxwcpKSlQU1OjUWAxMTHo0qVLhzgEAgFyc3MRGhqKly9fgs/nQ1NTE2w2Wyr55ORkaGlpoaysDKGhoYiJiYGamhp0dXWl0sHX1xfJycnQ1NREcXExLl++jKSkJOjo6IjF/TcFs1K4efMmXFxcIBAIsH37dujq6iI/Px8hISHt6sFwREZGol+/fsjLy8OWLVvw+PFjPH36FO/evQOfz4eWlpZEWzSXf/bsGQ3zrKiokMoWknTYtGkTuFwuiouLcf78+XY5gMa039nZ2TA0NERUVBRKS0vh7u6OyspKREVFtdsvJHEUFxejrq4O27Ztw7Nnz6Cvrw9tbe1WV8lnz54FAGzevBl6eno4e/YsxowZg9raWkRGRrarA4/Hw+nTp1FXV4e9e/fCzMwMmzdvxowZMxAfH49z5861agvGjr169UJUVBQaGhpgZWWFe/fu4cWLF3Bzc8O9e/cQFBQkVf8EGlML5ebmwszMDGpqagAaz2ow50ia7kHm5+djwIABMDY2xtGjRzF+/HgoKCigrq4OZWVlqK6uBo/HQ2VlJY1sNDU1xaBBg6CoqNiqq7+z+GQnEGZjrKysDC9fvsSyZctgbW2N8vJysQy0pqamOHz4MEJCQiAQCCAnJwdCCPbu3UtDCo8dO4aJEydCJBLB3Nwcd+/epSewAwMDYWZmhvDwcJpieeDAgWJuKQBiHGZmZpRDXl4egYGBEAgE+N///ofY2FhYWlrSE7HMZlRr8k114HK5Yjf39evXOHbsGH766Sf069cPDx48wIMHD+Ds7CyxE7BYLOTk5ODMmTPQ1tbGTz/9BBsbGwBAWVkZAgIC8MMPP0BLSwuRkZF49eoVrKysJHLJy8sjPz8f4eHhqK+vx4IFC2BpaYljx45h1apV0NLSQlRUlNQcQqEQM2bMQGxsrJgOFRUVUuuwcOFC9OzZU6wdHdGB4dDV1UVgYCC8vb1RX1+Pq1evQlVVtYWvnEFFRQX8/f2RmJiI3NxczJo1C2ZmZh3So6amBufPn0dUVBSePXuGiRMnIiEhAV5eXtDU1ERUVFSbtmgqn5eXh9mzZ6N3797vpUNnOPLz87Fu3TrcunULL168wOzZsyEnJ9ehftGcY/To0QgNDaX
348qVK1BTU5N4P0QiEUaMGIF//etf4HK5MDQ0RL9+/UAIwfHjx6XSQSQSYfjw4fjyyy+p69fR0REAcOTIkXZtwQzo1tbWEAgEOHr0KAoKCqgtOmJPAHR/rXfv3jA0NGzxfdO9CRMTEzq5PHjwgLoQ3717h0uXLiE0NBQ3b97EnTt3MGPGDLFSD03Hsw+F1jMA/kXIyMjAzz//DDMzMwwcOJDG7jMhbDNnzsSFCxdw6dIljB07Fo8ePYKmpiaKiopQUFAAf39/hISEYPjw4Xj48CHGjRuH0NBQBAYG4uuvvwaHw4G6ujqMjY2hqakJFRUVLF68GGw2G9u3b6fhrE0hDUd4eDh8fHzAZrNbbHa3J6+mpobnz5+3yMiZnZ1NB7b6+npMmTIFq1evRkJCAuzs7GiIYlMUFRXh+++/p2kImEit58+fo6CgAN26daOJ0uLj4/HgwQPY29uLhSQzSE1NxbBhw2h2zpSUFBQUFKB79+7o3r07WCxWhzgYeVNT0w7r4O7uDhaLRTnepx2xsbEoLS2FsbExjI2NUVNTg/j4eOjo6MDMzAwNDQ104ieEgM/nw9LSEtbW1pg8ebKYLaTVQ1lZGSYmJhg6dCi++uqrFrZoz5ZN5SdMmNApWzTXoaPtIITA1tYWgYGBKCoqQu/evTvcL5pyvHjxAn369EFsbCzKysqkuh9dunSh50TI/5274fF4HdKBw+HQPQDm8x49eiAlJQXPnz9v1xaMLsyeSX5+Pp3sOnpPRCIReDweNmzYQDNg19TUQFlZuc19TwUFBUyfPh0//fQTPD09oaysjEGDBsHCwgIFBQUSi819yJUHg09qBVJcXIydO3di/Pjx6N27N3bs2AE3NzeEhYVBVVWVhgjq6elh586d4PP5qKqqQkJCAn2r0NTURN++fdGzZ0+oqKiAy+WiZ8+eiIiIwIMHD3Dt2jWUlJRg2LBhOHXqFNTV1cFisVBTU4OpU6ciPz8fERERUFVVBY/HozO2JI7i4mLk5eXB3Nwc9+/fx9y5c/HmzRskJCRASUlJbLO6NR2UlZWhra2N0tJSmoeHga6uLnbu3AlLS0saCllbW0tTLsvLy9NOxrwVGRkZ0egK5jMWiwVdXV3cunULLBYLPXr0gKqqKp1U+vXrJzYRMRFXpqamsLCwACEEIpGIHhaTloMQAgsLC5iZmYHFYkFLSwt37tyhhXSk0UEoFMLR0ZG2saM6yMnJwdzcHObm5rQd6urqePjwIVRUVGBkZAQ+nw+BQAAWiwVzc3OxEEbGhm5ubrCysqIDl56entR61NfX08wHvXr1AovFgra2Nm7fvi2VLZjVuIaGBoYMGQIWi0UPYXZEB3l5eXrQDUCHOZiXESbUuTP9QiQS4datWzQqjHHppqamSnU/mP7ODMTM/x3p303dOE3fyDvLwUxonbEF8/t8Ph+XL1+mh5D19fXbLeymq6uL0tJSbN++HfHx8bCwsMDAgQNhZWUl1nc/Jj6pCeTt27cICgrCypUrYWRkhOrqauTk5MDCwgLnzp3DF198AQ6HAy6Xi7t378LOzg56enrgcrmYN28eTpw4AR6Ph9DQUBqdADQuER0dHVFbWwt1dXV4eXlBWVkZmZmZMDAwwLRp0+Dv74+HDx/i5s2bqKysRHh4ON69e9cmR25uLvT19eHk5AR/f3+UlZXRaKfw8HCwWCwaldWaDgBw+fJlBAcHIy4uDnw+H4aGhnS5yWazceDAAUyZMgWEEKioqCA1NRXdu3dHVVUVAgICMGjQIDFf/datW1FaWgo1NTVoamrSzk4Iwc2bNzF06FCoqamhtLQU2dnZcHR0pO6yAQMGICoqChs2bMCjR4+grq4ulvhQGo6amhps2rQJmZmZUFdXh46OjtgDL40O0dHR2Lx5M6qrq2nFNaaN0nLcvXsXGzduRFZWFvWJy8nJoba2FqWlpcjMzIS9vT00NTWRlZWF3NxcuLi4oLCwENeuXYOVlRWioqKoPVVUVKCpqUk
HLWn0qKqqwubNm6ktdXR06Bu1tO2IiYmhtujfvz89Q9IRW9y+fRs7duxAbW0tFBUVoa6u3qF7+iH6RV1dHfz8/LB3716MHDmSrgKY8G5p7kdkZGQLHZgXhY60g7GFkpIS1NXV6WArLUdkZCTtFzwej7alI89Z0xVBVlaWmFv1ypUrbbpVgcZzML/++itEIhHmzZuH/v370+8+hrtKEj6ZXFjk/w7jOTo60hxFU6dORW5uLiwsLGBubo69e/ciKysLycnJKCoqQlFREezs7DB37lwYGxsjICAAfn5+cHd3p0WhcnNzsXv3bvB4PHzxxReYMmUKgMbZOS0tjaZ6X7ZsGQoKCrB69WqsX78e48aNQ2ZmJvLz81vlePjwITIzMwEAixYtwsWLF7F48WJs2rQJ7u7uSE9Px4sXL9rUgUlJ8Msvv8De3p6eP2HAXMsUkdHU1MS7d+9gYGBAXVuMDrdv38apU6cwe/ZsAICPj4/YWw6Ta4jJWeTk5ASBQIC6ujo6MB48eBCnTp3CqlWrYGJighUrVgD4I2dVexypqan4+eefsXjxYmhpadGltLTy9fX1+Pe//42tW7di4cKFNIljU0jTjvXr1+PYsWNYtGgRtLW1sX79eiqvpqYGKysr1NbW0lPqI0eORFxcHOrq6qCoqIjy8nLs3btXzJ6+vr4dsmdeXh727NkjZksWiyW1LYRCITw9PcVs0dxlKY0tdu7ciRMnTmDWrFloaGig+ZiYN3tpOA4cONDpfsFmsxEZGYmdO3di5syZ8PX1pYdXO3I/3rdvSrKFn59fh22xZ8+e937Obt26JXYfS0tLqRtvypQpsLKyQnx8PE1n0tDQgOZITEzE6NGjcerUKZqihMHHcFdJwl+yAsnJycGOHTvw9u1bujRnGvz06VN6alxdXZ2mEf/xxx9RWFiIM2fOID09HaNHj8adO3doriqRSASRSAQOhwMNDQ1cunQJ//rXv+iGU9P8OkDj0lVJSYlmsDUzM4O1tTX1Q6qqqiIsLAxffvkl+Hy+RI6m8r169UJoaCjNm6OpqYnLly9jzJgxEuWPHDmCwsJC1NbWQiAQYPLkyVBQUEBaWhpsbGzEDhr27NkTp06dQmlpKa5evYr6+no4OztDSUkJxsbGePXqFfT19ZGdnY3CwkJ88803sLGxQVRUFEpKStC3b1+aF6hr167Ytm0bDAwM8Ntvv4HL5WLQoEFQUlJC7969ERcXB5FIhKlTp8La2hrXrl2DmZkZdYupqqq24FBSUoKTkxOUlJTw6tUraGpq0jK9Kioq6NevH30zkyTfVAdTU1NkZGTA0NAQU6ZMQV1dHTIyMqCoqEhj+qVpR0JCAnr16oVRo0ZBTU0N9fX1sLGxoQOFhoYGFBQUsGfPHnTr1g2//fYb9PT0YGdnh+rqahqpVF1djZkzZ0ptTw6Hg27dukFbWxsVFRUoKCiAp6en1Lbkcrmwt7dHdXU1HBwckJKSAn19fXh4eHTIFvb29nj16hUcHR2RkpICRUVFTJ06FVZWVvD390dVVRXs7Oza1aOqqgoDBw7sVL/gcDg0YaihoSHc3Nzg4OCAXbt20TMPzOq4rfuRl5cHR0dHPHz4EEKhEB4eHh22Z3Z2NlxdXZGYmAhlZeVO2eL169dwcHBAamoq6urqOtQvuFwufUZ69+6NiooKGBgY0LGEz+dL7VZlVlxWVlY0ncmf4a6ShD99AklOToaPjw+cnJzw5s0bnDlzBs7OzuByudRIGRkZePv2LSwtLWFpaYn9+/fD1dUVzs7OGDBgADw8PGBgYIDMzEzU19fD3Nwcz58/h6+vL7p3747Tp09DKBRiyJAhUFZWpiGxzWdlDodDwyiZ0+XHjh2DmpoagoKC0NDQ0CZHc3lbW1v4+/vD0NAQ58+fh1AoxODBg6GiotJCPjMzE9ra2hg+fDiCgoIQHx+PzZs3o2fPnrhy5Qp0dXVpGJ+enh7s7e1RXFwMdXV1rFy5EkpKSrhy5QrOnj1
LqxEWFBSgqqoK+vr64PP5MDExQWBgIAYNGkQPG2poaNDcWBUVFejbty+CgoLg7u4ODocDY2NjDBs2DEpKSigrK0NcXBymTZsm1oE1NDRo4sekpCTIycnR1PMcDgcjR45Ebm4uZs6cCR6Ph6CgIAwfPhyKioqt6nDq1Cm4u7uDx+PB1NQUCQkJNM1+ZmYmDh06BHd3dygrK0vVjtLSUqSmpuLu3bvw8/ODoqIiLl26RA/xKSgowMTEBLq6ukhISMCbN2+gp6eHvXv30vBLHo+HhoYG6OjotGvPzMxMxMTE4PXr10hKSgKXy0WvXr0wefJkcDicNm3JyFdUVMDIyAiHDh1CdHQ0unTpAgcHB2RkZCAiIgJnz55t0xbNOZhwbSUlJZqxt2vXrsjJyaG1aJicapI4/P39ERsbC11dXTg7O0vVL5raorKyEomJidDT08OQIUNgaGgIFouFqqoqVFdXi9UxkXQ/5s2bh127diE4OBgTJkyAlZUVHSuktefixYuxd+9eBAcHY/LkyRCJRCgqKgKHw4GhoWGbtmD6lqmpKfbt24fY2FjY2NiAy+VK/Zwxenz//fdQVVXFjRs3sGPHDhq2zazEOuJWlTRR/BWTB/AXTCB5eXlgsViYO3cubGxskJ6ejqtXr2LkyJEAAH19fbx58wbh4eGorq5GVlYWKisrMWbMGLDZbLrJrKioiLKyMiQlJdHwt8LCQsTFxUFVVRW+vr60Q7QGZWVllJaWIjk5Gb1794aamhoEAgFu3LgBPp8PHx+fNjmayltaWsLExATdu3dHbm4uRCIRfHx8JJ6EBf6IZbe1tcWwYcOQkpKC//73v5gwYQJKSkqQmZmJPn36IC8vD9evX4ezszP69OlD3+YbGhoQHBwMgUAANpsNKysrCIVCREdHQ1NTk6YqiI+PR3p6OoYMGYLc3FxkZGTAwcEB/fv3x+DBg3H+/HkIBAJwOBz07t2bBh4AjaG/UVFRGD16NH1I8/PzkZ6eTjkEAgEePnxIddDU1ISCggL4fD5GjhyJSZMmITY2FnFxcXB1dUVOTg4ePXrUQoeUlBSqAzPg3bt3D15eXpg6dSqePHmC+Ph4fPbZZ61yNLWFqakpbGxscOHCBaxfvx6enp6IiIhAQkICPvvsM2RmZqK8vBz29vY0h1NwcDC2bNkCVVVVCAQCGBoaIjU1VSp7Mt9t27YNKioqiI2NhbW1NT1fIo0tzczMqA48Hg9xcXHUHtHR0VLZojlHUlISrKys8PbtW5w9exZXrlxBjx498PbtWzx58gROTk4t+gWbzcaNGzewbds2VFZWIiIiApMnT6Z9ubS0tM221NXV4enTp9ixYwdev36NGzduYNSoUXRFHRMTQyOsmLfprKwslJWVwd7eHgMHDoSrqyu4XC5u3bqFoqIi1NTUYPDgwVL3TVtbWwwZMkSMo66uDl988QWysrIQHBwslS0cHBzg7++P77//Hi4uLrSYU2RkJLS0tKR+zthsNtLT0+Hv749p06ZBU1MTqqqq9MVSUVERQqEQT548watXr2ixN39/f4wbN44W/6qpqaFekk8BH33aIs2S/TYvp7pixQoIBAIkJibSz9zd3TF79mw8fvwYd+/exdy5c1sUNmGz2Rg6dChUVVVpDYDvvvsO3t7eWLJkCYA/ykG2Bjk5uRYcs2bNwoYNG7B06dJ2OZrK7969GwBgb28PDw8PLF68uE35UaNG4fbt26irq4OGhgaysrKoDdzc3KirgsfjYfDgwVQuICAAwcHByMvLA4/Hw9KlS3Hu3DlUVVXR2iUPHjygezuzZ8+m+b0KCwuhq6uLo0eP4ty5c8jJyYGKigqWLl2K4OBgVFdXQ05OjvpbY2Ji0LVrV7DZbLx48QL5+fm0lO758+eRk5MjpkN1dTV1VZH/qxQHNJbMTUhIQF1dHYqLi9vVgcvlwsXFBatXr6aRd//5z3/w4MGDdjkYPZSUlMBms9G1a1eaRmLt2rVIS0tDQ0MDMjM
zwWazERAQgNDQUKSkpNA9puHDhyM2Nhbm5ubo27cv4uPjW7Xn3bt3cfnyZaSkpEBOTg7q6uoYMWIEBAIBrffQli2Zdly6dKmFDomJiSCE4PPPP4eXl1e7tmjOMWLECMTGxqJLly6YMWMGfH19sXz5csyZMwdz586ldUyYfsHYory8HMbGxuDz+bC1taXuOAbR0dES23Lnzh2EhYWhvLwcJiYmUFVVha2tLXR0dPDq1Su62ujatSsCAwMB/LF/8ejRI3o/Lly4AKAxqEZOTg7r16/H/fv38ezZM6rDvXv32rTn+fPnW3BERkaisLAQ3377LXx8fLB06dJWbcH0refPn0NOTg69evWCoqIiIiIiaGXBlJQUWvND0nPGYjUmYA0JCUFRURH9DTc3NwwYMADKyspiNYP69OkDFxcXHDlyBDExMQgKCqJ1gZjEmdra2hLHk78KH3UFEhAQgKysLFhZWVFfp6mpKfbs2UMzVjLuhPDwcJpumklG5uDggFGjRolFvjSFsrIybGxscPjwYTQ0NNAwOmbQlmZZ15RDKBRCJBLBwMAADQ0NYidPpZFvaGgAIQS6urp04mxNvrn7i8PhICAgADY2NggJCQGHw4GTkxM0NDTo6VSgsb67gYEB+vXrBxsbG1hYWCA1NRXJyclwcXGBiYkJMjIy6AMWGhoKfX19ODg40H2lJ0+eQF9fvwVHSkoKnJ2daQTH9evX4ejoiLi4OPj5+cHS0hLOzs4oKChoU14oFFIdlJWVcfHiRXTp0gXDhg1D165d29RBIBDQqog6OjpIT0+HUCjEhQsXoK6ujqFDh0rNUVtbi/v376Ompgaampq4ePEiNDU1MXjwYPTo0QN8Ph+PHz+Gnp4e3NzcaK16eXl53Lt3j9aLbsueTGp6Nzc3DB06FGw2G/Ly8oiOjsaIESPA4/HAYrFatSWTOViSDtHR0Rg+fDjNGtCWLdricHNzg5qaGs2zRgjByZMnYWJiAltbW9ovwtyH9AAAFTZJREFUmrpVnZyckJWVhXXr1oHH4yEsLAxKSkro3r07bt68iQEDBkjsF03ls7OzxeQVFRVhZmYGCwsLhIaGgs1m0yhFCwsL8Pl8ZGZmQldXF6ampmCz2bh27Rr69+8PPp+P0NBQKCsrw8jICBEREXBwcJBoz7Y4fv31V/B4PPTr1w+PHz+GSCSSaIumfWv37t00wrKgoAAREREwMjLCu3fvEBMTAwMDA4nP2a+//oqNGzciNDQUHh4eYLPZKCwsRFFRETZv3oy0tDScP38eY8aMocEVzd14S5cupSs/JhPvp4SPugLh8Xg0lplJV85s5u3Zs4e+6fbv359OEomJibQaWNOaF61FFSgrK2P79u3g8/nYv38/fYvuSBQCw6Gurk45moZKSivP5/Oxb98++ibelry2tjbdtC4pKcHo0aMxduxYBAUFgcvlYsOGDWJnQhikpqYiJycHwB/2+eabbxAfH49Hjx5BQ0MD06ZNg4ODA/bt24c3b95gxowZUnHExsbiyZMn1CVQW1uLZcuWIScnBydOnMBnn33Wrvzjx4/BZrPR0NCAFy9ewNvbG2VlZXRV2J4OcXFxePLkCeTk5JCZmYn79+9jyZIlKCsrw6JFi6TmYMI8x4wZg/r6enh5eaGyshLz5s0T40hLS0N2drYYR2VlJa2JoK6ujpkzZ7Zqz9bkq6ur0aVLF9oHamtrsXTp0ha2lIYDaAw8acsWbXEwle3q6upQUlICX19fsFgsephQkj2BxoSUx48fh5+fH0aMGIG4uDjU1NSgqqoKK1asaLNfSJJPSEjA8+fPAQBjxozB69ev0RwPHz6kK42ysjJoamrC0tISIpEIN27cwKVLlwA0lmhYvny5RHu2xkEIwe+//05rzOTm5mLdunWt2kJShKWfnx+cnJxQXl4ONzc32Nvb4+DBg2L94t27d1i2bBlu374Nf39/zJ07F5mZmTT1UnJyMn788Uds3boVbDYbmzZtAtC4L5q
ZmYlhw4ZhyZIl8Pb2hrKysliFxk8OnUmgJS28vb3J8ePHCSHiiQ5fv35N5s+fTw4cOEAKCgrIiRMniK+v73v/XvNkin8FR0fkS0pKyPbt28maNWvoZ/X19fRvJhlgU9y7d494enrSBGrMNUePHiWrV68mz549Izdv3iSEEJqgjhDx5HRtcaxZs4Y8e/aM/P777yQiIoI8ffq0hT5tyXt5eZG8vDzy+++/E0IIKS0tldie9trRlKOoqKjDHM31ePnypVQchBASFBRE/Pz8CCGEpKSkkCtXrrRqT2nkw8LCyI0bNyTaUhqO1NRUqoO0tmirHU2T/LXVL969e0fbXFhYSObNm0eqq6vJnTt3yJMnT1roIY08k7yytSSiDMe7d+8IIYR8++23ZMyYMWTx4sXk4MGDZPbs2YQQQu7evStRB2k4vvvuO3ptW7aYNWsW5ZgwYQI5dOgQIaTxHixcuJBUV1cTQv5InEoIIf7+/iQsLIyUlJTQz3bv3k2TRJ4+fZosWLCA/l9QUEBmzZpFGhoaSFhYGMnJyRGzR2tJJT8VfFQXlpKSEi5evIjRo0dDQUEBWVlZWLBgATgcDsaPH4+XL1/i6NGjKCsrg6enJ7S0tNo8vt8ePkQkwvtydERekguOKV/ZGheHw0FmZibq6upocSqgMfxw06ZNuH//Ptzc3GBoaEjfWJrGqLfHsXHjRkRHR2PkyJFwdnaGhoYG3dNgOKTVoWvXrjQlQ1N5aTmGDRsGQ0ND8Hi89+ZgwqLb4wAaQ8mVlZVx/fp1XLhwAUOGDIGRkZFEe7Ynf+7cOQwfPhyDBw+WaEtpOM6fP091kNYWzTlCQkIoB5fLlYqjoKBALLKxvr4eQ4cOhbm5OfUYtNUvmsszUYlKSkpiqWKaRzYyHD169EBNTQ3c3d2xYMEC2NnZISkpCWZmZujTp49EHaThSE5OpqGybfXP9iIsXVxcoKioKOYpefr0KTQ0NNC7d2/qOYmOjkZ2djYGDRoEIyMjejZMS0sLISEh0NXVhZOTE3XjNcWfdZ6js/ioEwhzE5gOlZubC3Nzc4wdOxaampqwtraGg4MDJk6c+N6Tx98VbDYbAwcORHV1NU6ePAlXV1ca7ioJTOQXE12joqKCxMREbNq0CZMmTcJ///vfFgnZJO0dtcWxfft26Ovri8k35ZBGByYEWZJ8Z9rxZ3D06tULPB4PwcHBOHjwIEaPHo1169a1OA3cmi0kyf/4448fXIc/g0NSZOO6deta1PhozRatRUY2j0psTYfExEQ6PjA5pQDA1dWVuvXaa0drHJ999hlNYdQeR3sRlk3lWCwWrRrYr18/mj6Gy+UiKysLffr0oRVDa2pqEBISAkII5s+f32q05qeOj1oTXSQSISQkBGlpaViwYIFYbpemCdKYa/+qWOZPBc1t0hpKS0tx/PhxWoqUqXrYtNxuezzvy/Ep6PChOcrKymhxoerqahqCK60tOiv/KXKUl5dT33zTZ1NaW7QmL81z3vyeNoc0L5ofkqNpW5pCUluio6Nx5MgRHDhwgGYNiI+Px82bN+Hp6SkWRVVVVSVWTvrvOP591BUIk+QsIyMDt2/fhpubG/2uubH+aSsPSZC2AzGuL39/fxo51q1bN6kjxz4Ex6egw4fmYCLx5OXlYW5ujvr6erEcXB9L/lPl6Exk44eMjPT39xeLbGQgzVjxITk6EmEpyZWora2NI0eOwMDAQGwl1NT19XecPIA/4SBhezdShs6hqevr1KlTcHV1bbGk/tgcn4IOH5OjeX2Wjyn/qXJI41b90PKtcXQ0CuljcUjrYmYO5srLy0MoFEIoFMLS0rKFzN/55fmjurCagjmdeenSJWzfvp1WDpTh/SGt6+tjcnwKOnwqHJ+CDp8Kx6egw5/Nwbi+KioqsGHDhvf6zU8df9oEwuBD3EgZZJBBhk8ZNTU1+Oabb/DVV1/BysoK1tbW1PX1d15xNMef7niTTR4yyCDD/+/ozOHivyP+9BWIDDL
IIMM/Cf8/e11kE4gMMsgggwydwt8zdkwGGWSQQYa/HLIJRAYZZJBBhk5BNoHIIIMMMsjQKcgmEBnaxe3bt7F///735lm5ciUuXLiABQsW0II/AHDr1i0MGDBArPiWp6cnrl271qnfiY6OFku5/uuvv2LChAmYMmUKxo8fjw0bNuDt27edb4gEvH37FtevX39vnhkzZiA6OhoZGRn0DMHTp0+RlpbW4tpDhw4hKipKau5r165h4sSJmDp1KubMmUPTqUdGRmLSpEn4+uuvsWTJEtTW1r53OzoDJlV7SUkJ/vOf/wAAvLy8cO7cub9EHxnah2wCkaFdDBkyBPPnz/9gfC4uLoiJiaH/37t3D6qqqkhNTQXQWLciOTkZTk5O7/1bUVFRCAgIwIEDB3D27FmcO3cOIpEI69evf2/upkhPT/8gEwiDXr16wcfHBwBw48YNpKent7hmzpw5cHV1lYqvoqIC69evh7+/P06fPo1u3bohMDAQ7969g4+PD3bt2oWgoCBoa2vj2LFjH6wd0qKhoQH79u0D0Jj645dffvnTdZCh41D4qxWQoXXExsZi3759UFRUxIgRIzB27FisX78ez549w5s3bzBmzBh4enpCJBJh48aNdACeNWsWRo0aBYFAgC1btkBBQQEsFgu+vr4wNzdHamoqfH19oaysjCFDhmD37t1ISkrC/v37UVFRgaKiIjx79gwODg7w8fHBhQsXEB0djeXLl+OHH36g+iUmJuLUqVOwsbHBzp07kZiYiNraWgwYMACrVq0CIQRr167F48ePYWhoSEu8Dh48GHv37qUJ7WJjYzF16lRER0fD2toaiYmJsLCwgJqaGnJycrBu3ToQQiAUCrFixQrY29vDy8sLHA4HOTk52L59Ox4+fIiff/4Zenp6YjWjDx48iJUrV9JEngoKCli9ejUtZtaajWbMmIH58+fDyckJBQUF+Prrr3H79m14eXlBR0cHT548QU5ODiZOnIgZM2Zg7dq1qKysxLZt27Bq1Sr6+0+ePIGvry/YbDZqa2uxcOFCuLq6ws3NDWPGjIFAIMCrV6+wZs0aODo6it37Xbt2YdWqVTh58iR4PB6UlJTw5Zdf0mu8vLxgZ2eHQYMGYf78+XBxcUFKSgrevHmDgwcPiqUM6tKlC65fv04zQGhqaqK4uBjJycno1q0bzRg8cuRI7Nixo0XhrcjISGzfvh06Ojq0aiZjDzs7O0yaNAkA0LNnT6SlpaGiogKrVq2CUChEdXU1vvnmG4wbN472JZFIhJycHBgaGmL37t1Ys2YNnj9/Dk9PT6xfv57auymuXr2KkydPghACDQ0NbNy4EaqqqvD29kZOTg5YLBZ69eqFdevWdeg5k+E98DGKjMjwYXD//n3Sv39/8urVK0JIY7Ga//3vf4SQxsJVEyZMIBkZGeTixYtk8eLFhJDGYl3fffcdEQqFxN3dnRauuXnzJpk+fTohhBAPDw8SERFBCGkscGNhYUHq6+vJL7/8Qjw8PIhQKCRv374lNjY2pKKigoSEhJAVK1aI6Xby5EmyfPlyQgghV69eJatWraLfLViwgPz+++/kzp07ZPLkyUQkEpGamhri7OxMQkJCCCGEuLu7k4yMDFJSUkImTJhAsrOzyYwZMwghhOzcuZPs2bOHEEKIp6cnuXr1KiGEkEePHhE3NzdCCCE//PCDmE6DBw+mBZs2bNhA22pvb0/Ky8tbtXFrNpo+fTq5d+8eIYSQ/Px8MnjwYPq7S5cuJYQ0FgPq378/IYRItBGjy8GDBwkhjcW1Ll68SAghZOjQoeTIkSOEEEKio6PJuHHjxH73/v37xMPDg/5mcHBwC27m8/z8fNKrVy9aYMnLy4scPXq01TZXVFSQESNGEIFAQC5fvkzbQwghubm51MZNMWTIEPLo0SNCCCEBAQFi9miqG9OX0tLSaB8rLi4mAwcOpHZyc3Mjb9++JSKRiAwbNoykpaWJ2bi5vYODg8mLFy/Il19+SQs8HTt
2jPj5+ZG0tDQycuRI+vtnz54VK/Akw8eFbAXyiaNbt260yExsbCyKiooQHx8PoNHVk5eXh5SUFDg4OAAA1NTUcOjQIVRWVqKsrAzW1tYAgIEDB2L58uUAgEePHtHrP//8c7E3Njs7O8jLy0NeXh7q6uoSy44mJSUhJCQEp06donolJyfTfYeqqioUFBRAKBTC1tYWLBYLXC6X6gI0rkKio6OhpaUFR0dHdOvWDYWFhaitrUVMTAy8vb0BNK4Qfv75ZwCNb7fV1dUoLy8HANja2gIAXr16hXfv3sHMzAwA4OjoiMePHwNozJjadG+lKdqyUVsYOHAgAMDQ0BDV1dV0NSMJn3/+Oby8vPDixQsMHToUY8eOpd+5uLgAaCzp/PTp03Z/ty2oq6ujR48eABpLyVZUVEi8rri4GHPmzMGcOXNgbW1NS78yIBLSnL969Qpv375Fz549AQDOzs44evRom/ro6Ojg8OHDOHz4MOTl5cX0sba2pvUv9PX18fr1a6ipqbXJl5SUhJKSEvz73/8G0Nj3u3btCjMzM6irq+O7777D0KFDMWrUKJoiXYaPD9kE8omDqSkANKaKXrhwIUaOHCl2TWxsbItBsvkgQJqcF21aY775Cdnm/5Nm50xLS0vh7e2N/fv3g8vlUr0mT55MH24GR44cEdOjqY4uLi44c+YMNDQ0MHr0aACNE8K9e/fw/Plz9OnTR2I7mn7GZFZtPug1HdAtLCyQmJiIESNG0M+EQiEyMjJgamraZlsZ1NfXi/2voCD+2LQmBwADBgxAWFgYYmJicOHCBVy+fBk7duwA8Ic9JA3aHUV79w1o3JyeNWsWli1bRu2hr6+Ply9f0mtevnwJPT29Nrmaph5vqnddXR39e9euXTAxMcHOnTvx5s0b9O/fv0O6NgeHw4G1tTUOHjzY4rugoCCkpaUhMjISEydOxOnTp8VqD8nw8SDbRP8bwc7ODr/99huAxsHHz88PFRUVsLW1xZ07dwAA1dXVmDRpEhQVFaGtrQ2BQAAAiImJgY2NDQCge/fuSEpKAoAObfwKhUIsW7YMK1euhLGxsZheN27cgFAoBADs2bOHVp8UCAQghKC6uprqAgAODg5ITU1Feno67OzsADSuHI4dOwYHBwc6SPXr1w93794F0LhRzefzoa6uLqaXuro65OXlkZubC6AxCovBvHnzsGPHDjx//hxA4+SyZcsWnD59Gqqqqq3aiMfjobCwEABw//79dm0jJydH298UgYGBKCoqgpubGzZt2iRmA4Y3ISGBvt1LAovFajGJdQYrVqzA999/LzaZWltbo6CgAHl5eQCAy5cvi9XtARrty9S5AICbN2/S71RUVKidYmJi6IRSWlpKV0RhYWGQk5MTm2CaozX7Mejbty9SUlJQUlICAPjtt98QERGBhw8f4uLFi7CyssKiRYtgZWVF+4EMHx+yFcjfCNOmTUNmZiamTJmChoYGuLq6gs/nY9SoUUhMTISHhwcaGhowa9YscDgcbN26FVu2bIG8vDzk5OTAlH5ZtWoVNmzYAB0dHbi6ukpdeCk8PBypqakICAhAQEAAAGDq1KkYNWoUkpOT4eHhAXl5efTu3RtGRkYwMjLC5cuXMWnSJBgYGNDBGQC4XC7Mzc3R0NBAVzKDBg3C6tWrsWXLFnqdj48P1q1bR2tqb9u2rYVeLBYLa9aswcKFC2FkZCS2ie7s7IzVq1dj8eLFdOXg5OQELy8vAGjVRtOnT8e6desQFhaGwYMHt2ubvn37Yvv27Vi9ejX8/Pzo5927d8eKFSugoqICkUiEFStW0O8Yd1JRUVGbG7+Ojo7Ytm0bCCGYNm1au7pIQkpKCpKSkkAIoffOwsICPj4+2LRpE1asWAF5eXkYGxtj+vTpYrIsFgve3t5YsmQJdHR00LdvX/rdxIkTsWTJEsTHx8PFxYW6j6ZPn44NGzbg3Llz+OqrrzBo0CCsWLECQ4cOlaifjo4OtLS0MGHCBGzdurXF97q
6uli7di3mzp0LLpcLJSUlbN26FWw2G3v37sXZs2fB4XBgbGwsttqR4eNClgvrH4j79++Dz+fD0tISaWlpWL58OcLDw/9qtf5RcHNzw9GjR8Umu78LmkalyfDPhmwF8g+EgoIC1q5dC0VFRdTX13/wMxEyyCDDPwOyFYgMMsgggwydgmwTXQYZZJBBhk5BNoHIIIMMMsjQKcgmEBlkkEEGGToF2QQigwwyyCBDpyCbQGSQQQYZZOgU/h/32nDWgWOwTQAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('recognizedWordCount', 'recommendations')" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "stacked_bar_plotter('recognizedWordCount', 'recs')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "The relationship between comment size and upvotes is almost monotonic - and the longest comments, those over 77 words, have the highest average upvote totals." ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Next, let's look at one of the features that reflect vocabulary sophistication - the average inverse document frequency for a comment. There seems to be an ideal range in the middle here, but I am afraid it might be influenced by the presence of overly short comments - that's a sure way of getting very low or very high average IDF." ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('MeanIdf', 'recommendations')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "If we limit the comments to only those longer than 10 words, the range does shrink, so size was a confounding factor here. The only thing we can conclude from the plot is that low IDF does bring a comment down, but using rarer words on average (a higher mean IDF) does not really help." ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('MeanIdf', 'recommendations', df=X.loc[X.recognizedWordCount > 10, :])" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "The other measure of vocabulary complexity that showed up in the top features of the model is very simple - average word length. It behaves in much the same way - too low is a problem, but beyond a certain threshold, it makes little difference." ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('meanWordLength', 'recommendations')" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZ0AAAEnCAYAAAByjp6xAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzs3XlcVdX++P/XYR4FkSkEFHFiEBAVEcm5a9pHzW4OoabfzEzT0jRLM/VmZmrlbbipaWqpGUVmZSqmIuIESk4gieI8gEwi83h+f/Dj1InpOJzD9H4+Hj2SvdfavA9nn/0+a+2111IolUolQgghhA7o1XUAQgghmg5JOkIIIXRGko4QQgidkaQjhBBCZyTpCCGE0BmDug7gYRQUFBAXF4ednR36+vp1HY4QQjQIpaWlpKam4u3tjYmJiU5/d4NOOnFxcYwZM6auwxBCiAZpy5YtdO3aVae/s0EnHTs7O6D8D+fo6FjH0QghRMOQnJzMmDFjVNdQXWrQSaeiS83R0RFnZ+c6jkYIIRqWurgtIQMJhBBC6IwkHSGEEDqjte61/Px83nrrLdLT0yksLGTq1Kl07NiRuXPnUlJSgoGBAStWrFDrU4yOjua1116jXbt2ALRv35533nlHWyEKIYTQMa0lnYiICLy9vZk0aRI3b97khRdewM/Pj5EjRzJ48GC2bNnChg0bmDNnjlq9gIAAPv30U22FJYQQog5pLekMHjxY9e/bt2/j4ODAwoULMTY2BqB58+bEx8dr69cLIYSoh7Q+em306NEkJyezevVqzMzMgPIHk7799lteeeWVSuUvXrzIyy+/TFZWFtOmTaNnz57aDlEIIYSOaD3pfPfddyQkJPDGG2/wyy+/UFZWxpw5cwgMDKRHjx5qZVu3bs20adMYNGgQ169f5/nnn2fPnj0YGRlpO0whhHhoq7ed4bfDl3mqpxsvP+NT1+HUS1obvRYXF8ft27cB8PDwoLS0lIyMDObOnUurVq2YNm1apToODg4MHjwYhUKBq6srtra2pKSkaCtEIYR4ZPILS9h55DIAu45cJr+wpI4jqp+0lnROnDjB+vXrAUhLSyMvL4/Dhw9jaGjIq6++WmWdX375ha+++gqA1NRU0tPTcXBw0FaIQgjxyBSXlFGxDnOZsvxnUZnWutdGjx7N22+/TUhICAUFBSxYsIAvv/ySwsJCxo0bB4C7uzuLFi1i5syZLF26lH79+jF79mz27dtHcXExixYtkq41IYRoRLSWdExMTPjoo4/UtvXr16/KsitXrlT9e/Xq1doKSQghRB2TGQmEEELojCQdIYQQOiNJRwghhM5I0hFCCKEzknSEEELojCQdIYQQOiNJRwghhM5I0hFCCKEzknSEeACrt51hyKyfWb3tTF2HIkSDIklHiPskEzsK8eAk6Qhxn2RiRyEenCQdIYQQOiNJRwghhM5I0hFCCKEzknSEEELojCQdIYQQOiNJRwghhM5I0hFCCKEzWluuOj8/n7feeov09HQKCwuZOnUqHTt2ZM6cOZSWlmJnZ8eKFSswMjJSq/f+++9z+vRpFAoF8+bNw8fHR1shCiGE0DGtJZ2IiAi8vb2ZNGkSN2/e5IUXXsDf35+QkBAGDRrExx9/TFhYGCEhIao6MTE
xXL16ldDQUJKSkpg3bx6hoaHaClEIIYSOaa17bfDgwUyaNAmA27dv4+DgQHR0NP379wegb9++HD16VK3O0aNHGTBgAADu7u5kZWWRk5OjrRCFEELomNbv6YwePZrZs2czb9488vPzVd1pLVq0IDU1Va1sWloazZs3V/1sY2NTqYwQQoiGS2vdaxW+++47EhISeOONN1BWTFgFav+ujiZlhBBCNBxaa+nExcVx+/ZtADw8PCgtLcXc3JyCggIAUlJSsLe3V6tjb29PWlqa6uc7d+5gZ2enrRCFEELomNaSzokTJ1i/fj1Q3m2Wl5dHUFAQ4eHhAOzZs4fHH39crU7Pnj1V++Pj47G3t8fCwkJbIQrR5Mm6QELXtJZ0Ro8eTUZGBiEhIbz00kssWLCA6dOns337dkJCQrh79y5PP/00ADNnzqSgoAB/f3+8vLwYPXo07733HgsXLtRWeEI0ebIukKgLWrunY2JiwkcffVRp+4YNGyptW7lyperfs2fP1lZIQoi/qWpdIFPjuo1JNH4yI4EQQgidkaQjhBBCZyTpCCGE0BlJOkIIIXRGko5ocmSYsBB1R5KOaFJkmLAQdUuSjmhSqhomLITQHUk6QgghdEaSThMh9zGEEPWBJJ0mQO5jCCHqC0k6TYDcxxBC1BeSdIQQQuiMJB0hhBA6I0lHCCGEzkjSEUIIoTOSdIQQQuiMJB0hhBA6o7WVQwGWL19ObGwsJSUlTJ48mR07dpCZmQnA3bt38fPzY/Hixary27Zt45NPPsHV1RWAoKAgpkyZos0QhRBC6JDWks6xY8e4cOECoaGhZGZmMnz4cA4cOKDaP3fuXEaMGFGp3uDBg3nzzTe1FZYQQog6pLWk061bN3x8fABo1qwZ+fn5lJaWoq+vz6VLl8jOzlbtF0II0TRo7Z6Ovr4+ZmZmAISFhdGrVy/09fUB+Oabbxg7dmyV9WJiYpg4cSLjx4/n3Llz2gpPCCFEHdDqPR2AvXv3EhYWxvr16wEoKioiNjaWRYsWVSrr6+uLjY0Nffr04eTJk7z55pv8+uuv2g5RCCFYve0Mvx2+zFM93Xj5Gc17YUpLy4g8eYPfDl9W237jTjaebi0edZgNnlZHr0VFRbF69WrWrl2LpaUlAMePH6+2W83d3Z0+ffoA0LlzZzIyMigtLdVmiEKIh9QYZjB/0ElxC4tLWbT2GCu3niTx2l21ffO+OETUqZuPPNaGTmtJJzs7m+XLl7NmzRqsra1V28+ePUvHjh2rrLN27Vp27NgBQGJiIjY2NqouOSFE/dNYZjB/0ElxN/4az6kLqVXuKy2Dj7+N5cad7EcVZqOgtaSzc+dOMjMzmTFjBuPGjWPcuHHcunWL1NRUWrRQb3JWDIseMmQIoaGhjB07lgULFrBkyRJthSdEnWsMLYSmPIN5Tl4Re2Ku1VimpFTJb4cu11imqdHaPZ1Ro0YxatSoStvfeeedSttWrVoFgKOjI5s2bdJWSELUG/9sIYx/yhNTY63fYhWPUPyldIqKa+/+/+P8nfs67oPeW2ooNGrpFBcXk5ycDMCff/7J9u3byc/P12pgQtRHSqWShCvpD32cptxC0Ia6aDUWafieaVoOGk93ZU00SjpvvfUWp06dIiUlhenTp5OYmMhbb72l7diEqFf+vJrBKysieG99jNr2j7bEkpVTWEdRibq6ULs6Wj7SctA0voxolHRSUlJ48skn2blzJyEhIcyZM4esrCxtxyZEvXHxxl3mrz7C9ZTKN4X/OH+H+auPUNAIv5XqysO0VOrqQt3KsRkerW1qLfdkYGvtB9OAaJR0ioqKUCqV/P7776ohzXl5edqMS/xNY7jh/KjU1d9iw6/xFBZV339/5fY9dh29orN4GpOG3KU05d8+Nd6LC/J5jO5ejjqMqP7TKOkEBATQpUsX7OzscHNzY+PGjbi5uWk7NkHD/kA+anX1t0hOz+XMxbRay/1ey0gmUbWG3KXk5mTFsmnB+LS1rbRvWC9
33hjbFT09RR1EVn9pNFxm9uzZvPTSSzRr1gyAAQMGVDuNjXi0qvpAmhrXbUx1pa7+FinpmrXqk9NztRyJqI/cnKxYMqUnidcymfXJQdX2kQPaY6Avq8f8k0ZJ58KFC/zwww9kZWWhrPjUU750gRCNnamJZkOZZchz0+bYwryuQ2gQNPqUzJgxg0GDBuHh4aHteISod9xbWmFrbUra3ZofE+jR6TEdRfRwlEolx+KS2R55UW37gdjrPBXcBn3pDmrQ6vtzPholHVtbW6ZNm6btWISol/T19fh337as+elstWUMDfQY+ngbHUb1YJRKJet/jWd7ZFKlfWt/jiPuUjpvjuuKvnQLNUgN4aFjjc6sXr16cejQIYqKiigrK1P9J0RT8VRPN57u7V7lPkMDPd56vhuujs10HNX9O3L2dpUJp8LRs7fZduBitftF/dYQBmVolAJXrVpFTk6O2jaFQkFCQoJWghKivlEoFEwc6s3jfi355WASkSf/mj145YxetHrMqg6j09wvB6tPOBV2HLrMM33aSmtHaIVGSefEiRPajkOIBqG9a3NeGu6jlnSaNzOtw4g0V1JaxrnLGbWWy7hXwM3UnAbRchMNj0ZJJzc3l40bN3L27FkUCgWdO3fm+eefx8TERNvxiUamvt/kbMzKypS1F/r/ld5HWSHuh0bt53feeYecnBxGjx7NyJEjSU1NZf78+dqOTTQy8qBr3TIy1MfZ3qLWcqbGBjxmK8N/hXZo1NJJS0vj448/Vv3ct29fxo0bp7WgROMkD7rWvUFBrVm7Pa7GMv27umBiVL9GPInGQ6OWTn5+vtpSBnl5eRQWyqy6omE5lXiHFZvU709G/nGd0tL6N8JHWwb1cMOvvV21+10dLRnzZNUr+wrxKGj0dWbUqFEMGjQIb29vlEol586d47XXXtN2bEI8MqG/n2fz7j8rbf9yexynEtOYO6Fbk5iyxNBAjwUTu/Pd74nsPHyZnPxi1b7+XV2YOMwbCzOjOoxQNHYaJZ1nn32Wnj17Eh8fj0KhYMGCBTg4OGg7NiEeiT/O36ky4VSIOZfM93sTCRnYNL7hGxroM26QB08FtWb8u3tU218Y6o2lJByhZTUmncjISHr37k1YWJja9qioKKA8GdVk+fLlxMbGUlJSwuTJk9m/fz/x8fFYW1sDMHHiRNVSCRXef/99Tp8+jUKhYN68efj4yAgn8XA0eTZl55HLjOjfDkMDfR1EVD8YNKHXKuqPGpPO+fPn6d27N7GxsVXurynpHDt2jAsXLhAaGkpmZibDhw8nMDCQ119/nb59+1ZZJyYmhqtXrxIaGkpSUhLz5s0jNDT0Pl6OqMrfJ2ltapRKJacv1L4sQVZOEVdu36OdS3MdRCVE01Vj0nnppZcACA4O5qmnnlLbt3Xr1hoP3K1bN1UrpVmzZuTn51NaWv0iWABHjx5lwIABALi7u5OVlUVOTg4WFrUP8xSVKZVK9kRfY9uBC2rbt4T/ybhBHliYGtZRZLql6ZRNpaVNNzkLoSs1Jp2EhATi4uJYv3692ui1kpIS/ve///Hcc89VW1dfXx8zMzMAwsLC6NWrF/r6+mzevJkNGzbQokUL3nnnHWxs/lruNS0tDS8vL9XPNjY2pKamStJ5AEqlklU/nqlyNcudhy8Tl5TG0qnBNDNv3H34CoWC1k5WXLpZ8/LqhgZ6Gj3DIoR4ODUmHSMjI9LT08nOzlbrYlMoFMyZM0ejX7B3717CwsJYv349cXFxWFtb4+HhwZdffsnnn3/OggULqq3blLuFHlZMfHKNyydfS85m4454Xh3VWWcx1ZXBQa35/IfTNZZ53K+ljNoSQgdqTDru7u64u7sTGBiIn5+f2r7w8PBaDx4VFcXq1atZt24dlpaW9OjRQ7WvX79+LFq0SK28vb09aWl/9b/fuXMHO7vqnykQ1dtx+HKtZSL/uMELQ7wa/cW2fzdXjpy9zR9/3qlyv4ONGROe8tRxVEI0TRoNmba3t2f
58uVkZmYCUFRURHR0NAMHDqy2TnZ2NsuXL2fjxo2q0WrTp09nzpw5uLi4EB0dTbt27dTq9OzZk88++4zRo0cTHx+Pvb29dK09oPNXM2stU1RSxqVbWfi0bdyJ3UBfj/n/L4Cte86z68hlcvL/mn4n2PcxJj3tQ/NmMo+gELqgUdKZM2cOvXr1IiIigrFjx7Jv375al6reuXMnmZmZzJgxQ7XtmWeeYcaMGZiammJmZsbSpUsBmDlzJkuXLsXf3x8vLy9Gjx6NQqFg4cKFD/HS6o+6mORSIYs/qjE00Of5wZ48FeTGhMV/PZsy5d9+jf6+lhD1iUZJR19fn5deeomoqCjGjBnDs88+y+uvv05QUFC1dUaNGsWoUaMqbR8+fHilbStXrlT9e/bs2ZqE1GDU1Up+Hq1tiK2mO6mCkaE+bVpaaz2W+sTQUJ5NEaIuaTTvR2FhIcnJySgUCq5fv46BgQE3b96svaKos5X8/i+49qWT+3V1aTLDpoUQ9YNGSefFF1/kyJEjTJw4kWHDhhEYGEjnzo1/1FND1qWjfbXLKwO4O1vJzXPx0GTdHXG/NOrnqXhgE8pnDcjNzcXKqmEsz9tUKRQKXhjihbuzNdsiLnD51j3VvuG92zL6X+0xM5FWjngwOfnFbIu4wO5jV9S2Hzt7mye6u6KQm4qiGjUmnblz59ZYuWIggKifFAoFffyd8e9gz5gFu1Tbn+3fThJOHcsvLK69UD11L7eIuV8c4lpydqV9n/1witvpuYyXVrSoRo3da/7+/vj7+6Onp0dWVhYdO3akffv2pKenY2raMNaFF6I+uZ6SzYpNJ5i8dK/a9oTL6XUU0f376pe4KhNOhbD9Fzh9IVWHET2couJSok7dUNtWUCSr2mpLjS2dESNGAPD777/z5ZdfqrZPmDCBV155RbuRCdHIJF7LZP7qI1Uu071kQwyvh/jTp4tLHUSmuaycQg6evFFrud8OX8a3Xf1//utsUhrLvznB3Rz1RSlf/TCCWWO60M3TsY4ia7w0Gkhw+/Zt7t37655Abm4u169f11pQQjQ2pWVKPtwcW2XCAVACn31/iszsAt0Gdp8u3rhLiQYTo57TUcstOT2XLbsT1LZF/nFdo1GiV5Pv8e66Y5USDkBuQQnvbzzO+asZjyxWUU6jgQSjR4/miSeewNnZGYVCwY0bN3j55Ze1HZsQjcbJ83e4nZ5bY5mikjL2xlxjRP/2Oorq/tWn6RBPnr/Dko0xFBapz17/5fY4Dp2+xcIXA2u8dxm27wIFRdXPfF9SWsZ3vyey8MXARxZzTbJyCtl3/JpOfldd0ijpjBkzhmHDhnH16lWUSiWurq40a9ZM27EJ0WhoMi0RwJ9XNCtXV9o6W2Ogr6i1tdOxlU2N+x9Wxr0Cln5dOeFUOHc5g9XbzvB6SJcq9xeXlHHo9K1af0/snylk5xVpdUXVsjIlm3cn8NOBJEpK1VtoW3Yn8NJwH/T1Gs9oQI2STmpqKjt37iQrK0tt5ufXXntNa4EJ0ZhoOoK4vo80trY0Jti3JQf+qPm+zlM93TQ+ZnZe0X3HEX70CvmFNa/PFXnyJuOf8qSFVeVBT3kFxZUu8FVRKstH62kz6WzYEc/2yKpXt9155AoKhUJn02fpgkb3dCZPnsyff/6Jnp4e+vr6qv+E0JWS0jKi426rbTt/NaPBLH/h0Vqzb/6alqtLLw7zpqVd9RPxPt3bHb/2tQ8iuHjjLou/iublD/apbddkJF9tUzxBeQviVGLVo+jMTAwxNKj98qdQoNW5+VIy8vi5luXUfzt8mVupOVqLQdc0aun8fXJOIXTtTmYe/1l3rNIw3Xe/iqanjxOzxvhjaFC/vwT5trPD2d6CG3eqv3gYG+kzIMBV42PWxUSyAFYWxqx49XG+35vI79FXyS34a3DEy8/4MDioda0Ph565mMp/1h6jqIob/ks2xvDGmK483rlltfULi2tu5VQoqqacoYEewb5ORMTW3GLr6uGg1Vb
OgdjrGt0n23fiOuMGeWgtDl3SqKXj6+tLUlLN2Vg0fqu3nWHIrJ9Zve2Mzn5ncUkZi9ZWTjgVDp+5xZqfzuosngelp6dgzriuWJpVfWNbTwGvP+ePlYWxRsf750Sy1Y2K0xZLMyMmDvXmizn91bY/7tey1oRTXFLGR1tiq0w4UN6l9cn3J7mXW323m6uDpUZxutRQbkT/9pgaV/9lxUBfj9FPdNDo9zyo1Lv5tRcC0jQs1xBolHSioqIYOnQowcHB9OnTh969e9OnTx8thybqk7q6yB09e4vrKdU/iAjwe8y1BvGhdHOy4uMZvflX91YYGap/9Ba+GEiQj5PGx6qriWT/yUCDLqp/OhZ3m4x7lYcp/11hUSn7T1Q/kmtgj1a1/h5newu82rSodr+LgyULX+xBc8vKid7C1JD5LwTQ3rV5rb/nYWg64W5jmphXo+61VatWaTsOUc9VdZEz1exL+UPRZIRRWZmSY3G3NZpZu645tjBn+kg/xgzsyPh3/1p9t62Ldi9u9UniNc1G6CVeu1vtvk7utvTv5sK+41U/L2hooMcrz/rW2uryatOCr+Y/wd7j1/ki7K8lzT+d1Qe75mYaxfkwevo68WPERY3KNRYafU2xs7PjwIEDbN26lZYtW5KWloatra22YxOC3HzN5ijL0bBcffEgLYTGQk/DIXo1FVMoFEwf2ZmQf3XA3ET9u7ObUzPeezkIb3fNrlGGBvr0/Ecr09hI+2teAbRzaY5/R/say/i0tW0QA0w0pdGZv2jRIq5du0Z0dDQA8fHxvPXWW1oNTAgAW2vN5vizrWJYrKifPN00u4DW1DUGoK+n4LmBHfn8jX5q2997uSeebjXXrU/mjO1Kp2oSZDsXa94a361RzdqtUdK5dOkSc+fOxcSkfB35kJAQ7typfcji8uXLGTVqFP/+97/Zs2cPt2/fZsKECYwdO5YJEyaQmqo+nDE6OprAwEDGjRvHuHHjWLx48QO8JNGYDOhW+2guU2N9gnwe00E04lHo6umIY4uau67MTQ3p4++s0fGMGvhqsOamhrz3chD/mdSDnr7q5/E7EwO1OnquLmjUhjQwKC9WkW3z8vIoKKh5jqhjx45x4cIFQkNDyczMZPjw4XTv3p2RI0cyePBgtmzZwoYNG5gzZ45avYCAAD799NMHeS2iHispLePk+ZT7ruft3oLuXo5ExydXWyZkoIcs1dCA6OspePP5bryz+kiV3aKG+greHNe1Sb2nenoK/Dva09bFmsOn/3oerTHNRFBBo6Tz5JNPMn78eG7cuMF7773HwYMHCQkJqbFOt27d8PEpf3agWbNm5Ofns3DhQoyNy+8+N2/enPj4+IcMXzQEkX/cYP2vcZVGLH0b/ieThnmjr199g1uhUPDGuK6s+vF0pZvGxoZ6jB3kybBe9X8AgVDX1tmalTN7E7b/Agdir1NY/Nfou3cna34/RjQ8GiWdsWPH4uPjQ0xMDEZGRnz88cd4e3vXWEdfXx8zs/ImdFhYGL169VL9XFpayrffflvl8ggXL17k5ZdfJisri2nTptGzZ8/7fU2NSm6B+jfBYg0fiqsvImKv8/G3f1S577fD5UOvZz7nX+MxjA31mTHan6HBbXhtZaRq+//m9MPBxvyRxit0x7GFOdNGlI/ke/4/f43kc3WUeR0bM42HaBgZGeHn54dSqSQ/P5/jx4/TrVu3Wuvt3buXsLAw1q9fD5QnnDlz5hAYGEiPHj3UyrZu3Zpp06YxaNAgrl+/zvPPP8+ePXswMmpcfZqaKC1TsmV3QqUpMqZ9FMGEpzwZGNi6bgK7D0XFpazdHldjmf0nrjMoqLVGE0Ta/mMIq6lx0+l+0QZDAz0UivKHMfUUaDQtjDbU1NIVjY9GSefll1/mwoULODg4qLYpFAq2bNlSY72oqChWr17NunXrsLQsfzJ47ty5tGrVimnTplUq7+DgwODBgwFwdXXF1taWlJQUXFzq98JW2rBm2xl2Hb1SaXtOXjG
f/3CaklLlfU2qWBei45I1mszx9+hrWp+VWFRmamzA4CA3fjt8mUFBbpga62aYcGNVX5J4fafxLNP79u2rveDfZGdns3z5cjZu3Ii1tTUAv/zyC4aGhrz66qtV1vnll19ITU1l4sSJpKamkp6erpbomorLt7KqTDh/t3FHPH27ONfrm6230jWbpPBWWuOZzLChefkZn0Y1g3FdkiSuGY3+Kt7e3ty4cQNnZ82GMALs3LmTzMxMZsyYodp269YtmjVrxrhx4wBwd3dn0aJFzJw5k6VLl9KvXz9mz57Nvn37KC4uZtGiRU2ya21vTO0LORUUlRJ16hYDA2ufDqSuaPqhkw+naCwkiddOo0+7h4cHTz75JLa2tujr66NUKlEoFDW2fkaNGsWoUaM0CmLlypWqf69evVqjOo1ZbStMqsrV8xZCgKcj636Oq3UW3UBvecZGiKZCo6Szbt061q9fj6Ojo7bjEWj+zb8+d61B+eikx31bcvDUzWrL2Fqb0quGKeyFEI2LRle3Dh06EBAQoO1YGqXSsvtfZCzQ6zEOnqz+Ql2hu3f9/xIwbaQfd3MKOXMxrdK+5pbGLJoUiImO5rkSQtQ9jT7ttra2jBs3js6dO6utGCrLVVcv814BYfsvsPe4+v2ZhCvpdPequTspsNNjtLSz4GYNqwV293KkVQN4nsHU2IB3JwcRE3+bXUeucjLxr+mTVrz6uDxnI0QTo/Es0927d8fIyEiWq9ZAcnour/83kl+iLpFXoL7uzJL1Mew7XvNAAUMDPRa+GMhjLaq+IHu62dT6QGV9oq+noEcnJ2aP7aK2vaE+Z1MxNBZkaKwQ90ujlk5Vz9SI6v33u5OkZVU9N50S+Oz7U3i1aYFjNUkF4DFbcz6d1Yfdx67w1S9/TRf02qjO9O3iLA/U1SEZGivEg9Po07JmzRrWrVtHTk55d0/F6LWEhAStBtcQXb6VRfyl9BrLlJYp2X30ChP+z6vGcibGBvTr6qqWdAK8HCXh1AMyNFaIB6NR0tm+fTvbt2+X0WsaqC3hVDh3OUPLkQghRP2j0Vfmdu3a4ejoqHY/p6nc01m97QxDZv3M6m1nNCpf2zMpFco0LSgeKbkfIxqr3Pxidh+7rLYt8apmS4PrkkYtnaeffpqhQ4fi5eWllmyWLl2qtcDqg/zCEnYeKX8Tdx25zPinPGvtv2/vaq3RsTu4Nn/o+MT9k/sxojFKvJbJ4q+iuZujvnzIf746xhMBrrwywq/erM2j0Sdu6dKlDBs2rMnNg1ZcUqZquZQpy382Na65TnvX5rR1tuLijaxqyygU8GSP1o8uUHFf5H6MaEzuZheyaO2xaifX/T3mGjbNTBg7yEPHkVVNo6Tj6uoqI9g0pFAomPGcP3P/d7jak+CFId64OFjqODIhRGMUHn2l1tncf4m6xL/7tasXLXuNIvD19eXTTz/F399frXvtn+sAjYjCAAAgAElEQVThiHKtHJvx8YxebAn/k6iTN9VmJZj5XGf6dXWtw+iEEI3J0bO3ay2TX1jCqcRUenSq+3kONUo6x48fV/s/lH+jl6RTPccW5swK6cLYJzvy4pK9qu1dPXQ/AlDW+Whc8gqKCT92RW3bqcQ7BPu2RK+e9NsL3cnLL6m9EOXnTX2gUdLZtGmTtuNotOrDU/dy87zxuHL7Hgu/PErGPfWHj1dsjiUi9gZvje+GsWHTGFkqytnbmGo0M72DjVmtZXRBo6+8SUlJPP/88/j7+9OlSxcmTpzItWu1r/ki6o+Xn/Hh14+GyQ30BiyvoJhFaysnnAonElJYo+HQ/kdFhqDXvQEBta+p9ZitOZ5uLXQQTe00OkMWL17MCy+8wKFDhzh48CCjR49m4cKF2o5N1BNpd/P5cf8FtW31panelETE3iC9mumVKuw7cZ30rHwdRfRXKxqQVnQdCfZ1wqtN9QlFoYAXh3nXm65XjZKOUqmkT58+mJmZYW5uzhNPPEFpaam2YxP1wG+HL/Pikt/ZduC
i2vYZHx/gzMXUOoqqaToWV/sN47IyJcfPpeggmr9IK7puGejrsWBidx73q7wulbWFMXPHBxDgWX9mk9Eo6RQXFxMf/9f8X2fOnJGk0wQcOXOL1dvOVLkmUG5BCYu/iubGnew6iKxpyi/Q7IZxfqFm5UTjYWZiyJxxXVk5o5fa9k9m9akXI9b+TqO28JtvvsmsWbPIyCifL8zOzo4PPvig1nrLly8nNjaWkpISJk+eTKdOnZgzZw6lpaXY2dmxYsUKjIyM1Oq8//77nD59GoVCwbx58/DxkW9PdUGpVLJ1z/kayxQUlbI9MolpI/x0FFXj8SAjCh1bmHP+Wu3TmtQ0e7lo3Oz/sT6VQT2cHFjj53R27txJbm4uCoUCY2NjDA1rHpV17NgxLly4QGhoKJmZmQwfPpwePXoQEhLCoEGD+PjjjwkLCyMkJERVJyYmhqtXrxIaGkpSUhLz5s0jNDT04V6heCC30nK5cvtereUOnbopSecBPMiIwie6uxJ58kaNZawtjenq0bRmDhENi0ZpcPfu3UydOhVLS0ssLCwYM2YMu3fvrrFOt27d+OSTTwBo1qwZ+fn5REdH079/fwD69u3L0aNH1eocPXqUAQMGAODu7k5WVpZqOQWhW9m5NT/hXCG3oOSBluR+EI1tpNT93gvxaWtLTx+nGsu8ONS7wf9dROOm0dm5ceNGVqxYofp5/fr1bNiwocY6+vr6mJmVjwsPCwujV69e5Ofnq7rTWrRoQWqq+o3otLQ0mjf/ayJMGxubSmWEbthYmWhWrpmxziYSbOojpRQKBbPGdOH/erpV+ptbWxjxxtgu9PZ3rqPo6s6j+DLS2L7Q1Gcaj16ztPxrrjALCwsUCs0uNHv37iUsLIwFCxZUOqYmv1fUDfvmZvi0ta21nK6n9GnqI6UMDfSY/IwPn7/RV237J7P60qtz00s48Gi+jNSXLzRNIflp9Jf19vZmxowZBAQEoFQqiYqKwtvbu9Z6UVFRrF69mnXr1mFpaYmZmRkFBQWYmJiQkpKCvb29Wnl7e3vS0tJUP9+5cwc7O7v7fEniURk32IO5/ztMSWlZlftbWJkwtFcbHUclAJqZq093Xh9vGGvqUUzT9ChmDq8Ps483hdlDNHp358+fT9++fUlKSuLy5csMGTKEefPm1VgnOzub5cuXs2bNGqyty9eYCQoKIjw8HIA9e/bw+OOPq9Xp2bOnan98fDz29vZYWFjc94sSj0bHVjYsmNgdm2aV13No5WjJ+1N60txSs244IapTX1oZ9UVjb81r9O4qFAo8PDwwNzdnwIAB3Lt3Dz29mvPVzp07yczMZMaMGaptH3zwAfPnzyc0NBQnJyeefvppAGbOnMnSpUvx9/fHy8uL0aNHo1AoZNaDeqBzB3u+mv8vDsTe4JPQk6rtS6b0xMqilsWFhNBQfWhlCN3QKOls3LiRHTt2UFRUxIABA/jiiy9o1qwZU6dOrbbOqFGjGDVqVKXtVQ1AWLlyperfs2fP1iQkoUMG+noEeKk/0azpPT0hhPg7jbrXduzYwffff4+VlRUAc+bM4cCBA9qMSwghRCOkUdIxNzdX607T09OrtXtNCCGE+CeNl6v+/PPPuXfvHnv27GHnzp24u7trOzYhhBCNjMZDpnNycnBwcOCXX36hS5cujBkzRtuxCSGEaGQ0SjoREREsX76ciRMnajse0YjJstlCCI2STkFBAf3798fNzU1tos8tW7ZoLTDR+DSFB9+EEDXT6FNf09BoIe6HPI8hRNOmUdIJCAjQdhxCCCGaAOlUF0IIoTOSdGpQWCTL/gohxKMkSacKOfnFrPnpDFOX71fb/sO+RIpLSusoKiGEaPgaddJZve0MQ2b9zOptZzSuk5tfzLwvDrHj0GUKitQTzPbIJN5bH1PtVP9CCCFq1miTTn5hCTuPXAZg15HL5Bdq1lW2dc95Lt+6V+3+P87fYdeRK48iRI00hUWdhBBNR6O9ghWXlFGx8GiZsvzn2hQ
Wl7I35mqt5XYdvfyw4WlM1hoRQjQmcgX7m+S0XHILam8RXU/JoaCoBBOj2v989WVVRCGEqA8abUvnQejpab5GjL6GZaWlIoQQf5Er4N842ZrTwsqE9KyCGst1bNUcQwN9jY8rLRUhhCin1aSTmJjI1KlTmTBhAmPHjuXVV18lMzMTgLt37+Ln58fixYtV5bdt28Ynn3yCq6srAEFBQUyZMkWbIarR19fjqZ5ufLMzocZyQx+XZR2EEOJBaC3p5OXlsXjxYnr06KHa9umnn6r+PXfuXEaMGFGp3uDBg3nzzTe1FVathvdpy/mrmUTHJ1e5//96uhHs56TjqIRQJzN2i4ZKa2eqkZERa9euxd7evtK+S5cukZ2djY9P/etyMtDXY+74bkwb4Yuro4XavldH+vHS8E4oFJrf+xFCG+ReoWiotHamGhgYYGBQ9eG/+eYbxo4dW+W+mJgYJk6cSElJCW+++Saenp7aCrFa+vp6DAxsTY9OToxZsEu1vbv3Y5JwRL0h9wpFQ6Tzr0dFRUXExsayaNGiSvt8fX2xsbGhT58+nDx5kjfffJNff/1V1yEKIYTQEp0nnePHj1fbrebu7o67e/lN+s6dO5ORkUFpaSn6+pqPFBNCCFF/6fzu49mzZ+nYsWOV+9auXcuOHTuA8pFvNjY2knCEEKIR0VpLJy4ujmXLlnHz5k0MDAwIDw/ns88+IzU1VTUkusKUKVNYtWoVQ4YM4Y033uC7776jpKSEJUuWaCs8IYQQdUBrScfb25tNmzZV2v7OO+9U2rZq1SoAHB0dq6wj6p4M0RVCPApy5RAakSG6QohHQa4cQmMyRFeI+q0h9EjUv4iEEEI8kIbQI1H/IhJCCPHA6nuPhLR0hBBC6IwkHSGEEDrTKJPO7bRcNu9SX54gNiEZZcX61UIIIepEo7unc+TMLVZsjqWktExt+8dbTxJ9LoU3xnRBX79R5lohhKj3GtXV93pKdpUJp8Lh07f4ds95HUclhBCiQqNKOr8eulRtwqnw2+HLFBSV6CgiIYQQf9eoks6JhJRay+TmF/PnlQwdRCOEEOKfGlXSKSwqfaTlhBBCPFqNKuk421vUXghoqWE5IYQQj1ajSjoDA1vVWsarTQuc7S11EI0QQoh/alRJp1dnZ3zb2Va739RYn0nDvHUYkRBCiL9rVEnHQF+PdyYGMiioNYb6CrV9bVta8f7UYNydresoOiGEEI0q6QAYG+oz9d++fPZGP7Xt/5kcRNv7TDgV04RD/Z0mXAghGpJGexW1NDN66GM0hGnChRCiIdHqVTQxMZGpU6cyYcIExo4dy1tvvUV8fDzW1uUtjokTJ9KnTx+1Ou+//z6nT59GoVAwb948fHzqdoru+j5NuBBCNCRaSzp5eXksXryYHj16qG1//fXX6du3b5V1YmJiuHr1KqGhoSQlJTFv3jxCQ0O1FaIQQggd01r3mpGREWvXrsXe3l7jOkePHmXAgAEAuLu7k5WVRU5OjrZCFEIIoWNaSzoGBgaYmJhU2r5582aef/55Zs6cSUaG+nQ0aWlpNG/eXPWzjY0Nqamp2gpRCCGEjul0IMGwYcOYPXs233zzDR4eHnz++ec1lpf1b4QQonHRadLp0aMHHh4eAPTr14/ExES1/fb29qSlpal+vnPnDnZ2droMUQghhBbpNOlMnz6d69evAxAdHU27du3U9vfs2ZPw8HAA4uPjsbe3x8JC5kkTQojGQmuj1+Li4li2bBk3b97EwMCA8PBwxo4dy4wZMzA1NcXMzIylS5cCMHPmTJYuXYq/vz9eXl6MHj0ahULBwoULtRWeEEKIOqC1pOPt7c2mTZsqbR84cGClbStXrlT9e/bs2doKSQghRB1rtDMSCCGEqH8k6QghhNAZSTpCCCF0RpKOEEIInZGkI4QQQmck6QghhNAZSTpCCCF0ptEmHVn1Uwgh6p9GeyWWVT+FEKL+adRXYln1Uwgh6pdG29IRQghR/0jSEUIIoTOSdIQ
QQuiMJB0hhBA6I0lHCCGEzkjSEUIIoTMNesh0aWkpAMnJyXUciRBCNBwV18yKa6guNeikk5qaCsCYMWPqOBIhhGh4UlNTadWqlU5/p0KpVCp1+hsfoYKCAuLi4rCzs0NfX7+uwxFCiAahtLSU1NRUvL29MTEx0envbtBJRwghRMMiAwmEEELojCQdIYQQOiNJRwghhM5I0hFCCKEzknQeQnp6OtnZ2XVWvz7Jzc196DH/j+IYotyjOLcexTGysrIoKCh44Pr15bwqLCx8qPqP4hiP4nU87PvxdyUlJQ9UT5LOAygrK2Pr1q08//zzhIWF3Xd9pVLJN998w4QJEx6oPsCZM2ce6uQ5c+bMI/kglZWV8eGHHzJ16lQSExMf+BgrV65kypQpGh9j//79nD59+oF+X4XIyEhSUlLq9BiP4nX83cOem4/qGCUlJXz99df8v//3/4iKinqgGB7FefWwxygtLeXdd99lxowZD5SAlUolpaWlvPfeew98jId9HUqlkrKyMlavXs2kSZMe6P34+7EANm7cSEhICNevX7/vY+gvWrRo0QNH0ATt3buXxMREbG1t8fDw4OrVq1hbW2Nvb6/xMYqLi8nJyaFjx473Xb+0tJSwsDCmTp3K9evXGThw4H2/hi1btjB9+vQHrl8hISEBCwsLbty4QUFBAWVlZbRq1QpjY2ONj3Ho0CHy8/O5d+8eBQUFKJXKWo8RGhrKe++9h4GBAW3btsXCwuK+Y1+1ahXz58/H1NSUrl27oqd3f9+/ioqKWLVqFQsWLMDMzIwuXbrc9zG2bt3K0qVL0dPTo127dg/0Ov7uUZybj+IYu3btYu3atXh7e9O8eXMyMzNxdHTEyspKo/rHjh3DxcWFS5cukZ+fD3Df59WFCxewsrLi2rVr5Ofna3Re/dPNmzcxNTUlISGBS5cuYWVlRbt27TSuD6BQKLh79y5JSUkPdIyLFy/SrFkzrl279sCfMYVCQW5uLhkZGVhZWZGVlYWDg4PG78c/jwVw6tQpUlNTycvLw9fX977OfUk6GoqNjWXp0qVcvHiRp59+mk6dOmFvb8/Fixe5fPkyAQEBNda/d+8eY8eOxcjICC8vL1q1aoW9vT1JSUka1QdYs2YN8fHxDB06lFdeeYUPP/yQbt26aXRBUCqV7Ny5k3v37hEcHMyUKVNYsWIFXbt2va8LCsAff/zB0qVL+f3330lOTmbQoEF4e3uza9cuHB0dcXJy0vgYp0+f5qmnniIwMBA7Ozv27t2Lg4NDpWOUlZXx6aefkpCQwNixYxk6dChHjhzB1NQUFxcXjR4OzszMZO3ateTm5tK9e3cmT57Mli1b6NChA3Z2dhq99rt377JkyRJyc3MZMWIEI0eO5LvvvtP4GJmZmXz++edkZ2fTu3dvxo4dy8GDBzEzM9P4dfzTw56bj+oYJ0+eZOnSpRw/fhxnZ2cmTpxIixYtiI2Npbi4mI4dO9Z6jJiYGMaPH0+nTp0YOnQo1tbWREREVHlOVOXcuXO8/fbb7Nq1CwMDA0aNGoW9vX2151VVTp8+zdtvv82ePXuwsLAgJCQEW1tbwsLC6Natm0ZfDi5evMiiRYto3749LVu2pFu3bvd1jNjYWJYsWUJ4eDiZmZmMGTMGZ2dnwsPDNf6MpaSkMHbsWNzc3GjTpg3t27fHzs6OEydOaPx+VMjIyODdd9+lrKwMd3d3/Pz8CAgIYN26dXh4eNzXNUS61zQQHx/P+++/T4cOHfjvf/+Li4sLADY2Nvj5+ZGZmUlkZGSNx0hOTiYvL4/jx49z584dAFq0aIGvry93796tsf727duZPHkyqampPPXUU1haWmJiYsJzzz3H4sWLa40/KSmJUaNGcezYMZYsWcKOHTswNTXlueee47333ruPv0R5S+unn35i6NChrFu3jqKiIs6fP0+7du1wdHTk2LFjtXY3/fbbb7zxxhsMGjSIL774AltbWwB8fX1xcHCo8hj5+fmcPHmSTZs2kZK
SgoODA23btiU6Oppr167VGndsbCwvvPACJSUlHDhwgL1792JnZ4eXlxfbtm1TfaOuyVdffcWsWbOwt7dnyJAhmJqa0rJlSzw9PTU6xu3bt5k8eTJ6enokJiaSnp5O8+bNad++vcav45/OnDnD0qVLH+rcTEhIeOjze9++faxZs4aRI0eycuVKMjMzAXB3d6dt27YkJSVx7ty5Kuv+8/l0d3d3Pv/8cwDVl6Lo6GiNujEPHjyIn58foaGhBAcHA+Dj41PteVWV8PBw+vbtyxdffEGHDh0A6NWrF+bm5vz222+UlZVVW7finkvF53zdunWqfb169cLCwqLWY+Tk5LB161bGjh3Lhx9+SHR0NJcvX8bT01PjzxiUt9Ty8/PZtGmTapubm1ut70dVLl68SHJyMlu2bFFtc3BwoFu3boSFhZGXl6fxsSTpVEOpVLJjxw5OnTpF27ZtCQ4OplWrVmRlZbFx40a2bNnC2bNn6dGjB66urhw7dkytv7aiZREbGwtAWloac+bMITc3l/DwcNXJ6eXlRevWrSvVr5Cfn8+XX35J27ZtmT9/PnZ2dqp7OS+99BJZWVn88ssvNb6WgwcPEhAQwOLFi3njjTf44YcfAJg8ebJG9ZVKJbt37yYuLo7S0lLi4uLo1KkTxsbGXL16lbt37wLw3HPPceXKFc6dO6d2k7HiovLDDz9QVFSEr68vTk5Oqm/PoaGh7N+/n8LCQkJCQiodQ6lUYm5uTnBwsOp1ADzzzDPk5eVx8uRJcnNzq4y9IhFUfIN+/fXX6du3L2lpaQBMmjSJixcvEhMTU+PfoLS0lPXr1zN06FCmT5+Ovr6+6hiTJ0+u8RgVH8iioiK8vLyYNWsWr776Ko899hgAzz77rCqpVvc6/k6pVHLw4EEyMjLw9PSkZ8+e931uAnz//fccOnSINm3a3Pf5XXGcyMhI7t69S//+/Vm9ejXBwcHY2tpiYWHBxYsXgfKLLZS3bv95H3Lr1q1ERESofk5LS+Pdd9/F1dWVlStXAjB06FAuX75c6byqUHHMoqIiMjIy6NGjB1DedRsZGUlJSQmjR4+u8tyscPToUTIyMlAqlVy7do0nnngCCwsLLly4wIkTJ4Dy9zkiIkL1uv7piy++YMGCBUB5996cOXO4ceMGBw4cUJV54YUXiIyMrHQMpVJJREQEN27cUP0devTogaOjI8bGxty7dw+AkSNHVvs6lEol+/btIykpCYDLly+zbNkyUlJS2LZtm6pcTe/H3+3cuZOioiIADh8+zIsvvoitrS1r1qxRez2XL1+u9fPzd9K9VoWkpCSmTJlCQUEBoaGh2Nra0qFDB/bs2UNoaCgmJiZYWVnx6aef4ufnh6enJydPniQ7OxsPDw/u3LnDuHHjyM3N5cSJE3h7e+Pm5ka7du1wcnJi69at+Pr6YmVlhampKVD+TSIrKwsPDw8yMzO5cOECDg4OGBoa4uTkRGRkJIGBgfz3v//l6NGj3LlzB09PT1xdXfnggw8YP368Kv7MzExWr17N1atXcXR0RE9PD3Nzc1Vr5ODBg/j7+2NpaYmzszPLli1Tqw/lJ7BCoSA6Opq3336be/fusXnzZjw9PXnsscfYu3cvs2bNwt3dnYyMDA4cOEDPnj1RKpX88ccftG7dGmtra6C8H/j27duMHz+eVq1aERAQQHp6Ol9//TURERFkZWVx/vx5Dh8+THBwMGVlZWrHUCgUpKSk8PPPP/Pee+/x1Vdf0alTJ5ydnTE3NyciIoKWLVvi4OCg9hrmzp1LTEwMvXv3JiEhgc2bN9O2bVu++uor0tLSUCgUeHp6YmJiwq+//kr37t1V78ffFRcXY2BggIWFBTt37qRr16785z//4cCBA9y7d49OnTphampa5THmzp3LiRMnCA4OJikpif379xMcHMz8+fP5+eefSU1Nxc3NDUdHR/bv31/l6/i7uLg4XnvtNe7evcu2bdtwcXGhZcuW7Nu3j++++67Wc7Pi/SgsLGT
mzJnk5+cTGBiIpaUlu3fv1uj8/nscWVlZ/PTTT1hYWODm5gaUd+scOnSIIUOGYGhoiIWFBXl5eZw/fx5jY2NcXFz49ddf+fDDDykrK+Ppp59Wzf918uRJrl27xuuvv86iRYto1qwZfn5+5OXlcfbsWVxdXVXnVXJyMqNHj8bGxobWrVtjZGTE0aNHiYiI4ObNmyQkJJCQkEBsbCzBwcEUFxdz6tQpWrVqpTrGvn37WLJkCZcuXWLPnj14eXmRnJzMTz/9xNmzZ/nzzz/Zv38/GRkZ9O/fn5s3b3Ly5Em6d++OgUH5fMm7d+9m2bJl5OXlYW5uTu/evenUqRNt2rTB0NCQb7/9luHDhwNgZ2fHxYsXiY+PJyAgAAMDAw4ePMjixYtJTk5m69atdOjQgVdeeQWAFStWkJGRwfnz54mLiyMoKKjK13H9+nXGjx9PWloaZ86cwdvbm44dO9KqVSucnZ355JNPGDFiBPr6+lhYWJCTk0NiYqLq/fin/Px8nnnmGezs7OjUqRMeHh60b98eZ2dn1q1bR58+fbCwsMDIyIjS0lLCw8Or/fz8kySdKmzfvp0WLVrw9ttv4+rqytq1a5k+fTppaWm4u7vzyiuv4OPjQ0lJCTt37mTEiBGkpqaSkJBA69atuXjxIkVFRSxevJhBgwZhZWWFkZERUN4kjYuLIzExUdX8t7GxIT09nXPnzuHk5MR//vMfvvzySyZNmgSAs7MzBw8eZM2aNfTr1w9/f3++/PJL7Ozs6NOnD5GRkVy6dIkePXqQlJTEtGnTcHd3Jy0tjZMnT9KnTx/8/f1RKBTEx8ezf/9+JkyYgFKpxM3NjYMHD6rqV6i4YRgaGoqnpyevv/46NjY2REREMG3aNBwdHTExMWHhwoV4enpy6dIlkpOTGTx4MHv37sXAwABXV1cMDQ2B8i6c+Ph4LCwscHZ2Jjg4mMOHD9O7d2+mTJlCp06duHDhArt27cLPz49z586pHcPCwkL1WhQKBe+88w63b98mJCSE6Oho7t27R+vWrTE1NaWkpAQ9PT3Cw8NJSUnB1dWVgQMH0rJlSzZs2MDjjz/O8OHDOXz4MBcvXmTUqFHs2rWLkpISPDw8KC4u5ocffsDc3JzmzZujp6eHQqHA29ubrVu3sm/fPkaOHElAQABHjx7l6tWrjBw5kp07d1JaWoqnpyclJSXo6+sTHh5OcnIyrVu3xt/fn2+//ZYjR44wYsQInnzySc6ePcsff/zBiBEjiI2NJTMzU/U6/i43NxcjIyO2b9+Oi4sLb775Jvb29mzcuJG+fftSUlJCu3btmDp1ao3nppWVFWVlZWRlZfHnn3/Srl07rly5wr/+9S/S0tJo06ZNree3lZUVP/30kyoOBwcHvvnmG7y8vLCxsVF1IWVmZuLr66s67+Pj47lx4wbZ2dmsWrWKIUOG8Morr6hNOHnt2jVat26NjY0NoaGhREVFMXHiRFq2bMn+/fvVzomEhAQiIyOxtbXF1taWFi1a4O/vz7Jly/Dw8GDevHm0bduWS5cukZOTw4ABAyqdm+vWrWPIkCFMmTKFO3fukJWVxVNPPcXXX39NQEAAs2fPxtHRkbNnz2Jtbc3jjz/Opk2bcHZ2ViXPyMhIpk6dSq9evfjjjz/o06eP6v1r3bo1Bw8eJCUlBT8/PwA8PDzUjrFx40Z69uzJ9OnTMTIy4vTp06rPooeHByNGjKBt27acP3+e7OxsnnjiiUqvIyIiAlNTUxYvXsyAAQOwsLBQDTZwcXHh6NGjJCYmEhQUBICTkxNxcXGkp6fTqlUrzMzM1M63a9euERUVRUFBAf7+/qoucHt7e65cucKhQ4fo168fAJ6enuzcuZOSkhK8vLxU147qSNKh/CbZtm3byMrKolWrVmRnZ2NpaalqGezbt48+ffrQrl07unXrpqqXlpaGgYEBbm5uHDp0iEuXLpGbm0vbtm0JCwujf//+fPjhhxw+fJiMjAzVjTsPDw/VRW3btm04Ojr
i6+vLiRMnOHfuHN7e3ty9e5cLFy4QHByMvr4+bdu2xcjIiAkTJuDs7ExeXh4RERGqm/hLlixh2LBhHD9+HCcnJ6ZOnYqtrS3R0dEEBQVhbm4OlA/xdXR0pEuXLqqTw9fXlyVLljB06FAKCwtVrSRLS0vu3btHbGwsPj4+/O9//8PZ2RkTExPS09PZvn07zz77LBYWFsTGxqrukZSWlrJ582bVhRrKWwv5+fkUFBRw7949OnfuTOfOnfH39wfAwsKC3bt38/vvv2NlZUVQUBBRUVGqb/4VXQSFhYX8+uuvFBQUEBwcjJ+fHxYWFkRHR5Obm4unp6fqZvyhQ4coKSmhsLAQNzc3XF1d+f7773n//fdxdHQkPz+f8+fP06tXL5o1a8bPP/+MnxaKNHMAACAASURBVJ8fGRkZvPbaa7i6uuLu7o6RkRHFxcXo6+vj4+NDs2bNGDp0KE5OTuTm5pKYmKg6xoYNG+jRowfNmzdXiyE/Px9/f3/at2/PZ599xpw5c2jZsiXGxsbEx8cTFBRE8+bNOXr0KM2bN8fZ2Vk18mnZsmUkJibSrVs3rl27xqVLl+jduzcuLi5s3ryZsrIyBg4cSM+ePSudm126dMHY2JhNmzZx9uxZevXqhbGxMQqFgp9//pmAgACuXLlCq1at6NatG507d67xGGfOnGHAgAHcvn2bpKQk+vTpg7OzM9999x05OTl06NABU1NTmjVrRkxMDIGBgRgaGpKXl8dPP/3EoUOHcHNzw93dXdWK/+yzz7h69SpGRkbk5OQwe/ZsVXfoyZMn6dKlC61ataKsrIwNGzZQVFREp06duHPnDl5eXpw7dw6FQkHLli2xsLCgrKyMiIgIRowYQfPmzTly5AhOTk54eHiojlFcXIyNjQ2nTp3Czc0NS0tLVq9ejaenJy1btsTU1JTw8HCeffZZnJ2d+fnnn2nXrh3t2rWjuLiYr7/+mk6dOhEUFMQTTzyBnZ0dZmZm/Pzzz6pzBMDAwABHR0e+/fZbOnfuTExMDC1btiQxMZHw8HCCgoK4cOEChYWFuLq68vnnn+Ph4YFCocDR0ZGcnBzMzc2xsbEhMjISV1dXOnbsSFlZGevWrSM3N5euXbuSnp7OkSNHCAwM5KOPPiIuLo7s7GxV69PPz4///ve/+Pn58dtvv+Hq6kqbNm2IiopCqVSSlJREdna2qss3Pz8fa2trcnNzSUpKIjAwECjvAWnbti0//vgjLi4uJCYmYm1tja+vLxs2bMDHx0d17lenySedhIQEXnnlFZycnDh58iQpKSn4+/urLsoVLYOQkBBMTEyIiYnh8OHDHDx4kB9//BF3d3c++OADzM3NiYqKwtDQEF9fX5RKJd9//73q4vrJJ5/g7u6u6hL68ccf+fHHH/H392fgwIEkJCTwzTff0LFjR5588kmGDRvGggULVAMHbGxs6Nq1K/n5+RgaGlJcXMytW7dwcHDgs88+Izs7m8OHDzNs2DCsra1xcXHB1taWL7/8kv79+6uGR+7bt49+/fqRlZXF+++/j76+Pv7+/hw/flzVfVjRSkpISFB1AXz++ed07NiRgIAA3n33XZ5//nm2b99OXl4ecXFxREREEBQURFJSEsuWLePKlSvExcXRpUsXHBwciIyMpLS0lEmTJhEWFkZsbCyPPfYYp0+fJjExkVOnTnHgwAHs7e3p2LEjf/75J3v27KF///44OztjYWHBr7/+ys2bN/nggw948skneeuttxg3bhwuLi589NFH7Nu3j+7du+Pg4EB+fj6XLl1i9OjRREVF4eDgoLohnZSUREBAAOfOneP8+fP861//wtXVlW+//Zbbt2/z2GOPceXKFZycnCgrK1MbVdaiRQu8vb1JT0/HzMyM+Ph4rl+/jqWlJZs3byYhIYGcnByCgoIoLS1VDeI4dOiQqqvo/Pnz3Lp1i4CAAP4/8s47oMorW/u/A+fAocOhd5AuRVBQEUSNgh17MmkmxsREo5mYTExiGxNjjKb
OTEw0UaMmttgDsQWUJlhQQESaooD0Ir2X7w++dw8oluTe787Md/c/xsherF3evfda61nPSk9PFwg+CwsLIiMjyczMZPjw4Rw6dIi//e1vuLm58dprr6GmpkZrayuXL18mPT2d8vJyqquruXPnDkFBQRQUFHDmzBkSEhI4dOgQwcHB6Ovrs3DhQsrKyjAzM2PQoEEYGhoKOPJzzz1HdnY2e/fuFQ+Cc+fOPVSGr68vSqWSy5cvc+3aNaFHSUkJAQEBGBoacvfuXa5fv46mpibR0dFs27YNbW1t0tLSmDp1qnBX79u3j4EDB6Krq8uKFSuYOXMm/v7+LF68WMQNi4uLqa+vZ926dRQUFFBVVYWJiQkBAQG4u7sjk8lISkrC2tpaPKiio6Oprq7m/PnzJCYmEhQURGFhoZBRWVmJh4cHpqampKens27dOgICAtDS0mLjxo2sXbuWmJgYGhoaxDiDg4O5ffs269at4+bNm6SkpODh4YFKpUJdXZ3S0lIKCgoICQkRFj6ApaUlu3fvZvv27RgaGvLll18il8s5d+4cKpUKV1dXmpubWbduHYMHD8bR0ZEVK1Ywffp0tm3bxs2bN8nJySE2NpYRI0bQ2NjI+++/T1FREa2trdTX12NgYEB9fT0nT57E2dkZe3t7PvzwQ8aNG4eBgQF6enocPXqUH3/8kUGDBgmX8+bNm7l27RodHR3s3r0bQ0NDXF1dSUlJ4dKlS3z44Yds2bKFkpISNDU1sbCwQFdXl6SkJD777DOMjY0ZMWIENTU1fPrpp+Jh9bD2v/7SiY+Px8XFhQULFuDg4MClS5e4c+eOeIHHxMRgaWnJkCFDgJ6gcGNjI9nZ2axYsYLW1lZsbGx4/vnnGTlyJOrq6ly8eBEPDw8SEhJYunQprq6uNDQ0EBcXx/jx49m1axft7e0EBgYSEBCAtbU1XV1dzJo1i9GjR4tYT2VlJceOHWPKlClAT6xp06ZNnD59mkOHDjF69Gi2bdtGcHAw2dnZ3Lp1iyVLluDs7Ex3dzc5OTmkpaUxY8YM8RF8+eWXpKamig9x6tSpVFZW8ssvv2BmZsawYcNYuHAhJiYmJCUlMWXKFJydnYmLi2PdunU4OTlx584d0tPTWbNmDSUlJWRnZ/P+++/j7e3NDz/8QGhoKKtXr6ajo4OKigqGDBlCR0cH9fX11NfX8/PPP3P79m3mzp2LpqYm58+fJzc3l9dee4309HT8/f3Jz89nxYoV4vLv7u5m+PDhzJ49G21tbQwNDQkKCsLc3Jzy8nJ+/fVXtLS06O7uJjAwEIVCwQ8//MDcuXMpKipi48aNFBQUMH/+fL799lsuXrzIlStXePnll7GysqKoqIi4uDimTZuGubk5N2/exMDAgIaGBmFlSq20tJS33nqLmJgYLl++jLa2Nps3b+bNN9+kqKiIGTNmMGDAANTV1fvosGHDBgoLC/nzn//M9evXOXDgABcuXGDevHnY2tpSVlZGYmIi4eHhuLi4sH79eiZOnMgrr7yCmpoaFRUVODk54erqys2bN8nPz2fNmjXk5OSQk5PDE088QWlpKTk5OWLuKioqKCwsZNOmTcyZMweFQoGmpqawBlpaWjhw4ADl5eWEhoYSEBBAUVHRQ2Voa2tjbW19nx65ublcuXJFIL3U1NSwt7fn0qVLvPHGG/j7+6OpqUloaCjW1tZ0dnbi7u7O3Llz8fT0JC8vj5ycHF599VUUCgVdXV14eXnh4+PDnj17GD58OB988AFaWlrk5+cLy87R0ZGkpCRqa2vx9PRELpcTFBSEXC4nMzOTd955B09PT/bs2cOwYcP48MMPUSqV3L59m+eeew5zc3NaWlpYuXIlgwYNIiUlhZKSEhYtWkR9fT2pqam8++67uLq6snv3bvz9/Vm3bh22tracPHkSCwsLzM3N0dfXZ/PmzVhaWmJvb09HRweNjY28/fbb2NnZ8dlnn1FZWYm/vz/jxo1j6NCh3L17F4VCwcSJE8nJyWH
t2rW4urqKNZ01axbZ2dlkZ2ezcuVKPD09xWN3w4YNODo6cv78eaZPn861a9coKCjgvffew83NjaKiIuLj4wkNDeVvf/sbenp6bNy4kTFjxgA93oeWlhZGjBjB0qVLMTQ05OLFi4wZMwZ1dXXa2trQ19fn0KFDnDt3jhkzZmBsbMy6detE6sCsWbNQKBTU1dUxbdo0Ro4c+cgz93/tpdPV1YVMJuPq1ascPnyYp556CpVKxZkzZygoKMDY2BgrKyuio6MZO3YstbW1rFu3DisrK8aNGyfcKWlpaURGRvLCCy9gZWUlfLKGhobY29uTnp4uXHJ5eXmMGTMGR0dHBg0axKpVq9DQ0MDLywtTU1O0tLTEASuTyQgKCuKLL74Q1oIUtO7o6GDVqlV0dnZy584d/vKXvzBnzhwKCwuFK04mk5GQkICmpqaIHVVUVNDS0oKmpiYffPCB8DFramoybdo0NDQ0sLOzw9raWlhJo0ePBiA/P5+Ojg4GDBiAXC7n1q1bhIaG4u3tzZgxY9DR0aGrq0sEpjs6Oti2bZsI8GZlZbFz506Ki4uZO3cuLS0tGBsb4+fnx7BhwwgNDaWgoABdXV1mzZrF9evXOXnyJDKZDBcXF7q6umhvb0dLS4u2tjbU1dUxNTVFJpOho6PDjRs3mDZtGkVFRXR0dODo6EhaWhqxsbEkJCSgUqkYM2YMISEhTJw4EScnJ1577TWR76Cvr8/06dOxt7cnPj4eKysrwsPDiYqK4syZM+jr62NtbU1HRwe6uroMHToUa2trli5dipGREWfOnGHNmjVMnz6d7OxsOjs70dbWJjs7m7Nnz5KQkICxsTFjxoxh6NChDBs2DC8vL55//nlsbGyAHhdjWFgYVlZWArhw7tw5jI2N+eKLLzh16hTFxcU4ODgwYcIEhgwZgpaWFhUVFejo6ODn54erq6sI8nZ1dZGTk8O5c+cIDQ0V4IW7d+9SX19PWloaGRkZzJ8/H09PT9LT0xk0aBCDBg16qIzDhw9TVVWFs7MzoaGhffQwMjLC09MTDQ0NBgwYgEqlIigoCAMDAxQKBT/99BNPPPEExsbG2NnZ4evrK/Z7Q0MDenp6+Pj40N3djZqamoAWX7t2jd9++42hQ4eyfft2sW9NTEzQ0NDA3NycxMREqqur2bFjBwEBAbi5uREcHCz2ZkZGhpDxww8/0N3djZ6eHllZWeLC0tLSoqqqCj09Pfz8/HB0dGTMmDHo6uoK5GZJSQnTp0/HxcWFrVu3oq6ujrW1NXp6esjlcn777TfCwsJQU1NDU1MTT09PwsPDMTAw4OrVq8TFxbFgwQLc3Nyoqanh+vXrFBUVoVAo0NfXFzkvTU1NjBs3Dn9/f8aOHYu2tjbd3d2UlZXh6uqKh4cH9vb2fPPNNzz55JNoa2tTW1tLTU0N7u7uqKmpUVdXR2BgIE5OTkyePBl9fX06OjqQyWQYGBjQ0tKCt7c3crmcr776Cjc3N2QyGXV1daxevZpr167x9NNPc/fuXVxcXHB0dMTBwYGnn34ac3Nzurq66O7uRqVSPRQA07v9r7l0GhsbiY6OpqWlRUyWmpoaXl5eHDx4kNu3bxMTE0NjYyPu7u7U19fj7e3NV199RUpKCgkJCWhoaODn54e1tTXd3d10d3czcOBADh06hLa2Ni4uLmhoaNDU1ERFRQWzZs1iz549JCcns2vXLl544QUcHBzQ1NSksrKSW7du4ejoSHFxcZ8AnEwmEzEES0tLli5dyvXr11EoFAQHB+Pl5YVCoaChoUG4P7q6uoiMjBQJddATT7C3t6e1tZUPP/yQtrY25syZQ0hIiMDWSweeFJuytrYGIDs7m9TUVObMmYOWlhbFxcVERUVx7tw5jhw5wpNPPklKSgq2trZoaGjQ3d2Nuro6rq6u6Ovrc+nSJZycnDA0NOTo0aO4u7tjbW3NypUrcXZ2pr6+HmNjY6ytrcW46+rqSEpKoqamhoMHD9LQ0MDYsWNFwLalpQU
zM7M+2djSOl67dg1LS0usra1JSEjAwMCAU6dOYW5uzpdffomzszPHjx9n4MCBmJqaYmFhAfRAoaVsaunwa2xs5Pbt25SVlXHw4EFqa2uZNGkSxsbG7Nu3DwBXV1cRZ5DchLt37+bChQvk5+dz8OBBzMzMiIuLw8LCoo8OHh4e6OvriwTB1tZW9u/fT0tLi5gPmUyGs7Mzp0+fJjY2lilTphAeHi4uAH9/f959913Onj1LREQEs2bNIiYmRsjo6upCXV0dGxsbdu/ezfnz55kzZw5hYWFkZmaSm5vLzJkzWbhwIQ4ODnR3d4uA86NkjB8/nvT0dFJTU/H19WXZsmWcOXOGiIgI8VgzMjK6bz9LFkpJSQmDBg1CoVAQFxdHTEwMkZGRnDx5kqlTp5KQkICurq6IDchkMgICAlBXV2fbtm2Ehobi7+9PZGSkYHMwMTERVvzYsWMpLCxEW1sbIyMjMY57ZQwePJjTp09jZmbG1atXKSoqIioqirNnzzJp0iTBjNB7f6tUKg4cOEBtbS0XL16kvb2d7u5u3N3dhZstKSkJHR0djh8/Tn19PX5+fuICtbW1Fa5WOzs7NDU1KSkpQUtLi+rqalJSUoiNjRXf8q+//tonNqOuro6bm5tAnF26dImcnBxmzpyJmZkZKpWKvXv3cvHiRfbs2cNzzz2Hra0tOjo6XL16VVRZlslkqKmp4ezsjLGxMSkpKbS2tuLt7c2GDRsICwvD1dWVt99+G09PT5RKJXV1dXh4eAhXfWdnp5D1e9r/iksnOjqa1atX09nZyaZNmxg8eDCWlpbixRwcHIyenh7Nzc0sXbqUmzdvUlpayvDhw8nPz0dfX5/XXnuNv//97yiVSgYOHIi2trY48PT09Ni1axezZs1CqVSSlZVFeXk548ePZ+TIkTg6OvLaa6/h7OwsdJKAAP7+/ty+fRsnJycR7IeezVVRUUFkZCTV1dVoamri7OzMwIED6ezsRCaTYWJiIgKWd+/eJSoqirCwMHR0dOjo6OCXX37h9OnTZGVlMWfOHKZOnYqGhgZlZWW88847aGlpided1KSDNyEhAaVSyYgRI1BXV8fAwIDAwEC6urpYvnw5tbW1/PnPf8bPzw8bGxsR85D6Ozk54enpibu7OzExMURERGBkZIS/vz8KhQIPD4/7sqrv3LnD4cOHuXv3Lq+++iq2trbk5uYKP/eHH35IVVUV3t7e4vfJZDI6OjqIiopi9uzZVFRUsGXLFioqKvq4EiT4+bfffsuNGzfo6OjA3t5evKglWQDJycns3buXxsZG5s2bJ9xR7u7ufPrpp6irqwsdpANt2LBh/PTTTyxYsIB58+ahpaVFVlYW8+bNY+rUqUIHHx8fMW7pcikvL+edd95BqVSK9ZAeHdJhNmHCBExMTFAqlWRmZhISEoK3tzdGRka89957KBSKPjK0tbWFDOk1vGzZMmxsbFBTUyMvL4/w8HBxoEpsCo8rQ7JeR44cKehu5syZI3JsJKBA7/0MPdZ2a2srrq6uyOVyZDIZTU1NNDU1sW7dOjo6Ou4DcUgPA5VKxS+//MLHH3+Mo6MjhYWFFBUVMWjQIH788UfMzMz47LPPMDAw4M9//jP29vY4OTmhqanZr4wBAwaIvTBv3jy6u7tpa2vjgw8+oKKiot/9rVKpGDhwICUlJdTX1/PXv/6ViIgIOjo68PHxEXHgmJgYVCoVs2bNEtBqmUyGXC4X335oaCgGBgbExMSgoaHBggULhFXm7OxMZGRkHxm9D3fJU/Pbb78xYMAAvL29xVk0ZswYDA0NefPNN4Xrcc2aNVy4cIG8vDxhmUkoT+hBuAUGBuLg4MDdu3eJiYlh6dKlyOVyurq6cHFxEXB5qf1e2ifR7w/1+g9rMTExzJ8/n5UrVzJ79mzOnz8PIPz0lpaWBAUFMXPmTBQKRZ+DaPHixbz33nsA+Pn5YWhoSFRUFPDPDyksLAxjY2PWr18PINA6AEZGRnh4eKClpdWHIfb
OnTuEhIQwefJk5HI5H3zwAcePHwd6XhANDQ3cuXMHNzc3duzYQUlJCbt376ajo0NsYvhnol9ubi5qamoieVRyxYWFhfH1118zatQo8fOtra0MHjwYQ0NDfvvtt37nTIKHX7t2jddff534+HgcHBz6WD5OTk5kZmaKhDbo+bA6OzspKioiNTWVv/3tb5w5c0Ykt94LzezdHB0dWb58OZ9//jmBgYF4eHgIN4WWlpZw9UjJb9Dz8cnlcpRKJc888wxbtmxh9uzZKJVKQWgqXUrbtm3jmWeewcfHhw8//LDPGvaeyxEjRrB8+XI+/fRTxowZg5eXl3gwSGwCkg7q6up0dHSgUqn4+9//LlypU6dOJTk5WewjKZHP3Nz8vgz8/tZDOrDt7e0JDw+nurpa7Jva2lq0tLRwcnJi/PjxaGlpCXTcvTK6uroYOnQoo0aNEtnk1dXVlJeXC0tLOswepEd/MqqqqqioqEBTU1Po0dTUhKenJ42NjaSmpva7xkqlkuzsbDEHdnZ2TJw4kSVLlqBUKqmsrMTLy6uPDHV1dbq7u7GwsMDJyYk9e/YAYGBgQFNTE9ra2jzzzDP85S9/+d0yVCoVxcXFWFtbExYWxquvvopSqXzg/oYeiPCLL74oUhqcnJxQqVRAD+vEb7/9xqxZs1i6dCmampp91ltDQ4NRo0Yhk8kE64K1tTW1tbVoaGiIOO+uXbuYPn16vzJ6r9ndu3dxdXXlypUrLFmyhLNnz4oHonTmnDlzhokTJ7Jt2zacnJzYuHEj0OPdkDw2FRUVZGVlAT0sEDo6OrS3twN9L5d79fgj7f97S6e1tZWmpiYGDx5MTU0N27Ztw8vLi66uLuFikclk1NfXs2HDBhHcXbBgAaampuKAb2hoICYm5oGWyZAhQ4SL4dKlS7z88stCvtR6L15DQwOJiYn9upJ++eUXmpqaCAgIEFZVTU0NDQ0NXL16lZCQEPFykzZfUlISZmZmdHR08Pbbb9Pa2srcuXMFAEJ6GUkvywdZWZLl0NtKeuqpp8RrXdp0zc3NImlTqVT2ITHs7u7m2LFjbNmyhYKCAurr61m7di1mZmakpqaioaEhLpHerzelUtlnziwsLDAxMUFNTY3Ozk7i4uKEfHt7e5RKpdA3Pz8fX19fVq1aRUBAADk5OWhra1NeXo61tTVZWVkolUqeeuoprK2tqaiowN/fX7zkpLkB0NbWxsbGpo/VZmZm9lAdZDIZxsbGxMXFUVdXx9mzZykrKxOZ7b3X/l53xKOs3pqaGt58801iYmJITk7m5ZdfFi5eSVZzc3O/MqR/l5ByvcELkmv1UXo8TIatra3Qo7m5+aEADIABAwawcePGPi6i3vumtbW1XxlSrNPAwIBNmzaRnJwsqI3Mzc3Fd/pHZMyfP/++x8DD9jf0XNwLFiwgLi6O5ORk5s2bh4GBAUqlUiDuHB0d+fjjj8nLy6Orq0usma6uLm5ubuzcuZP4+HguXLjAK6+8InJhJBklJSX9ypA8LG1tbWzfvp2LFy+SmZnJ7NmzCQ0NpbW1laqqKnR1dWlsbOTChQsMHz4cCwsL3N3diY2N5fr16wQFBYlz5MqVK2zZsoWkpCQOHDjAs88+y4ABA7i3/V5XWn/t//tLRy6X4+7ujp6enggoq1QqNm3ahLe3N6amprS1tSGTyZgwYQIuLi4sXrz4PvLGrKws9PT0mDVrFsnJyRw6dEgEudva2pDL5UyePBno+bCGDh0qXBf9LVRhYeEDXUkDBgwQPlzogXWfP3+e1atX8+mnnzJt2rT7LIaMjAy++OIL6uvrWbZsWR/26N6WmyTvYWNRKBSUlpZia2vLqlWrsLe3F3KkAzYiIkK8cI8cOcKVK1dEMHXz5s0cOHCA1atX4+npKfJ4jh49SkZGBlu3bhUEiI/Turq6aGpqoqioiNmzZxMdHS3iENra2qirq+Pl5SVygrq6uvD19SUyMpJNmzYxefJkQZdz7tw51q1bR3t7OydPnmT
YsGHo6en1+3vvdWc8SAfJYuju7iYrK4uff/6Z2tpa3nvvvccKrj7OeowfPx47OzsWLVok4m699XuUjO7ubkJCQu4DL/wePSQZnp6ezJ07V8iQ9HgYAKOrq4uamhq0tLQIDAwUSaP3zvfDZHR2dmJjY8Po0aOxtbXljTfeuG9+/ysypL39sP1taGhIe3s7urq6DBs2DAcHB5YtWybiHJqammhra3P06FGOHTvGuHHj0NDQ4B//+Adjx45FV1eXtrY2VCqVQIZKaFGpPUqGnp4ebW1taGhoUFRUhLW1NatXr8bBwYHy8nJefPFFysrKGDZsGNra2pw9e5bKykpBOzVo0CA+//xzJkyYgJ6eHjKZDHNzcwYOHIimpibvv/9+n1DAf3f7/+bSkeIz/TXpsHR1dcXf31/kgdy8eZPg4GAiIyO5du0aXl5eYvF7B5jh4ZZJREQEDQ0N2Nra8vXXX3Pr1i309PQYMGDAA18GWlpa+Pj4MHfuXGxtbens7MTNzU3EHu79WSlAWlxczMcff0xDQ4Ogk2lra6Ojo4OwsDBeeeUV9PX1+8Qp7tXhUVZWfX29QEdBXytJ+u+GhgY6OzspLi7mwIED1NTUEB4eTnd3Nzt27KC5uZnOzk7BfBATE8NLL73Eyy+/TG1tLSdOnGDChAlUVlZy/fp1jIyM7hu31GQyGZqamvzwww+88MILXLlyhS+++IKmpiZGjhxJcXExhYWFqFQq1NTUUFNTQy6Xc+XKFW7fvk1lZSVPPfUUQ4cOJTIykhdffJG//OUvZGRkCEh6WVkZJSUlGBgY9OurfpgOwcHBFBcXU1BQQFBQEMHBwUycOFH456W4yIPao/ZWXV0dbW1t+Pj49Ls3HyUjMjKSxsZGEZCWXKC/V8a5c+cICAgQB5Uk43EBGDo6OhgYGGBiYtIHpQmPD+JQKBQMGDBAXHh/BAjyMBmPs79//PFHLC0tBaPAvTL09PTQ09PD0dGR2bNn4+XlxeXLlykoKGD48OHs378fdXV1Ycn3t6aPkrFv3z40NDSYOHGicOlKCL38/HxMTEyor68XiLPPP/9cpGPo6+tTWFhITU0NAwcOFDmEjo6OuLu7o1Ao+t0f/13tP/7SuXLlCitXriQzM5Ouri4cHBzumzDpA6moqKC0tBSVSkV1dTWNjY1oaWkJIsqAgABx8N074Q+zTDQ1NRk0aBBVVVUcPnwYb29vqqursba2Rl9fv19rp7crSSLkfBA1vsQLJbn46urq+NOf/oS9vT0nTpygsrKSUaNGUVlZycqVK7l+/bqguOlv8zxsLI2NjezatUswYfd+Bfb+8/jx4+zYsYPGddPWrAAAIABJREFUxkZGjRolKM+trKxIS0tj8eLFREdHY2FhwYgRI3BxccHLywu5XE5AQAD/+Mc/GD9+PEVFRVy7do2kpCSGDRvW7/i7u7spLy8nNzeXM2fOkJ6ejkqlIjw8HAcHB65cuUJ3dzeZmZk4OTkJGOeZM2d45ZVXiIqKQqVS4ezszOXLl3Fzc8POzo6RI0eyfft2xo8fT1JSkkAhSpbd79VBW1u7D7ru0qVLrFq1ioKCApRKpcj2vrc9aj327t1LQUGB2J/9HQYPkyGxFhcUFDB06NDfLUMulxMREcFPP/2Er6+vmJ/fA8Cws7Nj9erVVFVV4ePjcx/q6XFkSJyCd+/eRVtbG2Nj437dlg+TUVNTw/bt24WFqqen168MaX+3t7fzzDPPUFdXx4ABA6iqqqKrq4tBgwb1eUj0lqGuro6joyOenp7iMVxdXY2Ojg7e3t7U1dWxbds2SkpKAMS+6D0fj5Ihcar13p9SLpeVlRVKpZKMjAxRWqGkpIT4+HgCAgJEYq+3tzd2dnZYWVn1cXXe6xn5727yR//Iv2/LzMxk06ZNvPLKK7S0tPDuu+9y4cKFfl+VnZ2dXLlyRUBZMzMzGThwIG+++Sbr1q0TnEQPalKQW3ILaGtr09TUxEsvvYSmpiaLFi1i8ODBbNu
2TRD3JScnCwhsfxdPdHQ0u3fvxtbWluHDh9/n/5aag4MD1dXVpKWlsWfPHuLj41m3bh0hISHMmjXrd89Ff2Px9fXlxIkT7N+/n2effZampib+8Y9/sG3btj56S+OYOXMmZWVl3LhxA3V1de7cuUNTUxNyuVzAxv/0pz+xc+dOmpubCQwM5Pbt22hoaHDhwgXc3d0xNDTExMSExMREvvvuO6ZPn94v+aAUL7l16xZDhgxhzZo1/Pbbb5w5c0ZkVx86dIgPPviAX3/9tc/rU9Jj3759tLW10dnZyfXr17GwsODSpUu4ublx7NgxoqKisLW1FS66P6JD71ZdXc3u3bt56qmnsLe3F0HZ/lp/6+Hj48O3337LkSNH/vD+9PHx4ZtvvuHo0aN/SIZcLicjI4OvvvqKF198kZCQkH7dhdKeGDFiBObm5uLx0NHRga2tLe3t7YLzTQJh3IuEepSMwsJC9uzZQ3h4OBYWFv2WBniUjPT0dMG1JuV7PUjGzJkz8fT0FOtaXV2NhoYGa9euxdjYGFdX14fWo5HJZCQmJnLu3DlB4Lt27VoiIyMFlLm2tpYtW7awZcuWB1rX/ckA7ivAJn2jEo1RVlYW+fn5REVF8eyzz/LWW2/x4Ycf8vXXX4v8xLFjxwLc98j674jbPKz9R146hYWF2NraUltbi0KhELxAfn5+nD59mrCwsPv6aGho8MQTTwjyvvXr1/Pzzz/T3NwsPsbU1FTs7OxQqVQiWCc1Q0NDkf8CPX7RXbt28eqrrwo2Wejxx9rb2+Pwf4k/MzIyRA5OVVUVt27dwtPTkw0bNggXzaP8p2pqamzatAljY2MARo0aJWC3Uq5PXV3dY8/FvWMxNTUVNEAzZ85k0qRJ5ObmCup++Gc+jLQh5XI5NTU1fPzxx+jp6fHVV19hampKc3MzWlpaDBkyhKSkJC5dukR7ezseHh7ExsYSFRWFk5MTb775pgj+tra2YmZmxvHjx3n55Zfvuyg7OzuRy+Vs2rRJkEM+8cQT+Pv7i3FUVlZiamrK8ePHmT9/Pp2dnX30OH/+PAqFgkWLFhEdHS2gvRYWFhw+fJi1a9cKio/+2uPoAIjcFYnX7auvvhLrdPfuXYyMjER+w4PWY8CAAejp6XH16lV8fX3/0P6UyXoYtCV9fq8MCYnW1NQkDt63334bXV1dXFxc+jyipD8l4k3p3/T19cXB3NnZKXJmkpOTsbS0vE/f/mSEhoYCPWwcd+/eJTw8HOhBibW2tgorpvfevFfGkCFDUKlUpKSkYGhoyJw5cwAEIhDu398WFhZYWFjQ3d1NREQE/v7+wjLS09MjJSVFxDAf1Nzc3GhoaKCiooLXX38da2tr0tPTef7555k4cSIXLlzoUzL+3vW4V8bHH3/8wN91b3NxccHNzY2UlBRKS0uxsLBgyZIlFBUVceHCBb777juRbvE/3f6j3Gu//vorn3zyCYmJiSiVSlxdXRk7diw6Ojq0tLQQHx/PrFmz+j04uru7OXXqFDt27CA/P1+UOq6srBQBu0cFuevr67lz5w4qlYrKykp+/fVXFi9eTHd3NxcvXhTIGDU1NfT19cnMzBQ5LrW1tRQXF3Pz5k2am5tRqVRkZGTw6quvUldXR0REBHK5HH19fQHDlTagurq6AA5I///8+fOsX7+e+Ph45HI51tbWzJgxA01NzUfORe/53LBhA0lJSRgaGlJcXMzt27c5f/48mzZtoq6ujtTUVLy9vfugqaCnuqJcLmf06NHk5ORw/PhxUUtFoswpLCxkxowZZGdn86c//Ql/f38CAwOZPn06BgYGItbR0NCAl5cXUVFRuLm5iYxsyTUozUNviKe6unofRumamhohw8XFBWtrayIjI/nss8+EHteuXeOFF14gICCA4cOHCx+9lCSnUCj47rvvBCxdX19f6PgwHaQa9N9++y2bNm1i3LhxWFlZkZuby61bt/jmm29IS0tj69atBAUFPZAQMSIigs8//5zU1FQsLCx
wdnamsLDwsfenNGfffvstX3/9NRMmTGDgwIGkp6dz4sSJxwJySCUHUlJS+tQ8kmSXlJTg6+v70Nfw2bNnWbt2LeXl5YSEhAimioeBMO5tcXFx7N27l2HDhqGurk59fT0VFRVUVlayYcMGUlJSOHz4MIGBgQ+UceLECVHJ1NLSUqDRWltb+etf/8qlS5dITExk0KBB/UL5o6OjWbt2Lfv27WPixIm4ubkxfvx4QQJsYWHxQJcp9FibWVlZHD9+nKSkJHR1dVGpVAwdOpTy8nIWLVpEZ2cnsbGxgri2PxlOTk74+PiQlpZGXFwc9vb2DwUpASJ9orS0lMjISE6cOIGHhwdeXl6Cjuhe9Oj/VPu3v3Skib1+/To//fQTH3zwAcbGxuzatYvnnntOLFRDQwORkZFMmjRJvESrq6vJyMjAyMiIpqYmvvvuO5YvX46bmxuxsbE4OTlhampKbGzsI4Pchw4d4tNPPyUrK4uMjAxGjRrFiRMnuHz5MrGxsRQWFnLy5Ek6OztxdnZGpVLR0NDAzp072bVrF01NTezYsYPJkydjb2/PkCFDOH78OGfPnhWFsE6dOiVqlvQOqvfeHBK1xZYtW1ixYgVubm6cPXtWkPw9zlwoFAquXr3Knj17+OCDD8T43njjDZFo98Ybb/Dmm2+SmJhIZGQkDg4OtLe3U1dXh76+Pra2tvj5+dHa2srWrVsxMTEhOzubH3/8kbCwMIYOHcpbb72Fr68veXl5gp/JyclJ5CtJls6OHTsICQnBw8ODffv2cefOHTw9PUXOlDT2trY2cnJyBHy594V0r4zCwkK8vLwYPnw4b731Fn5+fuTm5lJRUcHAgQPR0tJCTU1NwKmPHz/O6dOncXFxIS8vjx9//JFZs2aJBNB74w/3/l2CnRYUFFBRUcHIkSOFpTtz5kxee+01ysrKOH78OBMnThSlAqT1KC0tZdu2baxYsQI9PT2SkpJoa2vDxcWFuLg4XnzxxccCYUh65OfnU15ezrhx4x4byFFRUcH27dtZuXIl+vr6JCYmCvoT6ImVSImx0mVfXl5OcXGxeGy99dZbHDt2jM8//5zp06eLNXoYCKM/EMd3331HQUEBbW1tIsXh/PnzFBQU8PTTT/Pyyy9z9epVIiIi+gWCnDlzhkOHDrF27VpaW1v56aefePrppwUo4MUXX+RPf/oTFy5cIDY2lnHjxlFcXExjYyOampp89NFHJCUl8dFHH+Hi4kJ5eTleXl6oq6tjbm5OWloaZWVl2NraijPoXh3u/cb27dvHK6+8glKpRE1NjTFjxvDcc89x4cKFxwK0rF+/ntu3b2NgYCAAIQ9rLS0tbN68mZKSEmbNmtXHvfr/Om7zsPZvnRza0NAgSvg2NDSgq6uLlZWVoKKpqKgQP5uWloaGhgYGBgbU1NRw48YNsrKyuHTpEqmpqeTl5VFeXo69vT3Dhg2jsLCQu3fvEhISwrvvvou/vz8ACxcuFK6loqIi8vLySE5OJj4+njVr1vDRRx/R3t7O2rVrWb58OWfOnGHKlCl89NFHzJkzR7xuKyoq2LlzJzY2NqJ2RktLC1lZWcIf+95771FWVsZLL70kmJsLCgq4efMmtbW1fPPNN8A/L5q0tDTa29vFgSKNpbi4mNLSUjEXV69evW8uCgoKyMjI4ODBgyLz+s6dO1haWjJs2DAUCgWVlZWCzkdy+b300ktkZGSwatUqzpw5Q2xsLE1NTSKnQVNTk9WrV7Nq1SreeecdoOdFPGnSJKBnc7/++uucPHmS1atX09LSIoLIkl9eemmWl5eTmJgoKIeKiorYvHkz1dXVtLS0sH79+j4yJAujPxnnz59n/PjxQg+prILEpiyXy8XBOXr0aNzc3Hj11VdZsGAB77zzDurq6hw/fpy7d++yZcuWPq4Y6KHs2b9/v0gS7e7upra2ljVr1pCfn8/Vq1cJDw+nq6tL7NOlS5eSl5dHZWUlxcXFZGRk8OWXX1JYWEhpaSn5+fnY2to
ybtw44c+XePIkJOG9+zMjI4PDhw+L5NPOzk5qa2v54IMPKCgoIDk5mVGjRrFs2TJBYttbRnZ2NnFxcaSmpgr3r62tLWPHjkWhUBATE0NGRgYA7u7u7N+/H/jng+Hq1aucO3eOVatWce7cOSZOnIhCocDOzo66ujp+++038vPzqaqqwsbGhtWrV3P58mXs7e2FKzgjI4OEhARiYmJoa2ujrKyMpqYmnnzySRITE4V3QSLIlPbeypUrqauro6GhgfT0dBISEkQVUulAtbS0JCwsTFwOkyZN4vbt26hUKrS0tHjppZdobW2lo6OD06dPc/DgQTo7O3n55Zf57rvvsLS0pL6+XoB8pNjclClTKCgooKysjOrqampqarh16xbx8fGcO3dO7Lne35ienh7FxcUAwk0JPYX+eo/j8uXLIiG1sLAQ6EnGraqqwsXFRXC1SfvuQe3MmTMMGzaMw4cPi3QOqf0rLByp/dtaOnv27OHLL78kLy+PmzdvEhISQnh4OGVlZbz77rtoamry66+/IpfLGTBgABcvXhQ8ZqtWrSIvL48zZ87Q2NhIXV0dkydP5qmnnhKTfeHCBYKDgzE1NcXMzIzc3Fyampo4ffo0LS0tTJ06FSsrKzQ1NcUBPXbsWPT19QkKCuLjjz/G39+fjo4OsrOzGTt2LI6Ojuzbt4+wsDA0NDTw9fXlueeeQyaTUVpaSnBwML/88gu+vr4YGxtjYmKCn5+f8HtbW1uzf/9+wsLCMDU15eLFizQ3N3Pp0iU2btxIfn4+p0+fZsaMGTzzzDP3jUVyS8XHx+Ps7CzmwtTUlMrKSg4cOEBdXR1nzpxhxowZXL58mYSEBJYtW4a7uzunTp3C2NiYpKQktLW1MTc359ixY4I6w8LCAjMzMxz+L38c9Lym6uvrheWQmZkpCEolH7VCoSA2NpaysjLq6uoExFM6GLZt28b333+PpqYm06dPJy8vj9DQUIyNjcnOzhasxnFxcZSWlj6WjJs3bzJu3DgUCgUymUzwfN3bXyaTCWqT3slwubm5DBs2DDs7O5FoKgXRd+7cyaZNmzA1NRUXhJQfYm1tjYeHB3v27BF7S6r3IvnwJ0+eTFRUFJGRkdjY2DBhwgQsLS1FhUo/Pz9SUlJoa2ujq6uL4OBgbt26RWNjY5/9mZaWxpYtW6ivrxd1nDQ0NProsXfvXkFeWVBQQENDg5ChoaHBF198QWtrKzKZjMmTJ3P27Fmhw5UrV+jo6KCtrQ0PDw+sra25dOkShYWF4gI7d+6cqHPk6OjIpEmTOHLkiLDg79y5w5EjR7CwsCAqKgovLy8++eQT9PT0iI2NZciQIVy4cEGQiEosx1OnTsXS0pK8vDwyMjIIDAzE3t6e4uJiioqKUCqVnDp1iq6uLiZMmEB8fDyHDx8W5QwCAwNFGZL3338fW1tb9u7dy8yZM7l58yYdHR0YGBjwyy+/oKamRlFRESdOnKC4uJiKigr8/PwEPVRWVhZJSUmMHTtWPHZMTEwoKSlh+/bt7N27F0tLS0pKSti3b59gXp8wYQLJycnExcXx7rvv4u7uTmRkJMbGxlRUVHDjxg2USiWHDh2io6ODSZMmYWdnR0NDA7m5uXz88cckJCRgY2MjCDttbGy4fPky7e3toqzDg9xsHh4ewj36/xIC/Xvbv+Wl09DQwPbt23n//fcJDw8nNTWVEydO8MQTTwhAgMSKfOXKFTw8PMQi6ejo8PbbbxMfH8/7779PaGgoubm5HD9+nLFjx4rJ37VrF9OmTUNXV5fy8nISEhLYtGkTLS0tTJgwAScnJ1H90c7Ojt27d2NjY4O9vT0ymQw9PT327NnDp59+ytatWzEyMiIiIoKysjJsbGxwd3cXkE6pwp+Hhwd5eXlcu3ZNwF9VKpWg5Dh69Cjl5eWMGjUKPT09Bg4cSHV1NceOHWPZsmU888wzpKamcuPGDfz9/cVm6z0W6Hl9rlu3Dh0dHZGh/+2337J48WLmzZtHamoq5eX
lvPfee2RlZbFw4ULmzZtHY2MjpaWl1NbWkpmZybFjxwQv15///GcOHTok4hVS0LW6upoNGzZQUVHBwYMHycnJYcaMGfz888/cvHkTT09PWlpaSEpKYuHChRw4cEDQCUmWhpOTE76+vrz00kt4enqSnZ2Nra0tR48epa2tTRAZJiYm/i4Zly9fFmSqLS0t9/U3MjKiu7tbgAT279/PrVu32L17Nzdv3mTKlCno6elhampKTU0NFhYW7Ny5kyNHjrBy5UomT54sPvaWlhZSU1OZNm0ad+7c4YcffuDu3bv85S9/QVtbm3PnznHr1i0WL15MREQER44c4e2332b27NnCanBwcGD//v0cO3ZMJLyWlpbi4uJCdHQ0mzdvpq2tjcWLF2NkZMTWrVt57rnnWLRoER4eHiiVyvv02LlzJxUVFYwYMYJTp07x3XffCRkHDhxgwYIFPP300/j5+aGhodGvDsXFxYKpXENDg/LycgYNGsRnn31GaWkpb7/9NqamphQVFREQEICHhwc///wzf/3rX3nyySfFi3/JkiWCC8/BwQE/Pz9aWlo4ePAgb775JjNmzCA5ORkrKyvMzMyQy+UiudHU1BRbW1ucnZ1RU1PjyJEjVFZWsnDhQpqbm4WM6dOnk5ycjK+vL5MnT+bKlSssX76cF154gVu3blFcXMyTTz5JdXU1u3btoq2tjWeffZYDBw6wfPlyJkyYQE5ODlFRUUJXHR0dcnNzcXFxEd9YUVERW7ZswcLCgvXr1+Pt7c3333/P4sWLeeqpp7h06RLR0dF89tlnZGZm8tprr/HSSy/R1tbGzZs30dTU5PLly+zcuRPosT6l/DpNTU12797N888/z8yZMzEyMkJHRwe5XI6hoSF37twhPz8fAwMDzMzMBEjpXldr7xyof5cLB/5N0WsFBQXo6+ujUqnQ0NBg8ODBovDXkCFDaGxsBGDcuHEcPXoUAE9PT3bv3s2gQYO4fv26KHwmmeXLly8X/c+ePYuRkRHm5ubcunWL9PR0nn76aYKDg9mxYwfffPMNw4cPR1tbG11dXRQKBdOmTeObb75h5MiRdHV1MXnyZE6fPk1ZWRkfffQR2dnZNDU1YWFhwd///neGDh2KUqkUhJBSbGX+/Pm88cYbom479PCz7dmzBx0dHT777DMR1JTQMq6urgKtpqWlJQLAMpmMuLi4PmPJyMjA0tKSffv2Cfhvenq6gHgqFAp0dHSEpSLBmIcPH860adN4/fXXCQkJwcnJiQEDBmBra8v777+Puro6U6ZM4eDBg7S2tgr/sJmZGc8//zwpKSk4OzsLTjNdXV1heSmVSgGnnjJlCps3b2bq1KlChrOzM87OziLLetmyZQBcvnwZc3PzPrWMfo+Mn3/++bF0kA59T09PEhMTBbmo1CwtLbGwsEAmk+Hm5ia4swoLC9m9ezdDhw7F39+ftrY2Jk+ejKOjIwsXLiQ6Opr29naGDh2Kj4+PqNjp6urKqFGj0NfXp6SkhD179oiqnT/88AOVlZWYmJjQ3NzMiy++yMKFC3nxxRdFHZru7m5aWlowMjJCT09PEKUOHjyYgIAAYU1JRLPR0dEYGhoyf/58JkyYgJWVFe3t7QI9VlNTw+HDh0Ul1946tLS0iFIUSqWSkSNHMmrUKNTU1Jg/f77Yi1K8raWlBS8vL8GqDTBt2jReffVVnnnmGeCf7MRGRkbExsYil8txcXGhsLCQtLQ0Zs2aJZB+UjpBVFQUFhYWlJSUMHLkSPz8/MQF0FtGQUGBqCFlbm4uuPIGDRrECy+8IPQIDw9nxIgRmJiYPPC8SE5Oxt/fn6amJmHxSd9dQ0MD7777Lt7e3kAPus7Y2FikArzzzjuEhoaSmppKc3MzKSkpBAYGMmXKFF5//XWeffZZJk2axJ07d7CxsaGuro6UlBTxjdbX1xMcHExdXR2JiYkieVMulzNixAgOHjzIrVu3MDExoaOjg7t375KXlyfqdPVu/0pXWn/tX27ptLW
1UV5e3gd6aGJiwi+//EJlZSW6urqkp6ejpaVFfHw8o0ePZs2aNRgZGREZGUlxcTG+vr74+PiIZEuVSkVkZGS//adMmSLqzly8eJFt27aJOiSGhobCBVNfX09AQIBwnbi7u3PixAlBpKlQKIRby8bGBjc3N0JCQkhISOjXhSMlqGpra6OmpkZMTAwAR44cYe7cuYSEhBAUFEReXh7GxsbCIpMKq0mX1smTJzEzM2PgwIFAz6Uhl8u5ePEiW7duZfDgwdjY2ODp6Sn8/BYWFvfJkEpL6+npsWPHDpydnTl58iQNDQ1UVVWhVCoJCQmhqqqKrKwspk2bxrVr1/jxxx/p6Ohg3LhxZGVlsW/fPqZOnYqvr6/I7+js7OTnn39GXV1dJM1mZ2cTHh5+n4ycnBz27t3L0KFD7yPflDK3/6iMvXv3IpfLH9k/KyuL/fv3M3XqVPz9/UWuQ3NzM7m5uZiamor1sLW15fr16/cBD7Zs2cKUKVPuA1BoampSW1uLpaWlAFDY2dndJ+PWrVts3bqV2bNnc+TIEerr64XbZsiQIdy8ebMPCENyF1ZXVxMTE4OpqSl5eXns2rWL8ePH99GjN4BCKq4mk/Uwid/bf8+ePUybNq2PDlZWVgQEBHDjxg3Mzc2FDr0Rjenp6VRWVjJy5Ejhfjp+/DgymYzDhw+L4nr5+fmYm5sLC9Xe3p7MzEwiIiLYvHkzo0ePJjs7m9OnTxMaGioeYN988w0REREMGTKE1tZWwZIMPVaTJGPLli2MGjWKGzdu8MsvvxAcHExUVBT29vYcP34cDQ0NTExMsLa2RkNDQzBPP+y8MDU1FbEeHx8fmpubuX37tngMdXZ2olKpROy2tzfk+++/Z+7cufzwww84OTmJb0yq86Ovr8+BAwdEkcHo6GgmTJjAyZMnuXLlymOBlGxsbAgJCUGpVPZhnv93bf/SS+enn37iyy+/FBBoc3NzQcwnVVLcu3cvSqWS1157jbS0NJHNfPHiRVJSUujs7CQ9PV28iiTLor/+Evw3JSWFTz75hBEjRmBvb49cLn+oC0aCzbq5uREVFUVycjInT56koqKCiooKCgoKRP9z586xaNGiPi4g+KdPVSaTYWRkxJo1aygsLBQZ7e3t7XzyySfs379fuA7vJaLs6Ojg4MGDPPvss+KSvnjxokj8W7ZsGUeOHGH//v19aOv7k/Hcc8+hp6eHra0tRkZG4lW+YsUKjIyMOHLkCBMnTkRPT4+tW7eybds26uvrRQ5PeHi44LPqnagmmfJKpbKPjO+//75fGXp6evfJgJ6L+r8q43H79zeO5uZmNm7c2GcupYNSAmiEh4czbdo0goKC+PXXX3FzcxMB2+7ubgYPHszOnTv7yJD2QX8yjh49io2NDSqVivPnz9PU1MTrr7/O3/72t35lSG6moKAgXnrpJYKCgjh+/Dju7u5MnDhRjCM2Npbo6Og+45Bg4L37BwcHc+TIEWxsbDAyMuqjw1dffdVHB2mNpGZoaMhXX33FE088IfjFUlJSOHjwIAqFgjfffJNNmzb1kSF9V8OHDxf5QPPmzWPgwIGcPHkSXV1dNDQ0WLp0KWFhYXz00UccOnTovjXpT4aHhwcxMTG4uroKVy30IEGPHj3aRwe5XP7A88bGxgYTExO0tbUpKyujoKCA9evXo6amxtChQ/ugSqEHRTl79mxRniAhIYHAwEAcHR2JjY2ltbWVFStW9DkXpJIRzzzzDJcvX6asrIzw8HC+/vprFi1axAsvvICmpqbwYnR3d7NhwwasrKzYuHGj0EM6Wx5GufTv0P5ll05xcTF79+5lw4YN6Ovrk5SUJCoNAqIW+6RJkxg5ciRaWlqcOnWKyZMni1omeXl5fP7559TW1vLbb7+JD00mk93XX1tbm1OnTjFp0iT8/PyYNGkS48aNIy8vDxMTE2G6njx5UtC7R0REoK2tjYODA9BjgQ0fPly4Nt577z1u376Nubk5Dg4OD+xva2srLpzy8nK
+/fZbxo8fz0cffSReRXK5/IGBbqllZGSQl5fHtGnTiI+P58cff2T+/PmEh4czbtw4tLS0+g3YP0hGQkICu3btYt68eQQHBzN69GgUCgVyuZwbN24IyG5TUxNhYWEsWrSIIUOGkJaWhp2dHSYmJg/MjNbQ0CA3N/cPyZDaf1XGH+kvXdK/F3iQnZ1NQEAAZmZmjwRQPAy84OXlhZ+fHwEBATzxxBMPXVM7OzsyMzPR0NDAyckJLS0tsrOzGTx4sHArPmwc9/ZXKpXk5OQKD5y/AAAUsklEQVTg4eEhXHWP0gF6DnJtbW0qKiooKysTlSi9vb0ZOXLkA/dm72TMXbt2oaWlha+vr+AalB6FwcHBTJgw4XfLuHLlCqNGjSIwMFDoca8nQjqs+zsvTp8+TXh4OAqFAmtra7Zt28a1a9fYsGGDAJD03jP9eUOSkpJ44oknGDRoEIGBgeIbkx6UmZmZ1NbWCktFSmAdNWoUFy5coLS0VFQb7g+kpK+vLx4hWlpa//YXDvwLLp3t27dTUFBAVVUVN27cYObMmdjb27Np0yZR8x56XgBSfsGxY8cEwqOgoICSkhLa29tpb29n9OjRAn3i7e0tECcStPHe/iEhISKA39XVJQgAH8cFs3fvXkaOHImrq6uIlzyuCyc7O5sDBw4wevRoAgMD8ff3Z/v27eKg6c/Kkl5D0ouwvr6eQ4cOCa6yiRMnEhcXR1VVFR4eHg8M2D9IRmJiIhMnTsTR0bFPETYdHR0qKyu5cuUKPj4+DBs2rA/r9ahRox6Y4Cg1bW3tf7mM39t/+/btfcAPvxd4MG3atEcCKB5XRl5enmAEOH/+/H0yerunsrOziY+P58SJE2RnZ9PW1kZhYeFDx/Gw/jNnznykDtK+kiyNrq4uioqKyM3NFeU4tm/fTmFhIQMHDnwkmMTY2JjvvvsOmUzG/v37KSwsZNKkSezbt4/Kyko8PDxobm7uV4+Hyairq6OgoAA/Pz+am5tJTEzs44mQ1v9B58WYMWPYuXMnTU1NGBgYUFpaytNPP01hYSFRUVHo6en1KV3Rnzdk3LhxKJVK2tvbSU5OFoX0gPvc5ydOnBCu75EjR/L999+LcENFRQWjR48WqEj4Z3zsP6n9j0MaJKK+CRMmiGJCCoUCGxsbjI2NBe5cXV0ddXV1Bg8ejIWFhQhOGhsbY2RkxKRJk3j77bfJy8tj3bp1NDY28v7774sCawqFot/+vQn+1NTUmDRpErGxsbS1tWFsbExGRoYomfvqq68KKpgBAwYwbdo0oC82/nH7S3BSQBRl0tHREXGo/gLdiYmJYizQU3WxpaUFd3d3Nm/eLGq3m5mZiZfOH5HRu0mJjmPGjEFfX58vv/yy33V8VDGnfwcZv7f/o8APiYmJwiqVXLJS3ZQdO3Zgbm7+3y5D4vi7V4YEfrC3t2fRokWEhIQwfPhwdu/ejaWl5SN1eFj/x9UB/snQoKbWUzY6LCxMgAskGTKZ7IF7U9LD19eXdevWifo3W7duxczMDD09PbG/HzUX/cmwsLAQD1mp2N2Dvo/+zgsJzCOXy3n++ecpLi5m+fLlbNiwgbS0ND7++GOOHDki9pC7uzsrVqzAx8cHHx8ftmzZgkql4tixY8yfP59Lly7dtwd7u77Ly8sJCgoCevjVNm7cSH19PW1tbXz55Zf3MSD8p1048C9Ar2VkZNDW1kZwcLBAUEmJUQYGBmIBCgoKOHr0KG+88QZPP/206J+enk5raysjR44EwMrKip07d6KlpcWhQ4e4fPkynp6etLa2EhkZyeLFi/v0v/dl4OTkhJ2dHdHR0UycOJHQ0FAsLS3FYVxaWir4zezs7IC+7qrf0783sZ5MJuPatWuCe6mqqkqgX65evSoqN44YMYKsrCzi4uJYsGABP/74o0DtdHV1kZ6eTktLy39JRn9wShMTExYuXMjcuXPZu3cvnp6eglr/3jl4UPt3kPF7+v+euYyOjub111/vI0uKL/5PyZAC7ku
WLBFISOiBzLu6uv7h/r9Hh5ycHE6dOsWSJUv6jOP3zmdUVBSLFy/uQwT6e+ezPxmPOxe9+997XmRkZNDc3ExQUBBLly5l+/btrF+/Hmtra06dOsXly5cpLCyks7OTiIgIlixZImJ77e3tLFmyhJaWFj777LP7SrT3bpmZmaj+T3vnGlRl9TXwHwgMKCCHEvUIKhIikgIhDA6QiNMYSd7CD+KlMYt0DM0hES+Ao1KTfzUmR8Ugb4dLSGgZqYEJ6AyCTiGg4f2SeAHMA4oYEJz3A+95hsMdwQPq/n2Cs5+9nv3sfc5ez9pr77XMzZHL5dJB19DQUGxsbKRrWvu9vkhovfW+vr5kZWVRW1sr/eB//fVXbG1t6devH4WFhRw5cgSZTMaMGTM06qpUKsmyUJvDffr0kXbTeHh4SGEi1EmcmtZv+mYwYMAARo8eTU5ODv/88w8BAQEab/9hYWEaSzJN6Up9dV90xMpSJ2UzNjaWQvfr6up2i4zW6Nu3L5s3b8bMzIwdO3ZQWVnZ6rW9WUZH63emL9WZVNWov1valGFtbd1l67ul+p1pw/DhwyUZXenP1n6rXZXRmZWMttpw6tQpampq8Pb2JiIiQkqk5+TkxJ07d5DJZMjlco3+jImJITs7Gw8PD4yMjJDL5dy+fZuUlBRu374t7b5Tz2VGRkZcu3aNdevWERcXh5eXlxQKSi3zRVc40AM+ncbOXfU+/qtXr0qOu5SUFOmcSEtO6qb1i4uLCQ8PZ8SIESQmJkp+m759+7bq5G762cCBA7l48SIZGRn4+Pg0u6atwHpdqd8ZR3fTiLzP4rBvTUZbmJmZYWtri6+vr7Tu3Fl6g4yO1O/KxoXu2EDRFRmNx7Kr9bv6HL2pP7urDVevXqW6ulo6ILp3715MTU1JSEigrq6u2Zyjo6PDlStXkMlk+Pn5ERMTQ2FhISdOnODRo0f89ttvVFdX4+DgIL0IX7p0iaysLLy9vQkNDZU2MDWN5v2io3Wlo3bu5uXlYW9vj7GxMQcOHGDXrl289957REREtJhTpaX6o0ePlhIUnTlzBhMTE8LDw6Vlu860ycnJiZiYGGkDQ+O8Ie0N9rPW7w3O9o7SHW9YvUFGe5ZdbxgPbW+geJlldGcbzp07x+jRozE1NSU/P5/09HTMzMwICwtrcc5JSkqivr4eZ2dnBg8eTFZWFuvWrWPatGno6+tz/vx5hv9/rqzk5GSmT5+On5+fFGKop6JAP2+0rnTUlkFRUREZGRlMmjQJV1dXZs+eLZ2kbStOUOP6mZmZTJo0CRcXF7y8vKT6zzJY+vr6uLm5UVlZSVxcHN7e3q0mVeuu+l21srpLhqCB3jIePWl9v2wyurMNjeccZ2dnJkyYIDn9W5pzDA0N+emnn/D19ZXSE6j9uiYmJqSmpvL+++9LEUXMzMyk83nQPS9pvZEeOaejtgxiY2OlLZdvvPEGtbW1GuHqO1K/rq6O+vp6Bg8e3OXB0sYSTlO6amV1lwxBA71lPHrK+n4ZZXRnG9RzVn19PXK5nLq6OmnHZ1M6syz3LEvfLyo9dji0sWUQHx+Pt7c3RkZGHe7sliwLdWyrrvI8l3BaoqtWVnfJEDTQW8ajJ6zvl1XG82qDoaFhu8vnnV2We9nRUbV3WEILdPWA04t4QKo1uuNZXqb+6Gl6y3j0ht/IyyJDm2148OAB+/bt4+HDh0RGRgINu9XU5+Zehi3QnaVXKB2BQCB4WamqqmL+/PnMnDkTBwcHHB0dNWIxvmq8WipWIBAItIz6nJhMJmPnzp1UVlZK2XNfRYSlIxAIBFpCLH0LpSMQCAQCLSKW1wQCgUCgNYTSEQgEAoHWEEpHIBAIBFpDKB3Bc6GoqKhZrh5/f3/27Nkj/f/nn3/i5+f3zPeYPXs2ubm5FBcX8/bbbz+znI5QUlLC6dOnAdi2bVur+XmelXnz5pGdnU1RUREbNmwAGgLhXrhwodm13333HZmZmR2WfezYMfz9/Zk9ezaBgYF
UVFQAkJGRwaxZswgICGDZsmVSCgFt8/PPPwMNuZ6WLl0KQGhoKMnJyT3SHsHzRSgdwXNh1KhR1NbWcuPGDQAqKiooLy+XkmYBZGdnS3mReju5ubnk5OQ89/vY29sTFhYGQHp6On/99VezawIDA/H29u6QvPLyctavX09MTAyJiYlYW1ujUCiorq4mLCyMqKgoEhISGDBgAHv37u3GJ+kYdXV17NixA2hIE/Ltt99qvQ0C7aL1JG6CniM3N5fo6GgGDRpEYWEhjo6O2NnZkZ6eTnl5OTExMQwaNIicnBy2b9+OSqVCT0+PDRs2YGVlRXp6OrGxsRgYGFBXV8emTZuwtLRk3rx5jB8/nry8PG7evElQUBBTp07Fw8OD7OxsrK2tycnJYdKkSWRmZkonstWphwF27NhBZmYmenp62NrasnbtWkpKSli8eDEjR47E1taWDz/8kOXLl6NUKhk2bJiUAK8tjhw5QlxcHCqVCnNzczZu3IhMJsPFxYVFixZx6tQpysrKiIqKws7OjqysLLZs2UL//v3x8vIiLi6O+Ph4oqKiUKlUUoyskpISli5dyvXr13FzcyM8PFzjvpcvXyY8PBx9fX3+/fdflixZgre3Nz4+Pvj5+ZGfn49SqWT16tW4u7trjFFUVBQhISHExcVhbGyMoaGhRu6d0NBQXFxcGD9+PIsXL8bT05OCggKePHnCrl27NOKK9e/fn7S0NClp32uvvUZJSQnnzp3D2tpaygvz7rvvsmXLFhYtWqTxHBkZGWzevBkLCwucnJxISUnh5MmTUhtmzZoFNKRpvnDhAuXl5YSEhPDff/9RWVnJ/PnzmT59OgcPHiQ7O5v6+npu3LjBkCFD2LZtG6tXr+bOnTt89NFHrF+/noCAAE6ePNnuGJqYmLB27Vpu3LiBjo4O9vb2REREtPt9EPQ8wtJ5xSgoKGDlypWkpKTwyy+/YGpqikKhwMHBgWPHjvH06VMiIiLYtm0bcXFxzJ07V0or/ujRI7755hsUCgUTJkwgPj5ekltVVUVMTAyRkZHExsYC4OnpKS1JZWdn4+7uzptvvsm5c+eoqqri0qVLuLq6kpeXR1paGvHx8SQkJKBUKklNTQXg2rVrLFmyhEWLFnH48GEMDQ1JSkriiy++4MqVK20+671794iOjmbv3r0kJibi5ubGrl27AKisrGTkyJHs37+fKVOmkJycjEqlIiIigk2bNqFQKHj8+DEAVlZWzJgxg6lTp7JgwQIAbt26xdatW0lJSeHQoUMolUqNex84cAAfHx8UCgXR0dGUl5dLZWZmZuzbt49Vq1bx9ddft9h2Z2dnvLy8+Pjjj5sle2vMtWvXmDlzJvHx8djb23P06FGNch0dHUnhVFRU8OOPPzJt2jRKS0ultNLQYGWUlpY2k79u3Tq2bt3Knj17MDU1bbUdakpLS5kzZw779+8nOjqar776SirLy8vjyy+/5ODBg1y8eJGioiKCgoIwNzdn9+7dLcprbQwvX75Mfn4+SUlJ/PDDD9jb20vjJejdCEvnFcPGxkZ6WzczM8PZ2RmAgQMHUllZyZUrVygrKyMoKAhAiqILDamfV65ciUqloqysTKoL4ObmBjSkD1f7DDw8PNi4cSN1dXWcPXuWFStW8ODBA7Kzs3n69ClOTk4YGBiQn5+Pq6urFI/Kzc2NwsJCXF1d6d+/PyNGjAAarAcXFxcALCwspM9bIy8vj7KyMhYuXAhATU0NlpaWUrnawpDL5dy6dQulUklVVRWjRo0CYPLkyZK/oSkuLi7o6emhp6eHTCbj8ePHGnlZJk+eTGhoKHfv3mXixIka2TXVqaHfeustrl692uYztIdMJsPW1lZ6jsbKrTElJSUEBgYSGBjI2LFjuXXrlkZ5S+H9lUolT58+xc7ODmgYz8Y+uZawsLAgNjaW2NhY+vTpo9GesWPHStHXBw8eTEVFRbuKrLUxtLGxQSaT8cknnzBx4kR8fX0xMTFpU5agdyCUzitG09P
Qjf9XqVQYGBggl8tRKBQa19XW1vL5559z6NAhhg8fTlxcHOfPn5fK9fT0NOQAmJubY2lpSUZGBqamphgbG+Pu7s6qVauoqamR/DlNJ7vGE6BaEak/bxwcUZ3KojUMDAwYO3asZN201RcqlarZxNvWyfGmZU3PWLu6upKamsrp06c5ePAghw8fZsuWLRrt7o48R+21Axoc9AsWLGD58uW88847QMOk39iyKS0tZdCgQW3Katz3jdtdU1Mj/R0VFcWwYcPYunUrT548kRKSdbStTWlrDBMSErhw4QIZGRn4+/uTmJiIhYVFuzIFPYtYXhNoMHz4cJRKJZcvXwbg7NmzJCUl8eTJE3R1dRkyZAjV1dX8/vvvGpNNa3h5ebF7927JqrCysuLhw4ecOXNGeuN3cnIiNzdXyhV/+vRpHB0dm8mysbEhLy8PaFh2UW9SaI0xY8ZQUFBAWVkZAEePHuX48eOtXi+TydDV1eX69esApKWlSWU6OjpSTvuOoFAouH//Pj4+PkRGRpKfny+VqTck/PHHH5IV0RI6OjpSn3SF4OBgVqxYISkcaLA6iouL+fvvvwE4fPhwswRnMplMSvcMcOLECamsX79+3Lt3D2gYL7USevDggWR5paamoqur2+b3RFdXt81+bW0MCwsLOXToEA4ODnz22Wc4ODhw8+bNjnaJoAcRlo5AA0NDQ/73v/+xZs0aKdfH+vXrMTMzw8/PD39/f+RyOQsXLiQkJKSZD6Epnp6e7Ny5U9oKCw1KJicnR1oec3R0ZMqUKcyZMwddXV0cHBzw8/Pj7t27GrKmTZvGiRMnCAgIwNLSkjFjxkhlDx8+ZN68edL/Y8aMISQkhDVr1vDpp59iZGSEoaFhqz4UaJgAV69ezZIlS5DL5YwbN06y4MaNG8fy5cvR19fvUOysESNGEBwcTL9+/aivryc4OFgqUy913b9/v03nt7u7O5s2bUKlUjFnzpx279kSBQUF5OXloVKpJL/JyJEjCQsLIzIykuDgYPr06cPQoUOZO3euRl0dHR3Wrl3LsmXLsLCw0Ohvf39/li1bxtmzZ/H09JSWtubOncuGDRtITk7mgw8+YPz48QQHBzfbPq/GwsKC119/nZkzZ7Y4NgMHDmxxDPX19dm+fTtJSUkYGBgwdOhQDatK0HsRsdcEgkYcP34cOzs7rKysSEtLIykpie+//77b5Pv4+LBnzx4pbfGLRHFxcYu7ywSCziAsHYGgEfX19QQFBWFsbExdXR09lFhXIHhpEZaOQCAQCLSG2EggEAgEAq0hlI5AIBAItIZQOgKBQCDQGkLpCAQCgUBrCKUjEAgEAq0hlI5AIBAItMb/AYOBMzk2kFOBAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('meanWordLength', 'recommendations', df=X.loc[X.recognizedWordCount > 10, :])" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Let's finish the IDF measures with max. The idea is that if a comment contains one very rare word, it is more likely to get upvotes. Yep, that does seem to be true and ditto for max word length and even our measure of grammar complexity (very crude) - max sentence length:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('MaxIdf', 'recommendations')" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('maxSentLength', 'recommendations')" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('maxWordLength', 'recommendations')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Spelling next: generally the higher the number of spelling mistakes, the lower the upvote count. The number of comments with no spelling erorrs is very high, which is why the value at zero is both lower and has a much smaller error bar. " ] }, { "cell_type": "code", "execution_count": 88, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('commentSpellErrorsPercent', 'recommendations')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Finally, among the parts-of-speech tags there is a notable leader - 'NNP,' which stands for names of people, organizations or places (proper nouns). Too many names or too few leads to lower upvote totals. Naturally, this could be a function of the types of articles that include many names - politics & news. It is not due to very short comments, however." ] }, { "cell_type": "code", "execution_count": 89, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('NNP_Percent', 'recommendations')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "**Don't be too negative or too positive, be slightly positive.**\n", " \n", "Finally, let's look at the tone of the comments - polarity measures positive or negative affect, essentially what is the ratio of negative to positive words, on a scale from -1 to 1. It seems that overly critical or overly effusive sounding comments don't fare well with the audience - there is an optimum close to neutrality, erring slightly on the side of positive. The dip at zero is simply due to the large number of cases there." ] }, { "cell_type": "code", "execution_count": 90, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "quantile_plotter('commentPolarity', 'recommendations')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "OK, one last question: does the time of day and day of the week matter and if so how? Some of the dummies based on these features appear to give the gradient boost sizeable gain." ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "categories = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']\n", "df['approveDayCat'] = df.approveDay.astype('category',\n", " ordered=True,\n", " categories=categories)\n", "sns.pointplot(x = 'approveDayCat',\n", " y = 'recommendations',\n", " data=df,\n", " join=False\n", ")" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Hmm, I didn't expect this - apparently there is a build-up in terms of comments (and probably reading of articles) from Thursday to Saturday; Sunday is the worst day to comment and then the first three work days see a gradual descent to the weekly nadir on Wednesday. The differences between the days of the week are not large, but it is rather intriguing that they exist." ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "Finally, these are the hours of the day at which comments get the highest upvote counts: clear spike around 9AM, while most comments are posted in the early afternoon. Does this mean that the early bird gets the upvote worm or is simply a function of NYTimes reporters commenting in those hours?" ] }, { "cell_type": "code", "execution_count": 92, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "(array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,\n", " 17, 18, 19, 20, 21, 22, 23]), )" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df['approveHourCat'] = df.approveTime.dt.hour\n", "sns.pointplot(x = 'approveHourCat',\n", " y = 'recommendations',\n", " data=df,\n", " join=False)\n", " \n", "plt.xticks(rotation=30)" ] }, { "cell_type": "code", "execution_count": 93, "metadata": { "autoscroll": false, "ein.hycell": false, "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.countplot('approveHourCat', data=df, color='gray')" ] }, { "cell_type": "markdown", "metadata": { "ein.tags": "worksheet-0", "slideshow": { "slide_type": "-" } }, "source": [ "### This is all, hope you enjoyed it :)\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2" }, "name": "NYTCommentsNotebook.ipynb" }, "nbformat": 4, "nbformat_minor": 2 }