{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%matplotlib inline\n",
      "import pylab as pl\n",
      "import numpy as np\n",
      "\n",
      "# Some nice default configuration for plots\n",
      "pl.rcParams['figure.figsize'] = 10, 7.5\n",
      "pl.rcParams['axes.grid'] = True"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 0
    },
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Large Scale Text Classification for Sentiment Analysis"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Outline of the Session"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- Limitations of the Vocabulary-Based Vectorizer\n",
      "- The **Hashing Trick**\n",
      "- **Online / Streaming** Text Feature Extraction and Classification\n",
      "- **Parallel** Text Feature Extraction and Classification"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Scalability Issues"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The `sklearn.feature_extraction.text.CountVectorizer` and `sklearn.feature_extraction.text.TfidfVectorizer` classes suffer from a number of scalability issues that all stem from the internal usage of the `vocabulary_` attribute (a Python dictionary) used to map the unicode string feature names to the integer feature indices.\n",
      "\n",
      "The main scalability issues are:\n",
      "\n",
      "- **Memory usage of the text vectorizer**: the all the string representations of the features are loaded in memory\n",
      "- **Parallelization problems for text feature extraction**: the `vocabulary_` would be a shared state: complex synchronization and overhead\n",
      "- **Impossibility to do online or out-of-core / streaming learning**: the `vocabulary_` needs to be learned from the data: its size cannot be known before making one pass over the full dataset\n",
      "    \n",
      "    \n",
      "To better understand the issue let's have a look at how the `vocabulary_` attribute work. At `fit` time the tokens of the corpus are uniquely indentified by a integer index and this mapping stored in the vocabulary:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.feature_extraction.text import CountVectorizer\n",
      "\n",
      "vectorizer = CountVectorizer(min_df=1)\n",
      "\n",
      "vectorizer.fit([\n",
      "    \"The cat sat on the mat.\",\n",
      "])\n",
      "vectorizer.vocabulary_"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "{u'cat': 0, u'mat': 1, u'on': 2, u'sat': 3, u'the': 4}"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The vocabulary is used at `transform` time to build the occurence matrix:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "X = vectorizer.transform([\n",
      "    \"The cat sat on the mat.\",\n",
      "    \"This cat is a nice cat.\",\n",
      "]).toarray()\n",
      "\n",
      "print(len(vectorizer.vocabulary_))\n",
      "print(vectorizer.get_feature_names())\n",
      "print(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "5\n",
        "[u'cat', u'mat', u'on', u'sat', u'the']\n",
        "[[1 1 1 1 2]\n",
        " [2 0 0 0 0]]\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's refit with a slightly larger corpus:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "vectorizer = CountVectorizer(min_df=1)\n",
      "\n",
      "vectorizer.fit([\n",
      "    \"The cat sat on the mat.\",\n",
      "    \"The quick brown fox jumps over the lazy dog.\",\n",
      "])\n",
      "vectorizer.vocabulary_"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "{u'brown': 0,\n",
        " u'cat': 1,\n",
        " u'dog': 2,\n",
        " u'fox': 3,\n",
        " u'jumps': 4,\n",
        " u'lazy': 5,\n",
        " u'mat': 6,\n",
        " u'on': 7,\n",
        " u'over': 8,\n",
        " u'quick': 9,\n",
        " u'sat': 10,\n",
        " u'the': 11}"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The `vocabulary_` is the (logarithmically) growing with the size of the training corpus. Note that we could not have built the vocabularies in parallel on the 2 text documents as they share some words hence would require some kind of shared datastructure or synchronization barrier which is complicated to setup, especially if we want to distribute the processing on a cluster.\n",
      "\n",
      "With this new vocabulary, the dimensionality of the output space is now larger:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "X = vectorizer.transform([\n",
      "    \"The cat sat on the mat.\",\n",
      "    \"This cat is a nice cat.\",\n",
      "]).toarray()\n",
      "\n",
      "print(len(vectorizer.vocabulary_))\n",
      "print(vectorizer.get_feature_names())\n",
      "print(X)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "12\n",
        "[u'brown', u'cat', u'dog', u'fox', u'jumps', u'lazy', u'mat', u'on', u'over', u'quick', u'sat', u'the']\n",
        "[[0 1 0 0 0 0 1 1 0 0 1 2]\n",
        " [0 2 0 0 0 0 0 0 0 0 0 0]]\n"
       ]
      }
     ],
     "prompt_number": 4
    },
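     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "As a small aside (a sketch, not required for the rest of the session), the parallelization problem mentioned above can be made concrete: two `CountVectorizer` instances fitted independently on disjoint documents assign conflicting integer indices to the same words, so their count matrices live in incompatible feature spaces unless the vocabularies are merged first."
      ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
       "# Sketch: fit two independent vectorizers on disjoint documents and\n",
       "# compare the column index they assign to the shared word 'the'.\n",
       "from sklearn.feature_extraction.text import CountVectorizer\n",
       "\n",
       "vec_a = CountVectorizer(min_df=1).fit([\"The cat sat on the mat.\"])\n",
       "vec_b = CountVectorizer(min_df=1).fit([\"The quick brown fox jumps over the lazy dog.\"])\n",
       "\n",
       "# The indices differ, so the two count matrices cannot simply be stacked.\n",
       "print(vec_a.vocabulary_['the'], vec_b.vocabulary_['the'])"
      ],
      "language": "python",
      "metadata": {},
      "outputs": []
     },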
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "The Sentiment 140 Dataset"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To illustrate the scalabitiy issues of the vocabulary-based vectorizers, let's load a more reallistic dataset for a classical text classification task: sentiment analysis on tweets. The goald is to tell appart negative from positive tweets on a variety of topics.\n",
      "\n",
      "Assuming that the `../fetch_data.py` script was run successfully the following files should be available:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import os\n",
      "\n",
      "sentiment140_folder = os.path.join('datasets', 'sentiment140')\n",
      "training_csv_file = os.path.join(sentiment140_folder, 'training.1600000.processed.noemoticon.csv')\n",
      "testing_csv_file = os.path.join(sentiment140_folder, 'testdata.manual.2009.06.14.csv')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Those files were downloaded from the research archive of the http://www.sentiment140.com/ project. The first file was gathered using the twitter streaming API by running stream queries for the positive \":)\" and negative \":(\" emoticons to collect tweets that are explicitly positive or negative. To make the classification problem non-trivial, the emoticons were stripped out of the text in the CSV files:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "!ls -lh datasets/sentiment140/training.1600000.processed.noemoticon.csv"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "-rw-r--r-- 1 varoquau varoquau 228M Jul 21 10:01 datasets/sentiment140/training.1600000.processed.noemoticon.csv\r\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's parse the CSV files and load everything in memory. As loading everything can take up to 2GB, let's limit the collection to 100K tweets of each (positive and negative) out of the total of 1.6M tweets."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "FIELDNAMES = ('polarity', 'id', 'date', 'query', 'author', 'text')\n",
      "\n",
      "def read_csv(csv_file, fieldnames=FIELDNAMES, max_count=None,\n",
      "             n_partitions=1, partition_id=0):\n",
      "    \n",
      "    import csv  # put the import inside for use in IPython.parallel\n",
      "    \n",
      "    texts = []\n",
      "    targets = []\n",
      "    with open(csv_file, 'rb') as f:\n",
      "        reader = csv.DictReader(f, fieldnames=fieldnames,\n",
      "                                delimiter=',', quotechar='\"')\n",
      "        pos_count, neg_count = 0, 0\n",
      "        for i, d in enumerate(reader):\n",
      "            if i % n_partitions != partition_id:\n",
      "                # Skip entry if not in the requested partition\n",
      "                continue\n",
      "\n",
      "            if d['polarity'] == '4':\n",
      "                if max_count and pos_count >= max_count / 2:\n",
      "                    continue\n",
      "                pos_count += 1\n",
      "                texts.append(d['text'])\n",
      "                targets.append(1)\n",
      "\n",
      "            elif d['polarity'] == '0':\n",
      "                if max_count and neg_count >= max_count / 2:\n",
      "                    continue\n",
      "                neg_count += 1\n",
      "                texts.append(d['text'])\n",
      "                targets.append(-1)\n",
      "\n",
      "    return texts, targets"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%time text_train_all, target_train_all = read_csv(training_csv_file, max_count=200000)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CPU times: user 7.94 s, sys: 68.2 ms, total: 8.01 s\n",
        "Wall time: 8 s\n"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "len(text_train_all), len(target_train_all)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "(200000, 200000)"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's display the first samples:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for text in text_train_all[:3]:\n",
      "    print(text + \"\\n\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D\n",
        "\n",
        "is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!\n",
        "\n",
        "@Kenichan I dived many times for the ball. Managed to save 50%  The rest go out of bounds\n",
        "\n"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(target_train_all[:3])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[-1, -1, -1]\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A polarity of \"0\" means negative while a polarity of \"4\" means positive. All the positive tweets are at the end of the file:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for text in text_train_all[-3:]:\n",
      "    print(text + \"\\n\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Okie doke!! Time for me to escape for the North while Massa's back is turned. Be on when I get home folks \n",
        "\n",
        "finished the lessons, hooray! \n",
        "\n",
        "Some ppl are just fucking KP0. Cb ! Stop asking me laa.. I love my boyfriend and thats it. \n",
        "\n"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(target_train_all[-3:])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[1, 1, 1]\n"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's split the training CSV file into a smaller training set and a validation set with 100k random tweets each:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.cross_validation import train_test_split\n",
      "\n",
      "text_train_small, text_validation, target_train_small, target_validation = train_test_split(\n",
      "    text_train_all, target_train_all, test_size=.5, random_state=0)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "len(text_train_small)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "100000"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Let's make numpy arrays out of these \n",
      "target_train_small = np.array(target_train_small)\n",
      "target_validation = np.array(target_validation)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 16
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.sum(target_train_small == -1), np.sum(target_train_small == 1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 19,
       "text": [
        "(50195, 49805)"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "len(text_validation)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 20,
       "text": [
        "100000"
       ]
      }
     ],
     "prompt_number": 18
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.sum(target_validation == -1), np.sum(target_validation == 1)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 21,
       "text": [
        "(49805, 50195)"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's open the manually annotated tweet files. The evaluation set also has neutral tweets with a polarity of \"2\" which we ignore. We can build the final evaluation set with only the positive and negative tweets of the evaluation CSV file:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "text_test_all, target_test_all = read_csv(testing_csv_file)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 20
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "len(text_test_all), len(target_test_all)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 23,
       "text": [
        "(359, 359)"
       ]
      }
     ],
     "prompt_number": 21
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "The Hashing Trick"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To workaround the limitations of the vocabulary-based vectorizers, one can use the hashing trick. Instead of building and storing an explicit mapping from the feature names to the feature indices in a Python dict, we can just use a hash function and a modulus operation:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.utils.murmurhash import murmurhash3_bytes_u32\n",
      "\n",
      "for word in \"the cat sat on the mat\".split():\n",
      "    print(\"{0} => {1}\".format(\n",
      "        word, murmurhash3_bytes_u32(word, 0) % 2 ** 20))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "the => 761698\n",
        "cat => 300839\n",
        "sat => 122804\n",
        "on => 735689\n",
        "the => 761698\n",
        "mat => 122997\n"
       ]
      }
     ],
     "prompt_number": 22
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This mapping is completly stateless and the dimensionality of the output space is explicitly fixed in advance (here we use a modulo `2 ** 20` which means roughly 1M dimensions). The makes it possible to workaround the limitations of the vocabulary based vectorizer both for parallelizability and online / out-of-core learning."
     ]
    },
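     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "As an illustrative sketch (again, not part of the original material), hashed token counts can be computed for several documents completely independently: each worker only needs the hash function and the fixed number of features, never a shared vocabulary."
      ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
       "# Sketch: vocabulary-free token counting with the hashing trick.\n",
       "# Each document can be processed on a different worker; the column\n",
       "# indices stay consistent because they only depend on the hash function.\n",
       "from collections import Counter\n",
       "from sklearn.utils.murmurhash import murmurhash3_bytes_u32\n",
       "\n",
       "n_features = 2 ** 20\n",
       "\n",
       "def hashed_counts(document):\n",
       "    return Counter(murmurhash3_bytes_u32(w, 0) % n_features\n",
       "                   for w in document.lower().split())\n",
       "\n",
       "print(hashed_counts(\"the cat sat on the mat\"))\n",
       "print(hashed_counts(\"the cat is a nice cat\"))"
      ],
      "language": "python",
      "metadata": {},
      "outputs": []
     },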
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The `HashingVectorizer` class is an alternative to the `TfidfVectorizer` class with `use_idf=False` that internally uses the murmurhash hash function:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.feature_extraction.text import HashingVectorizer\n",
      "\n",
      "h_vectorizer = HashingVectorizer(charset='latin-1')\n",
      "h_vectorizer"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 25,
       "text": [
        "HashingVectorizer(analyzer=u'word', binary=False, charset=None,\n",
        "         charset_error=None, decode_error=u'strict',\n",
        "         dtype=<type 'numpy.float64'>, encoding='latin-1',\n",
        "         input=u'content', lowercase=True, n_features=1048576,\n",
        "         ngram_range=(1, 1), non_negative=False, norm=u'l2',\n",
        "         preprocessor=None, stop_words=None, strip_accents=None,\n",
        "         token_pattern=u'(?u)\\\\b\\\\w\\\\w+\\\\b', tokenizer=None)"
       ]
      }
     ],
     "prompt_number": 23
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "It shares the same \"preprocessor\", \"tokenizer\" and \"analyzer\" infrastructure:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "analyzer = h_vectorizer.build_analyzer()\n",
      "analyzer('This is a test sentence.')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 26,
       "text": [
        "[u'this', u'is', u'test', u'sentence']"
       ]
      }
     ],
     "prompt_number": 24
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can vectorize our datasets into a scipy sparse matrix exactly as we would have done with the `CountVectorizer` or `TfidfVectorizer`, except that we can directly call the `transform` method: there is no need to `fit` as `HashingVectorizer` is a stateless transformer:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%time X_train_small = h_vectorizer.transform(text_train_small)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CPU times: user 1.82 s, sys: 8.17 ms, total: 1.83 s\n",
        "Wall time: 1.82 s\n"
       ]
      }
     ],
     "prompt_number": 25
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The dimension of the output is fixed ahead of time to `n_features=2 ** 20` by default (nearly 1M features) to minimize the rate of collision on most classification problem while having reasonably sized linear models (1M weights in the `coef_` attribute):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "X_train_small"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 28,
       "text": [
        "<100000x1048576 sparse matrix of type '<type 'numpy.float64'>'\n",
        "\twith 1184803 stored elements in Compressed Sparse Row format>"
       ]
      }
     ],
     "prompt_number": 26
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As only the non-zero elements are stored, `n_features` has little impact on the actual size of the data in memory. We can combine the hashing vectorizer with a Passive-Aggressive linear model in a pipeline:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.linear_model import PassiveAggressiveClassifier\n",
      "from sklearn.pipeline import Pipeline\n",
      "\n",
      "h_pipeline = Pipeline((\n",
      "    ('vec', HashingVectorizer(charset='latin-1')),\n",
      "    ('clf', PassiveAggressiveClassifier(C=1, n_iter=1)),\n",
      "))\n",
      "\n",
      "%time h_pipeline.fit(text_train_small, target_train_small).score(text_validation, target_validation)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CPU times: user 3.96 s, sys: 32.6 ms, total: 3.99 s\n",
        "Wall time: 3.95 s\n"
       ]
      },
      {
       "output_type": "pyout",
       "prompt_number": 29,
       "text": [
        "0.74768000000000001"
       ]
      }
     ],
     "prompt_number": 27
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's check that the score on the validation set is reasonably in line with the set of manually annotated tweets:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "h_pipeline.score(text_test_all, target_test_all)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 30,
       "text": [
        "0.74930362116991645"
       ]
      }
     ],
     "prompt_number": 28
    },
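     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "As a side experiment (a sketch with a hypothetical smaller value of `n_features`, not part of the original analysis), we can trade a higher collision rate for a smaller model: the pipeline below uses `2 ** 18` hashed features instead of the default `2 ** 20`, making the `coef_` vector 4x smaller at the price of more collisions."
      ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
       "# Sketch: same pipeline with a smaller hashing space.\n",
       "small_h_pipeline = Pipeline((\n",
       "    ('vec', HashingVectorizer(encoding='latin-1', n_features=2 ** 18)),\n",
       "    ('clf', PassiveAggressiveClassifier(C=1, n_iter=1)),\n",
       "))\n",
       "small_h_pipeline.fit(text_train_small, target_train_small)\n",
       "print(small_h_pipeline.score(text_validation, target_validation))\n",
       "print(small_h_pipeline.named_steps['clf'].coef_.shape)"
      ],
      "language": "python",
      "metadata": {},
      "outputs": []
     },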
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As the `text_train_small` dataset is not that big we can still use a vocabulary based vectorizer to check that the hashing collisions are not causing any significative performance drop on the validation set (**WARNING** this is twice as slow as the hashing vectorizer version, skip this cell if your computer is too slow):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.feature_extraction.text import TfidfVectorizer\n",
      "\n",
      "vocabulary_vec = TfidfVectorizer(charset='latin-1', use_idf=False)\n",
      "vocabulary_pipeline = Pipeline((\n",
      "    ('vec', vocabulary_vec),\n",
      "    ('clf', PassiveAggressiveClassifier(C=1, n_iter=1)),\n",
      "))\n",
      "\n",
      "%time vocabulary_pipeline.fit(text_train_small, target_train_small).score(text_validation, target_validation)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CPU times: user 3.27 s, sys: 60.7 ms, total: 3.33 s\n",
        "Wall time: 3.29 s\n"
       ]
      },
      {
       "output_type": "pyout",
       "prompt_number": 31,
       "text": [
        "0.74802000000000002"
       ]
      }
     ],
     "prompt_number": 29
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We get almost the same score but almost twice as slower with also a big, slow to (un)pickle datastructure in memory:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "len(vocabulary_vec.vocabulary_)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 32,
       "text": [
        "91405"
       ]
      }
     ],
     "prompt_number": 30
    },
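     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "To make the (un)pickling remark concrete, here is a rough sketch of how big the fitted vocabulary alone is once serialized (the exact size depends on the corpus); the hashing vectorizer has nothing equivalent to serialize:"
      ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
       "# Sketch: serialized size of the vocabulary_ dict alone.\n",
       "import pickle\n",
       "\n",
       "vocabulary_bytes = pickle.dumps(vocabulary_vec.vocabulary_, -1)\n",
       "print(\"pickled vocabulary_: %0.1f MB\" % (len(vocabulary_bytes) / 1e6))"
      ],
      "language": "python",
      "metadata": {},
      "outputs": []
     },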
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "More info and reference for the original papers on the Hashing Trick in the answers to this http://metaoptimize.com/qa question: [What is the Hashing Trick?](http://metaoptimize.com/qa/questions/6943/what-is-the-hashing-trick)."
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Out-of-Core learning"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Out-of-Core learning is the task of training a machine learning model on a dataset that does not fit in the main memory. This requires the following conditions:\n",
      "    \n",
      "- a **feature extraction** layer with **fixed output dimensionality**\n",
      "- knowing the list of all classes in advance (in this case we only have positive and negative tweets)\n",
      "- a machine learning **algorithm that supports incremental learning** (the `partial_fit` method in scikit-learn).\n",
      "\n",
      "Let us simulate an infinite tweeter stream that can generate batches of annotated tweet texts and there polarity. We can do this by recombining randomly pairs of positive or negative tweets from our fixed dataset:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from random import Random\n",
      "\n",
      "\n",
      "class InfiniteStreamGenerator(object):\n",
      "    \"\"\"Simulate random polarity queries on the twitter streaming API\"\"\"\n",
      "    \n",
      "    def __init__(self, texts, targets, seed=0, batchsize=100):\n",
      "        self.texts_pos = [text for text, target in zip(texts, targets)\n",
      "                               if target > 0]\n",
      "        self.texts_neg = [text for text, target in zip(texts, targets)\n",
      "                               if target <= 0]\n",
      "        self.rng = Random(seed)\n",
      "        self.batchsize = batchsize\n",
      "\n",
      "    def next_batch(self, batchsize=None):\n",
      "        batchsize = self.batchsize if batchsize is None else batchsize\n",
      "        texts, targets = [], []\n",
      "        for i in range(batchsize):\n",
      "            # Select the polarity randomly\n",
      "            target = self.rng.choice((-1, 1))\n",
      "            targets.append(target)\n",
      "            \n",
      "            # Combine 2 random texts of the right polarity\n",
      "            pool = self.texts_pos if target > 0 else self.texts_neg\n",
      "            text = self.rng.choice(pool) + \" \" + self.rng.choice(pool)\n",
      "            texts.append(text)\n",
      "        return texts, targets\n",
      "\n",
      "infinite_stream = InfiniteStreamGenerator(text_train_small, target_train_small)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 31
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "texts_in_batch, targets_in_batch = infinite_stream.next_batch(batchsize=3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 32
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "for t in texts_in_batch:\n",
      "    print(t + \"\\n\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "It's sunny outside, so @penguingirl74 and I are inside playing Gears 2 Co-Op  just ate Lucky Me! Curly Spaghetti. Instant Spaghetti finally made right. \n",
        "\n",
        "@mirandafox Poor Princess Twitchy   I was so shocked!! Just woke up feel crap \n",
        "\n",
        "@samkoh hahaha, but were you twittering and driving!? remind me never to ride in your car!  @ithinkminh well, you have till Oct to get there \n",
        "\n"
       ]
      }
     ],
     "prompt_number": 33
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "targets_in_batch"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 36,
       "text": [
        "[1, -1, 1]"
       ]
      }
     ],
     "prompt_number": 34
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can now use our infinte tweet source to train an online machine learning algorithm using the hashing vectorizer. Note the use of the `partial_fit` method of the `PassiveAggressiveClassifier` instance in place of the traditional call to the `fit` method that needs access to the full training set.\n",
      "\n",
      "From time to time, we evaluate the current predictive performance of the model on our validation set that is guaranteed to not overlap with the infinite training set source:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "n_batches = 1000\n",
      "validation_scores = []\n",
      "training_set_size = []\n",
      "\n",
      "# Build the vectorizer and the classifier\n",
      "h_vectorizer = HashingVectorizer(charset='latin-1')\n",
      "clf = PassiveAggressiveClassifier(C=1)\n",
      "\n",
      "# Extract the features for the validation once and for all\n",
      "X_validation = h_vectorizer.transform(text_validation)\n",
      "classes = np.array([-1, 1])\n",
      "\n",
      "n_samples = 0\n",
      "for i in range(n_batches):\n",
      "    \n",
      "    texts_in_batch, targets_in_batch = infinite_stream.next_batch()    \n",
      "    n_samples += len(texts_in_batch)\n",
      "\n",
      "    # Vectorize the text documents in the batch\n",
      "    X_batch = h_vectorizer.transform(texts_in_batch)\n",
      "    \n",
      "    # Incrementally train the model on the new batch\n",
      "    clf.partial_fit(X_batch, targets_in_batch, classes=classes)\n",
      "    \n",
      "    if n_samples % 100 == 0:\n",
      "        # Compute the validation score of the current state of the model\n",
      "        score = clf.score(X_validation, target_validation)\n",
      "        validation_scores.append(score)\n",
      "        training_set_size.append(n_samples)\n",
      "\n",
      "    if i % 100 == 0:\n",
      "        print(\"n_samples: {0}, score: {1:.4f}\".format(n_samples, score))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "n_samples: 100, score: 0.5411\n",
        "n_samples: 10100, score: 0.7215"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "n_samples: 20100, score: 0.7413"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "n_samples: 30100, score: 0.7536"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "n_samples: 40100, score: 0.7528"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "n_samples: 50100, score: 0.7381"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "n_samples: 60100, score: 0.7560"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "n_samples: 70100, score: 0.7535"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "n_samples: 80100, score: 0.7543"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "n_samples: 90100, score: 0.7575"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n"
       ]
      }
     ],
     "prompt_number": 35
    },
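     {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
       "The incrementally trained model can now be used on new, unseen tweets. The two example strings below are made up for illustration (a prediction of 1 means positive, -1 means negative):"
      ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
       "# Sketch: predict the polarity of a couple of made-up tweets with the\n",
       "# incrementally trained classifier.\n",
       "new_tweets = [\n",
       "    \"I love this, best day ever!\",\n",
       "    \"This is so disappointing, what a waste of time.\",\n",
       "]\n",
       "X_new = h_vectorizer.transform(new_tweets)\n",
       "print(clf.predict(X_new))"
      ],
      "language": "python",
      "metadata": {},
      "outputs": []
     },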
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can now plot the collected validation score values, versus the number of samples generated by the infinite source and feed to the model:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "pl.plot(training_set_size, validation_scores)\n",
      "pl.ylim(0.5, 1)\n",
      "pl.xlabel(\"Number of samples\")\n",
      "pl.ylabel(\"Validation score\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "pyout",
       "prompt_number": 38,
       "text": [
        "<matplotlib.text.Text at 0x8ea3d50>"
       ]
      },
      {
       "output_type": "display_data",
       "png": "iVBORw0KGgoAAAANSUhEUgAAAm4AAAHPCAYAAADqAFbFAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xl8VNX9//F3gAAJAQICsgQIm7LLJtTWJWotVivu1qUq\nLhRRWv2qXfxqW7f251b1K9YNixsuqLjhghU1bihYUVFAAQUkbILsm0i4vz8+ntw7k5lMCDNJDrye\nj0ceySyZOXM/d+793HM+99ysIAgCAQAAoNarU9MNAAAAQOWQuAEAAHiCxA0AAMATJG4AAACeIHED\nAADwBIkbAACAJzKSuJ177rnae++91adPn6TP+f3vf69u3bppv/3208cff5yJZgAAAOxWMpK4nXPO\nOZo8eXLSx19++WXNnz9f8+bN03333adRo0ZlohkAAAC7lYwkbgcddJCaNWuW9PEXXnhBZ599tiRp\nyJAhWrt2rVasWJGJpgAAAOw26tXEmy5ZskTt27cvu11QUKCSkhLtvffeMc/Lysqq7qYBAABUWaYv\nSFUjiZtU/oMlS9K4Ipe/rr76al199dU13QxUAbHzG/HzF7HzW3V0ONXIWaXt2rXT4sWLy26XlJSo\nXbt2NdEUZNDChQtrugmoImLnN+LnL2KHVGokcRs2bJgefvhhSdIHH3yg/Pz8csOkAAAAiJWRodLT\nTjtNb731llatWqX27dvrmmuu0Q8//CBJGjlypI466ii9/PLL6tq1qxo1aqQHHnggE81ADRs+fHhN\nNwFVROz8Rvz8ReyQSlZQi4vIsrKyqHEDAABeqI68hSsnIGOKi4trugmoImLnN+LnL2KHVEjcAAAA\nPMFQKQAAQBowVAoAAIAyJG7IGGo1/EXs/Eb8/EXskAqJGwAAgCeocQMAAEgDatwAAABQhsQNGUOt\nhr+Ind+In7+IHVIhcQMAAPAENW4AAABpQI0bAAAAypC4IWOo1fAXsfMb8fMXsUMqJG4AAACeoMYN\nAAAgDahxAwAAQBkSN2QMtRr+InZ+I37+InZIhcQNAADAE9S4AQAApAE1bgAAAChD4oaMoVbDX8TO\nb8TPX8QOqZC4AQAAeIIaNwAAgDSgxg0AAABlSNyQMdRq+IvY+Y34+YvYIRUSNwAAAE9Q4wYAAJAG\n1LgBAACgDIkbMoZaDX8RO78RP38RO6RC4gYAAOAJatwAAADSgBo3AAAAlCFxQ8ZQq+EvYuc34ucv\nYodUSNwAAAA8QY0bAABAGlDjBgAAgDIkbsgYajX8Rez8Rvz8ReyQCokbAACAJ6hxAwAASANq3AAA\nAFCGxA0ZQ62Gv4id34ifv4gdUiFxAwAA8AQ1bgAAAGlAjRsAAADKkLghY6jV8Bex8xvx8xexQyok\nbgAAAJ6gxg0AACANqHEDAABAGRI3ZAy1Gv4idn4jfv4idkiFxA0AAMAT1LgBAACkATVuAAAAKEPi\nhoyhVsNfxM5vxM9fxA6pkLgBAAB4gho3AACANKDGDQAAAGVI3JAx1Gr4i9j5jfj5i9ghFRI3AAAA\nT1DjBgAAkAbUuAEAAKAMiRsyhloNfxE7vxE/fxE7pELiBgAA4Alq3AAAANKAGjcAAACUIXFDxlCr\n4S9i5zfi5y9ih1RI3AAAADxBjRsAAEAaUOMGAACAMiRuyBhqNfxF7PxG/PxF7JAKiRsAAIAnqHED\nAABIA2rcAAAAUIbEDRlDrYa/iJ3fiJ+/iB1SIXEDAADwBDVuAAAAaeBtjdvkyZPVvXt3devWTTfe\neGO5x9esWaPjjz9e++23n4YMGaJZs2ZlohkAAAC7lbQnbqWlpRo9erQmT56s2bNn6/HHH9ecOXNi\nnvOPf/xDAwYM0KeffqqHH35YF198cbqbgVqAWg1/ETu/ET9/ETukkvbEbfr06eratasKCwuVnZ2t\nU089Vc8//3zMc+bMmaNDDz1UkrTvvvtq4cKFWrlyZbqbAgAAsFupl+4XXLJkidq3b192u6CgQNOm\nTYt5zn777adnnnlGBx54oKZPn65FixappKRELVu2LPd6w4cPV2FhoSQpPz9f/fr1U1FRkaTwyITb\ntfO2u6+2tIfblb9dVFRUq9rDbeLHbW7Xxtvu74ULF6q6pP3khIkTJ2ry5MkaO3asJGn8+PGaNm2a\nxowZU/acDRs26OKLL9bHH3+sPn366IsvvtD999+vvn37xjaOkxMAAIAnvDw5oV27dlq8eHHZ7cWL\nF6ugoCDmOY0bN9a4ceP08ccf6+GHH9bKlSvVuXPndDcFNSx6RAK/EDu/ET9/ETukkvbEbdCgQZo3\nb54WLlyobdu2acKECRo2bFjMc9atW6dt27ZJksaOHatDDjlEeXl56W4KAADAbiUj87i98soruuSS\nS1RaWqrzzjtPV1xxhe69915J0siRI/X+++9r+PDhysrKUu/evfXvf/9bTZs2Ld84hkoBAIAnqiNv\nYQJeAACANPCyxg1wqNXwF7HzG/HzF7FDKiRuAAAAnmCoFAAAIA0YKgUAAEAZEjdkDLUa/iJ2fiN+\n/iJ2SIXEDQAAwBPUuAEAAKQBNW4AAAAoQ+KGjKFWw1/Ezm/Ez1/EDqmQuAEAAHiCGjcAAIA0oMYN\nAAAAZUjckDHUaviL2PmN+PmL2CEVEjcAAABPUOMGAACQBtS4AQAAoAyJGzKGWg1/ETu/ET9/ETuk\nQuIGAADgCWrcAAAA0oAaNwAAAJQhcUPGUKvhL2LnN+LnL2KHVEjcAAAAPEGNGwAAQBpQ4wYAAIAy\nJG7IGGo1/EXs/Eb8/EXskAqJGwAAgCeocQMAAEgDatwAAABQhsQNGUOthr+Ind+In7+IHVIhcQMA\nAPAENW4AAABpQI0bAAAAypC4IWOo1fAXsfMb8fMXsUMqJG4AAACeoMYNAAAgDahxAwAAQBkSN2QM\ntRr+InZ+I37+InZIhcQNAADAE9S4AQAApAE1bgAAAChD4oaMoVbDX8TOb8TPX8QOqZC4AQAAeIIa\nNwAAgDSgxg0AAABlSNyQMdRq+IvY+Y34+YvYIRUSNwAAAE9Q4wYAAJAG1LgBAACgDIkbMoZaDX8R\nO78RP38RO6RC4gYAAOAJatwAAADSgBo3AAAAlCFxQ8ZQq+EvYuc34ucvYodUSNwAAAA8QY0bAABA\nGlDjBgAAgDIkbsgYajX8Rez8Rvz8ReyQCokbAACAJ6hxAwAASANq3AAAAFCGxA0ZQ62Gv4id34if\nv4gdUiFxAwAA8AQ1bgAAAGlAjRsAAADKkLghY6jV8Bex8xvx8xexQyokbgAAAJ6gxg0AACANqHED\nAABAGRI3ZAy1Gv4idn4jfv4idkiFxA0AAMAT1LgBAACkATVuAAAAKEPihoyhVsNfxM5vxM9fxA6p\nkLgBAAB4gho3AACANPC2xm3y5Mnq3r27unXrphtvvLHc46tWrdKRRx6pfv36qXfv3nrwwQcz0QwA\nAIDdStoTt9LSUo0ePVqTJ0/W7Nmz9f
jjj2vOnDkxz7nzzjvVv39/ffLJJyouLtZll12m7du3p7sp\nqGHUaviL2PmN+PmL2CGVtCdu06dPV9euXVVYWKjs7Gydeuqpev7552Oe06ZNG61fv16StH79eu21\n116qV69eupsCAACwW0l7trRkyRK1b9++7HZBQYGmTZsW85wRI0bosMMOU9u2bbVhwwY9+eSTSV9v\n+PDhKiwslCTl5+erX79+KioqkhQemXC7dt5299WW9nC78reLiopqVXu4Tfy4ze3aeNv9vXDhQlWX\ntJ+cMHHiRE2ePFljx46VJI0fP17Tpk3TmDFjyp5z/fXXa9WqVbr99tv11Vdf6YgjjtCnn36qxo0b\nxzaOkxMAAIAnvDw5oV27dlq8eHHZ7cWLF6ugoCDmOVOnTtXJJ58sSerSpYs6deqkL7/8Mt1NQQ2L\nHpHAL8TOb8TPX8QOqaQ9cRs0aJDmzZunhQsXatu2bZowYYKGDRsW85zu3btrypQpkqQVK1boyy+/\nVOfOndPdFAAAgN1KyqHSL7/8UhdeeKGWL1+uWbNmaebMmXrhhRd01VVXJf2fV155RZdccolKS0t1\n3nnn6YorrtC9994rSRo5cqRWrVqlc845R99884127NihK664Qqeffnr5xjFUCgAAPFEdeUvKxO3g\ngw/WzTffrAsuuEAff/yxgiBQ7969NWvWrIw2TCJxAwAA/qgVNW6bN2/WkCFDYhqVnZ2d0UZh90Ct\nhr+Ind+In7+IHVJJmbi1bNlS8+fPL7v99NNPq02bNhltFAAAAMpLOVT61Vdf6be//a3ef/995efn\nq1OnTnr00UfL5lbLaOMYKgUAAJ6ojrylwgl4S0tLdffdd+v111/Xxo0btWPHDjVp0iSjDQIAAEBi\nFQ6V1q1bV++++66CIFBeXh5JG3YKtRr+InZ+I37+InZIJeUlr/r166djjz1WJ598snJzcyVZV+AJ\nJ5yQ8cYBAAAglLLGbfjw4fbErKyY+x944IGMNcqhxg0AAPiiVszjVpNI3AAAgC9qxTxuixcv1vHH\nH6+WLVuqZcuWOvHEE1VSUpLRRmH3QK2Gv4id34ifv4gdUkmZuJ1zzjkaNmyYli5dqqVLl+qYY47R\nOeecUx1tAwAAQETKodL99ttPn376acr7MoGhUgAA4ItaMVS611576ZFHHlFpaam2b9+u8ePHq0WL\nFhltFAAAAMpLmbiNGzdOTz75pFq3bq02bdroqaeeqpYzSuE/ajX8Rez8Rvz8ReyQSsp53AoLCzVp\n0qTqaAsAAAAqkLLG7ayzztIdd9yh/Px8SdKaNWt02WWXady4cZlvHDVuAADAE7Wixm3mzJllSZsk\nNWvWTDNmzMhoowAAAFBeysQtCAKtXr267Pbq1atVWlqa0UZh90Cthr+Ind+In7+IHVJJWeN22WWX\n6YADDtApp5yiIAj01FNP6corr6yOtgEAACCiUpe8mjVrlt544w1lZWXpsMMOU8+ePaujbdS4AQAA\nb1RH3pKyx+2rr75Sly5d1KtXL7355puaMmWK2rZtG1P3BgAAgMxLWeN2wgknqF69epo/f75Gjhyp\nxYsX6/TTT6+OtsFz1Gr4i9j5jfj5i9ghlZSJW506dVSvXj0988wz+t3vfqebb75Zy5Ytq462AQAA\nICJljduQIUN08cUX6x//+IcmTZqkTp06qXfv3vr8888z3zhq3AAAgCdqxTxu48aN0wcffKArr7xS\nnTp10oIFC3TmmWdmtFEAAOyMrCxp+vSabsWe5b33pIULpbVra7olFSstlb76qqZbkT4pE7devXrp\njjvu0GmnnSZJ6tSpk/70pz9lvGHwH7Ua/iJ2fttT45fJueFnz5Yeesj+XrFCWrIkM+/jS+zmzJEO\nPFDq1Ek64ICabk1yt9wi9e4tde0q7S4DeCkTNwAAaqMgkL7+Ory9dWv6X3/KFGn7dqlfP2n4cLv/\nz3+WbrpJevhhac2a9L6nL6K9bF98Yb9vukn68svqb8vbb4dtiPeHP4SPzZtXfW3KJBI3ZExRUVFN\nNwFVROxqr+JiaebMip+TKH5//7v06qvJ/ycIpE2bdqlpkqRvv5UyWQL9yCPSjh2WTJ1zjtSnj7Rl\niz1WlSG7226T1q2Lve+HH6SSEmnpUumII6Tnn5cGDZJatrSd/8SJ9ti110rPPrvrn0my2HTtGsau\nps8BLC214efbbpO+/z72sc8/D5e5c++91iP53nvV10bnvvukF14of/8PP8TenjZt19+rpKTme+5I\n3IBaaPPmmm5BYv/5j7RoUeLHgiB2g7Z1a/p7QCRLQGbPTv/r1kbxO0dJOvRQ6dxzd/61rrpKuuGG\nxI99/LG0//5SXt7Ov27U8cdLe+9tyVRVbN8uffddeLu42JImydatzz+XzjpLWr7cdtSzZtk69u23\n9pylSyv/XkEgvfGGdOml0gMPxD42caJ0yinSypV2+6WXbLkPHChdf72UnW1DpUuX2nciHd5+O6zD\n2r5datu2asOxc+dKZ59dcRLrvqu//3247OK5OFx6qfTRR/b3hg2W3PfpI02dGvv8Cy6w7+XChbH3\nl5ZKb71l7fnmm53/PE5JifSPf9iycT77TDrvPBu6drGK+vZbSz6d//5XWrXKEvXJk6WTTtr5drRv\nL02aZOvEhx/u/P+nQ8rE7csvv9SIESN0xBFH6NBDD9Whhx6qww47rDraBs/5UqtR2yxcKDVqZD0L\nixZl9uguPtlyksXuttukV15J/Fpnny0NHRrePvxw6Sc/Kf+8H34o38NRWdu2WfJx771V+/+qKi3d\n9Z6E0lLb6WzdKr34ot2u6LLPs2dL3bvHxsctt1atYp8bBNIdd4TPTRa/hg0Tv9dDD4U7Z5eYJ0oa\no048UfrjH+3v7dtt+Tz3XPh4SUnF/y/Z51+wILx9wQVSixb29/ffW7J03HGWsNWpI7ny6q++smG5\niy6yZNO1uTLv6YwbZ+uoJI0fH3uQ8eGH1qvpEqc335S6dLGk9MknLXmcPdvW5SlTysdx2zbprrsq\n/90dM0Z64gn3XsWaM8f+Xr5c6tHDXsvJyrJkMQisxizaq/T229Kxx9oQbkW9S8OGSYccYu/76afS\nQQeFsZSsFzGalLihxoEDpcaN7e/vvpNatw6f4y6oFI2nZLVwP/+5xapjRztI2LEj9TKJ99570pVX\nWrItSc88I/Xta3FcscKStHXrpE8+kS6+2NbfggLrxczKslq8O+6wXtN99pF++Uvp/fdtOU6fbr/H\njCmfeEp2X+SS7XrvPXutRL181SFl4nbyySdrwIABuv7663XzzTeX/QBI7ayzbMezM0NQ7kj5ueek\nwkLpwQcz0TJz0UVWu1NZS5cmTyYfeST2qPezz2ynINnR+fr19vcNN0jxF16ZNUu6//7U7//++1K9\netLrr1e+zRUJgsolkVOnlt9J7qx//9te49VXpWOOkUaMsJ3lK69IAwZIl1wS+/x33rEeCrcTl+z
v\nunWlxYvD+x55RBo71nZWqdaznJzE93ftGv5dWGhF/rm5YW/M1q2W9LnPf8UVtuN8+217z3/8wz5b\n1MyZ9plcQf+DD5bvBXriCalzZ0vSvv/eEijn2mvt90EHWU+JJL38siVQf/mL7YDPPNMSt2++sfun\nTSufRL39tvSvf5X/zJ98Ev7dsKF02mm2I54/32K1aVOYrC9caO1s3tyWxeDBtt507mxJ9Mcfx772\nV1/Zd2vSpPLvW1ISfi8kaeNG6/lytXrr1oWfd/lyS5puv92+dy6BWrgwTJZd0rpsma1XixZJZ5xh\n378tW+xzuTiuWmXL58UXbf2S7Pa778YekP3rX5agHnig9P/+X9jDPW9e+N0vKbF1pV49a/v++9v9\n8YnbypXW1sces9sDBtg6/O9/h8/54gtr6333lV9ezqpV4fKRYmvpZs2y72h+vjRypCVVb79tj61Z\nI+21l7TffuHz3fJYsUK68UZpyBDrZR03ztr57rux792pk/TrX4ffr5tusuTZJXnHHWc9eE2bJm9/\nWgUpDBgwINVTMqYSzQMSKi21n5r2s5+5Pq0g2GuvIHjnndT/88EHQbD33kHQoIH9X35+EHzxRcX/\nU1pqr//KKzv3uaUgyMkJgvXr7f+mTrX7Fy8Ogn/+M/a5Dz4Yfpajjy7//lIQnHRSEPzxj0Hw0UdB\n0Lq13ffss/b7ttvsuTfcYLfXrg3//4EHguCgg8LbGzYEwYoV5dv7978HwWmn2WtXxpIlQTBhQvn7\nN260ZfqHP1hbon74ofzzX3zRnleZ+MXbti0InnnGlktubhD85CdhXOvWDZepFASnnBIEy5cHwcyZ\ndrthwyAYNSoI3n/fXuvRR4PgqKPs/h077L7o/y9dGgTvvhsEWVnl2yEFwemnx97XrVsQXHZZENx0\nUxCcc074Ol262O/584NgxowguP9+u3355baeNWpktwcNCoI+fWLb4H5uvdXW41NPDd//jjuC4Ouv\n7e8dO4Lgscfs79NOC5eJFASbNgXBcccFwZVXBkHfvkFw6aVBcO21QfDww0Fw3nn2nHfftdfdZ58g\nuP56e36vXkHw1FNBsG6dPXbNNWF73PL66U+DoGfPIDj22PCxP/85/Pugg2w5nXpqEOTlhfcvWmTv\n495bCoIWLYLg4ottvYx66y17/KqrwnXgv/+1v7t2tcdWrbLvTXFxELRsGb7Phx+G6+X991uspSAo\nKAifc//99p2V7Ds/fbr9fcgh9h733x8EZ50VtsO9nxS7TKQgePJJ+z1wYNj+Xr0stscfHwTPPx8E\nv/xlECxbZrH/z3+C4IwzbD0+66wgGDfO/mfUKHudJk2C4IUXguCll2yZP/xwEPTubev+kUeW34aM\nHRsE/foFwZ132v0bN9o6vmpVEEycGASzZ9vzrr7aHnfv9/vfJ17vJIvv0KGxt//2t8TPHTIkCH77\n2yDYb7/YOLhl5tbdQYOC4MsvYx8/8MDw8XBdyXzekrLH7ZhjjtG//vUvLVu2TKtXry77AWqzAw+U\nasOV2VxXvWRDC7/9ber/+f57qVs3G4Y49FDrEUtV6/Lqq/b6v/yldOutFT93yxY74nZHzl27Sk2a\nSPvuK/30pzascO+9scMzUnhGnWT1HdGeDTestmCBnX5/66023CtZjYwU9ra49432UqxfH1s7d+ut\nNiwVf/T+wQf2GVevTj0MVVwstWtnR8rbtoX3b9hgZ5p17y7FDx48+6zVL0XraCTr+ZCs96eydTHv\nvGNDO+PHSyecYD0ixx9vn0GS/vY369mK1oM9+aR03XXWo1a/vvUe3H13ON3C11/b8FBpqa0n8T1Y\nK1daD1eyZRM/VDpvnsVhyxZbVl9+aT0QrtZq2zbrITn/fLt9zz22nvXqJY0ebT1Dn31W/n26dLG4\nN25svRdu+a9ZI915p/29ZEnYa/L44/Z73TqpQwcbGv36axtq//Zb61Hp39962Nx655ZJXp493q6d\nLa+TT7Zl+9FH9luynkbXU7N2rf08/7wNvxYU2DLNzg7j9tBD0tVXW2/Y5ZdbD2+HDtZzI1lv31VX\n2VDrL35R/qQP16OzcqWtq9ddZ2UEpaXWo5edbUPCt99uyzC6Tj30kH0/O3e2dWaffez+aO3f6tXh\nCQPz50tPP21/d+hgvzt3tl7IDz6wntgDDpCuucYee/PN2LZu2GC/3Xr/xBPWAzZrlrWxZ0/rcSsp\nsaHOI46wZV1SYt/xc86x///b32yUYN99bSj26KOlRx+1UYcDD7TP0qtX+L4ffmhtHDHCej/nz7f7\nX3jBelbnzbPheLf9cL35GzfaZ09Ua+u2OYceakOqe+9tvdqXXmq9kZLdJ4XDvNOmSf/zP/b68XVy\nzz1n2xHJehb33Tf28Xfftfo9167quoR7ysTtwQcf1C233KKf/vSnGjhwoAYOHKhBgwZVR9vguZqs\ncSspkSZMyNzr/+//2nBEUVHi4SlXL7Nhg21EHNfdn8zcubZRatDAalC6dbONQarhvOeft41Pfn7q\nuazuucc2Yu413Y7XbTgl6d57i7VoUbiTdBt3JzvbdsJZWTZkNmuW3f/RR5YIvvtuuBE9/3wr/r7p\nJktM3GuNGGE1TXfcYYnbkiVhwjR3ru0k42vZ5syxYao6dVLXYE2ZEv69Zk049NikSeKJWr/6yj6L\nFFurJVni5nbsEydaDVEy7rj29dftfVzxeklJuBM+6SQbRuzb14YUBwyw+7OyrN1bttiOzz3f+fpr\n2yk3amTrXrQuSbKkduxYSSqOmabCJXLbt9uyPu+8sB5s40bbcebm2vtdeqkNWfXta+tinTqWgP3m\nN/Zcydbjik5AGDLEfs+caTvsvn3t9htvhMO87duHycQRR1gyFAS2XH7zG/vfwYPtvebNs++CZOtF\ngwbWLskSt+JiO+tz1ChLyufNs2H1E06wpLCwMEwON28OywOee87WtZ/9zGo0f/1r+3E76TfesATV\nlXVHE7frrrMkp2/fcJqJtWst6VyxwpKe2bPtJIfrrrPEy9VENWlivz/7zGLao4fd7tdPuvNOq3H7\nyU9i17vo+v7FF3aQJdnnct8T95wWLeyg5957bZ3o0MHqtyS7352E0rNn+H1ct87+77TT7Hu/ZYt9\nzk6dLBGdO9dOmHDLYdmy2AOBvfe2+joX++zsMNYtWthndomlZK/5u9+Ft11JgDvgdklUXp5tmyZO\ntP/fuNHi6k5a+d//teRZsu1akyaWIP7wg31Pb7zR1veBA2277MoFzjsvbH+3bva9ctzf558fbr/r\n1VM5OTl2MFG3rrVv7tzyz8mElInbwoULtWDBgpifr6MT5wDVaPNm29jH94jE69TJfic606giW7aU\n32nH27jRko3HHrOjLZe0ROXk2Bd5/Xo7one++y5xYe5HH9lOft99rfajfn070h81ypKxaM/KvfeG\nPRhBYEf7TzxhSdCHH9rOZulSS1Lefz/c2TruCHHevH
AH8pOfWDtdYfjKlfbabufuNqqNGln9S6dO\nYRL60kvhxlqyxG3FijBx69LFjtS3bLGd4saNtsGcP98+y8UX2+uXllrvz9attsP73e+sFqikxBKa\nrCxLiho1kpo1s/oSl3x+/bVtQJ3zz7d6lZYt7fbMmdLBB4c9Ex99ZL1fThDYRn78eEsoJk6MXWYr\nVsSeaHH22WExedQHH9hOLQgsGVq3Lux9mT07TMRcgbdz99123xFH2LJfsMCWkesdkGy9WbbMdp6N\nGllPwdix1qPhLF1q97VqFVsb5w4kHn7YdmBTpoQ1XtOnW0+j26HVq2dJdYMGFrPsbPsM0d6SVavC\nRCrKrT/HHmvtzcmxujDXszZ7tvTUU7H/c/nltg7n5tp7XnKJHYRI9jlyciy+HTvafTfcEPt9yMuz\n5bL//tb23//e4jB/vn3WU0+1z3/hhfZ/GzZYjCVbj3JyLCEYO9aSIJfgSbbTdu8rWWzr1YvtWWnU\nKDx4mzDBEt9vv7VJX995J6zHbNHCeqAOPzzsPXvwQauNatfOlpfrFZKsl2rxYvuu168fu8yeeSZM\nFqdOtXXgP/+xgyP3XosW2fo/dGhs7dWiRZbISra9cQn+nDnlDxDbtbOkpLDQDlRd4taihX1fE53s\n4r5zDRqJdC3IAAAgAElEQVSE2zqX9Bx9tNSmTfjcaHH/F1/YiMSKFZa83XGH3Z+TYwcoy5fbdmfj\nxtgD4IEDbf0JAotj27bh9t8dMDgNGtjrNWhgZwefc4502WX2GaPftR49wsTYKS21ONx0kz1/xw5r\nS0mJPXbCCeFnz7SUidu2bdv0f//3fzrxxBN10kknacyYMfphVyp0scfIxFxgzzxjQxgjRthO/9hj\nw8fWrw9PN//uO9ug33BD8oLy6dMtGYj2nlx+ue3QN260o7glS2zDGi3c/c9/7It9zDF2NJ1saoqZ\nM20nEU3cduyIPTvJGTo0TH6mTLENS58+dgTetKntcF591RKxCy4Ik8WhQ6Ujj7SN0D77WNI0ZIht\ncDt2tI3P9dfHvpcbYnj//XDnO3SoHS26RK5RoyL95CdhwfSsWWGPy4UX2oY7OnQR1bKlbahdct25\ns/WGXHut3b96dTh8/POf22+XTA4ebIXvX34ZFoBHz+Zcs8Y2vM2aWSLh/v+uuywhWLTIXv/JJ23H\n5obKHnnEfkcn4IxuqDdtChOds88OhzMle/+SktjkdOhQO8OxTZuw0NwNq0m2Hn76qbXf7WRWrAgT\nN5fUOoMH286xdWvbEb33nn3O6Nmj55xj60GzZrHTdriEyiVN554rNWlSpJ/9zG6/+WZsb8L8+fY6\n0c8olT9xoX59i/9ee9kO+uCDw8c2bw53jlHt2oWfz03D4HZml11mSVSi/8nKsv/JzbW/R40KD05c\nj5NLEurVi00Y3ONuGKugwHqr/+//wgTNJc/z5sUmbvEJtDtASKZ5c/s80ee4xG3kSPtuLlgQm+g+\n/LDt2E86yZb5wIGxy/rrr8NlYMN3RZJU9v37/vvYIboOHcLE1SVE3bpZ0u9i4noGXdJ5222WNDlN\nm9p2bf/9Y8+ivOsu66FyXE+jG71o1iz29RMlbm49bNAgnIvOJYddutg2M5FFi8KTPQoLwx7ztWvD\nRLNNG/ueue1Wo0YWE2fvvS1x69bNnhv9jjs5OeHyv+suK+2Q7GDhzDPtIKagwBLv0tLwu7Npky23\nli3tPbOywqH26pYycRs1apRmzJihiy66SKNGjdJHH32kUaNGVUfb4Illy2KP7tNh8uTYuq7337ek\n59VX7YyyBx+0jeQLL1gP0Kmn2o7Ubai++86Oim+9NexSj3frrbbhjtZKTJ5sG8PrrpNee80+1403\nSkcdZUe/jzxidReHH27v3bu37VCj0xC4Hc7q1eV73CTbUEyaZLU4bgjLDZW4tjdoEN52PW5XXx3W\nwqxYES4XKZzWQLK2Ri1ebDvaW26xI9nly20n/M474YatSRPbmLkd2eDBtizvvNM2VOeeG9vj0qJF\nmJDE11nl5trrumXSubO97l/+Yhva5cvDxOP2221oZskSO4qVbOfWsaMtt/XrY+eLCoIwcZMsQVqx\nwuqCvv/eNvCjR9uOu1EjO7PvqKOsd6pDh9je0WhStG5duA4PHmw9V7//vZ0tecQR1qbo2bdHHGHv\nvXy5tX/qVEv433vPPm9xsa1X69ZZguuWq1s/4xM3yXaGrVqFiVt8j9snn9gOsFmz8P8bNgx3au6+\nOnXCsxmDwBJyV6PnrF9ffg6q+MStQQPrAXVJQzRxbdo0dofpuB6Z6Odr0cLWnVtuCR/PzY19Lfc/\n0ftdQpbq+M8dCEWHsi680H67BG3aNIvr7Nm2HXHLdWfnrevVq/zwff36thN3Z0QuWmTfLZc49+hh\nMWnY0LaVrVrZ+nrGGWFC5BJeV3e1aZMlOV9/bb2F0eqkaC9Sly72O3pWsGuTFK53ubmxw5R5eWFP\nXLSOtGtXG2p25RMu9m5Y3B2kVZS4uUS9YUP7Hg0ZElvb69aN996z5XHRReFj7vO72P/xj/b9XrfO\nEuPDDrM2bNliB2e/+lXsQUnfvnaw2qWLxTpREh5N3KJ+9jNLso87zt7n8MMtbm6ak3XrbHk1bRp+\nfqn81DzVIWXi9uGHH+qhhx7SYYcdpsMPP1wPPvigpnMl391GSYn1pCSycmWYIERFu9ODwL5AiQri\nd6XG7Ze/tOLxr7+2jf1RR1kR7+rVtiGI1q+NHm233XDQFVfYjurmm61H5u67k3++gQPDupClSy0J\nOfVU6w5v3942wk2b2g73nntsWFAKe/pcIuXmVXKvI1nSs2BB+cTtxhutF+Tpp8MerfhapujQSH6+\nvc7s2eFG75NPrJeqWTPboEWPyLt3t98u+Vu50nZaf/iDDQOtWmVH3y++GB4duw1RXp5tPIcPL1bL\nltbD6WqJoolby5ZhAXt83V5Ojr3GsmWWWEY3ci5x+5//sSGkXr3C+hT3GVz9TtOmlmBEp76QbIfg\nes6+/dbeK5r4Pv54bO9C8+aWvPzqV7GJWzQp+u67sMava1fbuY8ZYycXbNliO4u2bcMrFrgd6Smn\n2Od/7TXrITjhBEsaLr/cdnaux80lFWGPpspp0cJ2Aj16WMziE7e5c239z88P/79t2zDxiSb7y5cX\nq359i507kIif9sXN2+YkStxefTXsaatbN+y9zs4u31vl2hP/+Vq2DD+HezyaMLnvTW5ubOLm/Pvf\nFV88PtEEsu7EBfd+vXpZsj1jhr23G+pM9Bkqkp0dO5zpRBOErCxb51zPkvtuN2wYDh+ffLINy7uk\nwCUsrVtLeXnFys21dSU317bR0ZMXjjwynColJ8eSlOj6HxVNrC680LaNUY0b2wGI66nr3Nna37u3\nxSX6uWbODGsS3XYj1VDp0qXSP/8Zfk4pXDeGDLHP6w7C3PtL4fZv5Mhwct0jj7RlMm6c9Yo1a2bl\nCtGE9Oc/t
4NuKfmJAskSt6hTTgnj55LWpUvt/du2DQ8I3Hu6g87qkjJxq1evnuZHqpa/+uor1UtU\npQfvLFpkR+aPPmpfyuiR16ZN9uXt3dt2nm7jGgT2JZwxw3aoPXvaTm3jRnssOjP2a6/F1jNItiFw\n8wcls327fXGbN7czGZcts4Rq9mz73bRpuMHt3Nnavc8+tiNq2zacHf7UUy3RmjXLht4++ih2/qQ1\na+z5LuFbtMh22hdfbF3oZ59t923caEOzTz9tO+fFi8Mdg5sc88YbwwRnyRKbM8htTN3RdJRLPJIV\n2Mf3uM2ebQngXXfZ8MYVV1hPxMqV1uNYJ/JNdombW0YzZsQmLE2b2tHk1q22AZ46NSwIbtzYNkod\nOoQb4HHj7Hd045uXZ4mgVH7CStfjJpXfQLrErU2bcBjGbazdBnLtWtvRNmli6966deGG3CUjTZpY\nkuS4M/5+9StbF6KJm9sxDBoUe13JaD3Kz39uO5EgsLhFayjr1rXvQ3Z22Eb3fgUFtlxcr2B+vu0g\nN260RGnrVlvnXZyzsy22iRK3X/7S1isXt4YNLVauV3XrVot3tMetXbswcYvfibrkaMUKqxe8+urY\nx7/7LtwBS4kTt88/j52fzW3627a1zxV/WaboUKnTokW4PridaXRH55ZNfI+bk58fzhGWSKI61txc\nS/aidXjt29tBQePGYTt29UoRjktot22z78Udd9hn3LAhTNZdfKLLuWdPq5V1y7VHD1uHo23escMO\nvFyP3l57We+1ZI/NmhX2usaLvlfv3taDLIXrd+PGtj1ziVs0CYrXp094EFbZodJvvin/mnl51i63\nDkR7bl3P4eWX27pVUGDf/5kz7fvq4hUE4XLdWZVJ3KLcaMYPP9h3bfDg8ABesmUaXy6SaSkTt5tv\nvlmHHXaYDjnkEB1yyCE67LDDdIsbFIZ3Fi+2neHf/25f9gsvtJ3hfvuF3e6SnRW47772RVm1ynYo\nhx5qvSRLl1oP1+TJlhCtX28r7qefxu4wp00rKhuimTo1XLmjR/qJCvUXLrQd++GHxw6TucQt2uNw\n/PH2OTp0sO79CRNii5/r17dhvrvusmSnX7/wPV3itmmTbcg2brQNmTs7rbDQNjybNlmSNHeu7ezi\ne9COOio8m2j2bPtSd+tmQ22SbWAaNrQesv/+13Zirgdh2zY78o4/EzSauDVtakfdeXmWYJxxRvhY\nly7lj/ZatbI4bNtmO9LomVuSJSj9+tnG/ze/sdi6oYm8PPspKioq2wC3aWOvFV0/osOj8TV7rsfN\n/R3VqJGtT9EdZnziJlmMGze2Hd+6dWEy6l5v5kzr6XV1Wi1bWu/l00/bzjl6ROwOSPr0iZ0mo3Hj\n8PVWroxt01572RD4v/9t68OOHeEy+vpr29G1aWPLt1evcD11xe4bN4YTCmdnW9Ll4tS4ceLE7dxz\nLUly7XA7RXdSRF6erRcNG4b//4tf2PrRsGHsZysqKipLAL/9NlwHHZfsRXf4iWrcpMRDQS5WrVvH\n9ra7XtlkPW5unRoyxBKJiRPDk0qS9bil8uij4dnAUfvvH9tj1KqVxa5x43AZ78wOPJU6dSzW118f\n9m5G16lEiVudOjbU7jRrJj3+eFHZbRfDBg3CaTeiPdilpbHbiqiXXirfw+a4RLNJE3uNvfe275lb\nx1NxB0OJlp87ICotte2l6/V0GjWKXT+iPW5uu1G/vv1dv74lsosWWfyiy7Oqk91WJXFzB75Nm9o6\nFV1Obhi8OqVM3A4//HDNnTtXd9xxh8aMGaO5c+dyyata4PnnU0/7EPXaa5acdOxoPRJXXSX9+c92\nFOcuI+PqYSRLMoqKbAXfssVqRIqL7Wy91q0tOXJn8nzzjSVzr70We/TrXuv++61+YPRou+165WbP\ntp1ZdBoKyT5Xt272JSkttZ1kbq59eVzi5r7A7rfbWBQW2rBCdDj3ootsJ+9q2TZssCEwlyDed58l\nqRs3li/6/u9/w4Tu/PPLD2lKtkxLSmwn/ve/W49fQUGY4GVl2Wdx9UsDB9rOtGFD+zxnnhk7i7tU\nvsdtxYqwbaNH2/+5udfiZWXZMN/69bZchg2Lfbx1a1vu115rc2NFNW4c7izcMm3duvwG/fbbyxen\nu16/3NyKEzf3Po7bsB96aFjP1KSJtTEnx9Yt9xnc67kjdpcIuOkGGjQoX9dy9dV2kBHfltzc2HhH\n53r75htbL849N0wmXCLjPndBgR0wdOsWHpS4nVDduuEyy821YXs3e3+yxM1JlFQcc4wdpLjXb9TI\nhpOvuipMxLt0iX3daOLWuHHsOtWrly2n6BmTiXrcXPujLrkk7G2V7ExhyRKLU06x5RRtx6hR4Vxc\nBxxgifRTT9n3/IQTYoePq5K4HXlk7BnCybjh/caNwx6uik5E2FmpBqISJW6puHWhQYPw9aP/X9Gl\no446KnEdohQmbu6716zZzvVg1atn/5MoYWnVyoYr586170j8gaU7OHQOOSScJzBRPNy606pV7GXJ\nqqvHTYqtBa4NkiZur/94DvPEiRP18ssva/78+Zo3b55eeuklPZPo8AbV6rjjbAP6+OPlJxF1Nm8O\ne2d+8QsbomzQwJKAiy6yS5mMGBHOU9OhQ9hL9MUXdlSdk2Ovk5NjxcXffGOJUVaWHS3vtZdtiGfM\nsELSNWssmdu2zepsJHsPKewJmzXLdmLu6DzaQxYEtlNwc5JJtkO47rrYHjf3xXc7iIEDrTfKDdVE\nv2Dt2ll9i0skV68O5zRyX8ivv7ZEJrpBGTzYPs/nn9v73Hef/Z1Iu3aW1E2dap+hXTvbIF12mT1+\n//3hMIBLatylhRKJr3GTwrbVrRvW3bmeqHju87dsacmZq82KfuZE8vJsx1ZcXFyWuCU6xT0/P/YS\nMq5dkq0ryYZK3Y45mrjVqWPLfsCAcMjWtb9JExsyufBC2xHFv55rW3TIL97AgVaIHU2k3O/Gja3Y\nfMYMq/mLttPtKN3/xSev48ZZQhm9XmOiupoDD7Shn5Ejw89e0RBddKjUeeEF6/lzr9+oUbiMmjWz\n4dTHHgvrK4uLi2OGShs3jn29Fi3soCXa4xafNLnELT7JvO222HpHt1zcOjtuXGzCEK0JysoKL5MU\nvzyrmrhVlpsnLi/PvjfJanurKl2JW7Q22C2P6LKK9rhV5ZqfUjhU6raXVZk41p1tHC8rK+wdjB4Y\nOI0axa7/nTrZtjV+rkinf387MG7a1DoA3IFCVXvccnN3vofMLZ/4etuaknRVe/vtt3X44Ydr0qRJ\nykqQBp8QLTBBtXLDj4WF4Y4uegTsrFplG3x3jci6de1LsGVLWNwq2bDFsGGWmO2zj23UcnLsC5WT\nYz1i+fm2Yw0CS+gGDrSzOnv2jJ10cNs2q/Faty72zEJ3FmL79rYjvuoqe58rrwyHOXbssCPwDRus\n3se97s9/bknUn/5kG5zoF9/9vuyyMElKJLphmjEj3EhEd0qffBJ75l
ydOva+zz0XTm9Q0VBC587h\nrOQFBbazdFUF0bqVM8+0pOe558IdrWQ1X5s32/BffI9b9LNG3y9adxbldv6ud6hPHxs6Ofro2EQj\n0f+5o/GWLW3jnOwzu7PP3EkKbjORmxtuVJMNv8UnWq5mx8XD/b9L3Jo2TXyk3Lixta8y8ye5987P\nt+XsetwKCsr3PCb6v/i5tFz9ZjQRjg77SDZEH3+Wb5Mmletxi9+59O4dDls2alS+sD5++VTU49a6\nte1UK+pxc5+3orZGn+fWk+hQ/s6o6lBpZbl1pEkTW8Zuiph0STXMGO0t3tnXdN+tlStjvzvx12Wt\njAsvtO2atPMnZ0S1aJE8AXJxTDRlTLRX38nKqvhgxiWYeXl2ED9hQtWHuavS41a3ru1foklzTUqa\nuF3z4+kjf/3rX9U5er6txAS8Neidd+yMx6ZNrbftmGPsvuXLLfFp29bqpx580Irst2wJz1xcv94S\nq/vuiz0izs21oVd39Pbll/b6bojy889tGMYdNffpY0MeZ59tZ0fFn2zQvr0lFf36FZUlbh06WI9D\nv352xNSkie303347rJH64otw6o6srDBhccNmrVpZohP9kle2uNi9Vn5+7Iz6yYbyHLejrMz7RHsh\nKprbp6jIfl5+Oba4++qrrXfG1crEtz1+YzduXPINr9uZRguDjzjCfleUuLVubTv4oqIi7dhRfiLa\nqJ/+1HpOjzzSah8bNbL2V9Tj5iYqTXYWllv+rjfJJRs5OYmPlLOybAdSUY+b45Zp06a2HuXm2s48\n1f8m63FzKkrcEs2c9K9/JR5yd5Ilbu7yZ1LixC2qqKiorFc5PnFbuNC+o6+9FluLWSdu/CXZUGm8\nVMunsjLd4+a+RxUl6bsiXT1u0fkv45dp/LpalR43N2QflejqL6ncfrsdTCTi4pjobNef/cyupFJV\nhYXSX/9a9WHuqiRuUngQXhukrHE7KcGF+U4++eSMNAbJff+91Z0dfLAN6Ywebb1PBxxgw0B//7sd\nlZxwgj1+991WGFpaGs5avm6dJW/Jxunr1Al3KDt2WHKXk2NnQLZrZ8lIbm6YoGRlJe/p+PZbOw3c\nvXdenhXtHnOMvXb37rah22uvMHGbNi2cbVwKe13cb3cmpxTu4FP1Bjhuo92xoyUZLglNdiae4zaU\nldmhuJ1qgwaJhwji1a8fJqLu/9zf0Q2yi1d825o3T72zjJ7Vm50dm1Ql8rvf2UZRsvXhkEMqfn1X\n23XYYdZb2bBhWOPmzgCNcr2/ycQnbi4RyspKvsEdNizxLP7x3LJy60Jurp1VXdEZi9H/S5W4NWpU\ncVLs9O1b8VBNssTNTTMh2fcu1fxR7nVcjaZ7vfx8i8vQoXaA5eZLjB96cic7pEpIfEnc3LoYnfMw\nnVJ9/qrUuKV6zaoOlcarzIFPvCFDkm9/3YFjou9lomt+7ozs7HBakqqoauJWmyT9Ss6ZM0ezZ8/W\n2rVr9cwzzygIAmVlZWn9+vXaGq0QRFrt2JF49u7bb7eTCRxXX9S2rfVyHHyw7fhfey2cPNElQF98\nYb/Xrq04cZPCeY46dAh3lvPmWQ9Rw4bhtR6dZDuq//1faenSYhUVFalp09giWyncSTRvHiZu77xj\nEzW66QeiPW6SdZG7DbsrXN/ZHrfCQuvVe+YZ2/BEJ86Vyr+eux2fgCTiYrJgQflpUBLJzrbErWNH\n6wWJ7qjjp6NIVReVTPQyTZIty8okF8XFxZW68sWvfmXrnOMuKdO2bXjSS9TOJm49e4ZXVcjNTbzB\nveeelM2UFDtUWtHrJfu/+KFSxyWXb76Znt4c9x2pKGG64IKKd9rFxcVq3Lio7LY7cUMqv7Nt2zbx\nRelTXVbOia8FrKrf/KZqPT87w00nlAnpStyi373qSNxWrUr/MnH7r8ocUFW33SFxS7o7mjt3riZN\nmqR169Zp0qRJevHFFzVp0iTNmDFDY+0qxkhi7drkl/VIZskSG9K84AI7vV2y6TVcDcOXX9oJCe6M\nNDc1Q9u2YV3Wb35jl3lxZ2m6KRvcdAjr1lltV0VDLC4xcmP5OTlWwxS9uHBUovmD9tsv9jT0REmH\ne5/mza1n8JlnrPs8egakO/Xa/W+PHrGXp8rLq3yPm9u5uh7C/fe3zxTfqxF/u7Kv7z7LN99ULmmT\nLBFYtiwczoy+d3ztSvSEjMpyc+5FtWiR3ku0NGgQXnZKCnvcsrNj6yidZAXI0deTwuUevaTZrm5w\n3U6wSZOwfnNn/i/ZTrROHetBTtdOyu30Kqpfip61mky0HCJ6sFXZaTgrm7i52s9dTdzatCl/BYB0\ny1TSduONsZeKSsR9v3emMD7VMq1KjVs8d/3VdJs9O3kNbk066aTwyhq+ShquY489Vscee6ymTp2q\nn8ZfbRUVmjWr/IWUnR9+SPxljO5MXXJxxhnW07b//lao/9e/2qSy114bJiJt24Y9AXXrxl4OxU2J\nsXChDYF+952dTl1RMuJ2GtFC2nXrys/F48Qnbn37hjtsd9QYf/q3FHtmXxDYZzvwwNghRjeXV7Le\nrkSvm4wbHmrY0D6TW97xO+/4ndXODt1E5w9LJTvbehhc4hYtHk9H4pbI228nnyIgqqrXmXXLN5mr\nrkp8rVbH9SC49XDo0NiZ9XdlvqRoArYzZbqpetyk9F/yTUp+jd3KKCoq0uDBdtA2frxtUxo3TnwS\nUzrePx2Jm8/++MfUz0lHjVu8dA2VZkKyqznUtIomGfZFyjy7f//+uvPOOzV79mxt2bKl7AzTcW46\ndZTjdv6bNsUmSYsX20rjpteIys21epPPPw9not+61aaXOP98m4OsY0dLjC69NDZxk6x3rn37cOey\n117hsNSCBdYL9vLLdmRVmaJOt5FxiUui2f+l2MTt5z+3CUvjvxjx81YdfLCd1CCFdTtr1oQz9Dvt\n21dcEHrDDZWvlcjPt16HW2+NnT7FxcHN0h+/s/rJT3au121nuFi5JLKiHrfocPOuqEzStiv++MfY\niXrjubn8ktl339gJSaMqO7SZTFVrsdJVw7WzKtvjlUxubrjuugOXm27KzPvXr19xYouq1bhV1BM2\nfXr6rvwAv6Ss3DnzzDO1YsUKTZ48WUVFRVq8eLHyWFsSev11S6pcj0L8dT7dTOpLl9rUE/Glgm6H\n5SaK/f57m5/J9Zx16GD1SeedZ4nG4YeH9Tp9+4aXwmnQwIZt3P+VlISXe4lOMlqR+I1Msh43l9CN\nH2/TTUSTNjcfUfxQ6VtvxQ6BNW9uyyQ+qcjODueAS+SUUyrfA9OmjSXG2dmx/+P+dldziE/cevfO\n3OVM3I4uelFmJ36n2bFj5Ydg06Gq15n97W93LdFt3NjmI0ukffvKnfSRTFaW7Qh3dlioJhK3Z5+V\nTjyx6v/v4ufW3crUaMbbmcRtT+9xq4ycHFsHk13pwIl+9ypapvvvX3t7tZBZKb/O8+fP13XXXae8\nvDydffbZevnllzVt2rTqa
Jt3Hn3UJqd0iZubb81xPWklJVbE7OYRKi21JK53b/tiL1hgw0MucevU\nyc72i79UypQpiXvPtm61njDX47ZjR+qzA6Py88NJeVMlbvXqWd3SGWckP+JOVVjviuUz2RvUunX5\ni2pL4efLy7OpLxJN35ApbqMcvbafEz8EPX585s6G88XJJyeum9sZVUkw6tevfE91uhx3XHoKqHfl\noIOh0vRq2NB+dmY9qs6DNfgj5bFn/R/3xk2bNtVnn32m1q1ba2Wiq/ruAXbsqPjI1X3J3ISk0R63\nTZusEL1Jk3D2fVcwvGmTDWv072+Fk7Nm2ZmP339vO43PPqt4Pq1EsrNjL/t0/PGWDG7enPp/oxfi\nzsmxRK6iWq9kSVlFNW5RrtcuU4XDFXG9XI0axV60vDpEh7WlMHHbuLHmz3qqao1bbVe/ftWGSn1L\nSlz8Up0MUpGKpo2JV5Xluqdp0qRy9VXR794FF8SOTgBSJXrcRowYodWrV+v666/XsGHD1LNnT/2x\nMpWYu6G2bcM5rpyZM8OjWpfwuAtO33VXONlhXp5NmzBggM2YL4VJ1OjRtqNu396uqXnPPVbXtn69\nTcOxffvObUQl25C6xC06+/3OFtu7qR12xSmnVNzj5xLYZJOyZlLdunbpsEzVsVXExcSd5euOxBs1\nqtrQFlLLzq7aUKmvScmurEc33RQ7QXRFfF5G1aVp03BqpsqqWze9Z4Fj91CpxK158+Y65JBDtGDB\nAq1cuVIXXHBBdbSt1lmxwuYSi/rTn+yyUlJ4gfXFi61uacoUS8q+/z58/uDB4cz969fbEfEjj8TO\nX3TQQWESU1RkSd/OJlzZ2eFQaWXm7UomN7fqiZur1Rg2rPx1LaNqImmKuu22mkmUXI/brsQnU6pa\n41bbVXWo1LfCexe/Rx8NJ8HeWQ0bVn7dJHFLn931u4f0SXrs+c9//rPs76ysrLIJeJ1LL700sy2r\nJdw1DSXrfdqyxYYc3aLYtMlq0j7+2BK3vDxL3AYODBO5Dz8MX2/0aDuSPfhg6+mZPj18nyi3ETz0\n0MTXe0slOlS6K4nB/vtnvics0VxwewKXDBQWZn7iURhXr7YzfE5KWrVKfYWFdGCoFKg+SfsZNmzY\noI0bN+qjjz7S3XffraVLl6qkpET33HOPZsyYUZ1trDabNkl/+5vVsj39tPWkNWpkQ5/bt9sZmbm5\nsQe9LjEAACAASURBVLVjmzfbVBwDBliC1qmTFfVGL+9aXBz2OLVvb5O0ugtPu2HTeC5Z6tnTrmaw\ns7Kz7aSHrKydH2aN6tGj6heNrmyd1G9/a8tkT+OS/+zszF7qpyp21xq3PaXHrbrjl53t3zKqrXbX\n7x7SJ+mx59VXXy1JOuiggzRjxgw1/rEQ55prrtFRLuvYzdxyi521NmqUdNZZ1rsm2UVx582z6Tby\n8+06abfdZo9t3hzba7bvvnYyQbSX7C9/kcaODc+abN8+LMRPdrJAdOiuKrNau41o06a1cyguqk6d\nnZu4dnfhpluozrMV93R7yskJ1Y1lBFSflJU93377rbIj38js7Gx9++23GW1UTejWTfoxV1VJSVgw\nf8IJNnfUlClWtzZ/vl3NICvLTkbYvNnmXevRwwp5XSFpdPhvyBA7qzN61mJF1wuVdr3myoXs4IPD\na39WN2o1KrYrM+Nn2u4auz3l5ITqjh9Dpemzu373kD4pN2FnnXWWBg8erBNOOEFBEOi5557T2Wef\nXR1tq1bu+p6SJW5r19r0Hh06WA3bqFHWe9WlSzjdx6hRVs8m2f2tW4fTXrget5/9zIZc4+coi5+E\nMX4qjF2tK3Mb0fPOi73+J2qPXZ0ZHztvTxkqrW433CD16lXTrQD2DCkTtyuvvFJHHnmk3nnnHWVl\nZenBBx9U//79q6NtGbNtm2283RBVdFq6nj3tuoPZ2VaX1rFjmHR98YUlWHvvbWeDTphg9zdtGhYA\nu8TN9bg9+2ziiWXdMKxkU4bE58K72uPmdjQ1OR8YtRoVS8cFojNld43dnnJyQnXHb8iQan273dru\n+t1D+iTdhK1fv15NmjTR6tWr1alTJxX+mIlkZWVp9erVap7pix5mSBBYb9e991pRvCRNnhw+3rev\nzbDfqpV02WXWc/bqq/aYqxXLz4+d2LJHj/CyRXl54dw7ubnhJani9esX/p2fX35KjHQNlda2oneE\navNQ6e6KHjcAvkuaHpz244UzBwwYoIEDB2rQoEEaNGiQBg4cqIEDB1ZbA9Ptv/+13wsWhPc9/rhN\nEitJQ4faVQpatrSLpnfpUv76nm7CVKdr1zBxa9TI6tcaNLD3SLaT6N3bTlhw/xMvXYlbTfa4UatR\nsdo8VLq7xq4qiZuPPW67a/z2BMQOqSTtcXvppZckSQvdBTY9VloqDR9uE92664fOnh0+XlJilxZ5\n8klL4M491+Yvc+ITt/jLN/3tb+HJBnl5Yb1aqvmTXFKVKHFLV41bTV86CcnVxCW+9nRVGSpt2jT1\nyUQAUF2SbsJSzdU2YMCAtDcmU777zi7S/dBDdtLBwIGxidvatXZWaW6u/bj5z5zolQ+k8olb166x\nj1V2I5/JxM0N7dTkUCm1GhW78ko7SKiNdtfYVaX37KCDpEGDMtOeTNld47cnIHZIJWnidumll8Zc\nKSHem2++mZEGZcKqVfZ761ZL0nr3lp56Knx8zRqrOXPTgcR/7AceiJ0gNi/PhlQ7dpTuuy/2uR07\nxtavVcQlbokuwL47DJWiYg0bVu2qGKi6bt12/hJuWVnUigKoPZImbrvLOPu6dXaNUcnmWZs61erW\nduywKyU0aGBzse21l/SHPyR+jV69Yk91z8uzHq34KT0kO1HhoYcq17bdfai0uLiYo0dP7a6xu/nm\nmm5B9dhd47cnIHZIpVLVHp999pnmzJmjrVu3lt131llnZaxR6XTiieHJBBdfLL30kl0hoVUrS+i6\ndLHHdqaHKy/PzghMlLjtjIoSt91hOhAAAJBeKRO3q6++Wm+99ZZmzZqlo48+Wq+88ooOPPDAWpu4\nzZ5t86a1bm3J1dSp4ZxpLhnKz7fE7fnnq/YejRvbCQsjRthwV1VVlLjt6llsrpanKpfLSheOGv1F\n7PxG/PxF7JBKyn6dp59+WlOmTFGbNm30wAMP6NNPP9XatWuro21V0qtXOD/b9Omx86i5qeeaNLHE\n7cknq/YezZvb0Oo++0jXXVf1trrELVH9zGOPSdOmVf21a+OFywEAwK5Jmbjl5OSobt26qlevntat\nW6dWrVppsbvOUy3lEqJbbpGuuCK8350dummTJW4ffFC117/kkuT1cDsjJ8d+Eg2LFhRIgwdX/bWz\ns2t+mHR3qZPcExE7vxE/fxE7pJIycRs0aJDWrFmjESNGaNCgQerfv79++tOfVkfbdtqyZfbb1Z59\n8ol09NF2lYMhQ+yMUkk69NCwtu2Xv9z5WdHdtCG7qnnzzF1HtH79mk/cAABAemUFQR
AkeuDCCy/U\n6aefrgMPPLDsvgULFmj9+vXab7/9qqdxWVlK0ryE3n3X5lw64gjpP/+x2rHly60mbehQaf166Re/\nkK65xq6gsP/+0tKldu3RXT0ZoLb54APp/POlzz+v6ZYAALBn2Nm8pSqSlq7vs88++sMf/qClS5fq\n17/+tU477bRaf3H5khKrO1u2zIZDd+wI50jLybG52Fxv3IAB0rXX2kkMFUxX563CQunYY2u6FQAA\nIJ2S9jNdcsklev/99/XWW2+pefPmOvfcc7Xvvvvqmmuu0dy5c6uzjZVWUmK9aEuWSN9+a9cPdUlZ\nTo4NlbrErU4d6S9/2T2TNskS0r//vWbbQK2Gv4id34ifv4gdUkk5QFhYWKg///nP+vjjj/XEE0/o\n2WefVY8ePaqjbTutpETq399mo7///vDC75JN2xFN3AAAAHyTMnHbvn27XnjhBZ1++uk68sgj1b17\ndz3zzDMV/s/kyZPVvXt3devWTTfeeGO5x2+55Rb1799f/fv3V58+fVSvXr20TDFSUiK1by+dfrpd\n0ip6kfecHLvkFYlb9WE+In8RO78RP38RO6SStMbtP//5j5544gm99NJLGjx4sE477TTdd999ykt0\nYc2I0tJSjR49WlOmTFG7du20//77a9iwYTG9dJdffrkuv/xySdKLL76o22+/XfnRCdeqaMUKO9Gg\ntFSaN0864IDwMXeGJYkbAADwVdIetxtuuEEHHHCA5syZo0mTJun0009PmbRJ0vTp09W1a1cVFhYq\nOztbp556qp6v4BIFjz32mE477bSqtT7OqlVSixZW3yXZBd8dErfqR62Gv4id34ifv4gdUkna4/bG\nG29U6QWXLFmi9u3bl90uKCjQtCSXANi8ebNeffVV3XXXXUlfb/jw4SosLJQk5efnq1+/fmVdyW4F\nd7eXLSvW3LlS9+52e+vWYhUX2+OWuBVr3jxJSvz/3E7v7U8++aRWtYfb3OY2t2v7bae2tIfbFd92\nfy9cuFDVJek8blU1ceJETZ48WWPHjpUkjR8/XtOmTdOYMWPKPXfChAl67LHHkvbIVWY+lKVL7czQ\nvfe2SWe3bJE2brTJbV9/XTrsMHvemDHS738vvfyyTboLAACQTjU6j1tVtWvXLuaSWIsXL1ZBQUHC\n5z7xxBO7PEz6r3/ZhdQ7dLApPrKzw+uTtmkTPq9FC/vNUCkAAPBVnXS/4KBBgzRv3jwtXLhQ27Zt\n04QJEzQswXWd1q1bp7ffflvH7uIssRs3Sh9/bFcJ+OEHuy8rS3r1Val79/B5e+1lv0ncqk981z/8\nQez8Rvz8ReyQStp73OrVq6c777xTQ4cOVWlpqc477zz16NFD9957ryRp5MiRkqTnnntOQ4cOVc4u\nXlBz0ybppZfK3/+LX8TepscNAAD4Lu01bulUmbHi006TnnhCGjTILhz/xBOJn7dokV0GauZMqU+f\n9LcVAADs2aqjxi3tQ6XV4YwzbOoPSdq82X4ffXTypE1iqBQAAPjPy8TtscekH2ea0KZN9jvVFHON\nGtnvHTsy1y7EolbDX8TOb8TPX8QOqXiZuEl27dE337ShTyl14paVJXXubNOGAAAA+Mi7Grft223K\nj/ffj72k1SOPSL/5TTU3EAAA4EfUuCWwYYP93rYt9v5KXI0LAADAa14lbm+8IU2ebH/HJ26NG1d/\ne1AxajX8Rez8Rvz8ReyQStrnccukiROlb76xv7//PvYxetwAAMDuzqsat1NPtZMR5syxJO7EE8Pn\nfv651KtXDTQSAABA1LiVs2qVNH++/T11auxj9LgBAIDdnVeJ23ffhdcj/ec/7XfHjlL9+iRutRG1\nGv4idn4jfv4idkjFqxq3774rf9/ChTYVSLNm1d4cAACAauVVjVturrRli/WwubNKa2/rAQDAnoQa\nt4gtW2zyXUk64oiabQsAAEBN8CZxW7lSatlSeustqXdvu89daB61E7Ua/iJ2fiN+/iJ2SMWbxG3F\nCql1a+ngg6UGDew+d+F4AACAPYE3idvy5eEF4uvWtd8ugUPtVFRUVNNNQBURO78RP38RO6TiTeLm\netyksNYtK6vm2gMAAFDdvEncli8vn7ihdqNWw1/Ezm/Ez1/EDql4k7itWBEOlZK4AQCAPZE3iZs7\nq1QicfMFtRr+InZ+I37+InZIxZvEbcMGqUkT+5vEDQAA7Im8Sdw2bgyvR0ri5gdqNfxF7PxG/PxF\n7JCKN9cqjSZuZ58t1fEm5QQAAEgPb65Vus8+0qRJ0r771nCjAAAAEuBapRHRHjcAAIA9kTeJ24YN\nUuPGNd0K7AxqNfxF7PxG/PxF7JCKF4nbjh3S5s1cmxQAAOzZvKhx27RJatVK2rSpplsEAACQGDVu\nP9qwgfo2AAAALxI3TkzwE7Ua/iJ2fiN+/iJ2SMWbxI0TEwAAwJ7Oixq3d96RrrhCevfdmm4RAABA\nYtS4/YgeNwAAAE8SN05O8BO1Gv4idn4jfv4idkjFi8SNHjcAAABPatzuuEOaP1+6446abhEAAEBi\n1Lj9iKFSAAAATxI3hkr9RK2Gv4id34ifv4gdUvEmcaPHDQAA7Om8qHEbPlwqKpKGD6/hBgEAACRB\njduP6HEDAADwKHGjxs0/1Gr4i9j5jfj5i9ghFS8St02bpNzcmm4FAABAzfKixm3IEGnMGGnw4Jpu\nEQAAQGLUuP1o2zapfv2abgUAAEDNInFDxlCr4S9i5zfi5y9ih1RI3AAAADzhRY1bx47SO+9IHTrU\ndIsAAAASo8btR9u2SdnZNd0KAACAmuVN4sZQqX+o1fAXsfMb8fMXsUMqJG4AAACe8KLGrUEDaf16\nqUGDmm4RAABAYtS4SQoCatwAAAAkDxK30lKpXj2pTq1vKeJRq+EvYuc34ucvYodUan06RG8bAACA\nqfU1bmvWBCoslNaurenWAAAAJEeNmzijFAAAwCFxQ8ZQq+EvYuc34ucvYodUSNwAAAA8Uetr3ObM\nCXTccdIXX9R0awAAAJKjxk30uAEAADheJG5MB+InajX8Rez8Rvz8ReyQiheJGz1uAAAAHtS4FRcH\n+utfpbfequnWAAAAJEeNm6QNG+hxAwAAkDxI3I45hsTNV9Rq+IvY+Y34+YvYIZVan7hJUoMGNd0C\nAACAmlfra9ykQH/8o3TjjTXdGgAAgOSocftRr1413QIAAICa50XiVlhY0y1AVVCr4S9i5zfi5y9i\nh1RqfeL2q19JBx1U060AAACoebW+xu2qqwJdd11NtwQAAKBi1LhJqlu3plsAAABQO9T6xK1OrW8h\nkqFWw1/Ezm/Ez1/EDqnU+rSIxA0AAMBkJC2aPHmyunfvrm7duunGJBOwFRcXq3///urdu7eKioqS\nvhZDpf6qKK6o3Yid34ifv4gdUqmX7hcsLS3V6NGjNWXKFLVr107777+/hg0bph49epQ9Z+3atbro\noov06quvqqCgQKtWrUr6evS4AQAAmLSnR
dOnT1fXrl1VWFio7OxsnXrqqXr++edjnvPYY4/pxBNP\nVEFBgSSpRYsWyRtI4uYtajX8Rez8Rvz8ReyQStp73JYsWaL27duX3S4oKNC0adNinjNv3jz98MMP\nOvTQQ7VhwwZdfPHFOvPMMxO+3pNPDtemTYWSpPz8fPXr16+sK9mt4Nyunbc/+eSTWtUebnOb29yu\n7bed2tIebld82/29cOFCVZe0z+M2ceJETZ48WWPHjpUkjR8/XtOmTdOYMWPKnjN69GjNmDFDr7/+\nujZv3qwDDjhAL730krp16xbbuKws3XZboEsuSWcLAQAA0q865nFLe49bu3bttHjx4rLbixcvLhsS\nddq3b68WLVooJydHOTk5Ovjgg/Xpp5+WS9wkhkoBAACctKdFgwYN0rx587Rw4UJt27ZNEyZM0LBh\nw2Kec+yxx+rdd99VaWmpNm/erGnTpqlnz54JX4+zSv0V3/UPfxA7vxE/fxE7pJL2Hrd69erpzjvv\n1NChQ1VaWqrzzjtPPXr00L333itJGjlypLp3764jjzxSffv2VZ06dTRixIikiRs9bgAAAKbWX6v0\nnnsCjRxZ0y0BAACoGNcqFT1uAAAATq1Pi6hx8xe1Gv4idn4jfv4idkil1idu9LgBAACYWl/j9tBD\ngc46q6ZbAgAAUDFq3MRQKQAAgFPrEzeGSv1FrYa/iJ3fiJ+/iB1SqfVpEYkbAACAqfU1bk89Feik\nk2q6JQAAABWjxk30uAEAADi1Pi0icfMXtRr+InZ+I37+InZIpdanRSRuAAAAptbXuL34YqCjj67p\nlgAAAFSMGjfR4wYAAODU+rSIxM1f1Gr4i9j5jfj5i9ghlVqfFnHlBAAAAFPra9xefz3QYYfVdEsA\nAAAqRo2bGCoFAABwan1axFCpv6jV8Bex8xvx8xexQyq1PnGjxw0AAMDU+hq3qVMDHXBATbcEAACg\nYtS4iR43AAAAp9anRdS4+YtaDX8RO78RP38RO6RS6xM3etwAAABMra9x+/jjQP361XRLAAAAKkaN\nmxgqBQAAcGp94sZQqb+o1fAXsfMb8fMXsUMqtT4tInEDAAAwtb7G7csvA+2zT023BAAAoGLUuIke\nNwAAAKfWp0Ukbv6iVsNfxM5vxM9fxA6p1Pq0iLNKAQAATK2vcfvmm0Dt29d0SwAAACpGjZsYKgUA\nAHBqfVpE4uYvajX8Rez8Rvz8ReyQSq1Pi6hxAwAAMLW+xm3lykAtWtR0SwAAACpGjZsYKgUAAHBq\nfVrEUKm/qNXwF7HzG/HzF7FDKrU+caPHDQAAwNT6GreNGwM1alTTLQEAAKgYNW5iqBQAAMCp9Ykb\nQ6X+olbDX8TOb8TPX8QOqdT6tIjEDQAAwNT6GrfS0oDkDQAA1HrUuIkeNwAAAIe0CBlDrYa/iJ3f\niJ+/iB1SIXEDAADwRK2vcavFzQMAAChDjRsAAADKkLghY6jV8Bex8xvx8xexQyokbgAAAJ6gxg0A\nACANqHEDAABAGRI3ZAy1Gv4idn4jfv4idkiFxA0AAMAT1LgBAACkATVuAAAAKEPihoyhVsNfxM5v\nxM9fxA6pkLgBAAB4gho3AACANKDGDfj/7d1rUFT1Gwfw7yJgI+Fd1EAFWWSB5bLEYjVeUhR0BkGl\nSGs0SA01c0yzHF80WYo4ynjJLJswUQu8lKNZeEEjTVTUxbG8jKK7gaJMs4C3JG7P/4XjSUTRP62w\nB7+fV+y57XP8jvDsb3/nHCIiIlKwcaMnhnM11IvZqRvzUy9mR4/Cxo2IiIhIJTjHjYiIiMgGOMeN\niIiIiBRs3OiJ4VwN9WJ26sb81IvZ0aOwcSMiIiJSCc5xIyIiIrIBznEjIiIiIgUbN3piOFdDvZid\nujE/9WJ29Chs3IiIiIhUgnPciIiIiGyAc9yIiIiISMHGjZ4YztVQL2anbsxPvZgdPQobN3piTpw4\n0dwlUCMxO3VjfurF7OhRnkjjtnPnTuh0Ovj4+GDRokX11ufk5KBdu3YwGAwwGAyYP3/+kyiDmll5\neXlzl0CNxOzUjfmpF7OjR3G09QFramowbdo0ZGdnw93dHUajETExMfDz86uz3cCBA7F9+3Zbvz0R\nERFRi2XzEbe8vDxotVp4enrCyckJY8aMwbZt2+ptx6tFWz6LxdLcJVAjMTt1Y37qxezoUWw+4nb5\n8mX06NFDee3h4YEjR47U2Uaj0SA3NxfBwcFwd3fHkiVL4O/v/8DjaTQaW5dITSg9Pb25S6BGYnbq\nxvzUi9lRQ2zeuD1OoxUaGoqioiK0adMGWVlZGDlyJM6dO1dvO47KEREREf3L5l+Vuru7o6ioSHld\nVFQEDw+POtu4urqiTZs2AIDhw4ejqqoKpaWlti6FiIiIqEWxeeMWFhaG8+fPw2KxoLKyEhs3bkRM\nTEydbUpKSpTRtLy8PIgIOnbsaOtSiIiIiFoUm39V6ujoiJUrVyIqKgo1NTWYMGEC/Pz8sHr1agBA\nUlIStmzZgi+++AKOjo5o06YNMjMzbV0GERERUcsjdigrK0t8fX1Fq9VKSkpKc5fz1CosLJSXX35Z\n/P39JSAgQJYvXy4iIlarVYYMGSI+Pj4ydOhQKSsrU/ZJTk4WrVYrvr6+smvXLmX5sWPHRK/Xi1ar\nlenTpyvLKyoqJD4+XrRarfTt21csFkvTneBTorq6WkJCQiQ6OlpEmJ+alJWVSVxcnOh0OvHz85PD\nhw8zP5VITk4Wf39/0ev1MnbsWKmoqGB2diwxMVHc3NxEr9cry5oqr7Vr14qPj4/4+PhIenr6I2u1\nu8aturpavL29xWw2S2VlpQQHB8vp06ebu6yn0pUrVyQ/P19ERG7cuCF9+vSR06dPy+zZs2XRokUi\nIpKSkiIffvihiIicOnVKgoODpbKyUsxms3h7e0ttba2IiBiNRjly5IiIiAwfPlyysrJEROTzzz+X\nKVOmiIhIZmamvPbaa016jk+D1NRUef3112XEiBEiIsxPRcaPHy9paWkiIlJVVSXl5eXMTwXMZrN4\neXlJRUWFiIjEx8fL2rVrmZ0d279/v5hMpjqNW1PkZbVapXfv3lJWViZlZWXKzw2xu8YtNzdXoqKi\nlNcLFy6UhQsXNmNFdFdsbKzs2bNHfH195erVqyJyp7nz9fUVkTufQO4dIY2KipJDhw5JcXGx6HQ6\nZXlGRoYkJSUp2xw+fFhE7vxh6ty5c1OdzlOhqKhIIiIiZN++fcqIG/NTh/LycvHy8qq3nPnZP6vV\nKn369JHS0lKpqqqS6Oho2b17N7Ozc2azuU7j1hR5fffddzJ58mRln6SkJMnIyGiwTrt7VumD7gN3\n+fLlZqyIgDs3hczPz0ffvn1RUlKCrl27AgC6du2KkpISAEBxcXGdK4jvZnf/cnd3dyXTe/N2dHRE\nu3bteIWxDb333ntYvHgxHBz+/a/O/NTBbDajS5cuSExMRGhoKCZNmoRbt24xPxXo2LEjZs2ahZ49\n
e+K5555D+/btMXToUGanMk86L6vV+tBjNcTuGjfecNf+3Lx5E3FxcVi+fDlcXV3rrNNoNMzMTu3Y\nsQNubm4wGAwPvSci87Nf1dXVMJlMmDp1KkwmE1xcXJCSklJnG+Znny5cuIBly5bBYrGguLgYN2/e\nxIYNG+psw+zUxZ7ysrvG7XHuA0dNp6qqCnFxcRg3bhxGjhwJ4M4nj6tXrwIArly5Ajc3NwD1s7t0\n6RI8PDzg7u6OS5cu1Vt+d5/CwkIAd/5QXbt2jbeGsZHc3Fxs374dXl5eGDt2LPbt24dx48YxP5Xw\n8PCAh4cHjEYjAOCVV16ByWRCt27dmJ+dO3bsGF566SV06tQJjo6OGD16NA4dOsTsVOZJ/67s1KlT\no3oeu2vcHuc+cNQ0RAQTJkyAv78/ZsyYoSyPiYlRHsmSnp6uNHQxMTHIzMxEZWUlzGYzzp8/j/Dw\ncHTr1g1t27bFkSNHICJYv349YmNj6x1ry5YtiIiIaOKzbLmSk5NRVFQEs9mMzMxMDB48GOvXr2d+\nKtGtWzf06NFDeapMdnY2AgICMGLECOZn53Q6HQ4fPozbt29DRJCdnQ1/f39mpzJN8bsyMjISu3fv\nRnl5OcrKyrBnzx5ERUU1XFhjJvA9aT///LP06dNHvL29JTk5ubnLeWodOHBANBqNBAcHS0hIiISE\nhEhWVpZYrVaJiIh44CXSCxYsEG9vb/H19ZWdO3cqy+9eIu3t7S3vvvuusryiokJeffVV5RJps9nc\nlKf41MjJyVGuKmV+6nHixAkJCwuToKAgGTVqlJSXlzM/lVi0aJFyO5Dx48dLZWUls7NjY8aMke7d\nu4uTk5N4eHjImjVrmiyvNWvWiFarFa1WK2vXrn1krRoRPhCUiIiISA3s7qtSIiIiInowNm5ERERE\nKsHGjYiIiEgl2LgRERERqQQbNyJqFAcHB7z//vvK6yVLlmDevHk2OXZCQgK+//57mxyrIZs3b4a/\nv3+z30rB09OTd70nosfCxo2IGsXZ2Rlbt26F1WoFYNunnvyXY1VXVz/2tmlpafj666+xd+/eRr+f\nLdjLHdmJyP6xcSOiRnFycsLbb7+NpUuX1lt3/4jZs88+CwDIycnBwIEDMXLkSHh7e2POnDlYv349\nwsPDERQUhIsXLyr7ZGdnw2g0wtfXFz/99BMAoKamBrNnz0Z4eDiCg4Px1VdfKcft378/YmNjERAQ\nUK+ejIwMBAUFITAwEHPmzAEAfPLJJzh48CDeeustfPDBB3W2v3LlCgYMGACDwYDAwEAcPHgQADB1\n6lQYjUbo9Xp8/PHHyvaenp6YO3cuDAYDwsLCYDKZEBkZCa1Wi9WrVys1DhgwANHR0dDpdJgyZcoD\nH0W2YcMG9O3bFwaDAZMnT0ZtbS1qamqQkJCAwMBABAUFYdmyZY8OiIhaJMfmLoCI1Gvq1KkICgqq\n1/jcP4J07+uTJ0/i7Nmz6NChA7y8vDBp0iTk5eVhxYoV+Oyzz7B06VKICP78808cPXoUBQUFGDRo\nEAoKCpCeno727dsjLy8P//zzD/r164fIyEgAQH5+Pk6dOoVevXrVee/i4mLMmTMHJpMJ7du3R2Rk\nJLZt24aPPvoIv/zyC1JTUxEaGlpnn4yMDAwbNgxz586FiODWrVsAgAULFqBDhw6oqanBkCFD8Mcf\nf0Cv10Oj0aBXr17Iz8/HzJkzkZCQgEOHDuH27dvQ6/VISkoCABw9ehRnzpxBz549MWzYMPzwyJJS\nTAAAA8pJREFUww+Ii4tT3vfMmTPYtGkTcnNz0apVK7zzzjv49ttvERAQgOLiYvz+++8AgGvXrv2X\n2IhIxTjiRkSN5urqivHjx2PFihWPvY/RaETXrl3h7OwMrVarPN5Fr9fDYrEAuNPoxcfHAwC0Wi16\n9+6Ns2fPYvfu3Vi3bh0MBgNeeOEFlJaWoqCgAAAQHh5er2kD7jRLgwYNQqdOndCqVSu88cYb2L9/\nv7L+QaNeRqMR33zzDebNm4eTJ08qI4YbN27E888/j9DQUJw6dQqnT59W9rn7aL7AwEC8+OKLcHFx\nQefOndG6dWtcv35dqdHT0xMODg4YO3Ysfvvttzp17N27F8ePH0dYWBgMBgP27t0Ls9mM3r174+LF\ni5g+fTp27dqFtm3bPva/NxG1LBxxI6L/ZMaMGQgNDUViYqKyzNHREbW1tQCA2tpaVFZWKutat26t\n/Ozg4KC8dnBwaHB+2t1Ru5UrV2Lo0KF11uXk5MDFxeWh+93bnIlInRHAB80v69+/Pw4cOIAdO3Yg\nISEBM2fORL9+/ZCamopjx46hXbt2SExMREVFRb3zcnBwgLOzc51zvHte976XiMDBof5n5zfffBPJ\nycn1lp88eRI7d+7El19+iU2bNiEtLe2B50tELRtH3IjoP+nQoQPi4+ORlpamNCaenp44fvw4AGD7\n9u2oqqr6v44pIti8eTNEBBcuXMDFixeh0+kQFRWFVatWKY3QuXPn8Pfffzd4LKPRiF9//RVWqxU1\nNTXIzMzEwIEDG9ynsLAQXbp0wcSJEzFx4kTk5+fjxo0bcHFxQdu2bVFSUoKsrKyH1v4weXl5sFgs\nqK2txcaNG9GvXz9lnUajQUREBLZs2YK//voLAFBaWorCwkJYrVZUV1dj9OjR+PTTT2EymRqsn4ha\nLo64EVGj3Dt6NGvWLKxcuVJ5PWnSJMTGxiIkJATDhg1Tvmq8f7/7j3d3nUajQc+ePREeHo7r169j\n9erVcHZ2xsSJE2GxWBAaGgoRgZubG7Zu3Vpn3/t1794dKSkpGDRoEEQE0dHRGDFiRIPnlpOTg8WL\nF8PJyQmurq5Yt24devXqBYPBAJ1Ohx49etRpuh52Hvefr9FoxLRp01BQUIDBgwdj1KhRdbbx8/PD\n/PnzERkZidraWjg5OWHVqlV45plnkJiYqIxipqSkNFg/EbVcfMg8EVETyMnJQWpqKn788cfmLoWI\nVIxflRIRNYGGRgWJiB4XR9yIiIiIVIIjbkREREQqwcaNiIiISCXYuBERERGpBBs3IiIiIpVg40ZE\nRESkEv8DGxNJ2txL3c8AAAAASUVORK5CYII=\n",
       "text": [
        "<matplotlib.figure.Figure at 0x6d9b3d0>"
       ]
      }
     ],
     "prompt_number": 36
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Limitations of the Hashing Vectorizer"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Using the Hashing Vectorizer makes it possible to implement streaming and parallel text classification but can also introduce some issues:\n",
      "    \n",
      "- The collisions can introduce too much noise in the data and degrade prediction quality,\n",
      "- The `HashingVectorizer` does not provide \"Inverse Document Frequency\" reweighting (lack of a `use_idf=True` option).\n",
      "- There is no easy way to inverse the mapping and find the feature names from the feature index.\n",
      "\n",
      "The collision issues can be controlled by increasing the `n_features` parameters.\n",
      "\n",
      "The IDF weighting might be reintroduced by appending a `TfidfTransformer` instance on the output of the vectorizer. However computing the `idf_` statistic used for the feature reweighting will require to do at least one additional pass over the training set before being able to start training the classifier: this breaks the online learning scheme.\n",
      "\n",
      "The lack of inverse mapping (the `get_feature_names()` method of `TfidfVectorizer`) is even harder to workaround. That would require extending the `HashingVectorizer` class to add a \"trace\" mode to record the mapping of the most important features to provide statistical debugging information.\n",
      "\n",
      "In the mean time to debug feature extraction issues, it is recommended to use `TfidfVectorizer(use_idf=False)` on a small-ish subset of the dataset to simulate a `HashingVectorizer()` instance that have the `get_feature_names()` method and no collision issues."
     ]
    },
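    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The following cell is a minimal, illustrative sketch (not part of the original session code) of the pipeline idea described above: a `HashingVectorizer` whose output is fed to a `TfidfTransformer`. The `n_features` value and the two toy documents are arbitrary placeholders; fitting the `idf_` statistic still requires a full pass over the hashed training data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer\n",
      "from sklearn.pipeline import Pipeline\n",
      "\n",
      "# Hashed term counts followed by a TF-IDF reweighting step.\n",
      "# Increasing n_features reduces the probability of hash collisions.\n",
      "hashing_tfidf = Pipeline([\n",
      "    ('hasher', HashingVectorizer(n_features=2 ** 18)),\n",
      "    ('tfidf', TfidfTransformer()),\n",
      "])\n",
      "\n",
      "# Toy placeholder documents: computing idf_ needs a pass over the full training set.\n",
      "toy_docs = [\n",
      "    \"The cat sat on the mat.\",\n",
      "    \"The dog sat on the cat.\",\n",
      "]\n",
      "X_tfidf = hashing_tfidf.fit_transform(toy_docs)\n",
      "X_tfidf.shape"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },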
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 37
    },
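    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The cell below is a minimal sketch of the debugging recipe recommended above: fit a `TfidfVectorizer(use_idf=False)` on a small, hypothetical subset of documents so that `get_feature_names()` can be used to inspect the extracted tokens. The two documents stand in for a real subset of the training corpus."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.feature_extraction.text import TfidfVectorizer\n",
      "\n",
      "# Hypothetical small subset of the corpus, used only to debug the\n",
      "# feature extraction parameters (analyzer, n-gram range, stop words...).\n",
      "small_subset = [\n",
      "    \"The cat sat on the mat.\",\n",
      "    \"The cat is not a dog.\",\n",
      "]\n",
      "\n",
      "# use_idf=False mimics the non IDF-weighted output of a HashingVectorizer\n",
      "# while keeping a vocabulary_, so the feature names can be inspected.\n",
      "debug_vectorizer = TfidfVectorizer(use_idf=False)\n",
      "X_debug = debug_vectorizer.fit_transform(small_subset)\n",
      "debug_vectorizer.get_feature_names()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }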
   ],
   "metadata": {}
  }
 ]
}