{ "metadata": { "name": "", "signature": "sha256:672c6fd4d08769e8c9724e7fb9ce750bc018076186ec2bfd9d5f2097fbedb52b" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Dealing with text data in sklearn\n", "\n", "This demo is partially based on [the scipy2013 sklearn presentations](http://nbviewer.ipython.org/github/glouppe/tutorial-sklearn-scipy2013/blob/master/rendered_notebooks/05.2_application_to_text_mining.ipynb)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%pylab inline" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "# Score function from slides\n", "# Note: sklearn.cross_validation was removed in scikit-learn 0.20;\n", "# newer versions provide KFold in sklearn.model_selection\n", "from sklearn.cross_validation import KFold\n", "from sklearn.metrics import accuracy_score\n", "\n", "def score(clf, X, Y, folds=2, verbose=False, metric=accuracy_score):\n", "    # Out-of-fold predictions: each sample is predicted by a model\n", "    # that did not see it during training\n", "    predictions = np.zeros(len(Y))\n", "    for i, (train, test) in enumerate(KFold(len(X), n_folds=folds, shuffle=True)):\n", "        clf.fit(X[train], Y[train])\n", "        predictions[test] = clf.predict(X[test])\n", "        if verbose:\n", "            print(\"Fold {}: {}\".format(i + 1, accuracy_score(Y[test], predictions[test])))\n", "    # Return the metric over all folds, or the raw labels and predictions\n", "    if metric:\n", "        return metric(Y, predictions)\n", "    return Y, predictions" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.datasets import fetch_20newsgroups" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 20 Newsgroups dataset\n", "We will use a standard text dataset that sklearn can download and load for us."
] }, { "cell_type": "code", "collapsed": false, "input": [ "# Set to None to use all categories -- use 4 for speed purposes\n", "categories = [\n", " 'alt.atheism',\n", " 'talk.religion.misc',\n", " 'comp.graphics',\n", " 'sci.space',\n", "]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "ng_train = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))\n", "ng_test = fetch_20newsgroups(subset='test', categories=categories, remove=('headers', 'footers', 'quotes'))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The 20ng dataset consists of posts from 20 topical forums. The goal is to classify a post as coming from one of the forums." ] }, { "cell_type": "code", "collapsed": false, "input": [ "ng_train.target_names" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "['alt.atheism', 'comp.graphics', 'sci.space', 'talk.religion.misc']" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "ng_train.target.shape" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ "(2034,)" ] } ], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "ng_train.data[0]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "u\"Hi,\\n\\nI've noticed that if you only save a model (with all your mapping planes\\npositioned carefully) to a .3DS file that when you reload it after restarting\\n3DS, they are given a default position and orientation. But if you save\\nto a .PRJ file their positions/orientation are preserved. Does anyone\\nknow why this information is not stored in the .3DS file? 
Nothing is\\nexplicitly said in the manual about saving texture rules in the .PRJ file. \\nI'd like to be able to read the texture rule information, does anyone have \\nthe format for the .PRJ file?\\n\\nIs the .CEL file format available from somewhere?\\n\\nRych\"" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Converting text to feature vectors: Bag of Words\n", "Raw text is not in a form that classification algorithms can consume. We have to convert each post to a numeric feature vector.\n", "\n", "The simplest approach is bag of words. In its simplest form, each dimension of the feature vector corresponds to a word (\"car\", \"body\", etc.), and the value is the number of times that word appears in the post.\n", "\n", "It is more common to use ngram-based models, in which each dimension corresponds to an ngram (a sequence of n consecutive words, e.g. the 3-grams \"I was wondering\" and \"was wondering if\"). The values still correspond to the number of times the ngram appears in the post.\n", "\n", "sklearn has several feature extractors that make this easy." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.feature_extraction.text import CountVectorizer\n", "\n", "# Learn ngrams of size 1, 2 and 3\n", "# Use the 20000 most frequent ngrams as the vocabulary\n", "counter = CountVectorizer(ngram_range=(1,3), max_features=20000)\n", "\n", "# Transformers also have a fit_transform method\n", "# It learns the transform, then returns the transform\n", "# applied to the training data\n", "train_counts = counter.fit_transform(ng_train.data)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "# After the vectorizer is fit, we can convert\n", "# any post to a feature vector using the vocabulary it learned\n", "test_counts = counter.transform(ng_test.data)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [
"test_counts.shape" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "(1353, 20000)" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "# All nonzero terms are integers\n", "test_counts[2][test_counts[2].nonzero()]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "matrix([[1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 1,\n", " 1, 1, 1, 1, 3, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,\n", " 1, 1, 1, 3, 1, 2, 1, 1, 3, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1]])" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "# Example vocabulary\n", "np.random.permutation(counter.get_feature_names())[:100]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "array([u'exist', u'fake', u'exists and', u'recognize that',\n", " u'but there are', u'something', u'difference in', u'interaction',\n", " u'polygon', u'the graphics', u'pushing', u'of pc',\n", " u'know if anyone', u'patience', u'617', u'is never', u'of ways',\n", " u'you mention', u'and source', u'in by', u'are available',\n", " u'fractal', u'the believer', u'designers', u'the charge',\n", " u'an application', u'with some', u'fear', u'nick', u'didn make',\n", " u'by david', u'made', u'pretend', u'conference', u'so now',\n", " u'question and', u'in any case', u'this the', u'press release',\n", " u'in terms of', u'an enormous', u'proprietary', u'new version of',\n", " u'spacecraft attitude', u'on something', u'office kjenks',\n", " u'for over', u'll find', u'answers to', u'to face', u'would',\n", " u'explosion', u'are on', u'typical', u'someone is', u'headline',\n", " u'allow me to', u'of the word', u'figure', u'where can ftp',\n", " u'to implement', u'from us', u'on what', u'terminal', u'war on',\n", " 
u'will no', u'kind of like', u'more than one', u'venus',\n", " u'sorry don', u'are distributed', u'the uk', u'far more',\n", " u'that uses', u'exclusive', u'of users', u'kennedy space', u'alter',\n", " u'battle', u'problem in', u'cost of', u'will this', u'best of',\n", " u'planning to', u'the item', u'the committee', u'run on',\n", " u'find it rather', u'nor', u'this leads', u'latter day', u'to save',\n", " u'no real', u'heaven but', u'of the art', u'and tried to', u'mu',\n", " u'in the field', u'to acknowledge', u'63'], \n", " dtype='" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "proj = svd.fit_transform(train_counts)\n", "vis_proj(proj)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAWYAAAD9CAYAAACP8N0iAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XlcVPX++PHXAKOyuC9ggKGCAi5AKmim4oJbSZZmYrll\n5tVvpfdXmtXtpnUDbTPNLLvXvZvYvXVFS0nNRq0EdzMxI4VkT1xBQGDm8/uDnMQFBmYYBuf9fDzO\n4zJnPuec98ylNx/f53M+H41SSiGEEMJmONR2AEIIIcqTxCyEEDZGErMQQtgYScxCCGFjJDELIYSN\nkcQshBA2xqzEXFRURFhYGMHBwQQGBvLiiy8CcP78eSIiIujQoQODBw/m4sWLxmNiYmLw8/PD39+f\nbdu2mRe9EELUotvlwBs9++yz+Pn5ERQUxOHDhys/sTLTlStXlFJKlZSUqLCwMLVnzx41e/ZstXDh\nQqWUUgsWLFAvvPCCUkqp48ePq6CgIFVcXKxSUlJU+/btlV6vNzcEIYSoNbfKgdf76quv1LBhw5RS\nSiUkJKiwsLBKz2l2KcPFxQWA4uJi9Ho9TZs2ZdOmTUycOBGAiRMnsnHjRgDi4uKIiopCq9Xi4+OD\nr68v+/btMzcEIYSoNTfmwGbNmpV7//p8GBYWxsWLF8nJyanwnE7mBmUwGLjnnns4deoU06dPp1On\nTuTk5ODu7g6Au7u7MYjMzEx69uxpPNbLy4uMjIxy59NoNOaGJISwI8qMh5ebNWvGhQsXTG7v5uZG\nXl5euX035sDAwMBy72dkZODt7W187eXlRXp6ujFH3orZPWYHBweOHDlCeno6u3fv5ttvvy33vkaj\nqTDZ3uo9pVSd2V599dVaj0Fitr2trsVbV2M214ULF5gHJm/5+fk3nePGHKjT6W6Z065XWQfUYqMy\nGjduzP3338/Bgwdxd3cnOzsbgKysLFq1agWAp6cnaWlpxmPS09Px9PS0VAhCCFFlTlXYKnItBx44\ncKDc/urkPbMSc25urnHERWFhIdu3byckJITIyEjWrFkDwJo1axg5ciQAkZGRxMbGUlxcTEpKCsnJ\nyYSGhpoTghBCmEVbhe1Gt8uB14uMjGTt2rUAJCQk0KRJkwrLGGBmjTkrK4uJEydiMBgwGAyMHz+e\ngQMHEhISwpgxY1ixYgU+Pj589tlnAAQ
GBjJmzBgCAwNxcnJi2bJldb6mHB4eXtshVJnEXPPqWrxQ\nN2O2BHOS4O1y4PLlywGYNm0aw4cPZ8uWLfj6+uLq6sqqVasqPa9GWaJQY0EajcYitSMhxJ3P3Hyh\n0WhYVoX2MzDvZqOpzB6VIYQQddmtShS1TRKzEMKu2WIStMWYhBDCaqTHLIQQNsYWk6AtxiSEEFYj\nPWYhhLAxkpiFEMLGONd2ALcgiVkIYddsMQnaYkxCCGE1UsoQQggbY4tJ0BZjEkIAly5d4siRIzRp\n0oSuXbvW+XllbJX0mIUQJvnpp5/oN2QopR5t0P+eybB+fdmwdjUODrJ+sqXZYhKU/5eFsEFRU6Zy\nfto8Lq/7gStxJ9h6/BdiY2NrO6w7kjnTftYUScxC2KCUX5MhfETZiwbOFIQN4pdfkms3qDuUcxW2\nG6WlpdG/f386depE586dWbJkyU1t3n77bUJCQggJCaFLly44OTkZ53C+HUnMQtigwC5dcYgrW2yC\nSxdw0cURFNS1doO6Q5nTY9ZqtSxatIjjx4+TkJDABx98wIkTJ8q1ef755zl8+DCHDx8mJiaG8PBw\nmjRpUmFMkpiFsEEbVq3Ac/NK3Ia3p/6wdjzxwFDjSkDCssxZWsrDw4Pg4GCgbKHWgIAAMjMzb3ut\nTz/9lKioqEpjkonyhbBRJSUlnD59msaNG+Ph4VHb4dgkS0yUf66Cu3/fKfj+utO/abj9RPmpqan0\n69eP48eP4+bmdtP7BQUFeHt7c+rUqUp7zLZ4Q1IIQdk/kzt27FjbYdzxnCrIguF/bNe8WXTrdvn5\n+YwePZrFixffMikDbN68mfvuu6/SpAySmIUQdk7raN7xJSUljBo1iscff7zCclNsbKxJZQyQUoYQ\nog6zRCmjuLHp7etdKl/KUEoxceJEmjdvzqJFi2573KVLl2jXrh3p6ek4O1c+bZL0mIUQdk1bv/rH\nfv/993zyySd07dqVkJAQAKKjozlz5gxQtko2wMaNGxkyZIhJSRmkxyyEqMMs0WNWd1Whfaaski2E\nEDXPBrOgDYYkhBBWZINZ0AZDEkIIKzJzVEZNkMQshLBvNpgFbTAkIYSwIjNGZdQUScxCCPtmg1nQ\nrEmMbjfl3fnz54mIiKBDhw4MHjy43BR3MTEx+Pn54e/vz7Zt28yLXtg9pRQGg6G2wxB1mTmzGNUQ\nsxLz7aa8W7BgAREREfzyyy8MHDiQBQsWAJCUlMSGDRtISkoiPj6eGTNmyH9UolqUUvxt3nwaNGxI\nfRcXoiY/wdWrV2s7LFEXOVZhsxKzEvOtprzLyMhg06ZNTJw4EYCJEyeyceNGAOLi4oiKikKr1eLj\n44Ovry/79u0z8yMIe7R27ToWbfic4s0nKd39O3EpOcz9+6u1HZaoi2ywx2yxS6WmpnL48GHCwsLI\nycnB3d0dAHd3d3JycgDIzMykZ8+exmO8vLzIyMi46Vzz5s0z/hweHk54eLilwhR3iK92fkvB2GfA\n3ROAwqkvs2XRLG4/W4G4E+h0OnQ6nWVPaoM1ZouElJ+fz6hRo1i8eDENGzYs955Go6lwdd9bvXd9\nYhbiVrzcW6E9eYSSP15rfj6CR6tWtRqTqHk3dtTmz59v/knvxMR8bcq78ePHG6e8c3d3Jzs7Gw8P\nD7Kysmj1x38wnp6epKWlGY9NT0/H09PT3BCEHXpx9vN8dm9vLj77IMq1EY57t7H0mx21HZaoi2xw\nuJxZNWalFFOmTCEwMJBZs2YZ90dGRrJmTdl6ZWvWrDEm7MjISGJjYykuLiYlJYXk5GRCQ0PNCUHY\nqZYtW3L8wH6WTRjF4hH9SDp0kC5dutR2WKIussEas1mzy3333Xf07duXrl27GksSMTExhIaGMmbM\nGM6cOYOPjw+fffaZcdb+6OhoVq5ciZOTE4sXL2bIkCHlA5LZ5YQQJrLI7HKPVKH9f8rPLpeWlsaE\nCRP
4/fff0Wg0PPXUUzz77LO3PHb//v306tWLzz77jIcffrji68i0n0KIusoiidm0RUXK2q8vn5iz\ns7PJzs4mODiY/Px8unXrxsaNGwkICCh3nF6vJyIiAhcXFyZPnsyoUaMqvI6ski2EsG9mlDJMXSX7\n/fffZ/To0bRs2dLkkIQQwn5V8OCILrtsM8X1Q4avl5GRQVxcHDt37mT//v0VjlK7RhKzEMK+VbRK\ntlfZds38I7duV9Eq2bNmzWLBggXGsosppRdJzMKuZGRkMP3/Pc/JU6foERzE0rffMmk5eXEHa2De\n4ZWtkn3w4EHGjh0LQG5uLlu3bkWr1RIZGXnbc8rNP2E3CgoK6BgcQtagR9HfN4x6cavpnHGC/bt1\nODjI7Za6yCI3/2ZWof3i6q2Sfc3kyZMZMWJEpaMypMcs7Mb+/fu57NoU/dOvAVDcNYwTA8seerr7\n7rtrOTpRa8zIgqaukm3FkISoW+rVq4ehIB8MBnBwgOKrGIqvUq9evdoOTdQmM7LgfffdV6UZMlet\nWmVSO0nMwm706NEDf/cW/DR7LEX3DsFl66cMHTac1q1b13ZoojbZ4Jp/UlgTdsPJyYld8VuY06sL\no5P38PojI9iwdnVthyVq2532SHZNkJt/ojJZWVnMj1nAmaxsHhjQn+l/mWbS2FBx57HIzb/XqtD+\n71glP0kpQ9QpFy5c4J57e5Mb/jClPSLZ9dFiklNSWPTmwtoOTdRVNji7nPSYRZ2ydu1aZqz7giuL\ny1bFITcH7dC2FOXny5A3O2SRHvM7VWj/nPSYhbiJXq+Hetc9EVC/AcpgkD/movpsMAtKF0PUKcOH\nD0d7aDcOK9+ChG9wfu4RosZPwNHRBm+ti7rhTluMVQhrc3d3Z9/uXQz77QAhq//BrIH3smLZB7Ud\nlqjLZFRG5aTGLIQwlUVqzP+qQvsnpcYshBA1zwarYJKYhRD2zczZ5WqCJGYhhH2zwSxogyEJIYQV\nSSlDCCFsjA1mQRkuJ4Swb2YMl0tLS6N///506tSJzp07s2TJkpva/Pzzz/Tq1YsGDRrwzjumPWZo\ng38rhBDCiswoZWi1WhYtWkRwcDD5+fl069aNiIgIAgICjG2aN2/O+++/z8aNG00+r/SYhRD2rUEV\ntht4eHgQHBwMgJubGwEBAWRmZpZr07JlS7p3745WqzU5JOkxCyHsWwU9Zt3hss0UqampHD58mLCw\nMLNDksQshLBvFWTB8B5l2zXzV966XX5+PqNHj2bx4sW4ubnVZEhCCGEHzMyCJSUljBo1iscff5yR\nI0faQkhCCFHHmZEFlVJMmTKFwMBAZs2aVWlbU5k1idETTzzBV199RatWrTh27BgA58+f59FHH+W3\n337Dx8eHzz77jCZNmgAQExPDypUrcXR0ZMmSJQwePPjmgGQSIyGEiSwyidGxKrTvUj7Bfvfdd/Tt\n25euXbsalzeLjo7mzJkzAEybNo3s7Gx69OjB5cuXcXBwoGHDhiQlJVVY8jArMe/Zswc3NzcmTJhg\nTMxz5syhRYsWzJkzh4ULF3LhwgUWLFhAUlIS48aNY//+/WRkZDBo0CB++eWXm1adkMQshDCVRRLz\niSq0D7DO7HJmDZfr06cPTZs2Lbdv06ZNTJw4EYCJEycax+7FxcURFRWFVqvFx8cHX19f9u3bZ87l\nhRDCfPWrsFmJxWvMOTk5uLu7A2WTmufk5ACQmZlJz549je28vLzIyMi45TnmzZtn/Dk8PJzw8HBL\nhymEqIN0Oh06nc6yJ7XBO201GpJGo6lwWfnbvXd9YhZCiGtu7KjNnz/f/JPaYGK2+JN/7u7uZGdn\nA5CVlUWrVq0A8PT0JC0tzdguPT0dT09PS19eCCGqxgaXlrJ4Yo6MjGTNmjUArFmzxjiuLzIyktjY\nWIqLi0lJSSE5OZnQ0FBLX14IIapEOZq+WYtZfwOioqLYtWsXubm5e
Ht789prrzF37lzGjBnDihUr\njMPlAAIDAxkzZgyBgYE4OTmxbNmyCsscQghhDXobLGXIYqxCiDrLEsPliq6Y3r6BqyzGKoQQNe5q\n/XpVaF1cY3FcTxKzEMKu6R1tb20pScxCCLumt8FF/yQxCyHsWqkkZiGEsC16G0yDtheREEJYkS2W\nMmTNPyGEXdPjaPJ2oyeeeAJ3d3e6dOlyy3Pn5uYydOhQgoOD6dy5M6tXrzYpJknMQgi7dpV6Jm83\nmjx5MvHx8bc999KlSwkJCeHIkSPodDqee+45SktLK41JErMQwq7pcTJ5u9Gtpj6+XuvWrbl8+TIA\nly9fpnnz5jg5VV5BlhqzEMKuVVRj3q8r4ICusNrnnjp1KgMGDOCuu+4iLy/POEVFZSQxCyHsWkWJ\n+Z7whtwT3tD4+qP556p07ujoaIKDg9HpdJw6dYqIiAiOHj1Kw4YNKzxOShlCCLtWiqPJW1X98MMP\nPPLIIwC0b9+etm3bcvLkyUqPk8Rsh5KTkxnQPxTPu5oyZHBvUlNTazskIWqNOTXmyvj7+7Njxw6g\nbHWnkydP0q5du0qPk8RsRUopohe+ibd/IG27BLFq1Wqrx3DlyhUGR9zHgwMOsHfTRcK7JzJ0SF+K\ni60zOYsQtsac4XJRUVHce++9nDx5Em9vb1auXMny5ctZvnw5AC+99BIHDhwgKCiIQYMG8eabb9Ks\nWbNKY5JpP61o0eIl/O3jVRTM+xcUFeDy0ng+XbqYBx980Gox7N27l2dmDOXAlsvGfR37NuSLjXvp\n1KmT1eIQwhIsMe3nTtXL5PYDNHttf5VsUTVr/vs5Bf/vLejUDbr1oWDq31j33y+sGkOjRo3I+b2U\nwj9uNOflw7nzJZXejBDiTlWTNebqklEZVtTQ1RXOZhlfa85m0cjV1aoxBAYG0i98CIPGbmNo+BU2\nbXflkUceoU2bNlaNQwhbYYtzZUgpw4r27NnD0IcepmDMDBwKr+D61ToOfLeHDh06WDUOg8HAunXr\nOHnyBJ07dyUqKkqW+RJ1kiVKGV+qgSa3f0DzjVXykyRmKzty5Aj/jt1APa0TT0yaRPv27Ws7JCHq\nLEsk5jg12OT2D2q2ydJSd6Lg4GCCg4NrOwwhxB9kPmZRbQaDgfeWvM+XO7/F070Vb7zyN6kLC2EB\nxdSv7RBuIqMyaoFSirffWshdrZvSvLkbM5+dRklJSYXHzJw9h1fWxvLtgPGs13rQrfd95ObmWili\nIe5c5oxjrinSY7aikydPsmPHDl55ZQ4u9QvYEQuNG8KEmZ/w+mtNee31BeXal5SU8Mknn5CRkcGH\nHyxFv/0MNG+FfvAoClN/ZtOmTTzxxBO19GmEuDPYYilDesxWoJRi4lN/IbhPX+bMeZZ77ynglb9C\nYAfwbA2vzy4gPv5/5Y4pLS3l/uH9Wf3P6VzOfIVmbldx+OQd4/sGNGzfvpXwfiGMeCCchIQEa38s\nIe4INflIdnVJj7kGGAwGtmzZQlZWFmFhYfzyyy98/sM+iv7+McH/nkBAh8ucPPVn+5OnoFmzFuXO\nsW3bNnIy93Po62IcHWHGJOjQ+00M7brikJmKU8IWfvE2sOClAtIyYMQDA9Ht2idP7wlRRba4tJQk\nZgszGAxEjhnLrp9/xeAfjHrpb9zfvx8l9Z1wen40x0pLadsUvtsHO3ZDo4bw8ylntsa/ZzyHXq/n\n4+XL8PIoS8oAbTxBo4G7V/2D7t17cKhxA1YvyqVLQNn7ySmFfPZZLPPnv15pjOfOnSMhIYGGDRvS\nu3dvHB1t7xdTCGuRxGwHvv76a3Yl/UL++n2grQc/H+XzqDAaNKpPQOd6hLQvJXaTA40bGnhwCOze\nBw2cnXFxcTGeY+GCf5CW+g2/pcGmr6FnN4h5HzxaQffOAfxn3RoCA7wpLPrzugWFDrg4aiuN79ix\nYwwZ3JfO/orMbD1tfO4hbtMOt
NrKjxXiTiSJ+Q5VUFDA4489woEDCRQWKYrc20FBPjRuBgd2oZq1\nonDOOxzNTufo4pfRXC2k3wBo1QK2r4eQ+/PocW9vjh85TNu2bdm8eQPvzS/LujNehDPp0NANrhTA\nT0e3svi9d5k162Wi/u95/vZsAWcyHNiw2ZXEfRMrjfX/Zkzgtecv8uQ4KC2F4eMPsGLFCv7yl7/U\n9NckhE26KsPlID4+Hn9/f/z8/Fi4cKG1L29xcXFxuLq68r+NW0hLP09u7mVKjx+H3h7wyRJYsRDe\n+y8MeQQm/hWi/o9GGg1XtsDSaHj8L3C3ewmFyoE333oLgEaNmpCaBn3C4NhOmDYeGjSAHZ/Bv94u\n4pNPPmLaX6az8M3V7Dw4kuz8CXz3/UHuvvvuSuNNTT3DwPvKfnZygn49C0hNPVXxQULcwWpylWyd\nTkfjxo0JCQkhJCSEf/zjHybFZNVHsvV6PR07dmTHjh14enrSo0cP1q9fT0BAwJ8B1aFHsjUaDWi1\nZV1PAOUIPAW0ApKA/wEaePRJNE2b4pL1M0XpmfQ4uJehQAmwWAsGB3C5quGSxoGjx49x4cIFHoyM\nYPKjRVy8BP/9ysCR7eB9F2zdCf/4oBM/7P2pWjE//NAQ2rfeyZt/K+X8Beg/xpW/z1/N6NGjLfGV\nCGFVlngkO0bNMrn9i5r3yl1vz549uLm5MWHCBI4dO3ZTe51Ox7vvvsumTZuqFJdVe8z79u3D19cX\nHx8ftFotY8eOJS4uzpohWMyaNWvAxRXaBsBfXgHfzlC/CWVJGSCQa1+vw5crCdj5Bkv6fs4DLfaS\n0gDOAAZAlcCwq/AUCqUMLFjwFpcuXeLpZ+ZwhWk08vgrBkMD+o7U4Oil4f7xGho431XpAym38+FH\na9lz0J9WXRtwd5iW+0c8xahRoyzwjQhRN5kz7Wdlq2QD1frDYdUac0ZGBt7e3sbXXl5eJCYm3tRu\n3rx5xp/Dw8MJDw+3QnRVM2nSpLIa8qd7wdkFJj8P4XcB2YDHH/9bitZJUdLEh19zcnByusLIofC1\nDuLd4cIlcHSAsyXgXwJOKLZu3cYXX2xHqbtQKpnnn3+Wy3kGLl1uCCjAg717U3n11deIjq58BMaN\n3N3d2ZvwIzk5Obi4uNCoUSPLfSlC1DCdTodOp7PoOSsan5yqO0Oq7ky1z63RaPjhhx8ICgrC09OT\nt99+m8DAwMqPs2Yp4/PPPyc+Pp5//vOfAHzyySckJiby/vvv/xlQHSllaDQaaOMLW5P/3DmwDWTn\ngJsflKZD0WVA4eYCjloNBjTUdzTw3Ubo6Avbd0PUdOjaEU4ehXqN4feLGgqKpgHngVJgM3APMJSy\nPnYs0IigIEeOHLn5j5oQ9sQSpYxX1Esmt39dE33T9VJTUxkxYsQtSxl5eXk4Ojri4uLC1q1bmTlz\nJr/88kul17FqKcPT05O0tDTj67S0NLy8vKwZgmWdzYR1SyAnA1a8iebyeerd1QQWLoDHpoNzW8CB\nfveCoVRRctWAX7uypAwQ0besx7wkGnKKIfZfZTf5nPgXbdmIG5txQg90ATRAEVAf+JnTp1N5//2l\ndeKPmBC2rCbnymjYsKFxKOywYcMoKSnh/PnzlR5n1VJG9+7dSU5OJjU1lbvuuosNGzawfv16a4Zg\nMUopNBoNmqUvw3tzcdA6Us9whcLsInh+HChXKGpC7x6OfLnWwPbdMHoq/PQzZGSVPYq9/whcyit7\ncMTREV5fBPlXFI9RSlvgKrDYAVxcV1NaqqW48CrNMHARB67k+TB37kIKCgp54YXZHD16lCWLF1BU\nVEDUuKk88MADtf0VCVEnXKVejZ07JyeHVq1aodFo2LdvH0opkxZjtWpidnJyYunSpQwZMgS9Xs+U\nKVPKjcioa5RSTJo0iTVr1qAvgrJl9BQUXgWKaHf3Ob5aV3aT7t7uZeOQNRo3OvYpol0bR35NNVB
a\naiBqup5HI+GrHaAM4PPH+X90gNZesHpxKfePK2UMZe8VYGApv1BQMJwPPviY++8fxqCBvZn7f1do\n2himT9vBlXdW8OjYsdb/UoSoY8yZAyMqKopdu3aRm5uLt7c38+fPN96YnzZtGv/973/58MMPcXJy\nwsXFhdjYWJPOKyuYWMjSpUt55pln0DpByR+j55o2gf1boN3d8MqbsPRfkHfFCQPhQBMgFQfNUd5/\no4TN22HPXigtgoFAKLC2IXz0T+gRBK27wOzSsoLGRWATWlLphY9PDg9GDqB5/Y945a9l31v8t/Da\n+9UfUidEXWGJGvMz6k2T27+vmSMrmNQlTz/9NE8//TRnz55l7pw5bNmyhd9zc/Hva0CjgXpaKC0A\nH0rJZzeXULRCQ7oqYebfoH49MJSUjW3eU78+O4qv0kBB7nn4YGVZsn8DcAYcgQJKcOR7vD17Ulpa\nQv2Gf/6yNKgPen1pLX0TQtQt8ki2HWjZsiUrVq0CyupLs555hu3f7OHi+VwMlHIG6E0xrYGdaHHE\ngfsMBnKKymrKaRpHGjVuzIXLv1N4BSbPAudSeBZwpeyRlfpABPAxehISEpj70ktMmvgpHq0KaNYE\nnn/dhb8+N7OWvgEh6hZbnI9ZShlWcvHiRTZt2sQ77yzmxI/HcaABpZTSmiIuoOdhwAVYo9EwUily\nge+AekBP4N4/znOWsgFzzwA7gQQHB46fPEl6ejrvvP0qV68WMXbsVCY/MUVWvr6F4uJiPvrwQ06d\n+pmg4B5MmjQJBweZlryuskQpY4paanL7FZqnZZXsO5FSivXr17Ns2TJ+PHgQbVERQ4EOwI/ANw4O\n/NVgQAHXKl/tgNGU1ZePAoeA8cA/gdImTcg+exYnp4r/8XPo0CF27NhB06ZNeeyxx8rNZmcvDAYD\nkSMGUVqYwOC+hfznKxe6hoxi+cdrazs0UU2WSMyT1Icmt1+tmS6J+U537tw5egQH43b+PM4GAyeA\nBhoNfyksxBH4CjhGWemiGWWljJ+BRpSVPZwaNOCn5ORKx4Jv3LiRaU+N47GHSkhOrUfmWW927zmI\nq6trjX4+W7N//34ei+rP8Z1X0Goh/wq06VGfpBOpeHh41HZ4ohoskZjHqRUmt/9UM8Uq+Un+DVeL\nmjdvzuGffmLmkiWMf+stDv74I6F9+vCpqys7nJxIcXbGLyCAfMoS8TmNhkaNGvF8dDQbvvySCwUF\nJj2g8/xz0/nP8kLenVfKplUFeLZMY926dTX++WxNQUEBzZs6cm3qaVcXcHV1pKCgoHYDE7XKnLky\naorc/KtljRs3ZsqUKcbXcVu2sGHDBtLS0pjfowcDBgzg9OnTfPXVVzg7OzNy5EhatGhRwRlvduHi\nZTq2L/tZowF/36smPX1kKRcvXuRf//oXFy6cY8iQYfTt29dq175et27dyMl15q1l+dw/yMCqDVrc\n3duYNF2quHNZcy0/U0kpww48Nm4kDiXxLJp3leTT8NCTzmzarCM0NLTGr33p0iV6hnWlW+cc/HyK\n+fhTZ958czmPPf54jV/7Vk6fPs0zT0/m11+TCQoK4f2lK3F3d6+VWIT5LFHKGKlMf/p4oyZKaszC\nMvLy8nhq6uPEf72dJo3dePPNpTwyZoxVrv3BBx+w6+vZfLa87LnIxEMw9v9akpL6u1WuL+5slkjM\nI9RnJrffrBkjD5gIy2jYsCHrY2tn3uu8vDy87/pz7ug2npB/pbBK57h8+TI6nQ6NRsOAAQPs7qal\nqFm2OI5ZErOoUcOGDSNi0OsM6lNKh3bw/OsNiBwxwuTjMzMz6d69F/n5LoCiadNSDhzYS8uWLWsu\naGFXbLHGLKMyRI0KCgpi7brPefHNdgwc2wKPNo+y9APThyc999xczp71IS9vLHl5UWRlufPSS3+v\nwYiFvSmmnsmbtUhiFjVu6NChHDl6itTfzvLhR6txdnYmISG
BgPbtaeTiQv/evcnIyLjlsadP/0Zp\n6Z+r3pSUeHH69G/WCl3YAVscLieJWVhdVlYWwyMi6Hr6NNMLCyExkeGDBt3ypkp4eG+cnY9QNr1T\nMc7OR+lHxfTyAAAXDElEQVTf/z6rxyzuXHqcTN6sRRKznSouLqa0tHZmoEtISMDLwYFAyuYH6afX\nczolhdzc3JtibNPGEy8vhaPjmzg5vUNkZA/mzp1TK3GLO5M5K5g88cQTuLu706VLl1ue+9///jdB\nQUF07dqV3r178+OPP5oUkyRmO3P16lUeHTUKNxcXXJ2dmfn00xgMBqvG0LRpU84bDOj/eJ0PlBoM\nuLm5Gdvo9XoGDRrGnDnvk5zcivr1m/Hcc/+P2Nh1lc4LIkRVmJOYJ0+eTHx8/G3P3a5dO3bv3s2P\nP/7IK6+8wlNPPWVSTJKY7czLc+dyfOtW5uj1zCotZdOqVSz/6COrxtC3b1+CevVivasrOxwdWefi\nwt9ffRVnZ2djG51Ox+HDv1JQ8CgQTkHBeN59910KC6s21E6IypiTmPv06UPTpk1ve+5evXrRuHFj\nAMLCwkhPTzcpJul62JldO3bQo7AQLaAFggsK2Pn110yfMcNqMTg4OBC3ZQvr16/nzJkz/D00lIiI\niHJtLl++jINDY/7sO7ii0ThRUFBQLoELYa6r1L/te1d0ByjQHbTIdVasWMHw4cNNaiuJ2c7c5e1N\nZlISPn+UL7K1Wnr5+Fg1hqtXr/LB0qX8+utxgoPDGDhw4E1t7r33XpTKoGwy1LtxctqPv3+ASQtZ\nClEVFa1g0iA8jAbhYcbXufOXV+sa3377LStXruT77783qb0kZjvzzpIl3BcWRk5xMcXA1aZNeemV\nV6x2fb1ez4ORETipAwzpV8jalRtITNzFipWflmvn7u7Ot99uY/z4J8nM3E337t3597//a5z8v7S0\nlOPHj6PRaOjUqROOjrb39JaoG2p6aakff/yRqVOnEh8fX2HZ43oyV4YdOnv2LNu3b8fJyYlhw4bR\nsGFDq117//79PD5uAMd35uPkVLZyuHf3+hxPSqF169YmnePy5cv06xdBcvIZQNGxY1t0um1W/RzC\nNlhiroy71QmT2/+mCbjpeqmpqYwYMYJjx47d1P7MmTMMGDCATz75hJ49e5p8Hekx26GWLVsybtw4\ni54zOTmZ9PR0OnXqRKtWrW7brqCggKaNHbg2sMLFuWxO5Krc1Js792VOnDBw9eo0AI4f38zLL/+d\nJUsWmfUZhH0yZ3xyVFQUu3btIjc3F29vb+bPn09JSdncMNOmTeO1117jwoULTJ8+HQCtVsu+ffsq\nPa/0mIXZ5r3yCovfeYdW9erxe2kpG774gsGDB9+ybX5+PkFd/ZgadZbhA/Ss2qDlhyMd+GHvUZPL\nEffeO4C9e1tTtiAXwM/06XOO3bu3WeYDiTrDEj3m1uq0ye2zNO1kBRNh+w4ePMjSd99lamEh4y5d\n4qErV3h09Gj0ev0t27u5ufHNzr3sPdaPqGe8yMkbxpdffVulGvE993Shfv2TgAIMNGhwkm7dulrm\nAwm7Y85wuZoiPWZhlg0bNrBg6lRG5uUZ971Vrx6pGRlVXmnFVHl5eYSHD+bkybKeTmCgHzt3xpd7\nQEXYB0v0mBtfzTK5/aX6rWU+ZmH7OnXqxG96PReAppQtFuvm5lajw9oaNmzIvn3f8fPPPwMQEBCA\ng4P8409Uj77U9tKg9JiF2ZYtXcqc2bNppNVS6ujI5vh4wsLCKj9QCDNZosfsfMn09S8LGzeTpaVE\n3XH+/HlycnLw8fGRJ/OE1VgiMdc7d8nk9sXNG9v2zb///Oc/xoH9hw4dKvdeTEwMfn5++Pv7s23b\nn3fKDx48SJcuXfDz82PmzJnVj1rYnGbNmhEQECBJWdQ5pSWOJm/WUu3E3KVLF/73v//dtBR9UlIS\nGzZsICkpifj4eGbMmGH
8CzN9+nRWrFhBcnIyycnJFc7KJIQQ1mDQO5m8WUu1E7O/vz8dOnS4aX9c\nXBxRUVFotVp8fHzw9fUlMTGRrKws8vLyCA0NBWDChAls3Lix+pELIYQllDqavlmJxf8EZGZmlnv0\n0MvLi4yMDLRaLV5eXsb9np6et11OaN68ecafw8PDCQ8Pt3SYQog6SKfTodPpLHvSItsblVFhRBER\nEWRnZ9+0Pzo6mhFVWOm4qq5PzEIIcc2NHbX58+ebf9LaWcinQhUm5u3bt1f5hJ6enqSlpRlfp6en\n4+XlhaenZ7lJotPT0/H09Kzy+YUQwqJsMDFbZFT+9cNHIiMjiY2Npbi4mJSUFJKTkwkNDcXDw4NG\njRqRmJiIUop169YxcuRIS1xeCCGqr7QKm5VUOzH/73//w9vbm4SEBO6//36GDRsGQGBgIGPGjCEw\nMJBhw4axbNky4xy6y5Yt48knn8TPzw9fX1+GDh1qmU8hhBDVVVKFzUrkARMhRJ1liQdM+L4Kx/e+\n+Xrx8fHMmjULvV7Pk08+yQsvvFDu/QsXLvDEE09w+vRpGjRowMqVK+nUqVOFl5EJBoQQ9s2MUoZe\nr+fpp58mPj6epKQk1q9fz4kT5Sfej46O5p577uHo0aOsXbvWpIfrJDELIexbURW2G+zbtw9fX198\nfHzQarWMHTuWuLi4cm1OnDhB//79AejYsSOpqamcPXu2wpBsbwCfEEJYU0U39X7UwTHdbd/OyMjA\n29vb+NrLy4vExMRybYKCgvjiiy+477772LdvH7/99hvp6em0bNnytueVxCyEsG8VJebA8LLtmk/L\nj5u+NrChInPnzmXmzJmEhITQpUsXQkJCKl0YQhKzEMK+mTEM7sbnNtLS0so94Qxl84evXLnS+Lpt\n27a0a9euwvNKjVkIYd/MGC7XvXt3kpOTSU1Npbi4mA0bNhAZGVmuzaVLlyguLgbgn//8J/369at0\ntR3pMQsh7Nutl6c0iZOTE0uXLmXIkCHo9XqmTJlCQEAAy5cvB8pWyk5KSmLSpEloNBo6d+7MihUr\nKj2vjGMWQtRZFhnHvKYKx0+0Tn6SHrMQwr7dYhhcbZPELISwbzY4iZEkZiGEfZPELIQQNkYSsxBC\n2BgrzhpnKknMQgj7ZsZwuZoiiVkIYd9kVIYQQtgYqTELIYSNkRqzEELYGKkxCyGEjZFShhBC2BhJ\nzEIIYWNssMYs8zELIezb1SpstxAfH4+/vz9+fn4sXLjwlm10Oh0hISF07tyZ8PDwSkOSaT+FEHWW\nRab9jKrC8evLX0+v19OxY0d27NiBp6cnPXr0YP369QQEBBjbXLx4kd69e/P111/j5eVFbm4uLVq0\nqPAy0mMWQtg3M1YwMWWV7E8//ZRRo0YZl5yqLCmD1JiFEPauouFyZ3WQq7vt26askp2cnExJSQn9\n+/cnLy+PmTNnMn78+ApDksQshLBvFY3KaBpetl3zc9VXyS4pKeHQoUN88803FBQU0KtXL3r27Imf\nn99tj5HELISotrNnz7Jo0WJycs4SGTmcBx98sLZDqroaXiXb29ubFi1a4OzsjLOzM3379uXo0aMV\nJmapMQshquXChQsEBXXn7be3s3LlGcaNm8bixUtqO6yqq+FVsh988EG+++479Ho9BQUFJCYmEhgY\nWGFI0mMWQlRLbGwsFy82o6RkOAAFBe149dXXmTnz2VqOrIpuMwzOFKasku3v78/QoUPp2rUrDg4O\nTJ06tdLEXO3hcrNnz+bLL7+kXr16tG/fnlWrVtG4cWMAYmJiWLlyJY6OjixZsoTBgwcDcPDgQSZN\nmkRRURHDhw9n8eLFNwckw+WEqBMWLVrE3LmfUVw89I89+Tg7f0RBQZ7VYrDIcLleVTh+r3XyU7VL\nGYMHD+b48eMcPXqUDh06EBMTA0BSUhIbNmwgKSmJ+Ph4ZsyYYfwg06dPZ8WKFSQnJ5Ocn
Ex8fLxl\nPoUQwuruv/9+tNqfgR+BTJydv2LMmEdrO6yqM6OUUVOqnZgjIiJwcCg7PCwsjPT0dADi4uKIiopC\nq9Xi4+ODr68viYmJZGVlkZeXR2hoKAATJkxg48aNFvgIQoja0KFDB7Zv30K3btn4+OziySeHsHz5\nB7UdVtXpq7BZiUVqzCtXriQqKgqAzMxMevbsaXzPy8uLjIwMtFptubuVnp6eZGRk3PJ88+bNM/4c\nHh5u0iOMQgjr69WrFwcOfG+16+l0OnQ6nWVPWtcmMYqIiCA7O/um/dHR0YwYMQKAN954g3r16jFu\n3DiLBXV9YhZCiGtu7KjNnz//9o1NVdcS8/bt2ys8ePXq1WzZsoVvvvnGuO/GcX3p6el4eXnh6elp\nLHdc2+/p6VnduIUQwjLupNnl4uPjeeutt4iLi6NBgwbG/ZGRkcTGxlJcXExKSgrJycmEhobi4eFB\no0aNSExMRCnFunXrGDlypEU+hBBCVJuZs8vVhGrXmJ955hmKi4uJiIgAympNy5YtIzAwkDFjxhAY\nGIiTkxPLli0zPra4bNkyJk2aRGFhIcOHD2fo0KEVXUIIIWqeDZYyZNpPIUSdZZFxzC2qcHyudfKT\nPPknhLBvshirEELYGBssZUhiFkLYN0nMQghhY2xwuJwkZiGEfbPBHrPMxyyEEDZGErMQQpghPj4e\nf39//Pz8WLhw4U3vx8XFERQUREhICN26dWPnzp2VnlPGMQsh6iyLjGOmKseXv55er6djx47s2LED\nT09PevTowfr16wkICDC2uXLlCq6urgAcO3aMhx56iF9//bXCq0iNWQhh5yq6+7cL2H3bd/ft24ev\nry8+Pj4AjB07lri4uHKJ+VpSBsjPz6dFixaVRiSJWQhh5yq6+9f7j+2a18u9m5GRgbe3t/G1l5cX\niYmJN51l48aNvPjii2RlZbFt27ZKI5IasxDCzlV/CZNr8wBVZuTIkZw4cYLNmzczfvz4SttLj1kI\nYecKq33kjdMcp6WllVsQ5EZ9+vShtLSUc+fO0bx589u2kx6zEMLOVb/H3L17d5KTk0lNTaW4uJgN\nGzYQGRlZrs2pU6eMNwwPHToEUGFSBukxCyHsXvWfMHFycmLp0qUMGTIEvV7PlClTCAgIYPny5QBM\nmzaNzz//nLVr16LVanFzcyM2NrbS88pwOSFEnWWZ4XK/VOGIDjLtpxBC1DzbeyZbErMQws7Z3ixG\nkpiFEHau+qMyaookZiGEnZNShhBC2BgpZQghhI2RHrMQQtgY6TELIYSNkR6zEELYGOkxCyGEjZHh\nckIIYWOkxyyEEDbG9mrMMu2nmXQ6XW2HUGUSc82ra/FC3YzZMqo/7WdNqXZifuWVVwgKCiI4OJiB\nAweWmyw6JiYGPz8//P39yy2jcvDgQbp06YKfnx8zZ840L3IbURd/mSXmmlfX4oW6GbNllFZhu1ll\nq2QDPPvss/j5+REUFMThw4crjajaiXnOnDkcPXqUI0eOMHLkSObPnw9AUlISGzZsICkpifj4eGbM\nmGGcJm/69OmsWLGC5ORkkpOTiY+Pr+7lhRDCQqrfY9br9Tz99NPEx8eTlJTE+vXrOXHiRLk2W7Zs\n4ddffyU5OZmPP/6Y6dOnVxpRtRNzw4YNjT9fv/JrXFwcUVFRaLVafHx88PX1JTExkaysLPLy8ggN\nDQVgwoQJbNy4sbqXF0IIC6l+j/n6VbK1Wq1xlezrbdq0iYkTJwIQFhbGxYsXycnJqTAis27+vfzy\ny6xbtw5nZ2f27dsHQGZmJj179jS28fLyIiMjA61WW24tLE9PTzIyMm55XlMXOLQV1/61UJdIzDWv\nrsULdTNm871gcks3N7dyr01ZJftWbdLT03F3d7/tdSpMzBEREWRnZ9+0Pzo6mhEjRvDGG2/wxhtv\nsGDBAmbNmsWqVasq/lQmkNVLhBDWYm6+MbUTeeN1K
juuwsS8fft2ky46btw4hg8fDty8amx6ejpe\nXl54enqSnp5ebr+np6dJ5xdCCFtkyirZt8qJleW+ateYk5OTjT/HxcUREhICQGRkJLGxsRQXF5OS\nkkJycjKhoaF4eHjQqFEjEhMTUUqxbt06Ro4cWd3LCyFErTNllezIyEjWrl0LQEJCAk2aNKmwjAFm\n1JhffPFFTp48iaOjI+3bt+fDDz8EIDAwkDFjxhAYGIiTkxPLli0zdtuXLVvGpEmTKCwsZPjw4Qwd\nOrS6lxdCiFpnyirZw4cPZ8uWLfj6+uLq6mpayVfVoueff175+/urrl27qoceekhdvHjR+F50dLTy\n9fVVHTt2VF9//bVx/4EDB1Tnzp2Vr6+vevbZZ2sjbKOtW7eqjh07Kl9fX7VgwYJajeV6Z86cUeHh\n4SowMFB16tRJLV68WCml1Llz59SgQYOUn5+fioiIUBcuXDAec7vv25pKS0tVcHCweuCBB+pEvBcu\nXFCjRo1S/v7+KiAgQCUkJNh8zNHR0SowMFB17txZRUVFqaKiIpuKefLkyapVq1aqc+fOxn3Vic+W\n8kR11Gpi3rZtm9Lr9UoppV544QX1wgsvKKWUOn78uAoKClLFxcUqJSVFtW/fXhkMBqWUUj169FCJ\niYlKKaWGDRumtm7dWiuxl5aWqvbt26uUlBRVXFysgoKCVFJSUq3EcqOsrCx1+PBhpZRSeXl5qkOH\nDiopKUnNnj1bLVy4UCml1IIFCyr8vq/9/2JN77zzjho3bpwaMWKEUkrZfLwTJkxQK1asUEopVVJS\noi5evGjTMaekpKi2bduqoqIipZRSY8aMUatXr7apmHfv3q0OHTpULjFXJT5byxPVVauJ+XpffPGF\neuyxx5RSZX8Fr++BDhkyRO3du1dlZmYqf39/4/7169eradOmWT1WpZT64Ycf1JAhQ4yvY2JiVExM\nTK3EUpkHH3xQbd++XXXs2FFlZ2crpcqSd8eOHZVSt/++rSktLU0NHDhQ7dy509hjtuV4L168qNq2\nbXvTfluO+dy5c6pDhw7q/PnzqqSkRD3wwANq27ZtNhdzSkpKucRc1fhsKU9Ul83MlbFy5UrjyI7M\nzMxydzavjYW+cX9FY6Fr2q3GJtZWLBVJTU3l8OHDhIWFkZOTY7zp4O7ubhzkfrvv25r++te/8tZb\nb+Hg8OevpC3Hm5KSQsuWLZk8eTL33HMPU6dO5cqVKzYdc7NmzXjuuedo06YNd911F02aNCEiIsKm\nY4aq/x7YUp6orhpPzBEREXTp0uWmbfPmzcY2b7zxBvXq1WPcuHE1HY7F1IWHYPLz8xk1ahSLFy8u\n96QmlMVf0Wew5uf78ssvadWqFSEhIbcdV2pL8QKUlpZy6NAhZsyYwaFDh3B1dWXBggU3xWRLMZ86\ndYr33nuP1NRUMjMzyc/P55NPPrkpJluK+VbXr+0YrKHGp/2sbCz06tWr2bJlC998841xX10YC23K\n+MXaVFJSwqhRoxg/frxxWKK7uzvZ2dl4eHiQlZVFq1atgOqNs7SkH374gU2bNrFlyxaKioq4fPky\n48ePt9l4oax35uXlRY8ePQAYPXo0MTExeHh42GzMBw4c4N5776V58+YAPPzww+zdu9emY4aq/d7a\nWp6ottqso2zdulUFBgaqs2fPltt/rah/9epVdfr0adWuXTtjUT80NFQlJCQog8FQq0X9kpIS1a5d\nO5WSkqKuXr1qUzf/DAaDGj9+vJo1a1a5/bNnzzbW5GJiYm66iXKr79vadDqdscZs6/H26dNHnTx5\nUiml1Kuvvqpmz55t0zEfOXJEderUSRUUFCiDwaAmTJigli5danMx31hjrk58tpInqqtWE7Ovr69q\n06aNCg4OVsHBwWr69OnG99544w3Vvn171bFjRxUfH2/cf20YTPv27dUzzzxTG2EbbdmyRXXo0EG1\nb99eRUdH12os1
9uzZ4/SaDQqKCjI+N1u3bpVnTt3Tg0cOPCWw45u931bm06nM47KsPV4jxw5orp3\n715uuKetx7xw4ULjcLkJEyao4uJim4p57NixqnXr1kqr1SovLy+1cuXKasVnS3miOjRKyeQUQghh\nS2xmVIYQQogykpiFEMLGSGIWQggbI4lZCCFsjCRmIYSwMZKYhRDCxvx/m5bWlieRXYkAAAAASUVO\nRK5CYII=\n", "text": [ "" ] } ], "prompt_number": 19 }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Building a pipeline" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.pipeline import Pipeline" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn import svm\n", "pipeline = Pipeline([\n", " ('tfidf', TfidfVectorizer(min_df=1, max_df=0.8, use_idf=True)),\n", " ('clf', svm.LinearSVC())\n", "])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "pipeline.fit(ng_train.data, ng_train.target)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 22, "text": [ "Pipeline(steps=[('tfidf', TfidfVectorizer(analyzer=u'word', binary=False, charset=None,\n", " charset_error=None, decode_error=u'strict',\n", " dtype=, encoding=u'utf-8', input=u'content',\n", " lowercase=True, max_df=0.8, max_features=None, min_df=1,\n", " ngram_range=(1, 1), nor...ling=1, loss='l2', multi_class='ovr', penalty='l2',\n", " random_state=None, tol=0.0001, verbose=0))])" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "pred = pipeline.predict(ng_test.data)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn import metrics\n", "print metrics.classification_report(ng_test.target, pred,\n", " target_names=ng_test.target_names)\n", "print metrics.confusion_matrix(ng_test.target, pred)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", 
"text": [ " precision recall f1-score support\n", "\n", " alt.atheism 0.69 0.62 0.65 319\n", " comp.graphics 0.88 0.90 0.89 389\n", " sci.space 0.79 0.89 0.83 394\n", "talk.religion.misc 0.68 0.61 0.64 251\n", "\n", " avg / total 0.77 0.78 0.77 1353\n", "\n", "[[197 14 47 61]\n", " [ 7 352 25 5]\n", " [ 18 21 349 6]\n", " [ 63 12 23 153]]\n" ] } ], "prompt_number": 24 } ], "metadata": {} } ] }