{ "metadata": { "name": "" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Wiki-Class Set-up Guide and Exploration\n", "\n", "[Wiki-Class](http://pythonhosted.org/wikiclass/) is a Python package that can determine the _quality_ of a Wikipedia page, using machine learning. It is the open-sourcing of the [Random Forest](https://www.wikidata.org/wiki/Q245748#sitelinks-wikipedia) algorithm used by [SuggestBot](https://en.wikipedia.org/wiki/User:SuggestBot). SuggestBot is an opt-in recommender for Wikipedia editors, offering pages that need work and that look like pages they've worked on before. Similarly, with this package, you get a function that accepts a string of [wikitext](https://en.wikipedia.org/wiki/Wikipedia:Wikitext), and returns a [Wikipedia Class ('Stub', 'C-Class', 'Featured Article', etc.)](https://en.wikipedia.org/wiki/Wikipedia:Quality_scale#Grades). Wiki-Class is currently in `alpha` according to its packager and developer [@halfak](https://twitter.com/halfak), and although I had to make a few patches to get some examples to work, it's ready to start classifying your wikitext.\n", "\n", "#Overview\n", "\n", "1. Setting it up on Ubuntu.\n", "2. Testing the batteries-included model.\n", "3. Using the output by introducing a closeness measure.\n", "4. Making and testing our own model.\n", "\n", "##Setup\n", "\n", "At first you may be frustrated to learn that Wiki-Class is Python 3 only. You won't be able to mix it with pywikibot, which is Python 2.7 only, and that can also mean upgrading some of your other tools. However, just try to recall these update gripes next time you encounter a UnicodeError in Python 2.x, and then be thankful to Halfak for making us give Python 3 a try. I outline getting the environment running in Ubuntu 14.04 here.\n", "\n", "Firstly, if you want to use the IPython notebook with Python 3 you can do so with `apt-get`. 
And while we're at it, for convenience, we'll also install another version of pip for Python 3.\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!sudo apt-get install ipython3-notebook python3-pip" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[sudo] password for notconfusing: " ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\r\n", "\r\n" ] } ], "prompt_number": 95 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some requirements of Wiki-Class, including sklearn and nltk, are a pain with Python 3 since they haven't been properly packaged for it yet, so you'll have to install these from source:" ] }, { "cell_type": "code", "collapsed": true, "input": [ "!pip3 install git+https://github.com/scikit-learn/scikit-learn.git\n", "!pip3 install git+https://github.com/nltk/nltk/#" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Making some random pages for a test dataset\n", "We'll need to get some wikitext, with associated classifications, to start testing. I elected to make a random dataset with pywikibot, which as already stated is Python 2.7 only and thus needs to be in a separate notebook; you can still [view it on nbviewer](http://nbviewer.ipython.org/github/notconfusing/Wiki-Class/blob/master/Data%20Download%20Random%20Pages%20with%20Class%20python2.ipynb). Its output is a file `test_class_data.json` [(github link of the bzip)](https://github.com/notconfusing/Wiki-Class/blob/master/test_class_data.json.bz2), which is just a dictionary associating qualities and page-texts.\n", "\n", "Warning: this dataset has some examples that can cause a `ZeroDivisionError` because some of these pages have no non-markup text. I wrote [this patch](https://github.com/halfak/Wiki-Class/pull/5), which fixes this issue." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#Testing the Pre-built Model" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import json\n", "import pandas as pd\n", "from wikiclass.models import RFTextModel" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "/usr/local/lib/python3.4/dist-packages/pandas/io/excel.py:626: UserWarning: Installed openpyxl is not supported at this time. Use >=1.6.1 and <2.0.0.\n", " .format(openpyxl_compat.start_ver, openpyxl_compat.stop_ver))\n" ] } ], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each model is stored in a `.model` file. A default one is included in the github repo. " ] }, { "cell_type": "code", "collapsed": false, "input": [ "!wget https://github.com/halfak/Wiki-Class/blob/master/models/enwiki.rf_text.model?raw=true" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "!mv enwiki.rf_text.model\\?raw\\=true enwiki.rf_text.model" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we load the model." ] }, { "cell_type": "code", "collapsed": false, "input": [ "model = RFTextModel.from_file(open(\"enwiki.rf_text.model\",'rb'))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "classed_items = json.load(open('test_class_data.json','r'))\n", "print(sum([len(l) for l in classed_items.values()]))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "38959\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The Wiki-Class-provided model only deals with _'Stub', 'Start', 'B', 'C', 'Good Article'_, and _'Featured Article'_ classifications. 
It does not include _'List', 'Featured List',_ or _'Disambig'_ class pages. So we have to filter the standard classes out of our nearly 39,000 test articles." ] }, { "cell_type": "code", "collapsed": false, "input": [ "standards = {actual: text for actual, text in classed_items.items() if actual in ['Stub', 'Start', 'C', 'B', 'GA', 'FA'] }" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "print(sum([len(l) for l in standards.values()]))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "36873\n" ] } ], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we iterate over our nearly 37,000 standard-class pages, and put their Wiki-Class assessments into a DataFrame." ] }, { "cell_type": "code", "collapsed": false, "input": [ "accuracy_df = pd.DataFrame(index=classed_items.keys(), columns=['actual','correct', 'model_prob', 'actual_prob'])\n", "for actual, text_list in standards.items():\n", "    #see if actual is even here, otherwise no fair comparison\n", "    for text in text_list:\n", "        try:\n", "            assessment, probabilities = model.classify(text)\n", "        except ZeroDivisionError:\n", "            continue\n", "        #print(actual, text)\n", "        accuracy_df = accuracy_df.append({'actual': actual,\n", "                                          'correct':int(assessment == actual),\n", "                                          'model_prob': probabilities[assessment],\n", "                                          'actual_prob': probabilities[actual]}, ignore_index=True)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "What you see here is that the output of an assessment is really two things: first, the _'assessment'_, which is simply the _'class'_ the algorithm considers most likely; and second, a _dictionary of probabilities_ of how likely the text is to belong to each class. \n", "\n", "In our DataFrame we record four values. 
The _'actual'_ class as Wikipedia classes it; whether the actual class matches the model prediction; the probability (read: \"confidence\") of the model prediction; and lastly the probability of the actual class. Note that in the \"correct\" case `model_prob` and `actual_prob` are the same." ] }, { "cell_type": "code", "collapsed": false, "input": [ "df = accuracy_df.dropna(how='all')\n", "df.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
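<table border=\"1\" class=\"dataframe\">\n", "<thead><tr><th></th><th>actual</th><th>correct</th><th>model_prob</th><th>actual_prob</th></tr></thead>\n", "<tbody>\n",
"<tr><th>18</th><td>Start</td><td>0</td><td>0.4</td><td>0.0</td></tr>\n",
"<tr><th>19</th><td>Start</td><td>1</td><td>0.8</td><td>0.8</td></tr>\n",
"<tr><th>20</th><td>Start</td><td>0</td><td>0.4</td><td>0.0</td></tr>\n",
"<tr><th>21</th><td>Start</td><td>0</td><td>1.0</td><td>0.0</td></tr>\n",
"<tr><th>22</th><td>Start</td><td>1</td><td>0.7</td><td>0.7</td></tr>\n",
"</tbody>\n", "</table>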
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 7, "text": [ " actual correct model_prob actual_prob\n", "18 Start 0 0.4 0.0\n", "19 Start 1 0.8 0.8\n", "20 Start 0 0.4 0.0\n", "21 Start 0 1.0 0.0\n", "22 Start 1 0.7 0.7" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we look at the `correct` mean averages we should hopefully see something above 1/6th, which would be the performance of just guessing. Which we do." ] }, { "cell_type": "code", "collapsed": false, "input": [ "groups = df.groupby(by='actual')\n", "groups['correct'].mean()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "actual\n", "B 0.247391\n", "C 0.278138\n", "FA 0.854167\n", "GA 0.444444\n", "Start 0.387334\n", "Stub 0.698394\n", "Name: correct, dtype: float64" ] } ], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#See how \"close\" predications are if they are not correct." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we hack on the output. The Random Forest is really just binning text into difference classes, it doesn't know that some of the classes are closer to each other than others. Therefore we define a distance metric on the Standard Wiki classes. I call this order the _\"Classic Order\"_ To get an intuition, consider this example. If an article is a _Good Aritcle_ and the model prediction is also _Good Article_ then it is off by 0; if the model prediction is _Featured Article_ it is off off by 1; if the model prediction is _Start_ then it was off by 3." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "classic_order = ['Stub', 'Start', 'C', 'B', 'GA', 'FA']\n", "enum_classic = enumerate(classic_order)\n", "\n", "for enum, classic in dict(enum_classic).items():\n", "    print(enum, classic)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0 Stub\n", "1 Start\n", "2 C\n", "3 B\n", "4 GA\n", "5 FA\n" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to iterate over the same dataset as above, but instead of recording \"correctness\", we record the closeness in a DataFrame." ] }, { "cell_type": "code", "collapsed": false, "input": [ "classic_order = ['Stub', 'Start', 'C', 'B', 'GA', 'FA']\n", "classic_dict = dict(zip(classic_order, range(len(classic_order))))\n", "\n", "off_by_df = pd.DataFrame(index=classed_items.keys(), columns=['actual','off_by'])\n", "\n", "for classic in classic_order:\n", "    for text in standards[classic]:\n", "        try:\n", "            assessment, probabilities = model.classify(text)\n", "        except ZeroDivisionError:\n", "            continue\n", "        #print(classic, text)\n", "        off_by_df = off_by_df.append({'actual': classic,\n", "                                      'off_by':abs(classic_dict[assessment] - classic_dict[classic])}, ignore_index=True)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a table, it looks something like this:" ] }, { "cell_type": "code", "collapsed": false, "input": [ "off_by = off_by_df.dropna(how='all')\n", "off_by.head()" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
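<table border=\"1\" class=\"dataframe\">\n", "<thead><tr><th></th><th>actual</th><th>off_by</th></tr></thead>\n", "<tbody>\n",
"<tr><th>18</th><td>Stub</td><td>2</td></tr>\n",
"<tr><th>19</th><td>Stub</td><td>1</td></tr>\n",
"<tr><th>20</th><td>Stub</td><td>0</td></tr>\n",
"<tr><th>21</th><td>Stub</td><td>0</td></tr>\n",
"<tr><th>22</th><td>Stub</td><td>0</td></tr>\n",
"</tbody>\n", "</table>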
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ " actual off_by\n", "18 Stub 2\n", "19 Stub 1\n", "20 Stub 0\n", "21 Stub 0\n", "22 Stub 0" ] } ], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And as a chart." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%pylab inline" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "WARNING: pylab import has clobbered these variables: ['text']\n", "`%pylab --no-import-all` prevents importing * from pylab and numpy\n" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that the middle classes are less easy to predict where as the ends are easier. This would corroborate our expectations. Since the the quality sprectrum bleed past these rather arbitrary cut-off points,ore of the quality specturm would lie in these intervals, and so its easier to bin them." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "ax = off_by.groupby(by='actual',sort=False).mean().plot(title='Prediction Closeness by Quality Class', kind='bar', legend=False)\n", " \n", "ax.set_ylabel('''Prediction Closeness (lower is more accurate)''')\n", "ax.set_xlabel('''Quality Class''')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEnCAYAAACuWyjDAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XdYFOf2B/DvAiIKi4goUUAXxUgTRKwoimAs2HuJCuo1\nRqPGa0zUmKLRaIwpxnij6I1yrwV7rhqVJJZJLCAGUQgWkIgUUUFRkCLt/f3hj4mEsmyZGXY4n+fh\n0dmdnfec3eUwe/adGQVjjIEQQki9YSR1AIQQQsRFhZ8QQuoZKvyEEFLPUOEnhJB6hgo/IYTUM1T4\nCSGknqHCL3PBwcH48MMPAQDnzp2Ds7OzVtuZM2cOVq9erc/QaiU5ORlGRkYoKysTfWwxGFp+RkZG\n+PPPPwGI+54wtOeprqPCXweoVCo0btwYSqUSr7zyCqZPn468vDy9bFuhUEChUAAAfH19cfPmTbWP\nCQ0Nha+vb4XbNm/ejA8++EAvMf1dQkICxo0bh+bNm8PKygqenp74+uuv6ZdcC9evX8fw4cNhZWUF\nS0tL+Pv7IyIiQpCxXn5PcBwHBwcHnbZH7wPxUOGvAxQKBX788Ufk5ubiypUr+P3336vckyopKdFq\n+3X5GL2kpCR0794dbdq0wR9//IEnT57gwIEDiI6OxrNnz6QOz6AkJSWhV69e8PT0RHJyMjIyMjBq\n1CgMGDAAkZGRUodXI3ofiIwRyalUKnb69Gl+efHixWzYsGGMMcYUCgX717/+xZycnFjbtm0ZY4wd\nO3aMeXp6MisrK+bj48NiY2P5x165coV5eXkxpVLJJkyYwCZOnMg++OADxhhjZ8+eZfb29vy6KSkp\nbNSoUax58+asWbNmbN68eezGjRusYcOGzNjYmFlYWLCmTZsyxhgLCgrit8MYY1u3bmVOTk7M2tqa\nDR8+nN27d4+/T6FQsC1btrD27dszKysr9tZbb1Wb++uvv86GDh1a7f137txhCoWClZaWMsYYS09P\nZ8OGDWPW1tbMycmJbdu2jV/30qVLzNvbm1laWjJbW1u2aNEi/r6IiAjWs2dPZmVlxTw9PRnHcfx9\nffv2ZR9++CHr1asXUyqVbMCAASwrK6tWj92xYwdr27YtUyqVzNHRke3evZsxxlhiYiLr06cPa9Kk\nCbOxsWETJkyoMb+tW7eyVq1asZYtW7IvvviCMcZYRkYGa9y4MXv06BG/fnR0NGvevDkrKSmptK0p\nU6awIUOGVLp9zpw5rE+fPoyxyu8Bxhhr06YN//67dOkS69GjB7OysmItW7Zk8+bNY0VFRfy6CoWC\nJSUlMcb+ek/k5eUxMzMzZmRkxCwsLJhSqWT37t1jjRo1qnXsmr4Ptm/fzlxcXJhSqWRt27ZlISEh\n/LqZmZlsyJAhzMrKillbWzNfX1/+vs8++4zZ2dkxpVLJOnToUOH3rj6hwl8HqFQqdurUKcbYi2Ls\n5ubGPvroI8bYi1+0A
QMGsOzsbFZYWMiuXLnCWrRowaKiolhZWRn7z3/+w1QqFSsqKmLPnz9nrVu3\nZhs2bGAlJSXs4MGDrEGDBuzDDz9kjFX8pS8pKWEeHh5s0aJFLD8/nxUWFrILFy4wxhgLDQ1lvXv3\nrhBjcHAwv53Tp08zGxsbFhMTw54/f87mz5/PF5bymIcNG8aePn3KUlJSWPPmzVl4eHiVub/yyiss\nNDS02ufm77/wvr6+7K233mLPnz9nV69eZc2bN2dnzpxhjDHWo0cPtmvXLsYYY3l5eSwyMpIxxlha\nWhpr1qwZO3nyJGOMsV9++YU1a9aML+59+/ZlTk5OLDExkRUUFDA/Pz+2dOlStY999uwZs7S0ZAkJ\nCYwxxu7fv8/i4+MZY4xNnDiRrVmzhjHG2PPnz/nntrr8Jk+ezPLz81lcXBxr3rw5/34IDAxkmzdv\n5tdfuHAhW7BggUbP5ZkzZ5ixsTErLCyssvC/vOMRHR3NLl26xEpLS1lycjJzcXFhGzZs4Nd9ufC/\n/J7gOK7SdvURe7m/vw+OHz/O/vzzT8YYY7/++itr3Lgxi4mJYYwxtnTpUvbmm2+ykpISVlJSws6f\nP88YY+zmzZvMwcGBZWRkMMYYu3v3Lp9LfUOtnjqAMYaRI0eiadOm8PX1hZ+fH95//33+/mXLlsHK\nygoNGzbE1q1bMXv2bHTt2hUKhQLTpk1Dw4YNERERgcjISJSUlODtt9+GsbExxowZg65du1Y5ZlRU\nFDIyMrB+/Xo0atQIDRs2hI+PDx9PTXbv3o2ZM2eiU6dOMDU1xdq1axEREYGUlBR+naVLl8LS0hIO\nDg7o168frl69WuW2Hj16hJYtW9bqeUpNTcXFixexbt06mJqawtPTE//4xz/w3//+FwBgamqKxMRE\nZGVloXHjxujevTsAYNeuXQgMDMSgQYMAAP3790eXLl1w/PhxAC9abdOnT4eTkxPMzMwwfvx4Pt6a\nHqtQKGBkZIS4uDgUFBTA1tYWrq6ufCzJyclIT0+Hqakp/9xW5+OPP0ajRo3g7u6O6dOnIywsDAAw\nbdo07Nq1CwBQWlqKvXv3YurUqVVuIysrq8rnsmXLligrK0N2drba57hz587o1q0bjIyM0KZNG7zx\nxhv49ddfq12//L1S1XtGk9g1eR8AQGBgIBwdHQEAffr0wYABA/Dbb78BePHcZ2RkIDk5GcbGxujV\nqxcAwNjYGM+fP0d8fDyKi4vRunVrtG3bttZjygkV/jpAoVDgyJEjyM7ORnJyMjZt2oSGDRvy97/8\npdndu3fx5ZdfomnTpvxPWloaMjIycO/ePdjZ2VXYdps2baocMzU1FW3atIGRkeZvgYyMjArbNTc3\nR7NmzZCens7f9sorr/D/b9y4cbV92mbNmuHevXu1GvfevXuwtraGubk5f1vr1q35cb///nskJCTA\nxcUF3bp14wv73bt3ceDAgQrP2YULF3D//v0q423UqBEfb02Pbdy4Mfbt24ctW7agVatWGDp0KG7d\nugUA+Pzzz8EYQ7du3eDu7o4dO3bUmNvLr3Hr1q3552TEiBG4fv06kpOT8csvv6BJkybo0qVLlduw\nsbGp8rnMyMiAkZERmjdvXmMMwIsvWIcOHYqWLVuiSZMmWL58OR49eqT2cVXRJHZN3gcAcPLkSfTo\n0QPNmjVD06ZNceLECT7Od999F05OThgwYADatWuHdevWAQCcnJywYcMGrFixAra2tpg0aRIyMjK0\nys3QUeE3AOWzcoAXRWH58uXIzs7mf549e4YJEyagZcuWFYov8KJwVcXBwQEpKSkoLS2tcbyqtGrV\nCsnJyfxyXl4eHj16VOmPTm30798fhw4dqtW6rVq1wuPHjyv8EUlJSYG9vT2AF7/Ye/bsQWZmJpYs\nWYKxY8ciPz8frVu3xtSpUys8Z7m5uXjvvffUjqnusQMGDMDPP/+M+/fvw9nZGbNmzQI
A2NraYuvW\nrUhPT0dISAjmzp3LT4OsysufllJSUvjn0szMDOPGjcOuXbuwa9cuTJs2rdpt9O/fHwcOHKh0+/79\n+9GvXz8YGxvD3Nwc+fn5/H2lpaXIzMzkl+fMmQNXV1fcvn0bT58+xaefflrjrJry90pV7xlNY6/t\n++D58+cYM2YM3nvvPTx8+BDZ2dkIDAzkP3VYWFjgiy++QFJSEo4ePYqvvvoKZ86cAQBMmjQJ586d\nw927d6FQKLBkyZJajSk3VPgNzKxZs7BlyxZERUWBMYa8vDwcP34cz549g4+PD0xMTLBx40YUFxfj\n8OHDuHz5cpXb6datG1q2bImlS5ciPz8fhYWFuHjxIoAXRSstLQ3FxcX8+uzF90EAXvzy7NixA9eu\nXcPz58/x/vvvo0ePHmjdunWVY9XUOlq5ciUuXryI9957Dw8ePAAA3L59G1OnTkVOTk6FdR0cHODj\n44Nly5bh+fPniI2Nxfbt2zFlyhQAL9oy5UWsSZMmUCgUMDY2xpQpU3Ds2DH8/PPPKC0tRWFhITiO\nq/BHsroYa3rsw4cPceTIEeTl5aFBgwYwNzeHsbExAODAgQNIS0sDAFhZWfFtoeqsXr0aBQUFiI+P\nR2hoKCZMmMDfN23aNOzYsQNHjx6ttlUCvGgXXbx4ER988AH/B+rbb7/Frl27sHbtWgDAq6++isLC\nQpw4cQLFxcVYvXo1nj9/zm/j2bNnUCqVaNy4MW7evInNmzdXO97L7wlbW1s8evSo0mtW29g1eR8U\nFRWhqKgINjY2MDIywsmTJ/Hzzz/z9//444+4ffs2GGOwtLSEsbExjI2NkZCQgDNnzuD58+do2LAh\nzMzM+NervqHCX8f9fU/K29sb27Ztw7x582BtbY327dvzPe4GDRrg8OHDCA0NRbNmzbB//36MGTOm\nyu0ZGxvj2LFjuH37Nlq3bg0HBwfs378fABAQEAA3Nze88soraNGiBf+48scGBARg1apVGDNmDFq1\naoU7d+5g79691cb88mP/rm3btoiIiEBycjLc3NxgZWWFsWPHomvXrrCwsKi0vbCwMCQnJ6NVq1YY\nPXo0PvnkE/j7+wMAfvrpJ7i7u0OpVOKf//wn9u7di4YNG8Le3h5HjhzBmjVr0KJFC7Ru3Rpffvll\nhWL/8hgvx1vTY8vKyvD111/Dzs4OzZo1w7lz5/hC+fvvv6NHjx5QKpUYMWIENm7cCJVKVe1r3Ldv\nXzg5OaF///5499130b9/f/7+Xr16wcjICN7e3jXOlXdycsL58+dx7do1qFQqNG3aFB9//DHOnj3L\nt1iaNGmC7777Dv/4xz9gb28PCwuLCtv84osvsGfPHlhaWuKNN97AxIkTKz03VT1Pzs7OmDRpEtq2\nbQtra2u+jVbb2DV5HyiVSmzcuBHjx4+HtbU1wsLCMGLECH5bt2/fxmuvvQalUgkfHx+89dZb6Nu3\nL54/f45ly5ahefPmaNmyJbKysvg/iPWOkN8cT58+nbVo0YK5u7vXuF5UVBQzNjZmhw4dEjIcQgxW\nQEAA+/777zV6TFpaGrOzs2MbN24UKKra0SZ2IixB9/inT5+O8PDwGtcpLS3FkiVLMGjQoDp9oBEh\nUrl8+TKuXLlSof1TG3Z2dggPD8fTp0/1diS4prSNnQjLRMiN+/r6VvgSsCrffvstxo4dW20vmpD6\nLCgoCEeOHMHGjRsrzGaqLXd3d7i7uwsQmXq6xk6EU6vCn5eXh9TUVCgUCtjb2+vtRUxPT8eRI0dw\n5swZXL58We1sEkLqm//85z9Sh6A1Q45d7qot/Lm5udi2bRv27t2LrKws2NragjGGBw8eoFmzZnj9\n9dcxa9Ys/osXbSxcuBCfffYZFApFhRkChBBCBFRd89/f359t3bqV3b9/v9J9GRkZLCQkhPn7+6v9\nEuHOnTvVfrnr6OjIVCoVU6lUzMLCgrVo0YIdOXK
k0nrt2rVjAOiHfuiHfuhHgx9PT88qa6/g5+qp\nqfC/LDg4uNpZPYA8Tyn08ccfSx2C3skxJ8bkmZccc2JMnnlpm1N1tVNtj7+srAy7d+/GnTt38NFH\nHyElJQX3799Ht27d1D0UkyZNwq+//oqsrCw4ODhg5cqV/EFBs2fPVvt4uVP3xbchkmNOgDzzkmNO\ngDzz0ndOagv/3LlzYWRkhDNnzuCjjz6ChYUF5s6di99//13txstPNFUb6s5lQgghRD/UFv5Lly4h\nJiYGXl5eAABra+sKh/IT7QUHB0sdgt7JMSdAnnnJMSdAnnnpOyfF//eBqtW9e3dcvHgRXbp0QUxM\nDDIzMzFgwADExMToNZCalM/6IYQQUnvV1U61R+7Onz8fo0aNwsOHD/H++++jV69eWLZsmSBB1jcc\nx0kdgt7JMSdAnnnJMSdAnnnpOye1rZ4pU6bA29sbp0+fBgAcOXIELi4ueg2CEEKIeNS2eqZOnYqd\nO3eqvU1I1OohhBDNad3q+eOPPyosl5SUIDo6Wn+REUIIEVW1hX/NmjVQKpWIi4uDUqnkf1q0aIHh\nw4eLGaNsUS/ScMgxLznmBMgzL33nVG3hf//995Gbm4vFixcjNzeX/3n8+DE+++wzvQZBCCFEPGp7\n/ACQnZ2NxMREFBYW8rf16dNH0MBeRj1+QgjRXHW1U+2snm3btmHjxo1ITU2Fl5cXIiMj0bNnT/7i\nxYQQQgyL2i93v/nmG0RFRUGlUuHs2bOIiYlBkyZNxIhN9qgXaTjkmJcccwLkmZdoPf5yZmZmaNSo\nEQCgsLAQzs7OuHXrll6DIIQQIh61Pf6RI0dix44d+Oabb3D69Gk0bdoUJSUlOHHihFgxUo+fEEK0\nUF3trNWXu+U4jkNOTg4GDRoEU1NTvQZYEyr8hBCiOa0O4CopKYGzszO/7Ofnh+HDh4ta9OWMepGG\nQ455yTEnQJ55idrjNzExQYcOHXD37l29DkoIIUQ6als9vr6+iImJQbdu3WBubv7iQQoFjh49KkqA\n5eNRq4cQQjSj9Tz+VatWCRIQIYQQaWj05a5U5LrHz3Ec/Pz8pA5Dr+SYE1D387K0tEZubrYoYymV\nTZGT81iUsbRR118rbWibk9Z7/BYWFlAoFACAoqIiFBcXw8LCAjk5ORoHQQgRxouir+nOEQfAT4ux\nFBo/htQtGu3xl5WV4ejRo4iMjBT1RG1y3eMnRF9e7JyJ9TtCv4+GQi/z+Mt16tQJV69e1UtgtUGF\nn5CaUeEnVdH6QiyHDh3ifw4cOIClS5fyp3AguqH5xoZDnnlxUgcgCDm+VqJfc/fYsWN8j9/ExAQq\nlQpHjhyp1cZnzJiB48ePo0WLFoiLi6t0/+7du/H555+DMQalUonNmzfDw8NDwxQIIYRoQtBZPefO\nnYOFhQWmTZtWZeGPiIiAq6srmjRpgvDwcKxYsQKRkZGVg6RWDyE1olYPqYrWrZ6goCA8efKEX87O\nzsaMGTNqNaivry+aNm1a7f09e/bkT/HcvXt3pKWl1Wq7hBBCtKe28F+7dg1WVlb8ctOmTXHlyhW9\nB/L9998jMDBQ79uty6gXaTjkmRcndQCCkONrJXqPnzGGx48fw9raGgDw+PFjlJaW6jWIs2fPYvv2\n7bhw4YJet0sIIaQytYX/nXfeQc+ePTF+/HgwxnDgwAEsX75cbwHExsZi1qxZCA8Pr7EtFBwcDJVK\nBQCwsrJCp06d+CPZyv8a0rL0y35+fnUqHn0ul6sr8VQX31978n4CLVc8krSu5P/yka11OT4h338c\nxyE0NBQA+HpZlVp9uRsfH48zZ85AoVDA398frq6u6h7CS05OxrBhw6r8cjclJQX+/v7YtWsXevTo\nUX2Q9OUuITWiL3dJVbT+cjcyMhIODg6YP38+5s2bB3t7e1y6dKlWg06aNAk+Pj64desWHBwcsH37\ndoSEhCAkJAQ
A8MknnyA7Oxtz5syBl5cXunXrpmFahq3ynprhk2NOgFzz4qQOQBByfK30nZPaVs+b\nb76JmJgYftnc3LzSbdUJCwur8f5///vf+Pe//12LMAkhhOiL2lZPVadn8PDwQGxsrKCBvYxaPYTU\njFo9pCpat3ocHR2xceNGFBcXo6ioCN988w3atm0rSJCEEEKEp7bwb9myBRcuXICdnR3s7e0RGRmJ\nrVu3ihGb7FEv0nDIMy9O6gAEIcfXSvQev62tLfbt26fXQQkhhEhHbY+/oKAA33//Pa5fv47CwkL+\n9u3btwseXDnq8RNSM+rxk6po3eOfOnUqHjx4gPDwcPTt2xepqamwsLAQJEhCCCHCU1v4b9++jVWr\nVsHCwgJBQUE4ceJErefxk5pRL9JwyDMvTuoABCHH10rfOakt/KampgCAJk2aIC4uDk+ePEFmZqZe\ngyCEECIetT3+bdu2YcyYMYiLi0NwcDCePXuGVatW4c033xQrRurxE6IG9fhJVfR6zV2xUeEnpGZU\n+ElVtP5ylwiHepGGQ555cVIHIAg5vlai9/gJIYTIC7V6CJEBavWQqmjd6snLy8OqVaswa9YsAEBi\nYiJ+/PFH/UdICCFEFGoL//Tp02FqaoqLFy8CAFq1aqXXK3DVZ9SLNBzyzIuTOgBByPG1Er3Hn5SU\nhCVLlvDz+c3NzfUaACGEEHGpLfwNGzZEQUEBv5yUlISGDRsKGlR98fI1QuVCjjkBcs3LT+oABCHH\n10rfOak9O+eKFSswaNAgpKWlYfLkybhw4QJ/MV9CCCGGp8Y9/rKyMmRnZ+PQoUPYsWMHJk+ejN9/\n/x39+vUTKz5Zo16k+CwtraFQKET5sbS0ljpdNTipAxBEXX8PakPU8/EbGRnh888/x4QJEzB06FC9\nDkyIFHJzs6HdtEcOmrZGcnMVWoxDiPDUzuNfunQpbGxsMGHChApf7Fpbi7c3Q/P4ib7Idb67XPMi\nutH6XD0qler/31QVN/bnn3/qN8IaUOEn+iLXAinXvIhutD6AKzk5GXfu3KnwI2bRlzPqRRoSTuoA\nBMBJHYAg5PgeFH0ef1FREb755huMGTMGY8eOxbfffovi4uJabXzGjBmwtbVFx44dq11nwYIFaN++\nPTw9PRETE1P7yAkhhGhFbatn5syZKCkpQVBQEBhj2LlzJ0xMTPDvf/9b7cbPnTsHCwsLTJs2DXFx\ncZXuP3HiBDZt2sRf1evtt99GZGRk5SCp1UP0RK4tEbnmRXRTXe1UO4//8uXLiI2N5ZcDAgLg4eFR\nq0F9fX2RnJxc7f1Hjx5FUFAQAKB79+548uQJHjx4AFtb21ptnxBCiObUtnpMTExw+/ZtfjkpKQkm\nJmr/XtRKeno6HBwc+GV7e3ukpaXpZduGgHqRhoSTOgABcFIHIAg5vgdFnccPAOvXr4e/vz8cHR0B\nvPiyd8eOHXoL4O8fQ/4+g4gQQoh+qS38AQEBSEhIQEJCAgCgQ4cOejtXj52dHVJTU/nltLQ02NnZ\nVblucHAwVCoVAMDKygqdOnXiz19R/teQlqVf9vPzq1PxVLX8156upstQc3/V64uVn+bxabv8Ysy6\n8npWfn3rdny6vr41rc9xHH9KnfJ6WRW1X+6WlJTg+PHjSE5ORklJyYsHKRRYtGhRTQ/jJScnY9iw\nYWq/3I2MjMTChQvpy10iKLl+CSrXvIhutJ7HP2zYMPznP//B48eP8ezZMzx79gy5ubm1GnTSpEnw\n8fHBrVu34ODggO3btyMkJAQhISEAgMDAQLRt2xZOTk6YPXs2vvvuOw3TMmyV99QMnxxzeoGTOgAB\ncFIHIAg5vgf1nZPaVk96enqFWT2aCAsLU7vOpk2btNo2IYQQ7aht9SxevBivvfYaBg4cKFZMlVCr\nh+iLXFsics2L6Ebrefw+Pj4YNWoUysrK0KBBA35jOTk5+o+SEEKI4NT2+BctW
oTIyEjk5+cjNzcX\nubm5VPT1hHqRhoSTOgABcFIHIAg5vgf1nZPawt+6dWu4ubnByEjtqoQQQgyA2h5/UFAQ7ty5g8GD\nB/MXXNdkOqc+UI+f6Itce+FyzYvoRusev6OjIxwdHVFUVISioiJBgiOEECIetXv8dYFc9/hfPrpQ\nLup6TtrvGXPQ9NKLdX+Pn4PmOQF1fY+/rr8HtaFtTlofwEUIIUReaI+f1Cty7YXLNS+iG9rjJ4QQ\nAqAWhf/dd99FTk4OiouLERAQABsbG+zcuVOM2GSP5hsbEk7qAATASR2AIOT4HhR9Hv/PP/8MS0tL\n/Pjjj1CpVEhKSsL69ev1GgQhhBDxqO3xu7m5IT4+HjNnzsTYsWMxePBgeHp64tq1a2LFSD1+ojdy\n7YXLNS+iG63n8Q8bNgzOzs4wMzPD5s2b8fDhQ5iZmQkSJCGEEOGpbfV89tlnuHDhAqKjo2Fqagpz\nc3McOXJEjNhkr673Ii0traFQKAT/sbS0ljrVWuCkDkAAnNQBCKKu/15pQ7Tz8Z8+fRoBAQE4dOgQ\nfx3c8o8MCoUCo0eP1msgpO7Jzc2GGAcF5ebSdZYJEVO1Pf6PP/4YK1euRHBwcJUXQNfnBdfVoR6/\nNMTrG1MvXOeRZJoX0U11tZMO4CLVosKv82iUF5EUHcBVB8mxFynXvrE88+KkDkAQcvy9En0ePyGE\nEHmpsdVTVlaGyMhI+Pj4iBlTJdTqkQa1enQejfIiktKq1WNkZIS5c+cKFhQhhBDxqW319O/fHwcP\nHqS/8AKQYy9Srn1jeebFSR2AIOT4eyV6j3/Lli0YP348TE1NoVQqoVQqYWlpWauNh4eHw9nZGe3b\nt8e6desq3Z+VlYVBgwahU6dOcHd3R2hoqMYJEEII0Yxg0zlLS0vRoUMHnDp1CnZ2dujatSvCwsLg\n4uLCr7NixQo8f/4ca9euRVZWFjp06IAHDx7AxKTicWXU45cG9fh1Ho3yIpLSejpnWVkZdu7ciU8+\n+QQAkJKSgqioKLUDRkVFwcnJCSqVCg0aNMDEiRMrneqhZcuWyMnJAQDk5OSgWbNmlYo+IYQQ/VJb\n+OfOnYuIiAjs2bMHAGBhYVGrL3zT09Ph4ODAL9vb2yM9Pb3COrNmzUJ8fDxatWoFT09PfPPNN5rG\nb9Dk2IuUa99YnnlxUgcgCDn+Xone47906RK+++47NGrUCABgbW2N4uJitRuu6jQPf7dmzRp06tQJ\n9+7dw9WrV/HWW28hNze3FmETQgjRltq+iqmpKUpLS/nlzMxMGBmpP+7Lzs4Oqamp/HJqairs7e0r\nrHPx4kUsX74cANCuXTs4Ojri1q1b6NKlS6XtBQcHQ6VSAQCsrKzQqVMn/qrz5X8NaVm/y38pX/ar\nxbKfhuuDH1Os/DSPr3wZau6vev26/Xppsyzu66X561u349P19a1pfY7j+Eky5fWyKmq/3N21axf2\n79+P6OhoBAUF4eDBg1i9ejXGjx9f08NQUlKCDh064PTp02jVqhW6detW6cvdRYsWoUmTJvj444/x\n4MEDeHt7IzY2FtbWFU/TS1/uSoO+3NV5NMqLSEqnk7TduHEDp0+fBgAEBARUKN41OXnyJBYuXIjS\n0lLMnDkTy5YtQ0hICABg9uzZyMrKwvTp05GSkoKysjIsW7YMkydPrnXwhu7lvZK6SLtiwkHT0zIb\nRoHkIL+8OGieE1DXC39d/73ShrY5aX0Frg8++AB9+/bF9OnTYW5urtGggwcPxuDBgyvcNnv2bP7/\nNjY2OHZvRqVBAAAgAElEQVTsmEbbJIQQohu1e/zbt2/HuXPnEBkZCaVSCV9fX/j6+mLkyJFixSjb\nPf66jlo9Oo9GeRFJ6Xw+/vv372Pfvn344osvkJ2djWfPnuk9yOpQ4ZcGFX6dR6O8iKS0PoBr5syZ\n8PHxwZw5c1BSUoJDhw4hOztbkCDrm8qzM
eSAkzoAgXBSByAATuoABCHH3yt956S28D9+/BglJSWw\nsrKCtbU1bGxs0KBBA70GQQghRDy1bvXcuHED4eHh2LBhA0pLS5GWliZ0bDxq9UiDWj06j0Z5EUlp\nPavn2LFjOHfuHM6dO4cnT57A398fvr6+ggRJCCFEeGpbPeHh4fD29sahQ4dw48YN7NixAzNmzBAj\nNtmTYy9Srn1jeebFSR1AjSwtraFQKET5sbS0Vh+QhPRdK9Tu8f/rX//C/fv3cfnyZVy5cgXdunVD\nixYt9BoEIYT8XW5uNsQ62C43V/25xeREbY9///79ePfdd9G3b18wxnDu3DmsX78e48aNEytG6vFL\nhHr8Oo9GeekyigxzEpvW8/g9PDxw6tQpfi8/MzMTAQEBiI2NFSbSKlDhlwYVfp1Ho7x0GUWGOYlN\n63n8jDE0b96cX27WrJksnyApUI/fkHBSByAATuoABMJJHYDeid7jHzRoEAYOHIjJkyeDMYZ9+/ZV\nOv8OIYQQw6G21cMYw+HDh3H+/HkoFAr4+vpi1KhRYsUHgFo9UqFWj86jUV66jCLDnMSm87l6pESF\nXxpU+HUejfLSZRQZ5iQ2jXv8FhYWUCqVVf5YWloKGmx9QT1+Q8JJHYAAOKkDEAgndQB6J1qPX8yz\nbxJCCBFPta2e3NxcKJXKGh9cm3X0gVo90qBWj86jUV66jCLDnMSm8bl6Ro0ahQ4dOmDEiBHo0qUL\nfx3cR48e4ffff8f//vc/JCYm4tSpU8JFTQghRO+q7fGfOnUKY8aMwf79+9GrVy80adIETZo0Qe/e\nvXHw4EFMmDCBir6OqMdvSDipAxAAJ3UAAuGkDkDvRJ3H7+/vD39/f70OSAghRFo0nZNUi3r8Oo9G\neekyigxzEpvWp2wghBAiL1T4JUQ9fkPCSR2AADipAxAIJ3UAeif6NXdv376NwsJCAMDZs2exceNG\nPHnypFYbDw8Ph7OzM9q3b49169ZVuQ7HcfDy8oK7uzv8/PxqHzkhhBCtqO3xe3p6Ijo6GsnJyQgM\nDMSIESMQHx+PEydO1Ljh0tJSdOjQAadOnYKdnR26du2KsLAwuLi48Os8efIEvXr1wk8//QR7e3tk\nZWXBxsamcpDU45cE9fh1Ho3y0mUUGeYkNq17/EZGRjAxMcHhw4cxf/58rF+/HhkZGWoHjIqKgpOT\nE1QqFRo0aICJEyfiyJEjFdbZs2cPxowZA3t7ewCosugTQgjRL7WF39TUFHv27MF///tfDB06FIwx\nFBcXq91weno6HBwc+GV7e3ukp6dXWCcxMRGPHz9Gv3790KVLF+zcuVOLFAwX9fgNCSd1AALgpA5A\nIJzUAeid6Ofj3759O0JCQrB8+XI4Ojrizp07mDp1qtoNv/iYVrPi4mJcuXIFp0+fRn5+Pnr27Ike\nPXqgffv2tYueEEKIxtQWfjc3N2zcuBEA8PjxY+Tm5mLJkiVqN2xnZ4fU1FR+OTU1lW/plHNwcICN\njQ0aNWqERo0aoU+fPrh27VqVhT84OBgqlQoAYGVlhU6dOvFfBpf/NaRl/S7/pXzZrxbLfhquD35M\nsfLTPL7yZai5v+r16/brpc2yOK/XXzSPr+IF12v7eOgUr1jPR03rcxyH0NBQAODrZZWYGn369GFP\nnz5ljx49YiqVinXt2pUtXLhQ3cNYcXExa9u2Lbtz5w57/vw58/T0ZNevX6+wzo0bN1hAQAArKSlh\neXl5zN3dncXHx1faVi3CJAIAwAAmwo94r694OVFelJP0qstLbY//6dOnsLS0xOHDhzFt2jRERUXV\n6hw9JiYm2LRpEwYOHAhXV1dMmDABLi4uCAkJQUhICADA2dkZgwYNgoeHB7p3745Zs2bB1dVV7bbl\ngnr8hoSTOgABcFIHIBBO6gD0TvQef2lpKTIyMrB//36sXr0aQO369wAwePDgStfnnT17doXlxYsX\nY/Hix
bWNlxBCiI7U7vF/9NFHGDhwINq1a4du3bohKSmJvnzVE3kesOYndQAC8ZM6AAH4SR2AQPyk\nDkDv9F0r6CRtpFp0AJfOo1Feuowiw5zEpvUBXLdu3UJAQADc3NwAALGxsXzLh+iGevyGhJM6AAFw\nUgcgEE7qAPRO9HP1zJo1C2vWrIGpqSkAoGPHjggLC9NrEIQQQsSjtvDn5+eje/fu/LJCoUCDBg0E\nDaq+oB6/IfGTOgAB+EkdgED8pA5A7/RdK9QW/ubNm+P27dv88sGDB9GyZUu9BkEIIUQ8agv/pk2b\nMHv2bNy8eROtWrXC119/jc2bN4sRm+xRj9+QcFIHIABO6gAEwkkdgN6JPo+/Xbt2OH36NPLy8lBW\nVgalUqnXAAghhIhL7XTOwsJCHDp0CMnJySgtLQVjDAqFAh999JFYMdJ0TonQdE6dR6O8dBlFhjmJ\nrbraqXaPf8SIEbCysoK3tzfMzMwECY4QQoh41Bb+9PR0/PTTT2LEUu+8fIZD+eAgx1kV8syLg/xy\nAuSYl75rhdovd318fBAbG6u3AQkhhEhLbY/fxcUFt2/fhqOjIxo2bPjiQQqFqH8MqMcvDerx6zwa\n5aXLKDLMSWxa9/hPnjwpSECEEEKkobbVo1KpkJqairNnz0KlUsHc3FyWfxmlQPP4DQkndQAC4KQO\nQCCc1AHonejn6lmxYgU+//xzrF27FgBQVFSEKVOm6DUIQggh4lHb4/f09ERMTAy8vb0RExMDAPDw\n8KAefz1APX6dR6O8dBlFhjmJTevTMjds2BBGRn+tlpeXp9/ICCGEiEpt4R83bhxmz56NJ0+eYOvW\nrQgICMA//vEPMWKTPerxGxJO6gAEwEkdgEA4qQPQO9HP1fPuu+/i559/hlKpREJCAlatWoXXXntN\nr0EQQggRj9oef15eHszMzGBsbIxbt27h1q1bGDx4sKjn5K/rPX5LS2vk5maLMpZS2RQ5OY9FGYt6\n/DqPRnnpMooMcxJbdbVTbeHv3Lkzzp8/j+zsbPTq1Qtdu3aFqakpdu/eLViwf1fXC79c36BU+HUe\njfLSZRQZ5iQ2rb/cZYyhcePGOHz4MObOnYsDBw7gjz/+ECTI+oeTOgABcFIHIBBO6gAEwEkdgEA4\nqQPQO9Hn8QNAREQEdu/ejSFDhgAAysrKarXx8PBwODs7o3379li3bl21612+fBkmJiY4fPhwrbZL\nCCFEe2oL/4YNG7B27VqMGjUKbm5uSEpKQr9+/dRuuLS0FPPmzUN4eDiuX7+OsLAw3Lhxo8r1lixZ\ngkGDBsnyo1bN/KQOQAB+UgcgED+pAxCAn9QBCMRP6gD0Tt9n8VXb4y+Xm5sLhUIBCwuLWm04IiIC\nK1euRHh4OADgs88+AwAsXbq0wnobNmyAqakpLl++jKFDh2LMmDGVg6Qe/8ujybBvLMecAMpLx1Fk\nmJPYtO7xx8XFwcvLC25ubnB1dYW3t3etevzp6elwcHDgl+3t7ZGenl5pnSNHjmDOnDl8kPULJ3UA\nAuCkDkAgnNQBCICTOgCBcFIHoHei9/jfeOMNfPXVV0hJSUFKSgq+/PJLvPHGG2o3XJsivnDhQnz2\n2Wf8XyU5/sUlhJC6Ru0BXPn5+RV6+n5+frU6bYOdnR1SU1P55dTUVNjb21dYJzo6GhMnTgQAZGVl\n4eTJk2jQoAGGDx9eaXvBwcFQqVQAACsrK3Tq1Inve5X/NZRq+QUOf/UWuf//V5hlsfL7iybx+Wm4\nPvgxxXu9NI2vfBlq7q96/br9emmzLM7r9RfN49Pu9xE6xSvW81HT+hzHITQ0FAD4elkVtT3+kSNH\nwtvbG1OnTgVjDLt370Z0dDR++OGHmh6GkpISdOjQAadPn0arVq3QrVs3hIWFwcXFpcr1p0+fjmHD\nhmH06NGVg6Qe/8ujybBvLMecAMpLx1FkmJPYtO7xb9++HQ8fPsTo0aM
xZswYZGZmYvv27WoHNDEx\nwaZNmzBw4EC4urpiwoQJcHFxQUhICEJCQrTLQnY4qQMQACd1AALhpA5AAJzUAQiEkzoAvdN3j7/W\ns3qkJN89fg6aTz2r63uRHOSXEyDPvDhoN/Wxru/xc6jLr5U2tL3YusanbBg2bFiNGzt69KjGQWhL\nvoVfq9HqeDHRaiQZ5gRQXjqOIsOcxKbxNXffeeed6v9a1Ltpl4QQIh/VFn5XV1dkZmbCzc2twu3x\n8fFo3ry54IHVDxzkd5QhB/nlBMgzLw7yywmQY17atnqqU+2Xu/Pnz0dWVlal2x89eoSFCxfqLQBC\nCCHiqrbH7+3tjejo6Cof5Obmhvj4eEEDexn1+CuMJsO+sRxzAigvHUeRYU5i03g6Z25ubrUbKy4u\n1k9UhBBCRFdt4XdycsLx48cr3X7ixAm0a9dO0KDqD07qAATASR2AQDipAxAAJ3UAAuGkDkDvRLvm\n7oYNGzB06FAcOHAA3t7eYIwhOjoaFy9exI8//qjXIAghhIinxgO4CgsLsWfPHr6f7+bmhsmTJ8PM\nzEy0AAHq8f9tNBn2jeWYE0B56TiKDHMSm9bX3K0LqPBXGE2GxUSOOQGUl46jyDAnsWl9rh4iJE7q\nAATASR2AQDipAxAAJ3UAAuGkDkDvJLnmLiGEEPmgVo8eyPUjKbV6dB6N8tJlFBnmJDaNz9VT7vz5\n81i5ciWSk5NRUlLCb+zPP//Uf5SEEEIEp3aPv0OHDtiwYQM6d+4MY2Nj/nYbGxvBgysn3z1+DnX5\n9LF0WuaXcZBfXhzotMz8aHW6xuj7tMxq9/itrKwwePBgjQckhBBSN6nd41+6dClKS0sxevRoNGzY\nkL+9c+fOggdXTr57/FqNVsf3IrUaSYY5AZSXjqPIMCexaT2P38/Pr8rz7589e1Z/0alBhb/CaDIs\nJnLMCaC8dBxFhjmJjQ7gEpBce5HU438ZB/nlxYF6/PxodbrG6LvHr3Ye/5MnT/DPf/4T3t7e8Pb2\nxjvvvIOnT59qHAAhhJC6Qe0e/+jRo9GxY0cEBQWBMYadO3ciNjYWhw8fFitGGe/xazVaHd+L1Gok\nGeYEUF46jiLDnMSmdavH09MT165dU3ubkKjwVxhNhsVEjjkBlJeOo8gwJ7Fp3epp1KgRzp07xy+f\nP38ejRs31m909RYndQAC4KQOQCCc1AEIgJM6AIFwUgegd6Kfq2fLli1466230KZNG7Rp0wbz5s3D\nli1baj1AeHg4nJ2d0b59e6xbt67S/bt374anpyc8PDzQq1cvxMbGapYBIYQQjdR6Vk9OTg4AwNLS\nstYbLy0tRYcOHXDq1CnY2dmha9euCAsLg4uLC79OREQEXF1d0aRJE4SHh2PFihWIjIysGCS1el4e\nTYbtAznmBFBeOo4iw5zEpvGRuzt37sTUqVPx5ZdfVpjHzxiDQqHAokWL1A4aFRUFJycnqFQqAMDE\niRNx5MiRCoW/Z8+e/P+7d++OtLS0WiVECCFEO9W2evLz8wG8uOj6yz/Pnj2r8ULsL0tPT4eDgwO/\nbG9vj/T09GrX//777xEYGFjb2GWAkzoAAXBSByAQTuoABMBJHYBAOKkD0DvRrrk7e/ZsAED//v3R\nu3fvCvedP3++Vhuv6ojf6pw9exbbt2/HhQsXqrw/ODiY/+RgZWWFTp068Qc0lD8pUi2/wOGvg0a4\n//9X3TLU3F/1slj5aRuf5ssVD1AR/vXSJt6rGq7/l7r7emm6fvmyOK+X9vFd1XD9iuNJXU+qWr56\n9Wqt1uc4DqGhoQDA18uqqO3xe3l5ISYmpsJtnTt3xpUrV2p6GAAgMjISK1asQHh4OABg7dq1MDIy\nwpIlSyqsFxsbi9GjRyM8PBxOTk6Vg6Qe/8ujybBvLMecAMpLx1FkmJPYNO7xR0RE4OLFi8jMzMRX\nX33FPzg3NxelpaW1GrRLly5ITEx
EcnIyWrVqhX379iEsLKzCOikpKRg9ejR27dpVZdEnhBCiX9X2\n+IuKivgiX97bf/bsGSwtLXHw4MFabdzExASbNm3CwIED4erqigkTJsDFxQUhISEICQkBAHzyySfI\nzs7GnDlz4OXlhW7duuknM4PASR2AADipAxAIJ3UAAuCkDkAgnNQB6J2+e/xqWz13795FmzZt9Dqo\npuTb6uFQl08mRSdpexkH+eXFgU7Sxo9Wp2uMvk/Sprbwv/baazhw4ACsrKwAAI8fP8akSZPw008/\naRyEtuRb+LUarY4XE61GkmFOAOWl4ygyzElsWl+BKzMzky/6AGBtbY0HDx7oNzpCCKknLC2tkZub\nLWkMak/ZYGxsjLt37/LLycnJMDJS+zBSK5zUAQiAkzoAgXBSByAATuoABMJJHUCNXhR9puHPWS0e\nU/0nGLV7/J9++il8fX3Rp08fAMBvv/2GrVu3aporIYSQOqJW5+rJzMxEZGQkFAoFevToARsbGzFi\n41GPv8JoMuwbyzEngPLScRQZ5gTUjbyqLfw3btyAi4sLoqOjKxTe8qNx6WLrf6kLL6QgI1Hh13U0\nykuXUWSYE1A38qq28M+aNQvbtm2ji63XglynndF0zpdxkF9eHGg6Jz9avXqtqu3xb9u27cVwej5w\ngBBCiLSq3eM/dOhQjSdZGz16tGBB/Z189/i1Gq2O75loNZIMcwIoLx1HkWFOQN3Iq9o9/mPHjkGh\nUODhw4e4ePEi/P39Abxo8fj4+Iha+AkhhOhPtRPyQ0NDsWPHDhQVFeH69es4dOgQDh06hPj4eBQV\nFYkZo4xxUgcgAE7qAATCSR2AADipAxAIJ3UAAuD0ujW1R2KlpqbilVde4ZdtbW2RkpKi1yAIIYSI\nR+08/nnz5iEhIQGTJ08GYwz79u1D+/bt8e2334oVI/X4K44mw16kHHMCKC8dR5FhTkDdyEtt4WeM\n4YcffsC5c+cAAH369MGoUaOEibEaVPgrjCbDN6gccwIoLx1HkWFOQN3IS+0pGxQKBTp37gylUonX\nXnsN+fn5yM3NhVKpFCTM+oWDdnNz6zIO8ssJkGdeHOSXEyDPvDjoMye1Pf6tW7di3LhxePPNNwEA\naWlpGDlypN4CIIQQIi61rR5PT09ERUWhR48e/LV3O3bsiLi4OFECBKjV87fRZPiRVI45AZSXjqPI\nMCegbuSldo+/YcOGaNiwIb9cUlJS44FdhBBC6ja1hb9v37749NNPkZ+fj19++QXjxo3DsGHDxIit\nHuCkDkAAnNQBCISTOgABcFIHIBBO6gAEwOl1a2oL/7p169C8eXN07NgRISEhCAwMxOrVq/UaBCGE\nEPHU2OMvKSmBu7s7bt68KWZMlVCPv8JoMuxFyjEngPLScRQZ5gTUjbxq3OM3MTFBhw4dKlx6kRBC\niGFT2+p5/Pgx3Nzc4O/vj2HDhmHYsGEYPnx4rTYeHh4OZ2dntG/fHuvWratynQULFqB9+/bw9PTk\nZw3VH5zUAQiAkzoAgXBSByAATuoABMJJHYAAOL1uTe0BXOX9/Jc/LtRmVk9paSnmzZuHU6dOwc7O\nDl27dsXw4cPh4uLCr3PixAncvn0biYmJuHTpEubMmYPIyEht8jBQVyG/A03kmBMgz7zkmBMgz7z0\nm1O1hb+goABbtmzB7du34eHhgRkzZqBBgwa13nBUVBScnJygUqkAABMnTsSRI0cqFP6jR48iKCgI\nANC9e3c8efIEDx48gK2trZbpGJonUgcgADnmBMgzLznmBMgzL/3mVG2rJygoCNHR0fDw8MCJEyew\nePFijTacnp4OBwcHftne3h7p6elq10lLS9NoHEIIIZqpdo//xo0b/NG5M2fORNeuXTXacG0P8vr7\nN8716+CwZKkDEECy1AEIJFnqAASQLHUAAkmWOgABJOt1a9UWfhMTkyr/X1t2dnZITU3ll1NTU2Fv\
nb1/jOmlpabCzs6u0LU9PTwP4g6BtfP/RfCRRnwttxpJjToA889I8J0DMvOi1+ovmOXl6elZ5e7UV\nPTY2tsIZOAsKCvhlhUKBnJycGgfs0qULEhMTkZycjFatWmHfvn0ICwursM7w4cOxadMmTJw4EZGR\nkbCysqqyv3/16tUaxyKEEFJ71Rb+0tJS3TZsYoJNmzZh4MCBKC0txcyZM+Hi4oKQkBAAwOzZsxEY\nGIgTJ07AyckJ5ubm2LFjh05jEkIIUU/t2TkJIYTIi9oDuAipTxITE3H+/PlKt58/fx5JSUkSRESI\n/lHhF1lRURGuXbuGuLg4FBUVSR2OzgoLC2t1m6FYuHAhLC0tK91uaWmJhQsXShCRMLKysur0+a90\nUVBQgAMHDkgdhl6lpKRg/fr1etseFX4RHT9+HO3atcOCBQswb948tGvXDidOnJA6LJ34+PjU6jZD\n8eDBA3h4eFS63cPDA3fu3JEgIt1FRETAz88Po0ePxpUrV+Du7g53d3e0aNECJ0+elDo8vSgtLcXx\n48cxZcoUqFQq7N27V+qQdPbw4UP861//Qu/eveHn54f79+/rbduaz9MkWlu0aBHOnj0LJycnAEBS\nUhICAwMRGBgocWSay8jIwL1795Cfn48rV66AMcbP9srPz5c6PK09eVL9EZKG+klm3rx5WLt2LZ4+\nfQp/f3+Eh4ejR48euHnzJiZOnIjBgwdLHaJWGGP49ddfERYWhhMnTqB79+44d+4c7ty5g8aNG0sd\nnlZycnJw+PBhhIWF4fbt2xg5ciTu3LlT6eBXXVHhF5GlpSVf9AGgbdu2VbYVDMHPP/+M0NBQpKen\n45133uFvVyqVWLNmjYSR6aZLly7YunUr3njjjQq3b9u2Dd7e3hJFpZvS0lIMGDAAAPDRRx+hR48e\nAABnZ2cDOD6meg4ODnB1dcWMGTPw1VdfwdzcHI6OjgZb9AHA1tYWr732GlauXMm/TocPH9b7OFT4\nRXDo0CEAL4pKYGAgxo8fDwA4cOAAunTpImVoWgsKCsKUKVOwd+9evP7661KHozcbNmzAqFGjsHv3\nbr7QR0dH4/nz5/jhhx8kjk47Lxd3MzMzCSPRr7Fjx+Lo0aPYt28fAMjiyoBr165FWFgY5s6di/Hj\nx2PcuHGCjEPTOUUQHBzM//KVt0Re/r8hH7/g7e2N6OhoqcPQK8YYzp49iz/++AMKhYI/LbmhMjY2\n5veCCwoK0KhRI/6+goIClJSUSBWazsrKysBxHMLCwnDy5Ek8efIE33//PYYMGQILCwupw9NaUlIS\n9u7di7179yIxMRErV67EqFGj8Oqrr+pl+1T4iU6WLl0KGxsbTJgwAebm5vzt1tbWEkZF6qPi4mL8\n9NNPCAsLw08//YSsrCypQ9JYSkoKWrduXeG2uLg4hIWFYd++fXqbUkyFX0TTp0+vsFy+5799+3Yp\nwtELlUpVZZ/YUGfAEMPxv//9D2lpaZg3bx4AoFu3bsjMzAQArFy5EtOmTZMyPK14eXnxF6QaM2YM\n3ybWN+rxi2jIkCF8kSwoKMAPP/yAVq1aSRyVbpKTk6UOgdRTn3/+eYVpm0VFRfj999+Rl5eH4OBg\ngyz8L/vzzz8F2zYVfhGNHTu2wvLkyZPRq1cviaLRnz/++APXr1+vMN3R0H/pSN1XVFRUoS3Su3dv\nNGvWDM2aNUNeXp6EkdV9VPgllJCQwH80NVQrVqzAr7/+ivj4eAwZMgQnT55E7969qfATwWVnZ1dY\n3rRpE/9/Q/29evmsyC+fERmo3VmRa4sKv4gsLCz4Vo9CoYCtrW21F6E3FAcPHsS1a9fQuXNn7Nix\nAw8ePJDV9E5Sd3Xv3r3KYy62bNmC7t27SxSVbnQ9K3JtUeEX0bNnz6QOQe8aNWoEY2NjmJiY4OnT\np2jRokWFi+sQIpSvv/4aI0eOxJ49e9C5c2cAwJUrV1BYWIj//
e9/EkdXt1HhF1FAQABOnz6t9jZD\n0qVLF2RnZ2PWrFno0qULzM3NDfpcPcRw2Nra4uLFizhz5gzi4+OhUCgwdOhQgz7mQiw0nVMEBQUF\nyM/PR79+/cBxXIXz2gwaNAg3b96UOkS9uHPnDnJycqq93BshpG6gPX4RbN26FRs2bMC9e/cqnO9F\nqVTyc5AN1cufWBwdHSvdRgipe6jwi6Bnz54YN24cDh48iAULFiA0NBSHDh2CSqXC5MmTpQ5PK+Wf\nYjIzM/H48eMKn2L0fSZBQoh+UatHBF5eXjh9+jSsra3x22+/YcKECdi0aRNiYmJw8+ZNHDx4UOoQ\nNfbNN9/wn2JePghNqVTijTfeMPhPMoTIGV2IRQRlZWX8uWv27duH2bNnY8yYMVi9ejUSExMljk47\nPXv2xIULF7B+/XrcuXMHH3/8Mdzd3dG3b1+D/RRDSH1BhV8EpaWlKC4uBgCcOnUK/fr14+8z1DMj\nzp49G2ZmZliwYAF+++03LFu2DMHBwWjSpEmledWEkLqFevwimDRpEvr27QsbGxs0btwYvr6+AF5c\n2NvKykri6LRT3aeYMWPG0KweQuo4KvwiWL58Ofz9/XH//n0MGDAARkYvPmgxxvDtt99KHJ12yj/F\nNGjQAKdOncLWrVv5+wz1Uwwh9QUVfpH07Nmz0m36uqiCFOT4KYaQ+oJm9RCtRURE8J9iyi/CkpCQ\ngGfPnvGH0BNC6h4q/IQQUs/QrB5CCKlnqPATQkg9Q4WfEELqGSr8xOClpaVhxIgRePXVV+Hk5ISF\nCxfyB8xpw8/PD1euXAHw4jrJOTk5ePr0KTZv3qzxthISEhAYGIhXX30V3t7emDBhAh4+fAiO4zBs\n2DCtYyREF1T4iUFjjGH06NEYPXo0EhIS+FlFy5cv13qb5VdJA4Djx4/D0tIS2dnZ+O677zTaTmFh\nIYYOHYq33noLCQkJiI6Oxty5c5GZmVlhDELERoWfGLQzZ86gUaNGCAoKAgAYGRnh66+/xvbt21FQ\nUHSYSAQAAAQISURBVIDQ0FDMnz+fX3/o0KH49ddfAQBz585F165d4e7ujhUrVlS5fZVKhUePHmHp\n0qVISkqCl5cX3nvvPQQFBeHIkSP8eq+//jqOHj1a4bF79uyBj48PhgwZwt/Wt29fuLm54eXJdFFR\nUfDx8UHnzp3Rq1cvJCQkAADi4+PRvXt3eHl5wdPTE0lJScjLy8OQIUPQqVMndOzYEfv379ftCST1\nEh3ARQxafHx8hWscAC/OENq6dWskJSVV2rNWKBT8bZ9++imaNm2K0tJS9O/fH3FxcejYsWOV669b\ntw7x8fGIiYkBAPz222/4+uuvMWLECDx9+hQRERHYuXOn2tiq4uLignPnzsHY2BinTp3C+++/j4MH\nD2LLli14++23MXnyZJSUlKCkpATHjx+HnZ0djh8/DgB6u/g2qV9oj58YtJpaJupOHbFv3z54e3uj\nc+fOiI+Px40bN6pd9++Hu/Tp0weJiYnIyspCWFgYxo4dy5+Ko6bHVeXJkycYO3YsOnbsiEWLFuH6\n9esAAB8fH6xZswaff/45kpOTYWZmBg8PD/zyyy9YunQpzp8/D0tLS7XbJ+TvqPATg+bq6oro6OgK\nt+Xk5CA1NRXt27eHiYkJysrK+PsKCwsBvLhM5JdffokzZ87g2rVrGDJkCH9fbU2bNg07d+5EaGgo\nZsyYUel+Nze3SrFV5cMPP0RAQADi4uJw7NgxFBQUAHhxWoxjx46hUaNGCAwMxNmzZ9G+fXvExMSg\nY8eO+OCDD7Bq1SqNYiYEoMJPDFxAQADy8/P5NktpaSneeecdTJ48Gebm5lCpVLh69SoYY0hNTUVU\nVBQAIDc3F+bm5rC0tMSDBw9w8uTJGsdRKpXIzc2tcFtwcDA2bNgAhUIBZ2fnSo+ZPHkyLl68iBMn\nTvC3/fbbb4iPj6+wXk5OD
n8xmx07dvC3//nnn3B0dMT8+fMxYsQIxMbGIiMjA2ZmZnj99dexePFi\nfvYRIZqgwk8M3g8//ICDBw/i1VdfhY2NDXJycvDFF18AAHr16gVHR0e4urri7bff5nvuHh4e8PLy\ngrOzM15//XX07t27xjGaNWuGXr16oWPHjliyZAkAoEWLFnB1dcX06dOrfIyZmRl+/PFHfPvtt3j1\n1Vfh5uaGLVu2oHnz5hW+a3jvvfewbNkydO7cGaWlpfzt+/fvh7u7O7y8vBAfH4+goCDExcXxX/iu\nWrUKH374oV6eQ1K/0Ll6iKxERERg1qxZOHDgAFxcXAQdKz8/Hx4eHoiJiYFSqRR0LEL0ifb4iaz0\n7NkTf/zxh+BF/9SpU3B1dcWCBQuo6BODQ3v8hBBSz9AePyGE1DNU+AkhpJ6hwk8IIfUMFX5CCKln\nqPATQkg9Q4WfEELqmf8DUOtcX4rBJg4AAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "#Making a model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we test the model-making feature. We will use our `standards` dataset from above, randomly assigning roughly 80% of the pages to training and 20% to testing." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from wikiclass.models import RFTextModel\n", "from wikiclass import assessments" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 27 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Divvying up our data into two lists. The printed ratio of test-set size to train-set size should come out near 0.25 for a 20/80 split." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import random\n", "\n", "train_set = list()\n", "test_set = list()\n", "for actual, text_list in standards.items():\n", "    for text in text_list:\n", "        # randint(0, 9) returns 8 or 9 about 20% of the time\n", "        if random.randint(0, 9) >= 8:\n", "            test_set.append( (text, actual) )\n", "        else:\n", "            train_set.append( (text, actual) )\n", "\n", "print(len(test_set)/len(train_set))" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.2510772571506124\n" ] } ], "prompt_number": 28 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next step is quite simple: we train by passing in our train_set list, and test by passing in our test_set list. The package also conveniently supplies a saving function so we can store our model for later use."
] }, { "cell_type": "code", "collapsed": false, "input": [ "# Train a model\n", "model = RFTextModel.train(\n", "    train_set,\n", "    assessments=assessments.WP10\n", ")\n", "\n", "# Run the test set & print the results\n", "results = model.test(test_set)\n", "print(results)\n", "\n", "# Write the model to disk for reuse, closing the file afterwards.\n", "with open(\"36K_random_enwiki.rf_text.model\", \"wb\") as f:\n", "    model.to_file(f)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "pred assessment B C FA GA Start Stub\n", "real assessment \n", "B 130 29 1 5 105 40\n", "C 34 112 0 2 151 33\n", "FA 7 3 4 0 1 0\n", "GA 8 8 0 11 9 1\n", "Start 80 87 0 2 1420 525\n", "Stub 40 32 0 0 547 3973\n" ] } ], "prompt_number": 29 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now to look at accuracy, we normalize the DataFrame row-wise, so that each row (one real assessment) sums to 1 and the diagonal gives the fraction of that class that was predicted correctly." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# With axis=1 the lambda receives one row at a time\n", "norm_results = results.apply(lambda row: row / row.sum(), axis=1)\n", "norm_results" ], "language": "python", "metadata": {}, "outputs": [ { "html": [ "
<table>\n", " <thead>\n",
" <tr><th>pred assessment</th><th>B</th><th>C</th><th>FA</th><th>GA</th><th>Start</th><th>Stub</th></tr>\n",
" <tr><th>real assessment</th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
" </thead>\n", " <tbody>\n",
" <tr><th>B</th><td>0.419355</td><td>0.093548</td><td>0.003226</td><td>0.016129</td><td>0.338710</td><td>0.129032</td></tr>\n",
" <tr><th>C</th><td>0.102410</td><td>0.337349</td><td>0.000000</td><td>0.006024</td><td>0.454819</td><td>0.099398</td></tr>\n",
" <tr><th>FA</th><td>0.466667</td><td>0.200000</td><td>0.266667</td><td>0.000000</td><td>0.066667</td><td>0.000000</td></tr>\n",
" <tr><th>GA</th><td>0.216216</td><td>0.216216</td><td>0.000000</td><td>0.297297</td><td>0.243243</td><td>0.027027</td></tr>\n",
" <tr><th>Start</th><td>0.037843</td><td>0.041154</td><td>0.000000</td><td>0.000946</td><td>0.671712</td><td>0.248344</td></tr>\n",
" <tr><th>Stub</th><td>0.008711</td><td>0.006969</td><td>0.000000</td><td>0.000000</td><td>0.119120</td><td>0.865200</td></tr>\n",
" </tbody>\n", "</table>
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 30, "text": [ "pred assessment B C FA GA Start Stub\n", "real assessment \n", "B 0.419355 0.093548 0.003226 0.016129 0.338710 0.129032\n", "C 0.102410 0.337349 0.000000 0.006024 0.454819 0.099398\n", "FA 0.466667 0.200000 0.266667 0.000000 0.066667 0.000000\n", "GA 0.216216 0.216216 0.000000 0.297297 0.243243 0.027027\n", "Start 0.037843 0.041154 0.000000 0.000946 0.671712 0.248344\n", "Stub 0.008711 0.006969 0.000000 0.000000 0.119120 0.865200" ] } ], "prompt_number": 30 }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally we can view the performance by class, which intriguingly seems to be better than what we got with the batteries-included model." ] }, { "cell_type": "code", "collapsed": false, "input": [ "for c in classic_order:\n", "    print(c, norm_results.loc[c, c])" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Stub 0.865200348432\n", "Start 0.671712393567\n", "C 0.33734939759\n", "B 0.41935483871\n", "GA 0.297297297297\n", "FA 0.266666666667\n" ] } ], "prompt_number": 35 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that having a large number of stubs to train on really gives us a high recall in classifying them: about 87% of the real stubs in the test set are labelled Stub.\n", "\n", "So there you have it - a brief playing around with Wiki-Class, an easy way to get rough quality estimates out of your data. If you build any further examples using this package, I'd be intrigued to see and collaborate on them.\n", "\n", "\u203d[@notconfusing](https://twitter.com/notconfusing)" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }