{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Pivoted document length normalization\n", "In *many* cases, normalizing the tfidf weights of each term favors the terms of shorter documents. Pivoted document length normalization counters this bias toward short documents by making tfidf less dependent on document length.\n", "\n", "This is achieved by *tilting* the normalization curve around a user-defined pivot point with a given slope, roughly following the equation:\n", "`pivoted_norm = (1 - slope) * pivot + slope * old_norm`\n", "\n", "This scheme was proposed in the paper [pivoted document length normalization](http://singhal.info/pivoted-dln.pdf).\n", "\n", "Overall, this approach can in many cases increase the accuracy of the model when document lengths vary widely across the corpus." ] },
{ "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "scrolled": true }, "outputs": [], "source": [ "%matplotlib inline\n", "from sklearn.linear_model import LogisticRegression\n", "\n", "from gensim.corpora import Dictionary\n", "from gensim.sklearn_api.tfidf import TfIdfTransformer\n", "from gensim.matutils import corpus2csc\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as py\n", "\n", "import gensim.downloader as api" ] },
{ "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# This function returns the model accuracy and the individual document scores\n", "# (decision function values) using gensim's TfIdfTransformer and sklearn's LogisticRegression\n", "def get_tfidf_scores(kwargs):\n", "    tfidf_transformer = TfIdfTransformer(**kwargs).fit(train_corpus)\n", "\n", "    X_train_tfidf = corpus2csc(tfidf_transformer.transform(train_corpus), num_terms=len(id2word)).T\n", "    X_test_tfidf = corpus2csc(tfidf_transformer.transform(test_corpus), num_terms=len(id2word)).T\n", "\n", "    clf = LogisticRegression().fit(X_train_tfidf, y_train)\n", "\n", "    model_accuracy = clf.score(X_test_tfidf, y_test)\n", "    doc_scores = clf.decision_function(X_test_tfidf)\n", "\n", "    return model_accuracy, doc_scores" ] },
{ "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Sort the documents by their scores (ascending) and return the sorted scores\n", "# together with the corresponding document lengths (in characters).\n", "def sort_length_by_score(doc_scores, X_test):\n", "    doc_scores = sorted(enumerate(doc_scores), key=lambda x: x[1])\n", "    doc_leng = np.empty(len(doc_scores))\n", "\n", "    ds = np.empty(len(doc_scores))\n", "\n", "    for i, (doc_idx, score) in enumerate(doc_scores):\n", "        doc_leng[i] = len(X_test[doc_idx])\n", "        ds[i] = score\n", "\n", "    return ds, doc_leng" ] },
{ "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "nws = api.load(\"20-newsgroups\")" ] },
{ "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cat1, cat2 = ('sci.electronics', 'sci.space')\n", "\n", "X_train = []\n", "X_test = []\n", "y_train = []\n", "y_test = []\n", "\n", "for i in nws:\n", "    if i[\"set\"] == \"train\" and i[\"topic\"] == cat1:\n", "        X_train.append(i[\"data\"])\n", "        y_train.append(0)\n", "    elif i[\"set\"] == \"train\" and i[\"topic\"] == cat2:\n", "        X_train.append(i[\"data\"])\n", "        y_train.append(1)\n", "    elif i[\"set\"] == \"test\" and i[\"topic\"] == cat1:\n", "        X_test.append(i[\"data\"])\n", "        y_test.append(0)\n", "    elif i[\"set\"] == \"test\" and i[\"topic\"] == cat2:\n", "        X_test.append(i[\"data\"])\n", "        y_test.append(1)" ] },
{ "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "id2word = Dictionary([doc.split() for doc in X_train])\n", "\n", "train_corpus = [id2word.doc2bow(doc.split()) for doc in X_train]\n", "test_corpus = [id2word.doc2bow(doc.split()) for doc in X_test]" ] },
{ "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(1184, 787)\n" ] } ], "source": [ "print(len(X_train), len(X_test))" ] },
{ "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# We perform our analysis on k documents (about 10% of the test set) taken from the front of the score-sorted list.\n", "# Integer division keeps k usable as a slice index on both Python 2 and Python 3.\n", "k = len(X_test) // 10" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Get TFIDF scores for corpus without pivoted document length normalisation" ] },
{ "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9440914866581956\n" ] } ], "source": [ "params = {}\n", "model_accuracy, doc_scores = get_tfidf_scores(params)\n", "print(model_accuracy)" ] },
{ "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Normal cosine normalisation favors short documents as our top 78 docs have a smaller mean doc length of 1290.077 compared to the corpus mean doc length of 1577.799\n" ] } ], "source": [ "print(\n", "    \"Normal cosine normalisation favors short documents as our top {} \"\n", "    \"docs have a smaller mean doc length of {:.3f} compared to the corpus mean doc length of {:.3f}\"\n", "    .format(\n", "        k, sort_length_by_score(doc_scores, X_test)[1][:k].mean(),\n", "        sort_length_by_score(doc_scores, X_test)[1].mean()\n", "    )\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Get TFIDF scores for corpus with pivoted document length normalisation, testing various values of slope"
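, "\n", "\n", "For intuition about what the slope sweep in the next cell does, here is a small worked example of the formula `pivoted_norm = (1 - slope) * pivot + slope * old_norm`. It uses `pivot = 10`, as in this notebook's code, and two hypothetical `old_norm` values, 5 for a short document and 20 for a long one. These two norms are illustrative assumptions, not values measured on the corpus.\n", "\n", "$$\\text{slope} = 1: \\quad (1 - 1) \\cdot 10 + 1 \\cdot 5 = 5, \\qquad (1 - 1) \\cdot 10 + 1 \\cdot 20 = 20$$\n", "\n", "$$\\text{slope} = 0.2: \\quad (1 - 0.2) \\cdot 10 + 0.2 \\cdot 5 = 9, \\qquad (1 - 0.2) \\cdot 10 + 0.2 \\cdot 20 = 12$$\n", "\n", "$$\\text{slope} = 0: \\quad (1 - 0) \\cdot 10 + 0 \\cdot 5 = 10, \\qquad (1 - 0) \\cdot 10 + 0 \\cdot 20 = 10$$\n", "\n", "At slope 1 both norms are unchanged, which corresponds to regular normalization. At slope 0.2 the short document's norm rises from 5 to 9 and the long document's norm falls from 20 to 12, so, assuming the weights are divided by this norm, the short document loses weight and the long one gains weight relative to slope 1. At slope 0 every document gets the pivot as its norm."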
] },
{ "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Score for slope 0.0 is 0.951715374841\n", "Score for slope 0.1 is 0.954256670902\n", "Score for slope 0.2 is 0.955527318933\n", "Score for slope 0.3 is 0.954256670902\n", "Score for slope 0.4 is 0.951715374841\n", "Score for slope 0.5 is 0.950444726811\n", "Score for slope 0.6 is 0.94917407878\n", "Score for slope 0.7 is 0.950444726811\n", "Score for slope 0.8 is 0.94790343075\n", "Score for slope 0.9 is 0.94790343075\n", "Score for slope 1.0 is 0.944091486658\n", "We get best score of 0.955527318933 at slope 0.2\n" ] } ], "source": [ "best_model_accuracy = 0\n", "optimum_slope = 0\n", "for slope in np.arange(0, 1.1, 0.1):\n", "    params = {\"pivot\": 10, \"slope\": slope}\n", "\n", "    model_accuracy, doc_scores = get_tfidf_scores(params)\n", "\n", "    if model_accuracy > best_model_accuracy:\n", "        best_model_accuracy = model_accuracy\n", "        optimum_slope = slope\n", "\n", "    print(\"Score for slope {} is {}\".format(slope, model_accuracy))\n", "\n", "print(\"We get best score of {} at slope {}\".format(best_model_accuracy, optimum_slope))" ] },
{ "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.9555273189326556\n" ] } ], "source": [ "params = {\"pivot\": 10, \"slope\": optimum_slope}\n", "model_accuracy, doc_scores = get_tfidf_scores(params)\n", "print(model_accuracy)" ] },
{ "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "With pivoted normalisation top 78 docs have mean length of 1777.385 which is much closer to the corpus mean doc length of 1577.799\n" ] } ], "source": [ "print(\n", "    \"With pivoted normalisation top {} docs have mean length of {:.3f} \"\n", "    \"which is much closer to the corpus mean doc length of {:.3f}\"\n", "    .format(\n", "        k, sort_length_by_score(doc_scores, X_test)[1][:k].mean(),\n", "        sort_length_by_score(doc_scores, X_test)[1].mean()\n", "    )\n", ")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing the pivoted normalization" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Cosine normalization favors retrieval of short documents. In the plot, when slope is 1 (pivoted normalisation not applied), short documents with a length of around 500 receive very high scores, which shows the bias toward short documents. As we vary the slope from 1 to 0, we introduce a new bias toward long documents that counters the bias caused by cosine normalisation. At a certain point we reach an optimum slope, here 0.2, where the overall accuracy of the model is highest.\n" ] },
{ "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABDEAAAHzCAYAAAA5EoaVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3XmYNGdZL/7vnbwhIUAIYQmBJARkURAkqOzgsIgQIepB\nFJUlwBF/KkY4Ho+KaF5Ez088Iq4sHgEDSAyIINEgojAuiMFgwhLCJksSCAFC9gAm5Dl/VE3eTmeW\nnp7pma6Zz+e66pruru2pmuquu+56nqeqtRYAAACAebffdhcAAAAAYBKSGAAAAMAgSGIAAAAAgyCJ\nAQAAAAyCJAYAAAAwCJIYAAAAwCBIYrBuVbW3ql633eUYkqr6TFU9aoLpjqmq66rKdxOAHUcMMVtV\n9adV9aIJp50oNgGYNy6UmEbb7gKsR1X9cVV9tKq+UVVPX2PaP+2TCMePff7S/vNV519Fy8D22xBV\n1WFV9ZaqurIPzn5klWkPraqTq+qifjhphem+q//fLxsUVtU/jCee+mTUu6vqqqo6dzRIrKoD++Pp\nc1X1lar6o6raMzL+OVV1ZlV9rapes0r5f7Vf7yPHtv/UqvpyVX2pql5fVbcYm+9nq+pT/T76SFXd\nbWTcbavqDVV1aV+214+M+9Oq+npVXdEPl1dVjYz/oX5bL6+qc6rq+0bGfWtVvaMv03XLbMu3VNW7\n+vV+oqq+f2xfXjey3iuq6pfH5n90Vf1Hv03nV9WTVtpvwLYb1Lmwqu5bVe/vf8/PrKpvW2G6m1TV\nq/pzz+VVdVZVPXaV5Z7Q/7b9ztjn39d/vuLv/xrWE2+ITTagql7cn2+/XFW/uca0v1xVn62qy6rq\nlNFz82rn17XOgWvM+8CqemdVXVxVX6yqN1bV7UfmffvYcr9eVR8cGf/ufr7LqursGomNq+r2VfW2\nPpa5rqqOHtveVeORqnpwVb2vL+8HquohI+MeUVUfrKpL+vn/sqruMDL+t6vq4/2851bVU8fWvX
9V\n/Xpftsv7+OCW/bgTqrseGN3uh4/M+5mqunpk3N+OLfsuVfXX/XK/VFUvXu3/zuxIYjCNWnuSuXJ2\nkp9K8h9Z+2Tdknw8ydOWPqjuAvOHknxygvl3rOptdznW8EdJvpbkdkl+LMnLq+qeK0z70iQHJblT\nkvsneWpVnTA6QVUdkOT3kvxblvnfV9WPJdmzzLhTkrw/yWFJfjnJX1TVbfpxv5jkfknuleTu/esX\njMz7uSQvSvLqlTayqr4pyQ8m+fzYqF9PcsskxyT5piSHJ9k7Mt9/T/LMJMe11m6e5HuTfHlk/r/s\nl3lUktsm+e2RcS3Ji1trt+iHQ1prrV/uHZO8LslzW2uHJPn5JG8Y2eb/SvLnSZ61zLbsSfJXSd6W\n5FZJnp3k9TWSXOkdMrLu3xiZ/55J/izJLyU5JMl90u17YD7N+3nkelV1k3S/T69NcmiSk5P8VX9u\nGLcnyXlJHt7/Dr4gyRur6k4rLL4l+c8kT6qq/Uc+f3q6OGQj8cZg9vFKxvbJ3Kmqn0jyfenOOfdJ\n8oT+s+WmfXqSpyR5cJI7JLlpkj8YmWTF8+uIZc+Ba8x7aJJXpItz7pTkiiTXJ8daa48bme8WSf41\nyRtHlv0zSW7fWrtl9p2bD+/HXZfk9CRPXGEXrRiPVNVhSU5L8uJ+mt9KclpVHdrPe06Sx7TWbpXk\niCSfSPLykWVfmeTx/ffs6Ul+r6oeNDL+hUkemOSB/TRPSRcbLnnP6Ha31v5pbH8+fmTc9YnI/vfg\nnUn+vt+eOyZ5fdgWkhisqKp+oaou6LONH62RO75j0x1f3Z3XS/qs7TePjPtMVf1iP/4rVfXqqjpw\nZPzj++zuJVX1nqq692ZvR2vtZa21d+WGP2CrOS3JQ0d+TB+b5ANJLkofGPTX8y/ot++i6u7oH7K0\ngKp6anUZ9y9X1fNHF97P+4tV9cl+/KlVdatJCjYy39Ld7u8fG//j1d1dXxp/bP/5UX0m+4v9Ov+g\n//wG1XprrDlLVS322ez3JLkqyV2q6hkj6/jPqnr2WBm+r/+fXtaX9Xuq6klVdebYdP+jqt46yXZP\nuG9uluS/JfmV1trVrbX3pLswfuoKszw+yW+11r7WWvtsklelu8Af9XNJ/jbJxzIWFFaX1f/VJP9r\ndFxV3T3JsUlOaq19vbX2l0k+lH0n+scn+f3W2qWttS8n+f3R9bbW3tJa+6skF6+yuX+Y5BeSXDP2\n+TFJ3tpau7K1dnmSt6ZLlqT/n56ULtHw0X5dn26tXdKPf0ySI5P8r9baFa21b7TWzh5b/kqB8ZFJ\nLm2tvaNf7unpjpdv6t9/vLX2miQfWWbeb05yRGvtd1vn3Unekxv/31Y6X70gyStaa+9orV3XWruk\ntfapFaYFtsgOiSEWkuzfWvu91to1rbU/SPc7eKNt6c87L2ytnde//5skn06XqF7JF9KdH76n357D\nkjwo3blr9Lyy2j46tro7zZdX1Z+nS85nZPxU+6iqvre62iSXVdV5NVZbsaoeWlX/2i/3vOprqlbV\nTavqJf3/7tKq+ueqOqiqFqrq/LFlfGbpuOjjkb+oqtdV1WVJnl5V31lV7+3X8fmq+oMaSSBV1b1q\nX02DL/THyu2ru5N+2Mh09+vjn81MjDw9yW+31j7fWvt8kpckOWGFaZ+Q5E9aa59rrV2V7uL9h6tq\n9H+1VuJptWu2Zedtrf1ta+3NfUzw1XQ3eh6y3LRVdUySh6VL2C3N/+HW2mjtyQPS3eRIa+2LrbVX\nJLlBbDfimKwQj6RL5lzYl6211v4syZfSxXBLy/7CyHZflz6e6Mfvba19vH/9viT/nO57kz6e/tkk\nP95aO7+f5iOtta+vtb8mGH9Ckgv6eOWrrbX/aq19aI1lMSOSGCyrqu6R5KeTfEefxXxMks8sM93d\nk7whyYlJbpMuK3tajVSPT/Kj/fzflO7O8wv6eY9Nd+H44+
nuWL8yyduqy3QuV6alqmXLDX+4Gdvd\n+1q6Oy9P7t8/Lft+1Jey289IdwJbSHKXJDdPd2G5dGf4ZelqAtwhya3TXeQtOTHJ8Ukeni7DfEm6\nE8skPpnkof3/5IUZyYpXV4X+pCRP7ccfn+Ti/qT91+mCqTulyxyfMrY9q3lKkv/eb+Nn0yVzvrdf\nxzOSvLT2JUvun+5O1c/1mfuHpztu/irJnUcDr3QXqScvt8Kqetkq/+vxC+sld09ybWvtkyOffSD7\nTprL2W/s9beOlOFO/fa9KMuf0P53uv/zRWOf3yvJp/pAZaVyjK/3yBpr9rHCOpf+z19rrb19mdF/\nlO5u0KH9ifyJ6b6TSXcM3jHJvfuA81N90Li0ngemS9acXF2i6301UsWy91N9sHhmVf23kc//Pcm5\nVfWE6qpx/kC679EHM50b/C96n62uqcirq+rWI58/IF1u8IN9kPu6mjApCMzGDooh7pUb/459MKuf\nV5bWd3hf3nNWmqT/+7rsq/355HTny+svuFbbR/22vjXdufRWSd6U7nd/qZbcSvtouZok465M8pT+\nXP69SX6y+maC/fnx9HQ1FW+T5L7par0mXQ2+Y9NdVB6WrmbejZoR9sZjkOOTvKlf5xuSfCPdBemt\n++U9Kl3N2vTnzL/vy3FEkrsm+Yf+4vfd6WrQLnlqklNaa98YL0BV/egqx8VXqurI8Xl690x3bl+y\n2nHRcuPz/oFJRmscrnR+XbLSOXCSeZc8PMmHVxj3tCT/tJSEW1Jd04mvpquR+u7W2kpJi3GrxSPJ\njWOc/TKy/6rq6Kq6JMnV6W4o/dZyK6mqmyb5zpHtuneSa9PVcLqwqj5WVT81MktLcmx1TUE+Vt0N\nyfHk1p/1Sa93VNV9Rj5/YLr/w+n9/O+uqvFYha3SWjMYbjSkOxlclO6EccDYuL1JXte//pUkfz4y\nrpJckK46ZdJdOD97ZPzjknyyf/3yJL82tuyPLs07g2365yRPW2Oa16S7aH1Iump1t0x3p+Sg0fmT\n/EOS/29kvrunqzK/f7q7828YGXdwuoDkkf37jyy97t8f0c+7X7rM9XVJ9ptwm85K8oT+9TuS/Mwy\n0zwoyReXW+bo/7J/f4P1pwsE9q5RhrckObF//cokL1lhupcl+fX+9b2SfGX82Nrg//dh6TL7o5/9\neLqT7nLTvy7Jm9MlZ+6arlrvV0fG/1WSJ40cF782Mu470jVPutH/LF2w9N6xdf1Gktf0r1+U5F/S\nBX63T3JGukDt8LF5XrQ0z8hnt0hXzfjoke/X6LF0h3RVHb/RD+9Y2sfp7nxcl66m0SHpElofS/Lf\n+/F/3I9/Rn8c/3C6BNut+/HHpguS90v3Pb48yYNH1v2sdFVVr0lXC+Nxy+zzuya5buyzA/p9//P9\n68ek+768vR9/s3R3MvdL10zoTUn+dmT+/0ryqX7ZN0vyF0lev1nHlcFgWP+QHRJD9OU7Zeyz16er\nabfafAeku8B++SrTnJAurjgoXZxxSJL3pjtnX//7v8o++q50F6WfG1vue5b2yyr76GEj+/eRq23L\nyHy/m+R3+te/lOTNy0yzX7qLznsvM24hyfljn12//v64WFyjDM9N8pf96x9J8v4VpvuhJP/Sv94/\nyYXpEmqbeYxfm+TuI+/vlrHz28i4Z6U7394pXVz5tnTn2wf041c8v2btc+Cq5+aR6e6TrobnQ1Yo\n4yezQozc78PHJnneMuP29Nty9Njnq8Ujt04XXzw53Xfl6f00N/q+9Nv2v5b21TLjT05y+sj7H+3L\n83/TJYrunS4GfnQ//s5J7tS//tZ0ScZfHJn/Qf18N03X/PfCdE15kuTv0sUc39Nv9/9MF79sWixr\nmHxQE4Nlte5u9nPTnVQuqq4ToiOWmfQO6dqALs3Xkpyf7o7vktHqg+f18yTdj/nPjWa9090tXm49\nW6m1rinCbdPd8TmttT
beFOWIdLUSlpyX7gft8H7cBSMLuzo3bBpwTJK3jGzzR9KdDA/PGqrqadVV\n71ya91vTXQwn3b77z2VmOyrJZ9sNqwSux3j1z8dV1b/1Wf9LkhyX7oS0WhmS7kTzo/3rpyY5tbU2\n3hxiI65MFwSOOiTdhfVyTkzy1XRtLd+S7q7P55Kkqp6Q5OattTf101b2NSXaL11C5rlj+3TprsJK\n5bi8f/0b6ZJPZ6dLZrwlXQ2S8Rody9XE2Jsu+D9vhenemC5Qunm/zk9lX3vNr/Z/f6u1dnnrmtC8\nMt3/b2n8p1trr2ldU5JT0/3vH5IkrbWzWtdU47rW1QL5s/RVP6vq0emqx35Xa+2AdMH1q2qFDvBG\n9cfA96e703dhkuf123FBP/6q1tp/9Ov9YpLnJHlMdc2Hki5gfk1r7ZOtq/3yv0e2CdgGOyiGuCKr\n/57fSH+OeF262mjPWWsFfXzxN+mSFYe11t6b7nd9qZbCavvoiPTnrRGjsclK++gOWUNVPaD2dex4\naZKfyL5z/VHpzi/jbpMuKbNSHLCWC0bfVNXd+5oAF1bXxOQ3JihD0iUJ7lldE4nvTnJZm7wGwaTG\nz/WH9J8t59XpasAupms+9K7+86Xz3Irn17XOgavNu6Sq7pquFsSJfXybsfEPTReD/sVyhe9jgr/t\n1/uEVffKPivGI621i9P1J/I/0iXwvidd0u+C8YW0rsnrUl80N7hurar/k65GzGitm6VY59da16T3\nQ+n65DquX96n+/gnrbUPJ/m1dH2MLa3vvf18X22t/WaSS9PdJFta9j+3rvnqta213053PI7WMmaL\nSGKwotbaKa21h6U7CbZ0FynjPtePT9LV6U53Yhk9qR499npp3HlJfqO1dquR4eb9xdONVNce9IoV\nhpdNv6Uren26H9jXLjPu8+mSEUuOTpeI+EK6C7GjRsp9cPaddJNuux87tt0Ht9YuXK0wffXNP05X\nRfew1nV49OHsu4g9P93dr3HnJzl6mepySXfCPXjk/e2Xmeb66p7VtUV+c7pqfbfry3D6BGVIa+2M\nJP9VXROFH0kX5C2rql6xyv96pfaHH0+ypz9ZL/m2rFB1sj/pP6W1dkRr7d7p7jSc0Y9+ZJLv6AOn\nC9OdIJ9bVW9JVxvi25Oc2o97Xz/PBdX1rn1Our5Dbj5WjnP69X6ttfYzrbUjW2t3TVcjZbngarmm\nPo9McuJIuY5K13Hcz4+s55X9yfeq3DBJ8bF0dxBWWs8HVhi3XDnG3TddNdT/SJI+WDwjyaMnmDet\ntQ+11hZaa7dprT0uXbXx960x29L5a9omK8AM7ZAY4px0d7BH3ScrNBHpy/+qdDdBntiWab6wgtem\nizdGOwlcOq+utI8uSBdvjCZ8Mjpt1rmPxrwhXVOVI1trSx1ELpXpvIz0UTDiy+mSN8vFAVdlJN7o\nY5Lbjk0zfr55ebobPXdtXROTX86+3/7z0jXnvZE+MfTGdM1hn5Ll47ilcvzYKsfF5bVyc5Jz0p37\nlqwWb7TW9eNw59ba0f02XdBaG09ArcdE13B97PjOdBf1f7bCZE9PV7Pm6jUWd0BW2OfLWC0eSWvt\nn1pr92+t3TpdU5Zvzsrn/QPS1UIZ7XvuhemSH49prY0mj1aKCVaLZVbrI6ONjL9BnNR/F9kubQ6q\ngxjmb0jXPOKR6apU3SRdFvk1/bi92VcV9B7pLoQfme5H5n+mq5K2px//mXRf+jumaxv5L9nXpODb\n052E7p/uB+Jm6e7G3nyTt+WAdHcG3pOub4eDktQK0/5pkhf1r2+V5BEj40abkzwr3UXzMemyzH+R\n5LX9uHulu3vzkH7f/Xa6KvZLVSafm66ZxlKTgNsmOb5/fUxWaE6SLtv81f5/s3+6av/XJHlmP/4H\n+/15v35/3jVdwLdfurv+/yddAHFQ9lVTfHS6zpSOSlfF8a9y4+Ykzxopwy3SJWse3q/j
cekCk6Wq\nq9+ZrorgI/v13jHJPUbmf366E8wnZnTcnpIu8Dq43/+XJvmWFaa9S7rk0v79dnxpadr+f3q7fjg8\nXRb/JUkO7cffbmT4jn6fHZF9VSXf2+/vg5L8QG7YLOMO/VDp2leel76aYz9+/36+/z9d4HVguo7l\nku47NFqu89K1Mz24H/+udB2FHpSuKuTL0lep7cefnK45yc3T3Y07N8kzRpb9lXTBxP798fTldAmz\npePr5v3/9THp7kQuVfl+eL//vq1/f2w/7+h2HZTuGL6u36YDR8bdux9/cMaqZ6b7fbhHv95bJzk1\nXbvnpXmfke4Oz537+d+Y5OTt/g01GHbzkB0SQ/Rl+ky6mnsHprsL/uml8i0z/SvS/f7fbIJln5Du\nru7S+0dk3znm10f214r7qN+3n+3Ld0C6O/D/lX3n5FX3UVZpTpKuOdBSzHP//v1SnHN0unPAk/py\n3Dr7fv//MN1d9SPSnUse1JfzluniheP6sp6UG8ZG1x8XI2U4I10NlUp3kfuxpX2WLh75fLo+Mw7s\n399/ZN4HpzuXXJ7kqBkc4z+RLhmxdE7/cEaaPo1Ne6t0SZ9Kdx78UPqmnP341c6va50DV5v3jv0+\n+LlVtuOm6WKlhbHP75EuNrpp//96Srqmnvcdmeagft3XpfvOHzQybq145Nh+uYeka6o0+l34gX55\n+6WLkd+Y5MyR8b+ULgY/fIVt+sd038WbJPmWdMfuI/pxj1uarz+mPpSuQ/iki4WXYveD0jVzvSjJ\nrUZ+165K10xu/3Q1Rz+RFX4PDLMdtr0Ahvkc0l1UnNH/GF6crmre7ftxJ6U/kfXvvz9dRvrSdBe9\n3zIy7tPpnqJwTroLudeM/ch9T7rM6yXpTkanZvOTGIv9D+w3+r/XZYU2sxnr+2Bs3GgSo9KdWM9L\n19butUluOTLt09IFFl9Od+H+qew7UVf/w/fRfv9+MvuCsmP6ci7bJ0a6wObidBeML+n39zNHxv9E\nv9wr0iULloKKo9I1W/hyP+/vjszzh/3+/3i6JM/16x9ffv/ZT6WrcXJJv91vyA37i/j+dEHn5f0y\nv3tk3FH98k+a0XF7q347r0wXeD55ZNzDklwx8v5J6e5wXZWuf4vvXmW5qx0XN/qfpbsT9u50TR3O\nzQ37rXhYuu/FVf24Hxlb3t6R43Rp+NUV1v3psWUfk+67+uX+ODk9yTeNjL9FukTP5f2x+4Kx5T20\nP26uSPe9fMjIuH9K9x2/LF1zmB8am/en053ML08XND1vrFxL27L0PfzUyPjfSpdAuSJdteq7jIx7\ncrrvz5XpfiP+NF0toPF99sV+ODkj30WDwbD1Q3ZWDHHfdLXlru7/ftvIuOenb4/f/+5f1093xcjw\nIyss9+nparAtN+5FSV494T769nTnsMvTJdxPyQ3Pycvto5uN7N+VkhhPTHcevTxd8vv3x/5vD03X\n2eNl6c4nT+0/PyjdI8wv6Mu7mD5p3W/z59NdGP5cbhgb3eC46D97WLrz5BXpzkEvHN1n6W4a/X26\n88eF6Z6uNTr/x7NCv1ibdGy8uD++L07ym2Pjrkh/Dk3XX8ZH0533P5OuOerotCueX7PGOXCNeU/q\nj8nR4/HysXX/SLqmpOPb9s39//fy/tg5I8n3jU0zfl7/xsi4Y7J6PPKGvtyXpjtmbzMy7jkj23xh\nP+1RY+v96th2jfZrcYckb+8//890TypZGvd/0sWwV/bj9mbfjaKlzlqv7Mv9ziT3G9vmH0gX61yW\nLlGz7I0yw+yH6v8hM9NXFzszXbWpJ4yNO6E/mJbaQP1Ba+3VMy0QW6qqPp3uTv671pyYHa/vRfqi\nJMe21qZtMwvscmKL3UEMwUZU1T8k+TPff9h59qw9yYb9bLrqVuOPD0y6dkantNZO3IJyANvvJ5O8\nTwID2CCxBbCi6h75fr90j20FdpiZduzZd4ZzXJI/
yfKdptQKnwM7TFV9JsnPpKtCCjAVsQWwmqo6\nOd3jMH+2dZ1KAjvMrGtivDRdpyjjj6da0pI8sX9awcfTtaG+0eN1GK7W2p23uwzMh9baMdtdBmBH\nEFvsEmIIptFae/p2lwGYrZklMarq8Um+2Fo7q6oWVpjstCRvaK1dU1XPTtch26OWWdZsO+4AADas\ntTbTGhCbFVuIKwBg/q0UV8yyOcmDkxzfd8p0SpJHVtUNntPcWvtKa+2a/u2r0vWwvKzt7gF1aMNJ\nJ5207WUY0mB/2Wf22fwN9tew9tkW2bTYYrv/V0MbfB/tM/ts/gb7yz7byftsNTNLYrTWnt9aO6p1\nVQGfnORdrbWnjU5TVbcfeXt8uk66AABuRGwBAGzF00mSroOtliRV9cIkZ7bWTktyYlUdn+TadM8Q\nPmGLygMADJvYAgB2oS1JYrTWFpMs9q9PGvn8+UmevxVl2G0WFha2uwiDYn+tn322fvbZ+thf67eb\n9pnYYmvtpmNrs9hn62efrY/9tX722frN4z6rtdqbzIOqakMoJwDsVlWVNuOOPTeLuAIA5ttqccUs\nO/YEAAAA2DSSGAAAAMAgSGIAAAAAgyCJAQAAAAyCJAYAAAAwCJIYAAAAwCBIYgAAAACDIIkBAAAA\nDIIkBgAAADAIkhgAAADAIEhiAAAAAIMgiQEAAAAMgiQGAAAAMAiSGAAAAMAgSGIAAAAAgyCJAQAA\nAAyCJAYAAAAwCJIYAAAAwCBIYgAAAACDIIkBAAAADIIkBgAAADAIkhgAAADAIEhiAAAAAIMgiQEA\nAAAMgiQGAAAAMAiSGAAAAMAgSGIAAAAAgyCJAQAAAAyCJAYAAAAwCJIYAAAAwCBIYgAAAACDIIkB\nAAAADIIkBgAAADAIkhgAAADAIEhiAAAAAIMgiQEAAAAMgiQGAAAAMAiSGAAAAMAgSGIAAAAAgzDz\nJEZV7V9VZ1XVacuMO7CqTq2qT1TVv1XVnWZdHgBg2MQWALB7bUVNjJ9N8pEkbZlxz0pycWvtbkle\nmuTFW1AeAGDYxBYAMKaqJhqGbqZJjKo6MslxSf4kyXJ76/gkJ/ev35zkUbMsDwAwbGILANjdZl0T\n46VJfj7JdSuMv2OS85OktXZtksuq6rAZlwkAGC6xBQDsYntmteCqenySL7bWzqqqhY0ub+/evde/\nXlhYyMLChhcJAExpcXExi4uLW7rOzYwtxBUAMD/WE1dUa8s1J924qvrfSZ6a5NokByU5JMmbW2tP\nG5nmb5Psba39W1XtSXJha+22yyyrzaqcAMDGVVVaazNtaLtZsYW4AoCdaNL+LoZwDlwtrphZc5LW\n2vNba0e11u6c5MlJ3jUaZPTeluTp/esfTPIPsyoPADBsYgsAYGbNScZU+h7Eq+qFSc5srZ2W5FVJ\nXldVn0hycbqABABgLWILYLB20h1z2Goza06ymVT7BID5thXNSTaLuALYbpIYzMJOOq62pTkJAAAA\nwGaSxAAAAAAGQRIDAAAAGARJDAAAAGAQJDEAAACAQZDEAAAAAAZBEgMAAAAYBEkMAAAAYBAkMQAA\nAIBBkMQAAAAABkESAwAAABgESQwAAABgECQxAAAAgEGQxAAAAAAGQRIDAAAAGARJDAAAAGAQJDEA\nAACAQZDEAAAAAAZBEgMAAAAYBEkMAAAAYBAkMQAAAIBBkMQAAAAABkESAwAAABgESQwAAABgECQx\nAAAAgEGQxAAAAAAGQRIDAAAAGARJDAAAAGAQJDEAAACAQZDEAAAAAAZBEgMAAAAYBEkMAAAAYBAk\nMQAAAIBBkMQAAAAABkESAwAAABgESQwAAABgECQxAAAAgEGQxAAAAAAGQRIDAAAAGARJDAAAAGAQ\nZprEqKqDquqMqjq7qj5cVXuXmeaEqvpSVZ3VD8+cZZkAgGESVwAAe2a58Nba16rqEa21q6tqT5J/\nqaq3t9bOGJ0s
ySmttRNnWRYAYNjEFTBfqmqi6VprMy4JsJvMvDlJa+3q/uVNkhyQ5LqxSaofAABW\nJa4AgN1t5kmMqtqvqs5OclGSv2ut/fvYJC3JE6vqA1X1pqo6ctZlAgCGSVwBALvbTJuTJElr7bok\n962qWyZ5S1Xdq7V2zsgkpyV5Q2vtmqp6dpKTkzxqfDl79+69/vXCwkIWFhZmWm4AYGWLi4tZXFzc\n8vWKKwBg51lPXFFb2Uatqn4lydWttZesMH7/JBe31g4d+7xpSwcA86uq0lrb0mYc4grYXvrEmJ59\nxyzspOM6PwU+AAAgAElEQVRqtbhi1k8nuU1VHdq/vmmS705y7tg0tx95e3ySj8yyTADAMIkrAIBZ\nNyc5IsnJ/Z2Q/ZKc2lo7vapemOTM1tppSU6squOTXJvk4iQnzLhMAMAwiSsAYJfb0uYk01LtEwDm\n23Y0J5mWuAI2x06qur7V7DtmYScdV9vWnAQAAABgs0hiAAAAAIMgiQEAAAAMgiQGAAAAMAiSGAAA\nAMAgzPoRqwAAsOvspKcEAMwTNTEAAACAQZDEAAAAAAZBEgMAAAAYBEkMAAAAYBAkMQAAAIBBkMQA\nAAAABkESAwAAABgESQwAAABgECQxAAAAgEGQxAAAAAAGQRIDAAAAGIQ9210AAACAoamq7S4C7Epq\nYgAAAACDIIkBAAAADIIkBgAAADAIkhgAAADAIEhiAAAAAIMgiQEAAAAMgiQGAAAAMAiSGAAAAMAg\n7NnuAgAAwFaoqomma63NuCQATEtNDAAAAGAQ1MQAAAAGTS0b2D3UxAAAAAAGQRIDAAAAGARJDAAA\nAGAQJDEAAACAQZDEAAAAAAZBEgMAAAAYBEkMAAAAYBD2bHcBAADYvapq4mlbazMsCQBDIIkBAAC7\nxKRJIwkjYF5pTgIAAAAMgiQGAAAAMAgzS2JU1UFVdUZVnV1VH66qvctMc2BVnVpVn6iqf6uqO82q\nPADAsIktYH2q6kYDwNDNLInRWvtakke01u6b5L5JHltVDxib7FlJLm6t3S3JS5O8eFblAQCGTWwB\nAMy0OUlr7er+5U2SHJDkurFJjk9ycv/6zUkeNcvyAADDJrYAgN1tpkmMqtqvqs5OclGSv2ut/fvY\nJHdMcn6StNauTXJZVR02yzIBAMMltgCA3W3WNTGu66t8HpnkAVV1r1muDwDY2cQWALC77dmKlbTW\nLquqdyd5bJJzRkZ9LsnRST5fVXuS3LK19pXllrF3797rXy8sLGRhYWFm5QUAVre4uJjFxcVtW/9G\nYwtxBQDMj/XEFdVam0khquo2Sa5trV1aVTdN8o4kv9laO31kmp9Kcu/W2k9W1ZOTfH9r7cnLLKvN\nqpwAwMZVVVprM330wWbFFuKK+bKeJ2Zs9P826bo24/jYynVttAzLmaRc87CNS7ajLFvxtBe/VazH\nPH0nN2q1uGKWNTGOSHJyVe2frtnKqa2106vqhUnObK2dluRVSV5XVZ9IcnGSGyUwAAB6YgvYIh7H\nCrO3k5IOW2lmNTE2kzsmADDftqImxmYRV8wXNTHmsybGZlITY3p+q3a2zT5u5+F3Z7NsV00MAACA\nQZqXJBBwQ5IYAAAAPckLmG+SGAAAAOwqO6npxW6z33YXAAAAAGASkhgAAADAIEhiAAAAAIMgiQEA\nAAAMgiQGAAAAMAieTgIAANtkkickeDoCwD5qYgAAAACDoCYGAAAAc2uSGkuJWku7hZoYAAAAwCBI\nYgAAAACDIIkBAAAADIIkBgAAADAIkhgAAADAIHg6CQAAsCtN+tQLYH6oiQEAAAAMgiQGAAAAMAiS\nGAAAAMAgSGIAAAAAgyCJAQAAAAyCJAYAAAAwCJIYAAAAwCDs2e4CAACw81TVti+7tTazMgCwPSQx\nAABgQrNMzjAZ/wPY3SQxAABgjo1ftKthsnuslrBxHLBb6RMDAAAAGISJkhhVdX
BV3WPWhQEAdgex\nBQAwjTWTGFV1fJKzkryjf39sVb1t1gUDAHYmsQVbqaquHwAYvklqYuxN8oAklyRJa+2sJHeZYZkA\ngJ1tb8QWAMAUJkliXNNau3Tss+tmURgAYFcQWwAAU5nk6STnVNWPJdlTVXdLcmKSf51tsQCAHUxs\nAQBMZZKaGM9Jcq8kX09ySpLLkzx3loUCAHY0sQUAMJVa7fnCVbUnyTtba4/YuiItW47mOcgAML+q\nKq21NXtOnIfYQlyxNeahI83W2lTlWCM+3kiRNsWkx+88lDWZvLyT2sh2jZdlXvbRNHbT79ik/6f1\n7JNZLHO9NrsM87BNm2W1uGLVmhittWuTXFdVh86kZADAriK2AAA2YpI+Ma5K8qGqemf/Oklaa+3E\n2RULANjBxBYAwFQmSWL8ZT8s1TmpkdcAAOsltgAAprJqnxjXT1R1YJK7928/2lq7ZqaluvH6tV0F\ngDk2aZ8YI9NvW2whrtga89DXwLR9YizNu5x52a5JzENZk/nqE2PURo6PebCbfsf0iaFPjFFr1sSo\nqoUkJyf5bP/R0VX19NbaP25eEQGA3UJsAQBMa5JHrP5Okse01h7eWnt4ksckeekkC6+qo6rq3VV1\nTlV9uKpu1Na1qhaq6rKqOqsfXrC+TQAABmaq2EJcAQBM0ifGntbax5betNY+3j8ebRLXJHlea+3s\nqrp5kvdX1Ttba+eOTfePrbXjJ1wmADBs08YW4goA2OUmCRjeX1V/kuT16Tre+rEkZ06y8NbaF5J8\noX99ZVWdm+QOScaDjeE2RgMA1muq2EJcAQBM0pzkJ9MFBycm+Zkk5/SfrUtVHZPk2CRnjI1qSR5U\nVWdX1elVdc/1LhsAGJQNxxbiCgDYndZ8OklV3SzJ11pr3+jf75/kwNba1ROvpKvyuZjk11trbx0b\nd4sk32itXV1Vj0vye621u49NoxdxAJhj63k6yUZjC3HFMMzDUx88nWT7y5p4Osms7KbfMU8n8XSS\nUZM0J3lXkkclubJ/f3CSdyR58IQrPyDJm5O8fjzQSJLW2hUjr99eVS+rqsNaa18ZnW7v3r3Xv15Y\nWMjCwsIkqwcAZmBxcTGLi4vTzj51bCGugJUN+YIc2N3WE1dMUhPj7Nbafdf6bIV5K90j1C5urT1v\nhWkOT/LF1lqrqvsneWNr7ZixadwxAYA5ts6aGFPFFuKKYZmHC+rdVhNjHsq2HDUxZmM3/Y6piaEm\nxqhJamJcVVXf3lp7f7+w70jy1QnX/ZAkT0nywao6q//s+UmOTpLW2iuT/GCSn6yqa5NcneTJEy4b\nABimaWMLcQXrMuQLVACWN0lNjO9M8udJLuw/OiLJD7fWJnpCyWZwxwQA5ts6a2Jsa2whrtgaQ08g\nzHNth3ku23LUxJiN3fQ7piaGmhg3GDfJBlTVTZLcI12P3x9rrV2zuUVcc/2CDQCYY+tJYvTTb1ts\nIa7YGkO+OEzmO1Ewz2VbjiTGbOym3zFJDEmMUWs+YrWqfijJQa21DyX5gSSnVtX9NrmMAMAuIbYA\nAKa1ZhIjya+01i6vqoem60n81UleMdtiAQA7mNgCAJjKJEmMb/R/H5/k/7bW/jrJAbMrEgCww4kt\nAICpTJLE+FxV/XGSH07yN1V10ITzAQAsR2wBbLsh94cBu9kkTye5WZLHJvlga+0TVXVEknu31v5u\nKwrYl0EHXAAwx9b5dJJtjS3EFVtj6BeI89x55jyXbTnz2rHn0O2m3zEde+rY8wbjBrIBgg0AmGPr\nfTrJdhJXbI2hX2jOc6Jgnsu2HEmM2dhNv2OSGJIYo1TdBAAAAAZBEgMAAAAYhImSGFV1TFU9un99\ncFUdMttiAQA7mdiCeVdVmi0AzKE1kxhV9ewkb0ryyv6jI5O8ZZaFAgB2LrEFADCtSWpi/HSShya5\nPElaax9PcrtZFgoA2NHEFgDAVPZMMM3XW2
tfX6pOV1V7ksx/d6YAwLwSW8AGaObCuJ30VApYyyQ1\nMf6xqn45ycFV9d3pqn+eNttiAQA7mNgCAJhKrZWNq6r9kzwryWP6j96R5E+28gHrnucOAPNttee5\nLzPttsYW4oqtsVNqC4wfKztlu7bSZn/f/A86o/t1p9fEmMX2zcM+2+wyzMM2bZbV4opJkhg3S/K1\n1to3+vf7JzmwtXb1ppd05TIINgBgjq0zibGtsYW4YmvslAtNSYyNk8SYDUmMG5PE2P5t2iyrxRWT\nNCd5V5Kbjrw/OMnfb0bBAIBdSWwBTGzpcbceewskkyUxDmytXbn0prV2RbpgAwBgGmILAGAqkyQx\nrqqqb196U1XfkeSrsysSALDDiS0AgKlM8ojV5yZ5Y1Vd2L8/IskPz65IAMAOJ7YAAKayZseeSVJV\nN0lyj3TPcP9Ya+2aWRdsbP064AKAObaejj376bctthBXbI2d0neBjj03bqPfN/t8eTr2vDEde27/\nNm2WDT2dpF/Ag5PcOV3NjZYkrbXXbmYh11i/YAMA5tgUSYxtiy3EFVtjp1x4SmJsnCTGbEhi3Jgk\nxvZv02ZZLa5YszlJVb0+yV2SnJ3kGyOjtiyJAQDsHGILAGBak/SJ8e1J7umWBQCwScQWAMBUJnk6\nyYfTdbgFALAZxBYAu1xVTTTAuElqYtw2yUeq6n1Jvt5/1lprx8+uWADADia2AACmMkkSY2//tyWp\nkdcAANPY2/8VWwDutrNpdlLHlqxs0qeTHJPkrq21v6+qg5Psaa1dPuOyja5fs1kAmGNTPJ3kmGxT\nbCGu2Bo75cLU00k2bq3vm306naE/nWQ9Zd7sY2Q9y/R0ku2xWlyxZp8YVfXsJG9K8sr+oyOTvGXz\nigcA7CZiCwBgWpN07PnTSR6a5PIkaa19PMntZlkoAGBHE1sAAFOZJInx9dbaUqdbqao90W4VAJie\n2AIAmMokSYx/rKpfTnJwVX13uuqfp822WADADia2AACmsmbHnlW1f5JnJXlM/9E7kvzJVvaIpQMu\nAJhv6+nYc7tjC3HF1tBZI0t07DkbOvacno495+c4WMlqccVETyfZboINAJhv6306yXYSV2wNF6Ys\nkcSYDUmM6UlizM9xsJLV4oo9q8z0oVWW2Vpr99lwyQCAXUNsAQBs1IpJjCRP6P/+VP/3dUkqyY/N\ntEQAwE4ltgAANmSSPjHObq3dd+yzs1prx860ZDdcn2qfADDH1tknxrbGFuKKraGJAEs0J5kNzUmm\npznJ/BwHK1ktrpjk6SRVVQ8defOQdHdNAACmIbYAAKayWnOSJc9M8pqqumX//tIkz5hdkQCAHU5s\nMWDumgOwnSZ+OslSoNFau2ymJVp+3ap9AsAcm+bpJNsVW4grNkYSg/XSnGQ2NCeZnuYk83McrGTa\np5M8tbX2uqr6uSRt5PNK14P476yx0qOSvDbJ7fr5/7i19vvLTPf7SR6X5OokJ7TWzppgmwCAgRFb\nAAAbtVpzkoP7v7fISKCxDtckeV5r7eyqunmS91fVO1tr5y5NUFXHJblra+1uVfWAJC9P8sAp1gUA\nzD+xBQCwIaslMb6p//uR1tob17vg1toXknyhf31lVZ2b5A5Jzh2Z7PgkJ/fTnFFVh1bV4a21i9a7\nPgBg7oktAIANWe3pJMf11Tt/aaMrqapjkhyb5IyxUXdMcv7I+wuSHLnR9QEAc0lsAQBsyGo1Md6e\n5JIkN6+qK8bGtdbaIZOsoK/u+RdJfra1duVyk4wve5LlAgCDI7YAADZkxSRGa+3nk/x8Vb2ttXb8\nNAuvqgOSvDnJ61trb11mks8lOWrk/ZH9Zzeyd+/e618vLCxkYWFhmiIBAJtgcXExi4uL65pnnmIL\ncQUAzI/1xBUTP2J1vfrqoicnubi19rwVpjkuyXNaa8dV1QOT/G5r7Uadb3kUGgDMt2kesTrFOjYl\nthBXbI
zHYbJeHrE6Gx6xOj2PWJ2f42AlUz1idWTmJyb5zSSHZ1/1zEmqfD4kyVOSfLCqlh5t9vwk\nR/cLeGVr7fSqOq6qPpnkqiTPWHNrAIBBE1sAANNasyZGVf1nksePPr5sq7ljAgDzbT01MbY7thBX\nbIy75qyXmhizoSbG9NTEmJ/jYCWrxRWrPZ1kyRe2M4EBAOw4YgsAYCprNidJcmZVnZrkrUn+q/+s\ntdb+cnbFAgB2MLEFADCVSZIYt0zy1SSPGftcoAEATENsAQBMZWZPJ9lM2q4CwHzbiqeTbBZxxcbo\nv4D10ifGbOgTY3r6xJif42AlG+oTo6qOqqq3VNWX+uHNVXXk5hcTANgNxBYAwLQm6djzNUneluQO\n/XBa/xkAwDTEFgDAVCZ5xOoHWmvfttZns6TaJwDMt3U+YnVbYwtxxcao+s96aU4yG5qTTE9zkvk5\nDlay0UesXlxVT62q/atqT1U9JcmXN7eIAMAuIrYAAKYySRLjmUl+KMkXklyY5ElJnjHLQgEAO5rY\nAgCYiqeTAAAb5ukku4eq/6yX5iSzoTnJ9DQnmZ/jYCUbfTrJa6vq0JH3t6qqV29mAQGA3UNsAQBM\na5LmJPdprV269Ka1dkmS+82uSADADie2AACmMkkSo6rqsJE3hyXZf3ZFAgB2OLEFADCVPRNM85Ik\n762qNyapdJ1v/cZMSwUA7GRiCwBgKhN17FlV90ryyCQtybtaax+ZdcHG1q8DLgCYY+vt2HM7Ywtx\nxcbohJH10rHnbOjYc3o69pyf42Alq8UVnk4CAGyYp5PsHi44WS9JjNmQxJieJMb8HAcr2dDTSQAA\nAADmgSQGAAAAMAiSGAAAAMAgSGIAAAAAgyCJAQAAAAyCJAYAAAAwCJIYAAAAwCBIYgAAAACDIIkB\nAAAADIIkBgAAADAIe7a7AACwW1XVRNO11mZcEgCAYVATAwAAABgESQwAAABgECQxAAAAgEGQxAAA\nAAAGQRIDAAAAGARJDAAAAGAQJDEAAACAQZDEAAAAAAZBEgMAAAAYBEkMAAAAYBAkMQAAAIBBkMQA\nAAAABkESAwAAABiEmSYxqurVVXVRVX1ohfELVXVZVZ3VDy+YZXkYlqqaaABgdxBXAAB7Zrz81yT5\ngySvXWWaf2ytHT/jcgAAwyeuAIBdbqY1MVpr/5zkkjUmcysdAFiTuAIA2O4+MVqSB1XV2VV1elXd\nc5vLAwAMl7gCAHa4WTcnWct/JDm6tXZ1VT0uyVuT3H2by7QrradvidbaDEsCAFMTVwDADretSYzW\n2hUjr99eVS+rqsNaa18Zn3bv3r3Xv15YWMjCwsKWlJHdY9JEjiQOQLK4uJjFxcXtLsYNiCsAYJjW\nE1fUrC/IquqYJKe11u69zLjDk3yxtdaq6v5J3thaO2aZ6ZoLx9max5oYW51UkMQAttpO+t2pqrTW\nZt4fhbhi+3kyGOu11vfNMTWd0f06xPPJesq82cfIepY5y3222WWYh23aLKvFFTOtiVFVpyT5riS3\nqarzk5yU5IAkaa29MskPJvnJqro2ydVJnjzL8gAAwyWuAABmXhNjM7hjMntqYuyszCUwDDvpd2er\namJsBnHFxrhrznqpiTEbamJMT02M+TkOVrJtNTFgHpMjAAAADNN2P2IVAAAAYCKSGAAAAMAgaE4C\nsAbNogAAYD6oiQEAAAAMgiQGAAAAMAiakwDE490AAGAI1MQAAAAABkESAwAAABgESQwAAABgEPSJ\nweBVlcdaTsBjQgEAgKGTxIA5MWmSQYIBAADYrTQnAQAAAAZBTQwYmPEaG2pmAAAAu4WaGAAAAMAg\nqInBuk3Sd4PaAQAAAGw2SQx2tfU8sQMAAIDtpTkJMyE5AAAAwGaTxAAAAAAGQRIDAAAAGARJDAAA\nAGAQJDEAAACAQZDEAAAAAAbBI1aZK55qAgAAwErUxAAAAAAGQRIDAAAA
GATNSYAtN2mzodbajEsC\nAAAMiSQG7BD6EwEAAHY6SQyYktoEAAAAW0ufGAAAAMAgSGIAAAAAgyCJAQAAAAyCPjGAQdOhKQAA\n7B5qYgAAAACDoCYGMzMPd8jnoQwAAABsDkkMmLG1EikewQoAADAZzUkAAACAQZDEAAAAAAZBEgMA\nAAAYBEkMAAAAYBAkMQAAAIBBmGkSo6peXVUXVdWHVpnm96vqE1X1gao6dpblYb55HOp0qsq+A3YF\ncQUAMOuaGK9J8tiVRlbVcUnu2lq7W5JnJ3n5jMuzIUsXi2sNAMBM7Ki4AgBYv5kmMVpr/5zkklUm\nOT7Jyf20ZyQ5tKoOn2WZAGBaktnbS1wBAGx3nxh3THL+yPsLkhy5TWWBibhIAZhb4goA2OH2bHcB\nkoxfDbblJtq7d+/1rxcWFrKwsDC7EgEAq1pcXMzi4uJ2F2M54goAGJj1xBXV2rLn9k1TVcckOa21\ndu9lxr0iyWJr7c/79x9N8l2ttYvGpmuzLuckJr37Pg9lXa+h1yxYaZ8PYbuWyj5PZd2C34VNK8c8\n7bdkmN9/JrfZ54GddF6pqrTWZv6F3ElxxVDN2+8u82+t75tjajqj+3WI55P1lHmzj5H1LHOW+0xc\nsbLV4ortbk7ytiRPS5KqemCSS8cDDWZH++354X8AsCnEFQCww820OUlVnZLku5LcpqrOT3JSkgOS\npLX2ytba6VV1XFV9MslVSZ4xy/IAwyfhA7uXuAIAmHlzks0wL9U+d1L1nGRnXQwOuTnJPJrn5iTz\n/j8dyvef6aj2ubKtak6yGeYlrhiqef8dZv5oTjIbmpNMT3OS+TkOVrJaXDEPHXvONT+qAAAAMB+2\nu08MAAAAgImoicGOsFRjZghVo3YbtZkAAIDNIokxhzbjom8eH9sJ7A6zaI+5k9p4AgAwPc1JAAAA\ngEFQE2OHUgMDAACAnUZNDAAAAGAQ1MRgx1ELBQAAYGdSEwMAAAAYBEmMOaMWAQAAACxPc5IZ8ChA\nAAAA2HxqYgAAAACDIIkBAAAADIIkBgAAADAIkhgAAADAIEhiAAAAAIMgiQEAAAAMgiQGAAAAMAiS\nGAAAAMAgSGIAAAAAgyCJAQAAAAyCJAYAAAAwCJIYAAAAwCBIYgAAAACDIIkBAAAADIIkBgAAADAI\nkhgAAADAIEhiAAAAAIMgiQEAAAAMgiQGAAAAMAiSGAAAAMAgSGIAAAAAgyCJAQAAAAyCJAYAAAAw\nCJIYAAAAwCBIYgAAAACDIIkBAAAADIIkBgAAADAIkhgAAADAIEhiAAAAAIMw8yRGVT22qj5aVZ+o\nql9YZvwJVfWlqjqrH5456zIBAMMkrgCA3W3PLBdeVfsn+cMkj07yuST/XlVva62dOzJZS3JKa+3E\nWZYFABg2cQUAMOuaGPdP8snW2mdaa9ck+fMk3zc2TfUDAMBqxBUAsMvNOolxxyTnj7y/oP9sVEvy\nxKr6QFW9qaqOnHGZAIBhElcAwC430+Yk6QKJtZyW5A2ttWuq6tlJTk7yqPGJ9u7de/3rhYWFLCws\nbFIRAYD1WlxczOLi4lavVlwBADvQeuKKam2SeGA6VfXAJHtba4/t3/9Skutaay9eYfr9k1zcWjt0\n7PM2y3Kupmp2NVKX26ZZrm83aK3Zh5tgM79vG/l/DPE7sl2/VfNk0v/RevbVLJY5jc0ux7xs12ao\nqrTWZvoF3QlxxU4w77/DzJ+1vm+OqemM7tchnk/WU+bNPkbWs8wZXy9vahnmYZs2y2pxxaybk5yZ\n5G5VdUxV3STJDyd521jhbj/y9vgkH5lxmQCAYRJXAMAuN9PmJK21a6vqOUnekWT/JK9qrZ1bVS9M\ncmZr7bQkJ1bV8UmuTXJxkhNmWSYAYJjEFQDATJuTbBbNSZiU5iSbQ3OS6Q3hN3XWNCdR7XPeaU6y\nMfP+O8z80ZxkNjQnmZ7mJPNzHKxk
O5uTAAAAAGwKSQwAAABgECQxAAAAgEGQxAAAAAAGQRIDAAAA\nGARJDAAAAGAQJDEAAACAQZDEAAAAAAZBEgMAAAAYBEkMAAAAYBAkMQAAAIBBkMQA4P+1c/exktV3\nHcffH9jluWVFm1Io7VJBhdqULRXoA4nauCyNKVWwVFO6tqZRa32IpiK1ptRERRPTpiVViSgUmwIC\ntjQx8lC7kRgXXNiFZcvDLmEbFhAMFoTW0qV8/WN+t8xe7tzdudy5M+fO+5VM7pnfOXPO73z3d858\n9zvnHEmSJKkTLGJIkiRJkqROsIghSZIkSZI6wSKGJEmSJEnqBIsYkiRJkiSpEyxiSJIkSZKkTrCI\nIUmSJEmSOsEihiRJkiRJ6gSLGJIkSZIkqRMsYkiSJEmSpE6wiCFJkiRJkjrBIoYkSZIkSeoEixiS\nJEmSJKkTLGJIkiRJkqROsIghSZIkSZI6wSKGJEmSJEnqBIsYkiRJkiSpEyxiSJIkSZKkTrCIIUmS\nJEmSOsEihiRJkiRJ6gSLGJIkSZIkqRMsYkiSJEmSpE6wiCFJkiRJkjrBIoYkSZIkSeoEixiSJEmS\nJKkTLGJIkiRJkqROsIghSZIkSZI6wSKGJEmSJEnqBIsYkiRJkiSpE0ZaxEiyLsm9SbYnOX+O+Qcm\nuarN35jktaPsj6TFs2HDhnF3oXOM2XCM1/CmIWbmFuMxDWNL6hqPSy2FSRxnIytiJNkfuBhYB5wI\n/GKSE2Yt9ivAE1V1PPAp4M9H1R9Ji2sST2iTzpgNx3gNb7nHzNxifJb72JK6yONSS2ESx9kor8Q4\nBdhRVTurajdwJXDWrGXeBVzepq8F3jHC/kiSpG4zt5AkacqNsohxNPBQ3/tdrW3OZarqOeCpJEeM\nsE+SJKm7zC0kSZpyqarRrDg5G1hXVR9q798HnFpVv9m3zFbgjKp6pL3fAZxSVf8za12j6aQkSVo0\nVZVRrn+xcgvzCkmSJt+gvGLFCLf5MHBM3/tj6P1iMnuZ1wCPJFkBHD67gAGjT4okSVInLEpuYV4h\nSVJ3jfJ2kk3A8UlWJzkAOBe4ftYy1wPr2/Q5wFdH2B9JktRt5haSJE25kV2JUVXPJfkIcAOwP3Bp\nVd2T5JPApqr6CnApcEWS7cATwHtH1R9JktRt5haSJGlkz8SQJEmSJElaTKO8neQlS7Iuyb1Jtic5\nf9z9mSRJdia5K8nmJLe1tiOS3JTk/iQ3JlnVt/xnWhzvTLJmfD1fOkn+Lslj7SFvM21DxyjJ+rb8\n/Unev9T7sVQGxOvCJLvaONuc5My+eRe0eN2bZG1f+9Qct0mOSfK1JNuS3J3kt1q742yAeWLmWJtD\nkoOS3JpkS4vXha392Na+PcmVSVa29gOTXNXaNyZ5bd+65ozjtJmGcbMQ5hV7Z14xPHOL4ZhXDM+8\nYnjLIreoqol80btMdAewGlgJbAFOGHe/JuUFPAgcMavtL4Dfb9PnAxe16XcC/9ymTwU2jrv/SxSj\n05MIFl8AAAikSURBVIE1wNaFxgg4AngAWNVeDwCrxr1vSxivTwC/O8eyJ7ZjcmU7RncAmbbjFjgS\nOKlNHwbcB5zgOFtQzBxrg2N2SPu7AtjYxs7VwHta+18Bv9amPwx8rk2fC1w5Txz3G/e+jSGWUzNu\nFhAb84q9x8i8YnFi5vl+cLzMKxYvZo6z+ePW6dxikq/EOAXYUVU7q2o3cCVw1pj7NGlmP139XcDl\nbfpy4N1t+qyZ9qq6FViV5JVL0sMxqqpbgG/Oah4mRkcCZwA3VtWTVfUkcBOwbtR9H4cB8YIXjzPo\nxeuLVbW7qnbSO2mdypQdt1X1X1W1pU0/A9wDHI3jbKB5YgaOtTlV1bfb5AH0EoUCfgq4prX3j7H+\nsXct8I42PVccTxltzyfS1IybBTKvmId5xfDMLYZjXjE884qF6XpuMclFjKOBh/re7+KFAaneQLsx\n
yaYkH2ptr6yqx9r0Y8BMQnEUL47lq5emmxNnmBgd3dp3zdE+TT7SLlG8tO/yxUFxGRTHZS/Janq/\nNt2K42yf9MVsY2tyrM0hyX5JttAbSzfS+0Xtyap6vi3yMC/s+/e/O6vqOeCpJD/IlI6xOZhbDGZe\nsTCe7xfG8/1emFcMz7xi33U9t5jkIoZPHJ3f26rqZOBM4DeSnN4/s3rX+PTHcHYlcurjuw8xUu9S\nstcBJwGPAn853u5MpiSH0atM/3ZVPd0/z3E2txaza+jF7BkcawNV1fNVdRK9/ySeCvzYYq16kdbT\nJdO4z/vKvOIl8ny/zzzf74V5xfDMK4bT9dxikosYDwPH9L0/hj0rPVOtqh5tf/8b+Cd6l+481i4h\nI8mrgMfb4rNj+erWNo2GidGuOdqnahxW1ePVAH/LC5eIGa+mPfToWuCKqvpSa3aczaMvZv8wEzPH\n2t5V1VPA14C30LtkeOY7fCYm0IvLawCSrAAOr6on8HtgxtSNm31lXrFgnu+H5Pl+fuYVwzOvWLiu\n5haTXMTYBByfZHWSA+g9ROT6MfdpIiQ5JMnL2vShwFpgK734rG+LrQdmTnzXA+9vy59G71Khx5hO\nw8boRmBtklVJfgD4GeCGpe3y+LQvyhk/R2+cQS9e701yQJJjgeOB25iy4zZJgEuBr1fVp/tmOc4G\nGBQzx9rckvzQzCWwSQ6mNzbuoZdw/EJbbD3w5TbdP/bOAb7a1z5XHKfNVIybYZlXvCSe74fk+X4w\n84rhmVcMb1nkFjUBT0cd9KJ3SeN99B4ScsG4+zMpL+BYek+C3QLcPRMbek8ivhm4n97Ja1XfZy5u\ncbwTeNO492GJ4vRF4BHgu/Tu4/rAQmLUPre9vdaPe7+WMF4fBD4P3NVi8iV692TOLP+xFq97gTP6\n2qfmuAXeDjzfjsXN7bXOcTZ0zM50rA2M1xuAO1pctgIfb+3H0rtPejtwFbCytR9I7+ni2+ndE7x6\nb3Gcttc0jJsFxMS8Yt/iZF7x0mNmbjF/vMwrFidm5hXzx6zzuUXaxiVJkiRJkibaJN9OIkmSJEmS\n9H0WMSRJkiRJUidYxJAkSZIkSZ1gEUOSJEmSJHWCRQxJkiRJktQJFjEkSZIkSVInWMSQlrEkFyb5\nvXH3Y18kWZ/kVQPmXZbk7BFs82N906uTbF3sbUiStJyYW+x1m+YW0ohZxJCWtxp3B4bwy8BRA+YV\no9mXC0awTkmSljNzi/mZW0gjZhFDWmaS/GGS+5LcAvxoX/tJSTYmuTPJdUlWtfbjktycZEuS25O8\nLslPJvlK32cvTrK+Te9M8qdJNif5zyRrktyQZEeSX+37zEeT3Na2d2FrW53kniSXJLm7fe6gJOcA\nbwa+kOSOJAfNtWttHScn2ZBkU5J/SXJka9+Q5KIkt7b9f3trPyTJ1Um2tf3e2NZxEXBw248r6CUy\n+8/u22L+20iS1EXmFuYW0iSxiCEtI0lOBs4F3gi8E/gJXviV4fPAR6vqjcBW4BOt/QvAZ6vqJOAt\nwKNzrLr/14oCvlFVa4BbgMuAnwdOAz7Z+rEWOK6qTgHWACcnOb19/jjg4qr6ceBJ4OyqugbYBPxS\nVb2pqr4zVx+SrAQ+2z7zZuDvgT/p69f+VXUq8Dt9+/dh4Imqej3wR8DJQFXVHwD/V1Vrquo8eonM\n8bP7NlecJUmaFuYW5hbSpFkx7g5IWlSnA9e1L+rvJLkeIMnLgcOr6pa23OXAPyY5DDiqqr4MUFXf\nbcvvbTvXt79bgcOq6lvAt5I8m+RwYC2wNsnmttyh9BKMh4AHq+qu1n47sLpvvfNtOPR+/Xk9cHPr\n4/7AI33LXNf+3tG33rcBn277ty3JXQw2X98kSZpG5hY95hbShLCIIS0vxZ5f1oO+uPeWSTzHnldq\nHTxr/rPt7/N90zPvZ84rf1ZVl+yx0WT1rOW/B/RfVrkv96Zuq6
q3Dpg3s+7vsef5ba+Z06zPz6xj\n9n5LkjRtzC1eWK+5hTQBvJ1EWl7+DXh3uxf0ZcDPAlTV/wLfnLmXEzgP2FBVzwC7kpwFkOTAJAcD\n3wBOTHJAu7/1pwdsb64v8AJuAD6Y5NC23qOTvGIv63gaePk8+1bAfcArkpzW1rsyyYnzfAbg34H3\ntOVPBN7QN293Eou5kiQNZm7xYuYW0hh5gEnLSFVtTnIVcCfwOHBb3+z1wF8nOQR4APhAaz8P+Jsk\nfwzsBs6pqp1JrgbuBh6kdwnlnJtkz184qvXjpiQnAP/RLs18GnjfHMvT9/6y1r9vA2+d697Vqtrd\nHtT1mXZp6QrgU8DXB/QN4HPA5Um2AfcC24Cn2rxLgLuS3A58fJ6+SZI0lcwt5lyvuYU0RqnyOJK0\nfCXZD1hZVc8m+WHgJuBHquq5MXdNkiR1kLmFNF5eiSFpuTsU+Nf29PEAv26SIUmSXgJzC2mMvBJD\nkiRJkiR1gg/2lCRJkiRJnWARQ5IkSZIkdYJFDEmSJEmS1AkWMSRJkiRJUidYxJAkSZIkSZ3w/zmp\nz/ZI/D4SAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "best_model_accuracy = 0\n", "optimum_slope = 0\n", "\n", "w = 2\n", "h = 2\n", "f, axarr = py.subplots(h, w, figsize=(15, 7))\n", "\n", "it = 0\n", "for slope in [1, 0.2]:\n", " params = {\"pivot\": 10, \"slope\": slope}\n", "\n", " model_accuracy, doc_scores = get_tfidf_scores(params)\n", "\n", " if model_accuracy > best_model_accuracy:\n", " best_model_accuracy = model_accuracy\n", " optimum_slope = slope\n", "\n", " doc_scores, doc_leng = sort_length_by_score(doc_scores, X_test)\n", "\n", " y = abs(doc_scores[:k, np.newaxis])\n", " x = doc_leng[:k, np.newaxis]\n", "\n", " py.subplot(1, 2, it+1).bar(x, y, linewidth=10.)\n", " py.title(\"slope = \" + str(slope) + \" Model accuracy = \" + str(model_accuracy))\n", " py.ylim([0, 4.5])\n", " py.xlim([0, 3200])\n", " py.xlabel(\"document length\")\n", " py.ylabel(\"confidence score\")\n", " \n", " it += 1\n", "\n", "py.tight_layout()\n", "py.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above histogram plot helps us visualize the effect of `slope`. For top k documents we have document length on the x axis and their respective scores of belonging to a specific class on y axis. \n", "As we decrease the slope the density of bins is shifted from low document length (around ~250-500) to over ~500 document length. 
This suggests that the positive bias toward short documents seen at `slope=1` (i.e. when regular tfidf is used) is reduced at the lower slope. Of the two slopes tried here (1 and 0.2), slope 0.2 gives the higher model accuracy." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 1 }