{ "cells": [ { "cell_type": "markdown", "id": "d0c39104", "metadata": {}, "source": [ "# Inverted Indexes and Searching" ] }, { "cell_type": "code", "execution_count": 16, "id": "0e14bd7d", "metadata": { "collapsed": false }, "outputs": [], "source": [ "from nltk.corpus import brown\n", "import re" ] }, { "cell_type": "markdown", "id": "4e335090", "metadata": {}, "source": [ "## Building an Inverted Index" ] }, { "cell_type": "markdown", "id": "565ac13d", "metadata": {}, "source": [ "We start with a collection of documents: 500 files from the Brown corpus, which NLTK already provides tokenized into words." ] }, { "cell_type": "code", "execution_count": 137, "id": "335a0603", "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Use the first 500 Brown corpus files as the document collection\n", "docs = list(brown.fileids())[:500]\n", "# brown.words() returns the (lazily loaded) token sequence of the given files\n", "texts = [brown.words(fileids=[doc]) for doc in docs]" ] }, { "cell_type": "markdown", "id": "10a7ff8c", "metadata": {}, "source": [ "Next, we apply linguistic preprocessing to remove variations that are irrelevant for retrieval. Here we only lowercase each token and strip a trailing `s` or `ing`; more sophisticated methods were covered in the discussion of stemming." ] }, { "cell_type": "code", "execution_count": 138, "id": "94e6efb8", "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Crude normalization: lowercase, then drop a trailing \"s\" or \"ing\"\n", "for i, t in enumerate(texts):\n", "    texts[i] = [re.sub(r'(s|ing)$', '', w.lower()) for w in t]" ] }, { "cell_type": "markdown", "id": "4dabebdb", "metadata": {}, "source": [ "Now we build the dictionary: a sorted list of the unique tokens, together with a mapping from each token to an integer id." 
] }, { "cell_type": "code", "execution_count": 139, "id": "ce974e9c", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "43259 ['', '!', '$.027'] ['zwei', 'zworykin', '{0,t}']\n" ] } ], "source": [ "# Sorted list of all distinct (normalized) tokens\n", "dictionary = sorted(set(w for t in texts for w in t))\n", "print(len(dictionary), dictionary[:3], dictionary[-3:])" ] }, { "cell_type": "code", "execution_count": 140, "id": "8c998c90", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "15435" ] }, "execution_count": 140, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Map each token to its position (word id) in the sorted dictionary\n", "wids = {w: i for i, w in enumerate(dictionary)}\n", "wids[\"find\"]" ] }, { "cell_type": "markdown", "id": "d3dedf42", "metadata": {}, "source": [ "Given the dictionary, we now add the id of each document to the postings set of every term that occurs in it." ] }, { "cell_type": "code", "execution_count": 141, "id": "12acbb25", "metadata": { "collapsed": false }, "outputs": [], "source": [ "# One set of document ids per dictionary entry\n", "psets = [set() for _ in dictionary]\n", "for d, text in enumerate(texts):\n", "    for w in text:\n", "        psets[wids[w]].add(d)" ] }, { "cell_type": "markdown", "id": "c7f0dec4", "metadata": {}, "source": [ "The inverted index is the sorted list of postings (document ids) for each token in the dictionary.\n", "\n", "(In practice, we would not build a `set` first and then convert it into a list; we would more likely fill the inverted index data structure directly. Because documents are processed in order of increasing id, each postings list could simply be appended to, skipping repeats of the last id, and it would come out sorted.)" ] }, { "cell_type": "code", "execution_count": 144, "id": "2780be5f", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "15435\n", "[0, 3, 5, 7, 8, 20, 21, 23, 25, 26, 27, 29, 31, 33, 40, 42, 43, 45, 47, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 65, 66, 68, 69, 70, 71, 73, 75, 76, 77, 83, 85, 87, 88, 90, 91, 92, 94, 95, 96, 98, 99, 100, 102, 104, 105, 106, 107, 108, 110, 111, 114, 116, 118, 121, 123, 124, 125, 126, 127, 128, 130, 132, 
134, 135, 136, 137, 139, 140, 143, 144, 145, 146, 147, 148, 152, 153, 154, 156, 158, 162, 163, 164, 166, 169, 170, 171, 173, 175, 178, 179, 180, 181, 182, 183, 185, 186, 187, 189, 193, 197, 200, 201, 202, 205, 206, 207, 209, 210, 211, 212, 213, 214, 215, 216, 217, 219, 220, 221, 222, 224, 225, 226, 228, 230, 231, 232, 234, 235, 238, 239, 240, 242, 243, 244, 245, 248, 249, 250, 252, 255, 256, 257, 258, 262, 263, 269, 272, 275, 278, 279, 280, 281, 282, 287, 288, 293, 297, 301, 303, 305, 307, 311, 312, 313, 315, 317, 320, 322, 323, 324, 327, 328, 329, 331, 334, 335, 336, 339, 340, 343, 344, 345, 346, 347, 348, 350, 351, 353, 354, 356, 359, 362, 366, 367, 372, 375, 376, 377, 378, 380, 381, 383, 386, 387, 388, 391, 392, 394, 395, 397, 398, 399, 404, 405, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 420, 421, 422, 423, 424, 426, 428, 429, 430, 433, 434, 435, 437, 438, 439, 444, 445, 446, 447, 450, 451, 452, 454, 460, 463, 465, 466, 467, 468, 469, 472, 473, 475, 476, 477, 478, 479, 480, 482, 483, 484, 488, 491, 492, 494, 495, 496, 497, 498]\n", "find\n" ] } ], "source": [ "# Convert each postings set into a sorted postings list\n", "invindex = [sorted(s) for s in psets]\n", "wid = wids[\"find\"]\n", "print(wid)\n", "print(invindex[wid])\n", "print(dictionary[wid])" ] }, { "cell_type": "markdown", "id": "0438d030", "metadata": {}, "source": [ "## Compression" ] }, { "cell_type": "markdown", "id": "d72ff8ba", "metadata": {}, "source": [ "Inverted indexes can be compressed substantially by taking advantage of the statistics of their postings lists.\n", "\n", "For example, the raw document ids in a postings list are spread fairly uniformly over the range of document ids (0 to 499 here), so storing them as-is leaves little to gain: each id still needs about log2(500), or roughly 9, bits." ] }, { "cell_type": "code", "execution_count": 149, "id": "227bdd7e", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(array([20, 32, 33, 26, 36, 21, 28, 26, 34, 30]),\n", " array([ 0. , 49.8, 99.6, 149.4, 199.2, 249. , 298.8, 348.6,\n", " 398.4, 448.2, 498. 
]),\n", " )" ] }, "execution_count": 149, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlQAAAHcCAYAAAAQkzQBAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAG61JREFUeJzt3W9s3nW9//HXV8b5mYXBtkCv7cdMSpABG2OtEucNp8XR\ncQxuDkmIaGYjwxgSbyBGDYkk9SQHhsoNwTvEcE6KN+QsJszF4AL+KYImLuoWjRj/bmHOtjq7wcaf\njG3f3w1m/e0MutFPu/b69vFIrliutt9+3nza8vT6Xv1eVV3XdQAAmLC3TPcCAADanaACACgkqAAA\nCgkqAIBCggoAoJCgAgAodEZBdezYsXR3d2fdunVJktHR0fT29mbp0qVZu3ZtDh48OKWLBACYyc4o\nqL72ta9l2bJlqaoqSbJ58+b09vbm97//fdasWZPNmzdP6SIBAGa0+jT27t1br1mzpv7hD39Yf/CD\nH6zruq4vv/zyenh4uK7ruh4aGqovv/zyUz4viZubm5ubm5tb29xKzMlpfOYzn8lXvvKVvPDCC2P3\njYyMpNVqJUlarVZGRkZe93NdhL099ff3p7+/f7qXwQTZv/Zm/9qXvWtv/zwLN1HjnvL77ne/m46O\njnR3d79hHFVVVbwIAIB2Nu4jVD/96U+zbdu2PP7443nllVfywgsvZOPGjWm1WhkeHs6iRYsyNDSU\njo6Os7VeAIAZZ9xHqO65557s3bs3u3fvzqOPPpr3v//9+eY3v5n169dnYGAgSTIwMJANGzaclcVy\ndvT09Ez3Eihg/9qb/Wtf9m52q+ozfKLTU089lfvvvz/btm3L6Ohobr755jz33HPp7OzMli1bMn/+\n/JMPXFWeQwUAtIXSbjnjoHrTBxZUAECbKO0WV0oHACgkqAAACgkqAIBCggoAoJCgAgAoJKgAAAoJ\nKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAACgkqAIBCggoAoJCgAgAoJKgAAAoJ\nKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAACgkqAIBCggoAoJCgAgAoJKgAAAoJ\nKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAACs2Z7gUAk+f88xfm0KED072MKTNv\n3oK88MLodC8D4BRVXdf1lBy4qjJFhwbeQFVVSZr8c+f3CjA1SrvFKT8AgEKCCgCgkKACACgkqAAA\nCgkqAIBCggoAoNC4QfXKK69k1apV6erqyrJly3LXXXclSfr7+7NkyZJ0d3enu7s727dvPyuLBQCY\niU57HaqXXnopc+fOzdGjR/Oe97wnX/3qV/ODH/wg8+bNy5133vnGB3YdKjjrXIcKYGKm/DpUc+fO\nTZIcOXIkx44dy4IFC5LELzUAgBNO+9Izx48fzzve8Y786U9/yu23357ly5fn29/+dh588ME88sgj\nueaaa3L//fdn/vz5p3xuf3//2Ns9PT3p6emZzLUDAEzI4OBgBgcHJ+14Z/zSM88//3yuv/76bN68\nOcuWLctFF12UJLn77rszNDSUhx9++OQDO+UHZ51TfgATc9ZeeuaCCy7IDTfckJ///Ofp6OhIVVWp\nqiq33XZbduzYMeEFAAC0u3GDav/+/Tl48GCS5OWXX86TTz6Z7u7uDA8Pj33MY489lhUrVkztKgEA\nZrBxn0M1NDSUvr6+HD9+PMePH8/GjRuzZs2afPzjH8+uXbtSVVUuueSSPPTQQ2drvQAAM84ZP4fq\nTR/Yc6jgrPMcKoCJOWvPoQIA4PUJKgCAQoIKAKCQoAIAKCSoAAAKnfalZwCAN3b++Qtz6NCB6V7G\nlJk3b0Fee
GF0upcx47lsAjSIyybA2efnrhlcNgEAYJoJKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKC\nCgCgkKACACgkqAAACgkqAIBCXsuPUzT5dam8JhUAU8Fr+XGKZr8uVbO/L5u9d0nT94/25OeuGbyW\nHwDANBNUAACFBBUAQCFBBQBQSFABABQSVAAAhQQVAEAhQQUAUEhQAQAU8tIzE9Dkl2YBAN48Lz0z\nAbPhZQaaO19zvy+T2fG92eT9oz35uWsGLz0DADDNBBUAQCFBBQBQSFABABQSVAAAhQQVAEAhQQUA\nUEhQAQAUElQAAIUEFQBAIUEFAFBIUAEAFBJUAACFBBUAQCFBBQBQSFABABQSVAAAhcYNqldeeSWr\nVq1KV1dXli1blrvuuitJMjo6mt7e3ixdujRr167NwYMHz8piAQBmoqqu63q8D3jppZcyd+7cHD16\nNO95z3vy1a9+Ndu2bcuFF16Yz3/+87nvvvty4MCBbN68+eQDV1VOc+i2VVVVkmbO9pomz9fc78tk\ndnxvNnn/aE9+7pqhtFtOe8pv7ty5SZIjR47k2LFjWbBgQbZt25a+vr4kSV9fX7Zu3TrhBQAAtLs5\np/uA48eP5x3veEf+9Kc/5fbbb8/y5cszMjKSVquVJGm1WhkZGXndz+3v7x97u6enJz09PZOyaADg\nbJlz4lG4Zvq3f3tr7rrrC8XHOe0pv396/vnnc/311+fee+/Nhz/84Rw4cGDsfQsXLszo6OjJB3bK\nr401eb7mfl8ms+N7s8n7R3uaDT93TZ+vruupP+X3TxdccEFuuOGG/OIXv0ir1crw8HCSZGhoKB0d\nHRNeAABAuxs3qPbv3z/2F3wvv/xynnzyyXR3d2f9+vUZGBhIkgwMDGTDhg1Tv1IAgBlq3OdQDQ0N\npa+vL8ePH8/x48ezcePGrFmzJt3d3bn55pvz8MMPp7OzM1u2bDlb6wUAmHHO+DlUb/rAnkPVxpo8\nX3O/L5PZ8b3Z5P2jPc2Gn7umz3dWn0MFAMDrE1QAAIUEFQBAIUEFAFBIUAEAFBJUAACFBBUAQCFB\nBQBQSFABABQSVAAAhQQVAEAhQQUAUEhQAQAUElQAAIUEFQBAIUEFAFBoznQvAIDk/PMX5tChA9O9\njCkzb96CvPDC6HQvA6ZMVdd1PSUHrqpM0aGnXVVVSZo522uaPF9zvy+T2fG92dT9s3ftazbsXdPn\nq+u6uFuc8gMAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAACgkqAIBCggoAoJCgAgAoJKgAAAoJKgCA\nQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAACgkqAIBCggoAoJCgAgAoNGe6FwBn15xU\nVTXdi4BZyM8ezSaomGWOJqmnexFTyH+wmKma/LPn5w6n/AAAigkqAIBCggoAoJCgAgAoJKgAAAoJ\nKgCAQuMG1d69e3Pttddm+fLlueqqq/LAAw8kSfr7+7NkyZJ0d3enu7s727dvPyuLBQCYiaq6rt/w\nwiDDw8MZHh5OV1dXDh8+nHe+853ZunVrtmzZknnz5uXOO+984wNXVcY5dFt77eJ0zZztNU2er8mz\nJbNhPr9X2lWT52vybMlsmK+u6+JuGffCnosWLcqiRYuSJOedd16uvPLK7Nu3L0ka+0sNAODNOuMr\npe/Zsyc7d+7Mu9/97vzkJz/Jgw8+mEceeSTXXHNN7r///syfP/+Uz+nv7x97u6enJz09PZOxZgCA\nQoMnbif3ykSNe8rvnw4fPpyenp588YtfzIYNG/K3v/0tF110UZLk7rvvztDQUB5++OGTD+yUXxtr\n8nxNni2ZDfP5vdKumjxfk2dLZsN8k3HK77RB9eqrr+aDH/xgPvCBD+SOO+445f179uzJunXr8utf\n//rkAwuqNtbk+Zo8WzIb5vN7pV01eb4mz5bMhvkmI6jG/Su/uq6zadOmLFu
27KSYGhoaGnv7scce\ny4oVKya8AACAdjfuI1TPPPNM3vve9+bqq68+8f+eknvuuSff+ta3smvXrlRVlUsuuSQPPfRQWq3W\nyQf2CFUba/J8TZ4tmQ3z+b3Srpo8X5NnS2bDfGfllN+EDyyo2liT52vybMlsmM/vlXbV5PmaPFsy\nG+ab8lN+AACcnqACACgkqAAACgkqAIBCggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKCSoAAAK\nCSoAgEKCCgCgkKACACgkqAAACgkqAIBCggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKCSoAAAK\nCSoAgEKCCgCgkKACACgkqAAACgkqAIBCggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKCSoAAAK\nCSoAgEKCCgCgkKACACg0Z7oXAHDm5qSqquleBMApBBXQRo4mqad7EVNEKEI7c8oPAKCQoAIAKCSo\nAAAKCSoAgEKCCgCgkKACACgkqAAACo0bVHv37s21116b5cuX56qrrsoDDzyQJBkdHU1vb2+WLl2a\ntWvX5uDBg2dlsQAAM1FV1/UbXiVveHg4w8PD6erqyuHDh/POd74zW7duzX//93/nwgsvzOc///nc\nd999OXDgQDZv3nzygasq4xy6rb12peZmzvaaJs/X5NkS87WzJs+WNHu+Js+WzIb56rou7pZxH6Fa\ntGhRurq6kiTnnXderrzyyuzbty/btm1LX19fkqSvry9bt26d8AIAANrdGb/0zJ49e7Jz586sWrUq\nIyMjabVaSZJWq5WRkZHX/Zy3vnXev77QnH/LnDn/p3C5AACTYfDELenv7y8+2rin/P7p8OHDed/7\n3pe77747GzZsyIIFC3LgwIGx9y9cuDCjo6MnH7iqkjyaZHXxImeei9P0hz+bO1+TZ0vM186aPFvS\n7PmaPFsyG+abjFN+p32E6tVXX81NN92UjRs3ZsOGDUlee1RqeHg4ixYtytDQUDo6Ot7gsy9M8n8n\nvDgAgHYw7nOo6rrOpk2bsmzZstxxxx1j969fvz4DAwNJkoGBgbHQAgCYjcY95ffMM8/kve99b66+\n+uoTp/CSe++9N+9617ty880357nnnktnZ2e2bNmS+fPnn3zgqkry/SRrpnL906T5D382d74mz5aY\nr501ebak2fM1ebZkNsw3Gaf8zug5VBM6sKBqY02er8mzJeZrZ02eLWn2fE2eLZkN8035ZRMAADg9\nQQUAUEhQAQAUElQAAIUEFQBAIUEFAFBIUAEAFBJUAACFBBUAQCFBBQBQSFABABQSVAAAhQQVAEAh\nQQUAUEhQAQAUElQAAIUEFQBAIUEFAFBIUAEAFBJUAACFBBUAQCFBBQBQSFABABQSVAAAhQQVAEAh\nQQUAUEhQAQAUElQAAIUEFQBAIUEFAFBIUAEAFBJUAACFBBUAQCFBBQBQSFABABQSVAAAhQQVAEAh\nQQUAUEhQAQAUElQAAIUEFQBAIUEFAFBIUAEAFBJUAACFBBUAQCFBBQBQSFABABQ6bVDdeuutabVa\nWbFixdh9/f39WbJkSbq7u9Pd3Z3t27dP6SIBAGay0wbVJz7xiVOCqaqq3Hnnndm5c2d27tyZf//3\nf5+yBQIAzHRzTvcBq1evzp49e065v67rMzj8I0mePvF2z4kbAMB0Gzxxe+3MW6kJP4fqwQcfzMqV\nK7Np06YcPHjwDT7q40n6T9x6JvqlAAAmWU9e65NpDKrbb789u3fvzq5du7J48eJ89rOfLV4IAEC7\nmlBQdXR0pKqqVFWV2267LTt27JjsdQEAtI0JBdXQ0NDY24899thJfwEIADDbnPZJ6bfcckueeuqp\n7N+/P29729vypS99KYODg9m1a1eqqsoll1yShx566GysFQBgRqrqM/tzvTd/4KpK8v0ka6bi8NOs\nSjIl/9pmiCbP1+TZEvO1sybPljR7vib
PlsyG+eq6TlVVZ3gFg9fnSukAAIUEFQBAIUEFAFBIUAEA\nFBJUAACFBBUAQCFBBQBQSFABABQSVAAAhQQVAEAhQQUAUEhQAQAUElQAAIUEFQBAIUEFAFBIUAEA\nFBJUAACFBBUAQCFBBQBQSFABABQSVAAAhQQVAEAhQQUAUEhQAQAUElQAAIUEFQBAIUEFAFBIUAEA\nFBJUAACFBBUAQCFBBQBQSFABABQSVAAAhQQVAEAhQQUAUEhQAQAUElQAAIUEFQBAIUEFAFBIUAEA\nFBJUAACFBBUAQCFBBQBQSFABABQSVAAAhQQVAEChcYPq1ltvTavVyooVK8buGx0dTW9vb5YuXZq1\na9fm4MGDU75IAICZbNyg+sQnPpHt27efdN/mzZvT29ub3//+91mzZk02b948pQsEAJjpxg2q1atX\nZ8GCBSfdt23btvT19SVJ+vr6snXr1qlbHQBAG5jzZj9hZGQkrVYrSdJqtTIyMjLORz+S5OkTb/ec\nuAEATLfBE7ekv7+/+GhVXdf1eB+wZ8+erFu3Lr/+9a+TJAsWLMiBAwfG3r9w4cKMjo6eeuCqSvL9\nJGuKFznzVEnG/dfW5po8X5NnS8zXzpo8W9Ls+Zo8WzIb5qvrOlX12v9O1Jv+K79Wq5Xh4eEkydDQ\nUDo6Oib8xQEAmuBNB9X69eszMDCQJBkYGMiGDRsmfVEAAO1k3FN+t9xyS5566qns378/rVYr//Ef\n/5EPfehDufnmm/Pcc8+ls7MzW7Zsyfz58089sFN+bazJ8zV5tsR87azJsyXNnq/JsyWzYb7JOOV3\n2udQTfjAgqqNNXm+Js+WmK+dNXm2pNnzNXm2ZDbMNy3PoQIA4GSCCgCgkKACACgkqAAACgkqAIBC\nggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAACgkqAIBC\nggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAACgkqAIBC\nggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAACgkqAIBC\nggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKDSn5JM7Oztz/vnn55xzzsm5556bHTt2TNa6AADa\nRlFQVVWVwcHBLFy4cLLWAwDQdopP+dV1PRnrAABoW8WPUF133XU555xz8qlPfSqf/OQn/9dHPJLk\n6RNv95y4AQBMt8ETt6S/v7/4aFVd8BDT0NBQFi9enL///e/p7e3Ngw8+mNWrV7924KpK8v0ka4oX\nOfNUSZr8yFyT52vybIn52lmTZ0uaPV+TZ0tmw3x1XaeqqqKzbkWn/BYvXpwkueiii3LjjTd6UjoA\nMCtNOKheeumlHDp0KEny4osv5oknnsiKFSsmbWEAAO1iws+hGhkZyY033pgkOXr0aD72sY9l7dq1\nk7YwAIB2UfQcqnEP7DlUbazJ8zV5tsR87azJsyXNnq/JsyWzYb5pfw4VAACCCgCgmKACACgkqAAA\nCgkqAIBCggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAA\nCgkqAIBCggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAA\nCgkqAIBCggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoAIAKCSoAAAKCSoAgEKCCgCgkKACACgkqAAA\nCgkqAIBCggoAoJCgAgAoJKgAAAoJKgCAQoIKAKCQoILGGZzuBVBkcLoXwIQNTvcCmEYTDqrt27fn\niiuuyGWXXZb77rtvMtcEFBmc7gVQZHC6F8CEDU73AphGEwqqY8eO5dOf/nS2b9+eZ599Nt/61rfy\n29/
+drLXBgDQFiYUVDt27Mjb3/72dHZ25txzz81HPvKRfOc735nstQEAtIU5E/mkffv25W1ve9vY\nPy9ZsiQ/+9nPXucjr5voutpANd0LmGJNnq/JsyXJl07cmqrJ+/fP2Zq6f/aufTV575KqKp9vQkF1\nJl+4ruuJHBoAoO1M6JTfxRdfnL1794798969e7NkyZJJWxQAQDuZUFBdc801+cMf/pA9e/bkyJEj\n+Z//+Z+sX79+stcGANAWJnTKb86cOfn617+e66+/PseOHcumTZty5ZVXTvbaAADawoSvQ/WBD3wg\nv/vd7/LHP/4xd91110nvc42qme3WW29Nq9XKihUrxu4bHR1Nb29vli5dmrVr1+bgwYNj77v33ntz\n2WWX5YorrsgTTzwxHUvm/7N3795ce+21Wb58ea666qo88MADSexhO3jllVeyatWqdHV1ZdmyZWO/\nO+1dezl27Fi6u7uzbt26JPavXXR2dubqq69Od3d33vWudyWZ5L2rJ9nRo0frSy+9tN69e3d95MiR\neuXKlfWzzz472V+GAj/+8Y/rX/7yl/VVV101dt/nPve5+r777qvruq43b95cf+ELX6jruq5/85vf\n1CtXrqyPHDlS7969u7700kvrY8eOTcu6ec3Q0FC9c+fOuq7r+tChQ/XSpUvrZ5991h62iRdffLGu\n67p+9dVX61WrVtVPP/20vWsz999/f/3Rj360XrduXV3Xfn+2i87Ozvof//jHSfdN5t5N+kvPuEbV\nzLd69eosWLDgpPu2bduWvr6+JElfX1+2bt2aJPnOd76TW265Jeeee246Ozvz9re/PTt27Djra+Zf\nFi1alK6uriTJeeedlyuvvDL79u2zh21i7ty5SZIjR47k2LFjWbBggb1rI3/5y1/y+OOP57bbbhv7\na3b71z7q/3UFgsncu0kPqte7RtW+ffsm+8swyUZGRtJqtZIkrVYrIyMjSZK//vWvJ/0Fp/2cWfbs\n2ZOdO3dm1apV9rBNHD9+PF1dXWm1WmOnbu1d+/jMZz6Tr3zlK3nLW/71n0/71x6qqsp1112Xa665\nJt/4xjeSTO7eTehJ6adbMO2tqqpx99EezwyHDx/OTTfdlK997WuZN2/eSe+zhzPXW97yluzatSvP\nP/98rr/++vzoRz866f32bub67ne/m46OjnR3d2dwcPB1P8b+zVw/+clPsnjx4vz9739Pb29vrrji\nipPeX7p3k/4IlWtUtadWq5Xh4eEkydDQUDo6OpKcup9/+ctfcvHFF0/LGvmXV199NTfddFM2btyY\nDRs2JLGH7eaCCy7IDTfckF/84hf2rk389Kc/zbZt23LJJZfklltuyQ9/+MNs3LjR/rWJxYsXJ0ku\nuuii3HjjjdmxY8ek7t2kB5VrVLWn9evXZ2BgIEkyMDAw9h/p9evX59FHH82RI0eye/fu/OEPfxj7\n6wimR13X2bRpU5YtW5Y77rhj7H57OPPt379/7K+IXn755Tz55JPp7u62d23innvuyd69e7N79+48\n+uijef/7359vfvOb9q8NvPTSSzl06FCS5MUXX8wTTzyRFStWTO7eTcET6evHH3+8Xrp0aX3ppZfW\n99xzz1R8CQp85CMfqRcvXlyfe+659ZIlS+r/+q//qv/xj3/Ua9asqS+77LK6t7e3PnDgwNjH/+d/\n/md96aWX1pdffnm9ffv2aVw5dV3XTz/9dF1VVb1y5cq6q6ur7urqqr/3ve/Zwzbwq1/9qu7u7q5X\nrlxZr1ixov7yl79c13Vt79rQ4ODg2F/52b+Z789//nO9cuXKeuXKlfXy5cvH2mQy966qay+6BwBQ\nYtJP+QEAzDaCCgCgkKACACgkqAAACgkqAIBCggoAoND/A+ZEAtlUqRnTAAAAAElFTkSuQmCC\n" }, "metadata": {}, "output_type": "display_data" } ], 
"source": [ "l = [0]+invindex[wid]\n", "hist(l)" ] }, { "cell_type": "markdown", "id": "225b6796", "metadata": {}, "source": [ "But if the document ids have been assigned randomly, their deltas are Poisson distributed, and we get a distribution that compresses much better than the uniform distribution." ] }, { "cell_type": "code", "execution_count": 150, "id": "65cfbd0f", "metadata": { "collapsed": false }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlYAAAHcCAYAAAAUZuQ8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH3dJREFUeJzt3X9sXfV9//HXDTYCLXSEltysdrZMS6zEwYQUCBQtm1tw\nUJGIMhplM1uJElj/YKtK13WDSpMSpGK3bCrQLpNWFeq20wL7h0QVs4DBzbpRCCjlhwirM5oI55c1\nFgLJ0uLg+PtHU3dZWPa184mv7Twe0v3jnnvvOe97SMgz93N9UhkeHh4OAACnbVq9BwAAmCqEFQBA\nIcIKAKAQYQUAUIiwAgAoRFgBABRyyrBau3ZtqtVq2traRrZt3bo1S5YsyeLFi3PllVfm+eefH3ms\nq6sr8+bNy/z58/P444+fuakBACagyqmuY/X9738/06dPzy233JJXXnklSdLe3p677ror119/ff7x\nH/8xX/nKV/L0009n+/btufnmm/P8889nz549ue6669LX15dp03woBgCcHRpO9eDSpUuza9euE7b9\nyq/8St5+++0kycGDB9PU1JQk2bRpUzo7O9PY2Jg5c+Zk7ty52bp1a66++uqR11YqlcLjAwCcOaO9\njvopw+r9dHd35zd/8zfzp3/6pzl27Fh+8IMfJEn27t17QkQ1Nzdnz549pz0gp2fdunVZt25dvcc4\nqzjn4885H3/O+fhzzsffWD4QGvU63a233poHHnggb7zxRr761a9m7dq1RQcCAJisRh1WW7duze/8\nzu8kSVauXJmtW7cmSZqamtLf3z/yvN27d48sEwIAnA1GHVZz587Nli1bkiRPPfVUWlpakiTLly/P\nxo0bMzg4mJ07d2bHjh1ZsmRJ2WkZtfb29nqPcNZxzsefcz7+nPPx55xPDqf8qcDOzs5s2bIlb775\nZqrVau6+++60tbXlj/7oj/Luu+/m/PPPz4YNG7J48eIkyT333JMHH3wwDQ0Nuf/++3P99defeLBK\nxXesAIBJYSzdcsqwKk1YAQCTxVi6xUWmAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChE\nWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCA\nQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhDeN9wH/4h3/I4cOH\nx/uw46a1tTVXXXVVvccAAOqgMjw8PDxuB6tUMm3aOTn//E+N1yHH1dDQvixY8NNs21ar9ygAwGmq\nVCoZbSaN+ydW55xzfv7rvx4a78OOk1qOHVtX7yEAgDrxHSsAgEKEFQBAIcIKAKAQYQUAUIiwAgAo\nRFgBABQirAAAChFWAACFCCsAgEJOGVZr165NtVpNW1vbCdu/9rWvZcGCBbnkkkvy53/+5yPbu7q6\nMm/evMyfPz+PP/74mZkYAGCCOuU/abNmzZp85jOfyS233DKy7emnn87mzZvz8ssvp7Gx
Mf/xH/+R\nJNm+fXsefvjhbN++PXv27Ml1112Xvr6+TJvmQzEA4OxwyupZunRpZsyYccK2v/mbv8ldd92VxsbG\nJMnFF1+cJNm0aVM6OzvT2NiYOXPmZO7cudm6desZGhsAYOIZ9T/CvGPHjvzzP/9zvvjFL+a8887L\nX/7lX+aKK67I3r17c/XVV488r7m5OXv27Dnp9UNDg0nWHb/XfvwGAFBftVottVrttPYx6rB67733\n8tZbb+XZZ5/N888/n1WrVuXHP/7x+z63UqmctO2cc87NsWPrRj0oAMCZ1N7envb29pH769evH/U+\nRv0FqObm5tx0001JkiuvvDLTpk3Lm2++maampvT39488b/fu3Wlqahr1QAAAk9Wow2rFihV56qmn\nkiR9fX0ZHBzMhz70oSxfvjwbN27M4OBgdu7cmR07dmTJkiXFBwYAmKhOuRTY2dmZLVu25D//8z8z\ne/bs3H333Vm7dm3Wrl2btra2nHvuufn2t7+dJGltbc2qVavS2tqahoaGbNiw4X2XAgEApqrK8PDw\n8LgdrFJJY+P0HD16aLwOOc5qWbRoXV58sVbvQQCA01SpVDLaTHKRKQCAQoQVAEAhwgoAoBBhBQBQ\niLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYA\nAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBh\nBQBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAK\nEVYAAIUIKwCAQk4ZVmvXrk21Wk1bW9tJj/3VX/1Vpk2blgMHDoxs6+rqyrx58zJ//vw8/vjj5acF\nAJjAThlWa9asSW9v70nb+/v788QTT+TXfu3XRrZt3749Dz/8cLZv357e3t7cfvvtOXbsWPmJAQAm\nqFOG1dKlSzNjxoyTtv/Jn/xJvvKVr5ywbdOmTens7ExjY2PmzJmTuXPnZuvWrWWnBQCYwBpG+4JN\nmzalubk5l1566Qnb9+7dm6uvvnrkfnNzc/bs2XPS64eGBpOsO36v/fgNAKC+arVaarXaae1jVGF1\n5MiR3HPPPXniiSdGtg0PD/+vz69UKidtO+ecc3Ps2LrRHBYA4Ixrb29Pe3v7yP3169ePeh+jCqvX\nX389u3btyqJFi5Iku3fvzuWXX57nnnsuTU1N6e/vH3nu7t2709TUNOqBAAAmq1FdbqGtrS0DAwPZ\nuXNndu7cmebm5mzbti3VajXLly/Pxo0bMzg4mJ07d2bHjh1ZsmTJmZobAGDCOWVYdXZ25pprrklf\nX19mz56dhx566ITH//tSX2tra1atWpXW1tZ84hOfyIYNG953KRAAYKqqDJ/qS1KlD1appLFxeo4e\nPTRehxxntSxatC4vvlir9yAAwGmqVCqn/C75+3HldQCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEA\nFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQV\nAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChE\nWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIWcMqzW\nrl2barWatra2kW1f+MIXsmDBgixatCg33XRT3n777ZHHurq6Mm/evMyfPz+PP/74mZsaAGACOmVY\nrVmzJr29vSdsW7ZsWV599dW89NJLaWlpSVdXV5Jk+/btefjhh7N9+/b09vbm9ttvz7Fjx87c5AAA\nE8wpw2rp0qWZMWPGCds6OjoybdrPXnbVVVdl9+7d
SZJNmzals7MzjY2NmTNnTubOnZutW7eeobEB\nACaehtN58YMPPpjOzs4kyd69e3P11VePPNbc3Jw9e/ac9JqhocEk647faz9+AwCor1qtllqtdlr7\nGHNYfelLX8q5556bm2+++X99TqVSOWnbOeecm2PH1o31sAAAZ0R7e3va29tH7q9fv37U+xhTWH3r\nW9/KY489ln/6p38a2dbU1JT+/v6R+7t3705TU9NYdg8AMCmN+nILvb29uffee7Np06acd955I9uX\nL1+ejRs3ZnBwMDt37syOHTuyZMmSosMCAExkp/zEqrOzM1u2bMmbb76Z2bNnZ/369enq6srg4GA6\nOjqSJB/96EezYcOGtLa2ZtWqVWltbU1DQ0M2bNjwvkuBAABTVWV4eHh43A5WqaSxcXqOHj00Xocc\nZ7UsWrQuL75Yq/cgAMBpqlQqGW0mufI6AEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIUI\nKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQ\niLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYA\nAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhpwyrtWvX\nplqtpq2tbWTbgQMH0tHRkZaWlixbtiwHDx4ceayrqyvz5s3L/Pnz8/jjj5+5qQEAJqBThtWaNWvS\n29t7wrbu7u50dHSkr68v1157bbq7u5Mk27dvz8MPP5zt27ent7c3t99+e44dO3bmJgcAmGBOGVZL\nly7NjBkzTti2efPmrF69OkmyevXqPProo0mSTZs2pbOzM42NjZkzZ07mzp2brVu3nqGxAQAmnobR\nvmBgYCDVajVJUq1WMzAwkCTZu3dvrr766pHnNTc3Z8+ePSe9fmhoMMm64/faj98AAOqrVqulVqud\n1j5GHVb/XaVSSaVSOeXj/9M555ybY8fWnc5hAQCKa29vT3t7+8j99evXj3ofo/6pwGq1mv379ydJ\n9u3bl5kzZyZJmpqa0t/fP/K83bt3p6mpadQDAQBMVqMOq+XLl6enpydJ0tPTkxUrVoxs37hxYwYH\nB7Nz587s2LEjS5YsKTstAMAEdsqlwM7OzmzZsiVvvvlmZs+enbvvvjt33nlnVq1alW9+85uZM2dO\nHnnkkSRJa2trVq1aldbW1jQ0NGTDhg2nXCYEAJhqKsPDw8PjdrBKJY2N03P06KHxOuQ4q2XRonV5\n8cVavQcBAE5TpVLJaDPJldcBAAoRVgAAhQgrAIBChBUAQCHCCgCgEGEFAFCIsAIAKERYAQAUIqwA\nAAoRVgAAhQgrAIBChBUAQCHCCgCgEGEFAFCIsAIAKERYAQAUIqwAAAoRVgAAhQgrAIBChBUAQCHC\nCgCgEGEFAFCIsAIAKERYAQAUIqwAAAoRVgAAhQgrAIBChBUAQCHCCgCgEGEFAFCIsAIAKERYAQAU\nIqwAAAoRVgAAhQgrAIBChBUAQCHCCgCgEGEFAFCIsAIAKERYAQAUIqwAAAoZc1h1dXVl4cKFaWtr\ny80335x33303Bw4cSEdHR1paWrJs2bIcPHiw5KwAABPamMJq165d+cY3vpFt27bllVdeydDQUDZu\n3Jju7u50dHSkr68v1157bbq7u0vPCwAwYTWM5UUf+MAH0tjYmCNHjuScc87JkSNH8uEPfzhdXV3Z\nsmVLkmT16tVpb28/Ka6GhgaTrDt+r/34DQCgvmq1Wmq12mntozI8PDw8lhf+7d/+bT7/+c/n/PPP\nz/XXX5/vfOc7mTFjRt56660kyfDwcC666KKR+0lSqVTS2Dg9R48eOq2hJ65aFi1alxdfrNV7EADg\nNFUqlYw2k8a0
FPj666/nvvvuy65du7J3794cPnw43/3ud08aplKpjGX3AACT0pjC6oUXXsg111yT\nD37wg2loaMhNN92UH/zgB5k1a1b279+fJNm3b19mzpxZdFgAgIlsTGE1f/78PPvss/nJT36S4eHh\nPPnkk2ltbc2NN96Ynp6eJElPT09WrFhRdFgAgIlsTF9eX7RoUW655ZZcccUVmTZtWj7ykY/k05/+\ndA4dOpRVq1blm9/8ZubMmZNHHnmk9LwAABPWmL+8PqaD+fI6ADBJjNuX1wEAOJmwAgAoRFgBABQi\nrAAAChFWAACFCCsAgEKEFQBAIcIKAKAQYQUAUIiwAgAoRFgBABQirAAAChFWAACFCCsAgEKEFQBA\nIcIKAKAQYQUAUIiwAgAoRFgBABQirAAAChFWAACFCCsAgEKEFQBAIZXh4eHhcTtYpZLGxuk5evTQ\neB1ynP1yknfqPcQZc8EFM/LOOwfqPQYAjItKpZLRZlLDGZrlLPVOknHr1HF36FCl3iMAwIRmKRAA\noBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKs\nAAAKEVYAAIUIKwCAQsYcVgcPHszKlSuzYMGCtLa25rnnnsuBAwfS0dGRlpaWLFu2LAcPHiw5KwDA\nhDbmsPrsZz+bG264Ia+99lpefvnlzJ8/P93d3eno6EhfX1+uvfbadHd3l5wVAGBCqwwPDw+P9kVv\nv/12Fi9enB//+McnbJ8/f362bNmSarWa/fv3p729Pf/2b//2i4NVKmlsnJ6jRw+d/uQTUiXJqE/n\nJFLJGH65AMCkVKmM/s+9hrEcaOfOnbn44ouzZs2avPTSS7n88stz3333ZWBgINVqNUlSrVYzMDBw\n0muHhgaTrDt+r/34DQCgvmq1Wmq12mntY0yfWL3wwgv56Ec/mmeeeSZXXnll7rjjjlxwwQX5+te/\nnrfeemvkeRdddFEOHDjwi4P5xGqS84kVAGePsXxiNabvWDU3N6e5uTlXXnllkmTlypXZtm1bZs2a\nlf379ydJ9u3bl5kzZ45l9wAAk9KYwmrWrFmZPXt2+vr6kiRPPvlkFi5cmBtvvDE9PT1Jkp6enqxY\nsaLcpAAAE9yYlgKT5KWXXsptt92WwcHB/MZv/EYeeuihDA0NZdWqVXnjjTcyZ86cPPLII7nwwgt/\ncTBLgZOcpUAAzh5jWQocc1iNhbCa7IQVAGePcfuOFQAAJxNWAACFCCsAgEKEFQBAIcIKAKAQYQUA\nUIiwAgAoRFgBABQirAAAChFWAACFCCsAgEKEFQBAIcIKAKAQYQUAUIiwAgAoRFgBABQirAAAChFW\nAACFCCsAgEKEFQBAIcIKAKAQYQUAUIiwAgAoRFgBABQirAAAChFWAACFCCsAgEKEFQBAIcIKAKAQ\nYQUAUIiwAgAoRFgBABQirAAAChFWAACFCCsAgEKEFQBAIcIKAKAQYQUAUIiwAgAoRFgBABQirAAA\nChlzWA0NDWXx4sW58cYbkyQHDhxIR0dHWlpasmzZshw8eLDYkAAAk8GYw+r+++9Pa2trKpVKkqS7\nuzsdHR3p6+vLtddem+7u7mJDAgBMBmMKq927d+exxx7LbbfdluHh4STJ5s2bs3r16iTJ6tWr8+ij\nj5abEgBgEmgYy4s+97nP5d57780777wzsm1gYCDVajVJUq1WMzAw8L6vHRoaTLLu+L324zcAgPqq\n1Wqp1WqntY9Rh9X3vve9zJw5M4sXL/5fD16pVEaWCP+nc845N8eOrRvtYQEAzqj29va0t7eP3F+/\nfv2o9zHqsHrmmWeyefPmPPbYY/npT3+ad955J5/61KdSrVazf//+zJo1K/v27cvMmTNHPQwAwGQ2\n6u9Y3XPPPenv78/OnTuzcePGfPzjH893vvOdLF++PD09PUmSnp6erFixoviwAA
AT2Wlfx+rnS353\n3nlnnnjiibS0tOSpp57KnXfeedrDAQBMJpXhn/9Y33gcrFJJY+P0HD16aLwOOc4qScbtdNZBJeP4\nywUA6qpSGf2fe668DgBQiLACAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLAC\nAChEWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKsAAAKEVYAAIU0\n1HsAJpOGVCqVeg9xxlxwwYy8886Beo8BwCQmrBiF95IM13uIM+bQoakbjQCMD0uBAACFCCsAgEKE\nFQBAIcIKAKAQYQUAUIiwAgAoRFgBABQirAAAChFWAACFCCsAgEKEFQBAIcIKAKAQYQUAUIiwAgAo\nRFgBABQirAAAChFWAACFjCms+vv787GPfSwLFy7MJZdckgceeCBJcuDAgXR0dKSlpSXLli3LwYMH\niw4LADCRVYaHh4dH+6L9+/dn//79ueyyy3L48OFcfvnlefTRR/PQQw/lQx/6UP7sz/4sX/7yl/PW\nW2+lu7v7FwerVNLYOD1Hjx4q+iYmjkqSUZ/OSWTqv78x/HYAYIqqVEb/58KYPrGaNWtWLrvssiTJ\n9OnTs2DBguzZsyebN2/O6tWrkySrV6/Oo48+OpbdAwBMSg2nu4Ndu3blhz/8Ya666qoMDAykWq0m\nSarVagYGBk56/tDQYJJ1x++1H78BANRXrVZLrVY7rX2MaSnw5w4fPpzf/u3fzl/8xV9kxYoVmTFj\nRt56662Rxy+66KIcOHDgFwezFDjJTf33ZykQgJ8bt6XAJDl69Gg++clP5lOf+lRWrFiR5GefUu3f\nvz9Jsm/fvsycOXOsuwcAmHTGFFbDw8O59dZb09ramjvuuGNk+/Lly9PT05Mk6enpGQkuAICzwZiW\nAv/lX/4lv/Vbv5VLL700lUolSdLV1ZUlS5Zk1apVeeONNzJnzpw88sgjufDCC39xMEuBk9zUf3+W\nAgH4ubEsBZ7Wd6xGS1hNdlP//QkrAH5uXL9jBQDAiYQVAEAhwgoAoBBhBQBQiLACAChEWAEAFCKs\nAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFNJQ7wFg4mhIpVKp9xBnzAUXzMg7\n7xyo9xgAU5qwghHvJRmu9xBnzKFDUzcaASYKS4EAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChE\nWAEAFCKsAAAKEVYAAIUIKwCAQoQVAEAhwgoAoBBhBQBQiLACAChEWAEAFNJQ7wGA8dKQSqVS7yHO\niAsumJF33jlQ7zEAhBWcPd5LMlzvIc6IQ4emZjACk4+lQACAQnxiBUwBU3eZM7HUCZOJsAKmgKm7\nzJlY6oTJxFIgAEAhwgoAoBBhBQBQiLACAChEWE15tXoPcBaq1XsAOONqtVq9RzjrOOeTQ/Gw6u3t\nzfz58zNv3rx8+ctfLr17Rq1W7wHOQrV6DwBnnD/kx99EPecf+MBFqVQqU/I2FkXDamhoKH/8x3+c\n3t7ebN++PX//93+f1157reQhAIAJ5NCht/Kzy51MxdvoFQ2rrVu3Zu7cuZkzZ04aGxvze7/3e9m0\naVPJQwAATFhFLxC6Z8+ezJ49e+R+c3NznnvuuROec/To4SRT+WJ3E/G9rS+4r4n4/koq9f5KnvOS\npvJ/v6n83jIhryy/fv1E/XU+dU3ccz7xfn3WS9Gw+r9+4w8PT90rIwMAFF0KbGpqSn9//8j9/v7+\nNDc3lzwEAMCEVTSsrrjiiuzYsSO7du3K4OBgHn744SxfvrzkIQAAJqyiS4ENDQ35+te/nuuvvz5D\nQ0O59dZbs2DBgpKHAACYsIpfx+oTn/hEfvSjH+Xf//3fc9ddd41sd32r8dff35+PfexjWbhwYS65\n5JI88MAD9R7prDA0NJTFixfnxhtvrPcoZ4
WDBw9m5cqVWbBgQVpbW/Pss8/We6Qpr6urKwsXLkxb\nW1tuvvnmvPvuu/UeaUpau3ZtqtVq2traRrYdOHAgHR0daWlpybJly3Lw4ME6Tjj1vN85/8IXvpAF\nCxZk0aJFuemmm/L222+fch/jcuV117eqj8bGxnz1q1/Nq6++mmeffTZ//dd/7byPg/vvvz+tra0T\n8qe4pqLPfvazueGGG/Laa6/l5Zdf9in5GbZr16584xvfyLZt2/LKK69kaGgoGzdurPdYU9KaNWvS\n29t7wrbu7u50dHSkr68v1157bbq7u+s03dT0fud82bJlefXVV/PSSy+lpaUlXV1dp9zHuISV61vV\nx6xZs3LZZZclSaZPn54FCxZk7969dZ5qatu9e3cee+yx3HbbbX4Kdhy8/fbb+f73v5+1a9cm+dnX\nEX75l3+5zlNNbR/4wAfS2NiYI0eO5L333suRI0fS1NRU77GmpKVLl2bGjBknbNu8eXNWr16dJFm9\nenUeffTReow2Zb3fOe/o6Mi0aT/Lpauuuiq7d+8+5T7GJaze7/pWe/bsGY9Dc9yuXbvywx/+MFdd\ndVW9R5nSPve5z+Xee+8d+U3ImbVz585cfPHFWbNmTT7ykY/kD//wD3PkyJF6jzWlXXTRRfn85z+f\nX/3VX82HP/zhXHjhhbnuuuvqPdZZY2BgINVqNUlSrVYzMDBQ54nOLg8++GBuuOGGUz5nXP7vb0mk\nvg4fPpyVK1fm/vvvz/Tp0+s9zpT1ve99LzNnzszixYt9WjVO3nvvvWzbti233357tm3bll/6pV+y\nNHKGvf7667nvvvuya9eu7N27N4cPH87f/d3f1Xuss9Lp/Ht2jN6XvvSlnHvuubn55ptP+bxxCSvX\nt6qfo0eP5pOf/GT+4A/+ICtWrKj3OFPaM888k82bN+fXf/3X09nZmaeeeiq33HJLvcea0pqbm9Pc\n3Jwrr7wySbJy5cps27atzlNNbS+88EKuueaafPCDH0xDQ0NuuummPPPMM/Ue66xRrVazf//+JMm+\nffsyc+bMOk90dvjWt76Vxx577P/rLxHjElaub1Ufw8PDufXWW9Pa2po77rij3uNMeffcc0/6+/uz\nc+fObNy4MR//+Mfz7W9/u95jTWmzZs3K7Nmz09fXlyR58skns3DhwjpPNbXNnz8/zz77bH7yk59k\neHg4Tz75ZFpbW+s91llj+fLl6enpSZL09PT4C/M46O3tzb333ptNmzblvPPO+z+fPy5h9d+vb9Xa\n2prf/d3f9ZM74+Bf//Vf893vfjdPP/10Fi9enMWLF5/00w6cOT6iHx9f+9rX8vu///tZtGhRXn75\n5Xzxi1+s90hT2qJFi3LLLbfkiiuuyKWXXpok+fSnP13nqaamzs7OXHPNNfnRj36U2bNn56GHHsqd\nd96ZJ554Ii0tLXnqqady55131nvMKeV/nvMHH3wwn/nMZ3L48OF0dHRk8eLFuf3220+5j8qwL4MA\nABThR5cAAAoRVgAAhQgrAIBChBUAQCHCCgCgEGEFAFDI/wMQqdZCpTWlJAAAAABJRU5ErkJggg==\n" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "deltas = [l[i]-l[i-1] for i in range(1,len(l))]\n", "_=hist(deltas)" ] }, { "cell_type": "markdown", "id": "b36fe337", "metadata": {}, "source": [ "There are other forms of compression that we can use. 
For example, the dictionary terms (as strings) are sorted and can be represented in a Trie, analogous to prefix compression in B-trees." ] }, { "cell_type": "markdown", "id": "cf98f1c7", "metadata": {}, "source": [ "Tricks like these are important for improving the practical performance of information retrieval systems.\n", "\n", "However, often, you may be better off using a general purpose solution instead of a special purpose solution. E.g., the problem of representing lists of integers compactly has (implicitly) found attention in other areas of computer science, and you may be able to just reuse those results." ] }, { "cell_type": "markdown", "id": "9164b5a6", "metadata": {}, "source": [ "## Matrix View" ] }, { "cell_type": "code", "execution_count": 135, "id": "6e6d88ba", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 135, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlIAAAD9CAYAAAB3CplhAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzsnXdYU+cXx78BA8gQcQGCgMgSF07cSnHUhbNa/Wlx1jrq\n3lprrRa17lZbtbW1ta66tWrVuqrWausWByqoIE5wgrL8/WGSZtwkd7834f08T58mN+99z5EkN+e+\n7znfo3rz5s0bUCgUCoVCoVA440DaAQqFQqFQKBRbhQZSFAqFQqFQKDyhgRSFQqFQKBQKT2ggRaFQ\nKBQKhcITGkhRKBQKhUKh8IQGUhQKhUKhUCg8ET2Q2rNnDyIiIhAaGorZs2eLPT2FQqFQKBSKYlCJ\nqSOVn5+P8PBw7N+/H35+fqhduzbWrl2LihUrimWCQqFQKBQKRTGIuiJ18uRJhISEICgoCGq1Gu+/\n/z62bdsmpgkKhUKhUCgUxVBEzMnS0tJQrlw53XN/f3/8/fffBmNUKpWYJikUCoVCoVAkxdLmnaiB\nFNsgyZXjvFt27ULH1q11z9VqNcoHB+Pa1auszh84dCiWff01R6vSULp0aagcHPDg/n1W47v16IH1\na9ZI7BV7cgHUqVkTZ/79l/H1eg0a4K9jx1jPV7N2bfx76pRI3pEhKTUVof7+stmbOHUqouvVQ4dW\nrayOjaxcGSk3b6JmnTo4cOgQ1HqvtYmLw2/bt0vnqB5/nzuH6GrVZLHFh/998AF++eknq+NGjh2L\nBV9+KYNHpuQCBu+flk9nzMBnU6bI7Y7sTPnsM8z49FOLYz4aOhTfKuRab4y5949imeAKFXDzxg3d\n84N//YWYevVk9SHLyuui5kidOHEC06ZNw549ewAACQkJcHBwwPjx4/8zqFJxDqQo4nLq4kXUrlyZ\n17n0YmBKv4ED8f2yZbLZU6vVcHBwwOvXr62OVUogJTa169bFqRMnZLc7Y/ZsTNG7nskJ/e7ZNvT9\n44eLiwtevXp
F1IcsWF6REjVHqlatWkhKSkJKSgpycnKwfv16xMXFmYxr1bYt57knfPKJGC6KgpeX\nF2kXrDJi7Fizr/ENovgwcMgQ3eMatWoJni9cU7jg4eEheC5j3mnWjNd5i7/9VmRPLJObm8sqiLJn\nSARRAHRB1M59+yyOi2T4jnmVKCGJT0rGt2xZ0i4IZtPOnaLPOWrcONHntFd+P3yYtAtWETWQKlKk\nCL7++mu0bNkSkZGR6NatG2PF3m4OH0xXNzd4lSiBWZ9/LqarjHTs0gVDR460Oi4zMxOTp02zOm56\nQgKcnZ0Njo2dNMngeZ8BA1j7N2nqVN3jDl266B5369HDZOxCzfZDGW9vODhwe5vrN2wIAHB0dISP\nry/iOnbUveaAtyswbFm2ZAnKBQYCAE7/8w8AoHrNmgCAT6ZPR/ngYIPxYydNQo8PPjA739XLlwEA\nz58/Rx2G5d3mLVta9Ulr35gD+/dbPZeJfb//zus8vkycOhVbd+/mfB5J0bjpCQkErYuH9iawbfPm\nFsclXrxocqxkyZImx/wDAnSP27Zvb3FOc+/fu23aWDyPCy31UijEIP3uXatj2FxLa0VHi+CNIaXL\nlIGTk5PVcZ153Pgzof/+Jd+8iWZmrlVeXl5wcXERxSYbyuqlJehf65XCGhZb7qQR/draqlUrXL16\nFdevX8fEiRMFz5f18iUyMzIEzREWHs5q3JaNG/H1ggWsxmqDAktMnTjRZOXgyy++MHj+w4oVrOwB\nwBfTp+seb924UfdYm0M1lSHYfHD/PgoKCizOaxxYHD96FMBbOYt76enYvmWL7jVHgHEbK9rCnvWd\nW7cMnmvzqz6fOhXJN28avPblF1+Y/eL07t/f4PnJv/4yGWMc1NRmuACby++yhLmLHgB4e3tzno8E\njgRtT9VcC8T80deycOlSVuMcHBxQzNMTcR07olr16ijm6cn5JqPpO+8gLCLC6riRDCvCrm5uJsdS\nb9/WPd5ppcLZ3PtXvkIFq/6w5fddu97OGRwMR0d5PjFXr1yxOuYfo6IltjRs0sTsayVLlYJaE0hV\nrFSJ1/xMhJr5vdH/a3r7+GC/mRuwzMxMWbey7qam6h6nJCfLZpcty5YsMXjOZ0dLaojcpL7/v//J\nao9tUjoXuKyqycV0ntuffAILY/5mCGrE5sfvvuN8zimeF2BjzF30EubNw3cyb+3ZItotnj2//Yb2\nnTpxPn/WvHmMx4sWLYoRgwczvnbIaPuvoKAAz54+xfYtW3DuzBk8e/rU6k2GMc+ePcM1Fj/8TAnp\n58+e5WSLLWuNbjxKlSplMmbV2rWs5vrm++8BvF0xyc/PF+4c3gYpllZ+/Pz8ALwNdB2LmK9/eq97\nd4PnM+fMsWr7qGZbKKpGDYPjcxYsQF5eni7v5fKlSwDersILDSCTWPzeGK/EK4WuRn9jtsyXOMG/\nmKen7rESf3uJBFLrfvlFVntDhg+X1Z6YTNFbhbJE/48+ktiTwsUPLD+jE0ePltgT+0B/5Wfb5s1W\nx48z2gKfYO7vbKFSuGnduuyc48BMFttQcjNn4UKD5yqGVbZ4nj+QYnD50iXk5OSYfX2RXpBsqe77\nV6NgcM7Mmax9OHv6tMHzcSNHAiLUWfXq04fXeSLWeIkK30KKUUOHiuyJIUqXTSISSL1jJb9AbO6m\npclqT0xm6OVFWYKuiohLHw6rpuaSzbX5Wu07dwYA1K1fn3Gcg4MDPPXuuOyRWnXqcBo/x2gL3BzZ\nWf8VJtdr0ICTDU8eW3uVq1a1OqZSlSqoLUEQx4b2nTohLy8PwTy3+wb162dyrIpIshUlGVbKgP9y\npAoKChATG6s73qhpU4vzPX36VJA/15OSkPXypcGx/Px8TitxP//wAy/bZm8MWGJpy9Le+HzWLDx9\n8oS0GxYRVf6AlUGVCk2io0XbcpGTipGRuJyYKGiO8VOmYPaMGSJ5RJECY70aN
zc3ODg64vmzZ4zj\n5ZY/iGnWDAGBgVil2YaxhL78wZ+HDhm8Jqf8wYuCArhzDFrsjcCgINxKSSHthqi4ubvDSa1GZmYm\nGjdtiiNGnzEKM53eew+bf/2VtBsmNGvRAvv37iXthuKQVf6ALUoIooJDQrDQKInNGvpBlJubG8oF\nBFhMQgbeionqwzWIKqvJH1i5erXumLYKzhLa6pslesnsfhKLRgYY+VW6TBmzYwODggwqU5jyOtiy\nYtUq3ucCQERkpMFzbRBVTlNR9fLlS4Mg6ku9rZSR48YhJDRUkH2u1K1fHx31qjZtgaMSlzBv3LFD\n0vm1zJ4/n/e5pUqXFtGT/2BbTMOGURy3dl6+eIHMzEwA4BRE+fj6wrN4cU62jGGTI2WJgKAgFC1a\nlPE1beWyVMgdRH0+ezarcbYQRHFdeZYDIoGUWk1eluzm9esYoadxZAk/f394+/gYHHv58iXu3L5t\nNglZS0eB5cTabcm+PXvqjhlXwQHAZ0ZbIdrqmyF68gppetUZUnDbyK+HDx7gIzN757dSUgwqUx49\nesTb7oD4eN7nAsAVM6uMd/QqqvQZO2KE7vGCOXNkr3TJy8tDbm6urDaFot2K+NBMYrhQurRrJ8m8\nxuTl5bEax1RGXszK9i3fJOfhY8YwHme6zlrbypzP8gdXKPfS0w22a5q1aGFxvFqtNvn3GL8XXCtC\nvYoXh4OZv7m2ctle+MQoQNaX5dEvBJA6aZwPxoUKXDpnyAWRQKqMAsrFR3IQREtLTcX9e/dYj+/y\n/vtWxzTVywWwBNs7xE8nTTKrj0QSMdo1bNi6lfXY+L59Aby9++fyOeNbrTJ52jSUMQqypebxo0eS\nB8VSsZylVIFSmczyuqEvGaLloBWdsuI8V2jMXZt8fH1Nvjvu7u4W52Kj6QQAfT/8kNU4NgwYNMjq\nSkgZb2+Tm9lPjQoS9vz2G+O55q6hgeXLc/DSMlyuUcDb9AF9qrDIvROTIL2qwXJ6WmZSJ43zwXin\nQ4kQCaSE/gg4FikC/3Ll0KBRI1bjme5UtKsnxonCvmXLwkkvWueTGG+pj56rqyvc3Nxw6I8/WM11\nm2VORWyLFmZlDBYx/HhZEr00JpxBN8d4a0n/y6ilN0PiqvE2mr7mSrcePfC+ZuWtZu3aaNm6NZyc\nnFitOPXq3RsAsGrlSgDAo4cPDd4HfYVlpiRW49U0JoKMLrwdunTBzGnTDHRY5MDbx8fEF4o8dOra\nVbK5Hz9+jJ6azzEXzMme3Ll9G107dDA49sxMnh/w9jt9+OBBAG+/z5ZWr1YuX25yLLZFC4wwszpm\niRXffMN4XF+fLS01Fal37nCeGzC/yrZ9yxa8fPGC15zGGP+drWHcM/DC+fOC7LP9LdSiFTYGgBvX\nrwuyLTUdbCCNwSazP/Pz8pB65w6O/fmn7lgvCxcgpjuVTevXAwCGaWQDtD+u6XfvIuf1awwfMwY1\na9fGASutIIwZN3kyjmguRkxkZWXhpVGliCU2avzUpy3Dl1a7TM4kTjicYTuFi1osk2DeFj1BUIB5\nG+xHhmRo4200fc2V9WvWYJ0mF+zfU6fw+65dyMnJYazOMc43KbCQCFilWjUDheU3DNpBJ44fN3he\n1s8PVaOiDI7pW3AsUgSHDxzA0u++k71FjC0yxIKCfzvN57l3//5ws7JiIib6229Mwq1MRAuoxqsY\nGYkevXpZHLP6xx85z7tcc05gUJCgMvEtGzfqctnu3L7NWWPrj717sXDuXN72jblvpbG70Bwpe0L/\nt9DemCtDVwShIp9EAikuPaeKFy9uMUlauxQ9ULMkeZJnZG9c0bRo7lxUqlKF8zxabZNpmv8f/Osv\n1j4xKZMzsZNhGfmfkycBwKw4oRR8IeJFkyuPHj7UbaEGBQebBHb6XDh3zqD8/uiRI1bnv5uWZiKg\neEsvFyo/Lw+ZGRkY3L+/LhjX76/mX64cu
39IIWHwsGFmpQN2aD7PP373nWgrBGzQ335jWwATw7Mf\nI/C2WGXNzz/zPt8cH2puIm+lpChWn0gLUz4XXw28yePGIYHgNYhiPwgV+SQif1CsSBHWSZsUdgwd\nMQJfGwnz2QINGzdmFdgoGa38QVFXV52ukZOzM3IkaiysVqvh4ODAqnGxUuQP7IUZs2ebFS28/fAh\nAixU5jWNjWW9pU+xjq1e8yi2hyLlD5q/+65stiz1gBOLdhz2x1ezKHv18PCwqNjMRHZ2NqfxSsHW\ngihLhQT64pBSBVEAMGbiRKznmNxKml0HDpB2QRQs9ca0FEQB4BxEWVtt2bZnDxrHxCCcoTE8X5h6\nBErJZwkJvJO+9fMe+eDq5iZbP0FbobVM1a9cOKwAuSRrEAmk5OyVI0cPuB0cftR6vvee1THPnz/n\n3L5ATkHIwszGdetMjlnTEhObv44dw+YNG2S1KZTW77xD2gVRYKPhJhYTrSRut3/3XRw5eNAgcVgo\niZqec3Lx6cSJBlvmXGBbQWmOrJcvraqYBwYFAQD6WMjx40NLgbI4UsE3ob+iURGREIzzlZab0XuU\nY5GELTaZbM6VCBHv2Ci2g1wrn3JX0NVr0EDS6jGKeex9a+53IwHhwo5Wid7SSiQflPp35ttYm23H\nj5CwMKtjjBdafjFTGCVkkaRGrVq8z2WCSCD1Poc+ZmJwRcQ7NortsG/PHsltJMybR/scsiDJRnWv\nxESq/mhFXV15nbeEITjgojYutKOAFg8PD17nGVd4LuPY965IkSLEmuEyvWfWBFvlwJXnZ4kt169d\nE2UefUFRPpz+5x9R/NBCJJBa98svrMd6lSjBqFEkFGNBNHugjqY0eydHyQYKfyaOHo3pMpTn2jpc\nS+ntEaY2ORu2beM8z+gJEwyeWwoFyleoYDZQYZKkcOAQWAjtKKCFbzBjfN7APn0sjr9irBXH026V\natUEB2BM57Odc/yUKYJsW0KlUrESlKYYQiSQ4qKKm5mRwahRFMNSGRxgFpTctnkz6/MB7u0HxCI4\nJIT12JsaYbW2PEREtejLBBRmuPTamjpxooSe2AfhEtwMkYCNYGYoi+0LLV3bt+fsw7xZswyef6BR\n82ci+caNtzmXLNH2zWPDGJE+98YioWy/exPNCJGaI8Iovy0vN5eXXMSFc+cEy0xkMWgJ6rfMsYSU\nTe9DwsIY80C50L5TJ5G8eUuTmBiTY2wqluWESCDFpIrLlYMcchWuXrlicnFz57icbK79gJbuVoT2\nuDJCUz1zk4PqLN9+dfqBplaPSgvTD4cYzY+HjRqle0wy8fJzox8lLWx7bTV/913ZBTlPHD9uUTeL\nK7379xdtLnOMkLkaTCrM9V/UJ0mk7Qu2iNGGCQA+HjUKLVq1Mvu6ccK1vlAin8bJE6dOZTxu7rtn\nnPMoNNmcJJ3MFB29x7NVlVicO3OG9Vhz1ZZcFymsUVaE3xupIaIj1aVdO+zi0a3dz9/foL1MSFiY\naHuuhZWAoCBdG5padeqYBFLWqB0dzVrMkA1rN21C986dRZtPKCVKlkTG48e65/7lyplUtmh1pOTi\nf/HxiKxUidUPib3oSLVs3VqxCbpsqVm7Nq5evgxPLy+k8ayO4kvC3LlWqwDFZuTYsVjw5ZdWx+lf\ng0gQVaMGzp4+zfha9Zo1zbbeEkrb9u2xk8fWLkV+rOlI8Q6kgoKCUKxYMTg6OkKtVuPkyZPIyMhA\nt27dcOvWLQQFBWHDhg0mjThVKhX4pLN1eu89bGahwUShyI3cgdTYSZNQp25dvBcXZ3WsvQRS9oCD\ng4NBrpijo6PV8nu5WfzttzqlfgpFCTg5OSEnJ4eoD5IJcqpUKhw6dAhnzpzBSc0qxqxZs9C8eXNc\nu3YNsbGxmGVm24QPSgyibE0U0VZozNBUWC4CAgM5VRFpWxTJyeNHjwQ3/pabvzhsGdgrrm5uBs+L\ne3kR8
sQ89+/dI+0ChWJAgIzabXwRlCNlHKFt374d8ZpKjvj4eGw1E2gwJX9zgakxb7yFhEup6Max\n4zfFMl5eXijm6ckp4VVsbt+6ZZKc+4WF7YmZBAKplcuXy9pTUQyWM3xnlYS5PoDGCNHvemH0uXrM\nM6fRmEYi3niQ+DyTwsfXV3AZvVD0G2dTmLmelETaBavw3toLDg6Gp6cnHB0dMXDgQAwYMABeXl66\nH8E3b96gRIkSJj+KKpUKPqVK6S4iDgCoSL98lK9QAck3bpB2w67Y/+efaNaokWz2WrVti+CQECxh\n0WdMKVt7LwoK4O5g+/q/tevWxakTJ0i7QdHD3d0d+fn5Ntsmi2IZErlk+QD0BVtyIdHW3rFjx3Dm\nzBns3r0bS5YswZ9//mnwukqlMquL0aVTJ6gBqEGDKLmhQZT4rF29WtD5Tk5OCOVQ9RRVowaay9yW\nRiiXLlwg7YIo2HsQ9cn06aRd4MyLFy9oECUSZf38SLtgAomEfEdAF6OoWYznHUj5+voCAEqXLo2O\nHTvi5MmT8Pb2xj3NHnt6ejrKlCnDd3odJUqWxEMGzQ0KxV7IyclB0tWrpN2QlOAKFUi7QGHB52Yk\nCSjioi+Zct9IR4skjx4+JO2CpOyWqHk6r0AqKytLl0fy8uVL7N27F1WqVEFcXBxWadoGrFq1Ch3M\n5BBx0ZHKePwYpY2SNG2dn9evBwCUDw4m7IkyKVmqFONxsfVEPDw8eKsb68NFgJEUF3j20BKLzkaN\nSO2dUqVLk3bB5uAigmvr6FdGehcrxuqcOjI06SVdHSc1rSRqns4rkLp//z4aNWqEqKgoREdHo23b\ntmjRogUmTJiAffv2ISwsDAcOHMAEo1YGWgq7enavbt0AAMk3b8pu+z0bkP83l4R7V+RKtefPnwMi\nyKiZE6ZjYtCwYYLt/XXsGDZv2MDpHA+WF2upaGZjW5FCiW3RgrQLNgdbEVwtjRkUr+2ZkwKa9Noy\nh0XUKZQKXoFU+fLlcfbsWZw9exYXL17ERE2rgBIlSmD//v24du0a9u7da6IhpeXhgwf8PaYI4lcG\n+f9xkycT8MQ6VapV0z1uJFHDVzEI4hBIfbN4sWB79Ro04Fw9FkR49VNM0VYhzJ4/X9D5H1jp56Zl\nPYd+osbMFeEzYkvw7e0mtAerX7lycHFxETQHX8rZQEm/UmhvpGivRIiU0dBASlnMmTmTtAuMXDh3\nTvf4T4aGr0phw9q1pF1QPGtFbhvBl/F6rYn48NMPP4jkiXnGiLBqKRX/Xrok+px8e7vt2LJFkN20\nO3fw6tUrQXPw5Y5xA2WFMGP2bNIumPCEoBwOW4gEUkI7Z1MoSsKBfp6tInbT4uo1awIA5n31lajz\nysXcRYtIu8CLmpUqAQCGjx5N2BPhvyMhoaEmIqmFnSnjx5N2wYSNRu3kTl++TMgT8xAJpEqLUM1H\noSiFJxa6ti/78UfR7W3+9VdJO8BLgdh5kdr+Z6M//liU+Vq3a8dqHFvhTibKeHvrHo8ZPpz3PFKi\nDVCtsWjePIk9sc7Tp08FnX89KQlZCqsIr1m7NmkXFEcXo+9mjYoVCXliHiKBFOnEV66MnTTJ7GsV\nNXdoUlPcy4vYfj6FPwN79xZ9zk7vvYfxU6aIPq+UiN0RXmzYNlEX0q7iwf37vM+VC6ka9EqBUEX3\nkiVLQu3kJI4zIvHvqVOkXaDwgOZIseDLL74w+9plCXIGmHiSmUlsP59inpHjxiEkNJS0G4qnoYKL\nBbjANuBSGmxX0rgIw5Im9fZtQednZ2ejgFDTaFv6OwvF1dUVjo78pbejatQQ0RtpoIEUhSKABXPm\niCJpYO9s3b2btAuiUKRIEdIu6LiluY4uWbHC6tiL58+zmjOFgCQLX4TKx2RlZSGfUCClVAFetZqN\njjc3hP6dz54+LaI30kAkkPI0I4tAodgiYuXp2DNK/eHgipD8zmiRBRU
DNb4MGTBAtDn1FbcphQ/9\nPD4Ke4gEUhUjI0mYpVAohKhXvTppF0Qh/e5dVuOYtiP+tgFBxV9++om0CxSCpIkselxYINOOnZaL\nU+yI7j17knZB8cT360faBVmxhe0IJo4qWK+NQlEqRAKpY0eOkDBLoYhOWT8/rF29mrQbimPDtm1o\nExenU61e9f33hD0Sh5Fjx5J2gaJhmEBxVYry+fvcOZtoeE5mRYpCsRMePXxY6Fp6sKFr+/b4bft2\n3qrVSmXBl18Ss30hKQlt4uKI2VcaiwW2+ykMBJYvjxIlS5J2gzfR1arhbloaaTesQqQEpVXbtti9\ncycJ0xSKqOTk5Ci6pQfFfqhCZTYoHLmVnEzaBcGonZwUL/1DZEWKBlHcoMn5FIoyaBobS9oFXnTp\n1g0AUIFAMFamTBlBOkLmaBwTI+j8kqVKcRLkjKxcWZA9Cj/iOnQg7YJV6NaeDXA5MZG0CxSZ+XnD\nBtIuiEo7G7gYsmnPIUQDT6rWWGzaumxcvx4AcCMpyeI4scUPq1WvjgcPHvDSEWrYuLHF1x89fMjX\nLQBAzuvXeFNQwHp84sWLguxR+GELlaQ0kLIxbGG/27N4cQQGBZF2Q3JKly4tmSZar65dJZmXFP+c\nPMnrPJVKZSKC6ePrK4ZLJrBpz9Ghc2c48Wwr8oxHb7iPhg61OkbMti5iVxueO3OG97lHrRQlCQ1s\nnj9/jry8PEFz2ApN33mH1ThPT0+JPRGOj48PaRdMIBJIdevRg4RZu0Box3M5ePrkCW6lpJB2Q3JU\nDg428X4oAQcH/pca47+xkLmEMnPaNP4n8/isfPv11/ztUSgaDh04YPH1IkWKILJSJagIfrfYokQf\niXj06vVrEmbtgsePHpF2gaLhwf37eJKZiX4DB5J2xSpy9YQ0x1WefdHevHmD3Nxcg2Okq3hycnJ4\nnfda4QmzhY05CxeSdkEx5OXlIfHSJTzJzCTtikXmLlrEWhSXD56enihVujTn84hU7d28fp2EWQpF\nEqrZgGq3i4sLUfvuCryLpBRuxo0YQdoFCkfGDB8u6fxPNdvvYeHh8Pb1RfKNG0i9c8fqeRavbn37\n9oW3tzeqVKmiO5aRkYHmzZsjLCwMLVq0wJMnT3SvJSQkIDQ0FBEREdi7d6/ZeWtHR1t1jMIMm4RY\nirwsnjePtAtWef78OVH7Lzgk9SqZhk2akHZBckJkruwLUHA+JZtEfj6MHj9eknmFMmz0aNIuKIJr\nV6/iz0OHWAVRgJVAqk+fPtizZ4/BsVmzZqF58+a4du0aYmNjMWvWLABAYmIi1q9fj8TEROzZsweD\nBw9GgZmL58rly1k5RzGFTUIsRV6uW6mEUgJVo6JIu2CAs7Oz1TFMSd18E73Fol379kTtM6FWq0Wd\nT+jn2Znj6udtQvmU4RUrws3d3eIYMRP59Zk3e7Yk8wrFFm4KlYjFQKpRo0bw8vIyOLZ9+3bEx8cD\nAOLj47F161YAwLZt29C9e3eo1WoEBQUhJCQEJ81U6nTo3FkM3ymUQsn3y5aRdkEwgeXLWx3jX66c\nyTHSqxcL584lap8Jb4VVMYXaiHDo1cuX8fLFC9JuUOwAzokL9+/fh7e3NwDA29sb9+/fBwDcvXsX\n/v7+unH+/v5IM5MUunXTJj6+UiiKpH7DhrLas4XkdmN+WLHC4Pm1K1esnnPzxg2TY9evXRPNJz5I\nmejKF7bbD3Jx8cIF0i5QKLIiKANUpVJZLP8291qu3n/cZdooFGUxZ+ZMWe3t2rEDSxYtktWmUD62\nweCPQilMtOvYUXIbVapVMzlWo1YtAEDv/v3Ru39/znO6urrCzc0NgUFBuvN7anbNGlgRdTVHPgzj\nFGtwDqS8vb1x7949AEB6ejrKaNR6/fz8cEfvzig1NRV+fn6Mcwz88EOoAagBiN84gEKRl/e6d5fV\nXut27TBE4uoVsbGXZHNjMVCvEiU
IeSINfnq7CnJRRrPDwZVKekVQfHB2diamSUbi72yNHVu2oKyZ\n32yxuHDuHMoFBBgcO/3PPwCAH7/7Dj9+953JOcEVKlicMysrCy9fvsStlBTd+atXrQIAHLMi6moO\nR0AXo7DJQOT8KYqLi8MqjZOrVq1CB03rh7i4OKxbtw45OTlITk5GUlIS6tSpw3V6u8fH11fyDytF\nXh7JrO01NyEB3Wyg5Yo+Zawk9doKjx8/Nnj+/NkzTud7enqiQkiImC6JipAWOHwZMHgwr/OSGbZ+\nuVC+QgVoAP1iAAAgAElEQVQUdXUVNAdfmrVsScSuNYS23WHDA006EFtI68axwWIg1b17d9SvXx9X\nr15FuXLl8MMPP2DChAnYt28fwsLCcODAAUyYMAEAEBkZia5duyIyMhKtWrXC0qVLzW7tFeamxffS\n023ig0FhT5DMCdBjJk7Eek2Rh61QS8Kbqu9//lnU+dZt2WL2tdycHIN+dFxbjDx9+hQ3FKyjFxQc\nLLvNmZ9+yuu8rKwsQXavJCYSSzYvqdBWX3zFZrnwmqMg9ysRhWzF7iWpxWIgtXbtWty9exc5OTm4\nc+cO+vTpgxIlSmD//v24du0a9u7di+J6vcYmTZqE69ev48qVK2hpIeKWqnmnvfEhzzs1irwkEU6A\ntgWOHDoEQBpdnn69eok63/uaXBGmIoKmsbG8+9GxqVQkTXkWgVRnO+kDWbJUKagZ5DQGDhkiue35\nc+ZIboNiiti9JLUQ2SDmenca26KFIHud3nvP5FicSIl1/kb7vdZwcXGBS9GirMYuX7qUj0s2iW/Z\nsqRd4E1KcrKs9q5euYITx49zOkcprYWM270IwVruhFCOHz1qcky79dWWh57UreRkqNVquCt4m/P3\nXbusjtm0YYOoNru8/z6v8/6nSSjmi0vRonBkyJFatmSJoHnZ8NHQoTZZfUthhkggxVWQ8w8LKuls\n2PzrrybHtltYvudCKsceYq9evcKr7GxRbNsTSiwrZ0uQxCsNxlpu4RERaMKym3tpTd+okqVKie4X\nF14UFGDjjh24eP68aHMayyM0btoUH48aJcrcTWNjAQBNYmIAAAc0geudW7ewYds27Ny2jde8ubm5\neKFg7aJvvv8eALDNSIhZSn7/7Tde5/2iydXly/30dM7bTGLx7ddfG+jBkfh+Fje6rlD4QxtgUSgC\nmZ6QIOn8TN3OzYuOWD+XFF3atZN0/iOHDuGr+fNFmevQH38AAA4fPAgAeKd+fQBv/55dFaZuPnvB\nAtHmmjRmDACg/bvvmrwWGBQET71UDrGwJKFjiXGTJws1LOx8ESFRPcj37y43G3fsIO2CVYhcZdvL\nrGxeMTLS6hhzjWfr1K2LokWL6pq+hkVEGOR4GZdyalcPjI8bM3LsWKs+AUCwpsLHzd1d9FYQ1tDf\n/oyqUQMxzZpxOt9YmVp7l6+Fz/aIFj53U0WKFIFHsWI63RKu6FdbOjo6opinJ6rXrIkPunVjPYez\nszNc3dw42c0wqhQD2Cdgcq2QkQopmhYzadJIzVO93qJcYSrzb6xZ8RLC+JEjGY9buwYxYZxsrr1W\n16pTB7dSUvD0yROLcgV8Vlaecax81CJUvy0vNxdv3rwRNAdfjHMFSVRLZmZkyG6TD9qbIz6fZ7lQ\nvZH5k6RSqdAkOhqn/v5bTrMUimTUlvnzHNOsGQICA7FKsw1jicjKlZFy8yZq1qmDPzUJ31raxMXh\nt+3bJfJSOoaPGYNFMrZqcXV1hUvRorqAtm79+pxz1OSgR69eWCNyBSPwNumeKV+MKwMGDcKKb74R\nwSP+9PjgA6z56Sfm1yT6+1FsnyzAYtBNZEWqXoMGJMxSKJIgd0PguvXro2OXLpzOEbKSwpew8HBU\nCA2Fu4eHqIKcUgdRG7ZuRYkSJdAzPh5LVqxAVlaWLogaOGSIoCBqk4TSL2IEAaPGjzc5JkYQBYB4\
nEAXAIIgKCApCUb3Cn1u3bpFwiWIHEAmkSFzUKW+xl9JlQPqec6M1GmmWWPzNN5I3ERZjS9ejWDER\nPOHGtatXcSMpCS+eP5dka09L+QoVTBLyhdC1Qwc8f/ECq1etwpABA3THHR0dERAYKGjuzm3bCnXP\nBKbPxyyeuWJSF06wZdDHH0tu43ZKCrL1Cn/4qmDzge13euykSRJ7QhEDIoHUqpUrSZilwHrp8uez\nZ8vkiXCkDmDmzZpldcywQYPQgcPq0B5N8jIXxJCGyBYoXiiUv86ckWzu5Bs3kJmZKeqcTO2tPDw8\nMHncOFHtiIG3j4/JsQkWqhd37t9v9rVhH30kik9C+earryy/zmJbW8kwvWcjGT5bX37xhRzu6Jgw\nZYpoczHJDrHhxNmzGMPiJtYcS4wapOuzQm9F8lcR0xqI5EhVj4jAVRbd3+XAwcEBfuXK4Y6ZZd2o\nGjUsinj16tMHP//wg1Tu6ShdpgxevnghWM3XnmjctKlO6JEEE6dORcL06QivWBFXL1+WzW5UjRoo\n4+2Nvbt3Wx1rjzlSbHD38ICzszMeP3qEjl26YMvGjaLM26lrV5z591/B7UmUTrmAAKSlpqKAxZbs\nZ198gU9tcOXEx9cXmRkZBhIIn8+ejU8YtjcpyqNdhw7YIVOHB2s5UkQCqaiICFxTSCBFoQglXOYb\ng2rVq6OMtzf2sdD6UUogFd+vH6vkeKVTu25dnDpxgrQbFApFRhSZbB5ZuTIJsxQWzKStCzjzs8hK\nz9Zo3a4dhgwfLqtNodhDEAUADRs1Iu0CJ5ydneHo6EjaDYoRcrShsRd+WLPG4DkbOSO5IRJIbRVp\nmZ0iPkrMAVE6SxYtIu2C4nmoYDVvLiz48kvSLnAiJzeX1fZcYSQ8IgJuLDXdxM7xk6LnpL3Sp0cP\ng+dyt+Rig3JkjykUG6VIkSKkXbAK10rZhUb9xrQVjJs17Tx2HzjAab5TJ09yGs+F762U/VtKet3+\n++8mY9bptY/63KjgYLaVajhLKuNaIdjvzOgYScGbggJOopO2lMQ9c84cTPr0U87nhYSFAXjbs/Ll\ny5eszqlnRrCZLx/17SvqfFJhTqiaiQqhoSbHwiMiDHrraptiOzs7w8nZ2excjWNizAab2dnZOsmZ\nSlWqoK6m64B2p6tqVBQcHR3ROCYGPr6+KOvnB3cPD6hUKhSTqHqZSI6Uq5wGKRQJadGqFcoFBEhe\nQahPvQYNUNbPj1XzWKXkSI0aPx7zbagi1BxNY2N17WOsMWDwYKwoRI3H5aZxTAyO8KiCpVC4osgc\nKS5RrhQYty6RkxYMPaz48i2HasGGjRuLZlfp1I6OltxGm7g4AEAHmdsdAW9/zHv16SO7XSEkXb3K\n+1zj1kIk4dLKgymIUqvVcHd3F9MlURk4dChpF1gjJIiKqlEDHh4eNrGaTFE+dEWKA14lSii6P5GT\nkxNycnJIu1GoWPrdd+jQpQvKStDM1RwTp05FdL166NCqldWxSlmRkgNnZ2eDUnYKxRIRkZG4c/s2\nXtpJ/h5FOhS5ImWrOCi8W7atdPO2xLF//yXtAicmjR2LUAbxRlvk98OHAUCS1a6rt2+LPqeW/X/+\nCUCez/+1O3d4n2tvq8LLfvyRtAuCuJ6UhCyWOVJiM2vePCJ2V6xaZfH16SxEiCmmEAukqkZFoXO3\nbmjUpAkA4OORI7Fm40Z07dEDn86cicnTpmENy+q+gMBAqNVqg/EtGO7WrX2ItKz85RfG448fP0ZX\nowoCrh2pXVxcDPo7GdNnwAA0eecdTnNqYXM3rp/4N27yZN3jZSy3Cdt16MBqXLOWLQ2ex3XsyOq8\nBiyqWVQqFYJDQljNVzs6mpPcxjwjReXoevUsji8fHIz3e/ZkPb+Saan5LkohMDtuxAjR59TSTCNJ\n8OrVK8lsaAkTkBZwVK8FCdvrRlh4OG97fAnSJARb49OJE3WP/
fz9pXJHMvJyc/HmzRs0sCBpUblq\nVavzcP0NAIAJo0dzPkcMBsTHW3x9qqaopJinp6JlM/gUGUgJka296GrVcOHcOTnNUiiS0W/gQFmT\nzWOaNUNAYCArbSalbO3NmD0bU2xcMbp1u3bYtWMH7/MDg4JwKyVFPIcoFIosCNra69u3L7y9vVGl\nShXdsWnTpsHf3x/Vq1dH9erVsVuvTUVCQgJCQ0MRERGBvXv3mp1XjmRgCsVeKevnx3mlgnSjcP2V\nUFtFSBAFgAZRPPjNQl9ACkUpWAyk+vTpgz1GbShUKhVGjRqFM2fO4MyZM2il2UJLTEzE+vXrkZiY\niD179mDw4MFUCI5CkYCAwEBU0ru5YYOnjMnwTNSiN09wcKApqVxp06wZUft80ywKAxsF3ljYExa/\n2Y0aNYKXl5fJcaYlrm3btqF79+5Qq9UICgpCSEgITpoR4du6aRNPdykUii1SytVWa3XFw5WlijZF\nORzmKDxbmOjSrh1pFxQDr1ukr776CtWqVUO/fv3wRLNlcPfuXfjrJRz6+/sjLS2N8fyMx4/5mKXI\nQJ8BA2S32bxlS0TVqCG7XbGo37Ah53OEaCOdPX0ae/W21Cm2wYvnz1HU1RVlvL1Ju0KhUESEcyA1\naNAgJCcn4+zZs/D19cVoC9UH5sqRfQIDkQsgF0A+C5udu3YFwD+3asmKFYzHy1eowGs+PrCtWiPJ\nZwkJuvdslFFicH0BzVq17UWMWaBpQ7JP06YDgElVpCW6GY1t2bo1J7/atm/Pabw5jh89yvkcturY\nTETVqMFYlUqRntnz5yNY77rR3UzFZmD58ozHs7Oy8OD+fUE+tO/USdD59oI95N1RlEk+oItRclmM\nt1q1l5KSgnbt2uHChQsWX5ul0Z+YoPnRfPfdd/HZZ58h2ij4UalU6NGlC21crFD8y5VDqgCtHKGE\nR0Tg6pUrxOzzoeN772HLr7/KZs8WBTkrVamCSwzXEAqFL94+Prh/7x5pNyiFANEFOdPT03WPt2zZ\noqvoi4uLw7p165CTk4Pk5GQkJSWhjpk7BhpEKZcHHFpgSMHtW7eI2ueDVI0w7YnkGzdIu0CxM2iK\nCEUpWAykunfvjvr16+Pq1asoV64cVq5cifHjx6Nq1aqoVq0aDh8+jAWabueRkZHo2rUrIiMj0apV\nKyxdutQulLYLGzmEW2xkZ2cTtc8H2q/LOg/spA3Hh4MHk3bBBHNb53wIi4gQbS6pyc1ls+liu3w+\nezZ69emD3QcOoHjx4tgtQ+L7x6NGSW7DHqG99igUAXTo3Bm9BwxABxGbUVvDFrf2KG/zpm4lJ5N2\nw4DGTZviiNHnorBQslQpPHv2DLki9Cf1KlECWS9f0l6PMhMSGorrSUkGx8oFBOBeejqKeXqiVp06\nOnX6+bNnG+T+zps1C0NGjMCShQvRoFEjlPb2hq+vL/Ly8nAvPR1Pnz6Fr68v1q9ZY3Vrj8itNB+F\nXz9/f6SlpvKyFxYejmtG3eeDK1TATRvYbqhctSounj/PeryjoyPc3Nzw7NkzzrZiW7TAk8xM/Hvq\nFOdzCytbN21CyVKlSLuheNZu3ozuhTxJWmlBFIBCG0QBwONHj0SbS8nN7EnQOCYGRw4elNyOcRAF\nAHc0fT0fP3qE33ftwu+7dulem2fUS3DJwoUAgGOafp18IaIQ95BHHg7fIAqASRAFwCaCKACcgigA\nyM/P5xVEAcAfe/fSIEqPkePGsRq39uefJfbEkOfPnyPDxi7c2gqr+L59RZlvw7ZtoszDlcnTpqFG\nrVq8zm2o6WWoj9rJSahLokJauFWf9Vu3Sjp/SFiY6Npeq9auZTyu39fU1pgwZQrnc44cPIjpCQm8\nbX46YwaqRkVBpVJh82+/8Z5HLohs7ZV2c8NLQl23KRSxkbvXHheUsrV39fZthPNo7mrvODk5IUeE\nrSWx8PLyQmZmJmk3KBRFI
XrVnhhIHUTZYidye6VCSAhpF8xiq+0fJk6diq02JshJgyhm2ARRderV\nk8GTtxSmICoiMhJu7u6k3aDYAUQCqSrVqkk6v5BtQIq43Lh+nbQLZilM7R+o/IDtcvKvv0i7YJdc\nSUzESzupJqWQhUggdeHcORJmJaFZy5akXVA0Hbt0sfh6terVOc+5x0oSI5+WLXxhm0dFGjlV/Cny\nMmbiRNIuCCIgKIi0CxQZadC4seQ2HB0ddflvvmXLcj5/8rRp6P/RR6zH03bkAtmv196EYsoWK+Kr\n586c4TznuzExFl/n07KFL+XNtAKRkry8PM4aOgUFBWZf27Rzp1CXeBPTrBkx2/bCXAFJvfqQ0v27\nzbGC2xr0M6UcenzwAQDg9sOHaNS0KQDg2JEjVs9jG8T8vGED4/H8/HxkaVKI3m3TBtWqV4ejoyOr\nOQFg5rRp+O7bb1mPp4GUzJQqXZpX09JyAQHwsKCg/b/4eCFuUXgybNAg2W0+fvSI8/Z1dlaW2dc6\nt20r1CXeHNy/n0ijbL6MGDOGtAuS4ODgAGdnZ9JuiMLB/ftJu2CVKZ99RtoFWVjz008AgIDSpU2K\nXSyhH8SsXL3a7Lhemj68lvhhxQqcO3MG+flsOvvygwZSMvPo4UNeTUvv3L6N5xZkDX5ZtUqIWxQb\nwtvHB0EcV8JIJ9V+ZaGq8QczTcWVyG2NRo2SWP7jj4LOHz56NAoKCvDq1StGiQZ7xcfX12rwOHTk\nSElsp9y8Kcm89khfM43BlQSRQMqW2hBQKFIR26IFAMDZ2RlrN28G8N9SOACM0uRf9Rs40OC8XTt2\nYMmiRZxsJTFoqcnJx0b/Bltls5mtBJJ82Lu3oPMXzZune3z08GGB3tgO99LTrSqRf61pgSY2q+mN\nr11BJJBqKEOyGYWidP7YuxcA8Pr1a53qt3YpHADmz5kDACYaVXfT0nDtyhXWdipWqoTQ8HCh7lIo\nFAqFASKB1E8rV5IwS6HYBf0/+giLvvmG9fgUBbYmoVAoFHuBSCCVl5fHeqxarYZL0aISekNRCtEy\nCg+KRVhEBELDwki7YRFLieZy0VhTsUMRh4S5c0WdT26B1x/WrNE97ilwa1IuxkyYoHu8qxBp0FGs\nQ6RFTN2oKJw/e1ZOsxSKZMjdIqZegwYo6+eHTSzydZTSIoZCofxHxy5drErDUJSDIlvE0CCKQuGP\nr5+fzeU8aZPpudImLk5UP2bPny/qfLaApfJxyltJGrlRehA1Z+FCk2PNWrZEy1atENexIyIqVgQA\nxHXsiLiOHeV2T3EQWZEa+uGHWLl8uZxmbZ6g8uWRmZGBp0+fknaFYoTcK1LuHh5wcnJCxuPHVsfS\nFSnl4OzsjOJeXrh/7x5pVygUCgcUuSK1/pdfSJi1aVKSk2kQRQEAfDxyJF1lsEVUKjg4KFu6T+zc\nK6EUKVIEkZUqkXaDQrEIkW/1S410O4VCKRy018g7FGbCwsORfvcuaTcsMlFhyu15eXlIvHSJtBsU\nikWKkHaAQqFw48Tx47iblkbaDU5s45kjZU/YU7N2JRMSFobr166RdoNSiLC4InXnzh3ExMSgUqVK\nqFy5MhYvXgwAyMjIQPPmzREWFoYWLVrgyZMnunMSEhIQGhqKiIgI7NUIDhrj4+sr4j+BQiFLSGio\nrPbq1q+Pjl26yGpTKIWp9QiFLDSIYsZWZCZsEYuBlFqtxoIFC3Dp0iWcOHECS5YsweXLlzFr1iw0\nb94c165dQ2xsLGbNmgUASExMxPr165GYmIg9e/Zg8ODBjF3n76WnS/OvoVAIcD0pSVZ7X37xBbq2\nby+rTaEUptYjFFOcXVxIu8CIUv2SgtUCezKKzSWjfoMexYohNDwc3f73P1n9eKd5c8FzWAykfHx8\nEBUVBQBwd3dHxYoVkZaWhu3btyM+Ph4AEB8fj61btwIAtm3bhu7du0OtViMoKAghISE4efK
kYCcp\nFMp/jJ00CRu2bSPtBoXCGqWK1sq9mkz5j0rBwQbPnz97hqSrV2UvRjuwb5/gOVgnm6ekpODMmTOI\njo7G/fv34e3tDQDw9vbG/fv3AQB3796Fv7+/7hx/f3+kMeRy5Or9ly/IfQql8LF7504s1WyzUyiN\nbEA1/uL587Lac3JyQjFPT6vjLl24IIM3FC1VNQsz3Xr00B2rFhWFd9u0gdrJCZ7Fi0tqvy3Llfx8\nGMYp1mAVSL148QKdO3fGokWL4OHhYfCaSqWCSqUyey7Ta00bN4YagBqAIxsHKBQF071nT1nt+ZYt\na3OCnBTpMNYHowA5OTl4RuViFIdWjHu9Xougc2fPYs9vvyE3JwdP9fKtpWAny5V8R0AXo6hZjLca\nSOXm5qJz587o1asXOnToAODtKtQ9jahceno6ypQpAwDw8/PDnTt3dOempqbCz8/PZM7SmvEUij1w\nSOa+W1E1aqB5y5ay2qRQKBQKMxYDqTdv3qBfv36IjIzEiBEjdMfj4uKwatUqAMCqVat0AVZcXBzW\nrVuHnJwcJCcnIykpCXXq1DGZd6vC5fEpFC48evSItAtWOXbkCGkXKBQKxS6xGEgdO3YMq1evxsGD\nB1G9enVUr14de/bswYQJE7Bv3z6EhYXhwIEDmKDpih0ZGYmuXbsiMjISrVq1wtKlSxm39nzLlpXm\nX0OhyExYRARyc3JIu2GVYsWKkXbBLvhw8GDSLkjKyLFjSbtAUSCTPv2UtAuSElS+vKDzifTac5XT\noJ1QslQpZL18iezsbNKuUIzoN3AgdmzdikcPH6Jrjx5YJ3H7lolTpyK6Xj10aNXK6lhtrz1vHx8k\nG5Uby9lrb3pCAqZOnGj29R4ffIA1P/0kiy8UCoXCBUX22uNLpSpVeJ9bt359ET2Rn8ePHtEgSqFU\njYrCg/v3UVBQIHkQxRf/gACi9q0lx9MgikKh2CpEAqn3eQpuCSlVPXH8OO9zKRRLvNe9u6z2nj9/\njoyMDFltCqU77bVH0bDom29IuyAqc21EimTY6NGkXbBbiARS62QW3KJQpOST8eNltffmzRu8YegY\nQKHYAsMHDSLtgqiMGTaMtAusWDxvHmkX7BYigZQYkuwUihKQM/n4m5UrAQB309Jw7epV2exSKFLS\nuVs30i5QKIIgEkiJIclOoSiB5UuXymZrUN++AID0tDRcu3JFNrsUipRsWr+etAsUjjSNjSXtgmiI\n0b6ISCDl4+tLwiyFIgnDRo3iNL52dLQgexVCQ1GLQZ+NQqHIR0TFiqRdIMahP/4g7YJoJF27JngO\nIoFU63btSJi1SRL09rVvpqcT9IRijkCOGiSn/v5bkL31v/yCz6ZMETQHhUIRxpXLl0m7QFEIRAKp\nrZs2kTBrk0zUq7QI5riS16FLF7HdoTDwQNO0Wy7GTpqEDSx7RlEolujZuzdpFygUm4dIIFUxMpKE\n2UIHbcUjD7/v2iWrvbOnT2Pv7t2y2qTYJ6t//JG0C8Tw8fWFs7MzaTcoVhhjQchXKZAR5GRoG0Oh\n2CpnT5+W1d7dtDTO+/pJtMqPwoLCtEL1Kjsb+YRkRBo0bkzErjkWf/ut1TFd3n9fBk/e0rVHD93j\nuQkJstnlC5FAKjwigoRZCktG0H5bnGBzERKT1u3aYcjw4ZzOsaYsTqEAhWuF6smTJ8jLzSViW2lN\nxId99JHVMRvXrbM65vuffxbDHWxYs0aUeeTCplrEiMnML78k7YJiWUj/NpwYLbMg39yEBHTr0EFW\nmxQKhWKNfr16kXaBCEQCqZXLl5Mwa8BkuupCEYncnBxZ7b3XvTumTJ8uq00KhUKhMGOzK1KDOawC\nNH3nHQk9kZ4uVPnXhIFDhpB2QUe7jh1ltXc3LQ1JHAU5U27eNHj+0dChYrpUaGjVti1pFygUisIg\nEkgFBgUJnmMph0aRhw4cEGyPJBup8q8Jy5YsIe2Cjlv
JybLaO/THH/jphx84nRMUHGzw/NuvvxbT\npULD7p07SbtAocjK2s2bRZ+zRMmS8CpRQvR5+VA1KsrsaxUrVWI1B5FA6uGDByTMUiii41m8OM6f\nPSurTXcPD5QoWVJWm0JJSk0l7QJxGjZpItpc9Rs21D0uXry4aPNSKMZ079RJ9DkzHj9GZkaG6PPy\nwdL1+/KlS6zmIBJIqaj8AcVOIPFZ/njkSKxcvVp2u0IoIFRmriSOHj4s2lzHjx7VPVY52GyGBqUQ\n8ZNmZ6VFq1YAgKP//APgbZqB/o2BLULkG1i6TBkSZimAzX9glcaTzEzZbV65fBl/HTsmu10hhAcE\nkHbBbhHzzr6JjeeTUrhRISRENlsfaHJ9tWLCDWvVAvA2zUD/xoAEawSKVxMJpDyKFSNhlgIQ/8Cy\nIU7m5G1zRNerx2pcewmWvi3xJDMT92ys7+KM2bNJu0BhwWEbzydVEh4eHvAksO2q/a5VqVYNJUqU\ngKubG+O41u3a4cb163K6plh6CGynZjGQunPnDmJiYlCpUiVUrlwZizUJ3tOmTYO/vz+qV6+O6tWr\nY7deu4qEhASEhoYiIiICe/fuZZz3wrlzgpym2Dfbt2wh7QIA4O+//mI1ToziCS7UrV8fHW2sj+KU\n8eNJu0ChSEq9Bg1Q3MsLABBZqRKeP3+Op0+eyN4STftdu3DuHIJDQpD18iXjuF07dsjpll1TxNKL\narUaCxYsQFRUFF68eIGaNWuiefPmUKlUGDVqFEaNGmUwPjExEevXr0diYiLS0tLQrFkzXLt2DQ50\nD5+iINRqNXJFVDRePH++aHNZg+93yV5ylKpGRSHx4kXk5eWRdoUiEkWKFLGL91N/uz1RL0n5cmIi\nq/OLqNWiK63/c/KkqPMJZfiYMVi6aBGn66+Dg4Pir18Wr8o+Pj6I0pQGuru7o2LFikhLSwMAvHnz\nxmT8tm3b0L17d6jVagQFBSEkJAQnFfZGUijePj6kXeCNm7s7vl+2DCM46mhlZ2VJ5JG8nD971i5+\ndAszzVq2NHheslQpQp4oi8KQO7xo7lyU9fPjdI6zi4tE3ogH69vblJQUnDlzBnXr1gUAfPXVV6hW\nrRr69euHJ0+eAADu3r0Lf39/3Tn+/v66wEufYqVKIRdALoB8Yf5TKJxJvXOHtAu8ef7sGWrWro3W\nHIUh7zJ8DykUY8SUaDDH/t9/N3h+/949yW3aAumF5Dt6KyWF03gSN4H5gC5GYbN2xiqQevHiBbp0\n6YJFixbB3d0dgwYNQnJyMs6ePQtfX1+MHj3a7LlM5eFVIiOhBqAG4MjGAQqFoiOqRg1dCTFbSDct\nXrJiBQCgaWwsUT/EILhCBdIucOZjozQMc4gp0aB0fHx94ezsTNoNRdG9kPbKM8YR0MUoahbjrQZS\nubm56Ny5M3r27IkOmkapZcqUgUqlgkqlQv/+/XXbd35+frijd7efmpoKP4ZlPKV1vqZQ+DLls89k\nt1xX384AACAASURBVPng/n3Od3WnT52SxhmW1KpTB8BbVXZbZuTYsbh54wZpNzjzlYx5fHIxjGVw\naI576el4/fq1KL78un07Dhw/rnt+8vx53X8xzZrhg759DY5ZolxAgG7cwCFD0LBxY4N5AaB6zZq6\nY2JWDQeVLy/aXIUJ1RumZCcNb968QXx8PEqWLIkFCxbojqenp8PX1xcAsGDBApw6dQpr1qxBYmIi\nevTogZMnT+qSza9fv26wKqVSqeAq4T+IQpGbfgMH4vtly0i7wUhk5cpIuXkTNevUwZ+HDhm81iYu\nDr9t307GMQqFQrERssCcF67F4orUsWPHsHr1ahw8eNBA6mD8+PGoWrUqqlWrhsOHD+uCrMjISHTt\n2hWRkZFo1aoVli5dSlXMKXZNWEQEPh45UlabE6dOxVY9yREKxdZwcXGBWv120yRQglWQ+o0aiT6n\nmIwaN47XeY2aNtU
93n3gAHYfOIDhY8Zg2KhRuufa17SsV4icjD1jcUVKEoMqFepGRcnen4xCkQq5\nV6QmTp2K6Hr10IFFnhRdkbINesbHY/WqVaTdoFAoDAhakZKK0qVLkzBLoUhC/48+ktXe1StXcEIv\nH8MWkKKDvD1BgygKxXaxKMgpFYVBL4NSeNiwdq2s9sIjIli3r1EKUnSQp1AoFCVAZEVq3S+/kDBL\noUiCA80DpFAoFFk4ffmyrPYuJCVZHUN7t1BYU8zTEw6OhspfLd59l5A3ymHThg2kXVA8LxTe4oEt\nlatWZTWutka4mEKhiEuNihVltVclNNTqGGKBVM/evQEAZf39MZJnBQMpAmRuUqsUnj19ioJ8Qy36\nvXv2iDJ3aFiYKPOQILZFC93jbj16SG7v4P79WPX995LbERN7aVp88fx5Vi0uTp04IYM3//FOs2ay\n2hMbW2vCLZRO772H2ObNSbtBEQligdTqH38EANxNTcWCOXNIuQHgrVI0F25bEUP0LVtWgDeGBIeE\niDYXH/SF37gSzuHOIenaNd52SGIcAK5fs0ZymzHNmiG+Xz9O55Cukp0xezan8f7lyknkiXCKe3kB\n4P7dKO7lhXoNGojuT1h4OA7s38/7fM/ixXWPRxEKeLds3CjZ3Fx7u8nB5l9/xR/79klqI4LF9dcW\nVfqVCN3aA3D29GlR50u/e1e0uW5evy7aXHw48++/vM+9yrCXPXfxYiHuKA5bCQCfP3tG1P6rV684\njVdyP8TEixcBcP9uPMnMxF/Hjonuz7WrVwWd/1TTKxV421RWbooUkbbmSb/PpIPDfz954RUrws3d\nXdDcRdSmDUTY/Hu+kkEu5QqLXCJjlf7F334rlTsA+L3X+u8ZX8SYw+L8ks5OUSR+/v7w9PQUbT6t\n+NuHgwdbHTtm2DDR7Aph7KRJos01fsoU0eaSigZ6bSZIkCTwx54iD8VEvC6wpWSpUoLO5yJO6+zi\nont89fJlvHzxQpDtMgwV6Gz+PR8PHCjIrlTMnjHD5JiYgsNeJUpwPkf/PeOLs7OzpOLgRAKpBgpX\nnbV30lJT8fTpU1HmcvfwQMc2bQAAy5cuFWVOvlhawn9X46OWL7/4QjS7TBcfEpS3sExvKZBh2gIY\nM2GCqHfNVapVAwA0eecd0eYUk3eaN0efAQMMjrVq29ZkXKeuXVG3fn1eNvRz6cREX+1aKJWqVBFt\nLrbcv3eP8zk+mhZlAEyEaf38/U3Gj54wAQCQnZVlMIe2aXFzDkUzpfR0EPVXu7Tw+fdYQ8xmwl8t\nW4ZSZoK9tNRU0/F67eGE8vDBA87nNBPhe5OdnW1RUFMoRJTNufba+ywhAZ9OnCiJP1wICAzE7Vu3\nrI7r2qMHNsiQK2MPhIWHG2xNJMydi4ljxhD0iBsNGjXCo4cPcfXKFdlsclU2b9i4MS4nJhJVNn9R\nUAB3iZfX5aBzt27YtH49r3PL+vkx/vCqVCqUCwy0mntpa9Rv1AjH//yTtBsUGyfx5k1EBgcTs+9Y\npAie5+UpT9mcK9rEdC0qlQouApf79O9o2MImiAIgWxBVomRJXv8OJaENorRbCj9+950kdoTmQpjj\n2J9/omGTJpLMLRZstlwp7KgTHc143NnFxWoeRrmAAPzBkCP15s0bQUHUcqProzV26fVhM+YbEatB\n2QZRfD+fM2UqUnJwcNCtXFHkh2QQBQD5eXlWx9jEipRL0aJ4lZ1tcMzBwQEFArRp1E5OyM3J4X2+\nVPQZMAA/rFhB2g3ZcXR0RH5+PooWLYpso/daDBwcHU2kG8Rg4dKliO/XD14yXmjVajUcHBzw+vVr\nq2MjK1fGkb//Rsc2beiKlAiYu24IvR4JwcnJCTkiXcvEnEtL5apVce3qVeSw+LxyYeacOZgsg3SO\nSqUCVCq8sRMtNFtj7+HDaEH4ZlWRvfa4YhxEARB80VJiEAWgUAZRAJCvCXKkCKIAS
BJEAcCIwYOx\n7OuvJZnbHGX9/BAWEcF6fIACelu2ljE3Sl80s2KlSqLObe66weZ6xCfRlg2WcuOYcHJyMrvCInYQ\nBbzV3jIXRJUoWRKB5cvzmldoEOXq5gZHI4FhJt68eVNogqjQ8HDSLphAOohig00EUhSKklnz88+y\n2itfoQKqc9Q+S7l5UyJv2NGsZUvZbF08f173+PKlS6zOqd+oEbp272513Oz58wG8XYGypBfX7X//\nMzmWmZHByhd9mmv+bsWKFUOxYsUYx5QPDkYdDr0Xc3JyzK5mtmzdmrOPlojr2NHi6xmPH+NWcrKo\nNtnixjKQUhpSFmtVjIyUbG6+DBwyhLQLVlF8IOXk5ARXNzfd834Cy0YXLFnCalylypUF2dFn8rRp\n6NC5M9Zu3sx7Djd3d6gZNEuEwFWIlIlEo4vghE8+ETwn8N+d0crVqwEAQ4YPZxxXLiBAFHt8iYmN\nlV3ssl6DBujUtSunczzM/AjLBSmhR7Yc//NPVs2nx48aBeDtCpQlvbj1AvqJenh46PR29v3+OwDg\n2bNneGZGC2zPb7+Jdr36fdcuUebRsn3LFlHn0+d/8fGCzn/44AHn/DIlcEzCBH4x3i8HBwfOq6SW\nWMbyN5skis+RcnBwgEql0m39UChaXN3ckPXyJevxPr6+uJeeznq8u4cHXjx/bnVc9549sVYT8MkB\n16q9lJs3UaVaNfz9118Gr1nKkapWvTrOnTljce4W777LukWQveRIzZ4/XxdMSYmPry8yMzJY5cGJ\niWfx4gYCnUqgVOnSePTwIVxdXZGlJ18glCJFiiA/P1/Ssnhz+JYtazYQDypfHpmZmYp7HwozisyR\n4tqPTEohLYrtwvVzwVXdlu38cl+Iv164EP1E1JVhwloQBXDrsxhOeOVQLL6YPl0WO1IrMZu1q8Br\nre5vIbZvBP+tlt7flORkyYIopeq42TpEvq1c+pEVFBQgz0L5YUxsrBguESGuY0dUqlxZt33VOCbG\n4PUQlo18oznkRwCAt48P3D08ALxtnslEOItkZr7ChGLx8sULfDR0KOvxTBo++gw2Ul1n01Zl6Xff\nSSatYI6hI0bge455WU6Ey7dr1alD1L5YtOvQQRY7d9PSZF+NAoAgCUrNrV2jO3frZvH1B/fvAwCn\n1Wc25OXmElmNApiFL+XgsAXpC+DtSjQTXH9jSMK2i4Obuzvad+6Meg0bAvhPoNXV9b89M3PCpcYo\nfq3d2g/lwT/+QK/evfGx3nJ7m7g4FCtWDCPHjUP7Tp10x11cXFDcy4tRsVguBuhppmzfsgWXLl5E\n3549AQBHDh40GHvdQh83/WaTxls21rh/755uy2rzr78avNa6XTsAsCowWbN2bZw4fpzxtUEff2z2\nvKpRUYzH27ZvD8C6pkx3zd9Ky7ciVswtZdkHUL+h7uD+/Yk3BLbG0JEjkWzUU0tuapvRX5ILvpVh\nxty5fZvzOdprENsbI5II6a2pT6/evXWPD/7xh8Wx5gROue5c8GHoiBGMx+M6dsRIGaQV9BHSIF4s\nzK1Ec/2NIcmxI0dYjXv54gW2bdqEv44eBQDcS0/H69evDbaPHz16xGouizlSr169QpMmTfD69Wvk\n5OSgffv2SEhIQEZGBrp164Zbt24hKCgIGzZsQHFNB/GEhASsXLkSjo6OWLx4MVoYybvz0ZGiUJRM\n63btsGvHDtns8cmRCgkLMwn4xNaRKhcQgOLFi+OCXtWcloZNmuDo4cOi2bJEiRIlkMGhQq5qVBQe\nP3rEapUguEIFk0avoeHhjC14nJ2dUbtuXSRevIiMx49Z+8MV4+4AtkZAUBAvQdLywcFIFlCN6urq\nitevX9P8WxvAUk6ZHAjKkXJxccHBgwdx9uxZnD9/HgcPHsTRo0cxa9YsNG/eHNeuXUNsbCxmzZoF\nAEhMTMT69euRmJiIPXv2YPDgwYz6KmJXn1EoJ
LFUBi81zs7O6Nytm0lXdUdHR4PSbmtVe1wbg34y\nfTp+MlpJuHP7NmMQBQB7jFZbpYRLEAUA58+e1QVRZ69cQdKdO7rXHhvpmhkHUYD5PoavX7/G0cOH\ndUFUTLNmnPxii1RBlNDuEWzhq+ouJIgCgKysLCQL6Ivn5OQkyD6FPSSDKDZY3drT7hfm5OQgPz8f\nXl5e2L59O+I1pafx8fHYunUrAGDbtm3o3r071Go1goKCEBISgpMnT5rMWcbbW8x/A4VSaAkqXx6b\n1q9HaaMu9J7Fi+va7hw6ccLqcneFkBBW9g4cP46vly/H51On4gMruS22RrOWLVGnalVU1NsGLFm0\nqMGYETz7QPYZMAAdOnc2Oa4vHio1Hw0disYcGhzrb0W2YLH6aWuUDw5GeGAg7/PLCTh3ymef8T5X\nLg7//TdpF2wGq4FUQUEBoqKi4O3tjZiYGFSqVAn379+HtyYY8vb2xn1NMuDdu3fhr9d529/fH2kM\nCb4pqanIBZALgC6qUmydoyz35KVAm8tmfMeW8fixTgDy++XL0UjvB9SrRAkUK1bMIB8t8eJFg/PH\nTJhg8Fyb83L6n38w9MMPTX5YzSWp2hL7f/8dOTk5FotbbvPIkQLediwYPmgQnJycdAEuALx48YLX\nfPro5yNZ4tuvv8YRozZBxuhrM+kLm+7dvZuPa6zQ5mWKhTnNOWOys7MFdTy4kZTE+9wZn37K+1yx\nCLai9dSEY15jZOXKkiySjJk40eD5l4sWIbxiRdHt6JMP6GKUXBbjrQZSDg4OOHv2LFJTU3HkyBEc\nNFqiV6lUFsvEmV5T6/1ne7qyFIohDVlWiZDi55UrDZ5nZmTg2bNnWL50qdlz5mq268dNnvx2Do1w\n4RhNZaPxDysbuQR7YPOGDRZft9Rio6yfH3JycvDs6VPdMRWAr5cvF+STSkSphF9WrRJtLraYyy/k\nG2AtWbRIiDtW0X4ntLBdzVUaTNvUQki8eFFXYSkmcxMSDJ6PHT4cVy9fFt2OPo4wjFOswfob6Onp\niTZt2uDff/+Ft7c37mn2ltPT01FGs63g5+eHO3r5BampqfDz8zOZS6qeUxRKYeV9o2pGfY6cPInT\np07pnm/fu9dkjLu7OwKDgtCqbVuDO8A5M2eiy/vvm4wfrLnr/23/fpNjTFy6cMHyP0Bkftcktmtb\nrFhS8eei8D9y7FiLr5vLlwKYVfjd3N0x9MMPdc+11atc+MkoUBbCJyLrZE2cOpX3uVIXcBQtWhQO\nVlrE1Klb1+TYnJkzDZ7fuH6ds+1PPv+c8zlCqM+zrUyNWrVE9oQ7P3CQSyKFxUDq0aNHeKIRBsvO\nzsa+fftQvXp1xMXFYZXmzmXVqlXooNFWiYuLw7p165CTk4Pk5GQkJSWhDoN+DBt9HgqFwp51FpTV\nG9epY9AMOs6okhZ4u8V0KyUFu3fuNLkD3Lhuncn4pZq7/jZ6CdRLLawEWNtGEJuWmkan2hYrZ0+f\nNjvW0mvGlDSjK/OnXqBqDqYS8otGyfk7t21j7YsWoSta+pQSucF1AovAbLzRCg9bphkFNFxJvnkT\n2VaU0k+eOCHIhjlKy9xI/DjPtjKn//lHZE+4o12MWbJiBWFPzGMxkEpPT8c777yDqKgoREdHo127\ndoiNjcWECROwb98+hIWF4cCBA5igyaeIjIxE165dERkZiVatWmHp0qWMW3uWchAoFFsj/e5d3Z1b\nMU9PNI6JQRlvbzSOiYFarUblqlXROCYGJUqWJObj0X/+EdynUghFXe1D9MRcxXGj2rVl9uQ/9Fe0\nhEKiuTVTHi0TI4xWA4VWf0dERsoupqtFqRV/i7/91uB5lWrVTMaUlPk6pq0oHjJggKx2uaD4XnsU\nitKpFR2Nf2SscLGkI+Xg4IBSpUvrchUiK1dGr969MZGh2kxsHSlLKL3XXv1GjVjdtbdq2xa7d+7k\nPH9sixa4n
pSEW0ZNvvnQvGVL3UobW6YnJGCqUdIuEx26dEF+Xh52aCqxSfDJ9On45+RJq3/nlq1b\nC2qyXLpMGTx98gQ5OTm85+BLxy5dsGXjRtnt2iIDhwwh3rhYkb32AoOCSJilUERn4JAheChBgiVf\nft6wwSThMyQ0VPd47ebNrOaZMXs2L/tylvOLCdutD75bMn/s3QtPvWo9IXANogCwCqIAYOvGjbIH\nUcYVWJ9PncoqWOWjMq9PyVKloCa0MkSDKPbIGUR9+8MPvM4jEkjd4inARqEojWVLlmD46NFEbP97\n6ZLJsf916WJyLL57d93j7notk7TsYEg+X/jll7rHxYwCAGe9vn2Ojo7wLVtWV0BinPNjbwjRDpKq\nldCchQtFm2vWvHmizcWWd1u35nWesWQHV64kJuKlCPIT9oxStyCl4qM+fXidRySQ+pzn3S6FokQ+\nMdJckoualSqJMk8fhp5m+iX1DhbkTYC324mWJFDsiZnTppF2wYRxZvrF8WECgZuCRQSCNwo7Csv3\nWihEAqlPxo8nYZZCkYSXInelt8bmX3/F7Bkz0L5zZ4NkWW0ye9PYWACGTVArWgi6mBpzPnzwQPdY\nW7mr5fXr17rH+fn5SEtNtdpLjmmljEKhKBv97zrFPEQCKaZKAArFVpG7Gs7Lyws+vr7YtmmTwdaE\nNpg59McfAIAz//6re03ssnaubGOZm0WhUCi2BpFA6sK5cyTMUiiSIHdDzbtpaZwa1T58+RJPjVaV\n5OYFQ/Pyeg0aEPDE/gizoKZOoVCkR7n1yBSKjeBbtqys9v4XH4+Zc+awHl/azQ2exYtL6BE//jp2\njLQLnHHRaNooiWQRtZ/UarVgfSYKpbBBAykKpRBw+p9/0JNlc1uKeSz10iOFt4+PqHOxbTw7b/Fi\n0ezaMq3atiXtAoUwRAKpBjz7/lAoFH7UqFULqzWNh0nwMUFVdTFxVaBCe6pef1Mx5kpLTWU1drSm\ngXVhh49AK8W+IBJIHePZ94dCUSJHNQ1ybZHe/fvLYqcWQ89NW4SpXx6FQinc0K09CkUgpy9fJu2C\nRcxJHwQEBuLH775DYPnykvuwVAHbQJGVKxO17+7ujojISNHn9fP3Jz6XsWirVLi4uMBBo3EmdEvT\n2dlZN5fciPmeiUlZTYNgCjdoIEWhCETpWxwpZvq73bt37+3/Zag6TL5xQ3IbVn0g0JBXn+zsbFF6\n7Rmjr/kllCmffcbrPLkUwnNyclCgqQC1pl1mjdzcXN1cciPmeyYmjx4+JO2CTUIDKR7oCx3aCvMW\nL8aHgweTdsOALQIajgJA2w4dGI9P/fxzs+cs49lLyRJCW1VITXZWFv48dMjkeI5GbE8O0b0HCmjF\nkZ2VJXgOId+h/Px8ZGdnC/bBmI9HjRJtrkH9+vE6Lz8/n7dNLkUQ+oFPbm4ub5vGc8kNiUbJbBg4\ndChpF2wSGkjxQF/oUMnEdeyoezx62DAsX7qUmC8ODg4mMgEdefbY0rLTTHPV6Z98YvacgZpeSnXq\n1hVkW5/QsDDR5pKKcgEBRO3P12sL1blbN5QsVcqslEAlM1twtaOjJfGNCyS/Q+aYN2sWaRcEQaoI\nonSZMoWul5w1vpo/n7QLNgkNpOyY7Vu2kHZBR0FBgezClZY4eeIEaRdkJSg4mKh9fdmATevX4/Gj\nR3j16hXj2EtmVvhO/f23JL5xoXW7dqRdoOgxYswY3uc+fPBAsStDFHGIqlFDFjs0kKJQBKBEoUsm\ncgj3zLKXqr1dO3aQdkHxqFQqFFGrZZGKWDh3ruQ27JXiXl4AYNercmdPn5bFDg2kKBQB0O7o7NDm\no3zQt6/FcYdPnMBXy5YBAKLr1QMA/LZ/P2s7w0TMFxKbho0bk3ZBNlQAIPF3Y9zkyZLOb+/Qa5d4\nFPpASqVSobiNrCpQlMeTzEzZbW7+9VfMnjGD0zk3rl+XyBt2aMu9f1q50uK
4JnXr6sQ7tZpNbZo1\nAwD0/fBDq3b+3969R0V13XsA/w4BURSwKg9hMNBhcBhQwKBUY3oVBGsUYpQYpQGNj3g16xqNtV4T\nV2q6okDUNmqatGnNjSsml6RJFKVC1LhcajSgRVojN0LJYGaAeH2h8pDhse8fwtzBgXmcOTP7zPD7\nrMWSOXPO3j/WVvzNOXvv324Hz/GwZ3f4M6dOiReIkX8XcYKwGAtSGGNob29HS3OzCBH1782tW+26\nPlQul2TJH2e5fesWAOlOfHclAz6RYoyhkXNBV0JsMe+ZZ7Bx82arz7/e3AxfX18HRmTZW9u3293G\n+++9J0Ik9uG5O3x//vj226K1JcXJ9I5Sp9P1O0+PmPLz84PP0KG8w5Aks4nU/fv3kZSUhPj4eKjV\namzatAkAsGXLFsjlciQkJCAhIQHFxcWGa3Jzc6FUKqFSqXD06NE+27Vlwqafv7/Ti8I60seff46P\n/vpX3mH069+Sk53WV2FJCd78/e8d0vbHn39u0/lPzZ/vkDik4KmZM+Hr58c1BneZIyVFUSLW/5Ni\nLcEeYtYUJLa7e/euw+8yuioZY4yZO6GlpQU+Pj7o6OjA1KlTsWPHDnz11Vfw9fXFyw/NR6isrERW\nVhbOnz+Puro6zJgxA1VVVb12j5XJZJBetSpChFu2ciX2ds/rcQZPT094eHhYdUteHRuL2u+/x2OT\nJpnsJTU7IwN/O3TIQVH2NnjwYPr0TwhxSS148PSqPxYf7fWsvNDr9ejs7MRPumf699VoYWEhFi1a\nBC8vL4SHhyMyMhJlZWUCQyeE9GXDK6/g08JCq88/KYGtHqR8p4On2PHj7W7D3Aa0tnp1yxbR2iJk\noLCYSHV1dSE+Ph5BQUGYPn06Yrrrdu3ZswdxcXFYtmyZYY5RfX095EY1hORyOerq6kzabDf6Er4f\nrvVmP/WUybGkKVMEt6eKjrYnHBPe3t4IHj1a0LW/zc0VNRbifqaJuPmoUFLbVV+oeQsWCL62r/pq\n3/7zn/aEA8D8BrS22mpnIiX095gQc53wOH6g1Z6b8sQTvEOQhJlz5kCuUBjyFEssJlIeHh6oqKiA\nTqfDqVOncPLkSaxatQoajQYVFRUYPXo01q9f3+/1fS2x9DL6esSKIO31tz4+vZeePSu4ve9ELlLb\n1taGHxsaBF37Wve8NTJwHDl8GH/YtYt3GDbpWYnn6r749FPB19bpdCJGIk1Cf48JcdDGeZBC1D90\nI+APf/6zw/vk6ezp07xDkIQvi4qgq6kx5CmWWL1qz9/fH7Nnz8aFCxcQGBgImUwGmUyG5cuXGx7f\nhYaGQqvVGq7R6XQIHWAZPSG2kslkNi3DfjI9HS++9JIDIyLEMUaOGsU7BLu8uGIF7xCIBJlNpG7c\nuGF4bNfa2opjx44hISHBUDUeAA4cOIBx48YBADIyMlBQUAC9Xg+NRoPq6mpMotU6hJjFGHP7vVyu\nS6BoMeHvjoS2mvnaRWqmEunzNPdmQ0MDFi9ejK6uLnR1dSE7OxspKSnIyclBRUUFZDIZIiIi8Kfu\nFUtqtRoLFiyAWq2Gp6cn3nnnHdo9lRAr2FKJ/mptrc37ufD+Dyxg2DCu/UvBhMRElF+4wDsMrjo6\nOniHYPD4Y4/xDoG4CYvbH4jeIW1/QNzMwZISzP3FL2y6Jnb8eMETjTe99hqSJk/G3FmzLJ6rjo1F\n9pIlOFJUxHX7A3cxa84cFBcV8Q7DrSTPmIETNpQBIsTZ7N7+gBBi3uEDB2y+RozVWtb6uwTugvz3\nF1/wDkEUvJOoiUlJorUllY2OKYki9kq18YOs2LgkUgt/+Use3RIyIP39/HneIeBlEevBuaNRAQFm\n3w8MCgIAnC8tFa3Phvp60doihKdjJSVc++eSSBV89BGPbglxiIL9+3mHYJaHhwfKugsAG3vXQgFh\nW6188cV+36u2sPQ/OTVV1FiMLX3hBZO
SQQuysgS19cpvfmNyLPPZZwW1ZczSXFLj6hBC/NfHH1t1\nXu6OHVa3+SejuoO73n3X1pAM1jxUIcMeq9essXhOVk4OACBSqeRWOy5v506TY5tff51DJH2LUqkk\nU9D5s8OHDd//VKHAMAnOt6Q5UoTYKTwiArUajdP6s3WOVH8lYk6cPYtkOzamtUVTVxeG2ZkMiCF3\n505sMrPvnSWz5sxBTXU1qq5cETEqwkOoXI6bN25Q6SJikcvPkRoyZAj8hw+3q40pU6eKFA2QOnMm\nlixfbnJcHhaGN/LzRetHbGPCw3mH4JaezsxESlqaoGtnzZkjcjR9e+2NN6CpqTE5vjMvz+o2Zj75\nZK/XU3/+c5NzJkpgB3VL7EmigAe1R5vs2MrB29vbUGZLiuY984zT+0xx4N1Ic9rb221aLUvc23NL\nlhi+X/jcczZdK/lEqrW11e6l22fPnBEpGuDYl1/ig7/8xeS4TqvF5o0bRetHbD/U1vIOwS0d+Owz\nwdfaO3HZ188PX548if/sLhGijo01vGf8uOS3mzfj3r17Jteru8s99aXnUc0wX18olErMSEuDj48P\nHnnkQS2CM6dOmVxzXgI1/exlaTK39upVk92urZWSloa2tjbcvn0bAPDTyEhB7TwsSkAdw/6S3ktO\nXATR46tjx5zeJwD877VrFvdvGy7hpJeIa7/Ro2pbp2twSaS8vKzZdJ0QYs69u3cxc9o05HUX+5gQ\nQQAACKpJREFUra389lvDe+/u2WP4/npzM8bHx5tcX11VZXKs59/mS6tWAQCa7t1DTXU1Nqxdi47O\nTnR2PqiO6enZ/xZ0gwcPFnX/OGdsnPhI989zvrQUVUbVGbwGDYK3t7fhtbl5YLb6/l//suq82RkZ\nvV6v3bCh1+uXfvUrq9p52eiDXn9J73+sW2dVWz2u/PCDTefb64zRCtTXt20zJPbGY9SjzEJSaM3f\n0cbupFdsf+yenxiXkGD4GQAgc+FCnL90ySF9WuN3b7/Nre/+HHOBsjVc5kjdvHEDYS5eKoCQHstW\nrsTe7k1pncGaOVKhcjnqdDqoY2MxdOhQVF6+jOaHHkmtXrMG7+ze3etY2Jgx0Pbzn2PU2LGGuUHB\no0f3W1dtXFwcqq5cQZvR3BOpzJHqz6iAANy4ft3keHhEBHx8fFB5+TKAB1sGCF3tNszXF0193Bkk\nwvkPH447jY2IjonB/3SPkbW8Bw9Gu17P5fGePCwMOqOEnUibJOdIURJF3InYRawtKS4q6pUALX3h\nhV5/Ag8K5Cq7H/lcvnQJExITTdq52sfj3v6SKAC9JlibK0576R//6JVEAZB0EgWgzyQKAGo1GkMS\nBdi3ZQAlUeLrmfZhaxIFAG3373ObI0VJlHvh8tvt8T4mqhLiqp7vY/GBUNbMd6koL8fR4mLD6/ff\ne6/Xnz2qJbKy7Nevvso7BEIIcRguidRYlYpHt5L2ycGDvEOQhCMnTvAOwWbnvv5atLasWVYfEBiI\nR21chdnc3CwwIvu9uXUrt74JIcTRzBYtJs7z7Ny5vEOQhCeTk3mHIHmNt2/j7p07Nl0zRCKb6xFC\niLuR9sQFQgagxyZONPt+SGgoomy8q+thtDKIEAKMefRRDBkyhHcYxA3QHSlCJMZSbby+JokTQmzz\nw9WrvEMgboLLHamrTiynQYirGjFyJBRKpcnxKJUKP3NSaRciDbyr2xNC+sfljpSYk3MJcVe3bt7E\nrZs3TY5XffedzW3p29rECGnA8/Pzw927d53eL+/q9u5miI8PWltaeIdB3ASXO1K21rEhRKq2bd/O\nOwSrlJ47xzsEtyCT+H5YxDri7btPCKdEqpYe7RE38cpD5TqcYWx0NCY//rhN1zwxbZpjghlgwsaM\n4R0CEUEL3Y0iIuLyaO8EpyKVxH6dAGj9F19XBOykrqmpobETwbccivr2oPFzbTR+7ovLHSnjUhbE\ntfA
pqCBt9pQNscaogADD90mTJwtqI0KhoLFzcTR+ro3Gz31xSaQ6Ozp4dEuIQ4wOCXFo+8Z14ErP\nnYOnpycGDRrk0D4JIYRYh0site/993l0S4hbGDFyJELkct5hEEIIASBjjDGndiij9RKEEEIIcR3m\nUiWnTzZ3ct5GCCGEEOIwtCkKIYQQQohAlEgRQgghhAhEiRQhhBBCiEBOTaRKSkqgUqmgVCqRn5/v\nzK6JFbRaLaZPn46YmBjExsZi9+7dAIBbt24hNTUVUVFRSEtLQ2Njo+Ga3NxcKJVKqFQqHD16lFfo\nxEhnZycSEhKQnp4OgMbPlTQ2NiIzMxPR0dFQq9UoLS2l8XMRubm5iImJwbhx45CVlYW2tjYau4GC\nOUlHRwdTKBRMo9EwvV7P4uLiWGVlpbO6J1ZoaGhgFy9eZIwxdu/ePRYVFcUqKyvZhg0bWH5+PmOM\nsby8PLZx40bGGGOXL19mcXFxTK/XM41GwxQKBevs7OQWP3lg586dLCsri6WnpzPGGI2fC8nJyWF7\n9+5ljDHW3t7OGhsbafxcgEajYREREez+/fuMMcYWLFjAPvjgAxq7AcJpd6TKysoQGRmJ8PBweHl5\nYeHChSgsLHRW98QKwcHBiI+PBwAMGzYM0dHRqKurw6FDh7B48WIAwOLFi3Hw4EEAQGFhIRYtWgQv\nLy+Eh4cjMjISZWVl3OIngE6nw5EjR7B8+XLDClkaP9dw584dnD59GkuXLgUAeHp6wt/fn8bPBfj5\n+cHLywstLS3o6OhAS0sLQkJCaOwGCKclUnV1dQgLCzO8lsvlqKurc1b3xEa1tbW4ePEikpKScO3a\nNQQFBQEAgoKCcO3aNQBAfX095EYbQ9KY8rdu3Tps374dHh7//0+bxs81aDQaBAQE4Pnnn8eECROw\nYsUKNDc30/i5gBEjRmD9+vUYM2YMQkJCMHz4cKSmptLYDRBOS6RoI07X0dTUhPnz52PXrl3w9fXt\n9Z5MJjM7ljTO/BQVFSEwMBAJCQn97tdG4yddHR0dKC8vx+rVq1FeXo6hQ4ciLy+v1zk0ftJUU1OD\nt956C7W1taivr0dTUxP279/f6xwaO/fltEQqNDQUWq3W8Fqr1fbKyIk0tLe3Y/78+cjOzsbcuXMB\nPPgk9eOPPwIAGhoaEBgYCMB0THU6HUJDQ50fNAEAnD17FocOHUJERAQWLVqEEydOIDs7m8bPRcjl\ncsjlckycOBEAkJmZifLycgQHB9P4SdyFCxcwZcoUjBw5Ep6enpg3bx7OnTtHYzdAOC2RSkxMRHV1\nNWpra6HX6/HJJ58gIyPDWd0TKzDGsGzZMqjVaqxdu9ZwPCMjA/v27QMA7Nu3z5BgZWRkoKCgAHq9\nHhqNBtXV1Zg0aRKX2Amwbds2aLVaaDQaFBQUIDk5GR9++CGNn4sIDg5GWFgYqqqqAADHjx9HTEwM\n0tPTafwkTqVS4ZtvvkFraysYYzh+/DjUajWN3UDhzJntR44cYVFRUUyhULBt27Y5s2tihdOnTzOZ\nTMbi4uJYfHw8i4+PZ8XFxezmzZssJSWFKZVKlpqaym7fvm24ZuvWrUyhULCxY8eykpISjtETYydP\nnjSs2qPxcx0VFRUsMTGRjR8/nj399NOssbGRxs9F5OfnM7VazWJjY1lOTg7T6/U0dgOE04sWE0II\nIYS4C9rZnBBCCCFEIEqkCCGEEEIEokSKEEIIIUQgSqQIIYQQQgSiRIoQQgghRCBKpAghhBBCBPo/\nnFSSicQRsB8AAAAASUVORK5CYII=\n" }, "metadata": {}, "output_type": "display_data" } ], "source": [ "tdmatrix = zeros((1000,400))\n", "for tid in range(1000):\n", " for did in invindex[tid]:\n", " if did>=400: 
 break\n", " tdmatrix[tid,did] = 1\n", "figsize(10,8)\n", "imshow(tdmatrix.T,cmap=cm.hot)\n", " " ] }, { "cell_type": "markdown", "id": "4098bcba", "metadata": {}, "source": [ "## Queries" ] }, { "cell_type": "markdown", "id": "3585458a", "metadata": {}, "source": [ "With this inverted index, we can now express Boolean queries as intersections (AND) and unions (OR) of sorted postings lists." ] }, { "cell_type": "code", "execution_count": 58, "id": "40bcc8d9", "metadata": { "collapsed": false }, "outputs": [], "source": [ "def intersect_postings(u,v):\n", " # u and v must be sorted\n", " result = []\n", " while u!=[] and v!=[]:\n", " if u[0]==v[0]: \n", " result.append(u[0])\n", " u = u[1:]\n", " v = v[1:]\n", " elif u[0] occurrences\n", "!head occurrences" ] }, { "cell_type": "markdown", "id": "e1c165ef", "metadata": {}, "source": [ "Next, we sort by document name and then assign document ids.\n", "Sorting is potentially expensive, and for large inputs the data may not fit in memory;\n", "`sort` handles this with an external merge sort that writes sorted runs to temporary files and then merges them.\n", "\n", "We also use the `-u` (unique) option because we only need to record each token once per document.\n", "\n", "We assign document ids by incrementing a counter whenever the document name changes in the sorted list." ] }, { "cell_type": "code", "execution_count": 112, "id": "b37123e6", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 A\r\n", "1 ADDRESS\r\n", "1 Admiral\r\n", "1 Aggressors\r\n", "1 All\r\n", "1 Allies\r\n", "1 Almighty\r\n", "1 America\r\n", "1 American\r\n", "1 Americans\r\n" ] } ], "source": [ "!sort -u occurrences | awk 'BEGIN{FS=\":\"};{if($1!=last){did++;last=$1};print did,$2}' > did-token\n", "!head did-token" ] }, { "cell_type": "markdown", "id": "216dfc48", "metadata": {}, "source": [ "The `sort` program lets us sort on arbitrary fields, so we now sort by token (field 2), breaking ties by numeric document id (field 1)."
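, "\n",
"\n",
"As an illustration only (not part of the pipeline), the ordering that `sort -k 2,2 -k 1,1n` produces can be mimicked in memory with Python's `sorted`, using `(token, document id)` as the key. The pairs below are made up, and the sketch assumes everything fits in memory, which is exactly the limitation the external `sort` avoids. Python compares strings by code point, which matches `sort` only under `LC_ALL=C`; in other locales the token order may differ.\n",
"\n",
"```python\n",
"# hypothetical (document id, token) pairs, like the lines of did-token\n",
"pairs = [(2, 'A'), (1, 'America'), (1, 'A'), (3, 'All'), (2, 'All')]\n",
"# primary key: the token (like -k 2,2); secondary key: numeric document id (like -k 1,1n)\n",
"bytoken = sorted(pairs, key=lambda p: (p[1], p[0]))\n",
"print(bytoken)\n",
"# [(1, 'A'), (2, 'A'), (2, 'All'), (3, 'All'), (1, 'America')]\n",
"```"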
 ] }, { "cell_type": "code", "execution_count": 113, "id": "46d8f315", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 A\r\n", "2 A\r\n", "3 A\r\n", "4 A\r\n", "5 A\r\n", "6 A\r\n", "8 A\r\n", "9 A\r\n", "10 A\r\n", "11 A\r\n" ] } ], "source": [ "!sort -k 2,2 -k 1,1n did-token > bytoken\n", "!head bytoken" ] }, { "cell_type": "markdown", "id": "9944140e", "metadata": {}, "source": [ "Finally, we assign token ids sequentially in the same way (incrementing a counter whenever the token changes),\n", "and we swap the two columns so that each line reads `token-id document-id`.\n", "This gives us a representation of the inverted index that we can read in directly." ] }, { "cell_type": "code", "execution_count": 114, "id": "b9a7fb45", "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1 1\r\n", "1 2\r\n", "1 3\r\n", "1 4\r\n", "1 5\r\n", "1 6\r\n", "1 8\r\n", "1 9\r\n", "1 10\r\n", "1 11\r\n" ] } ], "source": [ "!awk '{if($2!=last){wid++;last=$2};print wid,$1}' bytoken > invindex\n", "!head invindex" ] }, { "cell_type": "markdown", "id": "e0998e8c", "metadata": {}, "source": [ "We can now read in the inverted index very easily (although, if the whole\n", "inverted index fit into memory, we could have built it directly in memory in the first place)."
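, "\n",
"\n",
"For reference, the `awk` id-assignment steps above need only constant memory because each line is compared only with the previous one. As a rough sketch (a Python transcription written for illustration, not code used by the pipeline; the function name `assign_token_ids` and the sample lines are made up), the token-id step reads as follows:\n",
"\n",
"```python\n",
"# streaming equivalent of: awk '{if($2!=last){wid++;last=$2};print wid,$1}' bytoken\n",
"def assign_token_ids(lines):\n",
"    # lines are 'document-id token' strings, already sorted by token\n",
"    wid, last = 0, None\n",
"    for line in lines:\n",
"        did, token = line.split()\n",
"        if token != last:  # a new token gets the next id\n",
"            wid += 1\n",
"            last = token\n",
"        yield wid, int(did)\n",
"\n",
"print(list(assign_token_ids(['1 A', '2 A', '1 All', '3 All'])))\n",
"# [(1, 1), (1, 2), (2, 1), (2, 3)]\n",
"```"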
 ] }, { "cell_type": "code", "execution_count": 115, "id": "202c7e93", "metadata": { "collapsed": false }, "outputs": [], "source": [ "from collections import defaultdict\n", "\n", "# token id -> list of document ids; the file is sorted by token id, then document id,\n", "# so each postings list comes out already sorted (no fixed upper bound on token ids)\n", "invindex = defaultdict(list)\n", "with open(\"invindex\") as stream:\n", "    for line in stream:\n", "        wid,did = [int(s) for s in line.split()]\n", "        invindex[wid].append(did)" ] }, { "cell_type": "code", "execution_count": 116, "id": "cfd51ad2", "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[2, 11, 13, 16, 17, 21, 23, 24, 25, 38, 57, 58]" ] }, "execution_count": 116, "metadata": {}, "output_type": "execute_result" } ], "source": [ "invindex[700]" ] }, { "cell_type": "markdown", "id": "a4163e31", "metadata": {}, "source": [ "In practice, we would also need to keep track of the string-to-id assignments for both documents and tokens; these can simply be logged to a file as the ids are assigned.\n" ] }, { "cell_type": "markdown", "id": "6e715539", "metadata": {}, "source": [ "Again, what makes this work efficiently is that we have reduced all the operations to two kinds:\n", "\n", "- sequential, streaming operations that use constant memory (`grep`, id assignment with `awk`)\n", "- sorting text files, for which `sort` uses highly optimized external (disk-based) merge sorting\n", "\n", "If you're building large inverted indexes for research purposes, this is a good way of doing it.\n", "It is similar in spirit to MapReduce, with `sort` playing the role of the shuffle phase that groups records by key, and it can be distributed by using a distributed sort.\n" ] }, { "cell_type": "markdown", "id": "a3895c82", "metadata": {}, "source": [ "Note that this approach does not give you a *dynamic index*: if you have to insert a large number of new documents,\n", "you need to rerun the whole process from scratch." ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 5 }