{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "%pylab inline\n", "import pandas as pd\n", "\n", "import numpy as np\n", "from __future__ import division\n", "import itertools\n", "\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "import logging\n", "logger = logging.getLogger()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3 Finding Similar Items\n", "===================\n", "Problem: \n", "+ \"similar\": finding sets with a relatively large intersection. \n", " - shingling \n", " - minhashing \n", "+ too many pairs of items to test \n", " - locality-sentitive hashing\n", " \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.1 Applications of Near-Neighbor Search\n", "\n", "The *Jaccard Similarity* of sets $S$ and $T$:\n", "\\begin{equation}\n", "SIM(S,T) = \\frac{| S \\cup T |}{| S \\cap T |}\n", "\\end{equation}\n", "\n", "##### Similar-sets Problem\n", "1. Similar Documents \n", "In many applicatons, the documents are not identical, yet they share large portions of their text. \n", "eg: Plagiarism, Mirror Pages, Articles from the Same Source. \n", "2. Collaborative Filtering(Similar Tastes) \n", "eg: \n", " + Online Purchases \n", " + Movie Ratings \n", " When data is not binary: \n", " - Ignore low-rated customer/movie pairs \n", " - \"liked\" VS \"hated\" \n", " - 1-to-5-starts, put it in a set $n$ times. <= _Jaccard similarity for bags_\n", " \n", "##### Exercises" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a:set([1, 2, 3, 4]), b:set([2, 3, 5, 7]), SIM:0.333\n", "a:set([1, 2, 3, 4]), b:set([2, 4, 6]), SIM:0.400\n", "a:set([2, 3, 5, 7]), b:set([2, 4, 6]), SIM:0.167\n" ] } ], "source": [ "#Exercise 3.1.1\n", "logger.setLevel(logging.WARNING)\n", "\n", "data = [\n", " set([1, 2, 3, 4]),\n", " set([2, 3, 5, 7]),\n", " set([2, 4, 6]),\n", "]\n", "\n", "def Jaccard_similarity_calc(set_a, set_b):\n", " \"\"\"calculate the Jaccard similarity of two sets\n", " \n", " res = \\frac{a \\cap b}{a \\cup b}\n", " \"\"\"\n", " \n", " assert isinstance(set_a, set), '{} is not a set'.format(set_a)\n", " assert isinstance(set_b, set), '{} is not a set'.format(set_b)\n", " \n", " logging.debug('a:{}, b:{}'.format(set_a, set_b))\n", " logging.debug('inter:{}, union:{}'.format(set_a.intersection(set_b), set_a.union(set_b)))\n", " return len(set_a.intersection(set_b)) / len(set_a.union(set_b))\n", "\n", "for comb in list(itertools.combinations(range(3),2)):\n", " set_a, set_b = data[comb[0]], data[comb[1]]\n", " print('a:{}, b:{}, SIM:{:.3f}'.format(set_a, set_b, Jaccard_similarity_calc(set_a, set_b)))\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "a:[1, 1, 1, 2], b:[1, 1, 2, 2, 3], JbSIM:0.333\n", "a:[1, 1, 1, 2], b:[1, 2, 3, 4], JbSIM:0.250\n", "a:[1, 1, 2, 2, 3], b:[1, 2, 3, 4], JbSIM:0.333\n" ] } ], "source": [ "#Exercise 3.1.2\n", "logger.setLevel(logging.WARNING)\n", "\n", "data = [\n", " [1, 1, 1, 2],\n", " [1, 1, 2, 2, 3],\n", " [1, 2, 3, 4],\n", "]\n", "\n", "def Jaccard_bag_similarity_calc(a, b):\n", " \"\"\"calculate the Jaccard bag similarity of two sets\n", " \n", " see page 76, movie ratings 3.\n", " intersecton = min(times of element in the two sets)\n", " union = sum of length of the two sets\n", " res = intersection / union\n", " \"\"\"\n", " \n", " from collections import Counter\n", " \n", " count_a = Counter(a)\n", " count_b = Counter(b)\n", " logging.debug('count_a:{}\\n count_b:{}'.format(count_a,count_b))\n", " \n", " inter = [min(count_a[x], count_b[x]) for x in count_a if x in count_b]\n", " logging.debug('intersection:{}'.format(inter))\n", " \n", " return sum(inter) / (len(a) + len(b))\n", " \n", "for comb in list(itertools.combinations(range(3),2)):\n", " set_a, set_b = data[comb[0]], data[comb[1]]\n", " print('a:{}, b:{}, JbSIM:{:.3f}'.format(set_a, set_b, Jaccard_bag_similarity_calc(set_a, set_b)))\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_exercise 3.1.3_\n", "\n", "引理: \n", "定义\n", "\n", "\\begin{equation}\n", " I\\{A\\} = \\begin{cases}\n", " 1 & \\text{if $A$ occurs.} \\\\\n", " 0 & \\text{otherwise.}\n", " \\end{cases}\n", "\\end{equation}\n", "\n", "令 $X_A = I\\{A\\}$,则有 $E[X_A] = \\sum I\\{A\\}P[A] = P[A]$\n", "\n", "\n", "令 $x_i$ 表示 S 和 T 中有 $i$ 个相同元素,则:\n", "\n", "\\begin{align}\n", " E[x_i] & = P[\\text{S 和 T 中有 $i$ 个相同元素}] \\\\\n", " & = \\frac{C_n^i \\, C_{n-i}^{2(m-i)}}{C_n^{m} C_n^{m}}\n", "\\end{align}\n", "\n", "所以\n", "\n", "\\begin{align}\n", " E[J] & = \\sum_{i=0}^m E[x_i] \\, J_i \\\\\n", " & = \\frac{\\sum_{i=0}^m \\, C_n^i \\, C_{n-i}^{2(m-i)} \\, \\frac{i}{2(m-i)}}{C_n^{m} C_n^{m}}\n", "\\end{align}\n", "\n", "\n", "`#todo 推导最终式`\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.2 Shingling of Documents\n", "\n", "**k-shingle**: any substring of length $k$ found within the document. \n", "+ variation: set -> bag\n", "+ multiple white-space => one white-space\n", "+ choosing k \n", " - **large enough** that prob. of **any given shingle** appearing in **any given document** is **low**. \n", " k 越大,对相似性要求越高\n", " - estimate the number of k-shingles as $20^k$, instead of 27 char, 因为各个字符的出现频率并非一致\n", " - in general, \n", " email: k = 5; \n", " large documents, such as research articles: k = 9 \n", "+ stop words, such as \"and\", \"you\", \"to\" \n", " - ignore them in many application\n", " - **\"stop words + two words\"** used as shingle in finding similar **news articles**.\n", " \n", "**hashing Shingles**: h(shingle) = bucket\n", "+ 9-shingles, then hash to 4 char > 4-shingles" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Example 3.3: ['ab', 'bc', 'cd', 'da', 'bd']\n", "other example: ['he', 'el', 'll', 'lo', 'o ', ' i', 'i ', ' m', 'm ', ' s', 'so']\n" ] } ], "source": [ "logger.setLevel(logging.WARNING)\n", "\n", "def to_alphanumeric_string(s):\n", " \"\"\"remove any non-alphanumeric characte in string s.\n", " \"\"\"\n", " import re\n", " \n", " s = s.lower()\n", " s = re.sub('\\W', ' ', s)\n", " logging.debug('after: del special char: {}'.format(s))\n", " s = re.sub('\\s+', ' ', s).strip()\n", " logging.debug('after: del multiple whitespace: {}'.format(s))\n", " \n", " return s\n", "\n", "def k_shingle_extrac(document, k):\n", " \"\"\"extract all k-shingles from document.\n", " \n", " \"\"\"\n", " document = to_alphanumeric_string(document)\n", "\n", " assert len(document) >= k, 'k should be less than the length of document'\n", " \n", " k_shingles = []\n", " while len(document) >= k:\n", " shingle = document[0:k]\n", " if shingle not in k_shingles:\n", " k_shingles.append(shingle)\n", " document = document[1:]\n", " \n", " return k_shingles\n", "\n", "\n", "print('Example 3.3: {}'.format(k_shingle_extrac('abcdabd', 2)))\n", "print('other example: {}'.format(k_shingle_extrac(\"Hello, I'm So-So. \", 2)))" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "##### exercise" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "first 10 3-shingles:['the', 'he ', 'e m', ' mo', 'mos', 'ost', 'st ', 't e', ' ef', 'eff']\n" ] } ], "source": [ "#exercise 3.2.1\n", "\n", "data = \"The most effective way to represent documents as sets, for the purpose of iden- tifying lexically similar documents is to construct from the document the set of short strings that appear within it. \"\n", "\n", "print('first 10 3-shingles:{}'.format(k_shingle_extrac(data,3)[0:10]))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "stop-shingles:['the most effective', 'to represent documents', 'as sets for', 'for the purpose', 'the purpose of', 'of iden tifying', 'is to construct', 'to construct from', 'from the document', 'the document the', 'the set of', 'of short strings', 'that appear within', 'it']\n" ] } ], "source": [ "#exercise 3.2.2\n", "logger.setLevel(logging.WARNING)\n", "\n", "def stopwords_shingles_extract(document, k):\n", " \"\"\"extract the stopwords-shingle.\n", " \n", " stropwords-shingle = stop-words + k words\n", " \"\"\"\n", " stop_words_list = ['the', 'you', 'to', 'as', 'for', 'of', 'is', 'that', 'it', 'from']\n", " \n", " document = to_alphanumeric_string(document)\n", " document = document.split()\n", " logging.debug('split:{}'.format(document))\n", " \n", " shingles = []\n", " k = k + 1 #len(shingle) = 1 stop-word + k words\n", " while(document):\n", " try:\n", " logging.debug('check:{}'.format(document[0]))\n", " if document[0] in stop_words_list:\n", " shingle = ' '.join(document[0:k])\n", " logging.debug('hit: {}'.format(shingle))\n", " \n", " if shingle not in shingles:\n", " shingles.append(shingle)\n", " \n", " except IndexError:\n", " logging.debug('Index Error: no of char:{}, k: {}'.format(len(document), k))\n", " k = len(document)\n", " continue\n", " \n", " document = document[1:]\n", " \n", " return shingles\n", "\n", "print('stop-shingles:{}'.format(stopwords_shingles_extract(data, 2)))\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`#exercise 3.2.3`\n", "\n", "k-shingles 最大数量:所有提取的 shingle 均不同 \n", "故从左向右依次取片,有: \n", "the largest number = n - (k - 1) = n - k + 1\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.3 Similarity-Preserving Summaries of Sets\n", "\n", "##### Matrix Representation of Sets: \n", "(r,c) = 1: the element for row r is a member of the set for column c. \n", "(r,c) = 0: otherwise \n", "\n", "+ matrices are almost always **sparse**" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
S1S2S3S4
a1001
b0010
c0101
d1011
e0010
\n", "
" ], "text/plain": [ " S1 S2 S3 S4\n", "a 1 0 0 1\n", "b 0 0 1 0\n", "c 0 1 0 1\n", "d 1 0 1 1\n", "e 0 0 1 0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Example 3.6\n", "\n", "logger.setLevel(logging.WARNING)\n", "\n", "my_database = ['a', 'b', 'c', 'd', 'e']\n", "my_features_dict = {\n", " 'S1':['a', 'd'],\n", " 'S2':['c'],\n", " 'S3':['b', 'd', 'e'],\n", " 'S4':['a', 'c', 'd']\n", " }\n", "\n", "def matrix_representation_create(database, features_dict):\n", " \"\"\"create the matrix representation of one database.\n", " \n", " \"\"\"\n", " matrix_ = np.zeros((len(database), len(features_dict)), dtype=np.int)\n", " matrix = pd.DataFrame(matrix_, index=database, columns=sorted(features_dict.keys()))\n", " \n", " for feature_name, values in features_dict.iteritems():\n", " for value in values:\n", " matrix.loc[value, feature_name] = 1\n", " \n", " return matrix\n", " \n", " \n", "my_matrix = matrix_representation_create(my_database, my_features_dict)\n", "my_matrix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Minhashing\n", "\n", "_target_: sets of shingles are large =====> much smaller representations called \"**signatures**\"\n", "\n", "**minhash**\n", "1. pick a permutation of the rows\n", "2. h(columu c) = the row which has the first \"1\"" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
S1S2S3S4
h1acba
\n", "
" ], "text/plain": [ " S1 S2 S3 S4\n", "h1 a c b a" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Example 3.7\n", "\n", "logger.setLevel(logging.WARNING)\n", "\n", "def minhash(matrix, row_orders):\n", " \"\"\"calculate the minhash value of matrix according to element permutation in Fig 3.3.\n", " \n", " \"\"\"\n", " hash_fun_names = ['h{}'.format(i) for i in range(1, len(row_orders)+1)]\n", " hash_table = pd.DataFrame(np.zeros((len(row_orders), matrix.shape[1])), index=hash_fun_names, columns=matrix.columns)\n", "\n", " for row_order, hash_fun_name in zip(row_orders, hash_fun_names):\n", " matrix_p = matrix.loc[row_order,:]\n", " logging.debug('after permutation: \\n{}'.format(matrix_p))\n", " \n", " for c in matrix_p.columns:\n", " first_one_index = next((i for i, x in enumerate(matrix_p.loc[:,c]) if x), None)\n", " hash_table.loc[hash_fun_name, c] = row_order[first_one_index]\n", "\n", " return hash_table\n", "\n", "\n", "minhash(my_matrix, [['b', 'e', 'a', 'd', 'c']])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**minhash and Jaccard Similarity**: $P[h(S1) = h(S2)] = SIM(S1, S2)$\n", "\n", "*Proof*:\n", "rows can be divided into three classes:\n", "\n", "+ Type X rows have 1 in both columns (means both S1 and S2 have these features)\n", "+ Type Y rows have 1 in either S1 or S2\n", "+ Type Z rows have 0 in both columns (means both S1 and S2 don't have them)\n", "\n", "\\begin{align}\n", " | S1 \\cap S2 | & = len(X) \\\\\n", " | S1 \\cap S2 | & = len(X+Y) \\\\\n", " SIM(S1, S2) &= \\frac{len(X)}{len(X+Y)}\n", "\\end{align}\n", "\n", "randomly permute rows, there is at least one colunm whose value is 1:\n", "$$ P[h(S1) = h(S2)] = \\frac{len(x)}{len(X+Y)} $$\n", "\n", "**minhash signature**: \n", "use vector $[h_1(S), h_2(S), ... , h_n(S)]$ to represent matrix M \n", "namely, reduce the dim: columns(M) -----> n\n", "\n", "**Computing Minhash Signatures** \n", "picking a random permutation of large rows and find its hash value both are **time-consuming** ==> simulation" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "matrix:\n", " S1 S2 S3 S4\n", "0 1 0 0 1\n", "1 0 0 1 0\n", "2 0 1 0 1\n", "3 1 0 1 1\n", "4 0 0 1 0\n", "\n", "minhash: 5dim -> 2dim \n", " true hash res:\n", " S1 S2 S3 S4\n", "h1 1 3 0 1\n", "h2 0 2 0 0\n", "\n", "Minhashing\n", "S1-S2 minhash:0.000 true SIM:0.000\n", "S1-S3 minhash:0.500 true SIM:0.250\n", "S1-S4 minhash:1.000 true SIM:0.667\n", "S2-S3 minhash:0.000 true SIM:0.000\n", "S2-S4 minhash:0.000 true SIM:0.333\n", "S3-S4 minhash:0.500 true SIM:0.200\n" ] } ], "source": [ "logger.setLevel(logging.WARNING)\n", "\n", "my_matrix.index = range(my_matrix.shape[0])\n", "print('matrix:\\n{}\\n'.format(my_matrix))\n", "\n", "h_rows = [\n", " [1, 2, 3, 4, 0],\n", " [1, 4, 2, 0, 3]\n", "]\n", "\n", "def minhash_by_rows(matrix, row_orders):\n", " \"\"\"calculate the minhash value of matrix according to row_permutation in Fig 3.4.\n", " \n", " \"\"\"\n", " hash_fun_names = ['h{}'.format(i) for i in range(1, len(row_orders)+1)]\n", " hash_table = pd.DataFrame(np.zeros((len(row_orders), matrix.shape[1])), index=hash_fun_names, columns=matrix.columns)\n", "\n", " for row_order, hash_fun_name in zip(row_orders, hash_fun_names):\n", " logging.debug('row_order:{}, h:{}'.format(row_order, hash_fun_name))\n", " matrix_p = matrix.copy()\n", " matrix_p.index = row_order #new rows permutation\n", " matrix_p.sort_index(inplace=True) #adjust rows orders\n", " \n", " logging.debug('after permutation: \\n{}'.format(matrix_p))\n", " \n", " for c in matrix_p.columns:\n", " first_one_index = next((i for i, x in enumerate(matrix_p.loc[:,c]) if x), None)\n", " hash_table.loc[hash_fun_name, c] = first_one_index\n", "\n", " return hash_table\n", "\n", "my_minhash_res = minhash_by_rows(my_matrix, h_rows)\n", "print('minhash: 5dim -> 2dim \\n\\\n", " true hash res:\\n{}\\n'.format(my_minhash_res))\n", "\n", "print('Minhashing')\n", "\n", "for comb in list(itertools.combinations(range(4),2)):\n", " s_a, s_b = 'S{}'.format(comb[0]+1), 'S{}'.format(comb[1]+1)\n", " print('{}-{}'.format(s_a, s_b)),\n", " \n", " set_a, set_b = set(my_minhash_res.iloc[:,comb[0]]), set(my_minhash_res.iloc[:,comb[1]])\n", " print('minhash:{:.3f}'.format(Jaccard_similarity_calc(set_a, set_b))),\n", " print('true SIM:{:.3f}'.format(Jaccard_similarity_calc(set(my_features_dict[s_a]), set(my_features_dict[s_b]))))\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "对 $M$ 中的所有行做一次随机组合得到 $\\bar{M}$,非常耗时。 \n", "解决方法是不直接操作矩阵 $M$,而是分两步操作:\n", "\n", "1) 选择合适的哈希函数生成行号序列来模拟随机排列结果,如上个代码区块中 `h_rows` 即可用 $h_1(x) = (x + 1) \\% 5$ 和 $h_2(x) = (3x + 1) \\% 5$ 生成对应的行号" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "h_rows:[[1, 2, 3, 4, 0], [1, 4, 2, 0, 3]]\n" ] } ], "source": [ "def add_hash_func(a, b, c):\n", " return lambda x: (a*x + b) % c\n", "\n", "h_funcs = [\n", " add_hash_func(1, 1, 5),\n", " add_hash_func(3, 1, 5)\n", " ]\n", "\n", "h_rows = []\n", "for h_func in h_funcs:\n", " h_rows.append(map(h_func, range(5)))\n", " \n", "print('h_rows:{}'.format(h_rows))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2) 再按照行号重新对行排序,生成新矩阵 $\\bar{M}$ \n", "但第 2 步排序仍然需要操作矩阵,见函数 `minhash_by_rows` 中的语句 `matrix_p.sort_index(inplace=True)`。" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hash functions computed for the matrix:\n", " S1 S2 S3 S4 h1 h2\n", "0 1 0 0 1 1 1\n", "1 0 0 1 0 2 4\n", "2 0 1 0 1 3 2\n", "3 1 0 1 1 4 0\n", "4 0 0 1 0 0 3\n", "\n", "signature matrix\n", "(SIG):\n", " S1 S2 S3 S4\n", "h1 1 3 0 1\n", "h2 0 2 0 0\n" ] } ], "source": [ "#Fig 3.4\n", "\n", "df_matrix = my_matrix\n", "df_h_rows = pd.DataFrame(np.array(h_rows).T)\n", "df_h_rows.columns = ['h{}'.format(x+1) for x in df_h_rows.columns]\n", "\n", "print('Hash functions computed for the matrix:\\n{}\\n'.format(pd.concat([df_matrix, df_h_rows], axis=1)))\n", "print('signature matrix\\n(SIG):\\n{}'.format(my_minhash_res))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let SIG(i,c) be the element of the signature matrix for the ith hash function and column c.\n", "\n", "替代方法是借助思路:$ SIG(i,c) = min(h_i(S_c=1)) $ \n", "即对于 $h_i(S_c)$ 值来说,它只可能是 “1” 对应的行号,而第一个 “1” 肯定是最小的行号。\n", "\n", "比如,我们想得到 SIG(1,1) 的值, \n", "$S1 = [1, 0, 0, 1, 0]^T$,对应的 $h1 = [1, 2, 3, 4, 0]^T$, \n", "SIG(1,1) 只能取 “1” 的行号,即 1 或 4, \n", "又 SIG 是第一个遇到的 “1”,所以 SIG(1,1) 肯定是 1。\n", "\n", "**思路总结**: \n", "我们对每个 SIG(i,c) 都可以通过遍历 $S_i$ 中 \"1\" 位置,取出 $h_c$ 中相应位置的行号,行号中最小的值就是解。\n", "\n", "而这个思路是可以并行化处理的,有两种方法:\n", "\n", "1. 列遍历:即从左至右遍历 $S_i$ 得到 \"1\" 位置时,我们可以据此取出所有 $h_c$ 相应的行号,分别取最小。\n", "2. 行遍历:书中 Example3.8 例子,从上往下遍历,逐行比较。 \n", " 步骤如下: \n", " 1. Compute $h_1(r), h_2(r), ... , h_n(r)$.\n", " 2. For each column $c$ do the following:\n", " + If $c$ has $0$ in row $r$, do nothing\n", " + else, if $c$ has $1$ in row $r$, \n", " for each $i = 1,2, ..., n$, \n", " $SIG(i,c) = min \\left( SIG(i,c), \\, h_i(r) \\right)$\n", " \n", "行遍历的方法,粗略想,计算量应该更少。 \n", "`#todo: 验证计算量`" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hash matrix:\n", " S1 S2 S3 S4 h1 h2\n", "0 1 0 0 1 1 1\n", "1 0 0 1 0 2 4\n", "2 0 1 0 1 3 2\n", "3 1 0 1 1 4 0\n", "4 0 0 1 0 0 3\n", "\n", "row:0,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 inf inf 1\n", "h2 1 inf inf 1\n", "\n", "row:1,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 inf 2 1\n", "h2 1 inf 4 1\n", "\n", "row:2,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 3 2 1\n", "h2 1 2 4 1\n", "\n", "row:3,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 3 2 1\n", "h2 0 2 0 0\n", "\n", "row:4,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 3 0 1\n", "h2 0 2 0 0\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
S1S2S3S4
h11301
h20200
\n", "
" ], "text/plain": [ " S1 S2 S3 S4\n", "h1 1 3 0 1\n", "h2 0 2 0 0" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#Example 3.8\n", "\n", "logger.setLevel(logging.WARNING)\n", "\n", "def minhash_signatures_calc(df_M, hash_funcs, nagging=False):\n", " \"\"\"computing minhash signatures by the way in Example 3.8.\n", " \n", " \"\"\"\n", " logging.debug('data matrix:\\n{}\\n'.format(df_M))\n", "\n", " h = []\n", " \n", " for hash_func in hash_funcs:\n", " h.append(map(hash_func, range(df_M.shape[0])))\n", " \n", " df_h = pd.DataFrame(np.array(h).T)\n", " df_h.columns = ['h{}'.format(x+1) for x in df_h.columns]\n", " logging.debug('hash matrix:\\n{}\\n'.format(df_h))\n", " \n", " if nagging:\n", " print('hash matrix:\\n{}\\n'.format(pd.concat([df_matrix, df_h], axis=1)))\n", " \n", " df_signatures = pd.DataFrame(np.ones((df_h.shape[1], df_M.shape[1]))*np.inf, index=df_h.columns, columns=df_M.columns)\n", " logging.debug('signatures matrix:\\ninit\\n{}\\n'.format(df_signatures))\n", "\n", " for r in df_M.index:\n", " for c in df_h.columns:\n", " r_1_loc = df_M.loc[r,:] == 1\n", " logging.debug('r:{}, c:{}, 1 loc:\\n{}\\n'.format(r,c, r_1_loc))\n", " \n", " sig_c = df_signatures.loc[c,:]\n", " line_bigger_loc = sig_c > df_h.loc[r, c]\n", " logging.debug('bigger row loc:\\n{}\\n'.format(line_bigger_loc))\n", " \n", " sig_c[line_bigger_loc & r_1_loc] = df_h.loc[r, c]\n", " logging.debug('modified:\\n{}\\n'.format(sig_c))\n", " \n", " df_signatures.loc[c,:] = sig_c\n", " if nagging:\n", " print('row:{},\\n signature matrix:\\n{}\\n'.format(r, df_signatures))\n", " \n", " return df_signatures\n", "\n", "minhash_signatures_calc(df_matrix, h_funcs, nagging=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### exercise 3.3" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "S1-S2 true SIM:0.000 fraction:0.000\n", "S1-S3 true SIM:0.250 fraction:0.250\n", "S1-S4 true SIM:0.667 fraction:0.667\n", "S2-S3 true SIM:0.000 fraction:0.000\n", "S2-S4 true SIM:0.333 fraction:0.333\n", "S3-S4 true SIM:0.200 fraction:0.200\n" ] } ], "source": [ "#exercise 3.3.1\n", "\n", "#generate 120 permutations\n", "h_rows = list(itertools.permutations(range(5),5))\n", "my_minhash_res = minhash_by_rows(my_matrix, h_rows)\n", "\n", "for comb in list(itertools.combinations(range(4),2)):\n", " s_a, s_b = 'S{}'.format(comb[0]+1), 'S{}'.format(comb[1]+1)\n", " print('{}-{}'.format(s_a, s_b)),\n", " #calc Jaccard similarity\n", " print('true SIM:{:.3f}'.format(Jaccard_similarity_calc(set(my_features_dict[s_a]), set(my_features_dict[s_b])))),\n", " #calc the fraction of the 120 permutations in which the value is same\n", " print('fraction:{:.3f}'.format(sum(my_minhash_res.loc[:,s_a] == my_minhash_res.loc[:,s_b])/120))\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "hash matrix:\n", " S1 S2 S3 S4 h1 h2 h3 h4\n", "0 1 0 0 1 1 1 4 4\n", "1 0 0 1 0 2 4 1 2\n", "2 0 1 0 1 3 2 3 0\n", "3 1 0 1 1 4 0 0 3\n", "4 0 0 1 0 0 3 2 1\n", "\n", "row:0,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 inf inf 1\n", "h2 1 inf inf 1\n", "h3 4 inf inf 4\n", "h4 4 inf inf 4\n", "\n", "row:1,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 inf 2 1\n", "h2 1 inf 4 1\n", "h3 4 inf 1 4\n", "h4 4 inf 2 4\n", "\n", "row:2,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 3 2 1\n", "h2 1 2 4 1\n", "h3 4 3 1 3\n", "h4 4 0 2 0\n", "\n", "row:3,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 3 2 1\n", "h2 0 2 0 0\n", "h3 0 3 0 0\n", "h4 3 0 2 0\n", "\n", "row:4,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 1 3 0 1\n", "h2 0 2 0 0\n", "h3 0 3 0 0\n", "h4 3 0 1 0\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
S1S2S3S4
h11301
h20200
h30300
h43010
\n", "
" ], "text/plain": [ " S1 S2 S3 S4\n", "h1 1 3 0 1\n", "h2 0 2 0 0\n", "h3 0 3 0 0\n", "h4 3 0 1 0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#exercise 3.3.2\n", "h_funcs[2:4] = [\n", " add_hash_func(2, 4, 5),\n", " add_hash_func(3, -1, 5)\n", "]\n", "\n", "minhash_signatures_calc(df_matrix, h_funcs, nagging=True)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false, "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fig 3.5:\n", " S1 S2 S3 S4\n", "0 0 1 0 1\n", "1 0 1 0 0\n", "2 1 0 0 1\n", "3 0 0 1 0\n", "4 0 0 1 1\n", "5 1 0 0 0\n", "\n", "hash matrix:\n", " S1 S2 S3 S4 h1 h2 h3\n", "0 0 1 0 1 1 2 2\n", "1 0 1 0 0 3 5 1\n", "2 1 0 0 1 5 2 0\n", "3 0 0 1 0 1 5 5\n", "4 0 0 1 1 3 2 4\n", "5 1 0 0 0 5 5 3\n", "\n", "row:0,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 inf 1 inf 1\n", "h2 inf 2 inf 2\n", "h3 inf 2 inf 2\n", "\n", "row:1,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 inf 1 inf 1\n", "h2 inf 2 inf 2\n", "h3 inf 1 inf 2\n", "\n", "row:2,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 5 1 inf 1\n", "h2 2 2 inf 2\n", "h3 0 1 inf 0\n", "\n", "row:3,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 5 1 1 1\n", "h2 2 2 5 2\n", "h3 0 1 5 0\n", "\n", "row:4,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 5 1 1 1\n", "h2 2 2 2 2\n", "h3 0 1 4 0\n", "\n", "row:5,\n", " signature matrix:\n", " S1 S2 S3 S4\n", "h1 5 1 1 1\n", "h2 2 2 2 2\n", "h3 0 1 4 0\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
S1S2S3S4
h15111
h22222
h30140
\n", "
" ], "text/plain": [ " S1 S2 S3 S4\n", "h1 5 1 1 1\n", "h2 2 2 2 2\n", "h3 0 1 4 0" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#exercise 3.3.3\n", "\n", "my_database = range(6)\n", "my_features_dict = {\n", " 'S1':[2, 5],\n", " 'S2':[0, 1],\n", " 'S3':[3, 4],\n", " 'S4':[0, 2, 4]\n", " }\n", "\n", "df_fig_3_5 = matrix_representation_create(my_database, my_features_dict)\n", "print('Fig 3.5:\\n{}\\n'.format(df_fig_3_5))\n", "\n", "\n", "#(a)\n", "h_funcs = [\n", " add_hash_func(2, 1, 6),\n", " add_hash_func(3, 2, 6),\n", " add_hash_func(5, 2, 6)\n", "]\n", "df_matrix = df_fig_3_5\n", "my_minhash_res = minhash_signatures_calc(df_matrix, h_funcs, nagging=True)\n", "my_minhash_res" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "S1-S2 minhash:0.250 true SIM:0.000\n", "S1-S3 minhash:0.200 true SIM:0.000\n", "S1-S4 minhash:0.500 true SIM:0.250\n", "S2-S3 minhash:0.667 true SIM:0.000\n", "S2-S4 minhash:0.667 true SIM:0.250\n", "S3-S4 minhash:0.500 true SIM:0.250\n" ] } ], "source": [ "#(b) h_3 is a true permutation.\n", "\n", "#(c)\n", "\n", "for comb in list(itertools.combinations(range(4),2)):\n", " s_a, s_b = 'S{}'.format(comb[0]+1), 'S{}'.format(comb[1]+1)\n", " print('{}-{}'.format(s_a, s_b)),\n", " \n", " set_a, set_b = set(my_minhash_res.iloc[:,comb[0]]), set(my_minhash_res.iloc[:,comb[1]])\n", " print('minhash:{:.3f}'.format(Jaccard_similarity_calc(set_a, set_b))),\n", " print('true SIM:{:.3f}'.format(Jaccard_similarity_calc(set(my_features_dict[s_a]), set(my_features_dict[s_b]))))\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`#exercise 3.3.4` \n", "不是很明白意思,大概想法是: \n", "从 $n$ 中抽取 $m$ 个元素集合 $S$ 和 $T$,则从 $S$ 和 $T$ 中各抽一次,元素相同的概率是多少?\n", "\n", "`#exercise 3.3.5` \n", "元素相同的概率是 0" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.4 Locality-Sensitive Hashing for Documents\n", "\n", "minhash 降低了维度,但计算耗时问题仍然存在: \n", "令样本集有 $k$ 个元素,两两比较次数是 $C_k^2 = O(k^2)$。 \n", "若 $k$ 很大,则比较次数成乘方倍增长。\n", "\n", "解决思路: \n", "1. 并行处理 \n", "2. 只比较可能相似的样本 \n", " often we want only the most similar pairs or all paris that are above some lower bound in similarity. \n", " ----> Locality-sensitive hasing (LSH) or near-neighbor search.\n", " \n", "**LSH**: \n", "\"hash\" items several times, in such a way: $$P[\\text{similar items in the same bucket}] >> P[\\text{dissimilar ...}]$$\n", "\n", "candidate pair: any pair hashed to the same bucket. \n", "false postive: dissimilar items in the candidate pair. \n", "false negtive: similar items NOT in the cnadidate pair.\n", "\n", "if we have minhash signatures for the items, an effective way to choose the hashings is: \n", "1. divide the signature matrix into $b$ bands consisting of $r$ rows each.\n", "2. for each band $b_i$, hash each vector in $b_i$ \n", " hash function $h$: same vectors -> same bucket\n", " \n", "也就是说,将两个样本向量分割,只要有一个节点哈希到一起,就认为是候选对。\n", "\n", "注意,统计的是每个节点各自哈希的情况,所以每个节点使用相同的哈希函数,但使用独立的哈希表存储中间结果。如下图:\n", "\n", "matrix | --> | hash table\n", ":-------:|:------:|:------------:\n", "band 1 | h(v) | hash_talbe_1\n", "band 2 | h(v) | hash_talbe_2\n", ".... | h(v) | ....\n", "\n", "\n", "**Analysis** the probabilty of two pairs to become a candidate pairs: \n", "Given the pairs $s_a$ and $s_b$ have Jaccard similarity $s$, we use $b$ bands of $r$ row each to split their signature matrix $M_a$ and $M_b$. \n", "1. $\\forall \\, i^{th} \\, \\text{row}, \\, prob_{rowY} = P[M_a[i,:] == M_b[i,:]] = \\text{Jaccard similarity} = s$\n", "2. the possibility that one band($r$ row) is the same is: $$prob_{bandY} = prob_{rowY}^r = s^r$$\n", " .......................................not the same is: $$prob_{bandN} = 1 - prob_{bandY} = 1 - s^r$$\n", "3. $s_a$ and $s_b$ are totally different: $$prob_{pairN} = prob_{bandN}^b = (1 - s^r)^b$$\n", " $s_a$ and $_b$ **at least agree in one band**: $$prob_{pairY} = 1 - prob_{pairN} = 1 - (1 - s^r)^b$$\n", " \n", "\n", "**threshold**: the value of $s$ at which the probability of becoming a candidate is 1/2. \n", "namely, $1 - (1 - s^r)^b = \\frac{1}{2}$, the solution $s \\approx (1/b)^{1/r}$" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[(0.20000000000000001, 0.0063805813047682625),\n", " (0.30000000000000004, 0.047494259124971183),\n", " (0.40000000000000008, 0.18604955214914409),\n", " (0.50000000000000011, 0.47005071531687648),\n", " (0.60000000000000009, 0.80190245383822212),\n", " (0.70000000000000018, 0.97478054418804061),\n", " (0.80000000000000027, 0.99964394210947927)]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAecAAAFVCAYAAADVDycqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl0lHWC7vGntqyVhSUBAiQQkB3CpoAaUTaloVtpEKIt\n9jhee7rP6Zme7nbO8Y/bjP7hEaev956eFme6nWnvtKPgpWm3tIpgUBRBgRAg7EtICCFkJUtlqeV9\n7x+BtIgWEFJ5a/l+zuFA1VtJPf5M1ZPfW+/7e22maZoCAABhw251AAAAcCXKGQCAMEM5AwAQZihn\nAADCDOUMAECYoZwBAAgz11XO+/fv1+rVq6+6v6ioSCtWrFBBQYE2btzY6+EAAIhFzms94OWXX9Y7\n77yj5OTkK+73+Xxau3atNm3apISEBD300EOaN2+eBgwYELKwAADEgmvOnHNycvTiiy/q62uVnDp1\nStnZ2UpJSZHL5dKMGTO0e/fukAUFACBWXLOcFy1aJIfDcdX9ra2tSklJ6b6dnJyslpaW3k0HAEAM\nuuZu7W+TkpIij8fTfdvj8SgtLS3o15imKZvN1tOnBICY0Ozx6uyFFp290KLqeo9qL7artrFddU3t\nqm/qkGHc+KrLdpvkdNjldNrlsNvldNjksNtkd9jlsNlkt0s2m012u032S+/TNlvXfTabZLt0h+3S\n/ZLUdUvSdbytx+o7v91u0/M/zb/hr+txOefm5qq8vFxNTU1KTEzU7t279fjjjwf9GpvNptpaZteh\nlpGRwjiHGGMcerEyxk2tnTpV1axT55pUdr5ZVXUeNbf5rnqczSb1S4nXyCEpSk2KU0pSnFKSXEpJ\ndCkpwaXEeKcS4x1KjHcq3uVQnMuuOKdDLqddca6uQv66WBnjSHTd5Xx5xltYWKi2tjatXLlSTz31\nlB5//HEZhqEVK1YoMzMzZEEBIBq0dfhUWtagg6fqdezsRdU1dXRvs0kamJ6gKUNSlTUwWVkDkpXZ\nL1ED0xKU5o77xoJFdLL19VWp+C0t9PhtOPQY49CLpjFuaO7Ql0dqtP9knU5UNsm49LbrTnQpNytV\no4amaXRWqkYMSVVifI93aN6waBrjcJaRkXLtB31N3/0UAEAM8fkDKj5ep88OntfhsgaZ6poZjxiS\nqrzRA5Q3aqCGD3J3f74LfBXlDAC9qL6pQ5u/rNDnpdVq6/RLkkYPTdMdkwdr2i0ZSk2OszghIgHl\nDAC9oLqhTe/tLNfOQ9UKGKbS3HFaPC1bd04eoiEDkq/9DYCvoJwB4CZcaGjTn7ef1p6jNTIlDe6f\npCVzcjRrwiA5HRzAhZ6hnAGgBzq9ARXuPKPNX1bIHzCVnenW0ttHaPqYDNntfI6Mm0M5A8ANME1T\nu4/W6I2ik2ps6VT/1HitmneLZo7NYJEl9BrKGQCuU2NLp/7wl8M6dKZRTodNS28foSWzcxQfd/US\nx8DNoJwB4DoUH6/VK+8dkafDr8m5A/Twwls0qF+S1bEQpShnAAii0xfQGx+d0MclVXI57Vp971jd\nPTWLXdgIKcoZAL5FVZ1H6948qPP1bRqWkay/u3+Shg7ktCiEHuUMAN/g0JkGvfRmqdo7/VowY5ge\nvGeUXE4+W0bfoJwB4Gu276/Sq5uPyWaTnvjuBM2ZONjqSIgxlDMAXGKYpjZ9ckrv76qQO9Gln35/\nssYMT7c6FmIQ5QwAkvwBQy+/e1i7j9ZoUL9E/ePKPI7GhmUoZwAxzx8w9O9vH1Lx8VqNGZamny6f\nIneiy+pYiGGUM4CY5g8Y+t2lYh6Xna6frchjURFYjlXZAcSsy8W8l2JGmKGcAcQkf8DQ797pKuax\nwylmhBfKGUDMMU1T//X+Ue091lXM//ggxYzwQjkDiDnv7jijHaXVGjE4RT97cArFjLBDOQOIKTtL\nq/XWZ2UakJqgn62YooQ4jotF+KGcAcSMo+WN+sN7R5QY79Q/rsxTmjve6kjAN6KcAcSEqjqPXvzz\nQUnST5dxAQuEN8oZQNTzdPj0mz/tV1unX3+zeJzGj+hvdSQgKMoZQFQzTVP/WXhEtRc7tGROju6Y\nPMTqSMA1Uc4AotoHX1ao5GSdxuf007L8XKvjANeFcgYQtY6fvahNH59WmjtOP/reRNntNqsjAdeF\ncgYQlZo8Xv3b26WSpJ/cP0lpyXEWJwKuH+UMIOoYhqnfv3NITa1eLZ+byzWZEXEoZwBR5y87z+hI\neaOmjh6oe2dlWx0HuGGUM4CoUl7dond2nFG/lHj97ZLxstv4nBmRh3IGEDV8fkP/UXhYAcPUY4vH\nyZ3osjoS0COUM4Co8dZnp3WuzqO7pw3VpNwBVscBeoxyBhAVTlY26YMvKpSRnqCV94yyOg5wUyhn\nABGv0xvQf/zlsGRKjy+ZwJWmEPEoZwARb+PHJ1XT2K57Z2Vz2hSiAuUMIKKdrGxSUfE5ZQ1M1rL8\nkVbHAXoF5QwgYvkDhv5r81FJ0t8sHieX02FxIqB3UM4AItaWPWd1rtajuVOzNHpomtVxgF5DOQOI\nSHVN7Xr7szKlJLm0fC5HZyO6UM4AIo5pmnp9ywl5fYZWzRvNYiOIOpQzgIiz70SdSk7WaVx2uuZM\nHGx1HKDXUc4AIkqH16/XthyXw27T6nvHysba2YhClDOAiPLujjNqbOnU4tk5GjIg2eo4QEhQzgAi\nRs3Fdm3Zc1b9U+O1dE6O1XGAkKGcAUSMP207KX/A1Iq7RynOxTnNiF6UM4CIcPzsRe05VqtRWama\nNX6Q1XGAkKKcAYQ9wzS1/qMTkqSC+bdwEBiiHuUMIOztLK1WeXWLZk8YpFGsBIYYQDkDCGud3oA2\nfXJKLqedlcAQMyhnAGHt/S/KdbHVq3tvy9aAtASr4wB9gnIGELYutnbqgy8qlOaO03dmZ1sdB+gz\nlDOAsPXu52fk9Ru6/86RSohzWh0H6DOUM4CwVF3v0faSKmX2S9Sdk4dYHQfoU0HL2TAMrVmzRgUF\nBVq9erUqKiqu2L5lyxYtX75cK1as0Pr160MaFEBseX3zUQUMU8vyc+V0MI9AbAm6n2jr1q3y+Xza\nsGGD9u/fr7Vr1+qll17q3v7cc8/prbfeUmJiopYsWaKlS5cqJSUl5KEBRLfK2lZ9XFyp4Zlu3To+\n0+o4QJ8LWs7FxcXKz8+XJOXl5am0tPSK7S6XS83NzbLb7TJNk4UBAPSKN7eflmlK378rV3beVxCD\ngpZza2ur3G53922HwyHDMGS3d+1ieuyxx7R8+XIlJiZq0aJFVzz222RkMLPuC4xz6DHGoXGsvEH7\nTtRp/Ij+mj97BL/0hxg/x+EpaDm73W55PJ7u218t5qqqKr322msqKipSYmKi/umf/kkffPCB7rvv\nvqBPWFvb0guxEUxGRgrjHGKMcej859tde+h+uGSC6upaLU4T3fg57hs9+QUo6FEW06dP1/bt2yVJ\nJSUlGjt2bPe2zs5O2e12xcXFyW63q3///mpp4X8ygJ47dKZBR8obNSm3vybmDrA6DmCZoDPnhQsX\naseOHSooKJDUdQBYYWGh2tratHLlSi1btkwFBQWKj49XTk6Oli1b1iehAUQf0zT19qdlkqTld7FM\nJ2KbzTRNsy+fkF0ooceuqtBjjHvf4TMN+l8bSjR19ED9w4opjHEfYIz7Rq/v1gaAvmCapt75rGvW\n/L07R1gbBggDlDMAyx2ruKjjlU2aMmqARgxOtToOYDnKGYDl3tlxadZ8x0iLkwDhgXIGYKljFY06\nWnFRk3L7KzeLWTMgUc4ALPbOjjOSmDUDX0U5A7DMicqLOlLeqIkj+mn00DSr4wBhg3IGYJnuWfOd\nzJqBr6KcAVii7HyzDpU1aFx2um4Zlm51HCCsUM4ALPHernJJ0tLbR1gbBAhDlDOAPne+3qPiY7Ua\nMThF43P6WR0HCDuUM4A+98EXFTIlfWd2DpeEBL4B5QygTzW2dOrz0moN6p+k6WMyrI4DhCXKGUCf\n+nB3hQKGqcWzsmW3M2sGvgnlDKDPtLb79HFJldLdcZozcbDVcYCwRTkD6DPbiivV6Q1o0a3Zcjl5\n+wG+Da8OAH2i0xfQlj2VSop3au7ULKvjAGGNcgbQJz47cF6t7T7NmzFMifFOq+MAYY1yBhByhmHq\nw90VcjrsWjBjmNVxgLBHOQMIuX0nalV7sUN3TB6s1OQ4q+MAYY9yBhBym788K0ladOtwi5MAkYFy\nBhBSp8416eS5JuWNGqAhA5KtjgNEBMoZQEht/rJCknTvbdkWJwEiB+UMIGRqL7Zr7/Fa5QxK0dhs\nLgsJXC/KGUDIbNl9VqYp3XvbcC5wAdwAyhlASHg6fPr0wHn1S4nXzHGZVscBIgrlDCAkPimpUqcv\noIUzh8vp4K0GuBG8YgD0On/A0NY9Z5UQ59BdeSzVCdwoyhlAr9tztEYXW73Kn5KlpASW6gRuFOUM\noNdt2VMpm6T5M1mqE+gJyhlArzp1rkll55s19ZaBykxPtDoOEJEoZwC9asuerqU6F8xkqU6gpyhn\nAL2moblDe47WaliGW+NYdAToMcoZQK8pKj4nwzS1cOYwFh0BbgLlDKBXdPoC+qTknNyJLs2eOMjq\nOEBEo5wB9Ipdh6rl6fDr7mlZcjkdVscBIhrlDOCmmaaprXsq5bDbdM80Tp8CbhblDOCmHS5v1Lk6\nj24dl6l+KfFWxwEiHuUM4KZ9tKdSEqdPAb2FcgZwU2outmv/yTqNHJKq3KxUq+MAUYFyBnBTthVX\nypS0YAafNQO9hXIG0GOdvoA+3X9eqUkurtkM9CLKGUCP7TpUrbZOv+6aOlQuJ28nQG/h1QSgR0zT\n1Ed7L58+NdTqOEBUoZwB9MjxsxdVWevR9DEZnD4F9DLKGUCPfLS36/Sp+RwIBvQ6yhnADWto7lDx\n8ToNz3TrlmFpVscBog7lDOCGbdvXdfWp+TO4+hQQCpQzgBvi8we0fX+VkhOcmj2Bq08BoUA5A7gh\nu4/WqKXNp/y8LMW5uPoUEAqUM4AbUlR8TjaJ06eAEKKcAVy3M9XNOl3VrMmjBigjPdHqOEDUopwB\nXLei4nOSpHnTOX0KCCXKGcB1aW336YvDF5SZnqhJuf2tjgNENWewjYZh6Omnn9bx48flcrn07LPP\nKjs7u3v7gQMH9Pzzz8s0TQ0aNEjPP/+84uLiQh4aQN/77MB5+fyG7p42VHZOnwJCKujMeevWrfL5\nfNqwYYOefPJJrV27tnubaZpas2aN1q5dq9dff11z5sxRZWVlyAMD6HuGaerjfefkctp155QhVscB\nol7QmXNxcbHy8/MlSXl5eSotLe3eVlZWpvT0dL3yyis6ceKE5s6dq9zc3NCmBWCJ0tMNqrnYrjun\nDJE70WV1HCDqBZ05t7a2yu12d992OBwyDEOS1NjYqH379umRRx7RK6+8op07d2rXrl2hTQvAEkXF\nl9bR5kAwoE8EnTm73W55PJ7u24ZhyG7v6vP09HRlZ2d3z5bz8/NVWlqq2bNnB33CjIyUm82M68A4\nh16sjHF1vUcHT9drbE4/zZyc1afPHStjbCXGODwFLefp06dr27ZtWrx4sUpKSjR27NjubcOHD1db\nW5sqKiqUnZ2tvXv3asWKFdd8wtralptPjaAyMlIY5xCLpTHetO2kTFO6a/KQPv1vjqUxtgpj3Dd6\n8gtQ0HJeuHChduzYoYKCAknSc889p8LCQrW1tWnlypV69tln9ctf/lKmaWr69OmaO3duz5IDCEte\nX0Cf7q9SSpJLM8dlWh0HiBlBy9lms+mZZ5654r6RI0d2/3v27NnauHFjaJIBsNzuozXydPi1ZE6O\nXE6WRQD6Cq82AN/q8jrac6f27WfNQKyjnAF8o7LzzSo736y80QM1MI11tIG+RDkD+EbbutfR5upT\nQF+jnAFcpbXdpy+OdK2jPWEk62gDfY1yBnCVHQdZRxuwEuUM4AqGaWpbMetoA1ainAFc4XBZ1zra\ns8YPYh1twCKUM4ArFF06EOweDgQDLEM5A+hW19Su/afqNHJIikYOSbU6DhCzKGcA3T4pqZJpSvO4\n+hRgKcoZgCTJ5ze0fX+VkhOcupV1tAFLUc4AJEl7jtWopc2n/ClZinM5rI4DxDTKGYCkrhXBbJLu\nnsY62oDVKGcAqrjQopPnmjQpd4Ay+yVZHQeIeZQzAE6fAsIM5QzEuLYOn3YdrtbAtARNyR1gdRwA\nopyBmLfjYLW8vkvraNtZRxsIB5QzEMNM01TRvnNyOmysow2EEcoZiGGHyxt1oaFNt44bpNSkOKvj\nALiEcgZiWNHeSknSPA4EA8IK5QzEqPqmDpWcrFPOoBTlZrGONhBOKGcgRn1ccu7SOtpDZbNxIBgQ\nTihnIAZ9dR3tWRMGWR0HwNdQzkAM2nOUdbSBcEY5AzGoqLiyax1tDgQDwhLlDMSY8uoWnapq1uRR\nA5SZnmh1HADfgHIGYsxHxZdPnxpmcRIA34ZyBmJIa7tPXxy+oMz0RE3K7W91HADfgnIGYshnB87L\n57+0jjanTwFhi3IGYoRhmtq2r1JxTjvraANhjnIGYsSBU/WqvdihWRMGyZ3osjoOgCAoZyBGfHRp\nHe35MzgQDAh3lDMQA87Xe3SorEFjhqUpe1CK1XEAXAPlDMSAor3nJEnzmDUDEYFyBqJce6dfn5We\nV7+UeE0fk2F1HADXgXIGotyOg+fV6Q3o7qlZcjp4yQORgFcqEMUM09RHxefkdNg0dyrraAORgnIG\notjhMw260NCmW8cNUmpynNVxAFwnyhmIYh/t6Tp9asFMDgQDIgnlDESpmovtOnCqXrlZqRo5JNXq\nOABuAOUMRKmivZUyxaIjQCSinIEo1N7p16cHqpSWHKdbx2VaHQfADaKcgSj0eWm12jsDumfaUE6f\nAiIQr1ogyhimqa17zsrpsOnuaZw+BUQiyhmIMgdP1etCY7tmTeD0KSBSUc5AlNm656wkaeHM4RYn\nAdBTlDMQRc7VturQmUaNHZ7O1aeACEY5A1Fk697Li44wawYiGeUMRInWdp92llZrYFqCpt0y0Oo4\nAG4C5QxEie37q+T1G5o/Y5jsdpvVcQDcBMoZiAL+gKGP9lYq3uVQ/pQhVscBcJMoZyAK7D1Wq8aW\nTt0xebCSElxWxwFwkyhnIMKZpqnNX1bIJmnhrRwIBkSDoOVsGIbWrFmjgoICrV69WhUVFd/4uF/9\n6ld64YUXQhIQQHDHz17UmeoWTRuToUH9kqyOA6AXBC3nrVu3yufzacOGDXryySe1du3aqx6zYcMG\nnThxQjYbB6AAVvhwd9eiI/fexqwZiBZBy7m4uFj5+fmSpLy8PJWWll61/cCBA1q1apVM0wxdSgDf\n6EJDm0pO1GnkkFSNHppmdRwAvSRoObe2tsrtdnffdjgcMgxDklRTU6N169ZpzZo1FDNgkQ/3nJWp\nrlkze6+A6OEMttHtdsvj8XTfNgxDdntXn2/evFmNjY164oknVFdXp46ODo0aNUoPPPBA0CfMyGBJ\nwb7AOIee1WPc7PFqx8FqZfZL1H135MoRhZeGtHqMYwFjHJ6ClvP06dO1bds2LV68WCUlJRo7dmz3\nttWrV2v16tWSpDfffFOnT5++ZjFLUm1ty01GxrVkZKQwziEWDmP87udn5PUFNG/aUDU0eK79BREm\nHMY42jHGfaMnvwAFLeeFCxdqx44dKigokCQ999xzKiwsVFtbm1auXHnFY9mlBvQdn99Q0d5KJcY7\nlJ+XZXUcAL0saDnbbDY988wzV9w3cuTIqx63bNmy3k0FIKgvDl9Qk8er+27LVmJ80JcxgAgUfR9S\nAVHOuLToiMNu04KZw6yOAyAEKGcgwhw4Va9zdR7dNn6Q+qcmWB0HQAhQzkCEeX9XuSRp8axsi5MA\nCBXKGYggJyubdKKySVNGDdCwTPe1vwBARKKcgQjy/hfMmoFYQDkDEaKqzqN9J+qUm5WqMcPTrY4D\nIIQoZyBCfPBF11XhFs/KYV0BIMpRzkAEaGzp1M5D1RrUP0nTxgy0Og6AEKOcgQiwZfdZBQxTi2dl\ny86sGYh6lDMQ5to6fPq45JzS3HGaM3Gw1XEA9AHKGQhzW/dWqsMb0KJbh8vl5CULxAJe6UAY6/D6\ntWX3WSUnOHX31KFWxwHQRyhnIIx9vK9Kng6/Fs4czgUugBhCOQNhyusL6IMvK5QQ59B8LnABxBTK\nGQhTnx44r2aPV/NnDFNygsvqOAD6EOUMhCF/wNB7u8oV57Rr4a3DrY4DoI9RzkAY+ry0Wo0tnZo7\ndahSk+KsjgOgj1HOQJgJGIbe21kup8Om+7jABRCTKGcgzHx5pEY1F9t155Qs9UuJtzoOAAtQzkAY\nMQxT7+44I4fdxmUhgRhGOQNh5IvDF1Td0KY7Jg9WRnqi1XEAWIRyBsJEwDD09o4yOew2LZ0zwuo4\nACxEOQNhYmfpBdU0tis/L0sDmTUDMY1yBsKAP2Do3c/L5HTYtHROjtVxAFiMcgbCwOel1aq92KG7\n8rLUPzXB6jgALEY5AxbzBwwVfn5GToddS/isGYAoZ8Bynx08r7qmDt09jfOaAXShnAEL+fxds2aX\n067vzOazZgBdKGfAQh/vO6eG5k7dM22o0t3MmgF0oZwBi7R3+vXu52eUGO/QEo7QBvAVlDNgkc1f\nVqi13af7bstWCleeAvAVlDNggWaPV5t3n1VqchzXawZwFcoZsEDh52fU6Q3ou7ePUEKc0+o4AMIM\n5Qz0sdqL7dq275wy0hM0d2qW1XEAhCHKGehjb316WgHD1LL8XDkdvAQBXI13BqAPna1p1a5DF5Sd\n6dZtEwZZHQdAmKKcgT608eOTMiUtv3uU7Dab1XEAhCnKGegjB0/Xq/R0g8bn9NOkkf2tjgMgjFHO\nQB8IGIbeKDopm00qmH+LbMyaAQRBOQN94JOSKlXVeZQ/JUvDM91WxwEQ5ihnIMTaOnx669MyJcQ5\ntOyuXKvjAIgAlDMQYoWfl6u13aclc3KUlswynQCujXIGQqimsU1b9pzVwLQELWKZTgDXiXIGQmjj\ntlMKGKZW3D1KLqfD6jgAIgTlDITIkTMN2nu8VqOHpenWcZlWxwEQQShnIAT8AUP/veW4bJIeXsCp\nUwBuDOUMhMCW3Wd1vr5Nd08fqhGDU62OAyDCUM5AL2to7tA7O84oJcml73PqFIAeoJyBXrbhoxPq\n9AX04N2jlZzgsjoOgAhEOQO9qLSsXnuO1Wr00DTdPnmw1XEARCjKGeglPr+h1z48LptNemTRGK46\nBaDHKGegl3zwZYUuNLZr/vRhyh6UYnUcABGMcgZ6wfl6j97dcUZpyXF6IH+k1XEARDjKGbhJhmnq\n/75/VP6AoUcWjVESB4EBuEnOYBsNw9DTTz+t48ePy+Vy6dlnn1V2dnb39sLCQv3xj3+Uw+HQmDFj\n9PTTT7PYAmLOx/vO6URlk2aMzdCMsawEBuDmBZ05b926VT6fTxs2bNCTTz6ptWvXdm/r6OjQb37z\nG7366qtav369WltbtW3btpAHBsJJfVOHNn58SknxTj2ycIzVcQBEiaDlXFxcrPz8fElSXl6eSktL\nu7fFx8frjTfeUHx8vCTJ7/crISEhhFGB8GKapv64+Zg6vQGtmj9aae54qyMBiBJBd2u3trbK7XZ3\n33Y4HDIMQ3a7XTabTf3795ckvfrqq2pvb9ftt99+zSfMyOAo1r7AOIfe4bNNOni6XlNvydCyeWP4\nSCcE+DkOPcY4PAUtZ7fbLY/H0337cjF/9favf/1rlZeX67e//e11PWFtbUsPo+J6ZWSkMM4h5kqI\n0+/ePKg4l10PzR+turpWqyNFHX6OQ48x7hs9+QUo6G7t6dOna/v27ZKkkpISjR079orta9askdfr\n1bp167p3bwPRzjRNvbixRK3tPi2/a5Qy0hOtjgQgygSdOS9cuFA7duxQQUGBJOm5555TYWGh2tra\nNGnSJG3atEkzZ87Uo48+Kkn64Q9/qAULFoQ+NWChTw+c1xeHqjU+p5/mzxxmdRwAUShoOdtsNj3z\nzDNX3Ddy5F8XWDhy5EhoUgFhqqaxTeu3nlByglOPLxnPEp0AQoJFSIDrFDAMvVx4WJ2+gH68PE/9\nUzk7AUBoUM7AdXp/V4VOnWvWbeMzNXfaUKvjAIhilDNwHcqrW/T2Z2XqlxKvRxaN5bQpACFFOQPX\n0N7p17+/c0gBw9Tffme83ImsnQ0gtChnIIjLq4BdaGjTfbdla+LI/lZHAhADKGcgiE9KqvTF4Qsa\nPTRN35+ba3UcADGCcga+RXl1i17fekLuRJd+fP9EOR28XAD0Dd5tgG/Q3unXv71dKn/A0P9YOoHT\npgD0KcoZ+BrTNPXK+0dV09iu78zO0ZRRA6yOBCDGUM7A12z+8qz2HK3RmGFpWnbXyGt/AQD0MsoZ\n+IoDp+q18eOTSnfH6ccPTJLDzksEQN/jnQe45Hy9R797p1ROh11/v3yK0t1caQ2ANShnQJKnw6d/\n/dMBtXcG9NjicRo5JNXqSABiGOWMmBcwDP37W6W60NiuxbOzNXviYKsjAYhxlDNimmmaeuOjkzp0\nplFTRg3Q8rtGWR0JAChnxLYPvqzQ1r2VyhqYrB99d6Lsdi5oAcB6lDNi1s7Sam3cdkr9UuL1i5V5\nSkpwWh0JACRRzohRh8oa9If3jigp3qmfr8xjBTAAYYVyRswpr27Ri28elM1m098vn6xhGW6rIwHA\nFShnxJQLDW36Pxv3y+sN6O++N0Fjs/tZHQkArkI5I2ZcaGzTv6zfp2aPVz9YNEYzxmZaHQkAvhHl\njJhQc7Fd//L6PjW2dKpg3mjNmz7M6kgA8K0oZ0S9uovt+vXrxWps6dSD94zSotuyrY4EAEFRzohq\n9U0d+pf1+1Tf3Knlc3O1eFaO1ZEA4Jo4sRNRq7qhTS9sKFF9c4ceyB+pJXNGWB0JAK4L5YyoVF7d\nov/9/0rU0ubT9+/K1dLbR1gdCQCuG+WMqHOkvFG/3XRAnd6AHr13rO6eNtTqSABwQyhnRJW9x2r1\nu3dKZZrSjx+YpFvHcboUgMhDOSMqmKaprXsqtaHohOKcDv10+WRNHNHf6lgA0COUMyKeP2Dovz88\npu37zytaaFgpAAAJ3klEQVQ1OU4/WzFFI4ekWh0LAHqMckZEa2nzat2bpTp+9qKyB7n1D8uncBEL\nABGPckbEqqxt1b/+6YDqmjo0c2yGHl8yQfFxDqtjAcBNo5wRcUzT1GcHzuu1Lcfl9Ru6/86R+u4d\nI2S32ayOBgC9gnJGRGnv9OvVD49p16ELSop36onvTtSMsRlWxwKAXkU5I2JUXGjRv719SBca2pSb\nlaoff2+iBqYnWh0LAHod5YywFzAMffjlWb35aZn8AUP3zcrW9+/KldPB0vAAohPljLB2rs6jP/zl\niMrONys1OU6PLR6nvNEDrY4FACFFOSMsBQxDH3xRobc/K5M/YGrOxEF6aMEYuRNdVkcDgJCjnBF2\njp+9qNe2HNfZmlalueP06L1jNe0WDvoCEDsoZ4SNxpZObdx2UrsOX5Ak3Tl5iFbNH63kBGbLAGIL\n5QzLeX0BbdlzVoWfl6vTF9CIwSn6wcIxGjU0zepoAGAJyhmW8QcMfbq/Su9+fkYXW71KSXLpoQW3\n6M4pQ1hQBEBMo5zR5wzD1M5D1Xr7szLVNXUozmXXd2bn6Duzs5XELmwAoJzRdzp9AX1+8Lw27z6r\nmsZ2OR02LZg5TEvmjFBacpzV8QAgbFDOCLlmj1dFxZUqKj6n1nafnA675k7N0ndvH8EVpADgG1DO\nCAnTNHWiskmflJzT7qO18gcMJSc4tfT2EZo/YxgzZQAIgnJGr2pu82pnabW276/S+fo2SdKgfola\nMHO47pw8hEs6AsB1oJxx09o6fCo+Xqcvj1zQ4TONMkxTTodNsyYM0ty8LI3NTpeNo68B4LpRzuiR\nJo9XB07Vad/xOpWW1csfMCVJI4ekaNaEwZozcZBSkth1DQA9QTnjugQMQxUXWnXwdL32n6xX2fnm\n7m3DMpJ12/hBum18pjL7JVmYEgCiA+WMbxQwDJ2tadXR8os6WtGoE5UX1d4ZkCQ57DaNy05X3uiB\nmjJqgIYMSLY4LQBEF8oZMk1T9U0dOlPdotNVzTpd1aQz1S3y+o3ux2T2S9St4/ppwoh+mjSyP4uF\nAEAIUc4xxDRNNbf5VF3v0fmGNlXWtOpsTasqa1u7Z8WSZLNJQwcmKzcrVWOGp2tcdj/ORwaAPhS0\nnA3D0NNPP63jx4/L5XLp2WefVXZ2dvf2oqIivfTSS3I6nVq+fLkefPDBkAdGcJ2+gKpqW3W8rEF1\nTe2qa+pQfVOHLjS2q7qhTe2d/iseb7NJg/snaXKuW8Mz3crNStOIwSlKjOf3NgCwStB34K1bt8rn\n82nDhg3av3+/1q5dq5deekmS5PP5tHbtWm3atEkJCQl66KGHNG/ePA0YMKBPgscKnz8gT4dfng6/\n2jp88rT71dLmVUu7T80er1rafGr2dKqx1auLLZ1q+1r5Xuaw25TZL1HjstM1eECSBvdP0rAMt4YO\nTFaci3OPASCcBC3n4uJi5efnS5Ly8vJUWlrave3UqVPKzs5WSkqKJGnGjBnavXu37rvvvm/9fp52\nn1rbfTcd2jTNaz/ma//4622z+9+Xv83l72eakilTMiXj8mNNybj0t2maMsyuCzcYZtcf0+jaHggY\nCpimDMNUIGAqYJjyBwz5AoYCgb/+2+f/6x+v31CnNyCvP6BOX0Beb0Ad3oDavf6uvzsD8gf++rlv\nMMkJTvVLidfIrFQNHpgsd7xTA9MSNDAtQQPSEtQvJV4Ou/26vhcAwFpBy7m1tVVut7v7tsPhkGEY\nstvtam1t7S5mSUpOTlZLS0vQJyv4n+/dZNzoZpOUEO9QQpxT7kSXBqYlKineoaQEl5ITnN1/u5Nc\nSk2KU0pSnFKTXEpJjlP8V2a/GRkpqq0N/v8CABC+gpaz2+2Wx+Ppvn25mCUpJSXlim0ej0dpaWlB\nn+zdF+6/may4ARkZKdd+EG4KYxx6jHHoMcbhKeh+zunTp2v79u2SpJKSEo0dO7Z7W25ursrLy9XU\n1CSv16vdu3dr6tSpoU0LAEAMsJlBPsA1TVNPP/20jh07Jkl67rnndOjQIbW1tWnlypXatm2b1q1b\nJ8MwtGLFCj388MN9FhwAgGgVtJwBAEDf4/BdAADCDOUMAECYoZwBAAgzlDMAAGEmJOVsGIbWrFmj\ngoICrV69WhUVFVdsLyoq0ooVK1RQUKCNGzeGIkLUu9YYFxYWauXKlXrooYf0z//8z9e1qhqudK0x\nvuxXv/qVXnjhhT5OFz2uNc4HDhzQD37wAz388MP6+c9/Lq/Xa1HSyHWtMd6yZYuWL1+uFStWaP36\n9RaljA779+/X6tWrr7r/hnvPDIHNmzebTz31lGmapllSUmL+5Cc/6d7m9XrNhQsXms3NzabX6zWX\nL19u1tXVhSJGVAs2xu3t7eaCBQvMjo4O0zRN8xe/+IX50UcfWZIzkgUb48vWr19vrlq1ynzhhRf6\nOl7UCDbOhmGY999/v1lRUWGapmm+8cYb5qlTpyzJGcmu9bN8zz33mE1NTVe8P+PG/f73vzeXLl1q\nrlq16or7e9J7IZk5X++a3C6Xq3tNbtyYYGMcHx+vN954Q/Hx8ZIkv9+vhAQu+Xijgo3x5e0HDhzQ\nqlWr2DNxE4KNc1lZmdLT0/XKK69o9erVam5uVm5urlVRI9a1fpZdLpeam5vV2dkp0zRls9msiBnx\ncnJy9OKLL171ftCT3gtJOX/bmtyXt93omty4WrAxttls6t+/vyTp1VdfVXt7u26//XZLckayYGNc\nU1OjdevWac2aNRTzTQo2zo2Njdq3b58eeeQRvfLKK9q5c6d27dplVdSIFWyMJemxxx7T8uXLtXTp\nUt1zzz1XPBbXb9GiRXI4rr7KX096LyTl3NtrcuNqwcb48u3nn39eO3fu1G9/+1srIka8YGO8efNm\nNTY26oknntDLL7+swsJCvfXWW1ZFjWjBxjk9PV3Z2dnKzc2V0+lUfn7+VbM+XFuwMa6qqtJrr72m\noqIiFRUVqb6+Xh988IFVUaNST3ovJOXMmtyhF2yMJWnNmjXyer1at25d9+5t3JhgY7x69Wr9+c9/\n1quvvqof/ehHWrp0qR544AGroka0YOM8fPhwtbW1dR/AtHfvXt1yyy2W5Ixkwca4s7NTdrtdcXFx\nstvt6t+/P3sze1lPei/oVal6auHChdqxY4cKCgokda3JXVhY2L0m91NPPaXHH3+8e03uzMzMUMSI\nasHGeNKkSdq0aZNmzpypRx99VJL0wx/+UAsWLLAycsS51s/xV/EZXc9da5yfffZZ/fKXv5Rpmpo+\nfbrmzp1rceLIc60xXrZsmQoKChQfH6+cnBwtW7bM4sSR7fL7wc30HmtrAwAQZliEBACAMEM5AwAQ\nZihnAADCDOUMAECYoZwBAAgzlDMAAGGGcgYAIMz8f18AKKl5y6pnAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#Example 3.11\n", "\n", "b = 20\n", "r = 5\n", "s = np.linspace(0, 1, 100)\n", "\n", "def p(s, r, b):\n", " return 1 - (1 - s**r)**b\n", "\n", "plt.plot(s,p(s, r, b))\n", "\n", "s = np.arange(0.2, 0.9, 0.1)\n", "\n", "zip(s, p(s, r, b))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "if we consider two documents with 80% similarity, then their false negative is $1 - 0.99964... \\approx 0.00035$, which means that only roughly 1 in 3000 pairs that are as high as 80% similar will fail to become a candidate pair." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Combining the Techniques\n", "整合前面讲过的方法,找到相似文本的方法如下:\n", "\n", "1. k-shingles \n", " [opt] hash k-shignles to shorter bucket numbers \n", "\n", "2. Sort the document-shigles pairs \n", " 不用生成完成的矩阵,而是依据 shingles 顺序后面用到时逐行生成(节省内存)\n", " \n", "3. Pick n, compute the minhash signatures by feed the sorted list line by line \n", " 逐行遍历\n", " \n", "4. Choose a threshold $t$ \n", " $t \\approx (1/b)^{1/r}$ and $n = b \\times n$ to determine appropriate $b, n$.\n", " \n", "5. Construct candidate pairs using LSH\n", "\n", "6. Examine each candidate pair's signatures, $SIM > t$?\n", "\n", "7. [opt] check document themselve\n", "\n", "`#todo: 找个例子,代码实现`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### exercise 3.4\n", "`#maybe:`\n", "\n", "`#exercise 3.4.3`\n", "\n", "`#exercise 3.4.4`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.5 Distance Measures\n", "\n", "**LSH** 本身的意义: \n", "就是为一个特定的距离 $l$,找到合适的哈希函数 $h$ 来实现 \"S\" 曲线的效果: \n", "即此距离下越近,哈希到同一区块的可能性越高。\n", "\n", "这章介绍距离,为下一章找哈希函数打基础。 \n", "\n", "##### 距离的定义\n", "\n", "定义集合 $X$ 上的度量為一函數距离 $d: X \\times X \\to \\mathbf{R}$ \n", "对于 $\\forall x, y, z \\in X$,应满足:\n", "\n", "1. 正定性 \n", "$d(x, y) \\ge 0$ \n", "$d(x, y) = 0$ if and only if $x = y$\n", "\n", "2. 交换性 \n", "$d(x, y) = d(y, x)$\n", "\n", "3. 三角不等式 \n", "$d(x, y) \\le d(x, z) + d(z, y)$\n", "\n", "\n", "##### 几种距离\n", "1. Euclidean Distance: $\\| \\cdot \\|_2$ \n", " $L_r\\text{-norm} = (\\sum \\| \\cdot \\|_1^r)^{1/r}$ \n", " $L_\\infty\\text{-norm} = max(\\| \\cdot \\|_1)$\n", " \n", "2. Jaccard Distance: $d(x, y) = 1 - SIM(x, y)$\n", "\n", "3. Cosine Distance: \n", " 向量夹角 \n", " $d(x, y) = arc cos\\{ \\frac{x \\cdot y}{\\|x\\|_2 + \\|y\\|_2} \\}$\n", "\n", "4. Edit Distance: \n", " 1. 直接计算 \n", " Given $x = x_1 x_2 ...$, $y = y_1 y_2 ...$. \n", " $d(x, y)$ is the smallest number of insertion and deletion of single char($x \\to y$). \n", " 有[动态规划算法](http://blog.csdn.net/xanxus46/article/details/38678335), \n", " \n", " \\begin{equation} \n", " \\operatorname{lev}_{a,b}(i,j) = \\begin{cases}\n", " \\max(i,j) & \\text{ if} \\min(i,j)=0, \\\\\n", " \\min \\begin{cases}\n", " \\operatorname{lev}_{a,b}(i-1,j) + 1 \\\\\n", " \\operatorname{lev}_{a,b}(i,j-1) + 1 \\\\\n", " \\operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \\neq b_j)}\n", " \\end{cases} &\\text{ otherwise.}\n", " \\end{cases}\n", " \\end{equation} \n", " \n", " 具体内容见维基[Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance)。\n", "\n", " 2. 间接计算 \n", " $d(x, y) = len(x) + len(y) - 2 LCS(x, y)$ \n", " LCS is the longest common subsequence of $x$ and $y$.\n", "\n", "5. Hamming Distance: $d(x, y) = sum( \\, diff(x, y) \\,)$\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### 3.6 The Theory of Locality-Sensitive Function\n", "\n", "对于每种距离定义,需要找到合适的哈希函数使其满足 $P-s$ 是 \"S\" 型曲线(越相似,概率越高)。\n", "\n", "a family of functions $\\{h_i(x)\\}$ should statify 3 conditions:\n", "\n", "1) $P[\\text{close pairs become candidate pairs}] > P[\\text{dissimilar pairs...}]$ \n", " 补充: \"close\" - distance measure, \"P\" 是概率。 这里就说明 \"Prob(x, y) - d(x, y)\" 应该是 ”S\" 型曲线。 \n", " eg: Probility using minhash to Jaccard distance is \"S\" curve." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfYAAAFkCAYAAADSRRn0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XtclGXCN/DfPQdmBmY4H0ROCioeSBTJs6J4TtsyTVEj\nK7fafZ6n3Wc33c3tDdsnW2233j1Yvc+ntrVnrcRK6ykyTQTTPOIBEM+gghwERBxgBpiBud8/UFZT\nQJF77jn8vp+PHx0Ghp+XI7+57rnv6xJEURRBRERELkEhdwAiIiLqOSx2IiIiF8JiJyIiciEsdiIi\nIhfCYiciInIhLHYiIiIXInmx5+XlITU19baPZ2VlYf78+UhJScFnn30mdQwiIiK3oJLywd9//318\n9dVX8PLyuuXjVqsVa9euxebNm6HVarFo0SIkJycjICBAyjhEREQuT9IZe1RUFN5++238eA2coqIi\nREZGwmAwQK1WY8SIEcjJyZEyChERkVuQtNinT58OpVJ528cbGhpgMBjab3t5eaG+vl7KKERERG5B\n0kPxHTEYDDCZTO23TSYTfHx8Ov2a3737A6wtNqmjOaS7WvNXvPHbvz7bJgIQ2z4iim2fI0KEKAI2\nUYQoirDZRNhsQGv7n21oaRXR0vqv3222e191WCEAft5aBPnqEOirQ5CfJ0L8PREZYkBEiAE+eg8I\ngnDPj0tERJ2Tpdijo6NRXFwMo9EInU6HnJwcLFu2rNOv+cO/jUd1NWf1UgsKMtw2zjabCEtLKywt\nNlitNlhaWtFkaUVTcwsaLa1osrTA1NiC+kYLGsxW1JutqDNbUFvfjHOXruF0ce1t38dLq0LvQC/0\n6eWNmDBv9Avzgb+31l5/TVndaYypZ3GMpccxto+gIEPXn/Qjdin2GzOzjIwMmM1mLFiwAC+99BKW\nLVsGm82G+fPnIzg42B5RqBsUCgFaDxW0Hvf+tTabCKPJgpq6JlTXNqK8xoTyK22/CsuMOFdqxI7D\nbZ/rZ9Cgf7gPhsYEIC46AN6e3fiGRERuTnCm3d346lB69nwV3mxtRfHlehSVGVFYZkRReR3qTBYA\ngAAgurc3hsYEYOSgEIT4e9olkz1wpiM9jrH0OMb24bAzdqI70aiVGBDhiwERvgAAURRRfsWE/KIa\n5BVeQWFZHYrK6/DFngvoH+6D8UND8eDAYGg9+LQlIuoIf0KSwxAEAWFBeoQF6TFrdBRMTVbkF9bg\nh+MVOFVci3OlRnyy4xxGDQ7GjJGRCA3w6vpBiYjcDIudHJaXVo0xcb0wJq4XrlxrxN6Cy9h7vAK7\n8yqwJ68CiQODMXtMFCJD7v1QFRGRq2Kxk1MI9NXhkfF98fC4Pjh6phoZ+y8i53QVck5XYWhMAB6b\nGM2CJyICi52cjEIQkDgwGCNig3D8/FVk7L+I/KIaHD9fg8nDw/DohGjodWq5YxIRyYbFTk5JEAQM\njQnA0JgAFJyvwSeZ55B1tAyHTlVh/qQYjB8aCgUXwCEiN8RtW8npxUUH4L+WjcTjk2NgbbXhw29P\n4/V/HkHlVbPc0YiI7I7FTi5BpVRg1qgo/OHZ0Rg1OAQXKurw6voc7Mkvv20TIiIiV8ZiJ5fiZ9Dg\n+Z8MwfM/GQKFQsD6rafx3/97AuYmq9zRiIjsgu+xk0saNTgEMb298d7XJ5Fzugrny4342aNxiOnd\n+WZDRETOjjN2clmBvjr8dslwPDK+L67WN+OPnxzD4dNVcsciIpIUi51cmlKhwCPj++KX8+OhUAh4\n98sCbD1QzPfdichlsdjJLQyNCcDKJQnwM2jw+a4ifPjtabS02uSORUTU41js5DYiQwz4P08mIirE\ngD35FfjLZ3lotrTKHYuIqEex2Mmt+Bk0eGlJAob1C8TJi7X46+csdyJyLSx2cjsaDyX+bW4cRsQG\n4XTJNZY7EbkUFju5JZVSged/MgQjBrDcici1sNjJbamUCjz/CMudiFwLi53c2o1yT7he7u9+WYBW\nG8+WJyLnxWInt6dSKvCzR4ZgSF9/HD9fg092nON17kTktFjsRGgr9397NA7hQV7IPlaG7YcuyR2J\niKhbWOxE1+k0Kvzn4/Hw1Xvg0+xCLj9LRE6JxU50E39vLf7z8XhoPJR4P+MkisqMckciIronLHai\nH4kMMeDnjwxBS6sNf9ucj6t1TXJHIiK6ayx2ojsYGhOIRVP6o95sxX//7wmuK09EToPFTtSBKSPC\nMXJQMArLjPh8V5HccYiI7gqLnagDgiBg6cyB6OXvie9yLuHIGZ5MR0SOj8VO1AmdRoV/mxsHD5UC\n/9h6CpW1ZrkjERF1isVO1IXwID1SZ8SisbkV/++LAlisXHaWiBwXi53oLox7IBQT40NRUtWATdmF\ncschIuoQi53oLi2eOgC9A72QfbQMJy5clTsOEdEdsdiJ7pKHWomfzhkEpULAP7aegrnJKnckIqLb\nsNiJ7kGfXt6YM7YPauubsTHznNxxiIhuw2Inukezx0QhqpcBewsu49jZarnjEBHdgsVOdI9USgV+\nOmcwVEoF/mfbadSZLXJHIiJqx2In6oawQC88NjEadWYrNmw/w/3bichhsNiJumn6gxEYEO6DI2eq\ncezcFbnjEBEBYLETdZtCIWDprIFQKQV8vOMsmiwtckciImKxE92P0AAvzBoVhdr6Zny554LccYiI\nWOxE92v2mCgE++qQebgUJZX1cschIjfHYie6Tx5qJZ6YPgA2UcQ/t5+BjSfSEZGMWOxEPSAuOgAj\nBwXjfHkddueWyx2HiNwYi52oh6RM6Q+dRonPdxXBaOK17UQkDxY7UQ/x1Wvw2MQYmJtb8Dl3gCMi\nmbDYiXrQ5OFhiAzWY2/BZVy8XCd3HCJyQyx2oh6kUAhYOKU/ACA98xxXpCMiu2OxE/WwQVF+GN4/\nEGdLjThyhpvEEJF9sdiJJLBgcj8oFQI+zS6EtaVV7jhE5EZY7EQSCPH3xJQR4bhibELm4VK54xCR\nG2GxE0nk4XF9oNep8fW+i6jj5W9EZCcsdiKJeGnVeGR8XzRZWvHlnvNyxyEiN8FiJ5JQ0rDeCA3w\nxPd55Si7YpI7DhG5ARY7kYRUSgXmT4qBKAJf7uasnYikx2InktiwfoGI6e2NI2ercaGCi9YQkbQk\nK3abzYa0tDSkpKQgNTUVJSUlt9y/Y8cOzJs3D/Pnz8fGjRulikEkO0EQ8FhSDABgy/dFMqchIlcn\nWbFnZmbCarUiPT0dy5cvx9q1a2+5f82aNVi/fj02btyI9evXo76e+1iT6xoU5Ychffxw4mIt8gu5\naA0RSUeyYj969CgmTJgAAIiPj0dBQcEt96vVatTV1aG5uRmiKEIQBKmiEDmEG7P2f35zikvNEpFk\nVFI9cENDA/R6ffttpVIJm80GhaLttcTTTz+NefPmQafTYfr06bd8bkeCggxSxaWbcJylERRkwNih\nZdiXX4ELVSaMiguVO5JL4/NYehxjxyRZsev1ephM/7q85+ZSLy8vx8cff4ysrCzodDqsWLEC27Zt\nw8yZMzt9zOpqHq6XWlCQgeMsoYdGRuLA8QqszziBPkFeUCh4pEoKfB5Lj2NsH9158STZofiEhATs\n3r0bAJCbm4vY2Nj2+5qbm6FQKODh4QGFQgF/f3++x05uoXegFyYnRqCs2oSDJyvljkNELkiyGfu0\nadOwd+9epKSkAGg7WS4jIwNmsxkLFizA3LlzkZKSAo1Gg6ioKMydO1eqKEQOZfH0gdh1pBRf7b2A\nkYODoVTwqlMi6jmC6ERn8fCwj/R4eE16QUEGvLkhB9/nluPZhwdjzJBeckdyOXweS49jbB8OdSie\niDo2e3QUlAoBGfsuwmZzmtfWROQEWOxEMgj01WFsXC9U1JiRc7pK7jhE5EJY7EQymT22DxSCgK/3\nXYTNed4RIyIHx2Inkkmwrw5j4kJQfsWEI2e4Gh0R9QwWO5GM5oztA0EAvt57gbN2IuoRLHYiGYX4\neWL04F4orTbh2NkrcschIhfAYieS2ZyxUe2zdie6+pSIHBSLnUhmoQFeGDUoBCVVDcgvqpE7DhE5\nORY7kQN4aHQUAGDrgWKZkxCRs2OxEzmA8GA9hsYE4FypEWcvXZM7DhE5MRY7kYO4MWv/lrN2IroP\nLHYiBzEgwhf9wn2QV1SD0qoGueMQkZNisRM5kPZZ+0HO2omoe1jsRA5kaEwAwoK8cPBkFa5ca5Q7\nDhE5IRY7kQNRCAIeGhUFmyhi+6FLcschIifEYidyMA8OCkaAtxa788tRZ7LIHYeInAyLncjBqJQK\nzBwVCWuLDZlHOGsnonvDYidyQOOHhkKvUyP7aBmaLa1yxyEiJ8JiJ3JAGrUSk4eHwdTUgr0FFXLH\nISInwmInclDJI8KhUirwXc4l2GzcHIaI7g6LnchB+Xh5YMyQEFTVNiK3kFu6EtHdYbETObDpIyMB\nANsPlcichIicBYudyIGFBXrhgei2zWHOl9fJHYeInACLncjBzRgZAYCzdiK6Oyx2Igc3KMoPEcF6\nHD7DZWaJqGssdiIHJwgCZoyMgCgCOw6Xyh2HiBwci53ICYwcFAJfvQd255fD3GSVOw4ROTAWO5ET\nUCkVmDIiHM2WVuzJ54I1RNQxFjuRk0gaFgYPlQI7j5RywRoi6hCLnchJ6HVqjInrhSvGJi5YQ0Qd\nYrETOZGpI8IBADtyuOsbEd0Zi53IiYQF6TGkjx/OXLqGksp6ueMQkQNisRM5mamJbQvWZPLSNyK6\nAxY7kZN5ICYAIX46HDhZiTqTRe44RORgWOxETkYhCJiaGIGWVht25ZbJHYeIHAyLncgJjY3rBZ1G\nieyjZWhptckdh4gcCIudyAnpNCpMGNobRpMFOaer5I5DRA6ExU7kpKaMCIcAnkRHRLdisRM5qSBf\nHeL7BeJCRR33aieidix2Iic2JbFtwZqdR7hgDRG1YbETObHBUX4IDfBEzukqGHnpGxGBxU7k1ARB\nQHJCOFpaRezmpW9EBBY7kdMbG9cLWg8lso/x0jciYrETOT2dRoXxD4TiWoMFx85x1zcid8diJ3IB\nydd3fdt5mCfREbk7FjuRC+jl74m4vv44W2rkrm9Ebo7FTuQiptyYtR/hgjVE7ozFTuQiHogJQJCv\nFgdOVqKh0Sp3HCKSCYudyEUorl/6Zm2x4Yf8CrnjEJFMWOxELmT80FB4qBTIPlYKmyjKHYeIZMBi\nJ3IhXlo1Rg0OQfW1JhScr5E7DhHJgMVO5GKSE9pOoss6ypXoiNwRi53IxUT1MiAmzBvHi2pQda1R\n7jhEZGeSFbvNZkNaWhpSUlKQmpqKkpKSW+7Pz8/HkiVLsHjxYvzqV7+CxcINLIh6SnJCOEQAuzhr\nJ3I7khV7ZmYmrFYr0tPTsXz5cqxdu7b9PlEUkZaWhrVr1+KTTz7BmDFjUFrKa2+JekpibDAMnmrs\nyS+HxdoqdxwisiPJiv3o0aOYMGECACA+Ph4FBQXt9124cAG+vr5Yv349UlNTUVdXh+joaKmiELkd\ntUqBifG9YWpqwcFTlXLHISI7kqzYGxoaoNfr228rlUrYbG07T9XW1uLYsWN44oknsH79euzfvx8H\nDhyQKgqRW5o0LAyC0HYSnchL34jchkqqB9br9TCZTO23bTYbFIq21xG+vr6IjIxsn6VPmDABBQUF\nGD16dKePGRRkkCou3YTjLD17jHFQkAEjB/fCwROXUdvYgtgof8m/pyPh81h6HGPHJFmxJyQkIDs7\nG7NmzUJubi5iY2Pb74uIiIDZbEZJSQkiIyNx5MgRzJ8/v8vHrK7m5hZSCwoycJwlZs8xHh/XVuyb\nd57Dsw8Ptsv3dAR8HkuPY2wf3XnxJFmxT5s2DXv37kVKSgoAYM2aNcjIyIDZbMaCBQvw+uuv48UX\nX4QoikhISEBSUpJUUYjc1uA+fgjx90TO6UqkTOkHg6eH3JGISGKC6ERvvvHVofT4Klx69h7jHTmX\nsHHnOTw+KQazRkfZ7fvKic9j6XGM7aM7M3YuUEPk4sY90AseagWyj5XBZnOa1/FE1E0sdiIX56lV\nY/TgXrhibMJxrh9P5PJY7ERuIDkhDACQfYwr0RG5OhY7kRuIDOH68UTugsVO5Cba14/nrJ3IpbHY\nidxE+/rxeVw/nsiVsdiJ3MTN68fnnK6SOw4RSYTFTuRGkob1hgCeREfkyros9q1bt8JqtdojCxFJ\nLNBHh/h+gThfXoeLl+vkjkNEEuiy2Pfs2YPp06fj97//PfLz8+2RiYgkdOPSt6yjnLUTuaIui33N\nmjX45ptvEB8fj3Xr1uGxxx7DBx98gJoaLnRB5IwG9/VHsK8OB09WoqGRR+OIXM1dvcfu6emJsLAw\nhIaGor6+HmfOnMHSpUuxYcMGqfMRUQ9TCAImDQ+DtcWGvccr5I5DRD2sy93d/vznPyMjIwNhYWGY\nN28eXn75ZWg0GjQ0NGDKlClITU21R04i6kHjh4biiz3nkX20DNMejIBCEOSOREQ9pMtiVygU+PDD\nDxEREXHLx/V6Pd5//33JghGRdPQ6NUYNCsEPxytw8sJVxEUHyB2JiHpIl4fiz507d1upL126FAAw\ndOhQaVIRkeQm8yQ6IpfU4Yz93//933Hq1ClUVVUhOTm5/eOtra0IDQ21Szgikk7fUG/0DfVGXuEV\nXLnWiEBfndyRiKgHdFjsb7zxBq5du4bVq1fjlVdegSi27eOsUqkQGBhot4BEJJ3khDB88E0dduWW\nY/6kGLnjEFEP6PBQfHFxMcLDw/HMM8+gvLwcFRUVqKiowKVLl3Ds2DF7ZiQiiTw4MBheWhV255XD\n2mKTOw4R9YAOZ+wbN27E6tWrsW7dujvez0vdiJyfh1qJCfG9se1gCQ6frsKYuF5yRyKi+9Rhsa9e\nvRoAC5zI1U0aHobtB0uQdbSUxU7kAjos9s6uTxcEAf/85z8lCURE9hXsq8MDMQHIL6pB8eV6RPUy\nyB2JiO5Dh8X+H//xHx1+kcDFLIhcSnJCGPKLarDzaCmeeWiQ3HGI6D50ePKcl5cXRo0aBUEQoFAo\n2n8JgsBiJ3IxcdEBCPLVcv14IhfQ4Yw9PT2dJ88RuQmFIGDy8HB8ml2IH/IrMHNUpNyRiKibBPHG\nBepOoLq6Xu4ILi8oyMBxlpijjnFDoxUvvrMXvnoPrHl+jFOvH++oY+xKOMb2ERR07+e8dLmkbGVl\nJX75y19i1KhRGDt2LJYvX46rV692KyAROS69To1Rg0NQfa0JBee5LTORs+qy2H/3u99h5MiR2Llz\nJ7Zv3464uDisXLnSHtmIyM6mJIQD4PrxRM6sy2KvqanBkiVLoNfrYTAY8NRTT+Hy5cv2yEZEdhbV\ny4CYMG8cL6pBVa1Z7jhE1A1dFvugQYOwY8eO9tt79+5FbGyspKGISD7JCeEQAWQf46ydyBl1eFb8\n8OHDAQCiKOKLL76At7c3lEolamtrERDAvZuJXFVibDA27TyHH/Ir8OiEaGjUSrkjEdE96LDYudEL\nkXtSqxSYOKw3MvYV4+DJSkyM7y13JCK6B10eijebzfjjH/+Ixx57DD/5yU/whz/8AWYz33sjcmWT\nhoVBIQjYeaQUTnRFLBHhLor9tddeQ1NTE/7whz/gjTfegNVqxapVq+yRjYhk4u+tRUJsEC5VNeBc\nqVHuOER0Dzo8FH9DQUEBvv766/bbq1atwqxZsyQNRUTym5IQhsOnq5B5pBQDInzljkNEd6nLGTsA\nGI3GW/6sUnX5eoCInNyACF+EB+lx9Ew1rtY1yR2HiO5Slw391FNP4fHHH0dycjJEUURWVhaee+45\ne2QjIhkJgoCpieH48NvT2JVbhscmxsgdiYjuQpfFPnnyZMTFxSEnJweiKOLtt9/mdexEbmLU4BB8\nll2I73PL8fDYPlCreOkbkaPrstgXL16Mbdu2scyJ3JBGrcSE+N7YdrAEh05VYdwDoXJHIqIu3NXK\nc19++SXOnz+P8vLy9l9E5B6Sh4dBEMBL34icRJcz9ry8POTl5d328aysLEkCEZFjCfTVYVi/QBw7\ndwXnK+oQ09tH7khE1IkOi72yshKvvfYaPD09kZCQgBdffBE+PvwPTeSOpowIx7FzV7DzcClifsKf\nA0SOrMND8StXrkR0dDRWrFgBi8WCtWvX2jMXETmQQVF+CAv0Qs7pKtTWN8sdh4g60WGxV1VV4de/\n/jWSkpLw2muv3fFwPBG5B0EQMCUxHK02kbu+ETm4DotdrVbf8mcPDw+7BCIixzRmSC94aVXYdawM\n1pZWueMQUQc6LHae/UpEN9OolUgaFoaGRisOnKiUOw4RdaDDk+cKCwuRnJzcfruqqqr9tiAI2Llz\np/TpiMihJCeEYdvBEuw4XIrxQ0MhCILckYjoRzos9m3bttkzBxE5AX9vLRIHBuHQqSqcLrmGQVF+\nckcioh/psNjDw8PtmYOInMTUxAgcOlWFzMOXWOxEDuiudncjIrohprc3+oZ6I/fcFVTVmuWOQ0Q/\nwmInonsiCAKmJYZDBLDzCC99I3I0LHYiumeJA4Pho/fAnvxyNDa3yB2HiG7CYieie6ZSKpCcEI4m\nSyv25FfIHYeIbsJiJ6JumTw8DB4qBXbkXEKrzSZ3HCK6TrJit9lsSEtLQ0pKClJTU1FSUnLHz3vl\nlVfw1ltvSRWDiCSi16kx7oFQ1NQ14ciZarnjENF1khV7ZmYmrFYr0tPTsXz58jtuIpOeno5z585x\nkQsiJzXtwQgIALYfusTVKokchGTFfvToUUyYMAEAEB8fj4KCgtvuz8/Px8KFC/kDgchJ9fL3RHy/\nQFyoqENhmVHuOEQECYu9oaEBer2+/bZSqYTt+vtwVVVVeOedd5CWlsZSJ3JyM0ZGAAC+O3RJ5iRE\nBHSy8tz90uv1MJlM7bdtNhsUirbXEdu3b0dtbS2effZZXLlyBU1NTYiJicGjjz7a6WMGBRmkiks3\n4ThLz5XGODBQj367z+PouWq0CAqEBnrJHQmAa42xo+IYOybJij0hIQHZ2dmYNWsWcnNzERsb235f\namoqUlNTAQBffPEFzp8/32WpA0B1db1Ucem6oCADx1lirjjGyQlhKCw1YtP201gyfYDccVxyjB0N\nx9g+uvPiSbJD8dOmTYOHhwdSUlKwdu1arFy5EhkZGfj0009v+1yePEfk3BJjg+Fn0GDP8XKYmqxy\nxyFya4LoRG9y89Wh9PgqXHquOsbbDpbg0+xCzEuKxuwxfWTN4qpj7Eg4xvbhUDN2InIvE+N7Q+uh\nROaRUlhbuGANkVxY7ETUIzy1KkwaFgZjgwX7T1yWOw6R22KxE1GPmfZgBJQKAd8eLIHN5jTv8hG5\nFBY7EfUYP4MGY4b0QuVVM46duyJ3HCK3xGInoh41c1QkAODbg8VcgIpIBix2IupRvQO9MLx/IM6X\n1+HspWtyxyFyOyx2Iupxs0ZHAQC2Hrjzro5EJB0WOxH1uH5hPugf7oPj52twqapB7jhEboXFTkSS\nuDFr33awWOYkRO6FxU5EkhgaE4CwQC8cPFmFK8ZGueMQuQ0WOxFJQiEImDU6EjZRxLcH+V47kb2w\n2IlIMqMGhyDQR4s9eRW41tAsdxwit8BiJyLJKBUKPDQmCi2tNmw/xFk7kT2w2IlIUuPiQuFn0CD7\nWBnqzRa54xC5PBY7EUlKrVJg5qhIWKw27Dh8Se44RC6PxU5EkpsY3xvenmrsPFIKc5NV7jhELo3F\nTkSS06iVmD4yEo3Nrdh5pFTuOEQujcVORHYxeXgYvLQq7DhciiZLi9xxiFwWi52I7EKnUWFqYgQa\nGq3Ydaxc7jhELovFTkR2MzUxHFoPJbYdLEazpVXuOEQuicVORHbjpVVjamIE6sxWZB8rkzsOkUti\nsRORXc0YGQGdRomtB4r5XjuRBFjsRGRXXlo1pl1/r51nyBP1PBY7Ednd9Acj4KlRYdvBEjQ2c9ZO\n1JNY7ERkd55aNWaMjICpqQWZnLUT9SgWOxHJYmpiBLy0Knx3qATmJs7aiXoKi52IZKHTqDBzVGTb\nrJ1ryBP1GBY7EckmOSEcep0a23MuwcQ15Il6BIudiGSj06gwa3QkGptb8O0B7tdO1BNY7EQkq+SE\ncPjqPZB5+BJq65vljkPk9FjsRCQrjVqJR8b3haXFhq/3XpA7DpHTY7ETkezGDw1FiL8ndudV4PJV\ns9xxiJwai52IZKdUKDBvYjRsoogvdp+XOw6RU2OxE5FDGBEbhL6hBuScrsLFy3VyxyFyWix2InII\ngiBgflIMAODzXUUypyFyXix2InIYg/r4Y0hff5y8WIsTF6/KHYfIKbHYicih3Ji1f5ZdCJsoypyG\nyPmw2InIoUT1MmD0kBCUVDZg3/HLcschcjosdiJyOPOTYuChUmDz7iI0WbhBDNG9YLETkcPx99Zi\nxshIGBssXGqW6B6x2InIIc0aHQkfvQe2HyrB1bomueMQOQ0WOxE5JK2HCvMmxsDSYsPm73n5G9Hd\nYrETkcMa+0AvRIUYsP9EJc6Xc9EaorvBYicih6UQBKRM6QcASM86B5GXvxF1icVORA4tNtIPCQOC\nUFhqxMGTlXLHIXJ4LHYicngLk/tBrVJgU3YhGpt5+RtRZ1jsROTwgnx1mD06CsYGC/73B+7ZTtQZ\nFjsROYVZoyMR7KtD5uFSlFY1yB2HyGGx2InIKahVSiyeNgA2UcRH353hiXREHWCxE5HTGBoTgIQB\nQThbasT+E1xHnuhOWOxE5FRSpvSDh0qBT7MKYW6yyh2HyOGw2InIqQT66PDwuD6oM1uxZfd5ueMQ\nORwWOxE5nekPRiI0wBPZR8tQWGaUOw6RQ2GxE5HTUasUWDpzIEQA67eegrXFJnckIochWbHbbDak\npaUhJSUFqampKCm5devFjIwMLFiwAIsWLcKqVat4hisR3ZMBEb5ITghDRY0ZGfsuyh2HyGFIVuyZ\nmZmwWq1IT0/H8uXLsXbt2vb7mpqa8Ne//hUbNmzAxo0b0dDQgOzsbKmiEJGLmpcUA39vDbYeKMYl\nXttOBEDCYj969CgmTJgAAIiPj0dBQUH7fRqNBps2bYJGowEAtLS0QKvVShWFiFyUTqPCkzNi0WoT\nsX7rKbSwW78aAAAVGklEQVTaeEieSCXVAzc0NECv17ffViqVsNlsUCgUEAQB/v7+AIANGzagsbER\nY8eO7fIxg4IMUsWlm3Ccpccx7jlTggzILbqKXUdLsf9UNeZOatsNjmMsPY6xY5Ks2PV6PUwmU/vt\nG6V+8+0//elPKC4uxrp16+7qMaur63s8J90qKMjAcZYYx7jnzR3fB4dPVeKjb0+hf28D4gaEcIwl\nxuexfXTnxZNkh+ITEhKwe/duAEBubi5iY2NvuT8tLQ0WiwXvvPNO+yF5IqLuMHh6YMm0AbC02PD3\njJNobeUheXJfgijR6eiiKOLVV1/FmTNnAABr1qzBiRMnYDabERcXh3nz5iExMbH985cuXYqpU6d2\n+ph8dSg9vgqXHsdYOv/9vwU4dKoKT8wciORhveWO49L4PLaP7szYJSt2KfBJJD3+Z5Uex1g6DY1W\nrPrHIdSZLPhd6gj0DfWWO5LL4vPYPhzqUDwRkb3pdWo889AgtNpE/D3jJJqtrXJHIrI7FjsRuZQh\nff3x8IRoVNSY8fmuIrnjENkdi52IXM7S2YMRGuCJnUdKUXChRu44RHbFYicil6NRK/Hcw0OgVAj4\nIOMUjCaL3JGI7IbFTkQuKaqXAfOSYmA0WfDeVydgsznNecJE94XFTkQua8bICAzrF4hTxbX4mhvF\nkJtgsRORyxIEAc/MHoQAby2++uECTl68KnckIsmx2InIpel1avzs0SFQKAS899UJXGtoljsSkaRY\n7ETk8mJ6+2DB5H6oM1vx3lcnuAscuTQWOxG5hamJ4RgxIAinS65h867zcschkgyLnYjcgiAIePqh\nQQgN8MS2QyXYe7xC7khEkmCxE5Hb8NSq8It5Q+GpUeF/tp1GUZlR7khEPY7FTkRuJcTfEz97dAha\nbSLe3nIcV+ua5I5E1KNY7ETkduL6BmBhcn8YTRas23IcFm4WQy6ExU5EbmlaYjjGPdALxZfr8Y+t\np2Bznh2siTrFYicityQIAp6cMRD9wn1w6FQVPs0qlDsSUY9gsROR21KrFPjFvKEIDfDEdzmXsP1Q\nidyRiO4bi52I3Jpep8avFsTDV++BTVmFOHDystyRiO4Li52I3F6gjw6/XjAMOo0KH2Sc4pry5NRY\n7EREAMKD9XjhsQcgCMDbW47jQkWd3JGIuoXFTkR03cAoPzz38BA0W1vxfzflovhyvdyRiO4Zi52I\n6CaJA4Px09mDYW5qwZvpx1BSyXIn58JiJyL6kTFxvfDM7EHXyz0XpVUNckciumssdiKiOxj3QCiW\nzhqIhkYr/pR+DGXVLHdyDix2IqIOTIzvjSdnxqLebMUbnxzDxcs8oY4cH4udiKgTk4aFYenMWJga\n28r9VHGt3JGIOsViJyLqQtKwMPz80Ti0ttrw509zceRMldyRiDrEYiciuguJA4Pxn4/HQ6lU4N0v\nC7A7r1zuSER3xGInIrpLg/v44zeLhsNLq8aH357Glt3nuSscORwWOxHRPegb6o2VTyQgyFeLjH0X\n8f++KECzhfu5k+NgsRMR3aPQAC+8svRBDIz0xZGz1Vjz0RHUGJvkjkUEgMVORNQtep0av144DJOG\n9UZJVQNe+58cFJYa5Y5FxGInIuoulVKB1BmxWDJtABoaW/DGJ0ex7WAJ33cnWbHYiYjugyAImDIi\nHC+mDINep8an2YX42+f5qDdb5I5GborFTkTUAwZF+eHVZ0ZiSB8/5BfV4NX1OTh76ZrcscgNsdiJ\niHqIj5cHfrVwGOYlRcPYYMEbnxzF5u+LYG2xyR2N3AiLnYioBykEAbPH9MFvFg9HgLcW3+wvxu8/\nzMH5cq4zT/bBYiciksCACF/817KRSE4IQ/kVE17fcBifZRfC2sJr3klaLHYiIoloPVR4YnosfrNo\nOAJ9tPj2YAle+eAQ8ouuyB2NXBiLnYhIYgOj/PBfz4zCtMQIXLnWhL98lo+/fpaHylqz3NHIBank\nDkBE5A40HkosmtofE4aG4pPMs8grqsGJi1cxY2QkHhodBZ2GP46pZ3DGTkRkR+HBeqxYNBw/fzQO\n3l4e+GZ/MX773/vx7cFiNFv5/jvdP75EJCKyM0EQ8ODAYAyNCcB3OZew7WAJPssuwneHLmHO2D6Y\nGN8bahXnXdQ9gig6z9qH1dX1ckdweUFBBo6zxDjG0nO2MTY1WbH9UAl25JSi2doKP4MGUxPDkRQf\nBk+tY86/nG2MnVVQkOGev4bFTrfgf1bpcYyl56xjXGeyYOuBYnyfW45mayu0HkokDeuNaYkR8PfW\nyh3vFs46xs6GxU73jf9Zpccxlp6zj7GpyYpdx8qQebgURpMFSoWAYf0CkTSsNwb39YdCEOSO6PRj\n7Cy6U+yOeYyHiMiNeWnVmD2mD6Y/GIkDJy9jR04pjpytxpGz1Qjw1mJifCjGPRDqcLN4cgycsdMt\n+Cpcehxj6bnaGIuiiAsV9fg+twwHT1XCYrVBANA/whejBodgRGwQvD097JrJ1cbYUfFQPN03/meV\nHsdYeq48xo3NLTh4shIHTlzG2VIjgLb16Qf38cPwAUGIjwmwy0zelcfYkfBQPBGRi9NpVJg0PAyT\nhofhal0Tck5X4dCpShRcuIqCC1exAUBEsB5DYwLwQHQAont7Q6XkpXPuhDN2ugVfhUuPYyw9dxzj\nK9cakVdUg7yiKzhdXIuW1rYf7R5qBfqH+WBglB8GRvohqpehR4reHcdYDpyxExG5qUBfHaaMCMeU\nEeFosrTg1MVanLh4FadLruHExVqcuFgLAFApBUSGGBAd6o3oMG/07eWNID+dQ5xpTz2DxU5E5GK0\nHioMHxCE4QOCAABGkwVnSmpxpuQazpfXofhyfdv+8EfaPl+jViI8yAsRwXqEB+sRGuCFXv6e8NV7\nQGDhOx3Jit1ms+HVV1/F2bNnoVar8frrryMyMrL9/qysLLz77rtQqVSYN28eHn/8camiEBG5NR8v\nD4wcFIKRg0IAABZrK4or61FUVoeSqnpcqmrAxcv1KCqvu+XrNB5K9PLzRIi/DgE+WgT66BDoo0Wg\njxZ6b50cfxW6C5IVe2ZmJqxWK9LT05GXl4e1a9fi3XffBQBYrVasXbsWmzdvhlarxaJFi5CcnIyA\ngACp4hAR0XUeaiX6h/uif7hv+8esLTZU1JhQWt2AihozKq+acfmqGeU1JhRX3vm9dK2HEn4GDXz1\nGvjoPWDQecDbSw2DpwcMOjW8dGp4alXw0qrhpVXBQ62011/RrUlW7EePHsWECRMAAPHx8SgoKGi/\nr6ioCJGRkTAY2k4KGDFiBHJycjBz5kyp4hARUSfUKgUiQwyIDLn1ZC2bTcS1hmZcMTahxtiEK8ZG\nXDE2wWRpRVWNGdcamlFRc3f7yquUArQeKmg9lNB6qKDTKKFRt/3yUCuu/66EWqWAWqmAWqWA6vqf\nlUoBKqWi7ZdCgEIhQKkUoBSu/1mhgKBou/RPcf1jgtB2++bfIQht25oKgIDrHwPa33Jov33TjfY3\nIwTcersTPfEWhiAAQd34OsmKvaGhAXq9vv22UqmEzWaDQqFAQ0NDe6kDgJeXF+rreXYlEZGjUSgE\n+Htr266Nj/jXx28+K97a0gqjyYJ6s/X6r7Y/m5qsMDe1tP9ubm5Bk6UVjc0tqKlrQlNzC5zmsiyZ\nfP3WI/f8NZIVu16vh8lkar99o9QBwGAw3HKfyWSCj49Pl4/ZndP+6d5xnKXHMZYex1h6N49xbxlz\n0K0kW7UgISEBu3fvBgDk5uYiNja2/b7o6GgUFxfDaDTCYrEgJycHw4YNkyoKERGR25BsgRpRFPHq\nq6/izJkzAIA1a9bgxIkTMJvNWLBgAbKzs/HOO+/AZrNh/vz5WLx4sRQxiIiI3IpTrTxHREREneMC\nwkRERC6ExU5ERORCWOxEREQuhMVORETkQhyu2G02G9LS0pCSkoLU1FSUlJTccn9WVhbmz5+PlJQU\nfPbZZzKldG5djXFGRgYWLFiARYsWYdWqVeD5lfeuqzG+4ZVXXsFbb71l53Suo6txzs/Px5IlS7B4\n8WL86le/gsVikSmp8+pqjHfs2IF58+Zh/vz52Lhxo0wpXUNeXh5SU1Nv+/g9957oYLZv3y6+9NJL\noiiKYm5urvjzn/+8/T6LxSJOmzZNrKurEy0Wizhv3jzxypUrckV1Wp2NcWNjozh16lSxqalJFEVR\n/PWvfy3u3LlTlpzOrLMxvmHjxo3iwoULxbfeesve8VxGZ+Nss9nERx55RCwpKRFFURQ3bdokFhUV\nyZLTmXX1XJ48ebJoNBpv+flM9+69994T58yZIy5cuPCWj3en9xxuxn63a8yr1er2Nebp3nQ2xhqN\nBps2bYJGowEAtLS0QKvVypLTmXU2xjfuz8/Px8KFC3lE5D50Ns4XLlyAr68v1q9fj9TUVNTV1SE6\nOlquqE6rq+eyWq1GXV0dmpubIYoit3ntpqioKLz99tu3/TzoTu85XLF3tMb8jfu4xvz962yMBUGA\nv78/AGDDhg1obGzE2LFjZcnpzDob46qqKrzzzjtIS0tjqd+nzsa5trYWx44dwxNPPIH169dj//79\nOHDggFxRnVZnYwwATz/9NObNm4c5c+Zg8uTJt3wu3b3p06dDqbx997vu9J7DFbsUa8zTrTob4xu3\n33jjDezfvx/r1q2TI6LT62yMt2/fjtraWjz77LN4//33kZGRgS+//FKuqE6ts3H29fVFZGQkoqOj\noVKpMGHChNtmm9S1zsa4vLwcH3/8MbKyspCVlYWamhps27ZNrqguqTu953DFzjXmpdfZGANAWloa\nLBYL3nnnnfZD8nRvOhvj1NRUbNmyBRs2bMBzzz2HOXPm4NFHH5UrqlPrbJwjIiJgNpvbT/Y6cuQI\n+vfvL0tOZ9bZGDc3N0OhUMDDwwMKhQL+/v48itrDutN7ku3u1l3Tpk3D3r17kZKSAqBtjfmMjIz2\nNeZfeuklLFu2rH2N+eDgYJkTO5/OxjguLg6bN29GYmIinnzySQDA0qVLMXXqVDkjO52unsc343uS\n3dfVOL/++ut48cUXIYoiEhISkJSUJHNi59PVGM+dOxcpKSnQaDSIiorC3LlzZU7s3G78PLif3uNa\n8URERC7E4Q7FExERUfex2ImIiFwIi52IiMiFsNiJiIhcCIudiIjIhbDYiYiIXAiLnciBbNmyBStX\nrpQ1w9WrV5GcnNzl5w0fPhwAkJ6ejvT09A4/LysrCx9++GFPxSOiLjjcAjVE7swZF6u5sXBJR06c\nOOGUfy8iZ8ViJ3IgN68XdejQIfzlL39BU1MTjEYjVqxYgZkzZ6KsrAwrV65EbW0ttFotVq9ejdjY\nWHz44YdIT0+HUqnE5MmTsXz5cpw9exarV6+G2WzG1atX8fTTTyM1NRXr1q1Dbm4uLl++jCeeeALD\nhg3Dyy+/DFEUERcXd8dsZWVlWLFiBUwmEwYPHtyedd26dRAEAT/72c+wcuVKFBYWAgAWL16MhIQE\npKenQxAEhIWFYezYsfjd736HhoYGVFdXY/bs2XjxxRexZcsW7NmzB3V1dbh06RLGjRuHVatWQRRF\nvPnmm8jMzIRKpcLChQvx5JNPori4GL///e9x7do1aLVavPLKKxg0aJD0/0BEzqDHN5Ulom7bvHlz\n+97XL7zwgnj+/HlRFEVx37594pw5c0RRFMVnn31W/Pjjj0VRFMVdu3aJv/zlL8W8vDxx+vTpYn19\nvdjS0iI+9dRTYkFBgfj666+L+/fvF0VRFEtKSsThw4eLoiiKf/vb38TU1NT27ztnzhzxhx9+EEVR\nFD/44ANx8uTJt2V7/vnnxU2bNomiKIrbtm0TY2NjRVEUxXXr1onr1q0TDx06JD733HOiKIpibW1t\n+9/jxv03HvuLL74QRVEU6+rqxISEBPHq1avi5s2bxUmTJokmk0lsbGwUk5KSxDNnzohbt24VFy1a\nJFosFtFkMomPPPKIWF1dLS5cuFA8efKkKIqieO7cOXHGjBn3P/hELoIzdiIHcvMh6zfffBNZWVn4\n9ttvkZeXh8bGRgBATk4O/vznPwMAkpKSkJSUhA8++ADJycntW2auX78eADBo0CDs3r0b7733Hk6f\nPt3+GEDb3tpA23vqVVVVGDduHABg/vz5+Oijj27LdvDgQbz11lsAgBkzZrR/L/H6Htz9+/fHhQsX\nsGzZMiQlJWH58uW3PcYzzzyDAwcO4B//+AfOnj2LlpaW9kzDhw+Hp6cngLYNXIxGIw4fPoyHHnoI\narUaarUaX375JUwmEwoKCm45F6GxsRFGo5G7PRKBh+KJZHf48GFERkYiODgYNputfU/mRYsWYcyY\nMRg5ciTGjBmDF198EQCgVqtvOWRfWFh428cqKyuh0+nw8ssvw9fXF5MnT8ZDDz2ErVu3tn/OjZ37\nBEG45Wtv3sL3Zj/+vBs5b7wY8fX1RUZGBvbt24fvv/8ec+fOxTfffHPLY6xduxalpaV4+OGHMXXq\nVOzfv7/9MX+8k6AoilCpVLd8z9LSUvj4+ECj0dyy1W1FRQVLneg6nhVPJLPNmzcjMzMTAHDmzBlE\nRkbCaDSiuLgYv/jFLzBx4kT88MMPsNlsAIDExMT2gt67dy/S0tKQmJiI3bt3w2w2o6WlBcuXL8eJ\nEyewb98+vPDCC0hOTsahQ4cAoP1xbvDz80N4eDh27twJoG1XqTsZN24ctmzZAgDYs2cPjEYjgH+d\nF7Br1y6sWLECkyZNwssvvwxPT09UVFRAqVSipaUFALBv3z4sW7YMM2bMQHl5OSorK2/Lc7MHH3wQ\n3333XfvM/qc//SlqamoQFRWFr776qv0xU1NT73HUiVwXZ+xEMnv++efxm9/8Bh999BFCQ0Pxwgsv\nwGAw4PHHH8fs2bMREBCAadOmwWKxoKmpCWlpaXj55ZfxySefQKfTYfXq1YiJicGSJUuwcOFCiKKI\n6dOnY8yYMXjhhRewePFiBAYGIjExETExMSgtLb3tLPU//elPWLlyJd5++22MGDHijmexp6WlYcWK\nFfj8888xcOBABAYGAvjXjH38+PHYtm0bZs+eDY1GgxkzZmDAgAGoq6vDb3/7WwQFBbX/XQMCAtCv\nXz+MHj36jnluPO7UqVNx/PhxzJ07F6Io4qmnnkKfPn3w5ptvYtWqVfj73/8ODw8P/OUvf5HgX4bI\nOXHbViIiIhfCQ/FEREQuhMVORETkQljsRERELoTFTkRE5EJY7ERERC6ExU5ERORCWOxEREQu5P8D\nJ0jVGGnahs8AAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "s = np.linspace(0, 1, 100)\n", "plt.plot(1-s, p(s, r, b))\n", "plt.xlabel('Jaccard distance')\n", "plt.ylabel('Probility')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2) $h_i(x)$ should be statistically independent, because we need estimate the possbility by using the product rule for independent events. \n", " eg: $s^r$ in $1 - (1 - s^r)^b$ for Jaccard similarity.\n", " \n", "3) efficient in two ways: \n", "+ identify candidate pairs in time much less than the time it takes to look at all pairs. \n", "+ They must be combinable to build functions that are better at avoiding false positives and negatives, and the combined functions must also take time that is much less than the number of pairs. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "当我们使用 “band, then hash\" 这种方法来组合哈希函数达到概率 $1 - (1 - s^r)^b$ 变成 \"S\" 曲线(其实是倒”S\"型,如上图), \n", "所以我们需要找到的哈希方法,它应该有将距离接近的候选对更高概率匹配的特性,即 $P-d(x,y)$ 是倒 ”S\"。这就引出了接下的内容:local sensitive.\n", "\n", "**$(d_1, d_2, p_1, p_2)$-sensitive**:\n", "1. if $d(x, y) \\leq d_1$, then $P[f(x) = f(y)] \\geq p_1$.\n", "2. if $d(x, y) \\geq d_2$, then $P[f(x) = f(y)] \\leq p_2$.\n", "\n", "eg: For Jaccard distance, the family of minhash functinos is a $(d_1, d_2, 1-d_1, 1-d_2)$-sensitive family for any $d_1$ and $d_2$ where $0 \\leq d_1 < d_2 \\leq 1$." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAfYAAAFkCAYAAADSRRn0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl8VPW9//H3mclkIwkhIUQCJBqURbgBEQVRpICgPMTK\npgYQN9p6W/Vq3XFBrkqFVktRoCj2wg+qgIjgJaWiIbFarkgUCIayL2ETwhLIzkwy5/cHZWqKyWRC\nzkwyeT0fDx9yZpLDO58H8J5z5sz3GKZpmgIAAEHBFugAAACg4VDsAAAEEYodAIAgQrEDABBEKHYA\nAIIIxQ4AQBCxvNhzc3M1YcKECx7PysrSmDFjlJ6ermXLllkdAwCAZiHEyp3PmzdP//u//6sWLVpU\ne9zlcmnatGlavny5wsPDNXbsWA0aNEjx8fFWxgEAIOhZesSekpKiWbNm6d/XwNmzZ4+Sk5MVHR0t\nh8Ohq6++Wjk5OVZGAQCgWbC02IcOHSq73X7B4yUlJYqOjvZst2jRQsXFxVZGAQCgWbD0VHxNoqOj\nVVpa6tkuLS1Vy5Yta/2e5+b8Xa5Kt9XRGqU6rflrnv/fv77abUoyzz1imue+xpQp05TcpinTNOV2\nm3K7pSrPr92qrDJVWfWv/7vdvq86bDOkVjHhSoiNUOvYCCW0ilRiXKSSE6PVITFaLaNCZRiGz/sF\nANQuIMWempqq/Px8nTlzRhEREcrJydHEiRNr/Z7f/OoGHT/OUb3VEhKiL5iz223KWVklZ6VbLpdb\nzsoqVTirVHG2UuXOKlU4K1VaXqnicqdKylwqLnOpqMypwuKz2nXwtLbnF17w+7QID1FS6xa69JIY\ndWwXo8vbtVRcTLi/fsyA+rEZo2ExY+sxY/9ISIj2/kX/xi/Ffv7ILCMjQ2VlZbrzzjv17LPPauLE\niXK73RozZozatGnjjyioB5vNUHhoiMJDff9et9vUmVKnThZV6HhhuY6cLNWRE+f+2334jHYdOqPP\nvjn3ta2iw3RF+5ZK6xiv7qnxiomsx28IAM2c0ZTu7sarQ+v581X4WVeV8o8Wa8/hM9p9+Iz2HClS\nUalTkmRISk2KUVrHeF3bNVGJcZF+yeQPHOlYjxlbjxn7R6M9Ygd+TJjDrk4dYtWpQ6wkyTRNHTlR\nqi17Tip39wntPlykPUeKtOLLfbqifUvdkNZW13Rpo/BQ/tgCQE34FxKNhmEYapcQpXYJURrWN0Wl\nFS5t2X1Sf//ue23LL9SuQ2f0/me71OfKNrr52mS1jW/hfacA0MxQ7Gi0WoQ7dF33S3Rd90t04nS5\n1uUd1brvvtcXud/ry9zv1btLG916XYqSE30/VQUAwYpiR5PQOjZCt99wmW67/lJt3HFcGV/tV872\nAuVsL1Bax3iNujGVggcAUexoYmyGod5d2ujqzgn6bu8pZXy1X1v2nNR3e09q4FXtNKJ/qqIiHIGO\nCQABQ7GjSTIMQ2kd45XWMV55e0/q/cxdytp4WBu2FWjMTzrqhrS2srEADoBmiNu2osnrnhqvlyde\nqzsGdpSryq0Ff92uqQu/1bFTZYGOBgB+R7EjKITYbRrWJ0W/+Xlf9bkyUfu+L9KU+Tn6csuRC25C\nBADBjGJHUGkVHaYHf9pND/60m2w2Q/NXb9fcj7eqrMIV6GgA4Be8x46g1OfKRHVMitE7q/6hnO0F\n2nvkjP5zRHd1TKr9ZkMA0NRxxI6g1To2Qs+Mv0q333CZThWf1W/f36RvthcEOhYAWIpiR1Cz22y6\n/YbL9OiYHrLZDM1ZmafV6/N53x1A0KLY0SykdYzXpPG91Co6TB9+vkcL/rpdlVXuQMcCgAZHsaPZ\nSE6M1gv39FZKYrS+3PK9/rAsV2edVYGOBQANimJHs9IqOkzPju+lnpe31j/2F2rmh5Q7gOBCsaPZ\nCQu161cju+vqzgnafuA05Q4gqFDsaJZC7DY9+NNuuroT5Q4guFDsaLZC7DY9eDvlDiC4UOxo1s6X\ne69/lvuclXmqcnO1PICmi2JHsxdit+k/b++mbpfF6bu9J/X+Z7v4nDuAJotiB3Su3H81orvaJ7RQ\n9qbDWrPhYKAjAUC9UOzAP0WEheixO3ooNipUH2TvZvlZAE0SxQ78QFxMuB67o4fCQu2al/EP7Tl8\nJtCRAMAnFDvwb5ITo/XL27upssqtN5dv0amiikBHAoA6o9iBH5HWsbXGDr5CxWUuzf14K+vKA2gy\nKHagBoOvbq9ru7bR7sNn9OHnewIdBwDqhGIHamAYhu69pYsuiYvUpzkH9e0OLqYD0PhR7EAtIsJC\n9KuR3RUaYtP/rN6mY4VlgY4EALWi2AEv2idEacLNnVV+tkp/XJEnp4tlZwE0XhQ7UAfX/0db3dij\nrQ4UlGhp9u5AxwGAGlHsQB2Nu6mTklq3UPbGw9q671Sg4wDAj6LYgToKddj1s+FdZbcZ+p/V21RW\n4Qp0JAC4AMUO+ODSS2I0vN+lKiw+q8WZuwIdBwAuQLEDPrr1uhSlXBKtdXlHtWnn8UDHAYBqKHbA\nRyF2m342/EqF2G36f59sV1GZM9CRAMCDYgfqoV3rFhp1Y6qKylxatGYH928H0GhQ7EA9Db2mgzq1\nb6lvdxzXpl0nAh0HACRR7EC92WyG7h3WRSF2Q+99tlMVzspARwIAih24GG3jW2hYnxQVFp/Vyi/3\nBToOAFDswMW69boUtYmNUOY3h3TgWHGg4wBo5ih24CKFOuy6e2gnuU1TC9fskJsL6QAEEMUONIDu\nqfG6tmsb7T1SpC82Hwl0HADNGMUONJD0wVcoIsyuDz/fozOlfLYdQGBQ7EADiY0K06gbO6rsbKU+\n5A5wAAKEYgca0MCr2im5TZTW5R3V/qNFgY4DoBmi2IEGZLMZumvwFZKkJZm7WJEOgN9R7EAD65rS\nSldd0Vo7D53Rtzu4SQwA/6LYAQvcOfBy2W2GPsjeLVdlVaDjAGhGKHbAAolxkRp8dXudOFOhzG8O\nBToOgGaEYgcsctv1lyoqwqFV/7dfRXz8DYCfUOyARVqEO3T7DZepwlmllV/uDXQcAM0ExQ5YaEDP\nJLWNj9Tfco+ossod6DgAmoGQQAcAglmI3ab/Gp2m/GPFCrHzOhqA9Sh2wGKJcZFKjIsMdAwAzYRl\nhxBut1uTJ09Wenq6JkyYoAMHDlR7/rPPPtPo0aM1ZswYLV682KoYAAA0K5YdsWdmZsrlcmnJkiXK\nzc3VtGnTNGfOHM/zr732mlauXKmIiAjdeuutGj58uKKjo62KAwBAs2BZsW/cuFH9+/eXJPXo0UN5\neXnVnnc4HCoqKpLNZpNpmjIMw6ooAAA0G5YVe0lJiaKiojzbdrtdbrdbNtu5s//333+/Ro8erYiI\nCA0dOrTa19YkIYEjen9gztZjxtZjxtZjxo2TZcUeFRWl0tJSz/YPS/3IkSN67733lJWVpYiICD31\n1FP65JNPdMstt9S6z+PHi62Ki39KSIhmzhZjxtZjxtZjxv5RnxdPll0816tXL33xxReSpM2bN6tz\n586e586ePSubzabQ0FDZbDbFxcWpuJg/IAAAXCzLjtiHDBmidevWKT09XdK5i+UyMjJUVlamO++8\nUyNHjlR6errCwsKUkpKikSNHWhUFAIBmwzCb0A2jOe1jPU6vWY8ZW48ZW48Z+0ejOhUPAAD8j2IH\nACCIUOwAAAQRih0AgCBCsQMAEEQodgAAggjFDgBAEKHYAQAIIhQ7AABBhGIHACCIUOwAAAQRih0A\ngCBCsQMAEEQodgAAggjFDgBAEKHYAQAIIhQ7AABBhGIHACCI1KnYv/nmGy1evFhnz55VTk6O1ZkA\nAEA9eS32BQsWaObMmVqwYIFKS0v14osv6t133/VHNgAA4COvxb5ixQr96U9/UkREhOLi4rR8+XIt\nX77cH9kAAICPvBa73W5XaGioZzssLEwhISGWhgIAAPXjtaGvueYaTZs2TWVlZcrMzNTSpUvVp08f\nf2QDAAA+8nrE/swzzyglJUVdunTRypUrNWDAAD377LP+yAYAAHzk9Yh93rx5evDBBzV27FjPY7//\n/e/1+OOPWxoMAAD4rsZif/3113Xy5EllZWVp//79nscrKyuVm5tLsQMA0AjVWOxDhw7V7t279dVX\nX+naa6+VaZqSzl1M99BDD/ktIAAAqLsaiz0tLU1paWkaMmSIoqOjPY+73W4dPnzYL+EAAIBvvL7H\nvnLlSs2YMUPl5eWeo/aOHTvqL3/5i+XhAACAb7wW+/z58/Xxxx9rxowZevzxx7Vhwwbt3bvXH9kA\nAICPvH7cLS4uTh06dFCXLl20c+dOjRo1St98840/sgEAAB95LfbIyEitX79enTp1UnZ2tgoKCnTi\nxAl/ZAMAAD7yWuwvvPCCsrKydOONN+r06dMaNmyY7r77bn9kAwAAPjLM81fENQHHjxcHOkLQS0iI\nZs4WY8bWY8bWY8b+kZAQ7f2L/k2NF88NGjSoxm8yDENr1671+TcDAADWqrHYP/zwQ0nSjBkzdNll\nl2nMmDGy2WzKyMjQ7t27/RYQAADUXY3vscfFxSkuLk55eXl64IEHFBMTo6ioKKWnp+vbb7/1Z0YA\nAFBHXi+eMwxD69at82yvXbtWDofD0lAAAKB+vC5QM3XqVD399NMqKCiQaZpq3769fve73/kjGwAA\n8JHXYu/atatWrVqlwsJCGYah2NhYf+QCAAD1UGOxv/DCC3r11Vc1YcKEC54zDEMLFy60NBgAAPBd\njcWenp4uSXr44YcveM4wDOsSAQCAequx2MvKypSTkyPDMGQYhufObpQ6AACNV43F/qc//UmGYaig\noED79+9X3759FRISoq+//lqdO3fmVDwAAI1QjcX+9ttvS5ImTpyoN998U+3bt5ckFRQU6KmnnvJP\nOgAA4BOvn2M/cuSIp9QlKSEhQceOHbM0FAAAqB+vH3dLS0vTk08+qVtvvVVut1sff/yx+vTp449s\nAADAR16L/ZVXXtGf//xnLV26VJJ0/fXXa+zYsZYHAwAAvvNa7KGhobrjjjt0yy23eK6MLygoUFJS\nkuXhAACAb7wW+9y5c/XOO+9csOJcVlaWZaEAAED9eC32ZcuWKTMzU3Fxcf7IAwAALoLXq+KTkpIU\nExPjjywAAOAieT1iT0lJ0bhx49S3b1+FhoZ6Hv+xpWYBAEBgeS32xMREJSYmerZN06zTsrJut1tT\npkzRzp075XA4NHXqVCUnJ3ue37Jli6ZPny7TNJWYmKjp06dXe+EAAAB857XYH3nkkWrbbrdbhw4d\n8rrjzMxMuVwuLVmyRLm5uZo2bZrmzJkj6dyLg8mTJ+utt95Shw4d9MEHH+jQoUNKTU2t548BAACk\nOhT7okWLNGPGDJWXl3s+7taxY0f95S9/qfX7Nm7cqP79+0uSevTooby8PM9z+/btU2xsrObPn69d\nu3ZpwIABlDoAAA3A68Vz8+fP18cff6xhw4YpMzNTv/nNbzRw4ECvOy4pKVFUVJRn2263y+12S5IK\nCwu1adMm3X333Zo/f76++uorrV+//iJ+DAAAINXhiD0uLk4dOnRQly5dtHPnTo0aNcpzr/baREVF\nqbS01LPtdrtls517HREbG6vk5GTPUXr//v2Vl5envn371rrPhIRor78vLh5zth4zth4zth4zbpy8\nFntkZKTWr1+vTp06ae3aterevbtOnDjhdce9evVSdna2hg0bps2bN6tz586e5zp06KCysjIdOHBA\nycnJ+vbbbzVmzBiv+zx+vNjr1+DiJCREM2eLMWPrMWPrMWP/qM+LJ6+n4l944QVlZWXpxhtv1OnT\npzVs2DDdfffdXnc8ZMgQhYaGKj09XdOmTdOkSZOUkZGhDz74QKGhoZo6daqeeOIJjRkzRm3bttWA\nAQN8Dg8AAKozzPNXxNVi69at6tatm4qKirR161Zdd911/sh2AV4dWo9X4dZjxtZjxtZjxv5hyRH7\n66+/rtdff12SVF5erjlz5ujNN9/0PR0AALCc12LPzs7Wu+++K+ncYjXz58/Xp59+ankwAADgO6/F\nXlVVpfLycs+20+ms08pzAADA/7xeFZ+enq7Ro0dr0KBBMk1TX3zxhcaPH++PbAAAwEdei/2+++5T\nr169lJOTI4fDoddff11XXnmlP7IBAAAfeS12SUpLS1NaWprVWQAAwEXy+h47AABoOih2AACCiNdT\n8WfPntXf/vY3lZWVSZIqKyt1+PBhPfroo5aHAwAAvvFa7A8//LAqKiqUn5+va665Rjk5ORo8eLA/\nsgEAAB95PRW/b98+LVy4UEOGDNHEiRO1bNkyff/99/7IBgAAfOS12Fu3bi3DMJSamqodO3YoMTFR\nx48f90c2AADgI6+n4i+//HK98sorGjt2rJ588kkVFBTI6XT6IxsAAPCR1yP2KVOmaNiwYbr88sv1\nyCOP6Pjx43rjjTf8kQ0AAPioTrdtbSy4RaD1uBWj9Zix9Zix9Zixf1hy21YAANB0UOwAAAQRrxfP\nbdiwodptWg3DUFhYmFJSUhQTE2NpOAAA4BuvxT5nzhx99913uu666ySdK/qkpCSVlJTo0Ucf1W23\n3WZ5SAAAUDdei900Ta1atUpJSUmSpGPHjmnSpElatGiRJkyYQLEDANCIeH2P/dixY55Sl+RZoCY6\n2vcr9QAAgLW8HrH36tVLTzzxhG677TZVVVVp9erVuuqqq/T5558rMjLSHxkBAEAdef0cu8vl0pIl\nS/R///d/stvt6tevn+68806tW7dOHTt2VPv27f2Vlc9M+gGfTbUeM7YeM7YeM/aP+nyO3esRu8Ph\n0IgRIzR48GCdfw1QUFCgAQMG+J4QAABYymuxz507V++8845iY2OrPZ6VlWVZKAAAUD9ei33ZsmXK\nzMxUXFycP/IAAICL4PWq+KSkJBaiAQCgifB6xJ6SkqJx48apb9++Cg0N9Tz+8MMPWxoMAAD4zmux\nJyYmKjEx0bNtmma1JWYBAEDj4bXYH3nkEX/kAAAADaDGYh8xYoRWrlypLl26XPCcYRjatm2bpcEA\nAIDvaiz2lStXSpK2b9/utzAAAODieL0q3ul06o9//KOefvppFRUVadasWXI6nf7IBgAAfOS12P/7\nv/9bZWVl2rp1q+x2u/Lz8/X888/7IxsAAPCR12LfunWrnnjiCTkcDrVo0UK//e1v9Y9//MMf2QAA\ngI+8FrvNZqt26r2wsFA2m9dvAwAAAeD142733HOP7r//fp04cUKvvvqqMjMz9dBDD/kjGwAA8JHX\nYh8xYoS6deumDRs2qKqqSnPnzv3Rj8ABAIDA81rsknTkyBEdPHhQdrtdp06dsjoTAACoJ69vls+Y\nMUNvv/222rVrpzZt2mjmzJl6++23/ZENAAD4yOsRe1ZWlj766CM5HA5JUnp6ukaNGqUHH3zQ8nAA\nAMA3Xo/Yo6KiVFZW5tl2uVyKjo62NBQAAKifGo/YX3nlFUlSaGioRo4cqaFDh8pmsyk7O1uXXXaZ\n3wICAIC6q7HYu3XrJsMw1L1792q3ar388su5bSsAAI1UjcU+atQof+YAAAANgCXkAAAIIjUW+/79\n+/0YAwAANIQai/3Xv/61JOlXv/qV38IAAICLU+N77IZhKD09XTt27NCECRMueG7hwoWWhwMAAL6p\nsdgXLlyobdu26bnnntMjjzziuTL+h1fIAwCAxqXGYo+KitI111yjpUuXyjRN5ebmyu12q2fPnmrd\nurU/MwIAgDryelX81q1bNWLECH300UdasWKFbrvtNmVlZfkjGwAA8JHXteJnzJih999/Xx06dJAk\nHTx4UA899JAGDRpkeTgAAOAbr0fslZWVnlKXpA4dOsg0Ta87drvdmjx5stLT0zVhwgQdOHDgR7/u\nxRdf1BtvvOFDZAAAUBOvxd62bVstWLBAJSUlKikp0YIFC9SuXTuvO87MzJTL5dKSJUv05JNPatq0\naRd8zZIlS7Rr1y4uxgMAoIF4LfapU6dq06ZNuummmzR48GBt3LhRL7/8stcdb9y4Uf3795ck9ejR\nQ3l5eRc8v2XLFt111111OgMAAAC88/oee+vWrTVz5kyfd1xSUqKoqCjPtt1ul9vtls1mU0FBgWbP\nnq3Zs2dr9erVPu8bAAD8OK/FXl9RUVEqLS31bJ8vdUlas2aNCgsL9fOf/1wnTpxQRUWFOnbsqBEj\nRtS6z4QE7gPvD8zZeszYeszYesy4cbKs2Hv16qXs7GwNGzZMmzdvVufOnT3PTZgwwbOa3YoVK7R3\n716vpS5Jx48XWxUX/5SQEM2cLcaMrceMrceM/aM+L568vse+evVquVwun3c8ZMgQhYaGKj09XdOm\nTdOkSZOUkZGhDz744IKv5eI5AAAahmF6uXJt0qRJWr9+vX7yk59o5MiRSktL81e2C/Dq0Hq8Crce\nM7YeM7YeM/aP+hyxez0V/9prr6msrEyffvqp3nrrLZ08eVK33nqrRowYofj4+HoFBQAA1vB6Kl6S\nIiMj1a5dO7Vt21bFxcXasWOH7r33Xi1atMjqfAAAwAd1WlI2IyND7dq10+jRo/X8888rLCxMJSUl\nGjx48AW3dAUAAIHjtdhtNpsWLFhQbVlZ6dzH2ebNm2dZMAAA4Duvp+J37dp1Qanfe++9khTQC+kA\nAMCFajxif+ihh7Rt2zYVFBRUu5NbVVWV2rZt65dwAADANzUW+/Tp03X69Gm9+uqrevHFFz3ruYeE\nhKh169Z+CwgAAOquxmLPz89Xt27d9MADD+jIkSPVnjt48KCuueYay8MBAADf1Fjsixcv1quvvqq3\n3nrrR5/no24AADQ+Xleea0xY5ch6rCZlPWZsPWZsPWbsHw268lxtn083DEMLFy70+TcDAADWqrHY\nH3744Rq/iZu2AADQONX4OfYWLVqoT58+MgxDNpvN859hGBQ7AACNVI1H7EuWLOHiOQAAmhgunkM1\nXBBjPWZsPWZsPWbsH/W5eM7rkrLHjh3To48+qj59+qhfv3568sknderUqXoFBAAA1vJa7M8995yu\nvfZarV27VmvWrFH37t01adIkf2QDAAA+8lrsJ0+e1Pjx4xUVFaXo6Gjdd999Onr0qD+yAQAAH3kt\n9q5du+qzzz7zbK9bt06dO3e2NBQAAKifGq+Kv+qqqyRJpmlqxYoViomJkd1uV2FhoeLj4/0WEAAA\n1F2Nxb5p0yZ/5gAAAA2gxmI/r6ysTLNmzdL69etVWVmpvn376rHHHlNkZKQ/8gEAAB94fY/9lVde\nUUVFhX7zm99o+vTpcrlceumll/yRDQAA+MjrEXteXp5WrVrl2X7ppZc0bNgwS0MBAID68XrELkln\nzpyp9uuQEK+vBwAAQAB4bej77rtPd9xxhwYNGiTTNJWVlaVf/OIX/sgGAAB85LXYBw4cqO7duysn\nJ0emaWrWrFl8jh0AgEbKa7GPGzdOn3zyCWUOAEAT4LXYu3btqpUrVyotLU3h4eGex5OSkiwNBgAA\nfOe12HNzc5Wbm3vB41lZWZYEAgAA9ee12ClwAACajho/7nbs2DE9/PDDGj58uCZPnqyioiJ/5gIA\nAPVQY7FPmjRJqampeuqpp+R0OvXaa6/5MxcAAKiHGk/FFxQU6PHHH5ck9evXT7fffrvfQgEAgPqp\n8Yjd4XBU+3VoaKhfAgEAgPqrsdhN0/RnDgAA0ABqPBW/e/duDRo0yLNdUFDg2TYMQ2vXrrU+HQAA\n8EmNxf7JJ5/4MwcAAGgANRZ7+/bt/ZkDAAA0gDrdthUAADQNFDsAAEGEYgcAIIhQ7AAABBGKHQCA\nIEKxAwAQRCh2AACCCMUOAEAQodgBAAgiFDsAAEGEYgcAIIhQ7AAABBGKHQCAIEKxAwAQRCh2AACC\nSI33Y79YbrdbU6ZM0c6dO+VwODR16lQlJyd7ns/IyNDChQtlt9vVqVMnTZkyRYZhWBUHAIBmwbIj\n9szMTLlcLi1ZskRPPvmkpk2b5nmuoqJCM2fO1KJFi7R48WKVlJQoOzvbqigAADQblhX7xo0b1b9/\nf0lSjx49lJeX53kuLCxMS5cuVVhYmCSpsrJS4eHhVkUBAKDZsOxUfElJiaKiojzbdrtdbrdbNptN\nhmEoLi5OkrRo0SKVl5erX79+XveZkBBtVVz8AHO2HjO2HjO2HjNunCwr9qioKJWWlnq2z5f6D7d/\n97vfKT8/X2+99Vad9nn8eHGD50R1CQnRzNlizNh6zNh6zNg/6vPiybJT8b169dIXX3whSdq8ebM6\nd+5c7fnJkyfL6XRq9uzZnlPyAADg4lh2xD5kyBCtW7dO6enpkqTXXntNGRkZKisrU/fu3bV8+XL1\n7t1b99xzjyTp3nvv1U033WRVHAAAmgXDNE0z0CHqitM+1uP0mvWYsfWYsfWYsX80qlPxAADA/yh2\nAACCCMUOIGh9X3pMhRWnAx0D8CuKHUDQmrtlgaZ/86bOnOW9YDQfFDuAoNW/XV8VO0u04B+L5Tbd\ngY4D+AXFDiBoDe5wo/6j9ZXaWbhbf92XGeg4gF9Q7ACClmEYuqfrnYoLb6W/7l+r7ad2BToSYDmK\nHUBQi3REamL38bIZNi3YulhnzhYFOhJgKYodQNC7NCZZIy+/VcWuEs3f+r6q3FWBjgRYhmIH0Cz8\npP316pnQXbtO79XHe/8a6DiAZSh2AM2CYRi6u+sdSoxso7UHvtDX338b6EiAJSh2AM1GREiE/jPt\nXkWEROj97R9q35n8QEcCGhzFDqBZaROZoIndxqvKdOud7xayMh2CDsUOoNnpGt9Jo64YriJnsd75\n7v/JWeUKdCSgwVDsAJqlge1vUN9LeutA8WH9edsHrEyHoEGxA2iWDMNQepdRSm15qb4tyNWK3X8J\ndCSgQVDsAJothy1E/5l2nxIj2yjr4Jdae+CLQEcCLhrFDqBZa+GI1EM9JqplaIw+2p2hb45uCnQk\n4KJQ7ACavfiIVnqo50RFhIRr4bYPWFMeTRrFDgCS2kW11S/+414ZkuZ9t1D5RQcDHQmoF4odAP6p\nU6uOurfbWJ2tcmrW5nd1sPhwoCMBPqPYAeAHerVJ0z1X3qXyygq9tWmeDhUfCXQkwCcUOwD8m2sv\n6aW7u95NDnIUAAAOdElEQVShsspyvbn5HR0u+T7QkYA6o9gB4Ef0bdtb47qMVqmrTG9uekdHSo4G\nOhJQJxQ7ANSgX9K1Gtt5lEpcpZq56W0dKDoU6EiAVxQ7ANTihnZ9Na7zuSP3P2yaq52FuwMdCagV\nxQ4AXlzfro8e6D5eVe4qzd78J20u+C7QkYAaUewAUAe92qTplz0ekN1m17t5f9a6I18HOhLwoyh2\nAKijLnFX6NGrHlQLR6Te375cq/au4a5waHQodgDwQUpMB/261y/VOjxOn+xfqz/l/Vlnq5yBjgV4\nUOwA4KNLWrTRU9c8oitiU7X5eJ5+/+0cnaooDHQsQBLFDgD1EuVooUd6/lw3JPXRoZIj+m3OW9p7\nZn+gYwEUOwDUl91mV3rnUbqj0+0qrSzTjI1zlXngb7zvjoCi2AHgIhiGoZ+0v16P9PyZohwttGL3\nXzR3ywKVOEsDHQ3NFMUOAA2gU6vLNenax9Sl1RXaenK7Xsv5g3af3hfoWGiGKHYAaCAxodF6qOdE\n/TT1FhU5i/WHjXP18Z6/yuWuDHQ0NCMUOwA0IJth082XDtKjVz2ouPBYfZqfrWk5M7W/6ECgo6GZ\noNgBwAKXx16m5659XDe266ejpcf0+jeztXL3armqXIGOhiBHsQOARcJDwnRX5xF69KoHFR/eSp8d\n+FxTN/xeeSe2BToaghjFDgAW69Sqo57r87gGdrhBJysK9cct8/XH3PkqKDsR6GgIQiGBDgAAzUGY\nPVRjrviprmt7jZbt/Fh5J7dp+6mdGpR8o4amDFRESHigIyJIcMQOAH7ULqqtHr3qQU3sfreiQ6P1\naX62Xvpqmj7L/1xO1pxHA+CIHQD8zDAM9WqTpu7xXZR18EtlHvibVu5ZrayDX+rmSwfp+qQ+ctj4\n5xn1Y5imaQY6RF0dP14c6AhBLyEhmjlbjBlbr6nNuMxVprUHvlDWob/LWeVUbFhL/aT99bqhXR9F\nhEQEOt6PamozbqoSEqJ9/h6KHdXwl9V6zNh6TXXGxc4SfZqfrb8f+VrOKqfC7WG6PqmPBna4Qa3C\nYwMdr5qmOuOmhmLHReMvq/WYsfWa+ozLXGX6++GvlX3o7ypyFstm2JTW+kpdn9RHXeKukM0I/OVR\nTX3GTUV9ip03cQCgkYl0RGropQM1MLm/vjm6SdmH/q7Nx/O0+Xie4sJbqV/ba9W37dWN7igejQNH\n7KiGV+HWY8bWC7YZm6ap/OKDWnf4a31zbLOcbpcMGeoYe6l6J/ZUz4T/UHRolF8zBduMGytOxeOi\n8ZfVeszYesE84/LKCn1zbLNyjm7SnjPn7h5nM2zq3Opy9Ujopu7xXf1yJB/MM25MOBUPAEEuIiRc\n/dv1Vf92fVVYcVobC7bo22O52nZqp7ad2ilphdpFtVX3+K66Mr6zLo3poBA+OtescMSOangVbj1m\nbL3mOOOT5aeUd3K78k5s087C3ao0qyRJoTaHUlteqk6tOqpTq47qEN2uQYq+Oc44EDhiB4BmKj4i\nTgPa99OA9v1UUXlWOwp3a/upXdp5eo+2F+7S9sJdkqQQw6720e10aUwHXRaTrOSY9modEd8orrRH\nw6DYASDIhIeEqUdCN/VI6CZJKnIWa1fhXu06vVf7iw7oQPEh7S86oM+1TpIUag9VuxaXqF1UW7WL\naqtLWrRRm8gEtQyNkWEYgfxRUA+WFbvb7daUKVO0c+dOORwOTZ06VcnJyZ7ns7KyNGfOHIWEhGj0\n6NG64447rIoCAM1aTGi0rk7soasTe0iSnFUuHSw+rH1F+TpUfESHS75XfvEh7Ss6UO37wuyhahOZ\noDYRrRUX3krxEa0UFx6n+PBWio51BOJHQR1YVuyZmZlyuVxasmSJcnNzNW3aNM2ZM0eS5HK5NG3a\nNC1fvlzh4eEaO3asBg0apPj4eKviAAD+KdTuUMfYS9Ux9lLPYy53pY6WFuhIyfc6VnZcBWXHdazs\nuI6WHtPB4sM/up9we5hahrVUy7AYtQyNVlRoC0U7ohQV2kJRjii1cEQqMiRCkY4IRYZEKtTOiwF/\nsKzYN27cqP79+0uSevTooby8PM9ze/bsUXJysqKjz10UcPXVVysnJ0e33HKLVXEAALVw2ELUITpJ\nHaKTqj3uNt06c7ZIJysKdaqiUCfLC3Wq4pTKzFIdLynU6bNndKysoE6/R4hhV1hImMLt4QoPCVO4\nPUyh9lCF2UPlsIUqzO5QqD1UIbYQOWwhctgcCrGFnPvPsMtusyvEFiK7YZfdsMlu2GUzbLIZNtlt\nNhmyyWYYnscMGbIZhgwZMgxDhmwyDMmQIcn4wa/lecvh/Pb556s/9uPbP6oB3sGwyZDUiC6eKykp\nUVTUvxZMsNvtcrvdstlsKikp8ZS6JLVo0ULFxVxdCQCNjc2wqVV47D8/G3+Z5/EfXhXvqnKpyFmi\nEleJSlylKnGWqthVojJXucoqy1XmKlNZZbnKKytUUXVWFZUVOlVRqIrKszLVZD6YFRAf3PVHn7/H\nsmKPiopSaWmpZ/t8qUtSdHR0tedKS0vVsmVLr/usz2X/8B1zth4zth4ztt4PZ5ykuAAmwQ9Z9vmG\nXr166YsvvpAkbd68WZ07d/Y8l5qaqvz8fJ05c0ZOp1M5OTnq2bOnVVEAAGg2LFugxjRNTZkyRTt2\n7JAkvfbaa9q6davKysp05513Kjs7W7Nnz5bb7daYMWM0btw4K2IAANCsNKmV5wAAQO1YaggAgCBC\nsQMAEEQodgAAggjFDgBAEGl0xe52uzV58mSlp6drwoQJOnCg+trFWVlZGjNmjNLT07Vs2bIApWza\nvM04IyNDd955p8aOHauXXnpJXF/pO28zPu/FF1/UG2+84ed0wcPbnLds2aLx48dr3Lhx+vWvfy2n\n0xmgpE2Xtxl/9tlnGj16tMaMGaPFixcHKGVwyM3N1YQJEy543OfeMxuZNWvWmM8++6xpmqa5efNm\n85e//KXnOafTaQ4ZMsQsKioynU6nOXr0aPPEiROBitpk1Tbj8vJy86abbjIrKipM0zTNxx9/3Fy7\ndm1AcjZltc34vMWLF5t33XWX+cYbb/g7XtCobc5ut9u8/fbbzQMHDpimaZpLly419+zZE5CcTZm3\nP8sDBw40z5w5U+3fZ/junXfeMYcPH27edddd1R6vT+81uiP2uq4x73A4PGvMwze1zTgsLExLly5V\nWFiYJKmyslLh4eEBydmU1Tbj889v2bJFd911F2dELkJtc963b59iY2M1f/58TZgwQUVFRUpNTQ1U\n1CbL259lh8OhoqIinT17VqZpcpvXekpJSdGsWbMu+PegPr3X6Iq9pjXmzz/HGvMXr7YZG4ahuLhz\nS0MuWrRI5eXl6tevX0ByNmW1zbigoECzZ8/W5MmTKfWLVNucCwsLtWnTJt19992aP3++vvrqK61f\nvz5QUZus2mYsSffff79Gjx6t4cOHa+DAgdW+FnU3dOhQ2e32Cx6vT+81umK3Yo15VFfbjM9vT58+\nXV999ZXeeuutQERs8mqb8Zo1a1RYWKif//znmjdvnjIyMrRy5cpARW3SaptzbGyskpOTlZqaqpCQ\nEPXv3/+Co014V9uMjxw5ovfee09ZWVnKysrSyZMn9cknnwQqalCqT+81umJnjXnr1TZjSZo8ebKc\nTqdmz57tOSUP39Q24wkTJuijjz7SokWL9Itf/ELDhw/XiBEjAhW1Sattzh06dFBZWZnnYq9vv/1W\nV1xxRUByNmW1zfjs2bOy2WwKDQ2VzWZTXFwcZ1EbWH16z7K7u9XXkCFDtG7dOqWnp0s6t8Z8RkaG\nZ435Z599VhMnTvSsMd+mTZsAJ256aptx9+7dtXz5cvXu3Vv33HOPJOnee+/VTTfdFMjITY63P8c/\nxHuS9edtzlOnTtUTTzwh0zTVq1cvDRgwIMCJmx5vMx45cqTS09MVFhamlJQUjRw5MsCJm7bz/x5c\nTO+xVjwAAEGk0Z2KBwAA9UexAwAQRCh2AACCCMUOAEAQodgBAAgiFDsAAEGEYgeaqZdfflkrVqzw\nbD/77LMqKChokH1nZmbqvffea5B9AfANxQ40U4ZheBbDyM7OVmJiYoMt+HTTTTfp008/1alTpxpk\nfwDqjmIHmpHp06fr5ptv1vjx47Vnzx7P4++++65nWdv/+q//0syZMyVJc+fO1WOPPVbj/pYtW6Yn\nnnjCsz1r1izNmzdP0rmbWnDUDvgfxQ40E2vWrFFeXp5Wr16tOXPmeNZQP336tPLz83XZZZdJkqZM\nmaKPPvpIa9as0YcffqiXX365xn3eeuutWr9+vcrLy2WaplatWuV5gdC7d29lZWVZ/4MBqKbRrRUP\nwBobNmzQzTffLLvdrpYtW2rw4MEyTVMHDhyodgo+Li5OzzzzjB599FG9/fbbiomJqXGfkZGRuvHG\nG7VmzRq1b99eycnJSkhIkCQlJSVp//79Vv9YAP4NR+xAM2EYRrX7aIeEnHtdb7fbL7gP9N69e9W6\ndes63eZ09OjRWrVqlTIyMjRq1Khq+//h7YAB+Ad/64Bmol+/flq9erWcTqdKSkqUnZ0twzDUvn17\nHT161PN127Zt08qVK7V8+XJ99NFH2r59e6377d27t44dO6YNGzZUuwvgoUOHlJKSYtnPA+DHUexA\nMzFo0CD169dPt912m372s58pNTVVktSyZUslJydrz549crlcmjRpkiZNmqTExEQ9/fTTmjRpkior\nK/Xyyy/X+J75kCFD1LdvXzkcDs9jX3/9tQYPHuyXnw3Av3DbVgDKyspSTk6OnnnmmRq/Jjs7Wzab\n7YJ7mjudTj3wwAN6/vnn1bVrV8/j48aN06xZsxQXF2dZbgAX4ogdgAYNGqTjx4/XukCNy+XSdddd\nV+2xgoIC3XDDDerZs2e1Ul+zZo1uueUWSh0IAI7YAQAIIhyxAwAQRCh2AACCCMUOAEAQodgBAAgi\nFDsAAEHk/wMRXd/z/HcbkwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#minhash for Jaccard distance\n", "d = 1 - s\n", "d_1, d_2 = 0.4, 0.6\n", "\n", "plt.plot(d[dd_2], p(s, r, b)[d>d_2])\n", "plt.xlabel('d(x, y)')\n", "plt.ylabel('Probility of being a candidate')" ] }, { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "**Amplifing a Locality-Sensitive Family** \n", "Given a $(d_1, d_2, p_1, p_2)$-sensitive family $F = {f_1, ..., f_r}$, we can construct a new family $F'$:\n", "1. **AND**-construction: \n", " $f(x) = f(y)$ when **all** $f_i(x) = f_i(y)$ \n", " $$(d_1, d_2, p_1^r, p_2^r)$$\n", " \n", "2. **OR**-construction: \n", " $f(x) = f(y)$ when **any** $f_i(x) = f_i(y)$ \n", " $$(d_1, d_2, 1-(1-p_1)^r, 1-(1-p_2)^r)$$\n" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "ename": "NameError", "evalue": "name 'OR_cons' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdis\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mpos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'origin'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdis\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mAND_pos\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'AND-cons'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdis\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mOR_cons\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlabel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'OR-cons'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 13\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlegend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloc\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'upper right'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 14\u001b[0m \u001b[0mplt\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mxlabel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'd(x, y)'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNameError\u001b[0m: name 'OR_cons' is not defined" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAecAAAFVCAYAAADVDycqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xl0nPWB7vnvW6uW0q6Srd2SLMm7jGwMNhhjbLOH4GBA\nLCZN0+k73bdvz6RDzyRzbhxyujM4Nzdnpk9C0umkw00TwtYkBAzBYGw2Y4N3W94l2ZIlWfu+VklV\n84ctxQYjb6p6a3k+5+gIVUn1PvxcVU+92+81/H6/HxEREQkZFrMDiIiIyPlUziIiIiFG5SwiIhJi\nVM4iIiIhRuUsIiISYlTOIiIiIeaSynnfvn2sXbv2C7dv3ryZNWvWUFFRwSuvvDLp4URERKKR7WK/\n8Mtf/pLXX3+d+Pj48273er2sX7+eV199lZiYGB566CFuueUW0tLSAhZWREQkGlx0zTk/P5+f/vSn\nfH6ukurqavLy8khISMBut7NgwQJ27NgRsKAiIiLR4qLlfOutt2K1Wr9we19fHwkJCeM/x8fH09vb\nO7npREREotBFN2t/mYSEBPr7+8d/7u/vJykpacK/eey338Pv913pIkOWnyuYAdU485dffAz/Obf4\nwTj7HR9+Y+y7Dz8+MEbPPs4l5vQ48Q3F4R+Owz8Uh28gEV9fEow6Lvj7FgNSE2Nwp8ThTo4lPTmW\nqWlx5E5JIHdKAkku5+X/f4uIyEVdcTkXFhZSW1tLd3c3sbGx7NixgyeeeGLCv/mPR79Pa6vWrieL\n3+9n1D+KZ9TL8OgwQ6PDDI0M43QZ1Le20ufpo9fbT4+nl86hLloH2ul0dOGn87zHSbKlkGKdSjKZ\nuLw5eIZs9A546en30Nk7xNHaTg6f7PjC8l2xdrLS4ynITKAoK4mi7CRSEqKjsN3uBD2XA0xjHHga\n4+BwuxMu/kufc8nlbBhnVtE2bNjAwMAADzzwAN/+9rd54okn8Pl8rFmzhoyMjMsOIFfOMAxshg2b\nxUacPXb8drc7gSzrhV9wXt8IHYMdNA20UtdzipNjX8OHgcMYGEzPKKDMPYcy92xSY1Lw+fx09Q3T\n3jNES+cgje39nG4boLGtn+Onujh2qgs4BUBqopOS3GTmFaYxpzANV6w9CCMhIhJZjGBflUqf0gLv\ncj8N+/w+mgdaOdR+lL2tlZzorh3fzD4nbQYr825menLB+Ae0cw17Rjlxuofqxm6qG8587x3wAmAY\nUJSdRFlRGotmTsGdHPuFvw9XWuMIPI1x4GmMg+NK1pxVzhHoal9w3cM97G87yKend3OipxaAaYl5\nrMxbRpl7Nhbjy48j9Pv9nGrpY391O/ur26lu7GbsGTYjL5kb52WyoDQDp/2LBxmGE72pBZ7GOPA0\nxsGhchZgcl9w1V0n2VT3AfvbDgKQGT+Fh2esoTAp/5L+vm/Qy57jrWw90HR28zfEOKwsnj2V267L\nIyNM16b1phZ4GuPA0xgHh8pZgMC84Jr6W3i39n22N+3EwOCmnMXcU3g7MbaYS36Mls4BPj7QxCeV\np+noGcZiGFw3K4M7r88n2+2a1LyBpje1wNMYB57GODhUzgIE9gVX1XWC3x15leaBFpKdSVSUrmZu\n+qzLeoxRn48dR1p4c1stDa1nTscrL3Gz+qZCstPjL/LXoUFvaoGnMQ48jXFwqJwFCPwLzusb4Z2T\nm9lYu4VR/yjLc27ka8V3T7gv+kJ8fj/7qtrY8EktJ073YLUYrFiQw1dvLCDWecVn+QWF3tQCT2Mc\neBrj4FA5CxC8F1xjXxO/Pvg8p/ubmZs+i8dnP4zTeuEJTSbi9/vZW9XGi+8dp7VriMR4Bw8sL2Lx\n7KkXPEI8FOhNLfA0xoGnMQ6OKylnXTJSrliWayrfWvC3zEgp5kDbIf6/3T+ne/jyX+iGYXBNsZt/\n/qvrWL20gKHhEX614TA/fH43bV2DAUguIhLaVM5yVWJtsfxt2V+yJPNa6nob+NHOn9DY13RFj2W3\nWfnKDQX88zeuo7zEzbH6br737Gd8eqh5klOLiIQ2lbNcNavFysMz1nBP4e10Dnfx/+7+OU39V16o\n6Umx/NfVc3jirpn4fPCL1w/y7xsOMTg8MompRURCl8pZJoVhGNw27RYemXE/AyODPLPv13QP91zV\n490wN5OnHr+WaVMT2FrZxPef3UFds/aPiUjkUznLpFqSdS13F9xGx1AnP9/3a4ZGhq7q8aakxvF/\nr13AHdfn0dI1yNO/3c3eqrZJSisiEppUzjLpbp92CzdkXcepvkZ+VflbRn2jV/V4NquF+2+ezn9d\nPQe/389PXt3Ppp2nJimtiEjoUTnLpDMMgwdL7mVO2kwOdxzj+SP/yWScsbegNIP/65FyEuIc/G7T\ncZ5/9xg+X1DPBBQRCQqVswSE1WLlL+c8Qn5iLp827WJT3QeT8rgFmYn898cWkO2O571d9fz09wfw\njlzdmrmISKhROUvAOK0O/mbe4yQ6EnijZiP1vY2T8rjpSbF855EFzJ6Wwt6qNn76+0oVtIhEFJWz\nBFSCw8WjM+9n1D/Kbw69iHfUOymPGxdj4+/XzGNuYRoHatpV0CISUVTOEnCz02awNHsxjf1NvHFi\n46Q9rt1m5e++Nme8oJ/5gwpaRCKDylmCYvX0u8iITWdz3Ucc66yetMcdK+g5hansr1ZBi0hkUDlL\nUDitDh6bVYFhGPzHoZcYHJm8ObPtNiv/7Wtzxwv6l28cwhfc67mIiEwqlbMETUFSHrfn30LncBev\nHHt9Uh97rKBLcpLYebSV/3x/8tbORUSCTeUsQXX7tBXkJWTzadMuqrpOTOpj221W/u6+eUxJjePt\nT+vYsqdhUh9fRCRYVM4SVFaLlftL7gXg91Ub8Pl9k/r4rlg737x/Hq5YO7995yj7q9sn9fFFRIJB\n5SxBV5iUz4KMMmp7TrGred+kP35GShx/v2YeNquFn/+xUhfLEJGwo3IWU3y16A5sFht/rP4Tnkk6\n9/lc07OT+Mbdsxj2jPIv/7mfnn7PpC9DRCRQVM5iirTYVJbn3EjncBdbTn0UkGUsnJHBfcsK6ewd\n5hevH9Q83CISNlTOYprbpi3HZY9nY+1mejyB2fR8x/X5lBWlcbi2k9e3Tu4BaCIigaJyFtPE2mK5\nq2AVw6MeNtS8E5BlWAyDJ+6eRXpSDG9sPUlljQ4QE5HQp3IWU92QdR1T4zL4pPEzGvpOB2QZrlg7\nf3PvHKxWg3974xAdPUMBWY6IyGRROYuprBYr906/Ez9+3j75XsCWU5CZSMWKYvoGvfz8j5WMjE7u\nKVwiIpNJ5Symm5M2k1xXFntaDtA6ELjNzsuvyWbRzAyqG3r4w4c1AVuOiMjVUjmL6QzDYGX+zfjx\ns+nUBwFdztdvn4E7OYa3P63jeH1XwJYlInI1VM4SEq5xzyUtJpXtp3cG7MhtgFinjSfumgXAv284\nzLBHV7ASkdCjcpaQYLVYWZF3EyO+Ed4/tTWgyyrJTea2RXm0dA3yyvtVAV2WiMiVUDlLyFicuRCX\nPZ4PG7YxNBLYI6pX31RAVno8m3c3cPBkR0CXJSJyuVTOEjIcVgc359zA4MggHzd+GtBl2W1W/uru\nmVgtBr9+8zADQyMBXZ6IyOVQOUtIuSlnCQ6rgy2nPmbEF9jCnDY1kbuXTKOzd5gXNh0L6LJERC6H\nyllCSrw9jhuyFtE13M2O5r0BX95di/PJn5rA1somDp7Q5m0RCQ0qZwk5K3JvwmJY2FT7Pn5/YC9W\nYbNaePyOGRgGPPfOUbwjOnpbRMyncpaQkxKTTHnGPJoGWqjqCvxkIXlTEli1MJeWzkHe3FYb8OWJ\niFyMyllC0o1Z1wGwtfGzoCzvqzcWkJLg5K3ttZxu7w/KMkVEvozKWULS9ORCMuLS2dN6gH7vQMCX\nF+u08fDKYkZG/fz2nWMB35wuIjIRlbOEJMMwWJK5iBHfCJ817Q7KMstL3Mw7e+3n7Yeag7JMEZEL\nUTlLyLo+cyFWw8onjZ8FZU3WMAweXVWCw2bhpfeO0z/kDfgyRUQuROUsISvB4WJe+iwa+5s42VMX\nlGWmJ8dyz40F9Ax4deUqETGNyllC2g1BPjAM4NZrc5mSGsf7exppaNPBYSISfCpnCWmlqdNJi0lh\nV/NeBgM83/YYm9XCg8un4/P7eXmzLowhIsGncpaQZjEsLM5chMfnZWcQZgwbUzY9jZn5KRyoaedA\nTXvQlisiAipnCQOLsxZiYLA1wBfDOJdhGFSsKMYw4KXNVYz6fEFbtoiIyllCXrIziTnpMzjV20Bd\nb33Qlpub4WLpvCwa2/r5YG9j0JYrIqJylrAwdmDY9tO7grrc1TcVEuOw8tpHJxjQqVUiEiQqZwkL\ns1JLibfFsadlPz5/8DYxJ8U7uGtxPn2DXt745GTQlisi0U3lLGHBarEyP2MuPZ7eoFwM41y3XptL\nelIMm3bW09o1GNRli0h0UjlL2Fg4pQyAnc37grpcu83K124qZNTn548fnwjqskUkOqmcJWxMTy4k\n0ZHA3pYDjPqCe93lRbOmkOOOZ1tlEw2tfUFdtohEnwnL2efzsW7dOioqKli7di11dedPofjuu+9y\n3333sWbNGl544YWABhWxGBbKM+bRPzLAkc7jQV62wdduKsIP/OEjrT2LSGBNWM6bNm3C6/Xy4osv\n8uSTT7J+/frz7n/66ad59tlneeGFF3j22Wfp7e0NaFiRBVPmA7AryJu24czEJEXZiew+1sqxus6g\nL19EoseE5bx7926WLl0KQFlZGZWVlefdb7fb6enpYXh4GL/fj2EYgUsqAhQk5pEak8K+1kq8o8E9\ntckwDNYsKwLgP946FNRli0h0sU10Z19fHy6Xa/xnq9WKz+fDYjnT6Y8//jj33XcfsbGx3Hrrref9\n7pdxuxOuMrJcikge5xunXcvrR96hfqSORVPnB3XZbncC7+5qYPfRFho7hygrcQd1+dEmkp/HoUJj\nHJomLGeXy0V//5+vynNuMTc2NvL888+zefNmYmNj+cd//Efefvttbr/99gkX2NqqTd+B5nYnRPQ4\nz0qYyeu8w+bj2yhwFgV9+XcvzmP30Rb+/fVK/vtjC7TFKEAi/XkcCjTGwXElH4Am3KxdXl7Ohx9+\nCMDevXspLS0dv294eBiLxYLD4cBisZCamqp9zhIUOa4sMuLSOdB2mKGR4aAvf9rURG6Yl8WJ0z3s\nOd4W9OWLSOSbsJxXrVqFw+GgoqKC9evX853vfIcNGzbw8ssvU1BQwOrVq6moqODhhx+mr6+P1atX\nByu3RDHDMFiQMR+vz0tlmzn7fh+5fQaGAX/8+AQ+v9+UDCISuQy/P7jvLNqEEnjRsKnqdH8z//zp\nj5mbPov/bd5fBH35bncCP/j37Ww/1Mx/+9pcrtG+50kXDc9js2mMg2PSN2uLhKrM+ClkuzI51H6U\nwZEhUzLcvWQaBvD61pME+TOuiEQ4lbOErTL3HEb9oxzuOGbK8rPS47l2Zga1zb3sq243JYOIRCaV\ns4SteemzANjfetC0DF9ZMg2AN7ae0NqziEwalbOErRxXFinOZCrbjwR9ru0x2W4XC0vdnDjdy4Ga\nDlMyiEjkUTlL2DIMg7npsxgcGaS627z5rr9yQwEAr2vtWUQmicpZwtr4pm2TTqkCyM1wUV7ipqax\nh0MnNee2iFw9lbOEteKUQmKsTva3HjJ1rfWeG6YB8EetPYvIJFA5S1izWWzMSiulfaiD0/3NpuXI\nm5LA/OnpVNV3c+xUl2k5RCQyqJwl7M1Lnw3A/jbzjtoGuGtxPgBvbq81NYeIhD+Vs4S92WmlWAyL\nqfudAYqykyjNTaaypoO6Zs26JCJXTuUsYS/OHsf05EJqe07RNdxtapY7z649v6W1ZxG5CipniQhj\nR21Xth02NcecglRyM1zsONJCS+eAqVlEJHypnCUizA2BU6rgzLnXd16fj98Pb392ytQsIhK+VM4S\nEdJjU8mKn8rRzipTrvF8roUz3LiTY/h4/2m6+8zNIiLhSeUsEWOeezYjvhGOmHQhjDFWi4Xbr8tn\nZNTHuzvrTc0iIuFJ5SwRY276TAAq24+YnARunDuVxHgHW/bUMzA0YnYcEQkzKmeJGHkJObjs8Rxq\nP2r6LF12m5VVC3MYHB5lyx6tPYvI5VE5S8SwGBZmpBbT7ekxdbawMcuvycHpsPLernpGRn1mxxGR\nMKJylogyK7UUgEMdR01OAnExNpaVZdHV5+HTQ+Z/WBCR8KFylogyI7UEgEPt5pczwMqFOVgMg42f\nnTJ9U7uIhA+Vs0SUJGcCua4sqrtOMDzqMTsO6UmxLJzhpr61j0O1upykiFwalbNEnJlppYz4Rzne\nWW12FABuW5QHwMbP6kxOIiLhQuUsEWfW2KbtENjvDFCQmUhJThKVNR00tPaZHUdEwoDKWSJOQVI+\nMVZnyOx3hnPWnndoSk8RuTiVs0Qcm8VGacp0WgfbaRloMzsOAGXF6UxJiWX7wSZN6SkiF6Vylog0\nM+3MKVWHTZ7Kc4zFMLj12lxGRv28t7vB7DgiEuJUzhKRxvY7Hw6R/c4AS+Zm4oq1s2V3PcPeUbPj\niEgIUzlLREqLTWVKnJujndV4faExt7XTbuXma7LoHxph28Ems+OISAhTOUvEmpVaimfUQ03XSbOj\njFt+TQ5Wi8F7O+s1KYmIfCmVs0SsmWmhdUoVQEqCk2tnZNDQ1q9JSUTkS6mcJWIVJxdis9hC5qCw\nMSsX5gLwrk6rEpEvoXKWiOWwOpieVEBD32m6h3vMjjOuMCuRouxE9le309wxYHYcEQlBKmeJaDNS\niwE42lllcpLzrTq79rxpl671LCJfpHKWiFaaMh0IvXIuL3GTkuDk4wOnGRgKjaPJRSR0qJwlouUk\nZBFni+VoR1VIHR1ts1q4pTybYc8oH+9vNDuOiIQYlbNENIthoSRlOp3DXbQOtpsd5zzL5mfjsFnY\ntKseny90PjiIiPlUzhLxQnXTtivWzuI5U2nrHmJfVWjMAS4ioUHlLBGvNDU0yxlg5YIcQAeGicj5\nVM4S8TJi00l2JnGsswqf32d2nPNku13MzE/hcG2nrvUsIuNUzhLxDMOgNGU6/d4BGvpOmx3nC1ac\nXXvW1apEZIzKWaJCqO53Bpg/PZ20RCefVJ5mYMhrdhwRCQEqZ4kK4/udO0KvnC0Wg+XlOXi8Pj4+\noKtViYjKWaJEsjOJqXEZVHXVMBIil5A8101lWdhtFjbvrscXQudji4g5VM4SNUpTp+PxeTnZE3oX\nnHDF2rlu1hRaOgeprOkwO46ImEzlLFFjfL9zx3GTk1zYivKzB4bptCqRqKdylqhRnFyEgRGSB4UB\n5E9NYHpOEgdqdLUqkWincpaoEWePJS8hhxM9dQyNDJsd54JWjp9WpbVnkWimcpaoUpo6HZ/fR1VX\njdlRLqi8xE2Sy8HWA6cZ8oTegWsiEhwqZ4kqY/udj3VWm5zkwmxWC8vnZzM4PMq2g81mxxERk6ic\nJaoUJuVjM6wc7wrNcga4aX4WVovB5t31IXWZSxEJHpWzRBWH1UF+Yh6nehsZ8A6aHeeCkl1OFpS6\naWjt59ipLrPjiIgJVM4SdUpSCvHjp7r7hNlRvtQtZ0+r2qz5tkWikspZok5JShEQuvudAYpzkshx\nx7P7WCtdfaF5ZLmIBM6E5ezz+Vi3bh0VFRWsXbuWurq68+7fv38/jzzyCA8//DDf/OY38Xg8AQ0r\nMhmmJY7tdw7NI7bhzJW0binPYdTn54O9jWbHEZEgm7CcN23ahNfr5cUXX+TJJ59k/fr14/f5/X7W\nrVvH+vXr+d3vfsfixYupr9e5mRL6HFY705LyqO9tZMAbupN9XD97CrFOK+/vbWBkNLSuQy0igTVh\nOe/evZulS5cCUFZWRmVl5fh9J06cIDk5mWeffZa1a9fS09NDYWFhYNOKTJKS5CL8+KnqCt39zjEO\nGzfMyaS7z8Oe421mxxGRIJqwnPv6+nC5XOM/W61WfL4zn+A7OzvZs2cPjz76KM8++yzbtm1j+/bt\ngU0rMkmKz+53DuVN2wDLy7MB2Kz5tkWiim2iO10uF/39/eM/+3w+LJYzfZ6cnExeXt742vLSpUup\nrKzk+uuvn3CBbnfC1WaWS6BxnlhS6mzs+2zU9J644rEKxhi73QmUFaez73gbAyN+8jMTA77MUKLn\nceBpjEPThOVcXl7Oli1buOOOO9i7dy+lpaXj9+Xm5jIwMEBdXR15eXns2rWLNWvWXHSBra29V59a\nJuR2J2icL8G0xDyquk5wsrGZeHvcZf1tMMf4xjmZ7DvexqvvHWPtbaUX/4MIoedx4GmMg+NKPgBN\nuFl71apVOBwOKioqWL9+Pd/5znfYsGEDL7/8Mg6Hgx/84Ad861vfYs2aNWRmZrJs2bIrDi8SbMUp\nob/fGWB+cRqpiU4+OdjE4LDm2xaJBhOuORuGwfe///3zbisoKBj/7+uvv55XXnklMMlEAqwkuZC3\ngONd1ZS5Z5sd50tZLRaWzc/mDx/WsO1g0/gEJSISuTQJiUStaYl52Cy2kJ6MZMxNZWPzbTdovm2R\nKKBylqhlt9opTMynsa+J/hA+3xkgKd7BwhkZNLZpvm2RaKBylqhWfHae7VC9vvO5ll9z9rQqzbct\nEvFUzhLVipNDf57tMZpvWyR6qJwlqk1LysNusYX8ZCRw/nzbH2q+bZGIpnKWqGa32ChImkZD32n6\nvP0X/wOTab5tkeigcpaoV5x85vTA6hA/3xnOzLe9ZE4mXX0e9lVpvm2RSKVylqhXnHxmCtpw2LQN\nOjBMJBqonCXqjZ3vXNUZHuWclR7PjLxkDtd2cro99DfFi8jlUzlL1LNb7UxLzKW+7zQD3kGz41yS\nsVnCtmjtWSQiqZxFOLNp24+f6u7Q3+8MML84nWSXg62VpxnyaL5tkUijchYBpofZfmeb9cx824PD\no2w/1Gx2HBGZZCpnEaAgKR+LYaGqMzzWnOGc+bZ3ab5tkUijchYBnFYH+Qm5nOprYGhkyOw4lyQl\nwck1JW7qW/uoaug2O46ITCKVs8hZxSmF+Pw+qrtrzY5yyW7RaVUiEUnlLHLW2H7ncLgIxpjSvGSy\n0uPZeaSF7n6P2XFEZJKonEXOKjq73/l4mJzvDGfm215+TfaZ+bb3ab5tkUihchY5K8YWQ64rm9re\nUwyPhs9a6JI5U3E6rHywt4FRn+bbFokEKmeRc0xPKcDn93EijPY7xzptLJk9lY6eYfZVtZsdR0Qm\ngcpZ5BzhNs/2mOXlYweG1ZucREQmg8pZ5BxFSQUYGGF1UBhAjttFSW4yh05qvm2RSKByFjlHnD2W\nHFcmJ3tO4R31mh3nstxydu1Z822LhD+Vs8jnTE8pZMQ3wsmeOrOjXJbyEjdJmm9bJCKonEU+J1z3\nO9usFm4+O9/2toOab1sknKmcRT7nzxfBCJ95tscsm392vu3d9ZpvWySMqZxFPifeHkdW/FROdNfi\n9YXX5uFkl5PyEjcNrf0cO9VldhwRuUIqZ5ELKE4pxOvzUttzyuwol23FghxA822LhDOVs8gFFCcX\nAeE1z/aY4pwkctzx7D7WSmfvsNlxROQKqJxFLmB6cgFAWM2zPcYwDG4pz2HU5+eDvVp7FglHKmeR\nC0hwuJgaP4WanlpGfaNmx7ls18+eQqzTxgd7GxkZ1XzbIuFG5SzyJYqTC/GMeqjrDb8pMWMcNm6Y\nO5Xufg+7jraaHUdELpPKWeRLFI9t2g7D/c4AK8rPHBj2nubbFgk7KmeRLzH97EFh4VrOU1LjmFuY\nRlV9N7VNvWbHEZHLoHIW+RJJzgQy4tKp6ToZlvudAVYsODPf9nu7tPYsEk5UziITKE4uYmh0mPq+\nRrOjXJE5hWlkpMSy/VAzvQMes+OIyCVSOYtMIFzn2R5jOXta1ciojw/3hecHDJFopHIWmUBxytly\nDsPzncfcODcTp93Klj0NjPp0WpVIOFA5i0wg2ZlEemwa1d0n8PnDs9jiYmwsmTOVjp5h9h5vMzuO\niFwClbPIRRQnFzI4MkRD32mzo1yxW87Ot60Dw0TCg8pZ5CLCfb8zQHZ6PDPzUzhS10V9a5/ZcUTk\nIlTOIhcxdn3nqjDe7wywUmvPImFD5SxyEWmxKaTFpFDVFb77nQHKpqeTnhTDtsom+ga9ZscRkQmo\nnEUuwfTkQvpHBjjd32x2lCtmsRisWJCDZ0SnVYmEOpWzyCUoSTkzleexzmqTk1ydpfOycDqsvLer\nXlerEglhKmeRS1A8Ns92mJdzXIyNG+dk0tk7zO5julqVSKhSOYtcgjP7nVM53lUT1vudAVYuPHNg\n2KadOjBMJFSpnEUuUUlKEQMjg2F9vjOcuVrVvKI0qhq6OXG6x+w4InIBKmeRSzR+vnOYb9oGWLUw\nF4B3d54yOYmIXIjKWeQSjR8U1hX+5TxrWgpZ6fHsONxCZ++w2XFE5HNUziKXKCUmmfTYtDPnO4f5\nBSQMw2DlwhxGfX627GkwO46IfI7KWeQylCQXMTgyxMmu8N8cvHj2VOJjbLy/pwGPd9TsOCJyDpWz\nyGUY27Rd2XLM5CRXz2m3smx+Nn2DXrYfCt/JVUQikcpZ5DKMXd/5YASUM8CKBTlYLQYbP6vD5/eb\nHUdEzpqwnH0+H+vWraOiooK1a9dSV1d3wd/77ne/y49//OOABBQJJcnOJDLi0jnSWsWoL/w3Back\nOFk0cwqn2weorGk3O46InDVhOW/atAmv18uLL77Ik08+yfr167/wOy+++CLHjx/HMIyAhRQJJcVn\n9zuf6ouMA6luW3TmtKqNn4X/fnSRSDFhOe/evZulS5cCUFZWRmVl5Rfu379/Pw8++CB+bRKTKBEp\n82yPyZuSwMz8FA7XdlLX3Gt2HBHhIuXc19eHy+Ua/9lqtY6fQtLS0sIzzzzDunXrVMwSVf48z3Z4\nX9/5XGNrz+/s0NqzSCiwTXSny+Wiv79//Gefz4fFcqbPN27cSGdnJ9/4xjdoa2tjaGiIoqIi7r33\n3gkX6HYnTEJsuRiNc+C4SSA7YSrVPSdJSYvDZrGaHemqLU9z8Z8f1PDZ4Wb++mvzSEuKNTsSoOdx\nMGiMQ9M6vBdzAAAdIUlEQVSE5VxeXs6WLVu444472Lt3L6WlpeP3rV27lrVr1wLwhz/8gZqamosW\nM0BrqzabBZrbnaBxDrDZGSW8U/0hu2oOU5iUb3acSbGiPJvfvH2Ul985ypqbi8yOo+dxEGiMg+NK\nPgBNuFl71apVOBwOKioqWL9+Pd/5znfYsGEDL7/88hd+VweESTSZPaUEiJz9znBmUpKEODvv72lg\nyDNidhyRqDbhmrNhGHz/+98/77aCgoIv/N7q1asnN5VIiJvtHivnKm6fdovJaSaHw25l+TXZvL71\nJFsPNLFiQY7ZkUSiliYhEbkCiTEJZLsyqe4+iWfUa3acSXNLeQ42q4WNn9UxGubzh4uEM5WzyBWa\nkVLMiG+Emu6TZkeZNInxDm6cl0lb9xA7j7SaHUckaqmcRa5QaWoxAEc6jpucZHLdtigXw4A/ba/V\naZIiJlE5i1yh6ckFWA0rRzurzI4yqaakxLGgNIO6lj4Onew0O45IVFI5i1whp9VBQVIep3ob6PcO\nmB1nUt15fR4Ab22vNTmJSHRSOYtchRkpxfjxR9QpVQDTpiaOT+l5sqnH7DgiUUflLHIVxvc7d0bW\nfmeAO68/M7nKW9svfDU6EQkclbPIVchPyCHG6uRYR2TtdwaYNS2FvCkudh1tobkzsjbbi4Q6lbPI\nVbBarBSnFNIy2Eb7YGQdPGUYBndcl4/fr8tJigSbylnkKpWmnNm0HWlHbQMsnOEmPSmGj/efprvf\nY3Yckaihcha5SjNSx8o58vY7Wy0W7rguj5FRH+98pn3PIsGicha5SlPjMkhyJHC0oyoiJ+24cV4m\nSS4Hm/c00DcYOVOVioQylbPIVTIMg9LUYnq9fTT2N5kdZ9LZbVZuX5THsGeUTTu171kkGFTOIpOg\nNGU6AEcjbCrPMTfPz8YVa2fTznoGh3U5SZFAUzmLTIKxcj4SgQeFATgdVm69NpeB4RE27643O45I\nxFM5i0yClJhkpsRlcLyrhhFfZK5Z3lKeQ6zTxsbPTjHsGTU7jkhEUzmLTJIZqcV4Rj3UdEfmfNRx\nMTZWLMihb9DLB/sazY4jEtFUziKTZFZqCQCH2o+anCRwVi3MwWm38vantXhHfGbHEYlYKmeRSVKS\nUoTNYuNQR+SWc0Kcg+XXZNPV5+HjA6fNjiMSsVTOIpPEYXVQnFxIQ99puoa7zY4TMLctysVus/DW\ntpOMjGrtWSQQVM4ik2hWWikAh9qPmZwkcJJcTm6en017zzAf7dfas0ggqJxFJtGs1LPlHMGbtgHu\nvD4Ph83Chk9O4h3Rkdsik03lLDKJpsS5SYtJ4UjHcUZ9kVtaSS4ny8uz6ewd5sN9WnsWmWwqZ5FJ\nZBgGM9NKGRwZ5GRPZE91ecd1+TjsFt7cdhKPN3I/iIiYQeUsMslmj23abj9icpLASox3sGJBDl19\nHj7Yq/OeRSaTyllkkpWkFGE1rBG/3xng9kV5OB1W3tpey7DWnkUmjcpZZJLF2GIoSi6grreBHk+v\n2XECKiHOwcoFOXT3e3h/T4PZcUQihspZJADGZgs7HMGnVI25bVEeMWNrz5pzW2RSqJxFAmD8fOco\n2LTtirVz67W59A54eUfXexaZFCpnkQDIip9KsjOJwx3H8Pkjfxat2xbl4Yq18/antfQNes2OIxL2\nVM4iAWAYBrNSS+j3DlDbE/nXP4512rh7cT6Dw6O8ue2k2XFEwp7KWSRAZqXNACL/lKoxy8uzSUt0\n8t6uBtq7h8yOIxLWVM4iATIjdTpWw8qB9sNmRwkKu83KV28sZGTUxx+3njA7jkhYUzmLBEisLZaS\nlCJO9TbQMdRpdpygWDJnKtnp8Ww9cJqGtn6z44iELZWzSADNS58FwP62QyYnCQ6LxeBrywrx++H3\nH1SbHUckbKmcRQJo7tlyPtAaHeUMMH96OtNzkthzvI2qhsi9rrVIIKmcRQIoJSaZvIRsjnVVM+Ad\nNDtOUBiGwZplRQC8sqUKv99vciKR8KNyFgmweelz8Pl9HIySo7YBSnKTuaY4neP13ew62mp2HJGw\no3IWCbB57rH9zgdNThJcDyyfjtVi8PKWKrwjmtZT5HKonEUCLCt+KmkxqRxqP4rXN2J2nKCZkhrH\nigU5tHUPsWln5E/EIjKZVM4iAWYYBvPcsxgaHeZ4Z3QdwfyVG6bhirWzYdtJevo9ZscRCRsqZ5Eg\nKEufDcC+KNu0HR9j56s3FjA4PMprH2tiEpFLpXIWCYLCpGnE2+I40HooKi6Eca5l87PITIvjg70N\n1Lf2mR1HJCyonEWCwGqxMid9Jt2eHk71NpgdJ6hsVgsPLJ+O3w8vbdapVSKXQuUsEiTjs4W1Rtem\nbYB5RWnMnpbCwRMd7K1qMzuOSMhTOYsEycy0UuwWW9Ttd4YzB8VVrCzBajF4YdNxhr06tUpkIipn\nkSBxWh2UphRzur+Z5oHom5gjOz2eVdfm0tY9xFvbas2OIxLSVM4iQVSeMQ+AXc17TU5ijntumEZK\ngpM/fVpLc+eA2XFEQpbKWSSI5rlnY7fY2Nm8LyoPjIpx2KhYUczIqJ/n3z0WlWMgcilUziJBFGuL\nYXbaTJoHWmjoO212HFMsLHUze1oKlTUd7D6mg8NELkTlLBJkC6fMB2BnlG7aNgyDh1edPTjsvWMM\ne3RwmMjnqZxFgmx22gxirE52tUTnpm2AzLR4br8uj46eYV7/RDOHiXyeylkkyBxWO/Pcs+kY6uRE\nT53ZcUxz9+JppCXGsPHTU9Q195odRySkqJxFTLAgowyI3qO2AZwOK1+/oxSf38+zbx1h1Bdd05qK\nTGTCcvb5fKxbt46KigrWrl1LXd35n/I3bNjAAw88wEMPPcT3vve9qN1EJ3K5ZqaWEG+LY3fL/qib\na/tccwrSuGHOVGqbe3nns1NmxxEJGROW86ZNm/B6vbz44os8+eSTrF+/fvy+oaEh/uVf/oXnnnuO\nF154gb6+PrZs2RLwwCKRwGqxMj9jLj2eXo531pgdx1QPrigmMc7Oax+foLlD5z6LwEXKeffu3Sxd\nuhSAsrIyKisrx+9zOp289NJLOJ1OAEZGRoiJiQlgVJHIEu1HbY9xxdp5eFUJ3hEf/+tPR/BpC5wI\ntonu7Ovrw+Vyjf9stVrx+XxYLBYMwyA1NRWA5557jsHBQZYsWXLRBbrdCVcZWS6FxjnwrnaM09Lm\n8R+Hk9jXXsnfpa7FZp3w5RjR7kx3saeqnU8PNrG7uoM7Fk8D9DwOBo1xaJrw3cDlctHf3z/+81gx\nn/vzj370I2pra/nJT35ySQtsbdVRmYHmdidonANsssZ4vnsuW059zIfHdjH37FWrotUDNxexv6qV\nX79eSWFGPKVFbj2PA0zvFcFxJR+AJtysXV5ezocffgjA3r17KS0tPe/+devW4fF4eOaZZ8Y3b4vI\npVuQcWbT9mdNu01OYr6UBCcP3lLMkGeUf3/zMD6fNm9L9JpwzXnVqlVs3bqViooKAJ5++mk2bNjA\nwMAAc+bM4dVXX2XhwoU89thjAHz9619n5cqVgU8tEiGmJeYyNS6D/a0H6fP043LEmx3JVEvnZbLn\nWCv7qtt54+MalszMMDuSiCkMf5DPf9ImlMDTpqrAm8wx3lz3Ia9WbeBr0+9mRd5Nk/KY4ay738N3\nf/Upw95R1n19Idlu18X/SK6I3iuCY9I3a4tI4C2augCbYWVr42eaKwBIinfwF3fMwDvi45dvHGJk\nNHrPA5fopXIWMZnLEU+Zew7NAy1Ud580O05IKC9xs2pRHnUtffzxY829LdFH5SwSAm7Iug6ATxo/\nMzlJ6Pirr84hPSmGt7bXcuxUl9lxRIJK5SwSAopTCkmPTWN3y34GvINmxwkJcTF2vvGVM6eX/fKN\nQ/QNek1OJBI8KmeREGAxLNyQuQivz8vO5j1mxwkZxTnJfGXJNNp7hvj1m4e1T16ihspZJERcl7kQ\ni2HRgWGfc88NBczMT2FvVRsbdXEMiRIqZ5EQkeRMYG7aTOr7GqnrrTc7TsiwWAz++p7ZJMU7ePWD\naqoaus2OJBJwKmeRELIkaxEAW3Vg2HmS4h38l3tm4/P7+dc/Vmr/s0Q8lbNICJmVVkqKM5mdzXsY\nGhkyO05ImZGfwr1LC+noGeZXGw7p6lUS0VTOIiHEYli4Mfs6hkc9Oq3qAu5anM+cglT2V7fzxtaT\nZscRCRiVs0iIuTH7ehwWO5tPfcyob9TsOCHFYhh84yuzSE+K4Y8fn2DnkRazI4kEhMpZJMS47PEs\nzlpE53AXu1r2mR0n5CTEOfj7++bhtFv51ZuHqGvW3NASeVTOIiFoRe5SLIaFTXUf6LSqC8jJcPGN\nr8zC4/Xxk1f309PvMTuSyKRSOYuEoLTYVK5xz6Wh7zSHO46ZHScklZe4Wb20gPaeYZ75wwFdIEMi\nispZJEStzF8GwKa6D0xOErruXjKNhTMyOF7fzX9sPKqtDBIxVM4iISovIYfSlOkc7azSpCRfwjAM\nnrhzJvlTEvh4/2ldwUoihspZJIStzDu79lyrtecv43RY+T/un0d6Ugyvbz3J+3sazI4kctVUziIh\nbGZqCdmuTPa0HqBtsMPsOCEryeXkWw/OxxVr57l3jrLnWKvZkUSuispZJIQZhsHKvGX4/D7te76I\nKalxfPOBMuw2C//6+kGq6jUHt4QvlbNIiFuQUYY7No2tjZ/SMqA1wokUZCbyt/fOZXTUz7/85z7q\nW/vMjiRyRVTOIiHOarFyT9Ed+Pw+Xq/ZaHackDevKI3H75xB/9AIP3phDw1t/WZHErlsKmeRMHCN\ney75ibnsadnPie46s+OEvBvmZvLY7aX0Dnj50Qt7ON2ugpbwonIWCQOGYbC66E4AXqt+U+fzXoKb\n52fz6K0l9PR7+B8v7KGpY8DsSCKXTOUsEiaKU4qYkzaTqq4TVLYfNjtOWLilPIeHVhbT3efhf/xu\nN82dKmgJDypnkTDy1aI7MDB4rfpPumLVJVq1MJeKW6bT1edh/fO7dZCYhAWVs0gYyXJN5frMhTT1\nN/Np0y6z44SNWxflUbHizBr0+t/u1mlWEvJUziJh5q6CVdgtNjbUvMPQyLDZccLGrdfm8sRdMxny\njPI/X9zD/up2syOJfCmVs0iYSYlJZkXeMro9Pbxe87bZccLKDXMz+bv75uIHfvLqfrYfbDI7ksgF\nqZxFwtDt+bcwJS6DD+q3UtWliz1cjvnT0/nWg/Nx2K382xuHeHPbSR39LiFH5SwShuxWO4/OvB8D\ng98efhnPqMfsSGGlJDeZbz9STmqik1c/qOGXGw7hHdEBdhI6VM4iYaowKZ/luTfSOtjOhpp3zI4T\ndnIzXHz3sYUUZSWy/WAzP/zdHrr6tA9fQoPKWSSMfaXwNtyxaWw+9REnumvNjhN2klxO/s+Hr2HJ\nnKnUNPbwT7/ZycmmHrNjiaicRcKZw+rgkRn348fPc4dfwTvqNTtS2LHbrDxx10zuv7mIrt5h/p/n\ndvHernrthxZTqZxFwlxxSiHLcpbQPNCio7evkGEY3HF9Pv/7/WXEOGw8/+4xfvaHSgaG9GFHzKFy\nFokA9xTeQUZcOptPfcSOpj1mxwlb84rS+P5fLqIkN5ldx1p56tkd1DRqM7cEn8pZJALE2Jz8l7l/\nQYw1huePvEJdT73ZkcJWSoKTf3xoPl9ZMo327iGe/u0uXv/4BCOjPrOjSRRROYtEiKnxGTw++yFG\nfKP84sBv6PH0mh0pbFktFlbfVMi3KuaTGO/gtY9P8M+/2Ulds8ZUgkPlLBJB5qTP5J7C2+ka7uaX\nB55jxDdidqSwNmtaKv/0xHUsnZdJXUsf//Sbnbz2UY3WoiXgVM4iEWZV/s0syCijpvskLx97TUcd\nX6W4GBuP3zmTf3igjCSXg9e3nuSpZ3dw6GSH2dEkgqmcRSKMYRg8OvN+cl1ZbG38jNeq31JBT4I5\nhWn80xPXsfyabE639fM/X9zLM384QFv3oNnRJAJZn3rqqaeCucCBAU0zGGjx8U6Nc4CF+hhbLVbm\nps+isv0IB9oOMTgyxMzUEgzDMDvaJQvFMbbbLJRNT2f+9HQaWvs5eKKD9/c24vP5KZiaiM0aXus7\noTjGkSg+3nnZf6NyjkB6wQVeOIxxjM3JNRlzOdR+lAPth+nz9jMrrTRsCjqUxzjZ5eTGeZlMSYnj\nWH0X+6ra+WhfI1aLhbwpLqyW8CjpUB7jSKJyFkAvuGAIlzF2Wp1ckzGPwx3HqGw/TI+nh9lpM8Ki\noEN9jA3DIDfDxbKyLGxWg6P13eytamPrgSZiHFZy3C4sltAe51Af40ihchZAL7hgCKcxdlodXJMx\nj6OdVVS2H6Gpv5mZaaXYLTazo00oXMbYbrMwIz+FZWVZAByr62L3sTY+qWzCALLd8SG7uTtcxjjc\nXUk5G/4gHynS2qrzBAPN7U7QOAdYOI7xgHeQf93/v6juPkFGXDp/NWct2a5Ms2N9qXAcY4CuvmHe\n/KSWj/Y34hnxEee0sbw8mxULckh2Xf6bdCCF6xiHG7c74bL/RuUcgfSCC7xwHeNR3yh/rPkT79V9\niN1i58HS1SzOXGh2rAsK1zEe0zvgYcvuBt7bXU/vgBerxWBBqZtlZVmU5qdgCYFdC+E+xuFC5SyA\nXnDBEO5jvK/1IM8dfonBkSGuz1zIfdO/Qpw91uxY5wn3MR7j8Y7yycEm3t1xitPtAwBkJMdy0/ws\nbpgzlSQT16YjZYxDncpZAL3ggiESxrhtsJ1fHXiOU32NuOzx3FN4O4uzrsVihMb+0UgY43P5/X6q\nGrr5YG8jO4604B3xYRgwIy+F62ZNobzEjSvWHtRMkTbGoUrlLIBecMEQKWPs9Y2w5dRH/Onke3hG\nPeQmZHN/8VcpSp5mdrSIGeML6R/ysv1gM9sPNVHdcOaqV1aLweyCVK4pTmdeUTopCYFfo47kMQ4l\nKmcB9IILhkgb467hbl6r+hM7mncDMC99Nrfk3sj05ELTTruKtDH+Mm1dg+w40sKnh5upa+4bvz1/\nSgJl09OYU5jGtKkJATniO1rG2GwqZwH0gguGSB3jmu6TvHp8Ayd76gDIdWWxPHcpC6aUYQvyqVeR\nOsYTaekcYF9VO/uq2zha18Wo78zbs9NhpTgniZl5KczITyE3wzUpZR2NY2wGlbMAesEFQySPsd/v\np6a7li2nPmJvayV+/CTYXZRlzOEa91yKkwuxWqwBzxHJY3wpBodHOHiig0O1nRyp7aSpY2D8PrvN\nQv6UBAqzEinMSmRaZiLpSTGXfQR4tI9xsKicBdALLhiiZYzbBzv5oH4rnzbtos/bD0C8LY657lnM\nTptBUdI0kpyJAVl2tIzxpersHeboqU6O1XVR3dhDfWsf5757j81KlpvhIifDRWZqHFPT4kiKd3zp\nrgmNcXBMejn7fD6eeuopjh07ht1u5wc/+AF5eXnj92/evJmf/exn2Gw27rvvPu6///6LLlBPhMDT\nCy7wom2MR32jVHefYE9LJftaD9Dt+fP/e3psGkVJ0yhIyifbNZXM+CnE2q7+tKxoG+PLNewZ5WRT\nDzWne6hr7uNUSx9N7QP4PveWHuu0MiUljoyUWNzJsaQlxZCeFENaYgwlBen09uiqWoE26eX8zjvv\nsGXLFp5++mn27dvHL37xC372s58B4PV6ueuuu3j11VeJiYnhoYce4he/+AVpaWkTLlAvtsDTm1rg\nRfMY+/w+antOcbyzhuruE1R31zI4cv4bfLIzicz4Kbhj00iJSSbVmUxKTArJziQSHPE4rI6LLiea\nx/hKeUdGaWwboL61j6aOAZraB2jqGKC5c4CR0Qu/1cc6bSS7HKQkOEmKd5AQ5yAhzj7+PT7GTnyM\njbiz3x32wO/SiDRXUs4THuGxe/duli5dCkBZWRmVlZXj91VXV5OXl0dCwpmFLliwgB07dnD77bdf\ndggRCR8Ww0JBUj4FSfnAcnx+H039LZzsqeN0f/P41+GOYxz+ksdwWOy4HC5c9njibLHE2JzEWGPO\nfndit9pJ6UjAMziK3WLHbrFhtdiwGVasFis2w4rFsGIxDCyGZfy/Dc78bIz/twEYGDB+25ixTb1n\n7z17G+f9zpe5lN+5lF8JhLQ0C2lpiZTx590Nfp+f7n4P7T1DdPQM0d4zTEfPEAOeUdo6B+jq6+N0\nd9clPb7NYuB02Ih1WHE6rDjtZ747bFYcNgsOx5nvdqsFm82C3WbBZjGwWS1YrWe+26wWrIaBxWJg\ntZ75dxr7bljAcvbfzmIxwODM7ed8xzCwAJz99xrbav/nf1P+/P3zm/SN875NaDLOVLBZjMkv576+\nPlwu1/jPVqsVn8+HxWKhr69vvJgB4uPj6e3Vp1yRaGMxLGS5ppLlmnre7YMjg3QMddEx1EnnUBcd\nQ110DXfT5+0/8+Xp53R/E17fiEnJo1zM2a+zHX45OyJ8QP/Zrwl/yXP2K8q9XPDzy/6bCcvZ5XLR\n3//n4R8rZoCEhITz7uvv7ycpKemiC7ySTxBy+TTOgacxvpgE8sgwO4RIWJrwRLny8nI+/PBDAPbu\n3Utpaen4fYWFhdTW1tLd3Y3H42HHjh3Mnz8/sGlFRESiwIQHhPn9fp566imOHj0KwNNPP83BgwcZ\nGBjggQceYMuWLTzzzDP4fD7WrFnDww8/HLTgIiIikSro5zmLiIjIxELj8jMiIiIyTuUsIiISYlTO\nIiIiIUblLCIiEmICUs4+n49169ZRUVHB2rVrqaurO+/+zZs3s2bNGioqKnjllVcCESHiXWyMN2zY\nwAMPPMBDDz3E9773PXTc3+W72BiP+e53v8uPf/zjIKeLHBcb5/379/PII4/w8MMP881vfhOPR7Na\nXK6LjfG7777Lfffdx5o1a3jhhRdMShkZ9u3bx9q1a79w+2X3nj8ANm7c6P/2t7/t9/v9/r179/r/\n5m/+Zvw+j8fjX7Vqlb+np8fv8Xj89913n7+trS0QMSLaRGM8ODjoX7lypX9oaMjv9/v9//AP/+B/\n7733TMkZziYa4zEvvPCC/8EHH/T/+Mc/Dna8iDHROPt8Pv9Xv/pVf11dnd/v9/tfeuklf3V1tSk5\nw9nFnsvLly/3d3d3n/f+LJfv3/7t3/x33323/8EHHzzv9ivpvYCsOV/qnNx2u318Tm65PBONsdPp\n5KWXXsLpdAIwMjJCTEyMKTnD2URjPHb//v37efDBB7Vl4ipMNM4nTpwgOTmZZ599lrVr19LT00Nh\nYaFZUcPWxZ7Ldrudnp4ehoeH8fv9kzKndDTKz8/npz/96RfeD66k9wJSzl82J/fYfZqT++pNNMaG\nYZCamgrAc889x+DgIEuWLDElZzibaIxbWlp45plnWLdunYr5Kk00zp2dnezZs4dHH32UZ599lm3b\ntrF9+3azooaticYY4PHHH+e+++7j7rvvZvny5ef9rly6W2+9Fav1i1ftupLeC0g5B2JObjnfRGM8\n9vMPf/hDtm3bxk9+8hMzIoa9icZ448aNdHZ28o1vfINf/vKXbNiwgddee82sqGFtonFOTk4mLy+P\nwsJCbDYbS5cu/cJan1zcRGPc2NjI888/z+bNm9m8eTPt7e28/fbbZkWNSFfSewEpZ83JHXgTjTHA\nunXr8Hg8PPPMM+Obt+XyTDTGa9eu5fe//z3PPfccf/3Xf83dd9/Nvffea1bUsDbROOfm5jIwMDB+\nANOuXbsoLi42JWc4m2iMh4eHsVgsOBwOLBYLqamp2po5ya6k9ya8KtWVWrVqFVu3bqWiogI4Myf3\nhg0bxufk/va3v80TTzwxPid3RoauXHO5JhrjOXPm8Oqrr7Jw4UIee+wxAL7+9a+zcuVKMyOHnYs9\nj8+lfXRX7mLj/IMf/IBvfetb+P1+ysvLWbZsmcmJw8/Fxnj16tVUVFTgdDrJz89n9erVJicOb2Pv\nB1fTe5pbW0REJMRoEhIREZEQo3IWEREJMSpnERGREKNyFhERCTEqZxERkRCjchYREQkxKmcREZEQ\n8/8DDYc6fwsbIJQAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#AND-construciton and OR-constuction\n", "\n", "dis = d\n", "pos = p(s, r, b)\n", "\n", "r = 4\n", "AND_pos = (pos**r)\n", "OR_pos = 1 - (1 - pos)**r\n", "\n", "plt.plot(dis, pos, label='origin')\n", "plt.plot(dis, AND_pos, label='AND-cons')\n", "plt.plot(dis, OR_cons, label='OR-cons')\n", "plt.legend(loc='upper right')\n", "plt.xlabel('d(x, y)')\n", "plt.ylabel('P[f(x) = f(y)]')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As shown in the figure above, \n", "+ AND-construction: low probability close to 0, namely, false negative$\\uparrow$, false postive$\\downarrow$. \n", "\n", "+ OR-construction: high probabiity close to 1, namely, false negative$\\downarrow$, false postive$\\uparrow$.\n", "\n", "用 AND- 和 OR- 可以组合成需要的 $(d_1, d_2, p_1, p_2)$。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**LSH families for Distance measures** \n", "1. For **Jaccard distance**, \n", " the family of **minhash functinos** is a $(d_1, d_2, 1-d_1, 1-d_2)$-sensitive family for any $d_1$ and $d_2$ where $0 \\leq d_1 < d_2 \\leq 1$.\n", "\n", "2. For **Hamming Distance**, \n", " given d-dim vector $x$ adn $y$, their hamming distance is $h(x, y)$. \n", " define: $f_i(x)$ to be the $i_{th}$ bit of $x$, \n", " then for a randomly chosen $i$, $P[f_i(x) = f_i(y)] = 1 - h(x,y)/d$. \n", " \n", " $F={f_i(x)}$ is $(d_1, d_2, 1 - d_1/d, 1 - d_2/d)$-sensitive for any $d_1 < d_2$ \n", " + $h(x, y) \\in [0, d]$\n", " + $max \\, |F| = d$, \n", " 函数个数有限,若 $d$ 特别小,会限制了使用 AND- 和 OR- 组合的可能性。\n", " \n", "4. For **Euclidean Distance**, \n", " $f$ is associated with a randomly chosen line which is divided into segments of length $a$, and the segments of the line are the buckets into which function $f$ hashed points. as shown in Fig (3.13). \n", " $(a/2, 2a, 1/2, 1/3)$\n", " \n", "3. For **Cosine Distance**, \n", " $x$ 和 $y$ 的夹角是 $h(x,y$. \n", " 任选一超平面 $S_i$,取其法向量 $v_i$, \n", " 则当 $S_i$ 位于夹角内时,$v_i \\cdot x$ 和 $v_i \\cdot y$ 符号不同;夹角外时,符号相同。具体见书中图 3.12。\n", " \n", " 定义: \n", " $f(x) = f(y)$ 有且仅有 $v_f \\cdot x$ 和 $v_f \\cdot y$ 符号相同。\n", " \n", " $(d_1, d_2, (180-d_1)/180, (180-d_2)/180)$ \n", " \n", " + 感觉这个函数计算量,未必比直接计算余弦距离小 \n", " **sketches**: \n", " 有随机向量集合 $V = \\{v_1, ..., v_n\\}$,其中 $\\forall \\, v_i \\in \\{+1, -1\\}^d$ \n", " 则 sketches of $x$ is \n", " \n", " \\begin{align} \n", " sx & = [sx_1, ..., sx_n] \\\\\n", " & = [I(v_1 \\cdot x), ..., I(v_n \\cdot x)]\n", " \\end{align} \n", " \n", " 其中 \n", " \n", " \\begin{equation}\n", " I(x) = \\begin{cases}\n", " +1, & \\quad x > 0 \\\\\n", " -1, & \\quad x < 0 \\\\\n", " +1, \\, \\text{or} \\, -1 & \\quad x = 0\n", " \\end{cases}\n", " \\end{equation}\n", " \n", " 比较两个向量 $x$ 和 $y$ 的相似性,可转化为 $$P[f(x) = f(y)] = \\frac{len(\\text{intersection}(sx, sy))}{len(sx)}$$" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x:[3 4 5 6], y:[4 3 2 1]\n", "angle between x and y: 38.048\n", "\n", "v:[[ 1 -1 1]\n", " [-1 1 1]\n", " [ 1 -1 -1]\n", " [ 1 1 -1]]\n", "sketch of x: [ 1 1 -1]\n", "sketch of y: [ 1 -1 1]\n", "p = same elemets / total elements: 1 / 3\n", "angle(sketch): 120.0\n", "\n", "v:[[-1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1]\n", " [-1 -1 -1 -1 1 1 1 1 -1 -1 -1 -1 1 1 1 1]\n", " [-1 -1 1 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 1]\n", " [-1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1]]\n", "sketch of x: [-1 -1 -1 1 -1 1 -1 1 -1 -1 -1 1 -1 1 1 1]\n", "sketch of y: [-1 -1 -1 -1 -1 -1 -1 1 -1 -1 1 1 1 1 1 1]\n", "p = same elemets / total elements: 12 / 16\n", "angle(sketch): 45.0\n" ] } ], "source": [ "#example 3.21\n", "x = np.arange(3, 7)\n", "y = np.arange(4, 0, -1)\n", "print('x:{}, y:{}'.format(x,y))\n", "\n", "from scipy.spatial.distance import cosine \n", "\n", "print('angle between x and y: {:.3f}\\n'.format(np.rad2deg(np.arccos(1-cosine(x,y)))))\n", "\n", "def sketch_calc(x, v): \n", " sketch = np.dot(x, v)\n", " sketch[sketch>0] = 1\n", " sketch[sketch<=0] = -1\n", " return sketch\n", "\n", "v = [[1, -1, 1, 1], [-1, 1, -1, 1], [1, 1, -1, -1]]\n", "v = np.array(v).T\n", "print('v:{}'.format(v))\n", "\n", "x_s = sketch_calc(x, v)\n", "print('sketch of x: {}'.format(x_s))\n", "y_s = sketch_calc(y, v)\n", "print('sketch of y: {}'.format(y_s))\n", "print('p = same elemets / total elements: {} / {}'. format(sum((x_s-y_s)==0), len(x_s)))\n", "\n", "def sketch_angle_calc(x_s, y_s):\n", " \"\"\"calculate angle when similar possibily of two vectors are determined.\n", " \n", " p = (180 - d) / 180\n", " \"\"\"\n", " p = sum((x_s-y_s)==0) / len(x_s)\n", " d = 180 - 180 * p \n", " return d\n", "\n", "print('angle(sketch): {}\\n'.format(sketch_angle_calc(x_s, y_s)))\n", "\n", "v = list(itertools.product([-1,1],repeat=len(x)))\n", "v = np.array(v).T\n", "print('v:{}'.format(v))\n", "\n", "x_s = sketch_calc(x, v)\n", "print('sketch of x: {}'.format(x_s))\n", "y_s = sketch_calc(y, v)\n", "print('sketch of y: {}'.format(y_s))\n", "print('p = same elemets / total elements: {} / {}'. format(sum((x_s-y_s)==0), len(x_s)))\n", "print('angle(sketch): {}'.format(sketch_angle_calc(x_s, y_s)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.8 Applications of Locality-Sensitive Hashing\n", "##### 1. Entity Resolution\n", "to match data records that refer to the same real-world entity. \n", "找到同一对象在不同数据库里的数据\n", "\n", "eg: A 和 B 公司对账(记录方式不尽相同) \n", "1. 相似度比较: \n", " 使用\"姓名“,”电话“,”地址信息“,三个分别用编辑距离计算,评分(d=0-1, score=100-0),累积总值就是相似度评分 $s$。 \n", " + certain publicly available tables were used to reduce the penalty in appropriate situations, like \"Bill\" and \"William\" are treated as 1 letter.\n", "\n", "2. 全部比较太耗时,所以用 LSH 先找出潜在配对。 \n", " 对三组分别哈希,用 AND- 构建。 \n", " 实际中的方法是直接排序:名字,电话,地址均按升序排列,相似的就聚焦在一起。\n", " \n", "3. 评价匹配正确率 $f$。 \n", " 使用其它未使用的记录来做为评据。 \n", " 如这里使用两个数据库配对项目的创建时间差 $t$ 做为度量: \n", " 1. $\\bar{t} = E[\\text{random_pick}(t)]$: 随机配对的时间差期望\n", " 2. $t_p = E[t_{xy}: argmax_{x,y} s(x,y)]$: 最大 $s$ 数据集的时间差期望(假设全是正确匹配,$f=1$)。\n", " 3. 假设现在用 $s_i$ 做阈值,得到的匹配对算出均值 $t_i$,其正确匹配率是 $f_i$, \n", " 则有 $t_i = t_p * f_i + \\bar{t} * (1 - f_i)$ \n", " 解得:\n", " $$f_i = \\frac{\\bar{t} - t_i}{\\bar{t} - t_p}$$ \n", " \n", " 即,我们可以用结果中的其他数据项,组成评价匹配正确率的函数。\n", " \n", "\n", "##### 2. Matching Fingerprints\n", "特征描述: minutiae 构建指纹的特征向量。 \n", "\n", "两类问题:\n", " 1. 一对多:找出某个指纹的匹配\n", " 2. 多对多:找出潜在相同的匹配\n", " \n", "假设随机抽取一个指纹的特征向量的一位,其中有 minutiae 的概率是 $P[x] 20\\%$。 \n", " 同一来源的两个指纹在特征向量的一位,一个有 minutiae 时,另一个也有的概率是 $P[Y|X] = 80\\%$。 \n", "定义**哈希函数** $f$ 为: \n", " 抽取特征向量的三位,若两个指纹的每一位均含有 minutiae,则做为候选对。 \n", "\n", "分析:随机对候选的概率是 $0.2^6 = 0.000064$, 同一来源候选的概率是 $0.2^3 \\times 0.8^3 = 0.004096$,\n", "\n", "我们生成 1024 个 $f_i$ 哈希函数,使用 OR- 构建,再用 AND- 构建,结果见下图。" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dissimilar:0.0001, similar:0.0041\n", "dissimilar:0.0634, similar:0.9850\n", "dissimilar:0.0040, similar:0.9703\n", "248.496189026\n" ] } ], "source": [ "#Example 3.22\n", "p = np.array([0.2**6, (0.2**3)*(0.8**3)])\n", "print('dissimilar:{:.4f}, similar:{:.4f}'.format(p[0], p[1]))\n", "\n", "#1024 OR-cons\n", "p = 1 - (1 - p)**1024\n", "print('dissimilar:{:.4f}, similar:{:.4f}'.format(p[0], p[1]))\n", "\n", "#then 2 AND-cons \n", "p = p**2\n", "print('dissimilar:{:.4f}, similar:{:.4f}'.format(p[0], p[1]))\n", "\n", "print 1/p[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### 3. Matching Newspaper Articles\n", "difference of text in prose and in ads/headlines: \n", "Prose has a much greater frequency of stop words ====> define a _shingle_ to be a stop word followd by the next two words." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3.9 Methods for High Degrees of Similarity\n", "\n", "**LSH-based methods** appear most **effective** when the degree of similarity we accept is **relatively low**.\n", "\n", "##### 1. Finding Identical Items\n", "methods: \n", "1. hash the first few char and compare. \n", " + cons: some doc has the same header, such as html. \n", "2. hash the whole document. \n", " + cons: examine every char of every doc. \n", "3. pick some fixed random pos, then hash and compare. \n", " + prob: exclude the docs whose length are different.\n", "\n", "##### 2. Finding highly similar items\n", "all pairs in the sets have a high Jaccard similarity, say at least 0.9.\n", "\n", "**representing** sets as strings: \n", "represent a set by **sorting** the elements of the universal set in some **fixed order** and constructing a string of \"characters\". \n", "+ No char appears > 1 time \n", "+ if 2 char in 2 different strings, they are in the same order in both string. \n", "+ For documents, the element is shingle.\n", "\n", "eg: 26 lower-case letters, normal alphabetical order: {d, a, b, b} => 'abd' \n", "\n", "1. Length-Based Filtering: sort the strings by length \n", " each string $s$ is compared with those strings $t$ that follow $s$ in the list ($L_s \\leq L_t: L_x = len(x)$). \n", " $J_u$ is the upper bound on Jaccard distance between two strings. \n", "\n", " SIM(s, t) is at most $L_s/L_t$, and lower bound = $1 - L_s/L_t < J_u$ = upper bound. \n", " hence, $$L_t \\leq \\frac{L_s}{1 - J_u}$$.\n", "\n", "2. Prefix Indexing: create an index for the firs $p$-symbols of $s$. \n", " $p$: make sure that at least 1 shared symbol in two strings when $SIM(s, t) \\geq J$ \n", " \n", " 从相反情况讨论,假设 $SIM(s, t) \\geq J$ 但前 $p$ 字符无一相同,则此时 $max \\, SIM(s, t) = (L_s - p) / L_s$。那么只要 $J > (L_s - p) / L_s$ 则能保证前 $p$ 字符至少有一个相同。有: \n", " $$p = \\lfloor (1 - J)L_s \\rfloor + 1$$ \n", " \n", " 1. Using Position Information: 使用位置信息来判定是否需要比较 \n", " 索引是 $(x, j)$ \n", " 若 $s[1:i-1] \\cap t[1:i-1] = \\emptyset$ 并且 $s[i:-1] = t[j:-1]$ \n", " 则其 $$SIM(s, t) = \\frac{L_s-(i-1)}{L_s+(j-1)} \\geq J$$ \n", " 可解得 $$j \\leq (L_s(1 - J) - i + 1 + J)/J$$\n", " \n", " 2. Using Position and Length in Indexs: \n", " 索引是 (x, j, suffix length). \n", " 若有 (s_i, i, p) 和 (t_j, j, q),前面部分无一相同,后面部分两者是子集关系。 \n", " 则分情况讨论: \n", " + 若 $p \\geq q$,则 $$\\frac{L_s - (i-1) - (p-q)}{L_s + j - 1} = \\frac{q + 1}{L_s + j - 1} \\geq J$$ \n", " + 若 $p < q$, 则 $$\\frac{L_s - (i-1)}{L_s + (j-1) + (q-p)} = \\frac{L_s - i + 1}{i + j - 1 + q} \\geq J$$\n", " \n", " \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }