{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Driven Modeling\n", "
\n", "### PhD seminar series at the Chair for Computer Aided Architectural Design (CAAD), ETH Zurich\n", "\n", "\n", "[Vahid Moosavi](https://vahidmoosavi.com/)\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Ninth Session \n", "
\n", " 29 November 2016\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Markov Chains\n", "## Introduced by Andrei Markov in 1906\n", "![](Images/AAMarkov.jpg)\n", "## His original work was on sequences of characters in language\n", "## One of the earliest data driven models of language\n", "## Nevertheless, he didn't succeed as his model is data and computation intensive.\n", "## Later it was used extensively to study dynamic (stochastic) systems.\n", "## Recently, it has been used as a data driven representation approach.\n", "\n", "\n", "# Therefore, we discuss Markov Chains from the following aspects:\n", "\n", "* **From the point of view of dynamical systems**\n", "* **From the point of view of object representation**\n", "* **Properties and applications**\n", "* **Extensions to machine learning applications**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Topics to be discussed \n", "\n", "* **Text generation with Markov Chains**\n", "* **Markov Chains from the point of view of relational representation**\n", "* **Neuro-probabilistic models of language**\n", "* **Natural Language Modeling problems**\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import warnings\n", "warnings.filterwarnings(\"ignore\")\n", "import pandas as pd\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "pd.__version__\n", "import sys\n", "from scipy import stats\n", "import time\n", "import pysparse\n", "from scipy.linalg import norm\n", "import sompylib.sompy as SOM\n", "\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generative examples of Markov chains" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Example: sequences of characters in English texts" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "25000" ] }, "execution_count": 2,
"metadata": {}, "output_type": "execute_result" } ], "source": [ "with open('./Data/IMDB_data/pos.txt','r') as infile:\n", "    reviews = infile.readlines()\n", "len(reviews)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'Bromwell High is a cartoon comedy. It ran at the same time as some other programs about school life, such as \"Teachers\". My 35 years in the teaching profession lead me to believe that Bromwell High\\'s satire is much closer to reality than is \"Teachers\". The scramble to survive financially, the insightful students who can see right through their pathetic teachers\\' pomp, the pettiness of the whole situation, all remind me of the schools I knew and their students. When I saw the episode in which a student repeatedly tried to burn down the school, I immediately recalled ......... at .......... High. A classic line: INSPECTOR: I\\'m here to sack one of your teachers. STUDENT: Welcome to Bromwell High. I expect that many adults of my age think that Bromwell High is far fetched. What a pity that it isn\\'t!\\n'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reviews[0]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def cleanText(corpus):\n", "    import string\n", "    validchars = string.ascii_letters + string.digits + ' '\n", "    punctuation = \"\"\".,:;?!@(){}[]$1234567890\"\"\"\n", "    corpus = [z.lower().replace('\\n','') for z in corpus]\n", "    corpus = [z.replace('<br />', ' ') for z in corpus]\n", "\n", "    # strip punctuation and digits\n", "    for c in punctuation:\n", "        corpus =[z.replace(c, '') for z in corpus]\n", "\n", "\n", "    corpus = [''.join(ch for ch in z if ch in validchars) for z in corpus]\n", "\n", "    #treat punctuation as individual words\n", "    # (a no-op here, since these characters were already removed above)\n", "    for c in punctuation:\n", "        corpus = [z.replace(c, ' %s '%c) for z in corpus]\n", "#     corpus = [z.split() for z in corpus]\n", "    corpus = [z.replace(' ', '_') for z in corpus]\n", "    return corpus\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "texts = cleanText(reviews)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "'bromwell_high_is_a_cartoon_comedy_it_ran_at_the_same_time_as_some_other_programs_about_school_life_such_as_teachers_my__years_in_the_teaching_profession_lead_me_to_believe_that_bromwell_highs_satire_is_much_closer_to_reality_than_is_teachers_the_scramble_to_survive_financially_the_insightful_students_who_can_see_right_through_their_pathetic_teachers_pomp_the_pettiness_of_the_whole_situation_all_remind_me_of_the_schools_i_knew_and_their_students_when_i_saw_the_episode_in_which_a_student_repeatedly_tried_to_burn_down_the_school_i_immediately_recalled__at__high_a_classic_line_inspector_im_here_to_sack_one_of_your_teachers_student_welcome_to_bromwell_high_i_expect_that_many_adults_of_my_age_think_that_bromwell_high_is_far_fetched_what_a_pity_that_it_isnt'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "texts[0]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{'_': 0,\n", " 'a': 1,\n", " 'b': 2,\n", " 'c': 3,\n", " 'd': 4,\n", " 'e': 5,\n", " 'f': 6,\n", " 'g': 7,\n", " 'h': 8,\n", " 'i': 9,\n", " 'j': 10,\n", " 'k': 11,\n", " 'l': 12,\n", " 'm': 13,\n", " 'n': 14,\n", " 'o': 15,\n", " 'p': 16,\n", " 'q': 17,\n", " 'r': 18,\n", " 's': 
19,\n", " 't': 20,\n", " 'u': 21,\n", " 'v': 22,\n", " 'w': 23,\n", " 'x': 24,\n", " 'y': 25,\n", " 'z': 26}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_chars = '_abcdefghijklmnopqrstuvwxyz'\n", "dictionary = {}\n", "for i in range(len(all_chars)):\n", "    dictionary[all_chars[i]] = i\n", "dictionary" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "25000\n" ] } ], "source": [ "# building data with the format of sequence\n", "data = []\n", "for text in texts[:]:\n", "    d = []\n", "    for c in text:\n", "        d.append(dictionary[c])\n", "    data.append(d)\n", "print(len(data))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def buildTM_from_sequential_data(data,states,irreducible=True):\n", "    # each row is a sequence of observation\n", "    n = len(states)\n", "    M = np.zeros((n,n))\n", "    # count the transitions i -> j between consecutive observations\n", "    for d in data:\n", "        for k in range(1,len(d)):\n", "            i = d[k-1]\n", "\n", "            j = d[k]\n", "            M[i,j]= M[i,j] + 1\n", "\n", "    # normalize each row so that it sums to one\n", "    eps = .001\n", "    for i in range(M.shape[0]):\n", "        s= sum(M[i])\n", "\n", "        if s==0:\n", "            # state with no observed outgoing transition\n", "            if irreducible==True:\n", "                M[i]=eps\n", "                M[i,i]=1.\n", "                s= sum(M[i])\n", "                M[i]=np.divide(M[i],s)\n", "            else:\n", "                M[i,i]=1.\n", "        else:\n", "            M[i]=np.divide(M[i],s)\n", "    return M\n", "\n", "\n", "# Power iteration Method\n", "def simulate_markov(TM,verbose='on'):\n", "    e1 = time.time()\n", "    states_n = TM.shape[0]\n", "    pi = np.ones(states_n); pi1 = np.zeros(states_n);\n", "    pi = np.random.rand(states_n)\n", "\n", "    pi = pi/pi.sum()\n", "    n = norm(pi - pi1); i = 0;\n", "    diff = []\n", "    # iterate pi <- TM^T pi until the change falls below the tolerance\n", "    while n > 1e-6 and i <1*1e4 :\n", "        pi1 = TM.T.dot(pi).copy()\n", "        n = norm(pi - pi1); i += 1\n", "        diff.append(n)\n", "        pi = pi1.copy()\n", "    if verbose=='on':\n", "        print(\"Iterating {} times in {}\".format(i, time.time() - e1))\n", "\n", "    mixing_ = i\n", "    return 
pi1,mixing_" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": true }, "outputs": [], "source": [ "states = np.unique(dictionary.values())\n", "M_char = buildTM_from_sequential_data(data,states,irreducible=True)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "chars = np.asarray([c for c in all_chars])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## steady state probabilites : equal to the frequencies of characters" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iterating 13 times in 0.000757932662964\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEACAYAAABI5zaHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJztnXmYFNXV/z9HZFEggggYQVAhgCQIxgXcAolR0VfFXSQu\nGI24kPfnFjFugLibaEQTo9Fo3ALEKKJEo1EnBiIIyrDJqrIIirKqiCzD+f1xu1+asZfq7qruWz3n\n8zzzzFTPqdOnq6tP3/rWueeKqmIYhmFUFjuUOwDDMAwjfCy5G4ZhVCCW3A3DMCoQS+6GYRgViCV3\nwzCMCsSSu2EYRgUSKLmLSF8RmSsi80VkSJr/DxCR6YmfCSKyX9B9DcMwjPCRXHXuIrIDMB84ElgO\nTAH6q+rcFJtewBxVXScifYFhqtoryL6GYRhG+AQZuR8MLFDVxaq6GRgF9Es1UNVJqrousTkJaBN0\nX8MwDCN8giT3NsDSlO2P2Za803Eh8HKB+xqGYRghsGOYzkTkx8D5wOFh+jUMwzDyI0hyXwa0S9lu\nm3hsOxI3UR8G+qrqmnz2TexvTW4MwzDyRFUl3eNBZJkpQEcRaS8iDYD+wLhUAxFpB/wdOEdVP8hn\n31pBBvoZOnRoxdr6EkfcbH2JwwdbX+LwwTZq3+X+yUbOkbuq1ojIYOBV3JfBo6o6R0QGuX/rw8CN\nwK7AH0REgM2qenCmfXM9Zy4WLVpUsba+xBE3W1/i8MHWlzh8sI3at88E0txV9RWgc63HHkr5+xfA\nL4LuaxiGYURLvWHDhpU7BgCGDx8+LGgszZo1Y6+99qpIW1/iiJutL3H4YOtLHD7YRu273AwfPpxh\nw4YNT/e/nJOYSoWIqC+xGIZhxAERQYu4oeodVVVVFWvrSxxxs/UlDh9sfYnDB9uofftMLJO7YRiG\nkR2TZQzDMGJKxckyhmEYRnZimdx90PLiqBFWsq0vcfhg60scPthG7dtnYpncDcMwjOyY5m4YhhFT\nTHM3DMOoY8Qyufug5cVRI6xkW1/i8MHWlzh8sI3at8/EMrkbhmEY2THN3TAMI6aY5m4YhlHHiGVy\n90HLi6NGWMm2vsThg60vcfhgG7Vvn4llcjcMwzCyY5q7YRhGTDHN3TAMo44Ry+Tug5YXR42wkm19\nicMHW1/i8ME2at8+E8vkbhiGYWTHNHfDMIyYYpq7YRhGHSOWyd0HLS+OGmEl2/oShw+2v
sThg23U\nvn0mlsndMAzDyI5XmvumTUr9+uWOxDAMIx7ERnNfubLcERiGYVQGXiX3zz4LZueDlhdHjbCSbX2J\nwwdbX+LwwTZq3z4Ty+RuGIZhZMcrzf2pp5Sf/azckRiGYcSD2GjuNnI3DMMIh1gmdx+0vDhqhJVs\n60scPtj6EocPtlH79hmvkvvnn5c7AsMwjMrAK839hBOUcePKHYlhGEY8MM3dMAyjjhHL5O6DlhdH\njbCSbX2JwwdbX+LwwTZq3z4Ty+RuGIZhZMcrzb1RI2XlSmjcuNzRGIZh+E9sNPeWLa1ixjAMIwy8\nSu6tWgVL7j5oeXHUCCvZ1pc4fLD1JQ4fbKP27TPeJXfT3Q3DMIrHK839vPOU3r3h/PPLHY1hGIb/\nxEZzt5G7YRhGOMQyufug5cVRI6xkW1/i8MHWlzh8sI3at8/EMrkbhmEY2fFKc//HP5Tf/Q7++c9y\nR2MYhuE/sdLcrc7dMAyjeLxL7qa5+xFH3Gx9icMHW1/i8ME2at8+Eyi5i0hfEZkrIvNFZEia/3cW\nkf+KyDcicmWt/y0SkekiMk1E3sn2PC1buuTuiVJkGIYRW3Jq7iKyAzAfOBJYDkwB+qvq3BSb3YD2\nwEnAGlW9J+V/HwIHqOqaHM+jqsouu8DixdCsWaEvyTAMo25QrOZ+MLBAVRer6mZgFNAv1UBVV6rq\nu8CWdM8f8HkAq5gxDMMIgyBJtw2wNGX748RjQVHgNRGZIiK/yGUcJLn7oOXFUSOsZFtf4vDB1pc4\nfLCN2rfP7FiC5zhMVT8RkZa4JD9HVSekMxw4cCCrVu3F/ffD1KnN6NGjB3369AG2HfR8t5MEsa+u\nrg7sv7q6uqB4wt6O2+uLKt5Kf335xlvpr8+Hz1O54quqqmLRokXkIojm3gsYpqp9E9vXAqqqd6ax\nHQp8maq5B/1/UnO/6CI44AAYNChn7IZhGHWaYjX3KUBHEWkvIg2A/kC2Zaz/74lEZGcRaZL4uzFw\nNDAr25OZ5m4YhlE8OZO7qtYAg4FXgdnAKFWdIyKDROQiABFpLSJLgSuA60VkSSKptwYmiMg0YBLw\noqq+mu35THP3I4642foShw+2vsThg23Uvn0mkOauqq8AnWs99lDK3yuAPdPs+hXQI5+AWrWCiRPz\n2cMwDMOojVe9ZVSVN96AESPgzTfLHZFhGIbfxKa3DJjmbhiGEQbeJfcgi2T7oOXFUSOsZFtf4vDB\n1pc4fLCN2rfPeJfcW7SANWugpqbckRiGYcQX7zR3cKP32bOdRGMYhmGkJ1aaO5jubhiGUSyxTO4+\naHlx1Agr2daXOHyw9SUOH2yj9u0zsUzuhmEYRna81Nx/+Uvo1Mn9NgzDMNITO809uSKTYRiGURhe\nJnfT3MsfR9xsfYnDB1tf4vDBNmrfPhPL5G4YhmFkx0vNfcIEGDLEGogZhmFkI3aau43cDcMwiiOW\nyd0HLS+OGmEl2/oShw+2vsThg23Uvn3Gy+S+yy6wYQNs3FjuSAzDMOKJl5o7QJs2MHkytG1bxqAM\nwzA8JnaaO5jubhiGUQyxTO4+aHlx1Agr2daXOHyw9SUOH2yj9u0zsUzuhmEYRna81dyvugq++124\n+uoyBmUYhuExprkbhmHUMbxN7tnWUvVBy4ujRljJtr7E4YOtL3H4YBu1b5/xNrnbyN0wDKNwvNXc\n33kHLrsMpkwpY1CGYRgeY5q7YRhGHcPb5J5csCPdhYUPWl4cNcJKtvUlDh9sfYnDB9uoffuMt8m9\ncWOoVw+++qrckRiGYcQPbzV3gH32gX/9y/02DMMwtieWmjvYWqqGYRiF4nVyz3RT1QctL44aYSXb\n+hKHD7a+xOGDbdS+fSaWyd0wDMPIjtea+69/DU2bwnXXlSkowzAMj4mt5m4jd8MwjMKIZXL3QcuL\no0ZYyba+xOGDrS9x+GAbtW+f8T65Z2oeZhiGYWTGa
829uhrOOw+mTy9TUIZhGB5jmrthGEYdw+vk\nvttusHIlbN26/eM+aHlx1Agr2daXOHyw9SUOH2yj9u0zXif3Bg1cKeSaNeWOxDAMI154rbkDdOkC\nzz8P++5bhqAMwzA8JraaO1jFjGEYRiHEIrnXvqnqg5YXR42wkm19icMHW1/i8ME2at8+431yt86Q\nhmEY+eO95j50KIjAsGGlj8kwDMNnYq+528jdMAwjP2KZ3H3Q8uKoEVayrS9x+GDrSxw+2Ebt22cC\nJXcR6Ssic0VkvogMSfP/ziLyXxH5RkSuzGffXNjI3TAMI39yau4isgMwHzgSWA5MAfqr6twUm92A\n9sBJwBpVvSfovik+0mruc+bAKae434ZhGMY2itXcDwYWqOpiVd0MjAL6pRqo6kpVfRfYku++ubCR\nu2EYRv4ESe5tgKUp2x8nHgtCMfsC0Lw5fPEFbN687TEftLw4aoSVbOtLHD7Y+hKHD7ZR+/aZHcsd\nQCoDBw5kr732AqBZs2b06NGDPn360KIFjBtXRYsW0KdPH2Dbm5BrO0kQ++rq6sD+q6urAz1/1Ntx\ne31RxVvpry/feCv99fnweSpXfFVVVSxatIhcBNHcewHDVLVvYvtaQFX1zjS2Q4EvUzT3fPZNq7kD\n7LcfPPkkdO+e8/V4x+bNMH48nHRSuSMxDKPSKFZznwJ0FJH2ItIA6A+My/Z8Reybljjr7tOnw8CB\n5Y7CMIy6Rs7krqo1wGDgVWA2MEpV54jIIBG5CEBEWovIUuAK4HoRWSIiTTLtm2+QtZN77UuobJTb\ndsECWLeuirVrA7sue8xxtPUlDh9sfYnDB9uofftMIM1dVV8BOtd67KGUv1cAewbdN1/i3BlywQL3\n+6OPYP/9yxuLYRh1B+97ywDcdht89ZX7HTfOPdfdL3juOTj55HJHYxhGJRHr3jIQ786QCxa4EXuA\nm9uGYRihEYvkHnfNvXPnqrySe7ljjqOtL3H4YOtLHD7YRu3bZ2KZ3OPCmjWwcaNbItBG7oZhlJJY\naO4ffABHHQUffljioIpkyhS46CL4859dOeT06eWOyDCMSiL2mntcq2UWLIDvfQ/23tuN3D35HjUM\now4Qi+TepAls2QJff+22fdDygtguXOiSe3V1FSIErnWPy+vzydaXOHyw9SUOH2yj9u0zsUjuIvEc\nvS9YAB07ur/32svVuhuGYZSCWGjuAAceCA8+CAcdVMKgiqRXL/jNb+Dww11vmXPPdb3pDcMwwiD2\nmjvEs2ImKcuAG7lbxYxhGKUilsndBy0vl22yDLJVK2ebT3KPw+vzzdaXOHyw9SUOH2yj9u0zsUzu\ncSBZKSOJCyYbuRuGUUpio7n/5jfw6afudxx45hl44QUYPdptV1c7zX3GjPLGZRhG5WCaexlIrZSB\nbSN3T75LDcOocGKT3FObh/mg5eWyTcoySdtmzaBePafFlzKOumLrSxw+2PoShw+2Ufv2mdgk97iN\n3FMrZZJYrbthGKUiNpr70qVwyCHw8cclDKoIWrSA99+H1q23PXbyyXD22XDqqeWLyzCMyqEiNPek\nLOPJd1FWVq92C2O3arX941YxYxhGqYhNcm/UCHbaCdat80PLy2ZbuwwyaRs0ufv++ny09SUOH2x9\nicMH26h9+0xskjvEp7/MwoXbV8oksZG7YRilIjaaO8Bhh8Fdd7nfPjNsmOtiecst2z8+fbrT3GfO\nLEtYhmFUGBWhuUN81lJNLYNMxWrdDd9Yvhy2bi13FEYUxCq5J8shfdDystnWLoNM2u6yC9SvD6tW\nlSaOumTrSxw+2Aa137DBdVkdMSKaOHywjdq3z8QyuftO7dmpqZjubvjCww/DihVuMGJUHrHS3EeO\ndInz/vtLFFQBrFoF++zjVl2SNErYKafAgAFw2mmlj80wknz9NXTo4Nb4fe89ePHFckdkFELFaO5x\nqJZJSjLpEjvYy
N3wgwcfhEMPhfPOc03tjMojdsndd809nSSTahskufv8+ny19SUOH2xz2a9fD3ff\n7aq69toLVq2qYvXq8OPwwTZq3z4Ty+TuM5kqZZLYyN0oN7//PfTuDd26wQ47OHlm+vRyR2WETaw0\n9xUr3Anpc4L/2c/gmGNc7/Z0zJwJZ50Fs2aVNi7DAPjyS3dl+eab0LWre+yyy9yA5PLLyxubkT8V\no7m3aOFa5tbUlDuSzGSrlAFo395q3Y3ycf/98NOfbkvsAN2728i9EolVct9xR2jWDMaNqwq8Tyl1\nP9X0skyq7Xe+Aw0bwsqV0cVRF219icMH20z269bBvffCTTdt//iWLVWBb6r68Pp8OS98J1bJHZzu\nvnZtuaNIz+rVLsHvtlt2O9PdjXJw331w7LHQufP2j++zD8yb5zqZGpVDrDR3gKOPdtrgcceVIKg8\nmTQJBg+GqVOz2516KvTvD6efXpq4DGPtWicXvv12+hv+XbrA3/7m7mkZ8aFiNHeA/fbzVx/MVSmT\nxEbuRqm591448cTM56fp7pVH7JJ79+7w2mtVge1LqfulW1ovnW2u5O6DVhk3W1/i8MG2tv3q1a78\n8YYbMtsGTe4+vD5fzgvfiV1y79HD314YuSplkvg+cl+7Fr74otxRGGHx29+6JR732SezTffuNlO1\n0oid5r55s+uuuHIl7LxzCQLLg4MOcv1vDjkku92sWXDmmTB7dmniypfjjoNOneB3vyt3JEaxrFzp\nbqC++64bVGRi2TLYf383lyRT6wzDPypKc69f352svk0CylQGmQ6fa93fegtee83fLx4jP37zGzjj\njOyJHWCPPVxf908/LUlYRgmIXXIHaN3av7rcVavciKdFi9y2TZu69WAzNUErl1apCr/+tes5Mm1a\neWIo1NaXOHywTdp/9hn86U9w3XW5bUWC3VT14fX5cl74TiyTe8eO/t3ZT+rtQS9pfdTdx493E12G\nDHHNpdatK3dE5eW99+Coo+LbDveuu1x76T33DGZvFTMVhqp68eNCCcYbb6gedlhg85LwxBOqZ50V\n3P7UU1VHj44unnypqVHt1k31hRfc9oEHqr79dnljKhcff6x63nmqrVurDhig2q9fuSPKn08+UW3e\nXHXZsuD7PP64av/+0cVkhE8ib6bNqbEcuXfvDjNm+LX2Y1C9PYlvI/e//hWaNIETTnDbXbvC+++X\nN6ZSs369k6T22w+++12YPx9uu81NTvPx/kg27rzTNa/bY4/g+9jIvbKIZXKfMaOKZs3go49y25ZK\n98tWBpnOb7bkXmqtctMmuPFGuP32bbJSw4ZVgZN73LXVrVvh8cfdjfp581xlye23uz5A7drBpk1V\nLFkSbQxh2i5fDo88UsW11+bnu2tX95nasCGcOHywjdq3z8QyuYN/o4xME5gy4dPI/ZFHXOlj797b\nHmvfHubMKV9MpaKqCg480K0n+uyz7gomtbJEBPbdFyZPLleE+XP77a6cdffd89uvQQN3HlilVGUQ\nuzr3JDfe6D54N98cYVABUYXmzeGDD9JXy6Rj9mzXW6bc0sf69e5L6aWX4Ic/3Pb4woXuZmKQq6M4\nsmABXHONm7hz553uvch0M/y221w11G9/W9oYC+GBB+COO9zN4Fat8t//3HPdl/wFF4QfmxE+FVXn\nnsSnkfvKlS4x7Lpr8H18qXUfORKOOGL7xA6w995uQsv69eWJK0qSE80OOcRdnZxxRvYqp549ne7u\nM6rufsHIkTBhQmGJHWymaiURy+TuWy+MXItip/PbpAk0bpx+ValSaZVr1sA998CIEd+2/c9/qvje\n92Du3GhjCMs2qH1NDdx6K9xzTxXXXAONGuX2+803bl5FkJa45TgWW7e6bqTjxsF//uNkpUJ95/pc\n+fBem+YejEDJXUT6ishcEZkvIkMy2IwUkQUiUi0i+6c8vkhEpovINBF5J6zAO3R
wl8pr1oTlsXDy\nrZRJUm7d/c47Xc+RTp3S/79r18rT3SdNclp0u3bB92nc2F3JzJgRXVyFsmmTW9px9my3dF7r1sX5\nS1ailfuK0iienJq7iOwAzAeOBJYDU4D+qjo3xeZYYLCq/o+I9ATuU9Veif99CBygqlnTcL6aO8Ch\nh7qbR6k3AsvBjTe6hYaHD89vv9NPh9NOc31mSs3y5a539/Tp0LZtepsRI1zlxG23lTa2KLn6apes\n832vLrgADjgALr00mrgKYf16tzZAo0YwalSwq5AgtGkDEyfmbllglJ9iNfeDgQWqulhVNwOjgH61\nbPoBTwCo6mRgFxFJjiEk4PPkTY8efuiD+VbKJCnnyH3ECPj5zzMndnBVIuW+4RsmqjB2LJx0Uv77\n9uzpV8XMqlVw5JGujv3ZZ8NL7ODX/SyjcIIk3TbA0pTtjxOPZbNZlmKjwGsiMkVEflFooKkkdTFf\nemHkkmUy+c2U3KOOeeFCt+pOtjroqqqqwLJMXLTV2bNhyxY3KMg3jl69gt1ULcWxWLYMfvQj9/Po\no25t4TDjyHZT1Yf32jT3YKQ5LULnMFX9RERa4pL8HFWdkM5w4MCB7JW4FmzWrBk9evSgT58+wLaD\nnrpdUwPTp2f+fyqZ/p+6XV1dnfX/qdvVibO/d+8+LFgAn35aRVVVdv+1t7/4AhYtCm4f1uu76Sbo\n16+KmTOzv77vfx+WLOnDxo3w9tuFxVeq9yOI/RNPwEkn9UFk2/sX1P/nn7uJTGvW9KF58/Kdb3vs\n0Yejj4a+fas47jgQyW5fyPvVvTv8/vdV9O5d+tdX7PlVyvOzXPFVVVWxKMglf6a+BLqt50sv4JWU\n7WuBIbVs/gicmbI9F2idxtdQ4MoMz5N3X4WvvlLdaSfVTZvy3jU0VqxwPTwKYfZs1S5dwo0nF9Om\nqe6+u+qXXwaz79JFdcaMaGMqFT/8oeqbbxa+f+/eqq+8ElY0+TN1qnvvHn002ud5/33VffaJ9jmM\ncKDI3jJTgI4i0l5EGgD9gXG1bMYB5wKISC9graquEJGdRaRJ4vHGwNFAaJ3YGzd2He/mzQvLY/4U\nqrdDeWrdr7/etYBt0iSYfaVUzCxZ4n4OP7xwH+XU3d98E449Fh580N0riZJOnVxfd1uNK97kTO6q\nWgMMBl4FZgOjVHWOiAwSkYsSNv8APhKRhcBDQLKmoDUwQUSmAZOAF1X11WKDTr1EKXddbpAyyEx+\nGzd2vd1XrCg+jiCMHOn6xVx0UXC/QRqIRX2Mw7AfOxaOP36bPl1IHEEmM0Xx+tasgRNOqGLMmOA3\ng4uJo149+P73YebMcP2WwzZq3z4TSHNX1VeAzrUee6jW9uA0+30E9CgmwFwkK2Z+9rMonyUzQddN\nzUTypmq+fUDyRdUt3DB8ODRsGHy/rl3h+eeji6tUjB0Ll19enI9evdwXo2ppl6J75x13jiXk15KQ\nvKl62GGle04jXGLbWybJ+PFw333watHXA4XRv79rk1vol8sZZ8Appzg/UTJ+vFuEY/p0NzILSnU1\nnH22f8sa5sOqVW5x6E8/dStgFcOeezqJpJgv9Hy5+Wb4+mvXM6ZUPPCAm8z08MOle04jfyqyt0yS\n5AijXN9Rhc5OTVKqWve77oKhQ/NL7ODa4H7wgSshjCsvveRqwotN7FAe3X3SJPe8pcRq3eNPLJN7\nqi7Wpk32hX2j1P2Si2LnGsVl85suuYcd86JFTjdv1ix/vzvt5I7xBx8UF0PUttnsx451bRbCiCNX\ncg/79am656upCe43jDj228/NC6ipCddvqW2j9u0zsUzuqQRd2DcKPv8c6tfPrxtkbfbeO/qR+1NP\nOfmnfv3C9o/zTNWvv4bXX4f/+Z9w/AWdzBQWCxe6G++77Va65wTYZRfXWXLhwtI+rxEesdfcAa68\n0jVMGpK2pVl0TJwIV11V3Id9zhw3qgzSfbE
QVKFLF3jiicIv7YcMcR/2664LN7ZSMHYs3H+/S/Bh\n8PXX0LKl0/HDnPKfiaeeghdecDOKS83JJ7t7QeXofWQEo6I1dyhfj5liK2XA1bovXhzderDvvOMS\n/MEHF+4jzuupppNkimHnnV0deKnOt0mT3NVCOTDdPd7EMrnX1sWynYRR6n5BJzBl87vzzm6tztRa\n9zBjfvJJt7qOSOF+c8kyvmqrW7a4m6n9are5KzKObLp72K9v8mT3fOXQmdN9rnx4r01zD0Ysk3tt\n9t0398K+UVBspUySqCpmNm2C0aNdKWMx7LuvmwUc1dVFVCQXrthzz3D9lmplpg0b3E3N2qtklQob\nucecTH0JSv1DAb1lUuneXfWdd4pykTf77686eXLxfs44Q/WZZ4r3U5uxY1WPOCIcX3vuqfrhh+H4\nKhX/+7+qt9wSvt/331fde+/w/dZm4kTXD6dc1NSofuc7qp9/Xr4YjOxQZG+ZWFDqUYZqcX1lUolq\n5J6UZMKg1Lr71q2uj0qQpe3SUUzv9lx07gyrV7tqqShJSjLlYocdXEmkjd7jSSyTezpdrEeP9Cdh\nVHrb889X0aABNG9evN/ayT2MmNesgddecys9heE3m+4exTGeNAkuvbSKYcMCu97O97Rp0KCB+1Iq\nJo50tjvsAAcdlF53D/NYpN5MLZfOXHvQ5IOObpp7MGKZ3NNR6lXbly0Lbwp6FLXuY8ZA377QrFk4\n/krdHXLMGDjxRHjsMTfdP1+SVTJR9YApxUzVco/cwXT3WJNJryn1D0Vq7itXOn2wpqYoN4F57DHV\ns88Ox9ecOaqdOoXjK8mhh6q++GJ4/iZMUO3ZMzx/2aipUd1jD9fv/uWXVdu2de9vPvzgB06zjopx\n41R/+tPo/H/yiWqzZqU7nzMxebK7n2X4CXVBc2/RwpUUlmpN0rAqZSD8WvcPPnD3A445Jhx/sE2W\nKcWct7ffdrN+u3Z1Vx+nnw4XXhj8uRcudHp4lPXhPXvClCnRVRAlR+07lPkT+oMfuEqpTZvKG4eR\nP7FM7j7U5U6cWBVYlsnld6ednHyS7I9TbMxPPulmFtZuN1CM3113dTX5y5YFiyGo33SMGePaJSRt\nb7/dffk99FD2/ZL2L7zgatuzJcZiY27Vyh2T2gvFhHUsaksy5dKZd97Z3RNKSnI+6OimuQcjlsk9\nE6XUB5ctC2/kDuFVzKi6KevnnFO8r9qUQnffutVNtT/99G2PNWwIf/0r3Hijq/vOxfPPR1MlU5so\ndfdyzkytjenuMSWTXlPqH4rU3FVVx4xR7devaDc52bpVtUkT1TVrwvN55pmqTz9dvJ+JE926p1u3\nFu+rNpdeqvq734XvN5W33lLdb7/0/3vkEdVu3VQ3bMi8/6efqu6yi+o330QTXyr33qs6aFD4frds\nUW3aNP/7DFFx222qV15Z7iiMdFAXNHco3QhjxQrXNCqsShQIb+T+xBPb2g2ETSlq3ceM2X7UnsrP\nf+6aoF1zTeb9x41zOn0+q00VSq9e0Yzc33/fNcJr0SJ834VgI/d4EsvknkkX69DB3Uhbuza3bT5+\nU1F1CaRly3D9pib3QmPeuNFJGplWhSr2WGSSZcI6xjU18Oyz25J7bVsRp7u/8ILrGZPOd9CJS2HE\n3KMHzJ/vOkWG6Xfy5G9LMuXUmZPJXdUPHd0092DEMrlnol496NbNLQ8WBRMnwhFHwMiRMGhQuL7D\nqHV/6SX3QWzXLpSQvkXXrk7zjqpiZsIEN2Lt3DmzTfPm8PTTrnrmk0+2/9/XX7t+MsceG018tWnU\nyC0k/e674fr1ob49lT32cL9rH2/DczLpNaX+IQTNXdVpoCNHhuLq/5g1S/XEE1XbtXP17Vu2hOtf\nVXXuXNWOHYvz0a+f6p//HE486di6VbV5c9UVK6Lxf+mlqrfeGsx26FBXZ55aBz56tGrfvpGElpHB\ng1Xvvjt
cnz/4geqUKeH6LJYjj1QdP77cURi1oa5o7hCuPrhkCZx/Pvz4x9C7tyt7Gzgw/3VIg9Cu\nHSxdWngFCpQSAAASTElEQVTd9MqVUFUFp54aaljbIRJdxUxNDfz975n19trccIPrmvjb3257LKpe\nMtkIu0Pkl1/Chx+6ni4+Ybp7/Ihlcs+mi9VeuKMQvW3VKrj6ath/f3dJumCBW+0pdeWdsHW8nXZy\nksMnnxTme/RoOO44N5GrmDhy2aa7qRqG37fecsc6tbw0m98dd3TyzN13w9SpbpLNuHFVaXu35xNH\nvra1b6oW63fKFHcON2gQPIagvouxTSZ3H3T0ch+LuBDL5J6Nbt1c8tmyJf99N2yA225zmu/69TBr\nFtx6q1tirhQUUzHz5JPR1LbXJqr1VJMTl/KhfXt44AE46yx48UW3vfvu4ceWjQ4d3HmzfHk4/nzT\n25PYyD2GZNJrSv1DSJq7qtOuZ8/Ob5/x410/k9NPV503L7RQ8qJ/f9Wnnsp/v7lzVXffXXXz5vBj\nqs0rr6j+5Cfh+ty8WbVlS9WFCwvb/+c/V915Z9U77ww3rqAce6zqc8+F4+vEE1VHjQrHV5hs3Kja\nqJHq+vXljsRIhbqkuUP+a6q+9prT0keNciPITp0iCy0rhY7cn3rKjV533DHsiL5NFJr7v//t7jl0\n6FDY/iNHuiqmoHp92ISlu6umL4P0gQYN3BXtrFnljsQISiyTe9C63CC2EybAgAHuZl5NTXbbfGIo\nxDaZ3PPx/cYbVYElmTBibtvW3fRbsyY8v5kkmaB+GzeGV16BxYuLi6NQ21TdvRi/S5a43+lKWX3Q\nmXv0gEceCd9vlLZR+/aZWCb3XGRauKM2U6fCKae4G3NHHBF9XLkopNZ95kxo2tS95lIg4nT3sEbv\nW7bAc8+Vb9QdBgcf7Grda2qK85PsJxNVD/piufZaN0nun/8sdyRGIDLpNaX+IUTNfckS1dats9vM\nnOlsxo4N7WmLZt481Q4d8tvnwgtLrzWfd57r8xIGr76qetBB4fgqJ506qU6fXpyPK65wfVx8ZsIE\n1d12U506tdyRGKp1UHNv29aVxq1Ykf7/Cxa4Xuf33kvg0rlSkKx1//3vXWlgqvSRjg0bnJyUqd1A\nVIRZMfO3v+VfJeMjYXSInDTJz0qZVA47DB5+GE44wdXjG/4Sy+SeSxcT2SbN1LZdvBh++lO4+WZ3\nEzIfv1HbNmrkGn+9/HIVQ4a4ZN+2rZtOf801rtyxutr1kAFX/rf33lW0aVPamGvXuhfqd/Nm1543\nkyTji7YaxDZ5U7VQv5s2ufP1oIMKj6FQ+3xtTz4Zrr/eNWhbubI8MeSDae4VRro1VT/5BI480k1I\nuuCC8sSVizPPdBOo3n4b1q1zvVIuucTV2o8fD2ef7bpRdu0Kv/oVHH106WMMqzvkm2+6Cpn27Yv3\nVW6K7RA5Ywbss4+7fxIHLrvMLb5+/PHbN07zia1b4Z573BVuXUQ0qi5QeSIiGmYsjz/uShyfftpt\nr1zpWggMGOBGHXFm40bXCmH+fHd5XIr2tqnU1Lgk9Nln0KRJ4X4uvNB9UVx5ZXixlYvNm92X7ief\nZJ8lnIkHHnAj9z/9KfzYokLVlRCvWeNuihdairt8uZN4Dj881PD44x9h8GA3WLrjjnB9+4KIoKpp\nb8FX7Mg9tWJm3TqnsZ94Ilx3XXnjCoOGDV3vkdNOK31iB9dbp1MnmDu3cB+bN7teMKedFl5c5aR+\nfXfOTZ1a2P6+zkzNhgg88oiTlC69NP9uoV99BUOHulnlJ57oKr/C4uOP3cpdr74Kjz5aN+vzY5nc\ng+hi++7rFop+4YUqjjsODj3UtRbIVmZW6RphmLap0kwhfl9/3X1BZGtPHLfj1rMnPPNMYX5z3Uz1\n9VjUr+9uir/7LowYEcxvTY27QunUyX1G33sPLriginPO2XY/qZh4Vd2Xz
eDB8JOfwNlnV3HJJcGa\n8pnmHgMaNnRNqH75Szez7r77/K0fjiPFzlQtpJeM7xx7LLz8srtSzIdVq1xlV9eu0cQVNU2buvtB\njz/uRsmZUHWTzXr0cLOqx41zv9u3d03v2rWDYcOKj2fMGCfz/PrXbvuEE9yV4mOPFe87VmSqkSz1\nDyHWuSe56CLXryWK/ut1nWefdX1QCmHjRtVdd1VdujTcmHzgkktUTzstvzVsx49X/fGPo4upVMyb\n5+aOvPTSt/9XXa161FFuPsALL6Q/Pp9+6nokTZhQeAwrVzofb7+9/ePTpqm2aqX62WeF+/YR6lqd\ne5I//AH++tdo+q/XdYqpmPnXv5xs1rZtuDH5wD33uFHjAw8E38fXfjL50qmTWwLx/PPhnXfcY8uX\nu7VvjznGzSmZNcvp6+muolu3hgcfdGsAf/llYTFceaW7Iqx9PHv0cPNBsq2/W2nEMrkH1cXq1Su/\nDpuvrS9x5LLt2NFNuPrmm/z9ZlsEO58YirGPynbSJPf6RozYluBy+Q1yMzUux6JnT/jzn10i79+/\nim7dXNKeN8+VT9avn93vSSfBj34EV12Vfwz//KdrQnfrrenthw9393r+/e/8fceRWCZ3o/zUr+/q\nsufPz28/t6hG5VTJpKNDB1eGd+aZuWcZb90az0qZbBx/vCs9XLcOpk2D22/Pb02E++5zZczpFkHP\nxFdfwcUXuwXUM5XnNm3qfF98sTsPK52KrXM3oue009xP//7B93npJbjzTjc5q9K5/HL46CNX8pnp\nZv68eU6yKHZx9ErjrbfceTV9OrRsmdv+8sth9Wo3wzsbqk4WOuSQyiiLrpN17kb0FKK7V2KVTCbu\nugs+/dTp8JmIQz+ZcvCjHzmN/OKLc9fPT5rklpm8997cfkXc/ZB77nFlmJVMLJO7D9pqXdfcYVs5\nZFC/X3wBzz1XFXgR77gftwYN3JfZXXfBxInpbYPeTI37sSjEdsQIJ/s9+WRm202b3Ezne++FFi2C\n+W7f3t1YHTz4218cprkbBsG7Q37xhZtA1rGjkyD22CP62HyhfXtX+33WWembbFWa3h4mjRq5Ovir\nr962kEltbr/drYNw5pn5+b7iCjeL9dlni4/TV0xzNwpmwwZo3tyVraWrgli3Du6/393EOvpouOEG\n94VQFxkyxDUHGz8edkgMqb7+GnbbzU1i2mmn8sbnM3fc4doI/Otf244dwOzZ0KePu2lbSFntxInu\nS2H27Pxu+PqEae5GJOy0E+y557e1y7VrXdlZx47usnrCBNfAra4mdoBbbnFfgqkNrN57D77/fUvs\nufjVr5z8ct992x6rqXFyzM03Fz5f4rDD3KziG28MJ07fCJTcRaSviMwVkfkiMiSDzUgRWSAi1SLS\nI59988VXjTAMW1/iCGq7774wZoyzXb0abrrJJfVFi1zb4ieecO0foowhat9h2Nav7xZgv//+bXXW\nTz9dFViSqaRjka9tvXrwl784aW/2bGf7hz+4LpSDBhXn+4473H2RZMO3OqW5i8gOwAPAMcD3gbNE\npEstm2OBDqr6PWAQ8Meg+xZCde1G7RVk60scQW27doXXX6/m+utdL5/ly93knccec0m+FDFE7Tss\n27ZtXf+VAQNcL5n//rc68MzUSjsW+dp26OCS+znnwGuvVTN8uGs+tkOA4Wk23y1auBveF1/srgby\nPc4+E2TkfjCwQFUXq+pmYBRQe3G6fsATAKo6GdhFRFoH3Ddv1q5dW7G2vsQR1LZbN5gwYS0rV7rO\ngI884iY3lTKGqH2HaXvMMW46/oAB8NFHawOP3CvxWORre+GF0KYN3HPPWq64AroEHCbm8n3OOW7i\n0x/+kP9x9pkgyb0NsDRl++PEY0FsguxrxJgzz3RTxR96CPbaq9zRxINhw1wJ3ubN6a9ujPSIuNF6\nt27h9ogRcTOKb7658J42PhLVDdVIm
+suymM6X9xsfYkjqO2OO8Jnn5U3hqh9h21br55raNez56LA\nbagr9Vjka7v77tC166KMPWoK9d2li7sfsnRpsDjiQM5SSBHpBQxT1b6J7WtxbSbvTLH5I/Cmqo5O\nbM8FegN759o3xYfVQRqGYeRJplLIIKseTgE6ikh74BOgP3BWLZtxwGXA6MSXwVpVXSEiKwPsmzVA\nwzAMI39yJndVrRGRwcCrOBnnUVWdIyKD3L/1YVX9h4gcJyILgfXA+dn2jezVGIZhGIBHM1QNwzCM\n8KjYGaoi0l5EQlxPPePzDBWRK0P0978i8r6IPJnbOrDPvI+FiEwI077AGCqodsE/RGQXEbmk3HEY\n0VCxyT1BHC9LLgF+qqrnhOw3r2OhqodHYJ/v+xHH9y9yREJb6r05cGlIvgzPiFVyF5HnRWSKiMwU\nkQsD7FJfRJ5KjITHiEijLL7PFZHpIjJNRP6SI47rRWSeiLwFdM5h+zMRmSwi74nIg9k+mCLyILAP\n8LKI/L8cfm9MtHV4S0SeCXD1sKOIPCwis0TkFRFpmMN/XqPmfOxFZJ/E8Tggn+dI46e9iMwRkccS\n78dTInKkiExIbB+YYZ/3gxwLEbkyca7NCPB+JGMJer793zmU6/1L+J4rIn9JXP1k7KYiIjuLyEuJ\n83iGiGRb0PB2IPlefKuCLU0MM1O2rxKRm9LY3S4il6ZsZ7yyFZGrE/fkEJF7ReT1xN8/FpGn0tgf\nmPiMNhCRxon3r2sG38NT3zMRuUVEfpnl9Q1KHLP3ROTDZCyxJtPK2T7+AM0SvxsBM4HmWWzbA1uB\nXontR4ErM9h2BeYm/SWfJ4PtD4HpQEOgKbAgi98uuEqieont3wNn53iNH2Z7XQmbA4H3gPpAE2B+\nphhSjsVmoFtiezQwIMdzfJHne5PVPhHDDKBTIvYfhORzE9A1sT0VeCTx94nA81n2yXosUt7nRkBj\nYBbQPaTzLfA5lOJ7C3BQgGN2CvBQynbTXO9JwPd3O1vgKuCmNHY9gKqU7dlAmww+ewKjE3+/BUwC\n6gE3Ab/IsM/NwN24tiZDcsT7buJvARbm+lwlbHcE/g0cl8/57+NPrEbuwOUiUo07CdoC38thv0RV\nJyX+fgrIJB38BPibqq4BUNVsc5CPwCWNjar6JS55Z+JI3Ad5iohMSzxPlsn5gDsRc112Hwa8oKqb\nVfUr4MUc9gAfqmpy5PUusFeAfcKmFTAWl0xnheTzI1VNdpWfDSRHXDNxH/BM++Q6Fofj3udvVHU9\n8Bzuvc9G0PMtn3MoyWJVnRLAbiZwVGIEfXjCf8lQ1WqgpYjsLiL7AatVdVkG83eBA0SkKbAReBs4\nCHd8Mi3EOAI4CjgAuCtLHIuBlSLSHTgaeC/5+c7BSOANVf1HAFuvCVLn7gUi0huXHHuq6kYReRM3\nqspGbc221BquAH9R1etL/Lzp2Jjydw25j10UrAOW4D68c0Pymfq6tqZsbyXz+V2qYxHm+bY+0BOq\nLhCRHwLHAbeIyL9U9ZYQnn8LblSdJNsx+xtwOrA77sooU6xbRGQRMBCYiLuy+zGuCWGm82M33NXq\njokYNmSJ4xFcWfbuwJ+z2AEgIgOBPVW1Iu5DxGnkvguwJpHYuwBB+um1F5Fka6YBQKaKjjeA00Vk\nVwARaZ7F51vASSLSMDHiOCGL7evAaSLSMulXRNoFiDsXE4ETEjE0AY4PsE++N+GimFS2ETgZOFdE\n0k5mKyCGbDaZ/hfE739w73MjEWmMizvXst7tAp5v+ZxD+cSMiHwX2KCqz+Dkix9mMf8SJwsFYQVu\nRN48cY8i2zk3Bjdh8VRcos/Gf4CrccdkAnAxMC2L/R+BG4CnyTJyTzAW6IuTMf+ZzTBx/+cq4Owc\nPmNDbEbuwCvAxSIyG5iHu4TLxVzgMhF5DHfJ/mA6I1V9X0RuBf4tIltwJ9fPM9hOE5HRuFHGCuCd\nT
E+ubrLXDcCr4tofb8LN5M2waJjbLdeLUtWpIjIOp9uuSMSyLtduufyWwl5VN4jI8bhj8qWqvlSk\nT83wd7b9gxzjaSLyOG6GtgIPq+r0HLvNI9j5FvgcyifmBN2Au0VkK+58y1jqqKqrRWSiiMwAXlbV\njOstJEbZN+OOx8dAxsmIic9TU+BjVV2RI97/ANcBbyfOjQ24RP8tROQcYJOqjkp8niaKSB9VrcoQ\nx+bEFf4aTQjqWbgMVz30priah6mqelGOfbzGJjHFFBFprKrrRWQn3IfhFwm9sxyxtMB9GPYux/P7\ngLgWGy+parcC9h0KfKmq94QfWd0l8QXwLnCaqn6Qy77SKOvIPSGDvM62UYkk/j4y4M2PuszDiTKw\nhsDjZUzs3wWqcBJAXcdGSp4gIvsCLwF/r4uJHWzkbhiGUZHE6YaqYRiGERBL7oZhGBWIJXfDMIwK\nxJK7YRhGBWLJ3TAMowKx5G4YhlGB/H9iNwPLmhLUfwAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "pi,mixing_ = simulate_markov(M_char,verbose='on')\n", "plt.plot(pi);\n", "plt.xticks(range(27),chars);\n", "plt.grid()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Markov chain as a generative model?!!\n", "## What is the next probable characters for a give charachter" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "e\n", "_\n", "t\n", "h\n", "e\n", "_\n", "t\n", "h\n", "e\n", "_\n", "t\n", "h\n", "e\n", "_\n", "t\n", "h\n", "e\n", "_\n", "t\n", "h\n", "e\n" ] } ], "source": [ "# To see if we can generate something\n", "n_state = M_char.shape[0]\n", "ind_initial = np.random.randint(0,n_state,size=1)\n", "print chars[ind_initial[0]]\n", "ind = ind_initial[0]\n", "for i in range(20):\n", "\n", " \n", " # If we take the most likely next chars, it quickly falls in a loop?!!\n", " ind = np.argmax(M_char[ind])\n", " \n", " \n", " # If we take the next char based on a random choice based on the probabilites \n", "# ind = np.random.choice(range(M_char.shape[0]),size=1,p=M_char[ind])[0]\n", " \n", " print chars[ind]" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "s\n", "_\n", "a\n", "r\n", "o\n", "n\n", "o\n", "n\n", "s\n", "_\n", 
"w\n", "a\n", "t\n", "o\n", "_\n", "e\n", "r\n", "e\n", "_\n", "w\n", "a\n" ] } ], "source": [ "# To see if we can generate something\n", "n_state = M_char.shape[0]\n", "ind_initial = np.random.randint(0,n_state,size=1)\n", "print chars[ind_initial[0]]\n", "ind = ind_initial[0]\n", "for i in range(20):\n", "\n", " \n", "# If we take the most likely next chars, it quickly falls in a loop?!!\n", "# ind = np.argmax(M_char[ind])\n", " \n", " \n", " # If we take the next char based on a random choice based on the probabilites \n", " ind = np.random.choice(range(M_char.shape[0]),size=1,p=M_char[ind])[0]\n", " \n", " print chars[ind]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## at the char level, it is highly unlikely to expect something interesting" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# World level Markov chain\n", "## with more depth: higher order Markov chains" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0\n", "\n", "Ms\n", "1\n", "\n", "\n", "2\n", "\n", "I could be a bad thing? Never Again: It has done out what was still want to changed, or surprises like treasha! 
Congrats to leave from the grasp some extent\n", "3\n", "\n", "My own \"Western\" (i like a vivid matte how timeless \"The Amanda Bynes seems to ensure the Cameron said that most TV I love story of film-making team is my head has hope the film to modify the end we love it looks nice to create a quest for Catwoman (the delicious Paris with a very extensively, it had directed by everyone\n" ] } ], "source": [ "# code from https://github.com/codebox/markov-text\n", "import sys\n", "\n", "sys.path.insert(0, './markovtext')\n", "\n", "from db import Db\n", "from gen import Generator\n", "from parse import Parser\n", "from sql import Sql\n", "from rnd import Rnd\n", "import sqlite3\n", "import codecs\n", "\n", "\n", "\n", "\n", "\n", "SENTENCE_SEPARATOR = '.'\n", "WORD_SEPARATOR = ' '\n", "\n", "args = ['','gen','IMDB2','2']\n", "\n", "if (len(args) < 3):\n", "    raise ValueError('expected at least three arguments')\n", "mode = 'gen'\n", "name = './markovtext/IMDB_N2'\n", "count = 4\n", "\n", "\n", "if mode == 'parse':\n", "\n", "    depth = 2\n", "    file_name = './Data/IMDB_data/pos.txt'\n", "\n", "    db = Db(sqlite3.connect(name + '.db'), Sql())\n", "    db.setup(depth)\n", "\n", "    txt = codecs.open(file_name, 'r', 'utf-8').read()\n", "    Parser(name, db, SENTENCE_SEPARATOR, WORD_SEPARATOR).parse(txt)\n", "\n", "elif mode == 'gen': \n", "    db = Db(sqlite3.connect(name + '.db'), Sql())\n", "    generator = Generator(name, db, Rnd())\n", "    for i in range(0, count):\n", "        print(\"{}\\n\".format(i))\n", "        print(generator.generate(WORD_SEPARATOR))\n", "\n", "\n", "else:\n", "    raise ValueError(\"mode must be 'parse' or 'gen'\")" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# As can be seen, the generated text is far from reasonable\n", "### In fact, the problem of \"generative models\" is still an open question, unlike that of \"discriminative models\"\n", "# So, what can we do with this relational representation?\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Representation Learning\n",
"\n", "## Representation of objects based on a priori given features \n", "### Set theoretical definition of universals --> Abstract Universals\n", "![](Images/FeatureBasedRepresentation.jpg)\n", "## Representation of objects based on their context: Relational Representation\n", "### Category theoretical definition of universals --> Concrete Universals\n", "![](Images/RelationalRepresentation.jpg)\n", "\n", "### These are also called Distributional Semantic Models, with references to (de Saussure, 1966; Harris, 1951; Wittgenstein, 1963; Firth, 1957) \n", "Source: https://www.inf.uni-hamburg.de/en/inst/ab/lt/publications/cogalex-invited-biemann.pdf\n", "\n", "\n", "## Now, when we look at Markov's original idea for dealing with stochastic systems, it is purely relational \n", "\n", "## This is the case in many applications:\n", "* Pixels in an image and their neighboring pixels\n", "* A house and its neighborhood\n", "* A person and their friends\n", "* An ingredient in a food recipe\n", "* ..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Co-occurrence matrix\n", "### Now, if we take each row as the representation of the corresponding char, we can assume that we have a proper representation. \n", "### In fact, we have a matrix form for our data, where each object (e.g. 
char or a word) is being represented based on its normalized co-occurance matrix \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Char markov chain" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the selected char: i\n" ] }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEACAYAAABI5zaHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJztnXmYFOXVt+8zgKgwCiqyuIA7CipuaFRw8prEPWKMC65k\nMWpEYzSJ8VUjqEk0iUaNihJN3EX9XOIeXKYVUAwqKAMoqBlUUARloEGWgTnfH0/3O83QS3V3VXdV\nzbmva67u6j51+lRP9emnf895TomqYhiGYcSLmmoHYBiGYfiPJXfDMIwYYsndMAwjhlhyNwzDiCGW\n3A3DMGKIJXfDMIwY4im5i8jhIvK+iMwWkUuyPH+KiLyb+psoIntkPNeYenyqiPzHz+ANwzCM7Eih\nOncRqQFmA4cC84EpwMmq+n6GzQHALFVdIiKHA6NU9YDUcx8D+6jq4oCOwTAMw2iDl5H7YGCOqs5V\n1WZgHHBspoGqTlbVJanNycBWGU+Lx9cxDMMwfMJL0t0K+DRj+zPWTd5t+SnwfMa2Ai+KyBQROav4\nEA3DMIxi6einMxH5NvAj4OCMhw9S1c9FpAcuyc9S1Yl+vq5hGIaxLl6S+zxg24ztrVOPrUNqEnUs\ncHimvq6qn6duF4rIEziZZ73kLiLW5MYwDKNIVFWyPe5FlpkC7CgifUVkA+Bk4KlMAxHZFngMOF1V\nP8p4fGMR6Zq63wX4HtCQJ0hPf1deeWVsbcMSR9RswxJHGGzDEkcYbIP2Xe2/fBQcuavqWhEZCYzH\nfRncpaqzRORs97SOBa4ANgNuExEBmlV1MNATeCI1Ku8IPKCq4wu9ZiEaGxtjaxuWOKJmG5Y4wmAb\nljjCYBu07zDjSXNX1ReAXdo8dkfG/bOA9SZLVfW/wKAyYzQMwzCKpMOoUaOqHQMAo0ePHuU1lm7d\nutGvX79Y2oYljqjZhiWOMNiGJY4w2Abtu9qMHj2aUaNGjc72XMFFTJVCRDQssRiGYUQBEUHLmFAN\nHYlEIra2YYkjarZhiSMMtmGJIwy2QfsOM5FM7oZRDskkzJjhbg0jrpgsY7QrkknYay+YOxcGDIAJ\nE6C2ttpRGUZpxE6WMYxSaWiAjz+GNWtg5kw3gjeMOBLJ5B4GLS+KGmGcbb3aDxwIG20ENTUJdtvN\njd79jCMMtmGJIwy2QfsOM772ljGMsFNbC1ttBbvuCvffb5KMEV9MczfaHX36wOmnw3XXVTsSwygP\n09wNI4OmJvdnGHEmksk9DFpeFDXCONt6tV+9GlasgNmzg4kjDLZhiSMMtkH7DjORTO6GUSpLUtcL\nsxp3I+6Y5m60K+bMgZ13hv32g//Y5dqNiGOau2GkaGpyFTKmuRtxJ5LJPQxaXhQ1wjjberVfsgT6\n9YMFC4KJIwy2YYkjDLZB+w4zkUzuhlEqTU3Qty8sWwamAhpxxjR3o11x553wxhvw0EOwcCF06VLt\niAyjdExzN4wUS5ZAt27Qvbvp7ka8iWRyD4OWF0WNMM62Xu2bmmDTTaFTpwSLF/sfRxhswxJHGGyD\n9h1mIpncDaNU0iP3rl1t5G7EG9PcjXbFGWfAd74DjzwC55wDRx9d7YgMo3RM
czeMFGlZpls3PMsy\nhhFFIpncw6DlRVEjjLOtV/u0LLN8ecKzLBOG47NzqDTboH2HmUgmd8MolaYm09yN9oFp7ka7om9f\nePVVeOwxmD8frr++2hEZRumY5m4YKdKyjGnuRtyJZHIPg5YXRY0wzrZe7FtaXKvf2lr47DPT3NuD\nbdC+w0wkk7thlEIy6bT2Dh2sM6QRf0xzN9oNc+fC0KHu9p134Cc/galTqx2VYZSOae6GQWuNOzjN\n3UbuRpyJZHIPg5YXRY0wzrZe7NOTqQAzZpjm3h5sg/YdZiKZ3A2jFNI17gAbb+w0+JaW6sZkGEFh\nmrvRbrj3XnjxRbjvPre96aZOf08nfMOIGqa5GwbryjJgursRbyKZ3MOg5UVRI4yzrRf7zAnVRCLh\n+YIdYTg+O4dKsw3ad5iJZHI3jFLI1NzBVqka8caT5i4ihwM34r4M7lLV69o8fwpwSWozCfxcVd/z\nsm+GD9PcjUD56U/hgAPcLcCwYXDmmXDccdWNyzBKpSzNXURqgFuAw4ABwHAR6d/G7GNgqKruCVwD\njC1iX8OoCJmyDNh1VI1440WWGQzMUdW5qtoMjAOOzTRQ1cmquiS1ORnYyuu+pRAGLS+KGmGcbb3Y\nZ06oJhIJzxOqYTg+O4dKsw3ad5jxkty3Aj7N2P6M1uSdjZ8Cz5e4r2EEhmnuRnuio5/OROTbwI+A\ng0vZf8SIEfTr1w+Abt26MWjQIOrq6oDWb9S6ujrq6urW2W77fDnbaQrZpx/z4r/YeON+fEHE68V+\n/nzYdNPW41u4MEFNTTSOL/2YV//F2Eft+ML0ear0dvp+Y2MjhSg4oSoiBwCjVPXw1PZvAc0yqboH\n8BhwuKp+VMy+qedsQtUIlC23hIYGdwtuUdNLL7lbw4gi5S5imgLsKCJ9RWQD4GTgqTYvsC0usZ+e\nTuxe9y2FtqOCONmGJY6o2RayV12/zt2rLBOG47NzqDTboH2HmYKyjKquFZGRwHhayxlnicjZ7mkd\nC1wBbAbcJiICNKvq4Fz7BnY0hpGDFStcH/fOnVsfsxWqRpyx3jJGu+Dzz2Hvvd1tmunT4ZRT3K1h\nRBHrLWO0e9rWuINVyxjxJpLJPQxaXhQ1wjjbFrJv2zQsrblbnXu8bYP2HWYimdwNo1ja1riDu57q\nypXQ3FydmAwjSExzN9oF48bBE0/Aww+v+/gWW8D777tbw4gaprkb7Z62skwa092NuBLJ5B4GLS+K\nGmGcbQvZt5Vl0rZedPcwHJ+dQ6XZBu07zEQyuRtGsWSrlgHrDGnEF9PcjXbBz38OAwe620xOOMH9\nnXhideIyjHIwzd1o9+QaudsqVSOuRDK5h0HLi6JGGGfbQva5NHcvskwYjs/OodJsg/YdZiKZ3A2j\nWPJVy9jI3Ygjprkb7YIBA1yN+8CB6z5+222ut8yYMdWJyzDKwTR3o91jI3ejvRHJ5B4GLS+KGmGc\nbQvZm+bePm2D9h1mIpncDaMYmptdD5kuXdZ/zlaoGnHFNHcj9nz1Fey8s7tty6xZcNxxrr+MYUQN\n09yNdk22jpBpbIWqEVcimdzDoOVFUSOMs20++2wLmNK21lsm3rZB+w4zkUzuhlEMuSplADbc0N2u\nWFG5eAyjEpjmbsSexx+H++5z/dyz0asXTJ0KvXtXNi7DKBfT3I12Tb6RO5jubsSTSCb3MGh5UdQI\n42ybzz7bhGqmbaFyyDAcn51DpdkG7TvMRDK5G0Yx5OoImcZWqRpxxDR3I/ZceCH06+dus3HKKXD0\n0e7WMKKEae5GuyZfnTvYyN2IJ5FM7mHQ8qKoEcbZNp99vjp3MM09zrZB+w4zkUzuhlEMhaplbORu\nxBHT3I3Ys9decNddsPfe2Z//+9/hP/9xt4YRJUxzN9o1XjR36wxpxI1IJvcwaHlR1AjjbJvPPpss\n01ZzzyfLhOH47BwqzTZo32EmksndMLyi
CkuXwiab5LaxFapGHDHN3Yg1yST06eNuc/Hhh3DYYfDR\nR5WLyzD8wDR3o91SSG8Hq5Yx4kkkk3sYtLwoaoRxts1ln6v1QFvNfckSJ+GUG0cYbMMSRxhsg/Yd\nZiKZ3A3DK4Vq3AE6doSNNoJlyyoTk2FUAk+au4gcDtyI+zK4S1Wva/P8LsA/gb2B/1XVGzKeawSW\nAC1As6oOzvEaprkbvvPMM3D77e42H9tsA5MmwbbbViYuw/CDsjR3EakBbgEOAwYAw0Wkfxuzr4Dz\ngT9ncdEC1KnqXrkSu2EERaGOkGnipLsnk/DGG/knkY3440WWGQzMUdW5qtoMjAOOzTRQ1UWq+jaw\nJsv+4vF1PBMGLS+KGmGcbXPZ55Jl2trmK4cMw/F5tU0mYZ994OCDEwwZ4i3BVzvmIG2D9h1mvCTd\nrYBPM7Y/Sz3mFQVeFJEpInJWMcEZRrkUM3KPwyrVhgaYMwdaWmDmTJgxo9oRGdWioOYuIscDh6nq\nz1LbpwGDVfWCLLZXAsk2mntvVf1cRHoALwIjVXViln1Nczd85ze/gS22cLf5OOMMOPRQOPPMysQV\nFF9+CT17uvt77AETJ0JtbXVjMoIjn+be0cP+84DMaaatU495QlU/T90uFJEncDLPeskdYMSIEfTr\n1w+Abt26MWjQIOrq6oDWn0u2bdvFbDc11bHjjoXtly9PMGUKnHlmuOIvdnubberYZhuoqUlwxBFQ\nWxuu+Gy7vO30/cbGRgqiqnn/gA7Ah0BfYANgGrBrDtsrgYsztjcGuqbudwEmAd/Lsa96pb6+Pra2\nYYkjara57E84QXXcuMK2v/ud6qhR5cdRbdvXXlM98EDVa6+t14EDVdeurU4cYbEN2ne1SeXNrLm7\noOauqmuBkcB4YAYwTlVnicjZIpKWanqKyKfAL4HLROQTEekK9AQmishUYDLwtKqOL/yVYxj+4KXO\nHeKjuc+bB1ttBYMHw4YbwhNPVDsio1pYbxkj1uy/P9x8s7vNxz//Ca++CnffXZGwAuOGG+CTT+DG\nG+Hpp+Hyy2HqVKix5YqxxHrLGO0Wr9UycekMmR65g7vod4cO8NRT1Y3JqA6RTO6Zkwtxsw1LHFGz\nzWXvtc49nywThuPzajt/vuuCmUgkEIHf/Q6uuip335yg4giLbdC+w0wkk7theKW9rVDNHLkDHHus\nq3l/9tnqxWRUB9PcjdiycqVL7CtXgmRVJVuZOxeGDnW3UWbHHeG552DnnVsfe+wxuPZad53YQu+D\nES1MczfaJWlJxktCi8PIXdWN3Pv0Wffx445zX3AvvFCduIzqEMnkHgYtL4oaYZxts9nnk2Ta2tbW\nupa/a9eWF0c1bZuaYIMNoGvXde1rauCKK2D06Ozae1SOrxTboH2HmUgmd8Pwgtcad3AJcJNN3D5R\npa3ensnxx7tryb74YmVjMqqHae5GbBk/Hv7yF3frhe23h5decrdRZPx4+NOf3DFk46GH4JZbXL8Z\n097jgWnuRrvEa6VMmqivUs03cgc48URYtAheeaVyMRnVI5LJPQxaXhQ1wjjbZrPPJ8tk851rUjUM\nx+fFdv781uSezb5DB7di9aqrgo0jTLZB+w4zkUzuhuGFpibvmjtEf5VqtkqZtgwf7r4EXn21MjEZ\n1cM0dyO2XHaZu/D15Zd7s//JT+DAA91tFBk2zPWjP+64/HZ33w333mvyTBwwzd1olxRTLQPx0NwL\njdwBTj0VGhthwoTAQzKqSCSTexi0vChqhHG2zWZfTJ075JZlwnB8XmwzJ1Tz2Xfq5H7VpLX3qBxf\nKbZB+w4zkUzuhuGFUkbuUdXc16yBhQtbL7FXiNNPd9daff31YOMyqodp7kZsGTIE/vAHd+uF+++H\n55+HBx4INq4gmDcP9t0XPv/c+z5jx8Kjj7oR/MCBdq3VKGKau9EuKaXOPaoj93Sr32I4/nhIJFzD\ntCFD
IJkMJDSjSkQyuYdBy4uiRhhn22z2xda5R1lzb7uAyYvv2bNdL501axLMnAkzZpQfR9hsg/Yd\nZiKZ3A3DC8XWuUe5WqaUkfvAgbD55q4VwW67wYABwcRmVAfT3I1Ysnat65DY3Oz9+qHz5sF++7lE\nGTUuu8xdEPuKK4rb7+qrYeZMp7+b5h49THM32h1Ll7ouj8VcGDrKK1RLGbkDbLedu7XEHj8imdzD\noOVFUSOMs21b+0KTqdl8b7SRKylctar0OKKkuYP7Qpg1y784wmYbtO8wE8nkbhiFKFZvB6c9R7Vi\nptSRe+/e8NVX/sdjVB/T3I1YkkjAqFHuthh23hmefhp22SWAoAKke3f48EM3QVoMS5bA1ltbGWRU\nMc3daHcUW+OeJoq6+zffwIoVsNlmxe+7ySZu8tmSe/yIZHIPg5YXRY0wzrZt7Qu1HsjlO1s5ZBiO\nL59tWpLJvLqSV98i0L17wvPK1rC/F5X2HWYimdwNoxClaO4QTc29VL09zWabFde2wIgGprkbsWT0\naCc3tL3qUCHOOQcGDXK3UeGhh+DJJ+Hhh0vb/8QTXQ/44cP9jcsIHtPcjXZHsR0h07THkXufPjZy\njyORTO5h0PKiqBHG2batfSFZJk6ae7YLYxfj+5tvEp5X5Yb9vai07zATyeRuGIUotVqmPY7ct9jC\nRu5xxDR3I5Yceij87/+622J4+GF4/PHS9etqMGQIXHMNHHJIafu/9JLre2/XVI0eprkb7Y5yRu5R\n6wzph+YexWZpRn4imdzDoOVFUSOMs21b+3Lq3NvKMmE4vly2qtmTezG+P/rI6txL9R1mIpncDaMQ\npda5R22F6uLF0LkzdOlSuo+uXWH1ali+3L+4jOpjmrsRO1RdL/dvvoFOnYrb98sv3UUsvvwymNj8\nZvp0OPlkb1dRysf228P48bDjjv7EZVSGsjV3ETlcRN4XkdkickmW53cRkddFZKWIXFTMvobhN998\n45J7sYkdWmWZqIwz5s0rT29P07u36e5xo2ByF5Ea4BbgMGAAMFxE+rcx+wo4H/hzCfsWTRi0vChq\nhHG2zbT3Isnk8p3+Uvjmm9LiqLTt/Pnr17iX4tvrQqYwvxfV8B1mvIzcBwNzVHWuqjYD44BjMw1U\ndZGqvg2sKXZfw/CbUitl0kSp1j3bAqZSsJF7/CiouYvI8cBhqvqz1PZpwGBVvSCL7ZVAUlVvKGFf\n09wNX3j9dfjVr9xtKQwY4OrcBw70N64gOPdcF+d555Xn59pr4euv4U9/8icuozJYnbvRrrCRe/H0\n7m2rVONGRw8284BtM7a3Tj3mhaL2HTFiBP369QOgW7duDBo0iLq6OqBVC6urq1tHF8v2fOZ2233y\n2U+bNo0LL7wwr7/09o033pgzvnLijfvxBRVvpn1TE6xalSCRKO34uneHCRMSrFkTnuPLFe/8+XX0\n6VP+/2/hwnTFTbiOL8yfp2psp+83NjZSEFXN+wd0AD4E+gIbANOAXXPYXglcXOK+6pX6+vrY2oYl\njqjZZtrfdpvqOeeU7vvUU1Xvu6+0OCpt26uX6mefle97+nTVXXctPY6w2gbtu9qk8mbW3O2pzl1E\nDgduwsk4d6nqtSJydsrxWBHpCbwF1AItwDJgN1Vdlm3fHK+hXmIxjEL88Y9uheq1Wc+0wowc6a6h\nev75/sblN2vWwEYbuUvsdfTyGzwPX3/tat2jIkcZjnyau6dTQlVfAHZp89gdGfcXANt43dcwgqTU\nXu5porJK9YsvoEeP8hM7uGNeudKVgG68cfn+jOoTyQnVTP0pbrZhiSNqtpn25dS5w/oTqmE4vmy2\n+RqGFetbxNukaljfi2r5DjORTO6GkQ8/qmWi0BnSr0qZNFYxEy+st4wRO444Ai64wN2WwmOPwQMP\nuL7uYebWW6GhAcaM8cff8cfDSSe5a6oa0cDq3I12RakdIdNERXP3e+
Ru11KNF5FM7mHQ8qKoEcbZ\nNtPeiyxTSHPPlGXCcHxBa+5gmnspvsNMJJO7YeSj3GqZqKxQDWLkbv1l4oNp7kbs2HhjWLiw9AtY\nLF7sar7DPqk6YACMGwe77+6Pv3//G/7yF3jxRX/8GcFjmrvRbli9Gpqby6vV3mQTSCahpcW/uILA\nRu5GPiKZ3MOg5UVRI4yzbdo+LclI1rGMN98dOrhR/9KlxcdRKdvly2HVKjf565dv09xNczeM0FJu\njXuasFfMpCdTC32JFcPmm7svjRUr/PNpVA/T3I1Y8dZbcM457rYcBg2Cu+92t2Hk1Vfh8sthwgR/\n/fbtC4kEbLedv36NYDDN3Wg3lFvjnibsq1T91tvT2BWZ4kMkk3sYtLwoaoRxtk3be5VlCvnOLIcM\nw/G1tc1X416O70ILmcL4XlTTd5iJZHI3jFyUW+OeJuyae5Ajd1ulGg9MczdixfXXu8R3ww3l+fnl\nL2Hbbd1tGDnpJBg2DIYP99fv738Py5a5nvhG+DHN3Wg3mOZeHjZyjw+RTO5h0PKiqBHG2TZt71WW\nKeQ7U5YJw/G1tZ03LzjNPd+Eahjfi2r6DjORTO6GkQu/6tzD3F9GtfCEaqnYyD0+mOZuxIpjj4Uf\n/9jdlsNTT8Hf/w5PP+1PXH6yaBHstFMwstHChdC/P3z1lf++Df8xzd1oN/ipuYd15D5/fjB6O7hV\nqsmka21gRJtIJvcwaHlR1AjjbJu296vOPcyau5fJ1FJ919RAr165pZmwvRfV9h1mIpncjfiTTMKM\nGe62GPyqcw/7yD0IvT2N6e7xwDR3I3Qkk3DQQTB9uutVPmkS1NZ627dbN2hsLD/BJ5MuyS1bVp6f\nILjqKtfa+JprgvE/bBiccQb84AfB+Df8wzR3I1I0NMCsWe7+zJluBO+FlhaXlL1+EeSja1enOzc3\nl+/Lb4IeuVtf93gQyeQeBi0vihphVGwHDoQttwRIsPHG7opDXnjuuQRdu7p+7OXGIeK0+yVLwve+\nBam5Q35ZJmzvRbV9h5lIJncj3tTWwre+BSee6JLsypXe9lu2zJ8a9zRhXaVqI3fDC6a5G6GkXz8Y\nPx5Gj4YDD4Tzziu8z3vvwWmnuVs/2HdfGDMG9tvPH39+0asXvPNOcAn+uefg5pvhhReC8W/4h2nu\nRqT4/HOnne+0E5x6KjzwgLf9/KpxTxPGzpDNzW6BkZOtgsFG7vEgksk9DFpeFDXCqNi++Sbsvz+8\n+mqC734XPvrI/RViwoSEZ1nGSxzpcsgwvW9ffOESe8eOwcVhmns8iGRyN+LN5MlwwAHufqdOTnv3\nMnpfvtzfkXsYNfeg9XaAHj3cRPLq1cG+jhEsprkHTDLpSvsGDvSnRK89cMgh7vqg3/2u237zTVd3\n/f77+S8I/be/wezZ7tYPfvMb2GILdxsWHn8c7r0Xnnwy2NfZemt4/XXX094IL6a5V4lk0k0GHnww\nDBlS/GrL9siaNfD22zB4cOtjgwe7GvZCF732qyNkmjCuUq3EyB1slWociGRyD4OW58W2ocEtwmlp\nSTBjhvfFOFE5viBsp0+Hvn1dkk7binibWG1oSHiWZbxq7osXh+t983qRjnLjyDWpGqb3Igy+w0wk\nk3tUGDgQNtzQ3e/Rw/tinPZMpt6eyamnwrhxbmSfC7/r3MNYLWMjd8MrprmnCEIbnz3bSTKXXgr3\n3+/kBiM/Z57p3rOzzlr/uf33d3Xvhx+efd8f/tBdW/SEE/yJ5fnnXb3388/7488PvvMdNwfwve8F\n+zpB968x/ME09wIkk7Dbbv5r4/fe6xbVnH8+fPop/Pe//viNM2+84VanZuO00/JLM351hEzTXqtl\nwEbuccBTcheRw0XkfRGZLSKX5LC5WUTmiMg0Edkr4/FGEXlXRKaKyH/8CDoIbXzePKeNe21UVchv\nSwvcd58biU6cmOAHP4BHH/Uv5j
jafvUVLFgAu+6a3fakk9yVkZYvz77/J5/Ev87dNHfT3L1SMLmL\nSA1wC3AYMAAYLiL929gcAeygqjsBZwNjMp5uAepUdS9VHUwI6d+/tcTOL208kXCa7Z57uu2TToKH\nHy7fb5x580231D9X468tt3TVR//6V/bn/a5zD5vmvmyZk0r8PMZc2Mg9Bqhq3j/gAOD5jO3fApe0\nsbkdOCljexbQM3X/v8DmHl5Hq8Wbb6rutpvq7ber7rCD6tq15fs880zVG25o3V6zRrVnT9U5c8r3\nHVeuuEL1ssvy2zzwgOoRR2R/rkcP1QUL/ItnxQrVzp3981cuH3zgzs9K8Pnn7v00wk0qb2bNqV5k\nma2ATzO2P0s9ls9mXoaNAi+KyBQRyTJNVn3q691E1c9+5iZTX3yxPH/LlrnR5SmntD7WoQMcf7x3\naaY9kqtSJpNjj3WLa778ct3HVf2vc09XOq1Y4Z/PcqiU3g7uF+zixeHsZ294o0CHCl84SFU/F5Ee\nuCQ/S1UnZjMcMWIE/fr1A6Bbt24MGjSIuro6oFULq6urW0cXy/Z85nbbfbL7gwMPTHDTTdMYOfJC\nbrkFOnfObQ9w44035ozv8cehf/8Es2ZBz56tr73TTnDPPXVcemnueIM4vvT2tGnTuPDCC/P683J8\nQcT7yisJXn8dHnywcLzHHAPXXOPmMdL7jx+foKVlGp07+3t83bvX8dxzCTbfvLzj8+P/sXDhILba\nKv/r+/X/S/fpWbCgjq23ju75FuTnqRrb6fuNjY0UJNeQXteVZV7Q4mSZ90nJMm3srgQuyvE6nn+K\n1NfX+2a7erVqba3qokXOdvly1S22UP3oo9L9/s//qD7yyPq2a9ao9urlfl6XE3McbRsaVHfc0Zvt\n88+rDh687mPz56t2715+HG3p31/17rv991uK7XXXqV50UeXi2GcfJ1n67beStkH7rjbkkWW8JPcO\nwIdAX2ADYBqwaxubI4FntfXLYHLq/sZA19T9LsAk4Hs5XqdS78c6vPGG6p57rvvYr3+t+qtfleZv\n7lzVzTZzem02Ro5Uveaa0nzHmTvvVD3tNG+2zc1u/mL27NbHZs5U3WUX/+M64ADVSZP891sKv/iF\n6vXXV+71jj5a9cknK/d6RvHkS+4FNXdVXQuMBMYDM4BxqjpLRM4WkZ+lbJ4D/isiHwJ3AD9P7d4T\nmCgiU4HJwNOqOr7w74nKkUhA6pfP/3HuuXD33fDNN8X7u+8+18Uwrde25cQT4ZFHivcbd7zo7Wk6\ndoSTT1635t3vGvc0YeovM29e5TR3sL7uUcdTnbuqvqCqu6jqTqp6beqxO1R1bIbNSFXdUVX3VNV3\nUo/9V1UHqSuD3D29b7lk6k/l2tbXtyb3tO1227mFNA89VJxfVbjnHlfbnsv2oINg0SLX4bDUmONo\nmy255/N76qlu1W96UXNTE6xdW34cbenaFR55JOF5YVuQ79v8+d5q3P2KI1s5ZJjPoWr4DjOhWqFa\n6a6Jzc1uReTQoes/N3Kkax1bTEeEyZNdvfz+++e2qalxy+StaqaVJUvc6t099vC+z777ugqk//yn\n1UfXrv7GlUy6L/977w1HV08buRtFkUuvqfQfoHvuqbp0qS9SlCdef1110KDsz61dq7rzzqoTJnj3\nd/bZqr+g0QZ6AAAXW0lEQVT/fWG7iRNVBw707jfuvPii6pAhxe83erSbw1B1axR+9jN/43r9ddWa\nGjcz1amTm5+pFi0tqhtsoPrNN5V7zaeeUj3yyMq9nlE8lFnnXjG8Lv33i0xJpi01Ne6izLfc4s3X\nypVuNH766YVtv/UtV0M8c6bnUGNNMXp7Jqee6uYvmpv9r3EH10Ruxx3d/V13rW5Xz0WLoEsX2Gij\nyr2mjdyjTaiSu9cPkF8aWiIB3/52btszz4R//9tbj42nnoK99oJttilsW1PjOhfmkmbCoFVW0jZX
\nci/kd4cdYPvt3aKzJUvg66/Li6MttbUwZQr07Jng+uu9dQsN6n37178SnvV2v+IwzT3ahCq5n3de\n5S5Ft3q109uHDMlts+mmMHw4jB2b2yZNtonUfFjVjEO19JE7uE6R99/vRu5+a+4Am2wCRx7pGpZV\nk0WLvE+m+sWWW8LXX+fvoW+El1D1c+/TR3n//cok+EmTXCved97JbzdjhmtNMHcubLBBdpsvvnC/\nOj77zP109kJLC/Tr53qFt+eLeMyZ0/r+lsLChW7l79Chrjnbqaf6Gx+4GIcOdf/fXE3NgubOO13b\nhX/8o7Kv27u3u7xhpb9YDG9Epp/7d74Dv/99ZV6rrSSTiwEDXOJ+/PHcNg88AMOGeU/s0CrNtPdO\nkeWM2sH1QDn4YHjhheC6Je60k0tur74ajH8veG316zd9+lh3yKgSquR+7bVuhDJnTn47PzS0bIuX\nctmOHLn+xGraNl3bfsYZxceQlmba/ngKg1ZZKdt8yd2r39NOc5OqL78cXD36ySfnX/dQql+vvPVW\noqgySL/i6N173TmnMJ5D1fQdZkKV3Hv3dpcQu+iiYF9n1SqXVPLp7Zl8//tONpg6df3npk2DpUvh\nkEOKj2PwYFdlM3168fvGhXJH7uB+gdXUwE03BVePfuKJ7tfb6tX++/ZCNTR3sL7uUSZUmruqsmqV\nK0H7299yXyuzXCZOhAsvdFqiV/7wB/j4Y/fLIpNf/tJN5F19dWmx/OY3Tstvj9eqXL7cTdp99VXu\ndg1eSC9EW7MGOnWC114r/wsjG0OGwG9/C0cd5b/vQuy9t5vY33ffyr7ulVe629GjK/u6hjcio7kD\ndO4MN97okm9Qo6RskkwhfvpTeOwxVz2QprkZHnwwvyRTiFzSTHvg7bdh993LS+zgBgMDBrjEvttu\nwU1Qe5Vm/CaZhMZG/+v4vWAj9+gSuuQObmS0/fZu9J6NcjW0XIuX8vndcks45pjWaoVEIsELL7hF\nLjvtVHwMafbZx404333Xm30xvsNuW0iS8eq3thYmTIAbb0wwYUJw9eg//CE880z+hnJ+v2/JpOtH\ntHhxgh/+0Lvk5FccbRcyhe0cqrbvMBPK5A7w17+6CdYvvvDX76pVrh+JV709k/PPh9tug7Vr3Xax\nte3ZEGm/Ne9+6O1pamvdqD3IMtqePd08yXPPBfcabWloaF3JPGtWZVdwg43co0zoNPdMfv1rp8f6\nWds7YYKbsJ0ypbT9998frrjCXah5++3dz+VyS/Deeccl+DlzWi/UHXdU3ajwjTdcvX9U+Mc/4Nln\nnURXCZJJ6NXLSZQDBuD5l4lfzJvnLlpubQjCSaQ090yuuMLVL5eaiLNRX++tvj0X6bLIcePchK8f\ntdV77eVus1XjxJVPPnEJvm/fakdSHMcdBy+95CqkKkFjo1slW19f+cQO7tfKwoWtv1aN6BDq5L7J\nJq5K5YIL3IrONOVoaPkmU734PeEENxF4xRUJTjihtBja0laaCYNWGbTt5MmugVq+Xyph0VYzbbt3\nd+fPk09WJoYxY+Dss2HNmkRRid2vODp2hM03b70geZjOoTD4DjOhTu7gKlFaWlz/kHJZudL9Cjj4\n4NJ9NDe7muqvv4arrvKvprq9Vc34qbdXmpNPdr/cgiaZdK9z1lnBv1Y+2i5kMiJCrl7Alf4jzzVU\nJ09W7dOn/F7vicT6F1YultdfV+3Y0f8e3y0tqjvtpDplivd9li518VSyB75fHHCA+39EkWXLVDfd\nVHXhwmBfZ8wY1eOOC/Y1vHDEEapPP13tKIxsEJV+7rnYf3/47nfL7ztTSn17W4KqqU5LM/ff7yYZ\nM38RtLS4n8VTp7pSvDvugEsucZcDHDo0HFcJKoZVq+C99yq/IMcvunRx8y1BTqqqOknm3HODew2v\nlNLXPZlc/zw2KkskkjvAH//Y2nemVA2tUHL34jfImuqjjoJb
b4WDDkqwzTZOtujXz12gYbfdXNnl\nrbc6aWnhwnQ71oTni5yEQQNNJBJMnQo771y40VpYtNVstrmkGb9ieP11WLECDj20eL9+xgHrlkN6\nrc3/1rfg4IMTngceprn7T8dqB+CV3r3daPWCC+Doo93in2ImmPzQ29MEVVOt6kbpqrBsGYwYAd/7\nnhs5tV3FmUy6ev0ZM1xJZpTaBqcnU6PMEUfAj3/sRrRBXNd0zBg45xw3v1Nt+vRxPZS8kq7NV229\nulpU51ciTS69ptJ/5NHc0yxapNq5s2qHDlr09Vbr61X339+7fTVYutQdV6dO3o5v6VLVn/9c9fvf\nr0x8fnHSSar33FPtKMpnxAjVv/7Vf79ffuk0/a++8t93KTzxhOoxx3i3f+QR9zkV8WeuzMgNUdfc\n08ye7ept1651o4FiVuvlu15qWEhLPq+95q2mubYW/vIXtwjqzTcrE6MfRLlSJpOgqmb+8Q9XT7/Z\nZv77LoVierqvXAmXXup68Dz4oNteuTLY+IzsRCq5pycza2oSgLdGSmkNzcvFOcKgSdfWwsqV3mua\n33wzwZVXOsmqUBllGI7vscdcz/VC/XiK9VusvR+2hx7qOoV+/LF/flta3IR524nUamvu6QnVQrY3\n3OA+p8cdB716JTj11NbOkuXEUKpt0L7DTKSSe3pk+7e/uRPmlFNc29hCrFjhFh4ddFDwMVaDESNg\nwQK3mjfszJzpqp/i0GahY0fXTMzPq2n9+99uodR++/nns1x69fK2SvXTT11yv+GG1seuvNJdCL6h\nIdgYjfUJdW+ZfKjCT37irnr/6KP5J55eeQUuu8yVZsWVJ56AUaNcuWQYJuFyccklrv/9FVdUOxJ/\nmDDBtaTI7OpZDsccA8ce61pMh4ktt3Tlq7165bY56STo33/93u833+wuMD5+fDy+1MNEZHvL5EPE\nVRR8+SX87nf5bb1IMlFn2DDYeGOnc4aVZNJ9wPfYo9qR+MdBB7nmdunOjeUwd64rgRw+vHxfflNI\nd6+vd/M+l1yy/nPnnutG9c8+G1x8xvpEMrmndbHOnd2lzx58MHdSSyQSnhcvhUGTLtW3CFx3nRsR\nr1rln1+/bJNJV4Y6bVqCyy/3v/a5WHu/bGtq3Ig1PbFajt+xY931YLPV/1f7vUjr7tlsm5tdifL1\n17sBRlu/nTo5qebii3NfgCdsn6c4EMnknkmPHvDUU+7KTZMnr//8ypWumuTAAysfW6UZOtRNON9+\ne7UjWZ+Ghlbd9YMPKt+XPEiGD3fJvRyFc/VquOsuV9seRvL1db/tNifX/OAHufc/4gi3ovq224KJ\nz8hCrhrJSv/hoc49H88842pq585d9/GXXlI98MCyXEeK995T3XJL1SVLqh3JuvzhD6obbOD68hS7\nRiHstLSo7rCD6ltvle7joYdUv/1t/2Lym8suUx09ev3HFyxQ3WIL1ZkzC/toaHC2QffkaU8Qlzr3\nfBx1lLsIx/e/71Z3pvGjn0yU2H131/fkL3+pdiSt3HSTax0xdaqbgKxGX/IgESm/5j0sfWRykWvk\nfumlrnPrrrsW9jFggJOwRo3yPTwjG7myfqX/KGLkXl9fn/XxlhbVH/9Yddgw1bVr3WMDB9br+PHl\n+a2krR++GxtVN9tM9fPP/fVbiu1NN6lut52LKcgYgvTtxbahQXXrrVVffrl4vw0Nqr16qa5eXV4M\npdp7sX38cbcSOtN28mTV3r1z/0rM5nfhQjd6b2gINt5K+a42tIeRO7RW0Hz9NVx+uauB//DD9qG3\nZ9K3r2sydvXV1Y3jllvctXDr66N3xaViGTDA/Rp55pniOyGOGeN6tnfqFExsftB25N7S4kpAr73W\nXVTHK1ts4cqSL77Y/xiNdYlsnXs+Fi1yC2UOOsg1PJo0KV4ygBcWLXI1x5Mnw447Vv71b73VSUP1\n9dG6RmqpJJOu0+UXX7jb
t97yds4tWwbbbuvq5LfZJvg4S+WTT9zn6dNP3fadd7o2CRMnFr+uYvVq\nJx/eeKObaDVKJ1+de9XlmPQfZU6otuXNN1Vratxf3CbwvHLNNa5JV6W59VbVvn1VP/648q9dLTIv\n4gKqBx2kOmlS4f3uuEP12GODj69cVq1yDe3WrlX9+mvVnj1V3367dH9PP63av39+KarSRPHiN5Qr\ny4jI4SLyvojMFpEsyxRARG4WkTkiMk1EBhWzb7F4qUVdu9aNKFpaotXv3E/fF17ompC9/ba/fvPZ\njhnj6u1fecWVvvnl1297v23TfY86dEiw++5w/PGuZn3oUHj++exlkvX1Cc8TqdV+LzbYwPVy+te/\nXC+jYcNg771L93vUUe6XSrpst9rnRTLpmtkV04M+7BRM7iJSA9wCHAYMAIaLSP82NkcAO6jqTsDZ\nwO1e9y2FaR6aS7c2GZvm+YpJXvwGbeun7y5d3Ord3/7WX7+5bO+4w2mw9fWux7xffoOw99s23fdo\n5MhpTJoEv/yl62J6zjlu1eZee7lqmsz+LE8+OY1k0l1lzM94i7X3atu7N/y//zeNceO8XRUtn18R\nt7Dp6qvdHJnXGJJJeOKJaUUl31y+W1rcNR5Gj3Yy7syZ0NIyzfNgMOx4GbkPBuao6lxVbQbGAce2\nsTkWuBdAVd8ENhWRnh73LZqmpqaCNukP249+1OS59M6L36Bt/fb9k5+4Ze0vvujdbzIJ06c3ef4A\nNTU1MXas+8C/8kruxO4l3lJtg/Tt1ba2Frp1a/q/c61jR9fc7t133Xtzyy2wyy5uJerChfDss02M\nGOFNsw7De9GjBzz6aBOXXgqbb16+34EDXeO10aO9xZBMuktK/u1vTUWNrjN9L17sGr2deab7sjrz\nTOfnT39y8wA1NU2+Xj6zmni5EtNWwKcZ25/hknYhm6087hsYtbWw9dbtbzI1k06dXGL59a/dz+hk\nsvX9WLPGdczM/Fu0CH70I/joIzdZ9uCDrsJho41a/zpmnDXJJLz0EjQ2ujUFO+xQjaMMNyJOhjjq\nKDfguOYaOO889/6PGwe/+EX4z9Fk0q1TaG6Gu+92jc38iHn0aDfxX1fnXqOmBj77bN2/Tz91tx98\n4KrfwH1h7rWXax3dp0/2v5494ZtvnO2oUfDyy+7+0KFw5JHusUzp8JBD3C+uv/41/P8PLwR1mb1A\ne781NjbG1jYI34cd5kYo777byL33ugS9cqX7WZqZtDfayD3mPkCNzJ7t+nLDul8ANTXOdsMNoakJ\nmpsb6d/ffZgqfWyV8O2n7ZAhLqm88gqk32Mvl6Gr9nvR0JAeKTcya5Z/MW+4oft7/PFGnnzSDUa2\n2cYNytK3e+zhvhi7d4ezz4ZZsxrZdVcnAy5d6nrezJ/vEvfzz7duL1zozueWlkYmTHC/mI44wp27\n2aithTVrGmOR2MFDKaSIHACMUtXDU9u/xc3QXpdhcztQr6oPp7bfBw4Btiu0b4aPcNRkGoZhRAjN\nUQrpZeQ+BdhRRPoCnwMnA22bkj4FnAc8nPoyaFLVBSKyyMO+eQM0DMMwiqdgclfVtSIyEhiPm4C9\nS1VnicjZ7mkdq6rPiciRIvIhsBz4Ub59AzsawzAMAwjRClXDMAzDP2LVWyYTEekrItMr8DpXishF\nPvq7QERmish9Pvos+r0QkYl+2pcYQwyWkoQXEdlURELci9Ioh9gm9xRR/FlyLvAdVT3dZ79FvReq\nenAA9sX+P6L4/wscEd+uRNod+LlPvoyQEankLiJPiMgUEZkuIl4uIdxJRO5PjYQfEZEN8/g+Q0Te\nFZGpInJPgTguE5EPROQ1YJcCtqeKyJsi8o6IjMn3wRSRMcD2wPMi8osCfq9ItXV4TUQe9PDroaOI\njBWRBhF5QUQ6F/Bf1Ki5GHsR2T71fuxTzGtk8dNXRGaJyD9T/4/7ReRQEZmY2t43xz4zvb
wXInJR\n6lx7z8P/Ix2L1/Pt/86hQv+/lO/3ReSe1K+frfPYbiwiz6TO4/dE5IQ8Yf8RSP8v1qtgyxLD9Izt\ni0VkvasXi8gfReTnGds5f9mKyK9Sc3KIyF9F5OXU/W+LyP1Z7PdNfUY3EJEuqf/fbjl8j878n4nI\nNSJyfp7jOzv1nr0jIh+nY4k0uZrOhPEP6Ja63RCYDnTPY9sXaAEOSG3fBVyUw3Y34P20v/Tr5LDd\nG3gX6AzUAnPy+O2PqyTqkNq+FTitwDF+nO+4Ujb7Au8AnYCuwOxcMWS8F83A7qnth4FTCrzG0iL/\nN3ntUzG8B+ycin2gTz5XA7ultt8C7kzd/z7wRJ598r4XGf/nDYEuQAOwp0/nm+dzKMP3GmA/D+/Z\nD4A7MrZrC/1PPP5/17EFLgZ+l8VuEJDI2J4BbJXD5/7Aw6n7rwGTgQ7A74CzcuxzFfBnXFuTSwrE\n+3bqvgAfFvpcpWw7Aq8CRxZz/ofxL1Ijd+BCEZmGOwm2BnYqYP+JqqavrHo/kEs6+B/gUVVdDKCq\n+dZCD8EljVWqmsQl71wcivsgTxGRqanXybM4H3AnYqGf3QcB/1LVZlVdBjxdwB7gY1VNj7zeBvp5\n2MdvtgSexCXTBp98/ldVZ6buzwDSI67puA94rn0KvRcH4/7PK1V1OfA47n+fD6/nWzHnUJq5qjrF\ng9104LupEfTBKf8VQ1WnAT1EpJeI7AF8rarzcpi/DewjIrXAKuANYD/c+zMhxz5XA98F9gH+lCeO\nucAiEdkT+B7wTvrzXYCbgVdU9TkPtqEmqBWqviMih+CS4/6qukpE6nGjqny01WwrreEKcI+qXlbh\n183Gqoz7ayn83gXBEuAT3If3fZ98Zh5XS8Z2C7nP70q9F36eb8s9vaDqHBHZGzgSuEZEXlLVa3x4\n/TW4UXWafO/Zo8AJQC/cL6Ncsa4RkUZgBDAJ98vu27gmhLnOjy1wv1Y7pmJYkSeOO3Fl2b2Af+Sx\nA0BERgDbqGos5iGiNHLfFFicSuz9gQKLnwHoKyL7p+6fAuSq6HgFOEFENgMQke55fL4GDBORzqkR\nxzF5bF8GfigiPdJ+RWRbD3EXYhJwTCqGrsDRHvYpdhIuiEVlq4DjgDNEJOtithJiyGeT6zkvfifg\n/s8bikgXXNy5RpNptvV4vhVzDhUTMyLSG1ihqg/i5It8jXmTOFnICwtwI/LuqTmKfOfcI7gFi8fj\nEn0+JgC/wr0nE4FzgKl57G8HLgceIM/IPcWTwOE4GfPf+QxT8z8XA6cV8BkZIjNyB14AzhGRGcAH\nuJ9whXgfOE9E/on7yT4mm5GqzhSR3wOvisga3Mn14xy2U0XkYdwoYwHwn1wvrm6x1+XAeHHtj1fj\nVvJ+kifmgqM9VX1LRJ7C6bYLUrEsKbRbIb+VsFfVFSJyNO49SarqM2X61Bz38+3v5T2eKiJ341Zo\nKzBWVd8tsNsHeDvfPJ9DxcScYnfgzyLSgjvfcpY6qurXIjJJRN4DnlfVnNdbSI2yr8K9H58BORcj\npj5PtcBnqrqgQLwTgP8F3kidGytwiX49ROR0YLWqjkt9niaJSJ2qJnLE0Zz6hb9YU4J6Hs7DVQ/V\ni6t5eEtVf1Zgn1Bji5giioh0UdXlIrIR7sNwVkrvrEYsm+M+DFkuz9E+ENdi4xlV3b2Efa8Ekqp6\ng/+RtV9SXwBvAz9U1Y+qHU+lqerIPSWDvEzrqERS9w/1OPnRnhmbKgPrDNxdxcTeG0jgJID2jo2U\nQoKI7Ao8AzzWHhM72MjdMAwjlkRpQtUwDMPwiCV3wzCMGGLJ3TAMI4ZYcjcMw4ghltwNwzBiiCV3\nwzCMGPL/AeSipEDsJlLzAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], 
"source": [ "# For each char\n", "ind_initial = np.random.randint(0,n_state,size=1)[0]\n", "\n", "print 'the selected char: {}'.format(chars[ind_initial])\n", "plt.plot(range(M_char.shape[0]),M_char[ind_initial],'.-');\n", "\n", "plt.xticks(range(M_char.shape[0]),chars);\n", "plt.grid();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Let's train a SOM with this matrix and see how it works" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Total time elapsed: 2.790000 secodns\n", "final quantization error: 0.003545\n" ] } ], "source": [ "import sompylib.sompy as SOM\n", "\n", "msz11 =20\n", "msz10 = 20\n", "\n", "X = M_char\n", "\n", "som_char = SOM.SOM('', X, mapsize = [msz10, msz11],norm_method = 'var',initmethod='pca')\n", "# som1 = SOM1.SOM('', X, mapsize = [msz10, msz11],norm_method = 'var',initmethod='pca')\n", "som_char.init_map()\n", "som_char.train(n_job = 1, shared_memory = 'no',verbose='final')\n", "codebook_char = som_char.codebook[:]\n", "codebook_char_n = SOM.denormalize_by(som_char.data_raw, codebook_char, n_method = 'var')" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "array([[ 19, 13, 393],\n", " [ 17, 4, 344],\n", " [ 5, 0, 100],\n", " [ 0, 19, 19],\n", " [ 9, 7, 187],\n", " [ 19, 19, 399],\n", " [ 5, 9, 109],\n", " [ 6, 16, 136],\n", " [ 0, 9, 9],\n", " [ 19, 8, 388],\n", " [ 4, 13, 93],\n", " [ 8, 4, 164],\n", " [ 14, 0, 280],\n", " [ 4, 4, 84],\n", " [ 14, 14, 294],\n", " [ 19, 0, 380],\n", " [ 9, 0, 180],\n", " [ 10, 14, 214],\n", " [ 12, 4, 244],\n", " [ 9, 19, 199],\n", " [ 4, 19, 99],\n", " [ 15, 8, 308],\n", " [ 0, 5, 5],\n", " [ 0, 14, 14],\n", " [ 14, 19, 299],\n", " [ 11, 10, 230],\n", " [ 0, 0, 0]])" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# we projects all the vectors in 
SOM and visualize it \n", "xy = som_char.ind_to_xy(som_char.project_data(X))\n", "xy" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "([], )" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAV0AAAD3CAYAAAC+eIeLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAHZRJREFUeJzt3XtUVVXiB/DvPoCCKE+Nh8rLSU2p0ERxUAR0knLlqMHS\nQh11fA0Tg5YTS9NZOk6D5TilZooF4VSiTuVrqePPENQi8QFlaSkiXERFEwVKRbiwf3+QjAwX5F7w\nnHvh+1mLVZ192X0L/HLY5+xzhZQSRESkDkXrAERE7QlLl4hIRSxdIiIVsXSJiFTE0iUiUhFLl4hI\nRdZNDQoheD8ZEZEJpJTC0PEmS/eXT2z9NEREbZgQBvsWAJcXqAV0Oh0URcGMGTO0jkJkMVi61CJC\niCZ/qhNRfaKp5QMhhOTyAjVGr9fjwoULcHR0hJubm9ZxiMyGEKLRNV2WLhFRK2uqdJu9vLBt2zaE\nhITAyckJnTp1whNPPIEVK1agsrKy9ZKSReGaLrV1WVlZiIyMhIeHBzp27AgvLy/MnTsXV65cMXnO\nZpXuokWLMGnSJJw9exbR0dGIjY2tOx4REQG9Xm9yACIic5ScnIxhw4Zh//79CA8Px/z58xEYGIik\npCQMGjQIRUVFpk0spWz0A4D86quvpBBC+vj4yGvXrsl7qqur5XPPPScVRZEJCQmS2p+CggIphJDT\np0/XOgpRqzp37pzs0KGD7N27t7xy5Uq9sYMHD0orKys5YcKERj+/tloN9+oDz3STkpIghMDixYvR\nrVu3uuOKomDVqlUQQuD99983rfGJiMzQu+++C71ej7fffhvu7u71xsLCwjB27Fjs3r0bt27dMnru\nB26OyMnJqfsX/a9HH30UPXr0QH5+Pn766Sd06dLF6ABERObm6NGjAICMjAwcO3aswfi1a9dQXV2N\nc+fOYcCAAUbN/cDSLSsrAwB4eHgYHPfw8MDFixdRWlrK0iWiNqGkpAQA8I9//KPR1wgh8PPPPxs9\n9wNL19HREQBQXFwMX1/fBuP3ruLdex0RkaW712fl5eWwt7dv1bkfuKZ779Q5IyOjwVheXh6Kiorg\n6+sLBweHVg1GRKSVoKAgAMDhw4dbfe4Hlu6MGTMgpcTf/vY3XL9+ve54TU0NXnnlFUgpMXPmzFYP\nRkSklZdeegnW1taYP38+cnNzG4xXVVXhiy++MGnuBy4vDB06FK+++ipWrlwJf39/REZGwt7eHvv2\n7cPp06cxfPhwLFiwwKR/ORGROerTpw+Sk5Px+9//Hv3790dERAR69+6NqqoqFBYW4siRI3jkkUdw\n5swZo+d+YOkCwIoVKzBw4EC88847+PDDD1FVVYVevXrh9ddfx8svvwxr62ZNQ20QH3hDbVV0dDQC\nAgKwatUqpKen48CBA7C3t4enpyeioqIwceJEk+blsxeIiFpZqzx7gYiIWo7rAkQaKysrw1tvvdWs\nZZrp06fDy8tLhVT0sHB5gUhjOp0Ovr6+zSrd9PR0hISEqJCKWoLP0yUiUhHXdImIzARLl4hIRSxd\nIiIVsXSJiFTE0iUiUhFLl4hIRSxdI6WkpCAyMhK9evVCp06d4OjoiGHDhuHjjz/WOhoRWQDep2uk\nTp06wd/fH/7+/vDw8EBJSQn27t2LoqIiLF
myBMuWLdM6IhFpjJsjWlF+fn6Dd9DQ6/WIiIjAkSNH\nUFBQ0OhbGxFR+8DNEa3I0FsWWVtb449//CP0ej3S0tI0SEVEloIPvDHSxYsXsWLFChw8eBCFhYW4\nc+dO3ZgQApcuXdIwHRGZO5auEfLz8xEYGIiysjIMHz4co0ePhqOjI6ysrFBQUIBNmzbh7t27Wsck\nIjPG0jXCqlWrcPPmTaSkpGDKlCn1xrZs2YKUlBRtghGRxeCarhHy8vIAABMmTGgwlpGRwbetIaIH\nYukawcfHB0DDt6Pfv38/kpKS1A9ERBaHpWuEmJgY2NjYIDIyElOmTEF8fDzGjBmDMWPGICoqCry9\njogehGu6Rnj88ceRkZGBxYsXY+/evdDr9XjyySexfft2ODg4YOvWrVxiIKImcXMEEVEr4+YIIiIz\nwdIlIlIRS5eISEUsXSIiFbF0iYhUxNIlIlIRS5eISEUsXSIiFbF0iYhUxNIlIlIRS5dMdujQISiK\ngr/+9a8Gx318fODn56dyKiLzxtKlh4YP/yFqiKVLRKQili4RkYpYukREKmLpEhGpiKVLJlOU2m8f\nvV5vcLy0tFTNOEQWgaVLJnN2dgYAXLx4scHY+fPnUVZWpnYkIrPXrNLV6XRQFAUzZsx42HnIgvTt\n2xcODg7YuXMnrl+/Xne8oqICf/rTnzRMRmS+eKZLJrO2tkZcXBzKysoQEBCA2NhY/OEPf4C/vz9u\n3boFT09PrSMSmR2WLrXIsmXLkJCQADs7O7z33nvYt28foqKisH//ftjY2HCDBNH/aNa7Aet0Ovj6\n+mLatGlITk5WMR4RkeXhuwETtQP3X3vR6XSYNGkSunXrBjs7OwQGBmLPnj1aR7Qo48ePh6IoeOed\ndxqMLVmyBIqiYNasWUbPy9IlamMKCgowePBgFBYWYurUqZg0aRJOnz6NcePG4dChQ1rHsxjJycnw\n8vLCq6++im+++abueFpaGhISEuDv74+1a9caPS9Ll6iNOXToEGJjY5GZmYlVq1bhgw8+wI4dO1Bd\nXY2VK1dqHc9iODs7IzU1FXq9HhMnTsTt27dx9epVTJ48Gba2tti2bRtsbW2NnpelS9TGeHt747XX\nXqt37Omnn4aXlxeOHTumUSrLNHToUCxfvhy5ubmYPXs2pk6dimvXrmHt2rXo27evSXNat3JGItJY\nQECAwbtGevbsiaNHj2qQyLLFx8cjPT0dmzdvhhACL774IqZPn27yfDzTJWpjnJycDB63trZGTU2N\nymnahgkTJtT9fVxcXIvmYukSETUhNzcXCxYsgIuLCxRFwcyZM1FZWWnyfCxdIqJGVFZW1l1E27p1\nKxYuXIhTp05h3rx5Js/J0iUiasQrr7yCb775BvHx8Rg5ciSWLl2K4OBgJCYm4tNPPzVpTpYuEZEB\n27dvx7p16xAUFITly5cDqH2caWpqKpydnTFz5kzk5+cbPW+zS1cIwX30RGbuQX9O+We4eS5evIiZ\nM2fC2dkZW7ZsqXt2NAD06NEDycnJKC8vxwsvvNDo86Qb06xnLxARUfPx2QtERGaCpUtEpCKWLhGR\nili6REQqYukSEamIpUtEpCKWLhGRili6REQqYukSEamIpUtEpCKWrpHuf8fV3NxcTJw4EW5ubrCy\nssLhw4e1jkdEZo5v12Oi8+fPY8iQIejTpw8mT56MO3fuwMHBQetYRGTmWLom+vLLL7Fo0aK6R74R\nETUHnzJmJJ1OB19fX7i7u0On08HGxkbrSERkZviUsYfgySefZOESkdFYuiZyd3fXOgIRWSCWron4\nBH4iMgVLl4hIRSxdIiIVsXSJiFTE0jUB3xmZiEzF0jWSt7c3qqurkZSUpHWUdu/s2bNQFAUjR45s\n9DWPP/44OnbsiKtXr6qYjKhxLF2yWH369EFYWBgyMjJw/vz5BuOZmZk4ffo0xo0bBzc3Nw0SEjXE\n0iWLFh
MTAyklNm7c2GBs48aNEEJgzpw5GiQjMozbgMmiVVdXw8vLC1VVVbh06VLdLsGysjJ4enqi\ne/fuOHfunMYpqb3hNmBqs6ysrDBr1iyUlJTg008/rTv+r3/9C3fu3OFZLpkdnumSxbt8+TK8vb0x\nbNgwpKenA6i9gHb+/HkUFRXB1dVV44TU3jR1pstHO5LF8/T0xNixY7Fjxw6cO3cO169fx+nTp/HC\nCy+wcMnssHSpTYiJicH27duxYcMG3Lx5kxfQyGxxeYHajL59++LHH39ERUUFvL29cebMGa0jUTvF\nC2nULsydOxc3b95ERUUFz3LJbPFMl9qM0tJSdO3aFR07dkRRURGcnZ21jkTtFM90qV34+uuvUVNT\ng6ioqHZduO+88w78/f1hZ2eHHj16IDY2FuXl5fDx8YGfn5/W8do9XkijNuPNN9+EEAIvvfSS1lE0\nExcXh7Vr18LT0xNz5syBjY0Ndu7ciaysLFRVVaFjx45aR2z3uLxAFu27777D7t27cfLkSXz22Wd1\nt461R1999RWCg4Px6KOP4tixY3B0dAQAVFZWIjQ0FEePHoWPjw8uXLigcdK2j8sL1GadPHkSixcv\nRlpaGiZOnIjk5GStI2kmOTkZQgi89tprdYULAB06dEBCQoKGyeh+PNMlaiMGDRqEnJwc5OXlwcfH\np95YdXU1bG1t0bNnT57pqoBnukTtQFlZGQAYfIyllZUVunbtqnYkMoClS9RG3FtSMPTA9urqaly/\nfl3tSGQAS5cM4u1FlmfgwIEAgEOHDjUYO3LkCKqrq9WORAawdMkgvgec5Zk2bRqklHj99ddx8+bN\nuuMVFRVYuHChhsnofrxPl6iN+PWvf43Y2Ni6zRGRkZF19+m6uLjAw8ND64gEnukStSmrV6/G2rVr\n4eTkhI0bN2LLli145pln8Pnnn6NDhw5axyM0s3R1Oh0URcGMGTNw9uxZjBs3Dq6urujcuTOGDx+O\nAwcOPOyc9JA0tmWULFdMTAxOnz6NO3fuoKioCGvWrEGXLl20jmVxdu3ahZEjR8LT0xO2trbo3r07\nQkNDsX79+hbN26z7dHU6HXx9fRESEoJTp07hiSeeQHBwMK5cuYKtW7fi7t27SE1NRVRUVIvCkLru\n3zJ6/6+iTk5OuHTpEjp27Mh7OtsQX19fCCH4NW2GjRs3Yu7cufDw8MBzzz2Hrl274tq1azh16hSk\nlMjKymry85u6TxdSykY/aoelLCgokEIIqSiKjI+Pl/c7efKktLGxkS4uLvKnn36SZBkyMzOlEEL2\n7t1blpaW1h2/e/euHDp0qBRCSF9fXw0TUmvz8fHh17SZnnrqKWlrayuvX7/eYKykpOSBn/9Ldxrs\nVaPWdB0dHbFkyZJ6xwYOHIjo6GiUlpZi+/btxkxHGuKW0faJd6U0n7W1NaysrBocd3FxadG8RpXu\nwIEDYW9v3+B4aGgopJTIyclpURhSz72vVUhISIOxYcOGGfxmI8uWn5+PvLw8rWNYhOjoaNy+fRv9\n+vXDyy+/jJ07d7ba5hKjStfQ9kIAcHd3B/DfbYhk/rhllKhx8+fPx6ZNm+Dj44O1a9diwoQJcHNz\nQ3h4OE6ePNmiuY0qXUPbCwGguLgYAOr9mkrmjVtGiZo2efJkZGZmoqSkBHv27MHMmTNx+PBhRERE\noKSkxOR5jSrd7Oxs3Lp1q8Hx9PR0CCEwYMAAk4OQurhllKh5HBwcEBERgcTEREybNg03btzA4cOH\nTZ7PqNItKyvDsmXL6h07ceIENm/eDCcnJ4wfP97kIJbi/nuWLRm3jBI1LiMjw+Dxe78ZdurUyeS5\njdoGHBISgqSkJGRlZSE4OBiXL1/Gtm3bIKVEYmIiOnfubHIQUhe3jBI1bvz48ejcuTOCgoLg4+MD\nKSWOHDmC48ePIzAwEKNGjTJ5bqPOdH19fZGZmQkXFxckJibik08+waBB
g7Bv3z5ERkaaHIK08aAt\no7y9iNqrN954A4MHD0ZOTg7Wr1+PlJQU6PV6rFy5EgcPHmzR3T1G7UibNm1au347FAD8f0FED8R3\njlCBlBJxcXFQFAWRkZG4e/eu1pGIyAzx0Y6t4O7du3jxxRexY8cOxMbGYvXq1VpHIiIz1ezSFUJw\njc+AGzduYOzYsTh69CjeeOMNLFiwQOtIRGTGmlW63t7evG/TgMLCQowePRr5+fn46KOPMGnSJK0j\nEZGZ4/KCiX744QcMHToUt2/fxn/+8x+EhoZqHYmILAAvpJkoNzcXxcXF8PPz4048Imo2lq6Jnnvu\nOfz9739HTk4OwsPDcePGDa0jEZEFYOm2QHx8PN566y3k5OQgNDQU165d0zoSEZk5lm4LxcXFYcOG\nDTh9+jRGjBhR98Q1IiJDWLqtYPbs2fjggw+Qm5uL4cOH4+LFi1pHIiIzxdI1gaF7lqdOnYqPP/4Y\nhYWFGDFiBAoKCrQJR0RmrVnPXiAioubjsxeIiMwES5eISEUsXSIiFbF0iYhUxNIlIlIRS5eISEUs\nXSIiFbF0iYhUxNIlIlIRS5eISEUsXSJqdatXr0b//v1hZ2eHHj16IDY2FuXl5fDx8YGfn5/W8TTF\nt+sholYVExODDRs2oHv37pgzZw46dOiAXbt24dixY9Dr9ejQoYPWETXFB94QUav54osvEBISgr59\n+yIrKwtdunQBAOj1eowcORJHjhyBj48PLly4oHHSh4sPvCEiVaSkpEAIgddee62ucAHA2toaCQkJ\nGiYzHyxdImo1X3/9NQAgODi4wVhQUBCsrbmiydIlolZTVlYGAHBzc2swpigKXF1d1Y5kdli6RNRq\nHBwcAABXr15tMFZTU4OSkhK1I5mdZpWuTqeDoiiYMWPGw85D7dCaNWvQv39/dOrUCYqiYM2aNVpH\nIhMNGDAAQO0Ftf/11VdfQa/Xqx3J7PBMlzS1ZcsWzJs3D3Z2dpg/fz6WLl2KoKAgrWORiaZOnQop\nJV5//XWUl5fXHa+srMSiRYs0TGY+uKpNmtqzZw+EENizZ4/BdUCyLCEhIZg9ezbee+899O/fH88/\n/zxsbGywe/duODk5wdPTE4rSvs/12vd/PWnu8uXLAAxfeCHLtGHDBvzzn/9Ely5dkJiYiNTUVDz9\n9NM4cOAAysvL69Z92yujS/fs2bMYN24cXF1d0blzZwwfPhwHDhx4GNmoDVu2bBkURUF6ejqklFAU\nBYqiwMrKSuto1Ari4uJw5swZ3LlzB0VFRVizZg2Ki4vx888/47HHHtM6nqaMWl64cOEChg4diiee\neAJz587FlStXsHXrVjzzzDNITU1FVFTUw8pJbUxYWBiEEPjggw9QWFiIpUuXQkoJIQxu4iELcvXq\nVTzyyCP1vpa3b9/GvHnzIITAhAkTNEynvWZtA9bpdPD19YUQAn/+85+xYsWKutdkZ2cjKCgIXbp0\ngU6nQ+fOndXITW1EWFgYDh8+jOrqaq2jUCtZuHAhUlNTERoaCg8PDxQXFyMtLQ2XLl3Cs88+i927\nd2sd8aFrtW3Ajo6OWLJkSb1jAwcORHR0NEpLS7F9+/YWxCSituA3v/kN+vfvjwMHDuDtt9/GJ598\ngm7dumHlypXYsWOH1vE0Z9TywsCBA2Fvb9/geGhoKDZt2oScnBxMmTKl1cIRkeUJDw9HeHi41jHM\nllFnuo1dYXZ3dwfw3y2ARERkmFGla2hrHwAUFxcDqF1+ICKixhlVutnZ2bh161aD4+np6RBC1G0B\nJCLjcKt9+2FU6ZaVlWHZsmX1jp04cQKbN2+Gk5MTxo8f36rhiNoTIQRvmTMjD+sHoVEX0kJCQpCU\nlISsrCwEBwfj8uXL2LZtG6SUSExM5O1iRCbq3r07vv/+ey7RtQPNPtMVQsDPzw+ZmZlwcXFBYmIi\nPvnkEwwaNAj79u1DZGTkw8xJbRjP
7mrfWaF3797cDm1GHtpblUkpG/2oHab7FRQUSCGEnD59uszL\ny5PPP/+8dHV1lV26dJFPP/20/O6776SUUv74449y1qxZ0sPDQ9ra2srAwECZnp6ubXgyW/d/X5H2\nli5dKoUQUlEUKYSo97Fp06YHfv4v3WmwV/mUMRPl5+djyJAh6NevH6ZPn46CggJ89tlnCAsLQ2Zm\nJiIiIuDo6IhJkybhxo0bSE1NxbPPPotz586hR48eWscnoiaEhYWhrKwMb7/9NgICAjBu3Li6sYCA\ngJZN3lgbS57pGnTvjERRFJmQkFBvbPny5VIIIV1cXGRMTEy9sQ8//FAKIeTLL7+sZlyyEDzTNT8t\n+ZqgiTNdPtrRRD4+PoiPj6937He/+x2A2gc2v/nmm/XGXnzxRVhbW9e9cR8RtU8sXRMFBAQ0uADk\n6ekJAOjdu3eD7dKKosDNzQ1FRUWqZSQi88PSNZGhW3vuPQu2sdt+rK2tUVVV9VBzEZF5Y+kSEamI\npUtEZMC931xb+1nPLF0iIgOcnZ0hhEBhYWGrzsv7dImIDLC3t8eQIUNw5MgRTJ48Gb1794aVlRV+\n+9vfwt/f3+R5WbomaOrBJA96aAm3vBJZjo8++gjz58/H/v37sWXLFkgp0bNnT5aumry9vZtc42lq\nLD8//2FEojagoqICANCxY0eNk9D9/Pz8sHPnzladk2u6RGbg7NmzAMAt4u0Az3SJNPTtt9/io48+\nwubNm2FlZcVnUrcDPNMl0lB2djbWrVsHd3d37Nq1C/369dM6Ej1kQjbxzEghhGxqnIiIGhJCQEpp\n8Ko5z3SJiFTE0iUiUhFLl4hIRSxdIiIVsXSJiFTE0iUiUhFLl4hIRSxdIiIVsXSJiFTE0iUiUhFL\nl4hIRSxdIiIVsXSJiFTE0iUiUhFLl4hIRSxdIiIVsXSJiFTE0iUiUhFLl4hIRSxdIiIVsXSJiFTE\n0iUiUhFLl4hIRSxdIiIVsXSJiFTE0iVN6XQ6KIqCGTNmaB2FSBUsXSIiFbF0iYhUxNIlIlJRs0rX\n19cXiqI0+sH1uObJysqCoih4/vnnG33NY489Bjs7O5SWlqqYzDzodDpMmjQJ3bp1g52dHQIDA7Fn\nzx6tY1mEW7duoUOHDhg+fHi94xUVFbC1tYWiKPj444/rja1fvx6KoiAlJUXFpJbl+PHjmDhxInr0\n6AFbW1t4enpi9OjR+Pe//23ynNbNedH8+fMNlsCuXbuQk5MDe3t7kwO0J0OGDEGfPn2wd+9e3Lx5\nE87OzvXGjx8/jrNnzyIqKgpOTk4apdRGQUEBBg8ejF69emHq1Km4ceMGtm7dinHjxuHzzz/HiBEj\ntI5o1uzt7TFkyBAcO3YMt27dqvsz+eWXX6KyshJCCKSlpSE6Orruc9LS0iCEwMiRI7WKbdbee+89\nxMTEwNraGmPHjsWjjz6Ka9eu4cSJE1i/fj2ioqJMm1hK2ehH7bBhBw4ckDY2NrJPnz6ypKSk0ddR\nfQkJCVJRFLlu3boGYzExMVJRFLlnzx4NkmmjoKBACiGkoihy+fLl9cb2798vhRByzJgxGqWzLH/5\ny1+koihy7969dccWLlwobWxs5KhRo6SXl1fd8ZqaGunq6ip/9atfaRHV7J05c0ba2NhIV1dX+f33\n3zcYv3TpUpOf/0t3Gu7VxgZkE6X77bffSgcHB/nII4/IvLw8E/+z2qeioiJpZWUlBw8eXO94ZWWl\ndHV1le7u7rK6ulqjdOq7V7q+vr6ypqamwbi3t7fs1q2bBsksz6FDh6QQQr7yyit1xwYPHiyDgoLk\nu+++KxVFkbm5uVJKKbOzs6UQQs6ZM0eruGbtpZdekoqiyNWrV5v0+U2VrtEX0oqLizFmzBhUVVVh\nx44d8PPzM+0Uu53q3r07Ro4ciRMnTuCHH36oO75r1y7cuHEDkydPhqK0v+ubAQEBEEI0ON6zZ0/c\n
vHlTg0SWZ+jQobCzs0NaWhoAoLy8HNnZ2Rg1ahTCwsIgpawbu7e0EB4ermVks5WVlQUAiIiIaPW5\nRW0pNzIoROODRETUKCllw7MIPKB0671QCAXATgDPAlgkpXyj9eK1L0IIWwDFAH4C4AWgK4BLAL6V\nUj6lZTa1CSG8AeQDSJFSNrgNRgiRDiBESmmlejgLJISIB/B3AC8ACAYwE4CzlLJSCPEhgAgA3QGU\nALggpXxSs7BmTAhxDMBTAB6TUp5rzbmN+T12NYAxAJJYuC0jpawAsA2AJ4BRAKJReyfJJi1zUZuQ\nBkCg9vsqHECmlLLyvjEXAH8AYP/LP5NhR3/56zOtPXGzSlcIMQ/AHwH8H2q/YNRyKaj9w/E7AFMA\nVAHYrGUgahOyAZQB+C2AfqhfrAdR+z23EID85Z/JsPUAqgEsEUI89r+DQojupk78wPt0hRBuAFYB\nqAFwBsBiAxc8vpZS7jQ1RHskpcwUQpwHEAXABsAuKeV1jWORhZNS1gghMlBbuhL3la6UslAIkQeg\nFwA9gEOahLQAUsrvhRAxqC3fHCHETgC5AFwBBKL2B5tJNzg3Z3OE7S9/FQDiGnnNJtSu95JxNgH4\nK2p/oKVoG0VT8pePpsap+dIAjEVtMZwwMOYH4ISU8ie1g1kSKeX7QohvASwAMAK1P8iuAzgF4H1T\n5232hTQiImq59ndDKBGRhli6REQqYukSEamIpUtEpCKWLhGRili6REQqYukSEamIpUtEpCKWLhGR\nili6REQq+n+2kr0zSuNqEgAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "ax = plt.subplot(1,1,1)\n", "for i in range(len(X)):\n", " plt.annotate(chars[i], (xy[i,1],xy[i,0]),size=20, va=\"center\")\n", " plt.xlim((0,som_char.mapsize[0]))\n", " plt.ylim((0,som_char.mapsize[0]))\n", "plt.xticks([])\n", "plt.yticks([])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## In the char model, we can't make sure if it makes sense, maybe at the world level, it works\n", "### Nevertheless, we only represent each chars based on its relation with the next possible chars\n", "### And this means to loose lots of valuable data\n", "![](Images/RelationalMarkov.jpg)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## At the word level, defnitely we get better results, but..\n", "\n", "# We will have \"state space explosion\" with few amount of texts\n", "# Therefore, we need to perform some types of dimensionality reduction. \n", "### For example, to use PCA or similar methods (LSA,LDA,SVD,...) 
to reduce the dimensionality.\n", "# Problems: Scalability and speed, memory,...!\n", "\n", "# \n", "# \n", "# \n" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ " Neural embeddings!\n", "
\n", "# A game-changing idea: do not build the co-occurrence matrix explicitly!\n", "### Instead, learn a classifier that predicts a word given the context around it, or the other way around.\n", "### (Bengio et al., 2003) and (Mikolov et al., 2013); the latter is known as Word2vec\n", "\n", "### In Word2vec there are two main models:\n", "### 
CBOW
\n", "![](Images/CBOW.png)\n", "\n", "\n", "\n", "###
Skip-gram
\n", "![](Images/Skip-gram.png)\n", "\n", "\n", "* **We have v unique words**\n", "* **Each word is a one-hot, v-dimensional vector**\n", "* **We have two matrices: W1 (v x n) and W2 (n x v)**\n", "* **n is the embedding dimension we choose (50-1000)**\n", "* **Read the arrows as dot products**\n", "* **There is an objective function that implies: given a word, the network should predict its context, and vice versa**\n", "* **If one can write a parametric objective function, there are usually several methods to find an optimum value for it**\n", "* **The majority of machine learning methods use Stochastic Gradient Descent (SGD) together with the chain rule (we discuss them in detail in a later session)**\n", "* **After training, the rows of W1 are the word vectors, and they have amazing features**\n", "\n", "**Details about training: http://www-personal.umich.edu/~ronxin/pdf/w2vexp.pdf**" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": true }, "outputs": [], "source": [ "## Just a hint to Gradient Descent" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1.50053826867\n" ] }, { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAYkAAAEACAYAAABGYoqtAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGUpJREFUeJzt3X2QVNd95vHvAwiNZAMr2QFWA5LRC/JolETCW2Pv5kXt\naAUiWyXYxGJx5AgiEmeFIrPxxmVGLtWMVlsbS+VEOEmBywkrwJJDIaUc4YRFQKHOlr1SIJEUxIvF\nlFcgGKxx1rJwWbtOGPHbP/rM6DKaCzPdF3p6+vlUdXHv6XNvnxm1+ulzzj13FBGYmZkNZ0K9G2Bm\nZmOXQ8LMzHI5JMzMLJdDwszMcjkkzMwsl0PCzMxyjTgkJK2X1Cdp3zDP/WdJpyVdninrlNQj6ZCk\n+ZnyeZL2STosaU2mfLKkzemY5yVdWcsPZmZmtRtNT+JxYMHQQkmzgNuAo5myNmAJ0AYsBNZKUnp6\nHbAiIuYCcyUNnHMF8GZEXAesAR4d5c9iZmYFG3FIRMS3gB8O89RjwOeGlC0CNkdEf0QcAXqADkkz\ngSkRsTfV2wQszhyzMW0/Ddw60raZmdn5UdOchKQ7gGMR8cqQp1qBY5n93lTWChzPlB9PZWccExHv\nAG9lh6/MzOzCm1TtgZIuAR6gMtR0PujcVczM7HyqOiSAa4APAf+Q5htmAS9K6qDSc8hOPM9KZb3A\n7GHKyTx3QtJEYGpEvDncC0vyDafMzKoQEaP6Aj7a4SalBxGxPyJmRsTVETGHytDRzRHxfWAr8B/S\nFUtzgGuBPRHxBnBSUkcKlruBZ9K5twLL0vadwO6zNSQi/Cjo0dXVVfc2jJeHf5f+fY7lRzVGcwns\n14H/ReWKpNcl/cbQz23eDZCDwBbgILANWBnvtvA+YD1wGOiJiO2pfD3wQUk9wH8CVlf1E5mZWWFG\nPNwUEb92juevHrL/+8DvD1Pv74GfHqb8n6hcNmtmZmOEV1wbpVKp3k0YN/y7LJZ/n/Wnasep6klS\nNGK7zczqSRJxnieuzcysiTgkzMwsl0PCzMxyOSTMzCyXQ8LMzHI5JMzMLJdDwszMcjkkzMwsl0PC\nzMxyOSTMzCyXQ8LMzHI5JMzMLJdDwszMcjkkzMwsl0PCzMxyOSTMzCyXQ8LMzHKNOCQkrZfUJ2lf\npuxRSYckvSzpLyRNzTzXKaknPT8/Uz5P0j5JhyWtyZRPlrQ5HfO8pCuL+AHNzKx6o+lJPA4sGFK2\nA2iPiJuAHqATQNINwBKgDVgIrJU08Cfz1gErImIuMFfSwDlXAG9GxHXAGuDRKn4eMzMr0IhDIiK+\nBfxwSNmuiDiddl8AZqXtO4DNEdEfEUeoBEiHpJnAlIjYm+ptAhan7UXAxrT9NHDrKH8WMzMrWJFz\nEvcA29J2K3As81xvKmsFjmfKj6eyM46JiHeAtyRdXmD7zMxslAoJCUlfAE5FxJ8Xcb6B046kUnt7\nga9oZmZnmFTrCSQtB34Z+KVMcS8wO7M/K5XllWePOSFpIjA1It7Me93u7m4A7rwTyuUSpVKplh/D\nzGzcKZfLlMvlms6hiBh5ZelDwDcj4qfT/u3AHwC/GBE/yNS7AXgS+CiVYaSdwHUREZJeAD4D7AX+\nGvijiNguaSVwY0SslLQUWBwRS3PaEaNpt5mZgSQiYkSjNANG3JOQ9HWgBHxA0utAF/AAMBnYmS5e\neiEiVkbEQUlbgIPAKWBl5lP9PmAD0AJsi4jtqXw98DVJPcAPgGEDwszMLpxR9STGCvckzMxGr5qe\nhFdcm5lZLoeEmZnlckiYmVkuh4SZmeVySJiZWS6HhJmZ5XJImJlZLoeEmZnlGnch4Rv+mZkVxyuu\nzcyahFdcm5lZoRwSZmaWyyFhZma5HBJmZpbLIWFmZrkcEmZmlsshYWZmuRwSZmaWyyFhZma5RhwS\nktZL6pO0L1N2maQdkl6V9KykaZnnOiX1SDokaX6mfJ6kfZIOS
1qTKZ8saXM65nlJVxbxA5qZWfVG\n05N4HFgwpGw1sCsirgd2A50Akm4AlgBtwEJgraSBpeDrgBURMReYK2ngnCuANyPiOmAN8GgVP88Z\nfB8nM7PajDgkIuJbwA+HFC8CNqbtjcDitH0HsDki+iPiCNADdEiaCUyJiL2p3qbMMdlzPQ3cOoqf\nY1gHDtR6BjOz5lbrnMT0iOgDiIg3gOmpvBU4lqnXm8pageOZ8uOp7IxjIuId4C1Jl9fYPjMzq8Gk\ngs9X5K1Zz3qnwu7u7sHtUqlEqVQq8KXNzBpfuVymXC7XdI5R3Spc0lXANyPiZ9L+IaAUEX1pKOm5\niGiTtBqIiHgk1dsOdAFHB+qk8qXALRFx70CdiPhbSROB70XE9Pe2wrcKNzOrxoW4Vbg48xv+VmB5\n2l4GPJMpX5quWJoDXAvsSUNSJyV1pInsu4ccsyxt30llItzMzOpoxD0JSV8HSsAHgD4qPYO/BJ4C\nZlPpJSyJiLdS/U4qVyydAlZFxI5U/hFgA9ACbIuIVan8YuBrwM3AD4CladJ7uLa4J2FmNkrV9CT8\nl+nMzJqE/zKdmZkVyiFhZma5HBJmZparqULCt+kwMxsdT1ybmTUJT1ybmVmhHBJmZpbLIWFmZrkc\nEmZmlsshYWZmuRwSZmaWyyFhZma5HBJmZparaUPCq6/NzM7NK67NzJqEV1ybmVmhHBJmZpbLIWFm\nZrkcEmZmlquQkJD0u5L2S9on6UlJkyVdJmmHpFclPStpWqZ+p6QeSYckzc+Uz0vnOCxpTRFtMzOz\n6tUcEpKuAO4H5kXEzwCTgE8Cq4FdEXE9sBvoTPVvAJYAbcBCYK2kgdn2dcCKiJgLzJW0oNb2mZlZ\n9YoabpoIvE/SJOASoBdYBGxMz28EFqftO4DNEdEfEUeAHqBD0kxgSkTsTfU2ZY4xM7M6qDkkIuIE\n8AfA61TC4WRE7AJmRERfqvMGMD0d0gocy5yiN5W1Ascz5cdTmZmZ1cmkWk8g6V9Q6TVcBZwEnpJ0\nFzB0tVuhq9+6u7sHt0ulEqVSqepztbfDgQO1t8nMbCwpl8uUy+WazlHzimtJnwAWRMRvpf1fBz4G\n/BJQioi+NJT0XES0SVoNREQ8kupvB7qAowN1UvlS4JaIuHeY1/SKazOzUarXiuvXgY9JakkT0LcC\nB4GtwPJUZxnwTNreCixNV0DNAa4F9qQhqZOSOtJ57s4cY2ZmdVDzcFNE7JH0NPAScCr9+1VgCrBF\n0j1UeglLUv2DkrZQCZJTwMpMt+A+YAPQAmyLiO21ts/MzKrnG/yZmTUJ3+DPzMwK5ZAwM7NcDgkz\nM8vlkDAzs1wOCTMzy+WQGIb//rWZWYUvgTUzaxK+BNbMzArlkDAzs1wOCTMzy+WQMDOzXA4JMzPL\n5ZAwM7NcDolz8JoJM2tmXidhZtYkvE7CzMwK5ZAwM7NcDgkzM8vlkDAzG6eKuPCmkJCQNE3SU5IO\nSTog6aOSLpO0Q9Krkp6VNC1Tv1NST6o/P1M+T9I+SYclrSmibWZmzerAgdrPUVRP4svAtohoA34W\n+A6wGtgVEdcDu4FOAEk3AEuANmAhsFbSwGz7OmBFRMwF5kpaUFD7zMysCjWHhKSpwC9ExOMAEdEf\nESeBRcDGVG0jsDht3wFsTvWOAD1Ah6SZwJSI2JvqbcocM2Z43YSZNZMiehJzgP8j6XFJL0r6qqRL\ngRkR0QcQEW8A01P9VuBY5vjeVNYKHM+UH09lY0oR3Tczs0YxqaBzzAPui4i/k/QYlaGmoavdCl39\n1t3dPbhdKpUolUpFnt7MrOGVy2XK5XJN56h5xbWkGcDzEXF12v95KiFxDVCKiL40lPRcRLRJWg1E\nRDyS6m8HuoCjA3VS+VLgloi4d5jX9IprM7NRqsuK6zSkdEzS3FR0K3AA2AosT2XLgGfS9lZgqaTJ\nkuYA1wJ70pDUSUkdaSL77
swxZmZWB0UMNwF8BnhS0kXA/wZ+A5gIbJF0D5VewhKAiDgoaQtwEDgF\nrMx0C+4DNgAtVK6W2l5Q+8zMmkJ7e7Fzp77Bn5lZk/AN/szMrFAOiRp4zYSZjXcebjIzaxIebjIz\ns0I5JMzMLJdDwsyswZ3P+VHPSZiZNQnPSZiZWaEcEgXx5bBmNh55uMnMrEl4uMnMzArlkDAzs1wO\nCTOzBnMh50A9J2Fm1iQ8J2FmZoVySJwnviTWzMYDDzeZmTUJDzeZmVmhHBJmZg2gXkPYhYWEpAmS\nXpS0Ne1fJmmHpFclPStpWqZup6QeSYckzc+Uz5O0T9JhSWuKapuZWaM7cKA+r1tkT2IVcDCzvxrY\nFRHXA7uBTgBJNwBLgDZgIbBW0sAY2TpgRUTMBeZKWlBg+8zMbJQKCQlJs4BfBv4sU7wI2Ji2NwKL\n0/YdwOaI6I+II0AP0CFpJjAlIvamepsyxzQ0X+lkZo2qqJ7EY8DngOwlRzMiog8gIt4ApqfyVuBY\npl5vKmsFjmfKj6eyhlevbqKZWa0m1XoCSf8O6IuIlyWVzlK10GtWu7u7B7dLpRKl0tle2sys+ZTL\nZcrlck3nqHmdhKT/BnwK6AcuAaYA3wD+FVCKiL40lPRcRLRJWg1ERDySjt8OdAFHB+qk8qXALRFx\n7zCv6XUSZjbutbcXOxJRl3USEfFARFwZEVcDS4HdEfHrwDeB5anaMuCZtL0VWCppsqQ5wLXAnjQk\ndVJSR5rIvjtzjJlZ0xkLQ9U1DzedxReBLZLuodJLWAIQEQclbaFyJdQpYGWmW3AfsAFoAbZFxPbz\n2D4zMzsH35bjAiu6+2hmNlLVDDc5JMzMmoTv3WRmZoVySJiZjRFjceGth5vMzJqEh5sa0Fj85mBm\nNsA9CTOzJuGehJmZFcohYWZWR2N9yNnDTWZmTcLDTQ1urH+jMLPm456EmVmTcE/CzGyMa7QRA/ck\nzMyahHsS40yjfeMws/HHPQkzsybhnoSZmRXKIWFmdp418tCxQ6JBNPKbzKzZNfJfo/SchJlZk6jL\nnISkWZJ2Szog6RVJn0nll0naIelVSc9KmpY5plNSj6RDkuZnyudJ2ifpsKQ1tbbNzMxqU8RwUz/w\n2YhoB/41cJ+kDwOrgV0RcT2wG+gEkHQDsARoAxYCayUNJNs6YEVEzAXmSlpQQPvMzC648TJEXHNI\nRMQbEfFy2v4xcAiYBSwCNqZqG4HFafsOYHNE9EfEEaAH6JA0E5gSEXtTvU2ZYyxjvLz5zMazRp6H\nyCp04lrSh4CbgBeAGRHRB5UgAaanaq3AscxhvamsFTieKT+eymyI8fLmM7Oxb1JRJ5L0fuBpYFVE\n/FjS0JnlQmeau7u7B7dLpRKlUqnI05uZjUp7+9j7AlculymXyzWdo5CrmyRNAv4K+B8R8eVUdggo\nRURfGkp6LiLaJK0GIiIeSfW2A13A0YE6qXwpcEtE3DvM6/nqpoyx+OY0s7Gnniuu/ztwcCAgkq3A\n8rS9DHgmU75U0mRJc4BrgT1pSOqkpI40kX135hg7CweEmZ0vNfckJP0c8D+BV6gMKQXwALAH2ALM\nptJLWBIRb6VjOoEVwCkqw1M7UvlHgA1AC7AtIlblvKZ7EmZWd43Wi6+mJ+HFdGZmTcI3+DNfHmtm\nhXJIjDON1PU1a0TN9kXMITGONdub2exCaLYvYp6TMDNrEp6TMDMrWLP3yB0STaTZ3+xm1Wi24aWh\nPNxkZtYkPNxkI+ZehVk+///xLvckzMyahHsSVjV/c7Jm5vd/PvckzGzcO/raa2x48EFO9/YyobWV\n5Q8/zFVz5tS7WRec791khWi0m5aZnc3R117jj2+7jYe++13eB7wNdF1zDVt1kMM9k+vdvAvKw01W\nCAeEjScbHnxwMCAA3gc89N3vctdH76lnsxqGQ8LOymO11uhO9/YOBkQ7+4FKUJw+caJubWo
kDgk7\nq6G9CoeGNZL2dpjQ2srbaf8ANwKVIacJV1xRt3Y1Es9JmFlDqHbyOW9O4v6dO5tu8toT13ZBeYLb\nLpTRfNAP974cDJgTJ5hwxRW+umk0xzTih61DYmxyaFi1ztVLeOhTn+L3nnxycG4BKkHxpbvuYstL\nT/h9N0LVhMSk89WYakm6HVhDZb5kfUQ8Uucm2Qhl/0d1YNhQeUEwbC/hhRfO6CVkJ5+hMgF9gBs5\nfeKE32fn2ZgKCUkTgD8BbgVOAHslPRMR36lvy2y0HBjj39AP/X/76U+z66tfHXb//06dysmXXuKx\nY8feEwR5l6h+6cEH6XriCaAy+dzGfg6liecD3OjJ5wtkTA03SfoY0BURC9P+aiCG9iY83NTYsqHh\nABmdFombgX8JvAG8mHluMnARcDEwMT0mAUr/TkjPk8omAC1APxATJtAycSItl14KF1/MlP5++iX+\n38SJg9v/3NLC9bNmcdGMGfzo7bf58be/zdU/+Qm/CXwQuG/SJD7f308blRAYuv8gsAq4KrVhYLjo\ndG8vD5XLwLs9BICfuvQ1/vHtDwGefC5Kw89JSPpVYEFEfDrtfwroiIjPDKnnkBiHhgaGA+RMLRJ3\nAl+BwQ/K/wg8lZ6fkMpb0r8XUQmOi9PzF1MJh0sy55wKvB94i0qQvD+VC/hd4E+BH+dsr6ISDl3A\n/Wn7S2mf1L7h9rdkgqDr4x9n3YubeO3k7GHnGwZ6EuDJ5yJ4xbU1tKGBMDQwsrL7edvjzTzeDQjS\nv19J5fOAm4DrgNnA5cAUKoEx0KsY+GSYnHn8BJgJfDizPROYAWwBHj7L9obUhofSdgf7OZ1eo539\nlQVrw+wPXauw96VTdF1zzeBahoFewvKHHz7j579qzhy6nniCh3bvpuuJJxwQF8hYC4le4MrM/qxU\n9h7d3d2Dj3Lqqtr4dbYAyduGkYdJNc9dyHO0t1c+vIdbObyf/YMf7jOH7L/CfqYD04G/Yz8foBIe\nU4Dn2M8UKh8CX8lsD+yfZiCIht8+zZkf/nu4cfADZXDOYMj+QGhkg+CqOXO4f+fOSs/h4x/nS3fd\n5WGkgpTL5TM+K6sx1oabJgKvUpm4/h6wB/hkRBwaUs/DTdZ0/o3ETnjPsMxtZzlmYO4B3p2PmDak\nzo3p3/2ZbagMP/0e8MWc7YGhpLdT+bFzzEk8cOWVTLj5Zqb+6EceLqqThp+TgMFLYL/Mu5fAfnGY\nOg4JazpjdU7i/ksuYdr8+Xzis5+tXM2U5gwGr27yHMKYMS5CYiQcEtaszuvVTZMm0XLJJZWrm955\nh36oXN2Utv+5pYXrZ8/mounT6ZfcI2hA42IxnZnl+4m/HNkFNtYmrs3MbAxxSJiZWS6HhJmZ5XJI\nmJlZLoeEmZnlckiYmVkuh4SZmeVySJiZWS6HhJmZ5XJImJlZLoeEmZnlckiYmVkuh4SZmeVySJiZ\nWS6HhJmZ5XJImJlZLoeEmZnlckiYmVmumkJC0qOSDkl6WdJfSJqaea5TUk96fn6mfJ6kfZIOS1qT\nKZ8saXM65nlJV9bSNjMzq12tPYkdQHtE3AT0AJ0Akm4AlgBtwEJgraSBP769DlgREXOBuZIWpPIV\nwJsRcR2wBni0xrbZCJXL5Xo3Ydzw77JY/n3WX00hERG7IuJ02n0BmJW27wA2R0R/RByhEiAdkmYC\nUyJib6q3CVicthcBG9P208CttbTNRs7/IxbHv8ti+fdZf0XOSdwDbEvbrcCxzHO9qawVOJ4pP57K\nzjgmIt4B3pJ0eYHtMzOzUZp0rgqSdgIzskVAAF+IiG+mOl8ATkXEnxfYNp27ipmZnVcRUdMDWA58\nG7g4U7Ya+HxmfzvwUWAmcChTvhRYl62TticC3z/La4Yffvjhhx+jf4z2M/6cPYmzkXQ78DngFyPi\nnzJPbQWelPQYlWGka4E9ERGSTkrqAPYCdwN/lDlmGfC
3wJ3A7rzXjQj3MszMLgClb+bVHSz1AJOB\nH6SiFyJiZXquk8oVS6eAVRGxI5V/BNgAtADbImJVKr8Y+Bpwczrf0jTpbWZmdVJTSJiZ2fjWUCuu\nJX1C0n5J70iaN+S5YRfv2chI6pJ0XNKL6XF7vdvUaCTdLuk7aaHo5+vdnkYn6Yikf5D0kqQ99W5P\no5G0XlKfpH2Zsssk7ZD0qqRnJU0713kaKiSAV4B/D/xNtlBSG/mL92zk/jAi5qXH9no3ppFImgD8\nCbAAaAc+KenD9W1VwzsNlCLi5ojoqHdjGtDjVN6PWauBXRFxPZV5385znaShQiIiXo2IHt57eewi\nhlm8d6HbNw44WKvXAfRExNGIOAVspvK+tOqJBvuMGksi4lvAD4cUZxctb+Tdxcy5xst/gLzFezY6\nv5Puw/VnI+mG2hmGvgezC0WtOgHslLRX0m/VuzHjxPSI6AOIiDeA6ec6oKZLYM+HkSzes+qc7XcL\nrAX+S7pM+b8Cf0jl6jSzevm5iPiepJ+iEhaH0rdjK845r1wacyEREbdVcVgvMDuzPyuVWcYofrd/\nCjiQR6cXyN652O/BGkXE99K//yjpG1SG9BwStemTNCMi+tK99L5/rgMaebgpO36+FViabjc+h7R4\nrz7NakzpDTPgV4D99WpLg9oLXCvpKkmTqdxNYGud29SwJF0q6f1p+33AfPyerIZ472fl8rS9DHjm\nXCcYcz2Js5G0GPhj4IPAX0l6OSIWRsRBSVuAg1QW760MLwAZrUcl3UTlipIjwG/XtzmNJSLekfQ7\nVG6fPwFYHxGH6tysRjYD+IakoPI59eTAglwbGUlfB0rAByS9DnQBXwSeknQPcJTKVaFnP48/S83M\nLE8jDzeZmdl55pAwM7NcDgkzM8vlkDAzs1wOCTMzy+WQMDOzXA4JMzPL5ZAwM7Nc/x/3LBEZmGtj\ntQAAAABJRU5ErkJggg==\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "x_old = 0\n", "x_new = 9\n", "eps = .001\n", "precision = .00001\n", "\n", "# y = x^2\n", "def f(x):\n", " return np.power(x,4) -3*np.power(x,3) + 2\n", "\n", "def f_deriv(x):\n", " return 4*np.power(x,3) - 9*x\n", "\n", "\n", "\n", "counter = 0\n", "while abs(x_old-x_new)>precision:\n", " x_old = x_new\n", " x_new = x_old - eps*f_deriv(x_old)\n", " plt.plot(x_new,f(x_new),'or')\n", " counter = counter + 1\n", "print x_new \n", "\n", "for x in np.linspace(-10,10,100):\n", " plt.plot(x,f(x),'.b',markersize=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### in principle, in most of machine learning methods, we have a loss function which is differetiable to all the parameters.\n", "### Therefore, using the training data we gradually update the parameters toward a direction that minimizes the loss function\n", "\n", "### we will discuss it in more details later on neural networks\n", "### 
For now, let's go back to Word2vec" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [], "source": [ "#### We use a beautiful library called gensim\n", "import gensim\n", "from gensim import corpora, models, similarities\n", "import logging\n", "logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# from gensim.models import word2vec\n", "# # get the pretrained vectors from https://code.google.com/archive/p/word2vec/\n", "# Google_w2v = word2vec.Word2Vec.load_word2vec_format('/Users/SVM/Downloads/GoogleNews-vectors-negative300.bin', binary=True)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# print Google_w2v.most_similar(['girl', 'father'], ['boy'], topn=1)\n", "# print Google_w2v.most_similar(positive=['woman', 'king'], negative=['man'], topn=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# What can be done with Word2vec?\n", "## In general, the learned dense vectors can be used in any other task that requires fixed-length vectors. 
\n", "### Some interesting applications:\n", "* **In the fashion industry:**\n", " * http://multithreaded.stitchfix.com/blog/2015/03/11/word-is-worth-a-thousand-vectors/\n", " * http://developers.lyst.com/2014/11/11/word-embeddings-for-fashion/\n", " * In combination with topic modeling: http://www.slideshare.net/ChristopherMoody3/word2vec-lda-and-introducing-a-new-hybrid-algorithm-lda2vec-57135994\n", "* **Any recommendation system**\n", " * **Music selection based on users' playlists**\n", " * **Shopping baskets**\n", "* **Graph data: DeepWalk:**\n", " * https://sites.google.com/site/bryanperozzi/projects/deepwalk\n", "* **Sentiment Analysis**\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Movie Reviews Sentiment Analysis" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "from sklearn.cross_validation import train_test_split\n", "from gensim.models.word2vec import Word2Vec\n", "import numpy as np\n", "with open('/All_Files/Files/Data/gensim/sample_Data/IMDB_data/pos.txt', 'r') as infile:\n", " pos_tweets = infile.readlines()\n", "\n", "with open('/All_Files/Files/Data/gensim/sample_Data/IMDB_data/neg.txt', 'r') as infile:\n", " neg_tweets = infile.readlines()\n", " \n", "with open('/All_Files/Files/Data/gensim/sample_Data/IMDB_data/unsup.txt','r') as infile:\n", " unsup_reviews = infile.readlines()\n", "\n", "#use 1 for positive sentiment, 0 for negative\n", "y = np.concatenate((np.ones(len(pos_tweets)), np.zeros(len(neg_tweets))))\n", "\n", "x_train, x_test, y_train, y_test = train_test_split(np.concatenate((pos_tweets, neg_tweets)), y, test_size=0.5)\n", "\n", "#Do some very minor text preprocessing\n", "\n", "\n", "\n", "def cleanText(corpus):\n", " import string\n", " validchars = string.ascii_letters + string.digits + ' '\n", " punctuation = \"\"\".,:;@(){}[]$1234567890\"\"\"\n", " corpus = [z.lower().replace('\\n','') for z in corpus]\n", " 
corpus = [z.replace('
', ' ') for z in corpus]\n", " \n", " for c in punctuation:\n", " corpus =[z.replace(c, '') for z in corpus]\n", " \n", "\n", " corpus = [''.join(ch for ch in z if ch in validchars) for z in corpus]\n", " \n", " #treat punctuation as individual words\n", " for c in punctuation:\n", " corpus = [z.replace(c, ' %s '%c) for z in corpus]\n", " corpus = [z.split() for z in corpus]\n", "# corpus = [z.replace(' ', '_') for z in corpus]\n", " return corpus\n", "\n", "\n", "x_train_c = cleanText(x_train)\n", "x_test_c = cleanText(x_test)\n", "unsup_ = cleanText(unsup_reviews)\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train Word2vec model here" ] }, { "cell_type": "code", "execution_count": 75, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "5535848" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n_dim = 150\n", "#Initialize model and build vocab\n", "imdb_w2v = Word2Vec(size=n_dim, min_count=10,\n", " sentences=None, alpha=0.025, window=5, max_vocab_size=None,\n", " sample=0, seed=1, workers=6, min_alpha=0.0001, sg=1, hs=1, negative=0, cbow_mean=0,\n", " iter=1, null_word=0)\n", "\n", "\n", "# imdb_w2v.build_vocab(np.concatenate((unsup_,x_train)))\n", "imdb_w2v.build_vocab(x_train_c)\n", "\n", "# Train the model over train_reviews (this may take several minutes)\n", "# imdb_w2v.train(np.concatenate((unsup_,x_train)))\n", "imdb_w2v.train(x_train_c)" ] }, { "cell_type": "code", "execution_count": 76, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('decent', 0.6008697748184204), ('great', 0.5777084827423096), ('nice', 0.563409686088562), ('bad', 0.5431483387947083), ('cool', 0.5381938219070435)]\n" ] } ], "source": [ "print imdb_w2v.most_similar(['good'], topn=5)\n", "# print imdb_w2v.most_similar(positive=['woman', 'king'], negative=['man'], topn=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 
Building document vectors (DocVec) based on SOM indices (SOMind)\n", "### Now the question is how to make predictions at the document level" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "u'0.18.0'" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "import pandas.io.data\n", "import numpy as np\n", "from matplotlib import pyplot as plt\n", "import sys\n", "from sklearn.preprocessing import scale\n", "\n", "pd.__version__" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "No\n", "data size (20332, 150)\n" ] } ], "source": [ "# Here, we have the choice of using any pretrained model too\n", "Googlevec = 'No'\n", "import gensim\n", "\n", "\n", "vocablen = len(imdb_w2v.vocab.keys())\n", "\n", "# vocablen = len(uniq_from_x_train)\n", "\n", "vector_size = imdb_w2v.vector_size\n", "VocabVec = np.zeros((vocablen,vector_size))\n", "\n", "vocab = imdb_w2v.vocab.keys()\n", "\n", "\n", "\n", "for i in range(vocablen):\n", "    if Googlevec=='Yes':\n", "        try:\n", "            VocabVec[i] = Google_w2v[vocab[i]]\n", "        except:\n", "            continue\n", "    else:\n", "        try:\n", "            VocabVec[i] = imdb_w2v[vocab[i]]\n", "        except:\n", "            continue\n", " \n", "print Googlevec \n", "\n", "print 'data size', VocabVec.shape" ] }, { "cell_type": "code", "execution_count": 79, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def buildDocHistogram(Vocab_ind, text, ind_size,normalize='Yes'):\n", "    vec = np.zeros(ind_size).reshape((1, ind_size))\n", "    count = 0.\n", "    for word in text:\n", "        try:\n", "            vec[0,Vocab_ind[word]] += 1\n", "            count += 1.\n", "        except KeyError:\n", "            continue\n", "    if count != 0:\n", "        if normalize=='Yes':\n", "            vec /= count\n", "    return vec" ] }, { "cell_type": "code", "execution_count": 80, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", 
"output_type": "stream", "text": [ "data size (20332, 150)\n" ] } ], "source": [ "#Build new dim for vocabs based on SOMinds\n", "print 'data size', VocabVec.shape\n", "ind_final_vocab = VocabVec.sum(axis=1)!=0\n", "\n", "final_VocabVec = VocabVec[ind_final_vocab]\n", "\n", "final_vocab = list(np.asarray(vocab)[ind_final_vocab])\n", "Vocab_Wordind = dict(zip(final_vocab,range(len(final_vocab)) ))" ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "all done\n", "neg done\n", "pos done\n" ] } ], "source": [ "ind_size = len(final_vocab)\n", "labels = y_train\n", "ind_pos = labels==1\n", "ind_neg = labels==0\n", "all_coocur_ = np.zeros((len(x_train_c),ind_size))\n", "for i in range(len(x_train_c)):\n", " all_coocur_[i]= buildDocHistogram(Vocab_Wordind, x_train_c[i], ind_size,normalize='No')\n", "all_coocur_ = all_coocur_.sum(axis=0)\n", "\n", "print 'all done'\n", "len_neg = len(list(np.asarray(x_train_c)[ind_neg]))\n", "neg_coocur_ = np.zeros((len_neg,ind_size))\n", "for i,text in enumerate(list(np.asarray(x_train_c)[ind_neg])):\n", " neg_coocur_[i,:]= buildDocHistogram(Vocab_Wordind, text, ind_size,normalize='No')\n", "neg_coocur_ = neg_coocur_.sum(axis=0)\n", "\n", "print 'neg done'\n", "len_pos = len(list(np.asarray(x_train_c)[ind_pos]))\n", "pos_coocur_ = np.zeros((len_pos,ind_size))\n", "for i,text in enumerate(list(np.asarray(x_train_c)[ind_pos])):\n", " pos_coocur_[i,:]= buildDocHistogram(Vocab_Wordind, text, ind_size,normalize='No')\n", "pos_coocur_ = pos_coocur_.sum(axis=0)\n", "\n", "print 'pos done'" ] }, { "cell_type": "code", "execution_count": 82, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
wordspos_coocur_neg_coocur_pos_to_negdiffer
17071predicted87764.073465.01.19462114299.0
5983was171797.0163184.01.0527748613.0
17619paddy76676.068766.01.1150127910.0
11015down56285.049633.01.1340016652.0
11027annmargret50111.043690.01.1469416421.0
\n", "
" ], "text/plain": [ " words pos_coocur_ neg_coocur_ pos_to_neg differ\n", "17071 predicted 87764.0 73465.0 1.194621 14299.0\n", "5983 was 171797.0 163184.0 1.052774 8613.0\n", "17619 paddy 76676.0 68766.0 1.115012 7910.0\n", "11015 down 56285.0 49633.0 1.134001 6652.0\n", "11027 annmargret 50111.0 43690.0 1.146941 6421.0" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "labels = y_train\n", "ind_pos = labels==1\n", "ind_neg = labels==0\n", "#Make the histogram of documents basedo n SOMinds\n", "ind_size = len(final_vocab)\n", "\n", "\n", "# all_coocur_ = np.concatenate([buildDocHistogram(Vocab_Wordind, z, ind_size,normalize='No') for z in x_train_c])\n", "# pos_coocur_ = np.concatenate([buildDocHistogram(Vocab_Wordind, z, ind_size,normalize='No') for z in list(np.asarray(x_train_c)[ind_pos])])\n", "# neg_coocur_ = np.concatenate([buildDocHistogram(Vocab_Wordind, z, ind_size,normalize='No') for z in list(np.asarray(x_train_c)[ind_neg])])\n", "\n", "\n", "# #Summing over all texts for each word\n", "# pos_coocur_ = pos_coocur_.sum(axis=0)\n", "# neg_coocur_ = neg_coocur_.sum(axis=0)\n", "# all_coocur_ = all_coocur_.sum(axis=0)\n", "\n", "#normalizing the values\n", "# pos_coocur_ = pos_coocur_/all_coocur_\n", "# neg_coocur_ = neg_coocur_/all_coocur_\n", "\n", "pos_to_neg = pos_coocur_/(neg_coocur_+1)\n", "\n", "sorted_features =pd.DataFrame(index=range(pos_coocur_.shape[0]))\n", "sorted_features['words'] = Vocab_Wordind.keys()\n", "sorted_features['pos_coocur_'] = pos_coocur_\n", "sorted_features['neg_coocur_'] = neg_coocur_\n", "sorted_features['pos_to_neg'] = pos_to_neg\n", "sorted_features['differ'] = np.abs(neg_coocur_-pos_coocur_)\n", "sorted_features = sorted_features.sort_values('differ',ascending=False)\n", "sorted_features.head()" ] }, { "cell_type": "code", "execution_count": 83, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(20332, 5)" ] }, "execution_count": 83, "metadata": {}, 
"output_type": "execute_result" } ], "source": [ "sorted_features.shape" ] }, { "cell_type": "code", "execution_count": 84, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "Total time elapsed: 153.612000 secodns\n", "final quantization error: 8.289663\n", "Training Done\n", "(3000, 150)\n", "(15000, 150)\n" ] } ], "source": [ "\n", "###############\n", "###############\n", "\n", "\n", "### It seems that having all the features is not that bad! Even the results are similar, eventhough it might slow down the \n", "### som trainig and som projection steps, it dosne't need conditional probabilities to be calculated\n", "sel_features = sorted_features.index[:15000].values\n", "Data= final_VocabVec[sel_features,:]\n", "\n", "\n", "# sel_features = sorted_features.index[:].values\n", "# Data= final_VocabVec\n", "\n", "# len(sel_vocab)\n", "\n", "\n", "#Train a SOM based on vocabs\n", "# reload(sys.modules['sompy'])\n", "ind_size = 3000\n", "sm1 = SOM.SOM('sm', Data, mapsize = [1,ind_size],norm_method = 'var',initmethod='pca')\n", "# ind_size = 50*50\n", "sm1.train(n_job = 1, shared_memory = 'no',verbose='final')\n", "print 'Training Done'\n", "\n", "# sm1.hit_map()\n", "print sm1.codebook.shape\n", "\n", "#Remained Data\n", "print sm1.data.shape\n", "\n", "\n", "#Build new dim for vocabs based on SOMinds\n", "Vocab_Somind = dict(zip(list(np.asarray(final_vocab)[sel_features]), list(sm1.project_data(Data))))\n", "# Vocab_Somind = dict(zip(list(np.asarray(final_vocab)[:]), list(sm1.project_data(Data))))\n", "\n", "\n", "# Vocab_Somind = dict(zip(final_vocab, list(sm1.project_data(Data))))\n" ] }, { "cell_type": "code", "execution_count": 85, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
wordsomind
1334ford0
977peter0
2261l0
1021wood0
1149holmes0
1978jr0
1205jim0
2137sir0
11396williams0
11322wayne0
\n", "
" ], "text/plain": [ " word somind\n", "1334 ford 0\n", "977 peter 0\n", "2261 l 0\n", "1021 wood 0\n", "1149 holmes 0\n", "1978 jr 0\n", "1205 jim 0\n", "2137 sir 0\n", "11396 williams 0\n", "11322 wayne 0" ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DF = pd.DataFrame()\n", "DF['word']=np.asarray(final_vocab)[sel_features]\n", "b = sm1.project_data(Data)\n", "DF['somind'] = b\n", "\n", "DF.sort_values('somind')[:10]" ] }, { "cell_type": "code", "execution_count": 86, "metadata": { "collapsed": true }, "outputs": [], "source": [ "\n", "from sklearn.preprocessing import scale\n", "#Make the histogram of documents basedo n SOMinds\n", "train_vecs = np.concatenate([buildDocHistogram(Vocab_Somind, z, ind_size) for z in x_train_c])\n", "train_vecs = scale(train_vecs)\n", "\n", "\n", "test_vecs = np.concatenate([buildDocHistogram(Vocab_Somind, z, ind_size) for z in x_test_c])\n", "test_vecs = scale(test_vecs)\n", "\n" ] }, { "cell_type": "code", "execution_count": 87, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# #now select the most informative features (here are sominds, but we can do this on original words too)\n", "\n", "# def calc_conditional_feature_importance(corpus_mat,labels):\n", "# #corpus_mat is the original matrix where each row is one record and columns are features, where are either words or sominds\n", "# #sentiments are labels\n", "# #it returns a matrix showing the relative importance of each feature regarding to each label\n", "# pos_coocur_ = np.zeros((corpus_mat.shape[1],1))\n", "# neg_coocur_ = np.zeros((corpus_mat.shape[1],1))\n", "# ind_pos = labels==1\n", "# ind_neg = labels==0\n", "# for i in range(corpus_mat.shape[1]):\n", "# pos_coocur_[i] = np.sum(corpus_mat[ind_pos,i])\n", "# neg_coocur_[i] = np.sum(corpus_mat[ind_neg,i])\n", "# sum_ = (pos_coocur_[i]+neg_coocur_[i])\n", "# if sum_ !=0:\n", "# pos_coocur_[i] = pos_coocur_[i]/sum_\n", "# neg_coocur_[i] = 
neg_coocur_[i]/sum_\n", " \n", "# # print i\n", "# DF =pd.DataFrame(index=range(corpus_mat.shape[1]))\n", "# DF['pos_coocur_'] = pos_coocur_\n", "# DF['neg_coocur_'] = neg_coocur_\n", "# DF['differ'] = np.abs(neg_coocur_-pos_coocur_)\n", "# DF = DF.sort_values('differ',ascending=False)\n", "# return DF" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Predictor \n", "It can be any method" ] }, { "cell_type": "code", "execution_count": 91, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " 0.0 0.85 0.84 0.85 12542\n", " 1.0 0.84 0.85 0.85 12458\n", "\n", "avg / total 0.85 0.85 0.85 25000\n", "\n" ] } ], "source": [ "# Use a classification algorithm (here a ridge classifier; the commented-out lines show alternatives) on the training set, then assess model performance on the test set\n", "from sklearn.linear_model import SGDClassifier\n", "import sklearn.linear_model as lm\n", "lm.RidgeClassifier\n", "from sklearn.decomposition import RandomizedPCA\n", "\n", "\n", "# howmany = range(10,sm1.nnodes,200)\n", "# # howmany = range(10,15000,500)\n", "# howmany = range(sm1.nnodes,sm1.nnodes+1)\n", "clf = lm.RidgeClassifier()\n", "# from sklearn.neighbors import KNeighborsClassifier\n", "# clf = KNeighborsClassifier(n_neighbors=5)\n", "# clf = lm.SGDClassifier(loss=\"hinge\", alpha=0.01, n_iter=200)\n", "\n", "# import sklearn.ensemble as ensemble\n", "# clf = ensemble.RandomForestRegressor(n_jobs=1) \n", "\n", "\n", "\n", "\n", "\n", "\n", "X_Train = train_vecs[:]\n", "X_Test = test_vecs[:]\n", "\n", "# pca = RandomizedPCA(n_components=int(.05*X_Train.shape[1]))\n", "# pca.fit(X_Train)\n", "# X_Train = pca.transform(X_Train)\n", "# X_Test = pca.transform(X_Test)\n", "\n", "\n", "\n", "\n", "clf.fit(X_Train, y_train)\n", "\n", "\n", "\n", "\n", "import sklearn.metrics as metrics\n", "print metrics.classification_report(y_test,clf.predict(X_Test))" ] }, { "cell_type": "markdown", "metadata": { 
"collapsed": true }, "source": [ "## Some extensions\n", "* **Doc2vec**\n", " * https://arxiv.org/pdf/1405.4053v2.pdf\n", "* **Thought vector: In the same direction in other compression applications**" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# One of my previous test application for news analysis\n", "http://todo-vahidmoosavi.rhcloud.com/somnews\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" } }, "nbformat": 4, "nbformat_minor": 0 }