{ "metadata": { "name": "", "signature": "sha256:f97c586cbc172f6bad795343ef22cd69628e63f485aeb280a66b6509bfec349d" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "heading", "level": 1, "metadata": {}, "source": [ "Unsupervised learning challenges (Frederik Durant)" ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Data preparation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our data are a slightly enhanced version of Kiva.org's loan records, downloaded as a [data dump](http://build.kiva.org/) dated Feb 17, 2015. Each JSON loan record was given an extra `processed_description` field with language IDs, and the records were then loaded into the `loans` collection of a local MongoDB database called `kiva`.\n", "\n", "Unlike tweets, the Kiva loan descriptions are real documents with multiple paragraphs." ] }, { "cell_type": "code", "collapsed": false, "input": [ "from pymongo import MongoClient\n", "from datetime import datetime\n", "\n", "client = MongoClient()\n", "\n", "langcode = \"en\"\n", "loansCollection = client.kiva.loans\n", "#print \"Number of loan descriptions in '%s': %d\" % (langcode,loansCollection.find({\"processed_description.texts.%s\" % langcode :{'$exists': True}}).count())\n", "\n", "startYear = 2015\n", "start = datetime(startYear, 1, 1)\n", "c = loansCollection.find({\"$and\" : [{\"posted_date\" : { \"$gte\" : start }},\n", " {\"processed_description.texts.%s\" % langcode :{'$exists': True}}\n", " ]\n", " })\n", "print \"Number of loans in '%s' since %d: %d\" % (langcode,startYear,c.count())\n", "documents = []\n", "for loan in c:\n", " documents.append(loan[\"processed_description\"][\"texts\"][langcode])\n", "\n", "print documents[0:1]" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Number of loans in 'en' since 2015: 17938\n", "[u\"Jose is 49 years old and lives with his partner. 
He lives in his own home with his partner and their three children in the rural zone of the Oyotun district, Chiclayo province, Lambayeque region, on the north coast of Peru. The inhabitants of this district mainly make a living in agriculture, livestock and business.\\n\\nHe makes a living offering transport services in his mototaxi, with seven years of experience in the business. He has very good personal and work references in the area. The loan will be used to maintain his vehicle as well as to undertake repairs in the garage where he keeps his mototaxi. With this, Jose hopes to offer a better service to his clients, increasing his income level, and therefore his and his family's quality of life.\"]" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n" ] } ], "prompt_number": 23 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Challenge 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cluster sentences with K-means. If you have your own Fletcher test data, get sentences out and cluster them. If not, cluster the tweets you gathered during the last challenge set. For each cluster, print out the sentences, try to see how close the sentences are. Try different K values and try to find a K value that makes the most sense (the sentences look like they do form a meaningful cluster)." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "from nltk.tokenize import word_tokenize\n", "from sklearn.feature_extraction.text import TfidfVectorizer\n", "from sklearn.cluster import KMeans\n", "from sklearn import metrics\n", "from pprint import pprint\n", "import sys\n", "\n", "# number of documents to cluster\n", "nrDocs = 2500\n", "\n", "# Build a TF-IDF weighted document-term matrix\n", "vectorizer = TfidfVectorizer(stop_words=\"english\", ngram_range=(1,1), tokenizer=word_tokenize, use_idf=True)\n", "docVectors = vectorizer.fit_transform(documents[:nrDocs])\n", "\n", "# Inspired by http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html\n", "print(\"number of sample docs: %d, \\t number of unigram features: %d\" % (docVectors.shape[0], docVectors.shape[1]))\n", "\n", "minNrClusters = 5\n", "maxNrClusters = 25\n", "results = {}\n", "\n", "def groupDocsByCluster(clusterIndices,docs):\n", "    '''\n", "    clusterIndices is a list of cluster indices, one per document\n", "    docs is a list of documents\n", "    result is a dict with cluster indices as key, and a list of documents as value\n", "    '''\n", "    assert len(clusterIndices) == len(docs), \"number of cluster indices %d and number of documents %d are unequal\" % (len(clusterIndices),len(docs))\n", "    result = {}\n", "    for cnt in range(len(clusterIndices)):\n", "        i = clusterIndices[cnt]\n", "        # Append to the cluster's list, creating it on first use\n", "        if i in result:\n", "            result[i].append(docs[cnt])\n", "        else:\n", "            result[i] = [docs[cnt]]\n", "    return result\n", "\n", "\n", "# Fit one model per candidate number of clusters and keep its inertia, labels and centroids\n", "for nrClusters in range(minNrClusters,maxNrClusters+1):\n", "    print >> sys.stderr, \"Fitting KMeans model with %s clusters\" % nrClusters\n", "    results[nrClusters] = {}\n", "    model = KMeans(n_clusters=nrClusters).fit(docVectors)\n", "    results[nrClusters]['inertia'] = model.inertia_\n", "    results[nrClusters]['predictions'] = model.predict(docVectors)\n", "    results[nrClusters]['cluster_centroids'] = model.cluster_centers_\n", "\n", "# Just for inspection: print a few sample documents from each cluster of the smallest model\n", 
"clusteredDocs = groupDocsByCluster(results[minNrClusters]['predictions'],documents[:nrDocs])\n", "for k in sorted(clusteredDocs.keys()):\n", " print \"<<<<<<<<<< SAMPLES FROM CLUSTER %d: >>>>>>>>>\" % k\n", " print \"\\n-----------------------------------------------------------------\\n\\n\".join(clusteredDocs[k][:3])\n", " print" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "Fitting KMeans model with 5 clusters\n", "Fitting KMeans model with 6 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 7 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 8 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 9 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 10 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 11 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 12 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 13 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 14 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 15 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 16 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 17 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 18 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 19 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", 
"Fitting KMeans model with 20 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 21 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 22 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 23 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 24 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting KMeans model with 25 clusters" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "number of sample docs: 2500, \t number of unigram features: 10377\n", "<<<<<<<<<< SAMPLES FROM CLUSTER 0: >>>>>>>>>" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Alma works hard to support her family. Alma is married and has a pig fattening business in the Philippines.\n", "\n", "She requested a PHP 10,000 loan through NWTF to buy feed and vitamins for her pigs.\n", "\n", "Alma has been in this business for 10 years now.\n", "\n", "In the future, Alma would like to save money to expand her business.\n", "-----------------------------------------------------------------\n", "\n", "Anecita is 54 years old and married with three children. She is a very hardworking entrepreneur. \n", "\n", "She has a buy and sell of clothing business in the Philippines. Anecita requested a PHP 11,000 loan through NWTF to buy additional items like t-shirts, pants, shorts, etc.\n", "\n", "Anecita has been in this business for three years. In the future, Anecita would like to save money to expand her business.\n", "-----------------------------------------------------------------\n", "\n", "Asoncion works hard to support her family. 
Asoncion is married and has a dried fish vending business in the Philippines.\n", "\n", "She requested a PHP 10,000 loan through NWTF to purchase additional boxes of dried fish to sell.\n", "\n", "Asoncion has been in this business for 18 years now.\n", "\n", "She would like to save enough to provide a secure future for her family.\n", "\n", "<<<<<<<<<< SAMPLES FROM CLUSTER 1: >>>>>>>>>\n", "Jose is 49 years old and lives with his partner. He lives in his own home with his partner and their three children in the rural zone of the Oyotun district, Chiclayo province, Lambayeque region, on the north coast of Peru. The inhabitants of this district mainly make a living in agriculture, livestock and business.\n", "\n", "He makes a living offering transport services in his mototaxi, with seven years of experience in the business. He has very good personal and work references in the area. The loan will be used to maintain his vehicle as well as to undertake repairs in the garage where he keeps his mototaxi. With this, Jose hopes to offer a better service to his clients, increasing his income level, and therefore his and his family's quality of life.\n", "-----------------------------------------------------------------\n", "\n", "Maria is 45 years old and married. Her family, her husband and son, live together in their own house in the Lagunas district of Chiclayo Province in the Lambayeque region on the north coast of Per\u00fa. The people of this district are reliant on agriculture, cattle raising, and commerce. \n", "\n", "Maria's business is selling produce. She has over three years of experience in this line of work and has very good personal and professional references. In addition, she raises livestock such as ducks, guinea pigs, and sheep. This loan will be used to buy greater quantities of produce so as to increase her business's stock. With this she hopes to increase sales, and thus her income, and improve her quality of life. 
\n", "-----------------------------------------------------------------\n", "\n", "Alcides is 52 years old and has a life partner. He lives with his partner and their daughter in the Callayuc province in the Cajamarca region in the northern mountains of Peru. In that district, the population works in agriculture and livestock.\n", "\n", "Alcides makes a living raising and selling farmyard animals like guinea pigs, chicken and ducks, among others. He has over five years of experience in this field and has very good personal and work references in the area. The purpose of the loan will be to buy a greater quantity of animals like guinea pigs, chickens and ducks, as well as to buy animal feed for them. With this, Alcides hopes to increase his business's production and improve the quality of the products he offers his customers, which will have a positive impact on his income level and, therefore, his quality of life.\n", "\n", "<<<<<<<<<< SAMPLES FROM CLUSTER 2: >>>>>>>>>\n", "Sylivier is 43 years old. He is married with five children aged between 1 and 14 years, 4 of whom go to school. He is a farmer and has been farming for 20 years. With the loan from Kiva, he would like to buy more fertilizers to grow maize, which he will then sell. The profits from the business will be used to pay for his workers.\n", "\n", "The agriculture sector accounts for 37% of Rwanda's gross domestic product, generates 65% of Rwanda's export revenue, and employs approximately 90% of Rwandans (as of 2009). Despite the importance of agriculture to Rwandans and their economy, financial institutions view lending to fund agricultural activities as a high-risk proposition because the profitability of these activities is affected by weather, natural disasters, and price fluctuations. For this reason, farmers in Rwanda remain underserved by financial institutions. 
Urwego Opportunity Bank is expanding into this market and is happy to provide Kiva lenders with the opportunity to support Rwandan farmers. \n", "-----------------------------------------------------------------\n", "\n", "Innocent is 44 years old. He is married with 7 children aged between 1 and 22 years, 4 of whom go to school. \n", "\n", "Innocent has been farming for 21 years. He will use this loan from Kiva's lenders to buy fertilizer to grow maize which he will then sell. The profits will be used to pay his employees.\n", "\n", "As of 2009, the agriculture sector accounts for 37% of Rwanda's gross domestic product, generates 65% of Rwanda's export revenue, and employs approximately 90% of Rwandans. However, despite the importance of agriculture to Rwandans and their economy, financial institutions view lending to fund agricultural activities as a high-risk proposition because the profitability of these activities is affected by weather, natural disasters, and price fluctuations. For this reason, farmers in Rwanda remain underserved by financial institutions. Urwego Opportunity Bank is expanding into this market and is happy to provide Kiva lenders with the opportunity to support Rwandan farmers. \n", "-----------------------------------------------------------------\n", "\n", "Joseph is 48 years old. He is married with four children aged between seven and 23 years; all of them go to school. Joseph has farmed for 15 years and will use his Kiva loan to buy fertilizer and grow and sell maize. The profits from the business will be used to pay for his employees.\n", "\n", "The agriculture sector accounts for 37% of Rwanda's gross domestic product, generates 65% of Rwanda's export revenue, and employs approximately 90% of Rwandans (as of 2009). 
Despite the importance of agriculture to Rwandans and their economy, financial institutions view lending to fund agricultural activities as a high-risk proposition because the profitability of these activities is affected by weather, natural disasters, and price fluctuations. For this reason, farmers in Rwanda remain underserved by financial institutions. Urwego Opportunity Bank is expanding into this market and is happy to provide Kiva lenders with the opportunity to support Rwandan farmers.\n", "\n", "<<<<<<<<<< SAMPLES FROM CLUSTER 3: >>>>>>>>>\n", "Sothy\u2019s group lives in a rural village in Kompong Thom province in Cambodia. Sothy works as a farm laborer and he also does extra work as a seller by selling bread to support his family. In their village there is no reliable access to safe, clean drinking water. Having a water filter at home will help each of woman and man to safeguard the health of their families, save money on medical expenses and save time by not having to collect fuel and boil water.\n", "-----------------------------------------------------------------\n", "\n", "Rosita is 35 years old and married. She lives with her husband and her children. She has two daughter. She usually sells food and her husband has been a farmer for over 11 years to increase the family's income. She was pleased to be able to help her husband to improve their life\n", "\n", "She has asked for a loan from KPP-UMKM Syariah for 6,000,000 rupiah. This loan will be used to buy and build a bathroom, septic tank, and closet at her home to improve access to water and sanitation for her family. She wants to have access to safe sanitation for her family. Her family income is still not enough to build toilets and water resources with cash, but she is able to repay the loans received.\n", "\n", "She and her family want to live a healthy life with healthy sanitation. 
She is very grateful for the opportunity and wants to thank all of lenders.\n", "-----------------------------------------------------------------\n", "\n", "Murni is 50 years old and married. She lives with her husband and her children. She has two sons. She has been selling food and cakes for over eleven years to increase the family income. She was pleased to be able to help her husband to improve and have a better life.\n", "\n", "She has asked for a loan from KPP-UMKM Syariah for 7,500,000 rupiah. This loan will be used to buy and build a bathroom, septic tank, and closet at their home to improve access to water and sanitation for their family. She wants to have access to safe sanitation for the families. Her family income is still not enough to build toilets and water resources with cash, but they are able to repay the loans received.\n", "\n", "She and her family want to live a healthy life with healthy sanitation. She is very grateful for the opportunity and wants to thank to all of lenders.\n", "\n", "<<<<<<<<<< SAMPLES FROM CLUSTER 4: >>>>>>>>>\n", "The communal bank \"San Agust\u00edn\" is the name of this group, which includes 8 hardworking women with initiative. They are from a community where there is a large lake where the residents go out daily to fish, wash clothes, or enjoy themselves. The members of this group are:\n", "\n", "Josefina \u2013 sells wooden furniture \n", "\n", "Francisca \u2013 sells shoe through a catalog\n", "\n", "Maria Del Carmen - sells chicken\n", "\n", "Marisela - sells bread\n", "\n", "Karina \u2013 sells tortillas \n", "\n", "Gloria - sells sweets\n", "\n", "Karina is a very responsible person. Three years ago she took over the care of her nephew, whom she loves as if she was his mother. Karina learned to make tortillas by hand when she was a little girl. Her mother taught her, and she has been selling this traditional Mexican food for 2 years. 
Mondays through Saturdays she gets up at 5am to wash the corn that she cooked the previous evening, and later she grinds it, turns on the light, and opens her business. Each day she sells about 500 tortillas. Karina must take good care of her hands, since they are her main work tool. In addition to this activity, Karina also rents her mill to other people who make their own tortillas.\n", "\n", "With her earnings, Karina covers the bi-monthly payment that covers the electricity that the mill uses, and she also buys firewood and contributes to the household finances. As with her previous loans, she will buy corn at wholesale and a new griddle, and she will also do maintenance work on her mill so that she can keep her business running. \"I'm thinking about starting up a new business, so that I can make use of my free time in the afternoons and increase my income.\"\n", "\n", "Note: the photograph was taken in front of Francisca's house. Karina is seen wearing a grey skirt and a blue blouse.\n", "-----------------------------------------------------------------\n", "\n", "The most important crop in the region where the group is located is sugar cane. Almost all of the fields are covered with this fruit, although one also tends to see lime, papaya, and orange trees. The members of Las Tres Mar\u00edas [The Three Mar\u00edas] and their occupations are:\n", "\n", "Francisca \u2013 Sewing\n", "\n", "Martina \u2013 Tacos\n", "\n", "Maria R. \u2013 Bread\n", "\n", "Mar\u00eda R. is an extremely committed person. She is married to Filiberto and has three children. The youngest of her children is 15 years old. She does not know how to read or write due to the fact that she did not study. Therefore, she unfortunately had to start working from a very early age to help her parents take care of her siblings. \n", "\n", "Mar\u00eda R. used to make a living by buying and reselling bread with the purpose of giving her children the quality of life that she did not have. 
\u201cI will never forget the day in which my son told me: \u2018Mom, why don\u2019t you make bread to sell instead of buying it?\u2019\u201d she says. When Filiberto arrived at the house after a day of work, she told him the little one\u2019s idea and that was how they started their own bakery 5 years ago. The invested their savings to buy materials to build a stone oven where the bread is cooked over a wood fire. They also bought flour, sugar, butter, and food coloring. \n", "\n", "At the beginning, Mar\u00eda R.\u2019s biggest fear was to not be successful and that her investment would be in vain. Today, her bakery employs 7 people. She, her husband, and her brother-in-law prepare the bread. \u201cI like to sell. For this reason, after I finish making the bread I carry 80 pieces in my basket and I go out to sell them to the neighboring communities,\u201d she says.\n", "\n", "Thanks to the two loans that she received before, she was able to buy baskets, sugar, butter, flour, and firewood, continuing her bakery in this manner. Due to the fact that the coldest months in the region are arriving, the inhabitants tend to buy more bread. For this reason, Mar\u00eda R. needs to buy more ingredients and materials to take advantage of the season. \n", "\n", "*The photograph was taken in Ma Filiberta\u2019s house. Mar\u00eda R. is wearing a blue t-shirt and is leaning against some pieces of firewood.\n", "-----------------------------------------------------------------\n", "\n", "Javier is 29 years old and studied through high school. Later he stopped studying because he could not pay university costs. He has a life partner and one child. His partner sells clothing and both of them work to support their home. Javier has been in sales for 5 years. He makes and sells bread everyday in the streets of his town and the surrounding communities. Before, he sold ice cream but the bread sales are going better. 
He is requesting a loan to buy flour, sugar, yeast, molds, and more for his business. Javier sees a great opportunity to grow in this business. His dream is to establish a big bakery and create jobs for people who need them.\n", "\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n" ] } ], "prompt_number": 39 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Challenge 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Draw the inertia curve over different k values. (Sklearn KMeans class has an inertia_ attribute)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline\n", "\n", "from matplotlib import pyplot as plt\n", "#print \"KMeans with %d clusters has inertia: %d\" % (nrClusters, model.inertia_) \n", "\n", "X = [n for n in sorted(results.keys())]\n", "Y = [results[n]['inertia'] for n in X]\n", "\n", "plt.figure()\n", "plt.scatter(X,Y)\n", "plt.title(\"Metric of goodness for KMeans document clustering\")\n", "plt.xlabel(\"Number of clusters\")\n", "plt.ylabel(\"Inertia\")\n", "plt.show()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAY0AAAEZCAYAAABrUHmEAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xu8VlW97/HP1/tKMVNLAS90wURSI0w9R4vVRdB2qcTR\ncpebyp250+AYlZfaSdZJy23eTnmyKEHLtlvT3N4WxGZlZEkZBIqmbiUDxTKvFAbK7/wxxiOTh7UW\ncy3WXGs9z/q+Xy9ePM+YtzEv6/nNOcYcYygiMDMzK2OL/s6AmZk1DgcNMzMrzUHDzMxKc9AwM7PS\nHDTMzKw0Bw0zMyvNQaOPSDpL0ncqWO/3JT0l6Ve9ve5u5mOdpNf1Zx5qJB0m6UFJz0s6ur/zM5BJ\nulLSl/s7HwPBQLmGq/qt6C2DOmhIWibp75J2qUtfmC+gvUqso1XSHzc1X0ScFxEf35z8drDttwHv\nBoZFxKG9ue4Gdy5waUQMiYibNndl9T+skkZLelzSp/P3zb6O+lHkfw1P0oh8vPvtd603gnAVvxW9\naVAHDdIfy8PACbUESfsDLfTiH5KkLXtrXXX2BpZFxAsVrb9R7QUs7cmCnZyrl39YJY0B/gs4NyK+\nUZhe+XVUIfV3BnpZw+5Phb8VvSciBu0/4BHg88CCQtq/AWcD64C9ctq2Of0PwErgcmA7YHtgNfAS\n8DzwHDAUmA5cB1wFPAuclNOuKmzncOBO4GngUWByJ3kcBtwE/AV4EPjnnH5S3vaLedvndLDsFsCF\nwJ9JP2qn5f3aoqt1F/b5YmBF/ncRsE1h+meBx4DlwMfyel+Xp10JfBO4OR+TX9Wm5en7AnPydu8H\njitMew9wb15uOTAtp++a1/d0Xu4OQB3s83/n8/G3vI6tN7Gf9efqYx2s8/vAl4GD87H8WN30zbqO\n8rSd8v79CXgK+E9geGF97aQnqPl5v9qAXfK07YCrgSfz8VkAvKaT62kM8Nu8jh8B1wBfLkz/eD5G\nfwF+AgwtTBtdOG8rgTML57u4jlbgj4Xvy4DPAItJ1+oMYDfgtnzM5wA7FeY/lPV/G4uAcSWPw6P5\neD+f/x3Syd/E2cBDefnf1I4zG17D7cBJheU+Avw8fxbp7+GJnP/F+dicDKwB/p63/5PC39n1+dw+\nDHyqi+tvg98KYETO1z+Rrps/A2cXlm8BZpKumaXA54rHvpLfzSpXPtD/kf7Y30X64doX2BL4I+lO\ntfjHfhFwI+kPewfSD9BX87Rx9Scpn/Q1wNGFP+pzChfC3vmC/UDe5s7AgZ3k8Q7g/wLbAAfmC+8d\nedrk2oXcybKnkH6Ah+W8/5T0g7pFiXWfS/rD3TX/+wXp7hrgSNKPxn7AK4AfsnHQeBI4KO/f1cA1\nedr2+RhPJv0Bvzn/Ieybpz8OHJY/vxIYkz+fR/qR3TL/O2wT5/WdJY/hRueqg/V9H5hN+rH8UEXX\n0c7AxHyt7ABcC9xQ2EY76cf8DXmeecB5edon8rq2I/2gjQGGdJDPbUg/PFNzHiflfa+d13fmc/Hm\nPO+lwM/ytCH53Jyep+0AvLVwfM4tbKeVDYPGI6Rr6dWka/EJUuA6kBRI5wJfzPMOz9fOkfn7u/P3\nXUoch70p3BR1cm18lvQjPzJ/PwDYOX8uXsPzKNwcsGHQmEAKNjvm728Edu/kWGwB3A18AdgKeC3p\nxmZ8yd+KETlf387H6gDgBeCNefr5Oa+vzMduMfBolb+bg714quYqUiQ/ghStV9QmSBLp7uvTEfFM\nRKwi/YB9sDZLJ+u8M3J5eqTio+J8/wjMiYh/j4iXIuKpiPhd/Qok7Qn8T+CMiFiT5/luzmtX2645\nHrg4Ih6LiGdyvlVy3R8iXfxPRsSTwJeAEwvr/V5ELI2Iv5Eu8qIAfhwRv4mIl4AfkH6IAN4LPBIR\nMyNiXUQsAn6c1wnpD2i0pB0j4tmIWFhIHwqMyMfsF5vYd0ruJ
2x8rjZaDXAI8Axwexeb6/F1lK+B\nGyLihTztq6QbkpoAvh8RD+U8Xsv6Y7oG2IX0QxgRsTAinu8gf4cCW0XEJfkYXg/8ujD9Q8CMiFgU\nEWuAs4D/IWlv0nl7LCIuysdxVUQUl93UtXhZRPw5Ih4Dfg78MiJ+FxF/B24gBTqADwO3RsTt+bj8\nlPQD/Q8ljkOZYqmTgM9HxIN5/Ysj4qkSyxWtJQXRUZK2iIjfR8TKwvRiPt4K7BoRX4mIFyPiEdL1\n98HCPF39VtR8KSL+HhGLgd+RAi7AcaQbj2cjYgVwSSfL9xoHjXQRXkX6g5kMzGLDg/5q0t303ZKe\nlvQ06bF6102sd3kX0/YkPaZuyjDgqYj4ayHtUdIdRRlDSXe8HeWps3UPKyz7hy6m/bFuWr0nCp9X\nk+5MId0NHlI7lvl4/iOpuALS3e97gGWS2iXVKvgvIBUpzJb035LO6GiHO1DmGHZ1riBdI98k3THO\nkbRTJ/P0+DqS9ApJ386V6s8CPwNemYNNTfGHqXhMryIV0/xI0gpJX5O0VQd5HEYhkGXFc7zBOc/H\n7C+kY7UH5a7ZztRfD8XvL7Dh9XFc3fVxGLB7Yf7OjkMZe5Lu9HssIv6L9OT6TeCJfN6GdDL73sCw\nuv05C3hNYZ5NXX+w4T7/jfX7PIzO/8Yr4aABRMSjpD+Io0h3vUVPki7M/SLiVfnfThGxY23xjlbZ\nSXrNo8DrS2TtMWBnScU/ir0of2E8TvojqSl+7mzdKwrTR3Qy7fH8vTitrEdJRR6vKvwbEhGnAuSn\nk2NJP7I3ku4kyXe2n4mI1wNHA5+W9M4S2ytzDLs6VzUvkoLbo0BbRz8Sm3kdTQP2AQ6OiFeSnjJE\nibvGfAd7bkSMJj1VvZcNn6RqHmfjG469C583OOeStic9wSwn/TB19jrqX0kBsWb3TuYr6my/HiUV\nzdRfH18vsc4y5/GPpKKtTfkrqSi1ZoN9iojLIuIgUhHtPqRir47y8Cjpybq4PztGxHsL89cvU2Y/\narr6G6+Eg8Z6J5HKwVcXEyNiHfAd4GJJrwaQNFzS+DzLE8AuknYsLLapP/QfAu+WdJykrSTtIunA\n+pki4o+ksuDzJG0r6QBSpfPVJffpWmCqpGH57vgM8gVZYt3XAF+QtKukXYEvFqZdC3xE0ihJr2Dj\n4qmu9v8WYB9JH5a0df73Vkn75s8fkvTKXKz1PKkOBknvlfSGfOf9XE5/aVMHoBeOYW1/FBEvkooD\nngRuzfter6fX0Q6koPKspJ3Z+JjW8rFxovQOSfvnN2+eJxWfdHRs7gRelDQlH+v3k4pPaq4BPirp\nQEnbkorIfpWD4S3AUElT83EcIungvNwi4D2SXiVpd+B/d5TPkq4G3idpvKQtJW2XX2svBrvOrq8/\nk8r/u7oh+y7w5dq1JOmAfLzrLQLeL6lF0htI5zXVgksHSTpE0taku/4XWH+8n2DD4LoAeF7S5/K6\ntpT0JkkHdbEv3SleuhY4S9JO+RidRveCTrc5aGQR8XBE/LaYVPh8Bqlo5Fe56GAO6e6CiLif9Mf2\nsFIju6F0fvdQ+8F+lFQEM430+L+QVMHVkRNId3+Pke5ev5gfjzdYZye+Q6rAXUwqWrkFeCn/gG1q\n3V8hlSUvzv9+k9PI5c0Xk149fYBUkVnMR6d3T7msfTypTHcF6U7pPFLlKqQy7UfycT6ZVNwD6e5w\nDulH8U7gmxHxsy72vWhzjuEG80TEWuD9pB+KmyRtt8GMPbyOSMezhRSQ7iQVXXV1B1rM927Af5De\nvllKqiy+aqOdWJ/3j5Cuu+NJb/XUps8F/jWnPUaqtK3VuTxPqqt5H+mcPUCq8CZv63ekt6RuJ72V\nVeaYbrQvEbEcOIb0htOfSHfq09jwh7SzZf8G/B/gF7ko6GA29g3SD+1s0vH6DqnyuX69F5Hqip4g\nVW4XbzJ2BK4gvbG0jHTOL
sjTZgD75e3/OP+tvZdU7/IwKbBdkdexQf472qcO8lXvXNKT4CN5n/4j\n57syiqgmKOUKyFmksrsAroiISyVdQDqIa0hlix+NiGclHcH6H481wGcjYl5e11jSGznbkSrJplaS\n6SYn6Sjg8ogY0d95MbPeJ+lfgOMj4h1VbaPKJ421wOm5nPVQ4FRJo0jRcHREHEi6Wzkrz/9n4L0R\ncQCpIrF4p3Q56Z3pkcBISUdWmO+mkR/t35OLwIaTijzqy9rNrEFJ2l2p25wtJL0R+DTpbbTKVBY0\nImJlpNcpya8Q3kfq7mJOoXjkLtJbGeTX/GpvCCwFWnK561DSO+cL8rRZwLFV5bvJiPQe+FOk9+Lv\nJdVNmFlz2Ab4f6R6vrmkl0e+VeUGO3otr9dJGkF6D/uuukkfI9UH1JsE3B0Ra/MdcvFNlxWUf+V0\nUMuVsR2V65pZE8j1o/v35TYrDxr5VcfrgKn5iaOW/nlgTUT8sG7+0aRWjkdUnTczM+ueSoNGfiXt\neuDqiLixkP4R0ttD76qbfw9SmfuJueUkpCeLPQqz7cHGDZSQVOlrZmZmzSoiSr/mW1mdRn6ffgaw\nNCIuLqQfSWoIc0wUumzI7QhuIXX38MtaekQ8DjyX34sWqSuLlwNQUVTY30p//zvnnHP6PQ/eP+/f\nYNu3wbB/3VXl21OHkd65f4fSuAIL8yufl5EaMs3JabVKm9NIjXLOKcxf66rjk6RGOQ8CD0Xul8bM\nzPpWZcVTETGfjoPSyE7m/wq58VgH0+6mjyt7zMxsY24R3iBaW1v7OwuV8v41rmbeN2j+/euuylqE\n9zVJ0Sz7YmbWVyQRA6Ei3MzMmo+DhpmZleagYWZmpTlomJlZaQ4aZmZWmoPGANfW1sb48ZMYP34S\nbW1t/Z0dMxvk/MrtANbW1sbEiZNZvfprALS0nMENN8xkwoQJ/ZwzM2sW3X3l1kFjABs/fhJz5hxN\nGpMKYCZHHHETs2df39ViZmaluZ2GmZlVpk8GYbKemTbtZObPn8zq1el7S8sZTJs2s38zZWaDmoun\nBri2tjYuvPAKIAUR12eYWW9ynYaZmZXmOg0zM6uMg4aZmZVW5XCve0qaJ+leSfdImpLTL5B0n6Tf\nSfqxpFcWljlL0oOS7pc0vpA+VtKSPO2SqvJsZmZdq/JJYy1wekSMBg4FTpU0CpgNjI6IA4EHgLMA\nJO0HfADYDzgS+FYeExzgcuCkiBgJjMzjjJuZWR+rLGhExMqIWJQ/rwLuA4ZFxJyIWJdnuwvYI38+\nBrgmItZGxDLgIeAQSUOBIRGxIM83Czi2qnybmVnn+qROQ9IIYAwpSBR9DLg1fx4GLC9MWw4M7yB9\nRU43M7M+VnnjPkk7ANcBU/MTRy3988CaiPhhb21r+vTpL39ubW312L5mZnXa29tpb2/v8fKVttOQ\ntDVwM3BbRFxcSP8I8HHgXRHxQk47EyAizs/fbwfOAf4AzIuIUTn9BGBcRJxSty230zAz66YB004j\nV2LPAJbWBYwjgc8Cx9QCRnYT8EFJ20h6LTASWBARK4HnJB2S13kicGNV+W5k7kbdzKpW2ZOGpMOB\nO4DFQG0jZwOXAtsAT+W0X0bEJ/MyZ5PqOV4kFWe15fSxwJVAC3BrREzpYHuD+knD3aibWU+4G5FB\nyt2om1lPDJjiKTMzaz7uGr1JuBt1M+sLLp5qIu5G3cy6y3UaZmZWmus0zMysMg4aZmZWmoOGmZmV\n5qBhZmalOWiYmVlpDhpmZlaag8Yg5c4Nzawn3E5jEHLnhmZW48Z9tknu3NDMaty4z8zMKuMOCwch\nd25oZj3l4qlByp0bmhkMoDoNSXsCs4DXkEbuuyIiLpV0HDAd2Bd4a0T8Ns+/HfB9YDTpCWhWYbzw\n2sh925FG7pvawfYcNMzMumkg1WmsBU6PiNHAocCpkkYBS4CJpKFgiz4IEBEHAGOBT0jaK0+
7HDgp\nIkYCI/M442Zm1scqCxoRsTIiFuXPq4D7gGERcX9EPNDBIo8D20vaEtgeWAM8J2koMCQiFuT5ZgHH\nVpVvMzPrXJ+8PSVpBDAGuKuzeSKiDXiOFDyWARdExDPAcGB5YdYVOc3MzPpY5W9PSdoBuA6Ymp84\nOpvvw0ALMBTYGfi5pLnd2db06dNf/tza2kpra2sPcmxm1rza29tpb2/v8fKVvj0laWvgZuC2iLi4\nbto8YFqhIvxbwJ0RcXX+PgO4DZgPzIuIUTn9BGBcRJxStz5XhJuZddOAqQiXJGAGsLQ+YBRnK3y+\nH3hnXnZ7UuX5/RGxklS3cUhe54nAjVXl2zbmfqrMrKbKV24PJ70htZj0yi3A2cC2wGXArsCzwMKI\nOErStqQgcyApmH0vIi7M66q9cttCeuV2Sgfb85NGBdxPlVlzGzDtNPqag0Y13E+VWXMbMMVTZmbW\nfNz3lHXJ/VSZWZGLp2yT3E+VWfNynYaZmZXmOg0zM6uMg4aZmZXmoGFmZqU5aJiZWWkOGmZmVpqD\nhpmZleagYWZmpTlomJlZaQ4aZmZWmoOGmZmV5qBhZmalOWiYmVlpVQ73uqekeZLulXSPpCk5/bic\n9pKkt9Qtc4CkX+b5F0vaJqePlbRE0oOSLqkqz2Zm1rUqnzTWAqdHxGjSeN+nShoFLAEmkoaCfZmk\nrYCrgJMj4k3AOODFPPly4KSIGAmMlHRkhfk2M7NOVBY0ImJlRCzKn1cB9wHDIuL+iHigg0XGA4sj\nYkle5umIWCdpKDAkIhbk+WYBx1aVbzMz61yf1GlIGgGMAe7qYraRQEi6XdLdkj6b04cDywvzrchp\nZmbWxyof7lXSDsB1wNT8xNGZrYHDgYOA1cBcSXcDz5bd1vTp01/+3NraSmtraw9ybGbWvNrb22lv\nb+/x8pWO3Cdpa+Bm4LaIuLhu2jxgWkT8Nn//AHBURHwkf/8C8AJwNTAvIkbl9BOAcRFxSt36PHLf\nAOHhYc0ax4AZuU+SgBnA0vqAUZyt8LkN2F9SS64UHwfcGxErgeckHZLXeSJwY1X5ts3T1tbGxImT\nmTPnaObMOZqJEyfT1tbW39kys15S2ZOGpMNJb0gtBmobORvYFrgM2JVU9LQwIo7Ky3wIOCvPf0tE\nnJnTxwJXAi3ArRExpYPt+UljABg/fhJz5hwNTM4pMzniiJuYPfv6/syWmXWiu08aldVpRMR8On+S\n6fBJISJ+APygg/S7gf17L3dmZtYTlVeE2+AybdrJzJ8/mdWr0/eWljOYNm1m/2bKzHpNpRXhfcnF\nUwOHK8LNGkd3i6ccNKxfOcCY9S8HDWsYtTetVq/+GpCKsm64YaYDh1kfctCwhuE3rcz634Bpp2Fm\nZs3Hb09Zv/GbVmaNx8VT1q9cEW7Wv1ynYWZmpblOw8zMKuOgYWZmpTloWMNoa2tj/PhJjB8/yT3n\nmvUT12lYQ3BDQLNquCLcmpIbAppVwxXhZmZWmSpH7ttT0jxJ90q6R9KUnH5cTntJ0ls6WG4vSask\nTSukjZW0RNKDki6pKs82cE2bdjItLWcAM4GZuSHgyf2dLbNBp8qR+3YHdo+IRZJ2AO4GjiWNyrcO\n+DaFMcILy10HvAQsiIgLc9oC4LSIWCDpVuDSiLi9bjkXTzU5NwQ0630DaeS+lcDK/HmVpPuAYREx\nF1JG60k6FngY+GshbSgwJCIW5KRZpOBz+0YrsKY2YcIEBwqzflYqaEh6LzAa2I483ndEnFt2I5JG\nAGOAu7qYZwfgc8C7gc8WJg0Hlhe+r8hpZmbWxzZZpyHp28DxwKdy0vHA3mU3kIPBdcDUiFjVxazT\ngYsi4m9A6UclMzPrO2WeNP5nROwvaXFEfEnShZQsGpK0NXA9cHVE3LiJ2Q8GJkn6OrATsE7SauDH\nwB6F+fYgPW1sZPr06S9/bm1tpbW1tUw2zcwGjfb2dtr
b23u8/CYrwiUtiIiDJf0KmAT8BbgnIt6w\nieVEetXlLxFxegfT5wGfiYi7O5h2DvB8RHwjf78LmAIsAG7BFeFmZr2iiorwmyW9CriA9AYUwHdK\nLHcY8GFgsaSFOe1sYFvgMmBX4BZJCyPiqE2s65PAlUALcGt9wDAzs77RrVduJW0HbBcRz1SXpZ7x\nk4YV+fVcs3J6rRsRSe+KiLmSJpHfmCqKiB/3PJu9z0HDatxPlVl5vVk89XZgLvA+OggapApqswHn\nwguvyAEj9VO1enVKc9Aw23ydBo2IOCd/PDciHi5Ok/S6SnNlZmYDUpm+p67rIO0/ejsjZr3F/VSZ\nVafTJw1Jo4D9gJ0kvZ/U4C6AHUktw80GpAkTJnDDDTMLFeGuzzDrLV1VhB8DTCTVadxUmPQ88KOI\nuLP67JXninAzs+7r1UGYJG0FfC4ivtobmauSg4aZWff16iBMEfEi6WnDrGl57HGz8sp0I3IRsDXw\n76QuywVE/TgY/c1PGtYTbtNhg12vjxEuqZ2OG/e9o9u5q5CDhvWExx63wa7X+56KiNbNypGZmTWN\nMuNp7C5phqTb8/f9JJ1UfdbMqtfdNh2u/7DBrkzx1O3A94HPR8QBeYyMhRHxpr7IYFkunrKeKtu5\noes/rBlVUafxm4g4KHdhPianLYqIN29mXnuVg4ZVzfUf1ox69ZXbbJWkXQobOBR4tieZMzOzxlZm\nEKZpwH8Cr5N0J/Bq4H9VmiuzAWjatJOZP38yq1en76n+Y2b/Zsqsj5UahCnXY7wxf/19RKwtscye\nwCzgNaRXdq+IiEslHQdMB/YFDq4N9yrpCOA8YBtgDfDZiJiXp40ljdy3HWnkvqkdbM/FU1Y5D+5k\nzabX6zTySg8DRpCeTAIgImZtYpndgd0jYpGkHUhDxR6bl18HfBuYVmskKOnNwMqIWClpNNAWEXvk\naQuA0yJigaRb8RjhZma9otfbaUi6GngdsAh4qTCpy6ARESuBlfnzKkn3AcMiYm4to3XzLyp8XQq0\n5CecXYEhEbGgsN1jAY8TbmbWx8rUaYwF9tuc23hJI4AxwF0lF5kE3B0RayUNB5YXpq0Ahvc0L2Zm\n1nNlgsY9wFDgsZ5sIBdNXQdMjYhVJeYfDZwPHNGT7ZmZWXXKBI1XA0tzvcLfc1pExNGbWjAXL10P\nXB0RN5aYfw/S2OMnRsQjOXkFsEdhtj1y2kamT5/+8ufW1lZaW1s3tUkzs0Glvb2d9vb2Hi9fpnFf\na0fpEdHlVpUqLWYCf4mI0zuYPg/4TOHtqZ2AnwHn1AcYSXcBU4AFwC24ItwahN+2soGukrenepiR\nw4E7gMWs7yX3bGBb4DJSBfezpC5JjpL0BeBM4MHCao6IiCcLr9y2kF65ndLB9hw0bEBxtyPWCHot\naEhaRQddomcRETv2IH+VcdCwgcbdjlgj6LVXbiNih97JkpmZNYsyFeFm1gPudsSaUWV1Gn3NxVM2\nELki3Aa6AVMR3tccNMzMuq+KrtHNzMwABw0zM+sGBw0zMyvNQcNsgGhra2P8+EmMHz+Jtra2/s6O\nWYdcEW42ALj1uPUXvz1l1oDcetz6i9+eMjOzyrhFuNkA4Nbj1ihcPGU2QHSn9bhbmltvcZ2GWZNz\npbn1JgcNsybnSnPrTa4INzOzylQWNCTtKWmepHsl3SNpSk4/Lqe9JOktdcucJelBSfdLGl9IHytp\nSZ52SVV5NmsE06adTEvLGaTRlGfmSvOT+ztbNkhUOdzr7sDuEbFI0g7A3cCxpNEA1wHfBqZFxG/z\n/PsBPwTeCgwHfgqMjIiQtAA4LSIWSLoVjxFug5wrwq239NrIfZsrIlYCK/PnVZLuA4ZFxFxIGa1z\nDHBNRKwFlkl6CDhE0h+AIRGxIM83ixR8bq9fgdlgMWHCBAcK6xd9UqchaQQwBriri9mGAcsL35eT\nnjjq01fkdDMz62O
VN+7LRVPXAVMjYlWV25o+ffrLn1tbW2ltba1yc2ZmDae9vZ329vYeL1/pK7eS\ntgZuBm6LiIvrps1jwzqNMwEi4vz8/XbgHOAPwLyIGJXTTwDGRcQpdetznYaZWTcNmFdulSotZgBL\n6wNGcbbC55uAD0raRtJrgZHAglw38pykQ/I6TwRurCrfZmbWuSrfnjocuANYTHpjCuBsYFvgMmBX\n4FlgYUQclZc5G/gY8CKpOKstp48FrgRagFsjYkoH2/OThplZN7lFuJmZlTZgiqfMzKz5OGiYmVlp\nDhpmZlaag4aZvaytrY3x4ycxfvwk2tra+js7NgA5aJg1ubKBoDZOx5w5RzNnztFMnDjZgcM24ren\nzJpYdwZs8jgdg9OA6bDQzPrfhRdekQNGCgSrV6c0d3ZoPeWgYWZA6mJ9/vzJrF6dvqdxOmb2b6Zs\nwHHxlFkT6+544h6nY/Bxi3Az24ADgXXFQcPMzEpzNyJmZlYZBw0zMyvNQcPMzEpz0DAzs9IcNMzM\nrLQqh3vdU9I8SfdKukfSlJy+s6Q5kh6QNFvSTjl9O0nXSFosaWltzPA8baykJZIelHRJVXk2s/Lc\nueHgVOWTxlrg9IgYDRwKnCppFHAmMCci9gHm5u8AHwSIiAOAscAnJO2Vp10OnBQRI4GRko6sMN9m\ntgnu3HDwqixoRMTKiFiUP68C7gOGA0cDtb4JZgLH5s+PA9tL2hLYHlgDPCdpKDAkIhbk+WYVljGz\nfrBhn1apxXmtAaE1tz6p05A0AhgD3AXsFhFP5ElPALsBREQb8BwpeCwDLoiIZ0iBZnlhdStympk1\nKRd9DVyVd1goaQfgemBqRDwvrW94GBEhKfJ8HwZagKHAzsDPJc3tzramT5/+8ufW1lZaW1s3N/tm\n1oEqOzes7y9r/vzJXfaXZd3T3t5Oe3t7j5evtBsRSVsDNwO3RcTFOe1+oDUiVuaip3kRsa+kbwF3\nRsTVeb4ZwG3A/DzPqJx+AjAuIk6p25a7ETHrQ1X1aeVxPfrWgOlGROmRYgawtBYwsptYfzVMBm7M\nn+8H3pmX3Z5UeX5/RKwk1W0cktd5YmEZM+snEyZMYPbs65k9+/pSAcNFTs2hsicNSYcDdwCLgdpG\nzgIWANcCe5HqLo6PiGckbUsKMgeSgtn3IuLCvK6xwJWk4qtbI2JKB9vzk4bZANWdLtq72527bR73\ncmtmA053i5zcnXvf8XCvZtbwJkyY4EAxQDlomFnlPJRs83DxlJn1CRc5DUyu0zAzs9IGzCu3ZmbW\nfBw0zKxBFc44AAAK2ElEQVShuf1H33LxlJk1LLfp2Hyu0zCzQcNdjmw+12mYmVll3E7DzBqW23/0\nPRdPmVlDc/uPzeM6DTMzK811GmZmVhkHDTMzK81Bw8wGDTcE3HwOGmY2KNQaAs6ZczRz5hzNxImT\nuwwcDjAdq3K41z0lzZN0r6R7JE3J6TtLmiPpAUmzJe1UWOYASb/M8y+WtE1OHytpiaQHJV1SVZ7N\nrHldeOEVueX4ZCC1Iq+9dVWvuwFmMKnySWMtcHpEjCaN932qpFHAmcCciNgHmJu/I2kr4Crg5Ih4\nEzAOeDGv63LgpIgYCYyUdGSF+TazQa47AWawqSxoRMTKiFiUP68C7gOGA0cDtdY3M4Fj8+fxwOKI\nWJKXeToi1kkaCgyJiAV5vlmFZczMSpk27WRaWs4g/ezMzA0BT+7vbDWcPmkRLmkEMAa4C9gtIp7I\nk54Adsuf9wFC0u3Aq4EfRcQFpECzvLC6FTnNzKy0CRMmcMMNMwsNATvv2NAtzTtXedCQtANwPTA1\nIp6X1rchiYiQVGuRtxVwOHAQsBqYK+lu4Nmy25o+ffrLn1tbW2ltbd3c7JtZEyk79nh3AkyjaW9v\np729vcfLV9oiXNLWwM3AbRFxcU67H2iNiJW56GleROwr6QPAURHxkTzfF4AXgKvzP
KNy+gnAuIg4\npW5bbhFuZtZNA6ZFuNIjxQxgaS1gZDexvh/jycCN+fNsYH9JLblSfBxwb0SsBJ6TdEhe54mFZczM\nrA9V9qQh6XDgDmAxUNvIWcAC4FpgL2AZcHxEPJOX+VCeJ4BbIqL2ZtVY4EqgBbg1IqZ0sD0/aZiZ\ndZM7LDQzs9IGTPGUmZk1HwcNMzMrzUHDzMxKc9AwM7PSHDTMzKw0Bw0zMyvNQcPMzEpz0DAzs9Ic\nNMzMrDQHDTMzK81Bw8zMSnPQMDOz0hw0zMysNAcNMzMrzUHDzMxKq3Lkvj0lzZN0r6R7JE3J6TtL\nmiPpAUmzJe1Ut9xeklZJmlZIGytpiaQHJV1SVZ7NzKxrVT5prAVOj4jRwKHAqZJGAWcCcyJiH2Bu\n/l70DeCWurTLgZMiYiQwUtKRFeZ7QNqcgeAbgfevcTXzvkHz7193VRY0ImJlRCzKn1cB9wHDgaOB\nmXm2mcCxtWUkHQs8DCwtpA0FhkTEgpw0q7jMYNHsF673r3E1875B8+9fd/VJnYakEcAY4C5gt4h4\nIk96Atgtz7MD8Dlget3iw4Hlhe8rcpqZmfWxyoNGDgbXA1Mj4vnitDyod21g7+nARRHxN6D0eLVm\nZtZ3lH63K1q5tDVwM3BbRFyc0+4HWiNiZS56mhcR+0q6A9gzL7oTsA74V+DHeZ5RefkTgHERcUrd\ntqrbETOzJhYRpW/Ut6oqE5IEzACW1gJGdhMwGfha/v9GgIh4e2HZc4DnI+Jb+ftzkg4BFgAnApfW\nb687O21mZj1TWdAADgM+DCyWtDCnnQWcD1wr6SRgGXB8iXV9ErgSaAFujYjbez23Zma2SZUWT5mZ\nWXNpihbhko6UdH9u/HdGf+ent0laJmmxpIWSFmx6iYFL0vckPSFpSSGtywafjaST/ZsuaXk+fwsb\nuZ1RTxvtNoou9q/hz6Gk7STdJWmRpKWSzsvp3Tp3Df+kIWlL4PfAu0mv4/4aOCEi7uvXjPUiSY8A\nYyPiqf7Oy+aS9DZgFTArIvbPaV8HnoyIr+eg/6qIqG/02RA62b9aHd03+jVzvUDS7sDuEbEovxl5\nN6nd1EdpgnPYxf4dTxOcQ0mviIi/SdoKmA98htR2rvS5a4YnjYOBhyJiWUSsBX4EHNPPeapCU1T0\nR8TPgafrkjtt8NloOtk/aJ7z1+1Gu42ki/2DJjiHuUkDwDbAlqRrtVvnrhmCxnDgj4Xvy2m+xn8B\n/FTSbyR9vL8zU4EOG3w2mU9J+p2kGY1adFOvTKPdRlbYv1/lpIY/h5K2kLSIdI7mRcS9dPPcNUPQ\naOzytXIOi4gxwFGkPrze1t8Zqkpdg89mcTnwWuDNwOPAhf2bnc3XjUa7DSnv33Wk/VtFk5zDiFgX\nEW8G9gDeLukdddM3ee6aIWisYH2jQPLn5Z3M25Ai4vH8/5+BG0hFcs3kiVyWXOtr7E/9nJ9eFRF/\nigz4Lg1+/nKj3euBqyLixpzcNOewsH9X1/av2c5hRDxL6hh2LN08d80QNH5D6vl2hKRtgA+QGhA2\nBUmvkDQkf94eGA8s6XqphlNr8AmFBp/NIv8h1kykgc9fiUa70MDnsLP9a4ZzKGnXWrGapBbgCGAh\n3Tx3Df/2FICko4CLSRU7MyLivH7OUq+R9FrS0wWkxpg/aOT9k3QNMA7YlVR++kXgJ8C1wF7kBp8R\n8Ux/5XFzdLB/5wCtpGKNAB4BPlEoQ24okg4H7gAWs74Y4yxSbw0Nfw472b+zgRNo8HMoaX9SRfcW\n+d9VEXGBpJ3pxrlriqBhZmZ9oxmKp8zMrI84aJiZWWkOGmZmVpqDhpmZleagYWZmpTlomJlZaQ4a\n1jAkrZP0b4Xvn8k9yPbGuq+UNKk31rWJ7RyXu6WeW2W+JO2tNDSyWa9y0LBGsgaYKGmX/L03Gxn1\neF25m+myTgL+OSLeVXL+nvbj9FrgH7uzQDf3w
wYpBw1rJGuBK4DT6yfU35FLWpX/b5X0M0k3Svpv\nSedLOlHSAqWBrV5XWM27Jf1a0u8l/UNefktJF+T5fyfp5MJ6fy7pJ8C9HeTnhLz+JZLOz2lfJA2D\n/L08hkj9MmfkZRZJ+moH05fl1rtIOkjSvPx5nNYPDnR37mzvfOBtOW1q7t10U/txT+625pachyWS\nygzHbIOI7yys0XyLNO58/Y9u/d148fsBwL6ksQMeAb4TEQcrjcr2KVIQErB3RLxV0huAefn/ycAz\nef5tgfmSZuf1jgFGR8QfihuWNIz0o/0W4BlgtqRjIuLc3KvotIj4bd0yR5HGNTg4Il7opOvtzp44\npgGfjIhfSnoF8HfgDOAzEfG+vP6Ty+xHDrwrIqIWNHfsZJs2SPlJwxpK7oZ7FjClG4v9OiKeiIg1\nwENAW06/BxhRWzWp/x0i4iHgYVKgGQ/8k6SFpHEVdgbekJdZUB8wsreSxir4S0S8BPwAeHthekeD\n+bwL+F5EvJDz0J1+m34BXCTpU6RR117qYBtl92MxcER+Ijs8Ip7rRj5sEHDQsEZ0MaluYPtC2ovk\n61nSFqSRyWr+Xvi8rvB9HV0/bdfu7E+LiDH53+sj4qc5/a9dLFf80RYbPiV09sSwqZHhXt5HYLuX\nVxbxNdLxaAF+IemNnSy/yf2IiAdJTx5LgK9I+tdN5MkGGQcNazgR8TTpqeAk1v8ALyONDQCpmGfr\nbq5WwHFKXg+8Drif9FTyyVolsaR9chFQV34NjJO0i9IY9h8EfraJZeYAH81dViPpVR3Msww4KH8u\n1t+8PiLujYiv522/EXgOGFJYttR+KHUB/kJE/AD4N1IRm9nLXKdhjaR4h34hcFrh+3eAnygNZXk7\nsKqT5erXF4XPj5K6+N6R1PX1GknfJRVh/VaSSAPUTKSLt5oi4nFJZwLzSMHo5oj4zy53LKJN0puB\n30haQxog5wt1s30JmCHpOaC9sP2pua5kHanI7bY87aV8PL4PXFpyP/YHLpC0jvS22r90lW8bfNw1\nupmZlebiKTMzK81Bw8zMSnPQMDOz0hw0zMysNAcNMzMrzUHDzMxKc9AwM7PSHDTMzKy0/w+osM6y\nBaN7MwAAAABJRU5ErkJggg==\n", "text": [ "" ] } ], "prompt_number": 40 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Challenge 3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's name the clusters 1**\n", "\n", "For each cluster, find the sentence closest to the centroid of the cluster. (You can use sklearn.metrics.pairwise_distances6 or scipy.spatial.distance3 [check pdist, cdist, and euclidean distance] to find distances to the centroid). KMeans has a cluster_centroids_ attribute. This sentence (closest to centroid) is now the name of the cluster. For each cluster, print the representative sentence, and print 'N people expressed a similar statement', or something like that relevant to your dataset. 
(This is very close to what Amazon used to do in its reviews section until about a year ago.)\n", "\n", "Find the 3 biggest clusters and print their representative sentences. (This is close to what Amazon does now in the reviews section, except that it chooses the sentence from the most helpful review instead of the one closest to the center.)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.metrics.pairwise import pairwise_distances\n", "from collections import Counter\n", "\n", "def getIndicesOfClosest(distances, n=1):\n", "    # Document indices sorted by increasing distance to the centroid\n", "    result = sorted(range(len(distances)), key=lambda i:distances[i])\n", "    # NOTE: result[0] is the nearest document, so n=1 selects the second-nearest one\n", "    return result[n]\n", "\n", "for nrClusters in range(minNrClusters,minNrClusters+3):\n", "    centroids = results[nrClusters]['cluster_centroids']\n", "    groupedPredictions = Counter(results[nrClusters]['predictions'])\n", "    # (clusterIndex, size) pairs, largest cluster first\n", "    sortedGroupedPredictions = [(l,k) for k,l in sorted([(j,i) for i,j in groupedPredictions.items()], reverse=True)]\n", "    print \"<<<<<<<<<< For %d clusters >>>>>>>>>>\" % nrClusters\n", "    for (clusterIndex,nrInstancesInCluster) in sortedGroupedPredictions[0:3]:\n", "        c = centroids[clusterIndex]\n", "        # Cosine distance from every document vector to this centroid\n", "        distances = pairwise_distances(docVectors, Y=[c], metric='cosine')\n", "        indexOfClosest = getIndicesOfClosest(distances, 1)\n", "        closestDocument = documents[indexOfClosest]\n", "        print \"\\n<< central document in cluster %d (%d instances)>>\" % (clusterIndex,nrInstancesInCluster)\n", "        print closestDocument[:100] + \" ...\"" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "<<<<<<<<<< For 5 clusters >>>>>>>>>>\n", "\n", "<< central document in cluster 4 (1609 instances)>>\n", "Amelida is 52 years old, married, and lives with her family in a rural area of the Oyot\u00fan district. 
...\n", "\n", "<< central document in cluster 0 (664 instances)>>\n", "As a married parent of five children, Irene works hard to support her family.\n", "\n", "She has a general sto ...\n", "\n", "<< central document in cluster 1 (120 instances)>>\n", "Amelida is 52 years old, married, and lives with her family in a rural area of the Oyot\u00fan district. ...\n", "<<<<<<<<<< For 6 clusters >>>>>>>>>>\n", "\n", "<< central document in cluster 1 (1502 instances)>>" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Amelida is 52 years old, married, and lives with her family in a rural area of the Oyot\u00fan district. ...\n", "\n", "<< central document in cluster 0 (476 instances)>>\n", "Ma. Jory is 44 years old and married with five children.\n", "\n", "She works hard to provide for her family. ...\n", "\n", "<< central document in cluster 4 (190 instances)>>\n", "As a parent of two children, Paz works hard to support her family.She has a food vending business in ...\n", "<<<<<<<<<< For 7 clusters >>>>>>>>>>\n", "\n", "<< central document in cluster 2 (1381 instances)>>" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "Amelida is 52 years old, married, and lives with her family in a rural area of the Oyot\u00fan district. ...\n", "\n", "<< central document in cluster 4 (461 instances)>>\n", "Ma. Jory is 44 years old and married with five children.\n", "\n", "She works hard to provide for her family. ...\n", "\n", "<< central document in cluster 1 (205 instances)>>\n", "As a parent of two children, Paz works hard to support her family.She has a food vending business in ...\n" ] } ], "prompt_number": 41 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Challenge 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's name the clusters 2**\n", "\n", "Calculate the tf-idf of each word in each cluster (think of all sentences of a cluster together as a document). 
Represent each cluster with its top 1 to 5 tf-idf words. For each cluster, print the name (keywords) of the cluster and \"N statements\", where N is the size of the cluster." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "\n", "for nrClusters in range(minNrClusters,minNrClusters+3):\n", "    groupedPredictions = Counter(results[nrClusters]['predictions'])\n", "    #print groupedPredictions\n", "\n", "    print \"<<<<<<<<<< For %d clusters >>>>>>>>>>\" % nrClusters\n", "\n", "    # Concatenate all documents of a cluster into one pseudo-document\n", "    clusterSizeDocs = [\"\" for x in range(nrClusters)]\n", "    for i,docIndex in enumerate(results[nrClusters]['predictions']):\n", "        clusterSizeDocs[docIndex] += (documents[i])\n", "\n", "    # Build a TFIDF weighted document-term matrix\n", "    vectorizer = TfidfVectorizer(stop_words=\"english\", ngram_range=(1,1), tokenizer=word_tokenize, use_idf=True)\n", "    clusterVectors = vectorizer.fit_transform(clusterSizeDocs)\n", "#    print clusterVectors.shape\n", "\n", "    for clusterIndex in range(clusterVectors.shape[0]):\n", "        clusterVector = clusterVectors[clusterIndex,0:].todense()\n", "#        print clusterVector\n", "#        print clusterVector.shape\n", "        # Rank the feature indices by decreasing tf-idf weight and keep the top five\n", "        sortedIndicesByTFIDF = sorted(range(clusterVector.shape[1]), key=lambda i:clusterVector.item(0,i), reverse=True)\n", "#        print sortedIndicesByTFIDF\n", "#        print \"Number of features: %d\" % len(vectorizer.get_feature_names())\n", "        topTokens = [str(vectorizer.get_feature_names()[x]) for x in sortedIndicesByTFIDF[0:5]]\n", "        print \"cluster %d (size %d) highest TFIDF terms:\" % (clusterIndex, dict(groupedPredictions)[clusterIndex]), topTokens" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "<<<<<<<<<< For 5 clusters >>>>>>>>>>\n", "cluster 0 (size 664) highest TFIDF terms:" ] }, { "output_type": 
"stream", "stream": "stdout", "text": [ " [',', '.', 'district', 'ducks', 'region']\n", "cluster 2 (size 61) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'rwanda', '%', 'rwandans']\n", "cluster 3 (size 46) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', 'water', ',', 'community', 'chlorine']\n", "cluster 4 (size 1609) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'loan', 'business', 'years']\n", "<<<<<<<<<< For 6 clusters >>>>>>>>>>\n", "cluster 0 (size 476) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'nwtf', 'business', 'years']\n", "cluster 1 (size 1502) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'loan', 'business', 'years']\n", "cluster 2 (size 61) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'rwanda', '%', 'rwandans']\n", "cluster 3 (size 179) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " [',', '.', 'district', 'som', 'loan']\n", "cluster 4 (size 190) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', 'business', 'nwtf', ',', 'hard']\n", "cluster 5 (size 92) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'anticipated', 'business', 'years']\n", "<<<<<<<<<< For 7 clusters >>>>>>>>>>\n", "cluster 0 (size 92) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'anticipated', 'business', 'years']\n", "cluster 1 (size 205) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', 'nwtf', 'business', ',', 'php']\n", "cluster 2 (size 1381) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'loan', 'business', 
'years']\n", "cluster 3 (size 111) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'kgs', 'som', 'loan']\n", "cluster 4 (size 461) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'nwtf', 'business', 'php']\n", "cluster 5 (size 61) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'rwanda', '%', 'rwandans']\n", "cluster 6 (size 189) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['.', ',', 'water', 'community', 'district']\n" ] } ], "prompt_number": 42 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Challenge 5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Let's name the clusters 3**\n", "\n", "Same as the previous challenge, but this time calculate tf-idf only for nouns (NN tag) and build the keyword(s) from nouns. (This is close to what Amazon switched to last year, before settling into the current design). 
(They would show five nouns; clicking on one would show the sentences, linked to the reviews, that were related to that noun.)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import sys\n", "from nltk.tag import pos_tag\n", "from nltk.tokenize import sent_tokenize, word_tokenize\n", "import numpy as np\n", "\n", "# Count every token tagged as a noun (NN); the keys are (word, tag) tuples\n", "vocabulary = {}\n", "for i,d in enumerate(documents[:nrDocs]):\n", "    sentences = sent_tokenize(d)\n", "    for si,s in enumerate(sentences):\n", "        tokens = pos_tag(word_tokenize(s))\n", "        nouns = [token for token in tokens if token[1] == 'NN']\n", "        for noun in nouns:\n", "            vocabulary[noun] = vocabulary.get(noun, 0) + 1\n", "    if i % (nrDocs/20) == 0:\n", "        print >> sys.stderr, \"Tokenized sentences and words from document #%d out of %d\" % (i,nrDocs)\n", "\n", "# Nouns sorted by decreasing frequency; entry[0] keeps the word and drops the tag\n", "voc = [entry[0] for entry in sorted(vocabulary.keys(), key=lambda x:vocabulary[x], reverse=True)]\n", "print \"Vocabulary (%d clusters)\" % nrClusters, voc[0:10]\n", "\n", "for nrClusters in range(minNrClusters,minNrClusters+10):\n", "    groupedPredictions = Counter(results[nrClusters]['predictions'])\n", "    #print groupedPredictions\n", "\n", "    print \"<<<<<<<<<< For %d clusters >>>>>>>>>>\" % nrClusters\n", "\n", "    # Concatenate all documents of a cluster into one pseudo-document\n", "    clusterSizeDocs = [\"\" for x in range(nrClusters)]\n", "    for i,docIndex in enumerate(results[nrClusters]['predictions']):\n", "        clusterSizeDocs[docIndex] += (documents[i])\n", "\n", "    # Build a TFIDF weighted document-term matrix restricted to the noun vocabulary\n", "    vectorizer = TfidfVectorizer(stop_words=\"english\", ngram_range=(1,1), vocabulary=voc, use_idf=True)\n", "    clusterVectors = vectorizer.fit_transform(clusterSizeDocs)\n", "#    print clusterVectors.shape\n", "\n", "    for clusterIndex in range(clusterVectors.shape[0]):\n", "        clusterVector = clusterVectors[clusterIndex,0:].todense()\n", "#        print clusterVector\n", "#        print clusterVector.shape\n", "        sortedIndicesByTFIDF = sorted(range(clusterVector.shape[1]), key=lambda i:clusterVector.item(0,i), reverse=True)\n", "#        print sortedIndicesByTFIDF\n", "#        print \"Number of features: %d\" % len(vectorizer.get_feature_names())\n", "        topTokens = [str(vectorizer.get_feature_names()[x]) for x in sortedIndicesByTFIDF[0:5]]\n", "        print \"cluster %d (size %d) highest TFIDF terms:\" % (clusterIndex, dict(groupedPredictions)[clusterIndex]), topTokens" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "Tokenized sentences and words from document #0 out of 2500\n", "Tokenized sentences and words from document #125 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #250 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #375 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #500 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #625 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #750 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #875 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #1000 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #1125 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #1250 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #1375 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words 
from document #1500 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #1625 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #1750 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #1875 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #2000 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #2125 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #2250 out of 2500" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Tokenized sentences and words from document #2375 out of 2500" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Vocabulary (7 clusters) [u'business', u'loan', u'family', u'income', u'future', u'school', u'money', u'store', u'husband', u'group']\n", "<<<<<<<<<< For 5 clusters >>>>>>>>>>\n", "cluster 0 (size 664) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'future', 'loan', 'save', 'hard']\n", "cluster 1 (size 120) highest TFIDF terms: ['district', 'quality', 'region', 'northern', 'business']\n", "cluster 2 (size 61) highest TFIDF terms: ['agriculture', 'opportunity', 'kiva', 'export', 'gross']\n", "cluster 3 (size 46) highest TFIDF terms: ['water', 'community', 'chlorine', 'dispenser', 'carbon']\n", "cluster 4 (size 1609) highest TFIDF terms: ['loan', 'business', 'family', 'income', 'buy']\n", "<<<<<<<<<< For 6 clusters >>>>>>>>>>" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "cluster 0 (size 476) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 
'loan', 'save', 'future', 'hard']\n", "cluster 1 (size 1502) highest TFIDF terms: ['loan', 'business', 'family', 'group', 'income']\n", "cluster 2 (size 61) highest TFIDF terms: ['agriculture', 'opportunity', 'kiva', 'export', 'gross']\n", "cluster 3 (size 179) highest TFIDF terms: ['district', 'som', 'business', 'loan', 'income']\n", "cluster 4 (size 190) highest TFIDF terms: ['business', 'hard', 'sustaining', 'borrowing', 'family']\n", "cluster 5 (size 92) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'loan', 'use', 'future', 'educate']\n", "<<<<<<<<<< For 7 clusters >>>>>>>>>>\n", "cluster 0 (size 92) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'loan', 'use', 'future', 'old']\n", "cluster 1 (size 205) highest TFIDF terms: ['business', 'sustaining', 'hard', 'borrowing', 'family']\n", "cluster 2 (size 1381) highest TFIDF terms: ['loan', 'business', 'family', 'income', 'buy']\n", "cluster 3 (size 111) highest TFIDF terms: ['som', 'loan', 'income', 'farm', 'bank']\n", "cluster 4 (size 461) highest TFIDF terms: ['business', 'loan', 'save', 'future', 'hard']\n", "cluster 5 (size 61) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['opportunity', 'agriculture', 'export', 'gross', 'kiva']\n", "cluster 6 (size 189) highest TFIDF terms: ['water', 'community', 'district', 'chlorine', 'province']\n", "<<<<<<<<<< For 8 clusters >>>>>>>>>>\n", "cluster 0 (size 379) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'future', 'loan', 'buy', 'save']\n", "cluster 1 (size 165) highest TFIDF terms: ['store', 'loan', 'business', 'sell', 'provide']\n", "cluster 2 (size 482) highest TFIDF terms: ['loan', 'business', 'family', 'quality', 'income']\n", "cluster 3 (size 46) highest TFIDF terms: ['water', 'community', 'chlorine', 'dispenser', 'carbon']\n", "cluster 4 (size 61) highest TFIDF terms: 
['agriculture', 'opportunity', 'export', 'gross', 'maize']\n", "cluster 5 (size 79) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['husband', 'loan', 'kiva', 'partner', 'field']\n", "cluster 6 (size 121) highest TFIDF terms: ['business', 'hard', 'sustaining', 'family', 'parent']\n", "cluster 7 (size 1167) highest TFIDF terms: ['loan', 'business', 'family', 'income', 'old']\n", "<<<<<<<<<< For 9 clusters >>>>>>>>>>\n", "cluster 0 (size 207) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'sustaining', 'borrowing', 'hard', 'dream']\n", "cluster 1 (size 92) highest TFIDF terms: ['som', 'loan', 'income', 'bank', 'livestock']\n", "cluster 2 (size 104) highest TFIDF terms: ['district', 'quality', 'region', 'northern', 'province']\n", "cluster 3 (size 1434) highest TFIDF terms: ['loan', 'business', 'family', 'income', 'group']\n", "cluster 4 (size 166) highest TFIDF terms: ['store', 'loan', 'business', 'provide', 'sell']\n", "cluster 5 (size 37) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['water', 'chlorine', 'community', 'dispenser', 'carbon']\n", "cluster 6 (size 92) highest TFIDF terms: ['business', 'use', 'loan', 'future', 'educate']\n", "cluster 7 (size 307) highest TFIDF terms: ['business', 'future', 'loan', 'save', 'buy']\n", "cluster 8 (size 61) highest TFIDF terms: ['opportunity', 'agriculture', 'export', 'gross', 'kiva']\n", "<<<<<<<<<< For 10 clusters >>>>>>>>>>\n", "cluster 0 (size 163) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['store', 'loan', 'business', 'provide', 'save']\n", "cluster 1 (size 203) highest TFIDF terms: ['business', 'save', 'future', 'entrepreneur', 'send']\n", "cluster 2 (size 104) highest TFIDF terms: ['district', 'quality', 'region', 'northern', 'province']\n", "cluster 3 (size 61) highest TFIDF terms: ['opportunity', 'agriculture', 'export', 'gross', 'kiva']\n", 
"cluster 4 (size 37) highest TFIDF terms: ['water', 'community', 'chlorine', 'dispenser', 'carbon']\n", "cluster 5 (size 1449) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['loan', 'business', 'family', 'group', 'income']\n", "cluster 6 (size 92) highest TFIDF terms: ['business', 'use', 'loan', 'future', 'educate']\n", "cluster 7 (size 120) highest TFIDF terms: ['business', 'hard', 'dream', 'parent', 'sustaining']\n", "cluster 8 (size 195) highest TFIDF terms: ['business', 'future', 'loan', 'family', 'buy']\n", "cluster 9 (size 76) highest TFIDF terms: ['som', 'loan', 'bank', 'income', 'business']\n", "<<<<<<<<<< For 11 clusters >>>>>>>>>>\n", "cluster 0 (size 76) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['som', 'loan', 'bank', 'income', 'business']\n", "cluster 1 (size 92) highest TFIDF terms: ['business', 'loan', 'use', 'future', 'old']\n", "cluster 2 (size 850) highest TFIDF terms: ['loan', 'family', 'business', 'buy', 'old']\n", "cluster 3 (size 61) highest TFIDF terms: ['agriculture', 'opportunity', 'export', 'gross', 'kiva']\n", "cluster 4 (size 107) highest TFIDF terms: ['loan', 'husband', 'kiva', 'family', 'business']\n", "cluster 5 (size 29) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'electricity', 'access', 'selling', 'water']\n", "cluster 6 (size 464) highest TFIDF terms: ['loan', 'business', 'group', 'buy', 'family']\n", "cluster 7 (size 439) highest TFIDF terms: ['business', 'hard', 'buy', 'save', 'future']\n", "cluster 8 (size 37) highest TFIDF terms: ['water', 'community', 'chlorine', 'dispenser', 'carbon']\n", "cluster 9 (size 120) highest TFIDF terms: ['district', 'quality', 'region', 'northern', 'province']\n", "cluster 10 (size 225) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['store', 'loan', 'business', 'save', 'old']\n", "<<<<<<<<<< For 12 clusters >>>>>>>>>>\n", 
"cluster 0 (size 329) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'future', 'save', 'loan', 'buy']\n", "cluster 1 (size 603) highest TFIDF terms: ['loan', 'business', 'family', 'buy', 'old']\n", "cluster 2 (size 196) highest TFIDF terms: ['loan', 'buy', 'biodigester', 'old', 'family']\n", "cluster 3 (size 29) highest TFIDF terms: ['business', 'electricity', 'access', 'boost', 'school']\n", "cluster 4 (size 104) highest TFIDF terms: ['district', 'northern', 'quality', 'region', 'province']\n", "cluster 5 (size 570) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['loan', 'business', 'group', 'family', 'income']\n", "cluster 6 (size 163) highest TFIDF terms: ['store', 'loan', 'provide', 'hard', 'sell']\n", "cluster 7 (size 133) highest TFIDF terms: ['som', 'loan', 'income', 'business', 'livestock']\n", "cluster 8 (size 46) highest TFIDF terms: ['water', 'chlorine', 'community', 'dispenser', 'carbon']\n", "cluster 9 (size 174) highest TFIDF terms: ['business', 'sustaining', 'hard', 'borrowing', 'family']\n", "cluster 10 (size 61) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['agriculture', 'opportunity', 'export', 'gross', 'kiva']\n", "cluster 11 (size 92) highest TFIDF terms: ['business', 'use', 'loan', 'future', 'old']\n", "<<<<<<<<<< For 13 clusters >>>>>>>>>>\n", "cluster 0 (size 722) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['loan', 'business', 'family', 'old', 'buy']\n", "cluster 1 (size 408) highest TFIDF terms: ['business', 'loan', 'group', 'buy', 'family']\n", "cluster 2 (size 77) highest TFIDF terms: ['water', 'chlorine', 'community', 'dispenser', 'carbon']\n", "cluster 3 (size 61) highest TFIDF terms: ['agriculture', 'opportunity', 'export', 'gross', 'proposition']\n", "cluster 4 (size 252) highest TFIDF terms: ['business', 'future', 'loan', 'save', 'buy']\n", "cluster 5 (size 76) 
highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['som', 'loan', 'bank', 'income', 'business']\n", "cluster 6 (size 173) highest TFIDF terms: ['business', 'sustaining', 'borrowing', 'hard', 'family']\n", "cluster 7 (size 76) highest TFIDF terms: ['business', 'attainment', 'pig', 'old', 'save']\n", "cluster 8 (size 76) highest TFIDF terms: ['loan', 'family', 'income', 'province', 'living']\n", "cluster 9 (size 165) highest TFIDF terms: ['store', 'loan', 'business', 'provide', 'sell']\n", "cluster 10 (size 120) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['district', 'quality', 'region', 'northern', 'province']\n", "cluster 11 (size 73) highest TFIDF terms: ['husband', 'kiva', 'loan', 'rickshaw', 'partner']\n", "cluster 12 (size 221) highest TFIDF terms: ['loan', 'business', 'farming', 'use', 'poultry']\n", "<<<<<<<<<< For 14 clusters >>>>>>>>>>\n", "cluster 0 (size 17) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'poor', 'fund', 'loan', 'hardship']\n", "cluster 1 (size 483) highest TFIDF terms: ['loan', 'business', 'income', 'group', 'family']\n", "cluster 2 (size 61) highest TFIDF terms: ['opportunity', 'agriculture', 'export', 'gross', 'kiva']\n", "cluster 3 (size 122) highest TFIDF terms: ['business', 'use', 'loan', 'future', 'educate']\n", "cluster 4 (size 112) highest TFIDF terms: ['water', 'community', 'chlorine', 'carbon', 'dispenser']\n", "cluster 5 (size 164) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['store', 'loan', 'save', 'provide', 'business']\n", "cluster 6 (size 76) highest TFIDF terms: ['som', 'loan', 'bank', 'income', 'business']\n", "cluster 7 (size 63) highest TFIDF terms: ['husband', 'kiva', 'partner', 'loan', 'field']\n", "cluster 8 (size 68) highest TFIDF terms: ['business', 'sustaining', 'borrowing', 'entrepreneur', 'income']\n", "cluster 9 (size 104) highest TFIDF 
terms: ['district', 'quality', 'northern', 'region', 'province']\n", "cluster 10 (size 782) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['loan', 'business', 'family', 'old', 'buy']\n", "cluster 11 (size 12) highest TFIDF terms: ['solar', 'energy', 'cell', 'lighting', 'hard']\n", "cluster 12 (size 317) highest TFIDF terms: ['business', 'future', 'save', 'loan', 'buy']\n", "cluster 13 (size 119) highest TFIDF terms: ['business', 'hard', 'sustaining', 'parent', 'borrowing']\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n" ] } ], "prompt_number": 45 }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Challenge 6" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cluster the same data with [MiniBatchKMeans](http://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html). MiniBatchKMeans is a fast way to apply K-means to large data without much loss in quality; the results are usually very similar. Instead of using every single point to find the new place of each centroid, MiniBatchKMeans randomly samples a small batch of points (like 100) in each iteration and updates the centroids from that batch alone. Since this is usually very close to the full update, the algorithm gets there much faster. 
Try it and compare the results.\n", "\n", "[Example on two-feature data](http://scikit-learn.org/stable/auto_examples/cluster/plot_mini_batch_kmeans.html)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import sys\n", "from nltk.tag import pos_tag\n", "from nltk.tokenize import sent_tokenize, word_tokenize\n", "import numpy as np\n", "from sklearn.cluster import MiniBatchKMeans\n", "\n", "results2 = {}\n", "\n", "# Fit one MiniBatchKMeans model per candidate number of clusters\n", "for nrClusters in range(minNrClusters,maxNrClusters+1):\n", "    print >> sys.stderr, \"Fitting MiniBatchKMeans model with %s clusters\" % nrClusters\n", "    results2[nrClusters] = {}\n", "    model = MiniBatchKMeans(n_clusters=nrClusters, batch_size=250).fit(docVectors)\n", "    results2[nrClusters]['inertia'] = model.inertia_\n", "    results2[nrClusters]['predictions'] = model.predict(docVectors)\n", "    results2[nrClusters]['cluster_centroids'] = model.cluster_centers_\n", "\n", "for nrClusters in range(minNrClusters,minNrClusters+10):\n", "    groupedPredictions2 = Counter(results2[nrClusters]['predictions'])\n", "    print groupedPredictions2\n", "\n", "    print \"<<<<<<<<<< For %d clusters >>>>>>>>>>\" % nrClusters\n", "\n", "    # Concatenate all documents of a cluster into one pseudo-document\n", "    clusterSizeDocs2 = [\"\" for x in range(nrClusters)]\n", "    for i,docIndex in enumerate(results2[nrClusters]['predictions']):\n", "        clusterSizeDocs2[docIndex] += (documents[i])\n", "\n", "    # Build a TFIDF weighted document-term matrix restricted to the noun vocabulary\n", "    vectorizer = TfidfVectorizer(stop_words=\"english\", ngram_range=(1,1), vocabulary=voc, use_idf=True)\n", "    clusterVectors2 = vectorizer.fit_transform(clusterSizeDocs2)\n", "#    print clusterVectors.shape\n", "\n", "    for clusterIndex in range(clusterVectors2.shape[0]):\n", "        clusterVector2 = clusterVectors2[clusterIndex,0:].todense()\n", "#        print clusterVector2\n", "#        print clusterVector2.shape\n", "        sortedIndicesByTFIDF = sorted(range(clusterVector2.shape[1]), key=lambda i:clusterVector2.item(0,i), reverse=True)\n", "#        print sortedIndicesByTFIDF\n", "#        print \"Number of features: %d\" % len(vectorizer.get_feature_names())\n", "        topTokens = [str(vectorizer.get_feature_names()[x]) for x in sortedIndicesByTFIDF[0:5]]\n", "        # A cluster that received no documents has no entry in the counter\n", "        if clusterIndex in groupedPredictions2:\n", "            print \"cluster %d (size %d) highest TFIDF terms:\" % (clusterIndex, dict(groupedPredictions2)[clusterIndex]), topTokens\n", "        else:\n", "            print \"cluster %d is empty\" % clusterIndex" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stderr", "text": [ "Fitting MiniBatchKMeans model with 5 clusters\n", "Fitting MiniBatchKMeans model with 6 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 7 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 8 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 9 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 10 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 11 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 12 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 13 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 14 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 15 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 16 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 17 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 18 clusters" ] }, { 
"output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 19 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 20 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 21 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 22 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 23 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 24 clusters" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n", "Fitting MiniBatchKMeans model with 25 clusters" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "Counter({1: 1571, 2: 664, 3: 152, 0: 76, 4: 37})\n", "<<<<<<<<<< For 5 clusters >>>>>>>>>>\n", "cluster 0 (size 76) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['som', 'loan', 'business', 'income', 'bank']\n", "cluster 1 (size 1571) highest TFIDF terms: ['loan', 'business', 'family', 'buy', 'old']\n", "cluster 2 (size 664) highest TFIDF terms: ['business', 'loan', 'hard', 'save', 'sell']\n", "cluster 3 (size 152) highest TFIDF terms: ['business', 'maize', 'loan', 'use', 'agriculture']\n", "cluster 4 (size 37) highest TFIDF terms: ['water', 'community', 'chlorine', 'dispenser', 'carbon']\n", "Counter({2: 941, 3: 797, 4: 443, 0: 201, 1: 61, 5: 57})\n", "<<<<<<<<<< For 6 clusters >>>>>>>>>>\n", "cluster 0 (size 201) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'sustaining', 'hard', 'borrowing', 'family']\n", "cluster 1 (size 61) highest TFIDF terms: ['opportunity', 'agriculture', 'export', 'gross', 'maize']\n", "cluster 2 (size 941) highest TFIDF terms: ['loan', 'business', 'family', 'income', 'old']\n", "cluster 3 
(size 797) highest TFIDF terms: ['loan', 'business', 'family', 'group', 'income']\n", "cluster 4 (size 443) highest TFIDF terms: ['business', 'loan', 'save', 'future', 'old']\n", "cluster 5 (size 57) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['husband', 'loan', 'rickshaw', 'business', 'providing']\n", "Counter({1: 829, 2: 753, 3: 372, 6: 290, 5: 157, 4: 62, 0: 37})\n", "<<<<<<<<<< For 7 clusters >>>>>>>>>>\n", "cluster 0 (size 37) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['water', 'chlorine', 'community', 'dispenser', 'carbon']\n", "cluster 1 (size 829) highest TFIDF terms: ['loan', 'business', 'buy', 'old', 'family']\n", "cluster 2 (size 753) highest TFIDF terms: ['loan', 'business', 'family', 'income', 'buy']\n", "cluster 3 (size 372) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'loan', 'future', 'old', 'save']\n", "cluster 4 (size 62) highest TFIDF terms: ['loan', 'university', 'family', 'taking', 'student']\n", "cluster 5 (size 157) highest TFIDF terms: ['loan', 'group', 'family', 'husband', 'kiva']\n", "cluster 6 (size 290) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'family', 'hard', 'sustaining', 'loan']\n", "Counter({6: 1023, 1: 655, 4: 536, 7: 92, 3: 62, 5: 61, 2: 47, 0: 24})\n", "<<<<<<<<<< For 8 clusters >>>>>>>>>>\n", "cluster 0 (size 24) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['rickshaw', 'husband', 'maintenance', 'loan', 'driver']\n", "cluster 1 (size 655) highest TFIDF terms: ['business', 'hard', 'loan', 'save', 'family']\n", "cluster 2 (size 47) highest TFIDF terms: ['group', 'loan', 'province', 'income', 'family']\n", "cluster 3 (size 62) highest TFIDF terms: ['loan', 'family', 'house', 'city', 'group']\n", "cluster 4 (size 536) highest TFIDF terms: ['business', 'loan', 'family', 'income', 'quality']\n", 
"cluster 5 (size 61) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['opportunity', 'agriculture', 'export', 'gross', 'kiva']\n", "cluster 6 (size 1023) highest TFIDF terms: ['loan', 'business', 'family', 'community', 'buy']\n", "cluster 7 (size 92) highest TFIDF terms: ['business', 'use', 'loan', 'future', 'educate']\n", "Counter({2: 1439, 1: 414, 0: 171, 3: 110, 5: 100, 7: 79, 6: 76, 4: 61, 8: 50})\n", "<<<<<<<<<< For 9 clusters >>>>>>>>>>\n", "cluster 0 (size 171) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['store', 'loan', 'business', 'provide', 'hard']\n", "cluster 1 (size 414) highest TFIDF terms: ['business', 'hard', 'loan', 'family', 'buy']\n", "cluster 2 (size 1439) highest TFIDF terms: ['loan', 'business', 'family', 'income', 'group']\n", "cluster 3 (size 110) highest TFIDF terms: ['water', 'community', 'chlorine', 'village', 'dispenser']\n", "cluster 4 (size 61) highest TFIDF terms: ['opportunity', 'agriculture', 'export', 'gross', 'reason']\n", "cluster 5 (size 100) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'use', 'loan', 'future', 'old']\n", "cluster 6 (size 76) highest TFIDF terms: ['som', 'loan', 'income', 'business', 'bank']\n", "cluster 7 (size 79) highest TFIDF terms: ['business', 'attainment', 'fattening', 'save', 'sustain']\n", "cluster 8 (size 50) highest TFIDF terms: ['mercy', 'born', 'somoni', 'business', 'loan']\n", "Counter({2: 1007, 3: 361, 5: 285, 0: 260, 1: 140, 7: 138, 9: 119, 4: 79, 6: 61, 8: 50})\n", "<<<<<<<<<< For 10 clusters >>>>>>>>>>\n", "cluster 0 (size 260) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'future', 'entrepreneur', 'loan', 'buy']\n", "cluster 1 (size 140) highest TFIDF terms: ['water', 'community', 'chlorine', 'carbon', 'dispenser']\n", "cluster 2 (size 1007) highest TFIDF terms: ['loan', 'business', 'family', 'old', 
'buy']\n", "cluster 3 (size 361) highest TFIDF terms: ['loan', 'business', 'district', 'quality', 'family']\n", "cluster 4 (size 79) highest TFIDF terms: ['som', 'loan', 'income', 'business', 'bank']\n", "cluster 5 (size 285) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'store', 'loan', 'save', 'old']\n", "cluster 6 (size 61) highest TFIDF terms: ['opportunity', 'agriculture', 'export', 'gross', 'kiva']\n", "cluster 7 (size 138) highest TFIDF terms: ['loan', 'business', 'use', 'income', 'old']\n", "cluster 8 (size 50) highest TFIDF terms: ['shown', 'loan', 'house', 'requesting', 'photograph']\n", "cluster 9 (size 119) highest TFIDF terms: ['business', 'hard', 'sustaining', 'family', 'work']\n", "Counter({8: 764, 5: 537, 3: 461, 10: 205, 7: 177, 1: 92, 2: 82, 6: 76, 4: 61, 9: 34, 0: 11})\n", "<<<<<<<<<< For 11 clusters >>>>>>>>>>\n", "cluster 0 (size 11) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['guide', 'business', 'learner', 'loan', 'household']\n", "cluster 1 (size 92) highest TFIDF terms: ['business', 'loan', 'use', 'future', 'buy']\n", "cluster 2 (size 82) highest TFIDF terms: ['water', 'chlorine', 'community', 'dispenser', 'carbon']\n", "cluster 3 (size 461) highest TFIDF terms: ['business', 'loan', 'save', 'future', 'hard']\n", "cluster 4 (size 61) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['agriculture', 'opportunity', 'export', 'gross', 'profitability']\n", "cluster 5 (size 537) highest TFIDF terms: ['loan', 'business', 'family', 'buy', 'old']\n", "cluster 6 (size 76) highest TFIDF terms: ['som', 'loan', 'income', 'business', 'bank']\n", "cluster 7 (size 177) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['loan', 'husband', 'business', 'income', 'family']\n", "cluster 8 (size 764) highest TFIDF terms: ['loan', 'business', 'family', 'income', 'buy']\n", "cluster 9 (size 34) 
highest TFIDF terms: ['loan', 'village', 'community', 'duck', 'packet']\n", "cluster 10 (size 205) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'sustaining', 'hard', 'borrowing', 'family']\n", "Counter({4: 786, 1: 492, 8: 295, 11: 233, 0: 189, 3: 124, 7: 124, 2: 119, 9: 54, 6: 52, 10: 23, 5: 9})\n", "<<<<<<<<<< For 12 clusters >>>>>>>>>>\n", "cluster 0 (size 189) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'future', 'loan', 'entrepreneur', 'save']\n", "cluster 1 (size 492) highest TFIDF terms: ['loan', 'business', 'family', 'income', 'work']\n", "cluster 2 (size 119) highest TFIDF terms: ['business', 'hard', 'parent', 'family', 'sustaining']\n", "cluster 3 (size 124) highest TFIDF terms: ['business', 'future', 'save', 'borrowing', 'buy']\n", "cluster 4 (size 786) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['loan', 'business', 'family', 'water', 'community']\n", "cluster 5 (size 9) highest TFIDF terms: ['agriculture', 'opportunity', 'export', 'gross', 'kiva']\n", "cluster 6 (size 52) highest TFIDF terms: ['agriculture', 'opportunity', 'export', 'gross', 'kiva']\n", "cluster 7 (size 124) highest TFIDF terms: ['business', 'use', 'loan', 'future', 'educate']\n", "cluster 8 (size 295) highest TFIDF terms: ['group', 'loan', 'family', 'business', 'income']\n", "cluster 9 (size 54) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['group', 'loan', 'city', 'business', 'bank']\n", "cluster 10 (size 23) highest TFIDF terms: ['rickshaw', 'husband', 'loan', 'driver', 'profession']\n", "cluster 11 (size 233) highest TFIDF terms: ['store', 'business', 'loan', 'hard', 'save']\n", "Counter({0: 1814, 1: 598, 3: 61, 9: 18, 7: 5, 4: 1, 5: 1, 6: 1, 12: 1})\n", "<<<<<<<<<< For 13 clusters >>>>>>>>>>\n", "cluster 0 (size 1814) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", 
"text": [ " ['business', 'loan', 'family', 'income', 'old']\n", "cluster 1 (size 598) highest TFIDF terms: ['business', 'hard', 'future', 'save', 'loan']\n", "cluster 2 is empty\n", "cluster 3 (size 61) highest TFIDF terms: ['agriculture', 'opportunity', 'kiva', 'export', 'gross']\n", "cluster 4 (size 1) highest TFIDF terms: ['district', 'home', 'life', 'province', 'experience']\n", "cluster 5 (size 1) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['project', 'start', 'job', 'head', 'manufacturing']\n", "cluster 6 (size 1) highest TFIDF terms: ['business', 'loan', 'family', 'income', 'future']\n", "cluster 7 (size 5) highest TFIDF terms: ['business', 'maize', 'harvest', 'sell', 'hope']\n", "cluster 8 is empty\n", "cluster 9 (size 18) highest TFIDF terms: ['biodigester', 'farm', 'gas', 'biogas', 'waste']\n", "cluster 10 is empty" ] }, { "output_type": "stream", "stream": "stdout", "text": [ "\n", "cluster 11 is empty\n", "cluster 12 (size 1) highest TFIDF terms: ['maize', 'member', 'harvest', 'sell', 'percent']\n", "Counter({0: 1140, 3: 369, 9: 173, 11: 141, 13: 106, 2: 92, 4: 80, 8: 76, 10: 75, 1: 70, 7: 61, 5: 44, 12: 44, 6: 29})\n", "<<<<<<<<<< For 14 clusters >>>>>>>>>>\n", "cluster 0 (size 1140) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['loan', 'business', 'family', 'buy', 'group']\n", "cluster 1 (size 70) highest TFIDF terms: ['kiva', 'husband', 'partner', 'loan', 'field']\n", "cluster 2 (size 92) highest TFIDF terms: ['business', 'loan', 'use', 'educate', 'future']\n", "cluster 3 (size 369) highest TFIDF terms: ['business', 'loan', 'save', 'future', 'buy']\n", "cluster 4 (size 80) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['loan', 'fellowship', 'university', 'student', 'god']\n", "cluster 5 (size 44) highest TFIDF terms: ['portion', 'group', 'loan', 'province', 'income']\n", "cluster 6 (size 29) highest TFIDF terms: ['business', 
'access', 'electricity', 'school', 'water']\n", "cluster 7 (size 61) highest TFIDF terms: ['agriculture', 'opportunity', 'export', 'gross', 'kiva']\n", "cluster 8 (size 76) highest TFIDF terms: ['som', 'loan', 'bank', 'income', 'livestock']\n", "cluster 9 (size 173) highest TFIDF terms:" ] }, { "output_type": "stream", "stream": "stdout", "text": [ " ['business', 'sustaining', 'borrowing', 'hard', 'family']\n", "cluster 10 (size 75) highest TFIDF terms: ['loan', 'family', 'buy', 'old', 'growing']\n", "cluster 11 (size 141) highest TFIDF terms: ['store', 'loan', 'business', 'provide', 'sell']\n", "cluster 12 (size 44) highest TFIDF terms: ['water', 'chlorine', 'community', 'dispenser', 'carbon']\n", "cluster 13 (size 106) highest TFIDF terms: ['district', 'region', 'quality', 'northern', 'province']\n" ] }, { "output_type": "stream", "stream": "stderr", "text": [ "\n" ] } ], "prompt_number": 55 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see a less even distribution of documents over the clusters; some clusters are even empty. With 13 clusters, for example, cluster 0 holds 1814 documents while clusters 2, 8, 10 and 11 are empty." ] }, { "cell_type": "heading", "level": 2, "metadata": {}, "source": [ "Challenge 7" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Switch the init parameter to \"random\" (instead of the default \"k-means++\") and plot the inertia curve for each of these n_init values for K-Means: 1, 2, 3, 10. (n_init is the number of different runs to try with different random initializations.)" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from sklearn.cluster import MiniBatchKMeans\n", "\n", "# docVectors and minNrClusters come from the earlier cells\n", "results3 = {}\n", "\n", "# Refit with random initialisation for n_init = 1..19; only the inertia is stored\n", "for nrClusters in range(minNrClusters,minNrClusters+5):\n", "    results3[nrClusters] = {}\n", "    for nInit in range(1,20):\n", "        results3[nrClusters][nInit] = {}\n", "        model = MiniBatchKMeans(init='random', n_init=nInit, n_clusters=nrClusters, batch_size=250).fit(docVectors)\n", "        if model.inertia_:\n", "            results3[nrClusters][nInit]['inertia'] = model.inertia_\n", "        else:\n", "            # A missing or zero inertia means the fit did not produce a usable model\n", "            raise ValueError(\"No inertia for %d clusters with n_init = %d\" % (nrClusters,nInit))\n", "\n", "%matplotlib inline\n", "\n", "from matplotlib import pyplot as plt\n", "\n", "# Plot inertia against n_init for the 9-cluster models\n", "X = [n for n in sorted(results3[9].keys())]\n", "Y = [results3[9][n]['inertia'] for n in X]\n", "\n", "plt.figure()\n", "plt.scatter(X,Y)\n", "plt.title(\"Metric of goodness for MiniBatchKMeans document clustering\")\n", "plt.xlabel(\"n_init\")\n", "plt.ylabel(\"Inertia\")\n", "plt.show()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAZoAAAEaCAYAAAAotpG7AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XuYXFWd7vHvCwbJyEUQFRKCASciQdSYOUYHlFYmF3wU\n
yZ9ze+svz7JK3Kx/QbSSfl9HplcE9J20s6U9KdOe1XkkZXdjO5Xr4q+3sxsBRYGRHHRsSm\nBs9F9XtzQs7zI5IWqdJCIOn8XO4fz3k7pDKvU9IVkubmvN8iaWJl/um5bDyRy0/dX/WSXpQ/98cl\n3QC8vGb+30r6b0mPSVom6Y2VebsrNUuvzfm/Oqc/63NTpSkyl/0LJS3In8fP8udxfj7XqyW9trLu\nKElXSXpI6Tv60UbOg9K1ZB9SOX9S0sd6OQdHSlqRz8GdkqbUWab22jRWz7xOHJ/L3BM5j38v6ZXA\nv7Ll+vNIXrbutTLP68if2yckPQBcoprauFItbbakm/Ln8n1Jz6/M/4S2XHfer/6agSNi2F+kX2qH\nAbcBrwS2B+7LH+AmYJ+83FeAecALgZ2A+cC5ed6hwH012+0E/gIckd/vCJwDfCe/fxnwBHB03ufu\nwGt6yeNPga8BOwCvAR4C3pLnzQR+1sfxnQzcCozKeb8WeBrYroFtfwb4BbBHfv0c+EyeNw1YR7oQ\n/hXwvXy+9svzLwV+B/xNPr7vApfneS/I53gm6QfHa4GHgVfm+Q8AB+fpXYEJefrzwEV5e9v3LNPH\n5/rWBs/hsz6rOtv7NvBZ4OvAyTntCuA9wM+A9+a046ufRz4n84FdgDF5v1P7WPbleXo3YDHQWZn/\nNmDfPP1m4A+Vc1OvDH4cWAmMy+9fDezeYL5uza9/2YZzcSRwB7B//pzPAn5e2c4/5OPcDvjH/Lnv\nUPlMNpDKmYBzgV/mefsD9wJ75vf7kMtdnbx+P79GAgcCa4Cf5nm7A4/mfGyX8/8IsFue/yPgclIZ\nfB7wpnqfW+V8Vsv+w8AE4PmkGuHdwLH5WD4L/CQvux1wI/CpvI99gd8AU/o7D/XKeZ3jfz3wGHBY\nfj8K2D9PLwVOyNObr035/dh8TNuRvq+Ps6UcvRQY39v1h76vlR3ARtJ3eQTputhBpezmY7oe2DOX\nj1XAByvXnQeAA/Jn+l3S9azu5x8RTRdozsof4jSgi3Qh20QqxALWVw8GeCNwV+Xk1Qs03XXSegLN\nGcBVDeRvDPAU8IJK2rnAt3sr9DXr/wT4QOX9YZUC1N+2fwNMq8ybAvw2T3+rp/Dk9+N45pft28DF\nlfmHA6vz9NHkL3tl/teBs/P0PcBJwC41y3w6F+CXN/i5vrXBc/isz6rO9nourgeTgu+upEC7I/0H\nmr+tvP934PQ+ln2cdPF7ivQFG9VHnq4GZvVRBm8D3tHLuv3l6wngz8Drt+FcLCRfyPL77UjBcUwv\neXoEOKjymSyuzBsP/DFP/zWp5nQYMKKP87M96QfEKyppn+s558BxwPU16/yCdPHci3QB27XOdp/x\nuVXOZ7Xsf70y7yPArZX3BwGP5ulJwD012zoD+FZ/56G2nPdyDr4OnNfLvGqg6aTvQPMo8E5gZF/n\ngsaulX8m/6CoV3bzMf195f0XgYvy9LeAz1Xmvbx67uu9mqnpLIDvkH7ZzKSm2Qx4MelX+4256vso\n6Uu0Rz/bXdPHvDGkvqH+jAIeiYg/VNLuBUb3snytvUi1h3p56m3boyrr3tPHvPtq5tV6sDK9gfTr\nBlJtblLPuczn8+9Jv5QAZpB+vd+tNDKmZ5DDl0h9AItzNf70egdcRyPnsK/PqkdExM9J5eFTwA+j\nsb6xdZXpP5K+uL2ZEBG7kS7a/wr8rKfZQNLhSk2Qv8/n7G3Ai/rY1hjSj4WtyddNpBrRwmozT0Uj\n5+JlwPmVz/j3OX10Pp6P5Wa1x/L8XXnmd6pafv4I7Chpu4i4E/g/pIvjg5Iul7RXnTy+mFRL6K2c\njuLZ5faenL43qcw8Xme7jXioMv2nmve134VRNd+FM4CXVJavex4azMfe9F0G+pW/N0eTWkfuV2q+\n3r+XxRu5Vj4cEX/pZ7fVsrmBLWWzr+tZXc0UaIiIe0kX/sOBH9TM/h3p
YMdHxG759cKI2KVn9Xqb\n7CW9x73UtBf34n5gd0k7VdL2obELI6Rq5pjK++p0b9teW5k/tpd5D+T31XmNuhf4r8q53C3S6LAP\nA0TEryLiKFKhnUdqliEi1kfExyLi5cARwD/21jZfo5Fz2NdnVeu7pKaeywawzoBExFPAJaSmlANz\nsLkK+GfgJTkYLWDLD6J6+b+P9Ot/a/NwAfAFYImkA3tZrK9zcS9wUs3n/IKIuF5ptOTHgXfl79Ju\npNpcQx3bEXF5RLyJdKEO0q/eWg+Taoa9ldO1ef2ql+X0+0hlpt6ovz+QLqYASNqzkTz34j5SK0H1\nHO0SEW/P8/srl/3Nb7QMrKdyTKRmqy07iVgcEVNy+m2kvuF6++/vWtlInvvS1/WsrqYKNNmJpGro\nhmpipE7QbwBzlDpIkTS60qn2IPAiSdWT2d8X5nvA30l6l6TnKXVavqZ2oYi4j1Sd/3zuZHs1cALp\nC96IK4BTc4fjC4HTyR90A9u+HPiUUif8HsDZlXlXkAY4HCDpr0htvFV9Hf+PgFdIOlapo36EpP+l\nNEBghNL9JbtGxNPAk6QmDCS9XdJfSxKpaefpnnl9GYRz2HM8Pcd0AfB3EbE1wzqr2+ltfs/It/eR\nfsHeRepb2oH0Rd4k6XBSU2aPemXwm8Bne86ZpFdL2n0g+YqILwHnA9dKekWdZfs6F/8KnClpfD6m\nXbVl0MfOpCDwO0k7SDqb1F/UL0mvkPTWHHz/TKoxPKsc5PLzA6BT0sicj5lsudAtJJXDY/J38GhS\nP+01EbEuz79Q0gtzuXxzXu8mUvB/Te7k7qzNYiPHkS0Dnswd3COVBnC8StLfNLitB+n7B+slwPvy\n+douX7fq1UZWAG+WNCYH1zM2H4z0EqUBBS8g9a/8gS3n+0Fgb0kjoKFr5dbqOQ9X5ON5Zb7u/N/+\nVmy6QBMRd0XEr6tJlenTSc021yuNTFoCvCKvdxvponyX0uiUvahfo9mclmtQbwNmk5oUlpM6a+s5\nhlSzuJ/0xTk7In5Su81efIPUqbyS1On4I+Dp2DKCqK9t/xPwq7zuyjz9Tzn/i4A5pD6g20kdntV8\n9Hb8RBpRNYXU+bqW9Cvl86QLKaRO09/m83wSqUkT0i+zJaTg8wtSR/V/9XHsVdtyDp+xTEQ8GhFL\n+1uu8r63+fX2e5OkJ0n9FccB0yPisXzOZpG+aI/k4/nPzRt9dhncE/hyXn4xqbbwDVKT3IDyFRH/\nRApa1yqN7mnoXETEPFJN4/v5s7wZ6LlvaFF+3U7qKN/AM5uxei0/pA72z5NqLA+QmmXOoL6PkJqp\n1pHa979Vyd/vgbeTvoO/Az4GvD0iHsmLHEe6sN5GuqDOyuvdThoocy3wP6R+qb7Kfl/fhadzHl5L\n+kHxMHAxW4JuX+eBfB4+lZup/rH24CPiv0k/WL5CGhTQTZ3Wh4i4ltRPtxL4b9Kowp79bAecRvqu\n/h54E/ChPO/HpEEj6yT1NA/2eq2sk/++0qrzes7XItKPm6WksvPLvMyfe1tZuTNn0EkaQ6rKvyRn\n8OKIuEBSJ/B+0ocJcGZELMzrnEH6lfs0qYN1cU6fSBpFsiOwICJOLZLpIZJ/CV8UEWOHOy9mZttC\n0gGkHzA7xDOH329WskazETgtIg4E3gB8OGcogC9HxIT86gky40mdXeNJo84uzM0zkIbTnhgR44Bx\nkqYVzPegU7rX5225aWA0qYmrtg/KzKwlSJqem8B3I9WY5/cWZKBgoImIdRGxIk+vB1azZYRRvTbP\nI0n3eGyMiLtJ1b5JuQls54hYlpe7DDiqVL4LEakN+RHg16Rq7tnDmSEzs21wEqkp805SpeJDfS1c\n/NEDkO5wJd04dT1p3P9HJb2X1N8wOyIeIw1nrD6Qcg0pMG3kmSOT1tL4sOKmkAc2vH6482FmNhgi\n4vCBLF98MEAeznolcGqu2VxEGi76
WlIn4nml82BmZsOnaI0mD7e7CvhuHv1CRDxUmf9N0sgKSDWV\n6njsvUk1mbV5upq+lhqSyoxqMDNrcxFR9KGgxWo0uSP/EmBVRMyppFfvHp5OGq0A6Vk878nj+fcl\nPU5lWR5L/4SkSXmbx5FuIHyW3h5/4NfAX+ecc86w56GdXj6fPpfN+hoKJWs0B5PuxVgpaXlOOxM4\nRulxGkF6ns4HASJilaQrSM+Wego4JbachVNIw5tHkoY3LyqYbzMzG0TFAk1EXEf9GtPCPtY5l/Sg\nxdr0G0kPwTMzsxbTdE8GsObQ0dEx3FloKz6fg8fnsvUUezLAUJMU7XIsZmZDRRLRqoMBzMzMwIHG\nzMwKc6AxM7OiHGjMzKwoBxozMyvKgcasF11dXUyZMoMpU2bQ1dU13Nkxa1ke3mxWR1dXF9Onz2TD\nhi8CMHLk6Vx99VymTp3az5pmrWUohjc70JjVMWXKDJYsOYL07+0B5jJ58nwWL75qOLNlNuh8H42Z\nmbW8IfnHZ2atZvbsk7juupls2JDejxx5OrNnzx3eTJm1KDedmfWiq6uL8867GEiBx/0z1o7cRzMA\nDjRmZgPnPhozM2t5DjRmZlaUA42ZmRXlQGNmZkU50JiZWVHFAo2kMZKWSrpV0i2SZtXMny1pk6Td\n8/uxkjZIWp5fF1aWnSjpZkl3SDq/VJ7NzGzwlbxhcyNwWkSskLQTcKOkJRGxWtIYYDJwT806d0bE\nhDrbugg4MSKWSVogaVpELCqYdzMzGyTFajQRsS4iVuTp9cBqYFSe/WXgE41sR9JewM4RsSwnXQYc\nNcjZNTOzQoakj0bSWGACcIOkI4E1EbGyzqL75mazbkmH5LTRwJrKMmtzmpmZtYDizzrLzWZXAqcC\nm4AzSc1mmxfJf+8HxkTEo5JeB8yTdGDp/JmZWVlFA42kEcBVwHcjYp6kg4CxwE2SAPYm9d28PiIe\nAv4CEBG/lvQbYBypBrN3ZbN757Rn6ezs3Dzd0dFBR0fHIB+RmVlr6+7upru7e0j3WexZZ0qRZC7w\n+4g4rZdlfgtMjIhHJO0BPBoRT0vaD/gp8KqIeEzSDcAsYBnwI+CC2sEAftaZmdnAtfqzzg4GjgXe\nUhmyfHjNMtXI8GZSTWc58B/AByPisTzvFOCbwB2kkWkecWZm1iL89GYzs+ewVq/RmJmZOdCYmVlZ\nDjRmZlaUA42ZmRXlQGNmZkU50JiZWVEONGZmVpQDjZmZFeVAY2ZmRTnQmJlZUQ40ZmZWlAONmZkV\n5UBjZmZFOdCYmVlRDjRmZlaUA42ZmRXlQGNmZkU50JiZWVEONGZmVlSxQCNpjKSlkm6VdIukWTXz\nZ0vaJGn3StoZku6QdJukKZX0iZJuzvPOL5VnMzMbfCVrNBuB0yLiQOANwIclHQApCAGTgXt6FpY0\nHjgaGA9MAy6UpDz7IuDEiBgHjJM0rWC+zcxsEBULNBGxLiJW5On1wGpgVJ79ZeATNascCVweERsj\n4m7gTmCSpL2AnSNiWV7uMuCoUvk2M7PBNSR9NJLGAhOAGyQdCayJiJU1i40C1lTerwFG10lfm9PN\nzKwFPK/0DiTtBFwJnApsAs4kNZttXmSw9tXZ2bl5uqOjg46OjsHatJlZW+ju7qa7u3tI96mIKLdx\naQRwDbAwIuZIOgi4FvhjXmRvUg1lEvA+gIj4Ql53EXAOqR9naUT09O8cAxwaESfX7CtKHouZWTuS\nREQM2g/+ekqOOhNwCbAqIuYARMTNEfHSiNg3IvYlNYm9LiIeBOYD75G0g6R9gXHAsohYBzwhaVLe\n5nHAvFL5NjOzwVWy6exg4FhgpaTlOe3MiFhYWWZzFSQiVkm6AlgFPAWcUqminAJcCowEFkTEooL5\nNjOzQVS06WwouenMzGzgWrrpzMzMDBxozMysMAcaMzMryoHGzMyKcqAxM7OiHGjMzKwoBxozMyvK\n
gcbMzIpyoDEzs6IcaMzMrCgHGjMzK8qBxszMinKgMTOzohxozMysKAcaMzMryoHGzMyKcqAxM7Oi\nHGjMzKwoBxozMyuqWKCRNEbSUkm3SrpF0qyc/llJN0laIenHksbk9LGSNkhanl8XVrY1UdLNku6Q\ndH6pPJuZ2eBTRJTZsLQnsGdErJC0E3AjcBSwJiKezMt8FHhNRLxf0ljghxFxUJ1tLQM+EhHLJC0A\nLoiIRTXLRKljMTNrV5KICJXcR7EaTUSsi4gVeXo9sBoY1RNksp2A3/W1HUl7ATtHxLKcdBkpYJmZ\nWQt43lDsJNdWJgA35PefA44D/gi8obLovpKWA48Dn4qI64DRwJrKMmtzmpmZtYDigSY3m10JnJpr\nNkTEWcBZkj4JfAV4H3A/MCYiHpX0OmCepAMHsq/Ozs7N0x0dHXR0dAzKMZiZtYvu7m66u7uHdJ/F\n+mgAJI0ArgEWRsScOvP3ARZExKvqzFsKzAYeAH4SEQfk9GOAQyPi5Jrl3UdjZjZALd1HI0nAJcCq\napCRNK6y2JHA8py+h6Tt8/R+wDjgroh4AHhC0qS8zeOAeaXybWZmg6tk09nBwLHAytzvAnAmcKKk\n/YGngd8AH8rz3gx8RtJGYBPwwYh4LM87BbgUGEmqAT1jxJmZmTWvok1nQ8lNZ2ZmA9fSTWdmZmbg\nQGNmZoU50JiZWVEONGZmVpQDjZmZFeVAY2ZmRTnQmJlZUQ40ZmZWlAONmZkV1dAjaCS9HTgQ2BEI\ngIj4TMF8mZlZm+i3RiPp68C7gY/mpHcDLyuZKTMzax/9PutM0s0RcZCklRHx6vz/ZRZFxCFDk8XG\n+FlnZmYD1yzPOtuQ//5R0mjgKWDPclkyM7N20kgfzTWSdgO+BNyY075RLktmZtZOBvRvAiTtCOxY\n+T8xTcNNZ2ZmAzcUTWe91mgkHRYRP5Y0gzzSrCZjPyiZMTMzaw99NZ29Gfgx8A5qAk3mQGNmZv1q\nZNTZfhFxV39pw81NZ9DV1cV5510MwOzZJzF16tRhzpGZNbtmGXV2ZZ20/xjsjNi26erqYvr0mSxZ\ncgRLlhzB9Okz6erqGu5smTWVrq4upkyZwZQpM/z9GEJ99dEcAIwHXijpnYBITWi7kJ4Q0CdJY4DL\ngJfk9S6OiAskfRY4Iqf9Hjg+Iu7L65wBnAA8DcyKiMU5fSJwad7vgog4dauOto2dd97FbNjwRWAm\nABs2pDTXasySnh9j6XsC1103k6uvnuvvyBDoq4/mFaT+mV3z3x5PAh9oYNsbgdMiYkW+yfNGSUuA\nf46I/wsg6aPAOcD7JY0HjiYFt9HAtZLG5fawi4ATI2KZpAWSpkXEooEdqpk9l/nH2PDpNdBExH9K\n+hHwiYg4d6Abjoh1wLo8vV7SamBURKyuLLYT8Ls8fSRweURsBO6WdCcwSdI9wM4RsSwvdxlwFOBA\nUzF79klcd91MNuTba0eOPJ3Zs+cOb6bMzOjnhs2IeErSdGDAgaZK0lhgAnBDfv854DjSUwdenxcb\nBVxfWW0NqWazMU/3WJvTrWLq1KlcffXcymAANwmYVfnH2PBp5MkA10n6GvDvwB/IfTUR8etGdpCb\nza4ETo2I9aSVzwLOkvRJYA7wvq3JfK3Ozs7N0x0dHXR0dAzGZlvG1KlTHVzMeuEfY0l3dzfd3d1D\nus9Ghjd3U+c+moh4S78bl0YA1wALI2JOnfn7kDr3X5WDDhHxhTxvEan/5h5gaUQckNOPAQ6NiJNr\ntvWcH95sZjZQw/pkgB4R0bE1G5Yk4BJgVTXI5A7+O/LbI4HleXo+8D1JXyY1jY0DlkVESHpC0iRg\nGanJ7YKtyZOZmQ29fgONpD2BzwGjI2JaHh32xoi4pJ9VDwaOBVZK6gkmZwInStqfNIT5N8CHACJi\nlaQrgFWkJ0SfUqminEIa3jySVAPyQAAzsxbRSNPZIuDbwFn5/9
GMAJZHxKuGIoONctOZmdnANcuT\nAfaIiH8n1UDIw4+fKpkpMzNrH40EmvWSXtTzRtIbgMfLZcnMzNpJI8ObZwM/BPaT9AvgxcD/Lpor\nMzNrGw3947PcL7N/fvs/ufmsqbiPxsxs4JqljwbS3fuvASYCx0h6b7ksPTf5qbJWisuWDbdGRp19\nF9gPWEEeEAAQER8tm7WBaeUaTe1TZUeOPN1PlbVB4bJl/RmKGk0jgWY1ML7Zr+KtHGimTJnBkiVH\n0PNUWZjL5MnzWbz4quHMlrUBly3rT7M0nd0C7FUyE2Zm1r4aGXX2YmCVpGXAn3NaRMQR5bL13OKn\nylopLlvWDBppOuuolx4R3QXys9VauekMUlv6lqfKnuQ2dBs0LlvWl6boo2kVrR5ozMyGw7A+vVnS\neur8e4AsImKXMlkyM7N20te/ct5pKDNiZmbtqdEbNs3MzLaKA42ZmRXlQGNmZkU50JiZWVEONGZm\nVpQDjZmZFVUs0EgaI2mppFsl3SJpVk7/kqTVkm6S9ANJu+b0sZI2SFqeXxdWtjVR0s2S7pB0fqk8\nm5nZ4Cv2ZABJewJ7RsQKSTsBNwJHAXsDP46ITZK+ABARn5Q0FvhhRBxUZ1vLgI9ExDJJC4ALImJR\nzTJ+MoCZ2QA1y9Obt0pErIuIFXl6PbAaGBURSyJiU17sBlLg6ZWkvYCdI2JZTrqMFLDMzKwFDEkf\nTa6tTCAFlqoTgAWV9/vmZrNuSYfktNHAmsoya3OamZm1gEb+TcA2yc1mVwKn5ppNT/pZwF8i4ns5\n6X5gTEQ8Kul1wDxJBw5kX52dnZunOzo66Ojo2Mbcm5m1l+7ubrq7u4d0n0Wf3ixpBHANsDAi5lTS\njwc+ABwWEX/qZd2lwGzgAeAnEXFATj8GODQiTq5Z3n00ZmYD1NJ9NJIEXAKsqgky04CPA0dWg4yk\nPSRtn6f3A8YBd0XEA8ATkiblbR4HzCuVbzMzG1wlR50dAvwUWMmWfzdwJnABsAPwSE77ZUScImkG\n8GlgI7AJODsifpS3NRG4FBgJLIiIWXX25xqNmdkA+R+fDYADjZnZwLV005mZmRk40JiZWWEONGZm\nVpQDjZmZFeVAY2ZmRTnQmJlZUQ40ZmZWlAONmZkV5UBjZmZFOdCYmVlRDjRmZlaUA42ZmRXlQGNm\nZkU50NgzdHV1MWXKDKZMmUFXV9dwZ6fl+XwOHp/LFhYRbfFKh2LbYtGiRTFy5EsDLg24NEaOfGks\nWrRouLPVstrpfC5atCgmT35nTJ78zmE5hnY6l80mXzvLXp9L72CoXsMdaIb7izgYJk9+Z/4iR35d\nGpMnv3O4s9Wy2uV8NsNFvl3OZTMaikDzvOGsTbWLrq4upk+fyYYNXwTguutmcvXVc5k6deow58xs\n25133sW5bM8EYMOGlObybY1yoBkE7fJFnD37JK67biYbNqT3I0eezuzZcwe8na6uLs477+LN22y1\n8zBYBut8ms9lyytdZRqqF8PYdNZO1fptbQJshmaWZtIOTarN8pm2w7lsRrRyHw0wBlgK3ArcAszK\n6V8CVgM3AT8Adq2scwZwB3AbMKWSPhG4Oc87v5f9DdqJH6hm+SI2g3YKuraFL/LtaygCTcmms43A\naRGxQtJOwI2SlgCLgdMjYpOkL+Tg8klJ44GjgfHAaOBaSePyibgIODEilklaIGlaRCwqmPcBmTp1\nKldfPbfSXOT+GWsvU6dOdZm2rVYs0ETEOmBdnl4vaTUwKiKWVBa7AZiRp48ELo+IjcDdku4EJkm6\nB9g5Ipbl5S4DjgKaJtCAv4g93JZuZrWGZDCApLHABFJgqToBuDxPjwKur8xbQ6rZbMzTPdbmdGtC\nrt2ZWa3igSY3m10JnBoR6yvpZwF/iYjvDda+Ojs7N093dHTQ0dExWJu2AXDtzqx5dXd3093dPaT7\nVOoCKbRxaQRwDbAwIuZU0o
8HPgAcFhF/ymmfBIiIL+T3i4BzgHuApRFxQE4/Bjg0Ik6u2VeUPBZr\nLR5ibdYYSUSESu6j2LPOJAm4BFhVE2SmAR8HjuwJMtl84D2SdpC0LzAOWJb7ep6QNClv8zhgXql8\nW+vruYF2yZIjWLLkCKZPn+lnY5kNo5IP1TwYOBZ4i6Tl+XU48FVgJ2BJTrsQICJWAVcAq4CFwCmV\nKsopwDfCI48bAAAIdklEQVRJw5vvbKYRZ9Z8nnkDbXpiQ0/txqwdtNoDRkuOOruO+oFsXB/rnAuc\nWyf9RuCgwcudmdnWGe5m2VZ85JUfQWNtx0OsrZRmuMi34iOv/P9oaL1qqPWtZ4j15MnzmTx5ftP/\n2uuLy2ZzcbPsVir96IGherGVj6Dx42OsWbVb2WyHx9g0wyOWBrtc0MrPOhvq19YGmmYoOGb1tFPZ\nbJeg2SzHMZhBeygCjftozKy4VuxXqKdZnnzRajdFP+cDjTuOrVm5bDanVrvIN4OiTwYYStvyZIDh\nHq5o1pt2KZu1o7VGjjy9pQdptJOheDKAA42ZDYl2CZqDoZnOhQPNADjQmFkraLbanQPNADjQmFkr\nmDJlBkuWHEHPwAhI93wtXnzVsOSnpR+qaWZmBh51ZmY2pJ6LownddGZNp5k6Ss1KaKYy7j6aAXCg\naQ/N1lFq1u4caAbAgaY9NFtHqVm782AAMzNreR4MYE3ludhRatbu3HRmTaeZOkrN2p37aAbAgcbM\nbOBauo9G0hhJSyXdKukWSbNy+rty2tOSXldZfqykDZKW59eFlXkTJd0s6Q5J55fKs5mZDb6SfTQb\ngdMiYoWknYAbJS0BbgamA1+vs86dETGhTvpFwIkRsUzSAknTImJRuaybmdlgKVajiYh1EbEiT68H\nVgOjIuK2iLi90e1I2gvYOSKW5aTLgKMGPcNmZlbEkAxvljQWmADc0M+i++Zms25Jh+S00cCayjJr\nc5qZmbWA4sObc7PZlcCpuWbTm/uBMRHxaO67mSfpwIHsq7Ozc/N0R0cHHR0dA8+wmVkb6+7upru7\ne0j3WXTUmaQRwDXAwoiYUzNvKTA7In7dy7pLgdnAA8BPIuKAnH4McGhEnFyzvEedmZkNUKuPOhNw\nCbCqNshUF6ssv4ek7fP0fsA44K6IeAB4QtKkvM3jgHml8m1mZoOrWI0m97H8FFgJ9OzkTOD5wFeB\nPYDHgeURcbikGcCnSaPVNgFnR8SP8rYmApcCI4EFETGrzv5cozEzGyDfsDkADjRmZgPX0k1nZmZm\n4EBjZmaFOdCYmVlRDjRmZlaUA42ZmRXlQGNmZkU50JiZWVEONGZmVpQDjZmZFeVAY2ZmRTnQmJlZ\nUQ40ZmZWlAONmZkV5UBjZmZFOdCYmVlRDjRmZlaUA42ZmRXlQGNmZkUVCzSSxkhaKulWSbdImpXT\n35XTnpb0upp1zpB0h6TbJE2ppE+UdHOed36pPJuZ2eArWaPZCJwWEQcCbwA+LOkA4GZgOvDT6sKS\nxgNHA+OBacCFknr+j/VFwIkRMQ4YJ2lawXwb0N3dPdxZaCs+n4PH57L1FAs0EbEuIlbk6fXAamBU\nRNwWEbfXWeVI4PKI2BgRdwN3ApMk7QXsHBHL8nKXAUeVyrcl/jIPLp/PweNz2XqGpI9G0lhgAnBD\nH4uNAtZU3q8BRtdJX5vTzcysBRQPNJJ2Aq4ETs01GzMzew5RRJTbuDQCuAZYGBFzauYtBWZHxK/z\n+08CRMQX8vtFwDnAPcDSiDggpx8DHBoRJ9dsr9yBmJm1sYhQ/0ttveeV2nDuyL8EWFUbZKqLVabn\nA9+T9GVS09g4YFlEhKQnJE0ClgHHARfUbqj0iTIzs61TrEYj6RDSyLKVQM9OzgSeD3wV2AN4HFge\nEYfndc4ETgCeIjW1deX0icClwEhgQUTMKpJpMzMbdEWbzszMzNriyQCSpuWbPO+QdPpw56fV
Sbpb\n0kpJyyUt638N6yHpW5IelHRzJW13SUsk3S5psaQXDmceW0kv57NT0ppcPpf7vrrG9HETffHy2fKB\nRtL2wNdIN3mOB47JN4ba1gugIyImRMTrhzszLebbpLJY9UlgSUS8Avhxfm+NqXc+A/hyLp8TImLR\nMOSrFfV2E33x8tnygQZ4PXBnRNwdERuB75Nu/rRt48EVWyEifgY8WpN8BDA3T8/FNxw3rJfzCS6f\nA9bLTfSjGYLy2Q6BZjRwX+V9z42etvUCuFbSryR9YLgz0wZeGhEP5ukHgZcOZ2baxEcl3STpEjdF\nDlzNTfTFy2c7BBqPZhh8B0fEBOBwUvX6TcOdoXYRafSNy+y2uQjYF3gt8ABw3vBmp7Xkm+ivIo3s\nfbI6r1T5bIdAsxYYU3k/hmc+ssYGKCIeyH8fBq4mNU/a1ntQ0p4A+dl9Dw1zflpaRDwUGfBNXD4b\nlm+ivwr4TkTMy8nFy2c7BJpfkZ7oPFbSDqQnQM8f5jy1LEl/JWnnPP0CYArpidu29eYDM/P0TGBe\nH8taP/LFsMd0XD4b0sdN9MXLZ1vcRyPpcGAOsD1wSUR8fpiz1LIk7UuqxUB6csS/+Xw2TtLlwKGk\nG5IfBM4G/hO4AtgHuBt4d0Q8Nlx5bCV1zuc5QAep2SyA3wIfrPQxWC96uYn+DNITV4qWz7YINGZm\n1rzaoenMzMyamAONmZkV5UBjZmZFOdCYmVlRDjRmZlaUA42ZmRXlQGNmZkU50JgVIunTkg7rZ5l3\n9PwPJUlH+V9cWDvyDZtmTULSpcAPI+Kq4c6L2WByjcZsAPIz9VZLujj/l8IuSTv2suylkmbk6bvz\nf4a8Mf/30v1z+vGSvirpjcA7gC/l/xq539AdlVlZDjRmA/fXwNci4lXAY8CMXparPnI9gIcjYiLp\nMfcfe8aCEb8kPdzwY/m/Rt5VJOdmw8CBxmzgfhsRK/P0jcDYBtf7Qf776z7W8X+OtLbjQGM2cH+u\nTD9Nesr1QNbrax13mlrbcaAxax5PArsMdybMBpsDjdnA1dY6BloLqe276Zn+PvDxPGDAgwGsbXh4\ns5mZFeUajZmZFdVoJ6aZ9ULS14CDa5LnRMTc4ciPWbNx05mZmRXlpjMzMyvKgcbMzIpyoDEzs6Ic\naMzMrCgHGjMzK+r/AxeemZ4TCffRAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 71 } ], "metadata": {} } ] }