{ "metadata": { "name": "", "signature": "sha256:03d5d8a472cd0237c27fbbbf1a10ce973f99adf46e06c1d2f3bf3841bf6843b9" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Intro to TF-IDF, Clustering, and Pattern.py\n", "\n", "### Lynn Cherny, 2/8/15, arnicas@gmail\n", "Full repo here: https://github.com/arnicas/NLP-in-Python" ] }, { "cell_type": "code", "collapsed": false, "input": [ "%matplotlib inline" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "code", "collapsed": false, "input": [ "import itertools\n", "import math\n", "import matplotlib.pyplot as plt\n", "# also, down below, we use pattern, numpy, and scipy" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##TF-IDF (Term Frequency, Inverse Document Frequency)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Term Frequency**: Number of appearances of a word in a document (the counts we saw already)\n", "\n", "**Document Frequency**: Number of documents that contain a word in a set of docs\n", "\n", "**TF-IDF** is **Term Frequency / Document Frequency**, with some extra fiddles.\n", "\n", "Example from [Manning, Raghavan, and Schuetze](http://nlp.stanford.edu/IR-book/html/htmledition/inverse-document-frequency-1.html) showing IDF of a rare term is high:\n", "\n", "\n", "\n", "\n", "\n", "TF-IDF for a word and document is usually calculated as:\n", "\n", "**(Word t's frequency in the doc) * Log( Number of Docs / Number of docs that contain the word t)**\n", "\n", "However, it is usually done with a + 1 term or two. You can consider it an information measure for document words (or \"features\") in a bag-of-words style analysis, where the order of the words doesn't matter, just the set of words. It is a **\"weight\"** for a word. 
Some features of TF-IDF:\n", "\n", "* If a term is very frequent in the whole doc set, it's less interesting overall and gets a low TF-IDF. Note that this tends to remove stopwords for you! However, you need a lot of documents for this to work well. Beware of the effects of TF-IDF on small doc sets; it may not work as you expect.\n", "* A term that is frequent in a few docs, but not in a lot of them, has a high score. It helps distinguish those docs.\n", "\n", "See the discussion in [Manning, Raghavan, and Schuetze](http://nlp.stanford.edu/IR-book/html/htmledition/term-frequency-and-weighting-1.html), and even [more math in Wikipedia](http://en.wikipedia.org/wiki/Tf%E2%80%93idf). Depending on the implementation, TF-IDF may or may not be normalized. **Always check whether the implementation you use strips stopwords, and decide if you like that.**\n", "\n", "Some more Python references:\n", "* [Demo using TextBlob, another lib](http://stevenloria.com/finding-important-words-in-a-document-using-tf-idf/)\n", "* [A version written on top of NLTK](https://github.com/yebrahim/TF-IDF-Generator)\n", "* [TF-IDF in gensim](http://radimrehurek.com/gensim/tutorial.html)\n", "* [TF-IDF in scikit-learn](http://scikit-learn.org/stable/modules/feature_extraction.html)\n", "\n", "In languages other than Python:\n", "* [A version in Processing by Nic Felton](https://github.com/feltron/Processing_TFIDF)\n", "* Using [Node.js package 'natural'](https://github.com/NaturalNode/natural) -- see example in my utils/booksNodeTfIdf.js. **Beware, this version strips stopwords, and it's a dumb list. 
I've filed an issue with them, but if you use it, you should replace or remove their stoplist.**\n", "\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# code example from Building Machine Learning Systems with Python (Richert & Coelho) \n", "# - modified slightly by Lynn\n", "\n", "import math\n", "\n", "def tfidf(t, d, D):\n", " tf = float(d.count(t)) / sum(d.count(w) for w in set(d)) # normalized\n", " # Note his version doesn't use +1 in denominator.\n", " idf = math.log( float(len(D)) / (len([doc for doc in D if t in doc])))\n", " return tf * idf\n", "\n", "\n", "a, abb, abc = [\"a\"], [\"a\", \"b\", \"b\"], [\"a\", \"b\", \"c\"] # try adding another c to the last doc!\n", "D = [a, abb, abc]\n", "\n", "print(tfidf(\"a\", a, D)) # a is in all of them\n", "print(tfidf(\"a\", abc, D)) # a is in all of them\n", "print(tfidf(\"b\", abc, D)) # b occurs only once here, but in 2 docs\n", "print(tfidf(\"b\", abb, D)) # b occurs more frequently in this doc\n", "print(tfidf(\"c\", abc, D)) # c is unique in the doc set" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0.0\n", "0.0\n", "0.135155036036\n", "0.270310072072\n", "0.366204096223\n" ] } ], "prompt_number": 93 }, { "cell_type": "markdown", "metadata": {}, "source": [ "*What if you change some of those docs, or add another one? Add another c in the last doc, e.g.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Using the Pattern lib for NLP" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Install: pip install pattern. 
Read the documentation here for the vector package: http://www.clips.ua.ac.be/pages/pattern-vector" ] }, { "cell_type": "code", "collapsed": false, "input": [ "from pattern.vector import Document, Model, TFIDF, TF, LEMMA, PORTER, COSINE, KMEANS, HIERARCHICAL" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "filelist = !ls data/stories/" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "code", "collapsed": false, "input": [ "filelist" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "['A_THE BELL.txt',\n", " 'A_THE DREAM OF LITTLE TUK.txt',\n", " 'A_THE ELDERBUSH.txt',\n", " \"A_THE EMPEROR'S NEW CLOTHES.txt\",\n", " 'A_THE FALSE COLLAR.txt',\n", " 'A_THE FIR TREE.txt',\n", " 'A_THE HAPPY FAMILY.txt',\n", " 'A_THE LEAP-FROG.txt',\n", " 'A_THE LITTLE MATCH GIRL.txt',\n", " 'A_THE NAUGHTY BOY.txt',\n", " 'A_THE OLD HOUSE.txt',\n", " 'A_THE REAL PRINCESS.txt',\n", " 'A_THE RED SHOES.txt',\n", " 'A_THE SHADOW.txt',\n", " 'A_THE SHOES OF FORTUNE.txt',\n", " 'A_THE SNOW QUEEN.txt',\n", " 'A_THE STORY OF A MOTHER.txt',\n", " 'A_THE SWINEHERD.txt',\n", " 'G_BEARSKIN.txt',\n", " 'G_BRIAR ROSE.txt',\n", " 'G_CATHERINE AND FREDERICK.txt',\n", " 'G_CINDERELLA.txt',\n", " 'G_DUMMLING AND THE THREE FEATHERS.txt',\n", " 'G_FAITHFUL JOHN.txt',\n", " 'G_HANSEL AND GRETHEL.txt',\n", " 'G_LITTLE ONE-EYE, TWO-EYES AND THREE-EYES.txt',\n", " 'G_LITTLE RED-CAP.txt',\n", " 'G_LITTLE SNOW-WHITE.txt',\n", " 'G_MOTHER HOLLE.txt',\n", " 'G_OH, IF I COULD BUT SHIVER!.txt',\n", " 'G_RAPUNZEL.txt',\n", " 'G_RUMPELSTILTSKIN.txt',\n", " 'G_SNOW-WHITE AND ROSE-RED.txt',\n", " 'G_THE FROG PRINCE.txt',\n", " 'G_THE GOLDEN GOOSE.txt',\n", " 'G_THE GOOSE-GIRL.txt',\n", " 'G_THE LITTLE BROTHER AND SISTER.txt',\n", " 'G_THE SIX SWANS.txt',\n", " 'G_THE THREE LITTLE MEN IN THE WOOD.txt',\n", " 'G_THE 
TRAVELS OF TOM THUMB.txt',\n", " 'G_THE VALIANT LITTLE TAILOR.txt',\n", " 'G_THE WATER OF LIFE.txt',\n", " 'G_THUMBLING.txt']" ] } ], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "# Load in the stories...\n", "\n", "def load_texts(filenames, dirpath):\n", " \"\"\" filenames are the leaves, dirpath is the path to them with the / \"\"\"\n", " loaded_text = {}\n", " for filen in filenames:\n", " with open(dirpath + filen) as handle:\n", " loaded_text[filen] = handle.read()\n", " return loaded_text" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "code", "collapsed": false, "input": [ "loaded_text = load_texts(filelist, 'data/stories/')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "loaded_text.items()[0]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 9, "text": [ "('A_THE REAL PRINCESS.txt',\n", " 'THE REAL PRINCESS\\r\\n\\r\\nThere was once a Prince who wished to marry a Princess; but then she\\r\\nmust be a real Princess. He travelled all over the world in hopes of\\r\\nfinding such a lady; but there was always something wrong. Princesses he\\r\\nfound in plenty; but whether they were real Princesses it was impossible\\r\\nfor him to decide, for now one thing, now another, seemed to him not\\r\\nquite right about the ladies. At last he returned to his palace quite\\r\\ncast down, because he wished so much to have a real Princess for his\\r\\nwife.\\r\\n\\r\\nOne evening a fearful tempest arose, it thundered and lightened, and the\\r\\nrain poured down from the sky in torrents: besides, it was as dark as\\r\\npitch. All at once there was heard a violent knocking at the door, and\\r\\nthe old King, the Prince\\'s father, went out himself to open it.\\r\\n\\r\\nIt was a Princess who was standing outside the door. 
What with the rain\\r\\nand the wind, she was in a sad condition; the water trickled down from\\r\\nher hair, and her clothes clung to her body. She said she was a real\\r\\nPrincess.\\r\\n\\r\\n\"Ah! we shall soon see that!\" thought the old Queen-mother; however, she\\r\\nsaid not a word of what she was going to do; but went quietly into the\\r\\nbedroom, took all the bed-clothes off the bed, and put three little peas\\r\\non the bedstead. She then laid twenty mattresses one upon another over\\r\\nthe three peas, and put twenty feather beds over the mattresses.\\r\\n\\r\\nUpon this bed the Princess was to pass the night.\\r\\n\\r\\nThe next morning she was asked how she had slept. \"Oh, very badly\\r\\nindeed!\" she replied. \"I have scarcely closed my eyes the whole night\\r\\nthrough. I do not know what was in my bed, but I had something hard\\r\\nunder me, and am all over black and blue. It has hurt me so much!\"\\r\\n\\r\\nNow it was plain that the lady must be a real Princess, since she had\\r\\nbeen able to feel the three little peas through the twenty mattresses\\r\\nand twenty feather beds. None but a real Princess could have had such a\\r\\ndelicate sense of feeling.\\r\\n\\r\\nThe Prince accordingly made her his wife; being now convinced that he\\r\\nhad found a real Princess. The three peas were however put into the\\r\\ncabinet of curiosities, where they are still to be seen, provided they\\r\\nare not lost.\\r\\n\\r\\nWasn\\'t this a lady of real delicacy?\\r\\n\\r\\n\\r\\n\\r\\n\\r\\n')" ] } ], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "def make_pattern_docs(texts):\n", " \"\"\" texts is a dictionary! 
key is the name of text or filename \"\"\"\n", " from pattern.vector import Document\n", " docs = []\n", "\n", " # Create a pattern.vector Document object for each article, and lemmatize as it goes in\n", " for key, val in texts.iteritems():\n", " typestring = key[0] # will be a G or A, for Grimms or Andersen\n", " docs.append(Document(val, name=key, type=typestring, stemmer=LEMMA))\n", " return docs" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "code", "collapsed": false, "input": [ "docs = make_pattern_docs(loaded_text)\n", "docs[1]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "Document(id='P46Xguc-2', name='G_LITTLE ONE-EYE, TWO-EYES AND THREE-EYES.txt', type='G')" ] } ], "prompt_number": 11 }, { "cell_type": "code", "collapsed": false, "input": [ "docs[1].keywords() # normalized counts in the document (TF)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "[(0.045454545454545636, u'sister'),\n", " (0.020661157024793472, u'table'),\n", " (0.017906336088154343, u'goat'),\n", " (0.017906336088154343, u'tree'),\n", " (0.016528925619834777, u'little'),\n", " (0.015151515151515213, u'eye'),\n", " (0.015151515151515213, u'knight'),\n", " (0.015151515151515213, u'mother'),\n", " (0.013774104683195648, u'morning'),\n", " (0.013774104683195648, u'soon')]" ] } ], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "sorted(docs[1].features)[0:10] # the words = features" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "[u'accompany',\n", " u'according',\n", " u'admire',\n", " u'advice',\n", " u'afterward',\n", " u'ah',\n", " u'air',\n", " u'ala',\n", " u'alm',\n", " u'angry']" ] } ], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "# the 
normalized vector for the word occurrences in this document - \n", "# these scores are the same as the keywords above. \n", "docs[1].vector['sister']" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 14, "text": [ "0.045454545454545456" ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "# TF-IDF is a property of the doc set. The \"Model\" object handles operations across the doc set.\n", "mtfidf = Model(documents=docs, weight=TFIDF)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "mtfidf.documents" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 16, "text": [ "[Document(id='P46Xguc-1', name='A_THE REAL PRINCESS.txt', type='A'),\n", " Document(id='P46Xguc-2', name='G_LITTLE ONE-EYE, TWO-EYES AND THREE-EYES.txt', type='G'),\n", " Document(id='P46Xguc-3', name='A_THE HAPPY FAMILY.txt', type='A'),\n", " Document(id='P46Xguc-4', name='G_THE FROG PRINCE.txt', type='G'),\n", " Document(id='P46Xguc-5', name='G_MOTHER HOLLE.txt', type='G'),\n", " Document(id='P46Xguc-6', name='G_THE VALIANT LITTLE TAILOR.txt', type='G'),\n", " Document(id='P46Xguc-7', name='A_THE SHOES OF FORTUNE.txt', type='A'),\n", " Document(id='P46Xguc-8', name='G_OH, IF I COULD BUT SHIVER!.txt', type='G'),\n", " Document(id='P46Xguc-9', name='A_THE STORY OF A MOTHER.txt', type='A'),\n", " Document(id='P46Xguc-10', name='G_THE TRAVELS OF TOM THUMB.txt', type='G'),\n", " Document(id='P46Xguc-11', name='G_HANSEL AND GRETHEL.txt', type='G'),\n", " Document(id='P46Xguc-12', name='A_THE SWINEHERD.txt', type='A'),\n", " Document(id='P46Xguc-13', name=\"A_THE EMPEROR'S NEW CLOTHES.txt\", type='A'),\n", " Document(id='P46Xguc-14', name='G_THUMBLING.txt', type='G'),\n", " Document(id='P46Xguc-15', name='G_DUMMLING AND THE THREE FEATHERS.txt', type='G'),\n", " 
Document(id='P46Xguc-16', name='G_RUMPELSTILTSKIN.txt', type='G'),\n", " Document(id='P46Xguc-17', name='G_BEARSKIN.txt', type='G'),\n", " Document(id='P46Xguc-18', name='G_CINDERELLA.txt', type='G'),\n", " Document(id='P46Xguc-19', name='A_THE FIR TREE.txt', type='A'),\n", " Document(id='P46Xguc-20', name='G_LITTLE RED-CAP.txt', type='G'),\n", " Document(id='P46Xguc-21', name='A_THE LEAP-FROG.txt', type='A'),\n", " Document(id='P46Xguc-22', name='G_THE GOLDEN GOOSE.txt', type='G'),\n", " Document(id='P46Xguc-23', name='A_THE BELL.txt', type='A'),\n", " Document(id='P46Xguc-24', name='A_THE ELDERBUSH.txt', type='A'),\n", " Document(id='P46Xguc-25', name='G_THE SIX SWANS.txt', type='G'),\n", " Document(id='P46Xguc-26', name='A_THE RED SHOES.txt', type='A'),\n", " Document(id='P46Xguc-27', name='G_CATHERINE AND FREDERICK.txt', type='G'),\n", " Document(id='P46Xguc-28', name='G_LITTLE SNOW-WHITE.txt', type='G'),\n", " Document(id='P46Xguc-29', name='A_THE FALSE COLLAR.txt', type='A'),\n", " Document(id='P46Xguc-30', name='G_RAPUNZEL.txt', type='G'),\n", " Document(id='P46Xguc-31', name='G_THE LITTLE BROTHER AND SISTER.txt', type='G'),\n", " Document(id='P46Xguc-32', name='A_THE DREAM OF LITTLE TUK.txt', type='A'),\n", " Document(id='P46Xguc-33', name='A_THE SNOW QUEEN.txt', type='A'),\n", " Document(id='P46Xguc-34', name='G_THE GOOSE-GIRL.txt', type='G'),\n", " Document(id='P46Xguc-35', name='G_SNOW-WHITE AND ROSE-RED.txt', type='G'),\n", " Document(id='P46Xguc-36', name='A_THE SHADOW.txt', type='A'),\n", " Document(id='P46Xguc-37', name='A_THE OLD HOUSE.txt', type='A'),\n", " Document(id='P46Xguc-38', name='G_THE WATER OF LIFE.txt', type='G'),\n", " Document(id='P46Xguc-39', name='G_FAITHFUL JOHN.txt', type='G'),\n", " Document(id='P46Xguc-40', name='G_THE THREE LITTLE MEN IN THE WOOD.txt', type='G'),\n", " Document(id='P46Xguc-41', name='G_BRIAR ROSE.txt', type='G'),\n", " Document(id='P46Xguc-42', name='A_THE LITTLE MATCH GIRL.txt', type='A'),\n", " 
Document(id='P46Xguc-43', name='A_THE NAUGHTY BOY.txt', type='A')]" ] } ], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "mtfidf.document_frequency('sister')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 17, "text": [ "0.3488372093023256" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "mtfidf.inverse_document_frequency('sister')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "1.0531506229959813" ] } ], "prompt_number": 18 }, { "cell_type": "code", "collapsed": false, "input": [ "doc1 = mtfidf.document(name='G_LITTLE ONE-EYE, TWO-EYES AND THREE-EYES.txt') # or:" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "# equivalent:\n", "\n", "doc1 = mtfidf.documents[1]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "doc1.term_frequency('sister') # note this is same as doing it above on the doc object!" 
], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 21, "text": [ "0.045454545454545456" ] } ], "prompt_number": 21 }, { "cell_type": "code", "collapsed": false, "input": [ "doc1.tf_idf('sister')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 22, "text": [ "0.047870482863453696" ] } ], "prompt_number": 22 }, { "cell_type": "code", "collapsed": false, "input": [ "mtfidf.documents[4].tf('sister')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 23, "text": [ "0.003105590062111801" ] } ], "prompt_number": 23 }, { "cell_type": "code", "collapsed": false, "input": [ "mtfidf.documents[4].tf_idf('sister')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 24, "text": [ "0.0032706541086831714" ] } ], "prompt_number": 24 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Time to discuss vectors, and similarity measures!\n", "\n", "Each document is a collection of weighted words, which we'll call a vector. Vectors can be compared to each other, to compute similarity. A common metric is \"cosine similarity.\" Image from \n", "Manning, Raghavan and Schuetze:\n", "\n", "\n", "\n", "Reminder: The cosine is near 1 for vectors separated by a small angle and closer to 0 for vectors that are far apart. So cosine itself works as a similarity score; if you want a distance, where a higher score = further away, subtract it from 1. You should think of it as cosine = similarity, 1-cos = distance. Pattern (the library) reports the cosine as its similarity, so a document compared with itself scores 1.\n", "\n", "Another measure, perhaps simpler to understand, is Euclidean distance (image from [this article](https://de.dariah.eu/tatom/working_with_text.html)):\n", "\n", "\n", "\n", "This is essentially the hypotenuse of the triangle formed by the two vectors' coordinate differences. 
Larger numbers = further apart vectors!\n", "\n", "Links:\n", "* Reference in Manning, Raghavan and Schuetze: http://nlp.stanford.edu/IR-book/html/htmledition/dot-products-1.html\n", "* A good article focusing on queries in search -- same idea! https://janav.wordpress.com/2013/10/27/tf-idf-and-cosine-similarity/\n", "* Computing the angle between vectors for cosine similarity: http://www.mathsisfun.com/algebra/vectors-dot-product.html\n", "* An article on text similarities using scikit-learn: https://de.dariah.eu/tatom/working_with_text.html\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Taken from the pattern.vector doc page: http://www.clips.ua.ac.be/pages/pattern-vector\n", "\n", "from pattern.vector import Document, Model, TFIDF\n", "\n", "d0 = Document('A tiger is a big yellow cat with stripes.', type='tiger')\n", "d1 = Document('A lion is a big yellow cat with manes.', type='lion')\n", "d2 = Document('An elephant is a big grey animal with a slurf.', type='elephant')\n", "d3 = Document('An elephant is an animal.', type='elephant')\n", " \n", "print \"Before model, vector for d1:\", d1.vector\n", "\n", "simple = Model(documents=[d0, d1, d2, d3], weight=TFIDF)\n", "\n", "print \"After model, vector for d1:\", d1.vector # vector now weighted according to document collection!\n", "print\n", "print \"Tiger vs lion text similarity:\", simple.similarity(d0, d1) # tiger vs. lion, cosine similarity\n", "print \"Tiger vs. elephant text similarity:\", simple.similarity(d0, d2) # tiger vs. elephant, cosine similarity\n", "print \"Elephant 1 vs. 
Elephant 2 similarity:\", simple.similarity(d2, d3)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "Before model, vector for d1: {u'lion': 0.25, u'manes': 0.25, u'yellow': 0.25, u'cat': 0.25}\n", "After model, vector for d1: {u'lion': 0.34657382340379694, u'manes': 0.34657382340379694, u'yellow': 0.17328691170189847, u'cat': 0.17328691170189847}\n", "\n", "Tiger vs lion text similarity: 0.2\n", "Tiger vs. elephant text similarity: 0.0\n", "Elephant 1 vs. Elephant 2 similarity: 0.4472135955\n" ] } ], "prompt_number": 25 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Notice above that the document vectors changed after the model was created, even outside the model context. Be alert to this (I'm not sure I like it, personally.)**\n", "\n", "**I'm going to save the simple model out for use later on...**" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# this exports the array of tf-idf, but with some extra stuff we can parse out. Will be large for real data.\n", "\n", "simple.export('data/csv/simple_tfidf.tsv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 26 }, { "cell_type": "markdown", "metadata": {}, "source": [ "###Back to our stories already in a model..." ] }, { "cell_type": "code", "collapsed": false, "input": [ "mtfidf.similarity(docs[1], docs[1]) # similarity to self is 1." 
], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 27, "text": [ "1.0000000000000002" ] } ], "prompt_number": 27 }, { "cell_type": "code", "collapsed": false, "input": [ "mtfidf.similarity(mtfidf.docs[1], mtfidf.docs[6]) # try some different docs" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 28, "text": [ "0.061193610958748514" ] } ], "prompt_number": 28 }, { "cell_type": "code", "collapsed": false, "input": [ "# check what that was\n", "\n", "mtfidf.docs[6]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 29, "text": [ "Document(id='P46Xguc-7', name='A_THE SHOES OF FORTUNE.txt', type='A')" ] } ], "prompt_number": 29 }, { "cell_type": "code", "collapsed": false, "input": [ "docs[1]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 30, "text": [ "Document(id='P46Xguc-2', name='G_LITTLE ONE-EYE, TWO-EYES AND THREE-EYES.txt', type='G')" ] } ], "prompt_number": 30 }, { "cell_type": "code", "collapsed": false, "input": [ "mtfidf.neighbors(docs[1]) # finds the closest matches in similarity" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 31, "text": [ "[(0.13327973883571262,\n", " Document(id='P46Xguc-31', name='G_THE LITTLE BROTHER AND SISTER.txt', type='G')),\n", " (0.09115800288880449,\n", " Document(id='P46Xguc-40', name='G_THE THREE LITTLE MEN IN THE WOOD.txt', type='G')),\n", " (0.07103621399884713,\n", " Document(id='P46Xguc-25', name='G_THE SIX SWANS.txt', type='G')),\n", " (0.06968168677071662,\n", " Document(id='P46Xguc-22', name='G_THE GOLDEN GOOSE.txt', type='G')),\n", " (0.0647975008434651,\n", " Document(id='P46Xguc-18', name='G_CINDERELLA.txt', type='G')),\n", " (0.06260321618889808,\n", " Document(id='P46Xguc-17', name='G_BEARSKIN.txt', 
type='G')),\n", " (0.061193610958748514,\n", " Document(id='P46Xguc-7', name='A_THE SHOES OF FORTUNE.txt', type='A')),\n", " (0.0597097979694477,\n", " Document(id='P46Xguc-35', name='G_SNOW-WHITE AND ROSE-RED.txt', type='G')),\n", " (0.05611577026931497,\n", " Document(id='P46Xguc-24', name='A_THE ELDERBUSH.txt', type='A')),\n", " (0.05292807171289243,\n", " Document(id='P46Xguc-19', name='A_THE FIR TREE.txt', type='A'))]" ] } ], "prompt_number": 31 }, { "cell_type": "code", "collapsed": false, "input": [ "# Model.search() returns a sorted list of (similarity, Document)-tuples, \n", "# based on a list of query words. A Document is created on-the-fly for the \n", "# given list, using the given optional arguments.\n", "\n", "mtfidf.search(['witch','girl','boy'])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 32, "text": [ "[(0.12027675752727798,\n", " Document(id='P46Xguc-11', name='G_HANSEL AND GRETHEL.txt', type='G')),\n", " (0.10224370709271782,\n", " Document(id='P46Xguc-31', name='G_THE LITTLE BROTHER AND SISTER.txt', type='G')),\n", " (0.08604756672847662,\n", " Document(id='P46Xguc-16', name='G_RUMPELSTILTSKIN.txt', type='G')),\n", " (0.07588740938904971,\n", " Document(id='P46Xguc-24', name='A_THE ELDERBUSH.txt', type='A')),\n", " (0.07152974843469193,\n", " Document(id='P46Xguc-37', name='A_THE OLD HOUSE.txt', type='A')),\n", " (0.05359044997199823,\n", " Document(id='P46Xguc-43', name='A_THE NAUGHTY BOY.txt', type='A')),\n", " (0.05015018254228086,\n", " Document(id='P46Xguc-40', name='G_THE THREE LITTLE MEN IN THE WOOD.txt', type='G')),\n", " (0.039382097289471396,\n", " Document(id='P46Xguc-8', name='G_OH, IF I COULD BUT SHIVER!.txt', type='G')),\n", " (0.03694252146876888,\n", " Document(id='P46Xguc-5', name='G_MOTHER HOLLE.txt', type='G')),\n", " (0.03499167423176321,\n", " Document(id='P46Xguc-23', name='A_THE BELL.txt', type='A'))]" ] } ], "prompt_number": 32 }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "*Try your own searches now!*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##One of the benefits of these distance measures is that you can do clustering, including hierarchical!" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# You can do hierchical clustering right inside pattern, without having to use scipy for it.\n", "# k is the number of \"clusters\" you want to produce\n", "\n", "hier = mtfidf.cluster(method=HIERARCHICAL, k=5)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 33 }, { "cell_type": "code", "collapsed": false, "input": [ "hier.depth" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 34, "text": [ "31" ] } ], "prompt_number": 34 }, { "cell_type": "code", "collapsed": false, "input": [ "# Get a giant listing of the Cluster objects in the tree structure. \n", "# Doesn't seem to be a built in tool to vis them, though!\n", "\n", "hier" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 35, "text": [ "Cluster([Document(id='P46Xguc-3', name='A_THE HAPPY FAMILY.txt', type='A'), Document(id='P46Xguc-29', name='A_THE FALSE COLLAR.txt', type='A'), Document(id='P46Xguc-27', name='G_CATHERINE AND FREDERICK.txt', type='G'), Document(id='P46Xguc-30', name='G_RAPUNZEL.txt', type='G'), Cluster([Document(id='P46Xguc-36', name='A_THE SHADOW.txt', type='A'), Cluster([Document(id='P46Xguc-21', name='A_THE LEAP-FROG.txt', type='A'), Cluster([Document(id='P46Xguc-11', name='G_HANSEL AND GRETHEL.txt', type='G'), Cluster([Document(id='P46Xguc-43', name='A_THE NAUGHTY BOY.txt', type='A'), Cluster([Cluster([Document(id='P46Xguc-12', name='A_THE SWINEHERD.txt', type='A'), Document(id='P46Xguc-13', name=\"A_THE EMPEROR'S NEW CLOTHES.txt\", type='A')]), Cluster([Document(id='P46Xguc-1', name='A_THE REAL PRINCESS.txt', type='A'), 
Cluster([Document(id='P46Xguc-15', name='G_DUMMLING AND THE THREE FEATHERS.txt', type='G'), Cluster([Document(id='P46Xguc-5', name='G_MOTHER HOLLE.txt', type='G'), Cluster([Document(id='P46Xguc-39', name='G_FAITHFUL JOHN.txt', type='G'), Cluster([Document(id='P46Xguc-32', name='A_THE DREAM OF LITTLE TUK.txt', type='A'), Cluster([Document(id='P46Xguc-4', name='G_THE FROG PRINCE.txt', type='G'), Cluster([Cluster([Document(id='P46Xguc-6', name='G_THE VALIANT LITTLE TAILOR.txt', type='G'), Document(id='P46Xguc-10', name='G_THE TRAVELS OF TOM THUMB.txt', type='G')]), Cluster([Document(id='P46Xguc-42', name='A_THE LITTLE MATCH GIRL.txt', type='A'), Cluster([Cluster([Document(id='P46Xguc-14', name='G_THUMBLING.txt', type='G'), Document(id='P46Xguc-20', name='G_LITTLE RED-CAP.txt', type='G')]), Cluster([Document(id='P46Xguc-19', name='A_THE FIR TREE.txt', type='A'), Cluster([Document(id='P46Xguc-22', name='G_THE GOLDEN GOOSE.txt', type='G'), Cluster([Document(id='P46Xguc-34', name='G_THE GOOSE-GIRL.txt', type='G'), Cluster([Document(id='P46Xguc-33', name='A_THE SNOW QUEEN.txt', type='A'), Cluster([Document(id='P46Xguc-8', name='G_OH, IF I COULD BUT SHIVER!.txt', type='G'), Cluster([Document(id='P46Xguc-41', name='G_BRIAR ROSE.txt', type='G'), Cluster([Document(id='P46Xguc-2', name='G_LITTLE ONE-EYE, TWO-EYES AND THREE-EYES.txt', type='G'), Cluster([Document(id='P46Xguc-9', name='A_THE STORY OF A MOTHER.txt', type='A'), Cluster([Document(id='P46Xguc-24', name='A_THE ELDERBUSH.txt', type='A'), Cluster([Cluster([Document(id='P46Xguc-37', name='A_THE OLD HOUSE.txt', type='A'), Document(id='P46Xguc-17', name='G_BEARSKIN.txt', type='G')]), Cluster([Cluster([Document(id='P46Xguc-31', name='G_THE LITTLE BROTHER AND SISTER.txt', type='G'), Cluster([Document(id='P46Xguc-25', name='G_THE SIX SWANS.txt', type='G'), Cluster([Document(id='P46Xguc-40', name='G_THE THREE LITTLE MEN IN THE WOOD.txt', type='G'), Cluster([Document(id='P46Xguc-38', name='G_THE WATER OF LIFE.txt', type='G'), 
Cluster([Document(id='P46Xguc-35', name='G_SNOW-WHITE AND ROSE-RED.txt', type='G'), Cluster([Document(id='P46Xguc-28', name='G_LITTLE SNOW-WHITE.txt', type='G'), Document(id='P46Xguc-16', name='G_RUMPELSTILTSKIN.txt', type='G')])])])])])]), Cluster([Document(id='P46Xguc-23', name='A_THE BELL.txt', type='A'), Cluster([Document(id='P46Xguc-7', name='A_THE SHOES OF FORTUNE.txt', type='A'), Cluster([Document(id='P46Xguc-18', name='G_CINDERELLA.txt', type='G'), Document(id='P46Xguc-26', name='A_THE RED SHOES.txt', type='A')])])])])])])])])])])])])])])])])])])])])])])])])])])])])])" ] } ], "prompt_number": 35 }, { "cell_type": "code", "collapsed": false, "input": [ "# Look at some of the functions on hier...\n", "hier" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###However, I don't see any easy way to do a graph from that." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Redoing and Graphing the Cluster Tree with SciPy" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import csv\n", "import numpy as np\n", "from scipy.spatial.distance import pdist, squareform\n", "from scipy.cluster.hierarchy import linkage, dendrogram" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 38 }, { "cell_type": "code", "collapsed": false, "input": [ "def read_weka_tfidf(filen):\n", " \"\"\" Read in the Weka file output by pattern's model export and just keep tfidf scores.\"\"\"\n", " rows = []\n", " with open(filen, 'rb') as csvfile:\n", " spamreader = csv.reader(csvfile, delimiter='\\t')\n", " count = 0\n", " for row in spamreader:\n", " # skipping first row which is the word labels\n", " if count > 0:\n", " rows.append(row[:-2]) # skip extra junk at the last 2 cols\n", " count += 1\n", " return rows" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 39 }, { "cell_type": "code", "collapsed": false, "input": [ "simplerows = 
read_weka_tfidf('data/csv/simple_tfidf.tsv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 40 }, { "cell_type": "code", "collapsed": false, "input": [ "simplerows" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 41, "text": [ "[['0', '0.1733', '0', '0', '0', '0', '0', '0.3466', '0.3466', '0.1733'],\n", " ['0', '0.1733', '0', '0', '0.3466', '0.3466', '0', '0', '0', '0.1733'],\n", " ['0.1733', '0', '0.1733', '0.3466', '0', '0', '0.3466', '0', '0', '0'],\n", " ['0.3466', '0', '0.3466', '0', '0', '0', '0', '0', '0', '0']]" ] } ], "prompt_number": 41 }, { "cell_type": "markdown", "metadata": {}, "source": [ "**SciPy's pdist computes pairwise distances - see http://docs.scipy.org/doc/scipy/reference/spatial.distance.html\n", "You can use cosine here as well, or any of a host of other metrics.**" ] }, { "cell_type": "code", "collapsed": false, "input": [ "dist = pdist(simplerows, metric='cosine') # look at the pdist docs and pick a different metric to try" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 42 }, { "cell_type": "code", "collapsed": false, "input": [ "linkage(dist)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 43, "text": [ "array([[ 2. , 3. , 0.5527864, 2. ],\n", " [ 0. , 1. , 0.8 , 2. ],\n", " [ 4. , 5. , 1. , 4. ]])" ] } ], "prompt_number": 43 }, { "cell_type": "code", "collapsed": false, "input": [ "from pylab import rcParams\n", "rcParams['figure.figsize'] = 6, 5\n", "\n", "dendrogram(linkage(dist)) # this plotting function has a ton of things you can manipulate if you look at the docs." 
], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 44, "text": [ "{'color_list': ['g', 'b', 'b'],\n", " 'dcoord': [[0.0, 0.55278640450004202, 0.55278640450004202, 0.0],\n", " [0.0, 0.79999999999999993, 0.79999999999999993, 0.0],\n", " [0.55278640450004202, 1.0, 1.0, 0.79999999999999993]],\n", " 'icoord': [[5.0, 5.0, 15.0, 15.0],\n", " [25.0, 25.0, 35.0, 35.0],\n", " [10.0, 10.0, 30.0, 30.0]],\n", " 'ivl': ['2', '3', '0', '1'],\n", " 'leaves': [2, 3, 0, 1]}" ] }, { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAW8AAAE0CAYAAADjdbIgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAADpVJREFUeJzt3H+s3Xddx/Hny3ZACE5ahpN0NSSyCDOCgIwhCRwcurIY\nlmgiVCUwQRa0xJgYK/7h7kIwkvjHQpYsZUzCXzYqJoywHzHAEV22wYgMie3SCiNthwtzlTjA2LK3\nf5yzcjm9957b22937rt9Pv665/v97Jz3vvfeZ7/ne865qSokSb382KIHkCSdOeMtSQ0Zb0lqyHhL\nUkPGW5IaMt6S1NDWZ+qBkvieREnagKrK7LZnLN7TAZ7Jh5Ok9pLTug142USSWjLektSQ8Zakhoy3\nJDVkvCWpIeMtSQ3NjXeSv07yWJJ/W2PNR5IcSvJQklcOO6IkadZ6zrw/DuxabWeSa4GXVNXlwHuB\nWweaTZK0irnxrqp/Bo6vseStwCemax8Anp/k0mHGkyStZIhr3juAI8tuHwUuG+B+JUmrGOoFy9nP\nb/o5eEk6h4b42ybHgJ3Lbl823XaapaWlU1+PRiNGo9EAD39h2r4djq91MUtakG3b4IknFj1FX+Px\nmPF4PHdd1vPHopK8GPh0Vf38CvuuBfZU1bVJrgJurqqrVlhX/mGq4STg4dRm5M/msJJs7K8KJvkb\n4I3AJUmOADcCFwFU1b6qujPJtUkOA98Frh92dEnSrHWdeQ/yQJ55D8qzG21W/mwOa7Uzbz9hKUkN\nGW9Jash4S1JDxluSGjLektSQ8Zakhoy3JDVkvCWpIeMtSQ0Zb0lqyHhLUkPGW5IaMt6S1JDxlqSG\njLckNWS8Jakh4y1JDRlvSWrIeEtSQ8Zbkhoy3pLUkPGWpIaMtyQ1ZLwlqSHjLUkNGW9Jash4S1JD\nxluSGjLektSQ8Zakhoy3JDVkvCWpIeMtSQ0Zb0lqyHhLUkPGW5IaMt6S1JDxlqSGjLckNWS8Jakh\n4y1JDRlvSWrIeEtSQ8Zbkhoy3pLU0Nx4J9mV5GCSQ0n2rrD/kiR3J/lKkq8ledc5mVSSdEqqavWd\nyRbgYeDNwDHgS8DuqjqwbM0S8Oyq+kCSS6brL62qkzP3VWs9ls5MAh5ObUb+bA4rCVWV2e3zzryv\nBA5X1SNVdQLYD1w3s+ZbwMXTry8G/ms23JKkYW2ds38HcGTZ7aPAa2fW3AZ8LsmjwI8DvznceJKk\nlcyL93qe/PwZ8JWqGiX5GeAfk7yiqv5nduHS0tKpr0ejEaPR6AxGlS5s27fD8eOLnmJ9ctqT/M1n\n2zZ44olFT3G68XjMeDyeu27eNe+rgKWq2jW9/QHgqar68LI1dwIfqqp7p7c/C+ytqgdn7str3gPy\nuuKFx+/5sL
ocz41e834QuDzJi5M8C3gbcMfMmoNMXtAkyaXAzwJfP/uRJUmrWfOySVWdTLIHuAfY\nAtxeVQeS3DDdvw/4C+DjSR5i8o/Bn1TVJnwyIknnjzUvmwz6QF42GVSXp3wajt/zYXU5nhu9bCJJ\n2oSMtyQ1ZLwlqSHjLUkNGW9Jash4S1JDxluSGjLektSQ8Zakhoy3JDVkvCWpIeMtSQ0Zb0lqyHhL\nUkPGW5IaMt6S1JDxlqSGjLckNWS8Jakh4y1JDRlvSWrIeEtSQ8Zbkhoy3pLUkPGWpIaMtyQ1ZLwl\nqSHjLUkNGW9Jash4S1JDxluSGjLektSQ8Zakhoy3JDVkvCWpIeMtSQ0Zb0lqyHhLUkPGW5IaMt6S\n1JDxlqSGjLckNWS8Jakh4y1JDc2Nd5JdSQ4mOZRk7yprRkn+NcnXkowHn1KS9CNSVavvTLYADwNv\nBo4BXwJ2V9WBZWueD9wLXFNVR5NcUlWPr3BftdZj6cwk4OG8sPg9H1aX45mEqsrs9nln3lcCh6vq\nkao6AewHrptZ81vAJ6vqKMBK4ZYkDWtevHcAR5bdPjrdttzlwPYkn0/yYJJ3DDmgJOl0W+fsX8+T\niouAVwFXA88F7ktyf1UdOtvhJEkrmxfvY8DOZbd3Mjn7Xu4I8HhVfR/4fpIvAK8ATov30tLSqa9H\noxGj0ejMJ5ak89h4PGY8Hs9dN+8Fy61MXrC8GngU+CKnv2D5UuAW4Brg2cADwNuq6t9n7ssXLAfU\n5cUWDcfv+bC6HM/VXrBc88y7qk4m2QPcA2wBbq+qA0lumO7fV1UHk9wNfBV4CrhtNtySpGGteeY9\n6AN55j2oLmcNGo7f82F1OZ4bfaugJGkTMt6S1JDxlqSGjLckNWS8Jakh4y1JDRlvSWrIeEtSQ35I\nZ8b2D2/n+P8eX/QY833+RnjTTYueYq5tz9nGE3ufWPQY54UuHyrposvxXO1DOsZ7Rm4KdePmn7ML\nj+dwusSmiy7H009YStJ5xHhLUkPGW5IaMt6S1JDxlqSGjLckNWS8Jakh4y1JDRlvSWrIeEtSQ8Zb\nkhoy3pLUkPGWpIaMtyQ1ZLwlqSHjLUkNGW9Jash4S1JDxluSGjLektSQ8Zakhoy3JDVkvCWpIeMt\nSQ0Zb0lqyHhLUkPGW5IaMt6S1JDxlqSGjLckNWS8Jakh4y1JDRlvSWrIeEtSQ3PjnWRXkoNJDiXZ\nu8a61yQ5meTXhx1RkjRrzXgn2QLcAuwCrgB2J3nZKus+DNwN5BzMKUlaZt6Z95XA4ap6pKpOAPuB\n61ZY937g74FvDzyfJGkF8+K9Aziy7PbR6bZTkuxgEvRbp5tqsOkkSSuaF+/1hPhm4E+rqphcMvGy\niSSdY1vn7D8G7Fx2eyeTs+/lXg3sTwJwCfCWJCeq6o7ZO1taWjr19Wg0YjQanfnEknQeG4/HjMfj\nuesyOWFeZWeyFXgYuBp4FPgisLuqDqyy/uPAp6vqH1bYV2s91maRm0LduPnn7MLjOZwEGvwKtdHl\neCahqk67orHmmXdVnUyyB7gH2ALcXlUHktww3b/vnEwrSVrTvMsmVNVdwF0z21aMdlVdP9BckqQ1\n+AlLSWrIeEtSQ8Zbkhoy3pLUkPGWpIaMtyQ1ZLwlqSHjLUkNGW9Jash4S1JDxluSGjLektSQ8Zak\nhoy3JDVkvCWpIeMtSQ0Zb0lqyHhLUkPGW5IaMt6S1JDxlqSGjLckNWS8Jakh4y1JDRlvSWrIeEtS\nQ8Zbkhoy3pLUkPGWpIaMtyQ1ZLwlqSHjLUkNGW9Jash4S1JDxluSGjLektSQ8Zakhoy3JDVkvCWp\nIeMtSQ0Zb0lqyHhLUkPGW5IaMt6S1NC64p1kV5KDSQ4l2bvC/t9O8lCSrya5N8nLhx9VkvS0ufFO\nsgW4BdgFXAHsTvKymWVfB95QVS8HPgh8dOhBJUk/tJ4z7yuBw1X1SFWdAPYD
1y1fUFX3VdV3pjcf\nAC4bdkxJ0nLrifcO4Miy20en21bzbuDOsxlKkrS2retYU+u9syRvAn4XeP2GJ5IkzbWeeB8Ddi67\nvZPJ2fePmL5IeRuwq6qOr3RHS0tLp74ejUaMRqMzGFWSzn/j8ZjxeDx3XarWPrFOshV4GLgaeBT4\nIrC7qg4sW/PTwOeA36mq+1e5n5r3WJtBbgp14+afswuP53ASaPAr1EaX45mEqsrs9rln3lV1Mske\n4B5gC3B7VR1IcsN0/z7gz4FtwK1JAE5U1ZVD/g9Ikn5oPZdNqKq7gLtmtu1b9vV7gPcMO5okaTV+\nwlKSGjLektSQ8Zakhoy3JDVkvCWpIeMtSQ0Zb0lqyHhLUkPGW5IaMt6S1JDxlqSGjLckNWS8Jakh\n4y1JDRlvSWrIeEtSQ8Zbkhoy3pLUkPGWpIaMtyQ1ZLwlqSHjLUkNGW9Jash4S1JDxluSGjLektSQ\n8Zakhoy3JDVkvCWpIeMtSQ0Zb0lqyHhLUkPGW5IaMt6S1JDxlqSGjLckNWS8Jakh4y1JDRlvSWrI\neEtSQ8Zbkhoy3pLUkPGWpIaMtyQ1NDfeSXYlOZjkUJK9q6z5yHT/Q0leOfyYkqTl1ox3ki3ALcAu\n4Apgd5KXzay5FnhJVV0OvBe49RzNKkmamnfmfSVwuKoeqaoTwH7gupk1bwU+AVBVDwDPT3Lp4JNK\nkk6ZF+8dwJFlt49Ot81bc9nZjyZJWs28eNc67ycb/O8kSRuwdc7+Y8DOZbd3MjmzXmvNZdNtp0lm\nG785ZanHnF14PIfT5Feojc7Hc168HwQuT/Ji4FHgbcDumTV3AHuA/UmuAv67qh6bvaOqanyYJGlz\nWTPeVXUyyR7gHmALcHtVHUhyw3T/vqq6M8m1SQ4D3wWuP+dTS9IFLlVenpakbuZdNrkgJNkG/BLw\neiaXij5VVT9Y7FR9JXkBk7eZvoHJO5Fum77VVGcokxeKrgHeDHwZ+LuqOrnYqfpK8pPAS4F7u/+O\nX/Afj0/yLOBm4P3A94DfB9690KH6uxn4QyaX0X4F2JMur1ZvPlcxOZZPMrkk+U6P5cYkeSfwAPBZ\n4Ben29o20DPvib+sqgMASb4FvDrJc6vqewueq6t3PX1Wk+R9wGvK63MbdT1wf1XdlOTXgF8FfplJ\ngHRmxkyeDf4xk2faD3D625zbaPuvzlCq6v+mL8JumW56FPgFw71xVfWDJC9M8kEm7076wqJnauwb\n/DAwXwYeY3rWqDNTVd+sqiPAYeA1i57nbF3w8X7asutf7wA+ushZzhMFXAJ8BnhPktcveJ52ppdH\njgPPm256HPhP4KcWNtT54cvAS6ZfP7XIQc6Gl02WSbIL2MYkODoLVfU48D6AJE8B1yS5r6ra/rI8\n06qqknwTeF2S51XVk0lOAk8+fXvRMzb1H8Bzkryoqr616GE2yjNvTp3hALwd+FBVfTvJFl8YGszF\nTH7WvO595h4EXgi8bnr7VcCThnvjph8i/AbwIuj7omXLoYc2PcP5A+AtwLuSfAH4GPATi52spyQX\nJbkiye8l+Ssmbxu8xxctz1xVfRv4J+C9ST4GvBy4b7FT9ZZkJ/AC4I7pMf25BY+0IX5IZyrJHzF5\nW9vfAgeAL/kUf+OSvB34DeBfgM9X1VcXPFJbSS4C3gi8FvhMVX1lwSO1leRi4FPAd5j8bI6r6sHF\nTrUxxluSGvKyiSQ1ZLwlqSHjLUkNGW9Jash4S1JDxluSGjLektSQ8Zakhv4fy5+rztM0M18AAAAA\nSUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 44 }, { "cell_type": "code", "collapsed": false, "input": [ "# Reminder:\n", "\n", "print \"d0\", d0.words\n", "print \"d1\", 
d1.words\n", "print \"d2\", d2.words\n", "print \"d3\", d3.words\n", "\n", "# show the distances, which are used to get the hierarchy:\n", "print \"d2, d3 distance\", 1-simple.similarity(d2,d3)\n", "print \"d0, d1 distance\", 1-simple.similarity(d0,d1)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "d0 {u'tiger': 1, u'stripes': 1, u'yellow': 1, u'cat': 1}\n", "d1 {u'lion': 1, u'manes': 1, u'yellow': 1, u'cat': 1}\n", "d2 {u'slurf': 1, u'grey': 1, u'animal': 1, u'elephant': 1}\n", "d3 {u'animal': 1, u'elephant': 1}\n", "d2, d3 distance 0.5527864045\n", "d0, d1 distance 0.8\n" ] } ], "prompt_number": 45 }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Let's do this with the fairy tales now... " ] }, { "cell_type": "code", "collapsed": false, "input": [ "mtfidf.export('data/csv/fairy.tsv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 46 }, { "cell_type": "code", "collapsed": false, "input": [ "fairyrows = read_weka_tfidf('data/csv/fairy.tsv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 47 }, { "cell_type": "code", "collapsed": false, "input": [ "len(fairyrows)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 48, "text": [ "43" ] } ], "prompt_number": 48 }, { "cell_type": "code", "collapsed": false, "input": [ "len(fairyrows[0]) # words in the vector" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 49, "text": [ "5292" ] } ], "prompt_number": 49 }, { "cell_type": "code", "collapsed": false, "input": [ "def make_dend(data, labels=None, height=6):\n", " from pylab import rcParams\n", " dist = pdist(data, metric='cosine')\n", " link = linkage(dist, method='complete')\n", " rcParams['figure.figsize'] = 6, height\n", " rcParams['axes.labelsize'] = 5\n", " if not labels:\n", " dend = dendrogram(link, 
orientation='right') #labels=names)\n", " else:\n", " dend = dendrogram(link, orientation='right', labels=labels)\n", " return dist" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 82 }, { "cell_type": "code", "collapsed": false, "input": [ "# if you want to label by doc names\n", "names = [doc.name for doc in docs]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 51 }, { "cell_type": "code", "collapsed": false, "input": [ "dist = make_dend(fairyrows, height=15)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAAWwAAANhCAYAAADDnoFOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xu4ZXV95/n3p7iIhAh1xKZQysE44MwkkUuUGMUAXrqR\nGC+dUdtOYqK5TZwWxmk1SKImk04TjVHHPGk6HcEgtmTyoBKvQSQgMGqppIpLVRnU0TQYq1DrgBDi\nI1rf+WOvao7Hc/bep+qsvc/v7Pfrec7D3muv9a21BT78/J3f+n1TVUiS1r4N074BSdJ4DGxJaoSB\nLUmNMLAlqREGtiQ1wsCWpEYcPOzDJK75k6T9UFVZ7ZojR9hVtV8/b3jDG/b72vXw4/f3+0/7Hvz+\n0/vpi1MiktQIA1uSGtFbYJ955pl9lW6C3//Mad/CVPn9z5z2LaxLGTbfkqT6nI+RpPUoCbXgl45J\nDgM+ATwEOBT466p6bZKTgP8M/BDwFeDnq+re5eo6JSJJPauqbwNnVdXJwOOBs5KcDrwDeE1VPR54\nP/DqYXUMbEmagKq6v3t5KHAQMA+cUFU3dMc/DvzcsBoGtiRNQJINSbYBu4Frq2o7sD3Jc7tTXgBs\nHlbDwJakCaiqvd2UyHHATyc5E3gZ8PIknwOOAL4zrMbQJx012twczM9P+y4ktaKq7knyYeAJVfVm\n4F8BJDkR+Jlh1xrYB2h+HlxII2mhZPH7HA18t6ruTvJQ4JnA7yV5RFV9PckG4HeAi4bVdUpEkvp3\nLPC33Rz2FuCDVXUN8G+T/D2wE7izqv5iWBHXYR+gxBG2pO+3eB32anGELUmNMLAlqREGtiQ1wsCW\npEYY2JLUCAN7DHNzg9UgS/1I0qS4rG8Mw5buuaxP0mIu65OkGWdgS1IjDGxJaoSbP0lSz4a0CPt/\ngBO7044C7q6qU5arY2BLUs+q6ttJzqqq+5McDNyY5PSqetG+c5K8Gbh7WB0DW5ImYIkWYXv2fZYk\nwAuBs4bVcA5bkiZgiRZhOxZ8/FRgd1V9aVgNA1uSJmCZFmH7vBh4z6gaTokcoI0bfeJR0vgWtggD\nruvmtJ8PnDrqWgP7AO3ZM/ocSbNl3BZh3cfPAHZW1T+OqmtgS1L/jgUu7Xo3bgAu61qEAbwIuHyc\nIu4lMgb3C5G0Eu4lIkkzzsCWpEYY2JLUCANbkhphYEtSIwxsSWqEgS1JjTCwJakRBrYkNcLAlqRG\nGNiS1Ag3f5KkniXZDLwL+BdAAf+lqt5uT0dJWnseAF5ZVduSHAHclOTqpno6zs3B/Pw070CS+ldV\nu4Bd3ev7kuwEHgnshPF7Ok41sOfn29i21I4yklZLkuOBU
4AtCw6P1dPRKZEp8f9dSLOnmw65Ajiv\nqu5b8NFYPR2n2sCglcYAfdxnK99d0sot1cAgySHAh4CPVtXbFhw/GLgTOHVUmzCX9UlSz7o56ouB\nHQvDujN2T0cDW5L69xTgF4Czkmztfs7uPmujp2Mr0wJOiUhaCXs6StKMM7AlqREGtiQ1wsCWpEYY\n2JLUCAN7AubmBqtCFv5I0kq5rG8MB3qfS13fyneXtHIu65OkGWdgS1IjDGxJaoSBLUk9S3JJkt1J\nbl1w7HeT3LnE3iLLMrAlqX/vBBYHcgFvqapTup+/GVXEwJaknlXVDcBSLUtWtJLEwJak6XlFkpuT\nXJzkqFEnG9iSNB0XAY8BTga+BvzxqAvs6TglGzf6xKM0y6rqrn2vk7wD+OCoawzsKdmzZ9p3IKkv\n4wzGkhxbVV/r3j4fuHXY+WBgS1LvklwOnAEcneQO4A3AmUlOZrBa5MvAb4ys414io/Wxl4ik9cu9\nRCRpxjklMgZ/QShpLXBKZAJm5XtKGnBKRJJm3EQDe3HnFUnS+CY6JbJ4amBWpgpm5XtKGnBKRJJm\nnIEtSY0wsCWpEQa2JDXCwJakRhjYktSzJIcl2ZJkW5IdSS5c9Pm/T7I3ydywOj6aLkk9q6pvJzmr\nqu5PcjBwY5LTq+rGJJuBZwL/MKqOI2xJmoCqur97eShwELBvV/y3AK8Zp4aBLUkTkGRDkm3AbuDa\nqtqR5LnAnVV1yzg1nBKZAHf7k1RVe4GTkxwJXJXkHOC1wL9ccNrQpPDRdElaZaMeTU/yOgadZl4B\n7JsqOQ74KnDawn6PCzklIkk9S3J0kqO61w9l8EvGT1XVMVX1mKp6DHAncOpyYQ1OiUjSJBwLXJpk\nA4OB8mVVdc2ic0bONzglIkmrzN36JGnGGdiS1AgDW5IaYWBLUiMMbElqhIEtSY0wsCWpEQa2JDXC\nwJakRkwksOfm3K1Okg7URAJ7ft5H0CXNriSXJNmd5NYlPhurPRg4JSJJk/BO4OzFB1fSHgwMbEnq\nXVXdAMwv8dHY7cHAwJakqVhpezBwP2xJmrgkhwMXMJgO+e+HR1031cC216GkGfVY4Hjg5gxC8Djg\npiTLtgeDKQf2nj2jz5Gk1owaiFbVrcAxD56fLwM/UVVDU9E5bEnqWZLLgU8CJya5I8lLF50y1sLn\nibQI29cKzJZgkmaBLcIkacYZ2JLUCANbkhphYEtSIwxsSWqEgS1JjTCwJakRBrYkNcLAlqRGGNiS\n1AgDW5IaYWBLUs+SbE5ybZLtSW5Lcm53/KQkn0pyS5IPJPnhoXXc/EmSVtfizZ+SbAI2VdW2JEcA\nNwHPA94F/J9VdUO3g99jqur1y9V1hC1JPauqXVW1rXt9H7ATeBRwQtfvEeDjwM8Nq2NgS9IEJTke\nOAXYAmzvejsCvADYPOzaiXacsSWYpFnWTYdcAZxXVfcmeRnw9iSvAz4AfGfo9ZOcw5akWbBUA4Mk\nhwAfAj5aVW9b4poTgcuq6ieXq+uUiCT1LINOuxcDOxaGdZJHdH/dAPwOcNHQOo6wJWl1LbFK5HTg\neuAWHuzfeAFwAvC/d+/fW1UXDK1rYEvS6rKnoyTNOANbkhphYEtSIwxsSWqEgS1JjTCwJakRBrYk\nNcLAlqRGGNiS1AgDW5IaYWBL0oQkOSjJ1iQf7N7PJbk6ye1JPpbkqGHXG9iSNDnnATt4cAOo84Gr\nq+pE4Jru/bIMbEmagCTHAecA7wD2bQz1HODS7vWlDPo8LsvAlqTJeCvwamDvgmPHVNXu7vVu4Jhh\nBQxsSepZkmcDd1XVVh4cXX+fbi/roRtRT7Sn42qZm4P5+WnfhSSN7cnAc5KcAxwGPCzJZcDuJJuq\naleSY4G7hhXpvYHBvnBdzQYGNkSQtJYNa2CQ5AzgVVX1s0neBHyzqt6Y5HzgqKpa9hePvU+JOBKW\npB+wb8j5h8Azk9wOP
K17v6zeR9jp/hvjCFvSrLBFmCTNOANbkhphYEtSIwxsSWqEgS1JjTCwJakR\nUw/subnBMr2V/EjSLJr6Ouz9WVPtOmxJa5nrsCVpxhnYktQIA1uSGmFgS9IEJDk7yeeTfCHJb+1X\nDX/pKEmra/EvHZMcBPw98Azgq8BngRdX1c6V1HWELUn9Ow34YlV9paoeAP4SeO5KixjYktS/RwF3\nLHh/Z3dsRZpsEbZxow/QSGrKqkziNhnYe/ZM+w4kaXlLDCi/Cmxe8H4zg1H2ijglIkn9+xxwQpLj\nkxwKvAj4wEqLNDnClqSWVNV3k/w74CrgIODila4QgUaX9UnSWrYu9xKZm5vmny5JbZnqCLuPjuqS\nNG3rcoQtSRqfgS1JjTCwJakRBrYkNcLAlqRGGNiS1AgDW5IaYWBLUiMMbEmakCQHJdma5IP7c72B\nLUmTcx6wg/3cH9vAlqQJSHIccA7wDmC/Hls3sCVpMt4KvBrYu78FDGxJ6lmSZwN3VdVW9nN0DWug\ngYH9GSXNgCcDz0lyDnAY8LAk76qql6ykyNS3V3VrVUnrzbDtVZOcAbyqqn52pXWdEpGkyduvoaoj\nbElaZTYwkKQZZ2BLUiMMbElqxESW9bl0T5IO3ER+6bhcCX/pKGk98peOkjTjDGxJaoSBLUmNMLAl\nqREGtiQ1wsCWpEYY2JLUsySHJdmSZFuSHUku7I7/fpKbu+PXJNk8tI7rsCVpdS21DjvJ4VV1f5KD\ngRuBVwE3V9W93eevAE6qql9drq4jbEmagKq6v3t5KHAQsGdfWHeOAL4xrMbUO85I0ixIsgH4O+Cx\nwEVVtaM7/gfALwL3A08aWmM9TonMzcH8/OrXlaTxDO04cyRwFXB+VV234Pj5wOOq6qXLVV2XI+z5\neefGJU3PsM3uquqeJB8GngBct+Cj9wAfGVbXOWxJ6lmSo5Mc1b1+KPBMYGuS/3HBac8Ftg6rsy5H\n2JK0xhwLXNrNY28ALquqa5JckeRxwPeALwG/OazIupzDdrmgpGlye1VJmnEGtiQ1wsCWpEYY2JLU\nCANbkhox1WV9dlOXpPFNdVlfX1zWJ2maXNYnSTPOwJakRhjYktQIA1uSepZkc5Jrk2xPcluSc7vj\nL+iOfS/JqaPquPmTJPXvAeCVVbUtyRHATUmuBm4Fng/82ThFDGxJ6llV7QJ2da/vS7ITeGRVXQOD\nVSXjcEpEkiYoyfHAKcCWlV5rYEvShHTTIVcA51XVfSu9fl1OifgEpaS1JskhwHuBd1fVlftTY10G\n9p49074DSbNs8YAxg0nqi4EdVfW25S4bWXc9PpouSdO0+NH0JKcD1wO3APsS8QLgIcCfAEcD9wBb\nq+pZy9Y1sCVpdbmXiCTNOANbkhphYEtSIwxsSWqEgS1JjTCwJakRBrYkNaLXwJ6b67O6JM2WXh+c\n2fd4pg/OSJolPjgjSTPOwJakRhjYktSzJIcl2ZJkW5IdSS7sjtvTUZLWkqr6dpKzqur+JAcDN3Y7\n+NnTUZLWmqq6v3t5KHAQsKeqPg/2dJSkNSXJhiTbgN3AtVW1Y6U11t0Ie24O5uenfReS9P2qai9w\ncpIjgauSnFlV162kxroL7Pl5131Lmq5hMxxVdU+SDwNPAK5bSV2nRCSpZ0mOTnJU9/qhwDOBrYtP\nG1XHwJak/h0L/G03h70F+GBVXZPk+UnuAJ4EfDjJR4cVWXePpttDUtK0+Wi6JM04A1uSGmFgS1Ij\nDGxJaoSBLUmNWBMPzvh0oiSNtiaW9a3mUjyX9UmaNpf1SdKMM7AlqREGtiQ1wsCWpJ4t1yJspdbE\nKhFJWs+WaxFWVTeupI4jbEmagKVahK20hoEtSROwGi3CDGxJmoCq2ltVJwPHAT+d5MyV1lh3c9gb\nNw5vzyNJ03QgLcLWXWDvWfGskCStrsWDxiRHA9+tqrsXtAj7vZXWXXeBLUlr0LHApUk
2MJiKvqyq\nrllpkXW3l4gkTdu62Utkbm4Q0At/JEmjTXyEvdRo2hG2pPVk3YywJUn7x8CWpEYY2JLUCANbkhph\nYEtSIwxsSWqEgS1JjTCwJakRBrYk9Wy5FmFJTkvymSRbk3w2yROH1XHzJ0nq2XItwoDfB15XVVcl\neRbwJuCs5eo4wpakCViiRdg8sAs4sjt+FPDVYTXcS0SSVtlSe4l0W6v+HfBY4KKqek2S/wG4ESgG\nA+ifqqo7lqvrCFuSJmCZFmEXA+dW1aOBVwKXDKuxJkbYc3MwP7/ff4wkrTHDd+tL8jrgn4HXV9XD\numMB7q6qI5e7bk380tG2XpLWkzFbhP1fwBeTnFFVnwCeBtw+rO5EA3tubpJ/miStGUu1CPt4kl8H\n/jTJQxiMuH99WJGJTomM2zJMklpmAwNJmnEGtiQ1wsCWpEYY2JLUCANbkhphYEtSIwxsSWqEgS1J\njTCwJakRBrYkNcLAlqRGGNiS1LMkm5Ncm2R7ktuSnNsdn0tydZLbk3wsyVFD67j5kyStrsWbPyXZ\nBGyqqm1JjgBuAp4HvBT4RlW9KclvARur6vzl6jrClqSeVdWuqtrWvb4P2Ak8CngOcGl32qUMQnxZ\nBrYkTVCS44FTgC3AMVW1u/toN3DMsGsNbEmakG465L3AeVV178LPuvnnoRPGE28RtnHjD7bPkaT1\nLskhDML6sqq6sju8O8mmqtqV5FjgrmE1Jh7Y9m+UtN4t0dMxDDqk76iqty346APALwFv7P56JUNM\nfJWIK0QkrXdLrBI5HbgeuIUHpz1eC3wG+Cvg0cBXgBdW1d3L1jWwJWl12dNRkmacgS1JjTCwJakR\nBrYkNcLAlqRGGNiS1AgDW5IaYWBLUiMMbElqhIEtSY0wsCWpZ0kOS7IlybYkO5JcuOCzVyTZ2bUO\ne+OwOhPfrU+SZk1VfTvJWVV1f5KDgRu7DaEOYdB15vFV9UCSRwyr4whbkiagqu7vXh4KHATMA/8b\ncGFVPdCd8/VhNQxsSZqAJBuSbGPQCuzaqtoOnAj8dJJPJ7kuyROG1XBKRJImoKr2AicnORK4KsmZ\nDDJ4Y1U9KckTGeyN/SPL1TCwR5ibg/n5ad+FpPWiqu5J8mHgCcCdwPu6459NsjfJw6vqm0tda2CP\nMD9v0wVJK7NEi7Cjge9W1d1JHgo8E/g94F7gacAnkpwIHLpcWMMEAtumu5LEscClSTYw+N3hZVV1\nTZLrgUuS3Ap8B3jJsCK9twhbeHmLLcJavGdJ02WLMEmacQa2JDXCwJakRhjYktQIA1uSGmFgS1Ij\n1vSDMz5lKEkPWtPrsNfCGui1cA+S2uI6bEmacQa2JDXCwJakRhjYktSzJJck2d1t8rTw+Nj9HMHA\nlqRJeCdw9sIDSc7iwX6OPwa8eVQRA1uSelZVNzDo4bjQb7KCfo5gYEvStJzACvo5whp/cGYtsAGD\npJ6sqJ/jvgs0xJ49074DSa0Zc5C3on6O4JSIJE3LlQz6OTJOP0dwhC1JvUtyOXAG8PAkdwCvBy5h\nBf0cwb1EJGnVrau9RObmBmE86keS9KCpjLDHHTk7wpbUonU1wpYkrZyBLUmNMLAlqREGtiQ1wsCW\npEYY2JLUCANbkhphYEtSIwxsSepZkscl2brg554k5664jk86StLqGvakY5INwFeB06rqjpXUdYQt\nSZP1DOBLKw1rMLAladL+DfCe/bnQKRFJWmXLTYkkOZTBdMj/Mk7T3cXWdAMD+ylKWmeeBdy0P2EN\nazyw7acoqUVDBpovBi7f77qTnBKZm4P5+cFrpzokrVdLTYkk+SHgH4DHVNW9+1V3koE96rgkrQc2\nMJCkGWdgS1IjDGxJaoSBLUmNMLAlqREGtiQ1wsCWpEYY2JLUCANbkhphYEtSIwxsSWqEgS1JPUty\nWJItSbYl2ZHkwu747ya5c0Gvx7OH1nHzJ0laXcv
s1nd4Vd2f5GDgRuBVwNOBe6vqLePUdYQtSRNQ\nVfd3Lw8FDgK6zaYZe1c/A1uSJiDJhiTbgN3AtVW1vfvoFUluTnJxkqOG1pjGlMjCRgaStP4svx92\nkiOBq4DzgR3AvnZhvw8cW1W/slzVqbQIs/WXpPVsWC/aqronyYeBJ1TVdQ9ek3cAHxxW1ykRSepZ\nkqP3TXckeSjwTGBrkk0LTns+cOuwOmu6Ca8krRPHApcm2cBgoHxZVV2T5F1JTgYK+DLwG8OKTGUO\nW5LWM3s6StKMM7AlqREGtiQ1wsCWpEYY2JLUCANbkhphYEtSIwxsSWqEgS1JjTCwJakRBrYk9SzJ\nJUl2J7l1wbE/SrKz2wv7fd22q0MZ2JLUv3cCi/s1fgz40ao6CbgdeO2oIga2JPWsqm7gwZZg+45d\nXVV7u7dbgONG1TGwJWn6XgZ8ZNRJBrYkTVGS3wa+U1XvGXWuDQymzP6W0uxK8svAOcDTxznfwJ6y\n+XmbPEjrzbCejg+ek7OBVwNnVNW3x6prx5np8n8jaf1Z3HEmyeXAGcDRwG7gDQxWhRwK7GtL/qmq\nevnQugb2dPm/kbT+2CJMkmacgS1JjTCwJakRBrYkNcLAlqRGGNiS1IjeH5zxST5JWh29r8MG1xkP\n4zpsaf1xHbYkzTgDW5IaYWBLUiMMbEnq2TI9HV+QZHuS7yU5dZw6BrYk9W+pno63As8Hrh+3iPth\nS1LPquqGJMcvOvZ5GKwoGZcjbElqhCPsKdu4cbzuFJJkYE/Znj2jz5HUlr4GYU6JSNL0jRXxPpou\nSatszJ6Oe4A/6Y7dA2ytqmcNrWtgS9Lqci8RSZpxBrYkNcLAlqRGGNiS1AgDW5IaYWBLUiMMbElq\nhIEtSY0wsCWpEQa2JDXCwJakniXZnOTariXYbUnO7Y6fluQzSbYm+WySJw6t414ikrS6ltj8aROw\nqaq2JTkCuAl4HnARcGFVXZXkWcBrquqs5eq6H7Yk9ayqdgG7utf3JdkJPAr4GnBkd9pRwFeH1XGE\nLUmrbNhufV1vx08APwo8HLgRKAZT1D9VVXcsV9c5bEmakG465ArgvKq6D7gYOLeqHg28Erhk6PV9\nj7A3boT5+f0uIUkN+sERdpJDgA8BH62qt3XHvlVVD+teB7i7qo78gXKd3uew7VkoadYs7unYhfHF\nwI59Yd35YpIzquoTwNOA24fW7XuE7fy1pFmzxCqR04HrgVsYzFcDXAB8HfhT4CHAPwMvr6qty9Y1\nsCVpddkiTJJmnIEtSY0wsCWpEQa2JDXCwJakRhjYktQIA1uSGmFgS1IjDGxJaoSBLUmNMLAlqREG\ntiT1LMlhSbYk2ZZkR5ILu+NzSa5OcnuSjyU5amgdN3+SpNW11OZPSQ6vqvuTHMygy8yrgOcA36iq\nNyX5LWBjVZ2/XF1H2JI0AVV1f/fyUOAgYJ5BYF/aHb+UQWPeZRnYkjQBSTYk2QbsBq6tqu3AMVW1\nuztlN3DMsBp2TR9ibs72ZpJWR1XtBU5OciRwVZKzFn1eSYZOIhvYQ8zPOwcvaeUWtwhbqKruSfJh\n4CeA3Uk2VdWuJMcCdw2r65SIJPUsydH7VoAkeSjwTGAr8AHgl7rTfgm4clgdR9iS1L9jgUuTbGAw\nUL6sqq5JshX4qyS/AnwFeOGwIi7rG6L1+5c0HfZ0lKQZZ2BLUiMMbElqhIEtSY0wsCWpEU0u6/MJ\nREmzqMllfZNabueyPkn7w2V9kjTjDGxJaoSBLUmNMLAlqWdJNie5Nsn2JLclOXfR5/8+yd4kc8Pq\nNLlKRJIa8wDwyqraluQI4KYkV1fVziSbGeze9w+jijjClqSeVdWuqtrWvb4P2Ak8svv4LcBrxqlj\nYEvSBCU5HjgF2JLkucCdVXXLONc6JSJJE9JNh1wBnAfsBS5gMB3y308Zdr2BPcTGjcNb/UjSuJIc\nArwXeHdVXZn
kx4HjgZszCJrjGMxtn1ZVS7YK80lHSVpli590zCCRLwW+WVWvXOaaLwM/UVV7lqvr\nHLYk9e8pwC8AZyXZ2v08a9E5I4ehjrAlaZW5l4gkzbg1E9hzc4OR8zg/kjSL1syUSF/nStKkOSUi\nSTPOwJakRhjYktQIA1uSGmFgS1IjDGxJaoSBLUmNMLAlqREGtiT1LMklSXYnufVA6hjYktS/dwJn\nH2gRA1uSelZVNwDzB1rHwJakRjTZIszWXZJmUZOBvWfZBjqSNH19DSidEpGkRkwssEc1KJCk9SrJ\n5cAngROT3JHkpftVZ1INDEY1HbApgaT1wgYGkjTjDGxJaoSBLUmNMLAlqREGtiQ1wsCWpEYY2JLU\nCANbkhphYEtSIwxsSWqEgS1JPVuqRViSuSRXJ7k9yceSHDWqjoEtSf1bqkXY+cDVVXUicE33figD\nW5J6tkyLsOcAl3avLwWeN6qOgS1J03FMVe3uXu8Gjhl1gYEtSVPW7WM9coPpNdMizD6NkmbM7iSb\nqmpXkmOBu0ZdMJHAnpsbfY59GiWtF2MOPj8A/BLwxu6vV46sO4mOM/tu3o4ykmbB4o4zXYuwM4Cj\nGcxXvx74a+CvgEcDXwFeWFV3D61rYEvS6rJFmCTNOANbkhphYEtSIwxsSWqEgS1JjTCwJakRBrYk\nNcLAlqRGGNiS1AgDW5IaYWBLUs+SbE5ybZLtSW5Lcu6Cz16RZGd3/I3D6qyZ7VUlaR17AHhlVW1L\ncgRwU5KrgU0MOs88vqoeSPKIYUUMbEnqWVXtAnZ1r+9LshN4FPBrwIVV9UD32deH1XFKRJImKMnx\nwCnAFuBE4KeTfDrJdUmeMOxaR9iSNCHddMgVwHlVdW+Sg4GNVfWkJE9ksD/2jyx3/cQC2xZgkmZZ\nkkOA9wLvrqp93WXuBN4HUFWfTbI3ycOr6ptL1ZhYYNsCTNKsWDw4TRLgYmBHVb1twUdXAk8DPpHk\nRODQ5cIanBKRpEl4CvALwC1JtnbHXgtcAlyS5FbgO8BLhhWZWIsw24NJmhW2CJOkGWdgS1IjDGxJ\naoSBLUmNMLAlqREGtiQ1wsCWpEYY2JLUCANbkhphYEtSIwxsSWpE74HtlqqSBEnOS3Jr17vxvP2p\n0Xtgu+mTpFmX5MeAXwWeCJwEPDvJY1daxykRSerf/wRsqapvV9X3gE8A/3qlRQxsSerfbcBTk8wl\nORz4GeC4lRaxgYEk9ayqPp/kjcDHgH8CtgJ7V1pnJhsYzM3B/Py070LS+jW8gUGS/wj8t6r6zyuq\nOouBvdbuR9L6slTHmST/oqruSvJo4CrgJ6vqWyup65SIJE3GFUkeDjwAvHylYQ2OsCVp1dnTUZJm\nnIEtSY0wsCWpEQa2JDXCwJakRkxlWZ8PrkjSyk1lWd+0l9VN+8+XtL65rE+SZpyBLUmNMLAlqRHu\nJSJJE5DkK8C3gO8BD1TVaSutYWBL0mQUcGZV7dnfAk6JSNLkHNDKEQNbkiajgI8n+VySX9ufAk6J\nSNJkPKWqvpbkEcDVST5fVTespMBMBvbGjYOHZyRpUqrqa91fv57k/cBpgIE9yp79nvKXpNEWDwi7\nTukHVdW9SX4I+JfA76207kwGtiRN2DHA+zNI8oOB/1pVH1tpkZncS0SS+uReIpI04wxsSWqEgS1J\njTCwJakRBrYkNcLAlqRGTGQdtk8WStKBm8g67HGPS9J64DpsSZpxBrYkNcLAlqRGGNiS1LMklyTZ\nneTWBcdOSvKpJLck+UCSHx5Vx8CWpP69Ezh70bF3AK+pqscD7wdePaqIgS1JPes6y8wvOnzCgo4z\nHwd+blQdA1uSpmN7kud2r18AbB51gQ0MFpmbg/nF/x2UpNX3MuDtSV4HfAD4zqgLDOxF5ud9qEfS\ngRnnye5SxkR3AAAQd0lEQVSq+nvgXw3Oz4nAz4y6xikRSZqCrns6STYAvwNcN
OoaA1uSepbkcuCT\nwOOS3JHkZcCLk/w9sBO4s6r+YmQd9xL5fmv53iS1wb1EJGnGGdiS1AgDW5IaYWBLUiMMbElqxFQe\nnLFlmCSt3FSW9a1lLd6zpLXFZX2SNOMMbElqhIEtSY0wsCWpZ8u0CPv9JDcn2ZbkmiQj98P2l46L\ntHjPktaWxb90TPJU4D7gXVX1492xH66qe7vXrwBOqqpfHVbXEbYk9WypFmH7wrpzBPCNUXVsYCBJ\nU5LkD4BfBO4HnjTqfEfYkjQlVfXbVfVo4C+At4463xH2Ij6FKWkK3gN8ZNRJBvYie/ZM+w4ktW6c\nQV+SE6rqC93b5wJbR11jYEtSz7oWYWcARye5A3gDcE6SxwHfA74E/ObIOi7rk6TV5V4ikjTjDGxJ\naoSBLUmNMLAlqREGtiQ1wsCWpEYY2JLUCANbkhphYEtSIwxsSWqEgS1JjTCwJalnSTYnuTbJ9iS3\nJTm3O76ivo5u/iRJq2yJno6bgE1VtS3JEcBNwPOAO1fS19ERtiT1rKp2VdW27vV9wE7gkSvt6+h+\n2JI0QUmOB04BtnTvx+7rOPNTInNzMD8/+jxJGt/S+2F30yHXAf+hqq5c9Nn5wOOq6qXLVp31wG7h\nHiW1ZakGBkkOAT4EfLSq3rbENY8GPlJVP7ZcXeewJalnSQJcDOxYGNZJTlhw2si+jo6wG7hHSW1Z\nYpXI6cD1wC3AvsS5APgV4Pv6OlbVXcvWNbDX/j1Kaos9HSVpxhnYktQIA1uSGmFgS1IjDGxJasTM\nBPbc3GBFyOIfSWrFzCzrW+5e1tI9SlofXNYnSTPOwJakRhjYktQIA1uSJiDJUUmuSLIzyY4kQ/e+\nXooNDCRpMv5vBtun/q9JDgZ+aKUFXCWyhu5R0vqwxG59RwJbq+pHDqSuUyKS1L/HAF9P8s4kf5fk\nz5McvtIiBrYk9e9g4FTgP1XVqcA/AefvT5GZtnGjTzxK6t2dwJ1V9dnu/RUY2Cu3Z8+070DSerN4\nEFhVu5LckeTEqrodeAawfcV1Z/2XjpK02pZpwnsS8A7gUAbtwF5aVfesqK6BLUmry71EJGnGravA\nXm4LVX+pKGk9WFdTIsP+PKdEJE2KUyKSNOMMbElqhIEtSY0wsCWpEQa2JDXCwJakRhjYktQIA1uS\nGmFgS1LPkmxOcm2S7UluS3Jud/yPuh6PNyd5X9eZZvk6PukoSatriRZhm4BNVbUtyRHATcDzgOOA\na6pqb5I/BKiqZffJdoQtST2rql1Vta17fR+wE3hkVV1dVXu707YwCPBlGdiSNEFJjgdOYRDQC70M\n+Miwa2em44ytwCRNWzcdcgVwXjfS3nf8t4HvVNV7hl0/M4FtKzBJk7LU4DDJIcB7gXdX1ZULjv8y\ncA7w9FF1ZyawJWlakgS4GNhRVW9bcPxs4NXAGVX17ZF1WlklMjcH8/Ojz3MliKRpW2KVyOnA9cAt\nwL6UugB4O4Mej/vmAD5VVS9ftm4rgT1OLZfuSVoLbGAgSTPOwJakRhjYktQIA1uSGmFgS1IjDGxJ\naoSBLUmNMLAlqREGtiQ1wsCWpEYY2JI0AUle27UIuzXJe5I8ZKU1DGxJ6lnXtODXgFOr6seBg4B/\ns9I6bq8qSf37FvAAcHiS7wGHA19daRFH2JLUs6raA/wx8N+AfwTurqqPr7SOgS1JPUvyWOD/AI4H\nHgkckeTnV1pnXU2J2LdR0hr1BOCTVfVNgCTvA54M/NeVFJlqYI/bRWZc9m2UtBYsMXD8PPC6JA8F\nvg08A/jMSutONbDn58fvEOPIWVKrqurmJO8CPgfsBf4O+C8rrTPVFmEraell+y9JrbBFmCTNOANb\nkhphYEtSIwxsSWqEgS1JjTCwJakRBrYkNcLAlqRGGNiS1AgDW5IaYWBLUiMMbEnqWZJLkuxOcuuC\nY6cl+UySrUk+m+SJo+oY2JLUv3cCZy869
ibgdVV1CvD67v1QBrYk9ayqbgAW7/7/NeDI7vVRjNHj\ncV11nJGkhpwP3JjkzQwGzz816oJmAtv2X5LWmYuBc6vq/UleAFwCPHPYBc00MJCkVizVwCDJ8cAH\nq+rHu/ffqqqHda/DoJP6kYtrLeQctiRNxxeTnNG9fhpw+6gLmpkSkaRWJbkcOAM4OskdDFaF/Drw\np0keAvxz9354HadEJGl12dNRkmbcRAN7bm4wqt73I0ka30SnREa9l6T1wCkRSZpxBrYkNcLAlqRG\nGNiS1AgDW5IaYWBLUiMMbElqhIEtST1LsjnJtUm2J7ktybnd8d9NcmfXJmxrksVdab6/jg/OSNLq\nWvzgTJJNwKaq2pbkCOAm4HnAC4F7q+ot49R1tz5J6llV7QJ2da/vS7ITeFT38dhPRDolIkkT1DUy\nOAX4dHfoFUluTnJxkqOGXWtgS9KEdNMhVwDnVdV9wEXAY4CTGTTl/eOh1zuH3b+5OZhf3C9Z0jq2\nZIuwQ4APAR+tqrf9wBWLWogtxTnsCZifn43/MEkaWLx9dNez8WJgx8KwTnJsVX2te/t84NahdR1h\n929WvqekgSVWiZwOXA/cAuxLgwuAFzOYDingy8BvVNXuZesa2P2ble8pacD9sCVpxhnYktQIA1uS\nGmFgS1IjDGxJasRU12Fv3PiD6xUlSUub6rK+WTGr31uaVS7rk6QZZ2BLUiMMbElqhIEtSY0wsCWp\nZ0kuSbI7ya0Ljv3lgl6OX06ydWQdV4n0b1a/tzSrltit76nAfcC7ltrvOsmbgbur6j8Mq+t+2JLU\ns6q6oWtQ8AO6vbJfCJw1qo5TIpI0XU8FdlfVl0ad6Ah7AnyiU9IQLwbeM86JBvYE7Nkz7TuQNEnj\nDtCSHMygNdip45zvlIgkTc8zgJ1V9Y/jnGxgS1LPklwOfBI4MckdSV7affQi4PKx67isT5JWV9Ob\nP83N+Us3STpQExlhL/6rJK1nTY+wJUkHzsCWpEYY2JLUCANbkhphYEtSIwxsSWqEgS1JjTCwJakR\nBrYkTUiSg7qWYB/s3r8gyfYk30sycsc+A1uSJuc8YAew75nvWxlsr3r9OBcb2JI0AUmOA84B3gEE\noKo+X1W3j1vDwJakyXgr8Gpg7/4WMLAlqWdJng3cVVVb6UbX+2NNtwibm4P5+WnfhSQdsCcDz0ly\nDnAY8LAk76qql6ykyJreXtXtWCW1aNj2qknOAF5VVT+74Ni13bGbhtV1SkSSJq8Akjw/yR3Ak4AP\nJ/nosIscYUvSKrOBgSTNOANbkhphYEtSIwxsSWqEgS1JjZhqYM/NDVaCLPcjSXrQVJf1jVq257I+\nSS1yWZ8kzTgDW5IaYWBLUiMMbElqhIEtST1LcliSLUm2JdmR5MLu+B8l2Znk5iTvS3Lk0DquEpGk\n1bXUKpEkh1fV/UkOBm4EXgU8FLimqvYm+UOAqjp/ubqOsCVpAqrq/u7locBBwJ6qurqq9rUM2wIc\nN6yGgS1JE5BkQ5JtwG7g2qraseiUlwEfGVZjTbcI27jRJx4lrQ/dSPrkbp76qiRnVtV1AEl+G/hO\nVb1nWI01Hdh79kz7DiRp5YYNNKvqniQfBp4AXJfkl4FzgKePquuUiCT1LMnRSY7qXj8UeCawNcnZ\nwKuB51bVt0fV6XWE7ZSGJAFwLHBpkg0MBsqXVdU1Sb7A4JeQV2cQlp+qqpcvV6TXZX0P1tm/ZX2S\n1KK+Nn+a6By2I25J2n8THWGPe1ySWub2qpI04wxsSWqEgS1JjTCwJakRBrYkNcLAlqRGGNiS1AgD\nW5IaYWBL0oQkOSjJ1iQf7N6vqEWYgS1Jk3MesAPY94z3x4AfraqTgNuB1w672MCWpAlIchyDfa/f\nAQTAFmGStDa9lcHe13uX+XxkizADW5J6luTZwF1VtZVudL3o87XfIsztViXNiCcDz0lyDnAY8LAk\n76qql
6ykRdhUt1eVpPVo2PaqSc4AXlVVP9u1CPtj4Iyq+saouk6JSNJkhQdXifwJcASDFmFbk/yn\noRc6wpak1WUDA0macQa2JDXCwJakRhjYktQIA1uSGmFgS1IjDGxJaoSBLUmNMLAlqREGtiQ1wsCW\npJ4l2Zzk2iTbk9yW5Nz9quNeIpK0uhbvJZJkE7CpqrYlOQK4CXheVe1cSV1H2JLUs6raVVXbutf3\nATuBR660joEtSROU5HjgFAY9HFfEwJakCemmQ64AzutG2isykRZhtgKTNOuSHAK8F3h3VV25XzUm\n8UtHSZolS/zSMcClwDer6pX7XdfAlqTVtURgnw5cD9zCg+3BXltVf7Oiuga2JK0uW4RJ0owzsCWp\nEb0F9nXXXddX6Sb4/a+b9i1Mld//umnfwrpkYPfE73/dtG9hqvz+1037FtYlp0QkqREGtiQ1YuSy\nvgneiyStG30s6xsa2JKktcMpEUlqhIEtSY04oMBOckmS3UluHXLO25N8IcnNSU45kD9vLUpydpLP\nd9/xt5b4/Ogkf5NkW9ca6JencJu9GfX9u3POTLK1+/7XTfgWezXO9+/Oe2KS7yb515O8v76N8c//\nz3f/7t+S5P9N8vhp3Gdfxvznf/UysKr2+wd4KoONuG9d5vNzgI90r38S+PSB/Hlr7Qc4CPgicDxw\nCLAN+J8XnfO7wIXd66OBbwIHT/veJ/j9jwK2A8ft+99g2vc9ye+/4Ly/BT4E/Ny073vCf/9/Cjiy\ne332esqAMb//qmbgAY2wq+oGYH7IKc9hsKUgVbUFOCrJMQfyZ64xpwFfrKqvVNUDwF8Cz110zteA\nh3WvH8Zge8XvTvAe+zTO9/+3wHur6k6AqvrGhO+xT+N8f4BXMNi0/uuTvLkJGPn9q+pTVXVP93YL\ncNyE77FP4/z9X9UM7HsO+1HAHQve38n6+hu21Pd71KJz/hz40ST/CNwMnDehe5uEcb7/CcBc1zH6\nc0l+cWJ317+R3z/Joxj8S3xRd2g9Lcsa5+//Qr8CfKTXO5qscb7/qmbgJDrOLF6LuJ7+gR3nu1wA\nbKuqM5M8Frg6yUlVdW/P9zYJ43z/Q4BTgacDhwOfSvLpqvpCr3c2GeN8/7cB51dVdZvYr6feS2P/\nu5zkLOBlwFP6u52JG/f7r1oG9h3YXwU2L3h/XHdsvVj8/TYz+C/oQk8G/gCgqr6U5MvA44DPTeQO\n+zXO978D+EZV/TPwz0muB04C1kNgj/P9fwL4y0FWczTwrCQPVNUHJnOLvRrn+9P9ovHPgbOratgU\namvG+f6rmoF9T4l8AHgJQJInAXdX1e6e/8xJ+hxwQpLjkxwKvIjBd17o88AzALq5q8cB/99E77I/\n43z/vwZOT3JQksMZ/OJlx4Tvsy8jv39V/UhVPaaqHsNgHvs310lYwxjfP8mjgfcBv1BVX5zCPfZp\nnH/+VzUDD2iEneRy4Azg6CR3AG9g8H+Bqao/q6qPJDknyReBfwJeeiB/3lpTVd9N8u+Aqxj8xvji\nqtqZ5De6z/8M+I/AO5PczOA/kK+pqj1Tu+lVNM73r6rPJ/kbBq2R9gJ/XlXrIrDH/Pu/bo35/V8P\nbAQu6v5fxgNVddq07nk1jfnP/6pmoI+mS1IjfNJRkhphYEtSIwxsSWqEgS1JjTCwJakRBrYkNcLA\nlqRGGNiS1Ij/H45g3OkAkXCyAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 91 }, { "cell_type": "code", "collapsed": false, "input": [ "1-mtfidf.similarity(docs[12], docs[11])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", 
"prompt_number": 53, "text": [ "0.7700672032114109" ] } ], "prompt_number": 53 }, { "cell_type": "code", "collapsed": false, "input": [ "1-mtfidf.similarity(docs[25], docs[17])" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 54, "text": [ "0.7895033309691184" ] } ], "prompt_number": 54 }, { "cell_type": "markdown", "metadata": {}, "source": [ "###Now do it with the labels = names above.\n", "\n", "###This is an example of a heatmap based on similarity scores. Most of these are not very similar." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Code borrowed from: http://nbviewer.ipython.org/github/OxanaSachenkova/hclust-python/blob/master/hclust.ipynb\n", "\n", "def make_heatmap_matrix(dist, method='complete'):\n", " \"\"\" Pass in the distance matrix; method options are complete or single \"\"\"\n", " # Compute and plot first dendrogram.\n", " fig = plt.figure(figsize=(10,10))\n", " # x ywidth height\n", " ax1 = fig.add_axes([0.05,0.1,0.2,0.6])\n", " Y = linkage(dist, method=method)\n", " Z1 = dendrogram(Y, orientation='right') # adding/removing the axes\n", " ax1.set_xticks([])\n", "\n", " # Compute and plot second dendrogram.\n", " ax2 = fig.add_axes([0.3,0.71,0.6,0.2])\n", " Z2 = dendrogram(Y)\n", " ax2.set_xticks([])\n", " ax2.set_yticks([])\n", "\n", " #Compute and plot the heatmap\n", " axmatrix = fig.add_axes([0.3,0.1,0.6,0.6])\n", " idx1 = Z1['leaves']\n", " idx2 = Z2['leaves']\n", " D = squareform(dist)\n", " D = D[idx1,:]\n", " D = D[:,idx2]\n", " im = axmatrix.matshow(D, aspect='auto', origin='lower', cmap=plt.cm.YlGnBu)\n", " axmatrix.set_xticks([])\n", " axmatrix.set_yticks([])\n", "\n", " # Plot colorbar.\n", " axcolor = fig.add_axes([0.91,0.1,0.02,0.6])\n", " plt.colorbar(im, cax=axcolor)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 79 }, { "cell_type": "code", "collapsed": false, "input": [ "make_heatmap_matrix(dist, method='complete')" ], 
"language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": "iVBORw0KGgoAAAANSUhEUgAAApoAAAJaCAYAAACP5OdLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzs3Xm45HV55/3Pp+psvbB0AwJCawsCUZ8IuKATjYKjI5pE\nNIkLMyZG8yQ+SYw8ZiZGzT6ZxOUxxhgdh4yAqOM2GJFchhAkSOBJJC7d0NKNgooBtBukDw1N99nv\n+aOq8Xg4y33O+W1V5/26rrroU3Xzq29V/arqrt/y/TgiBAAAABStVfcAAAAA0J9oNAEAAFAKGk0A\nAACUgkYTAAAApaDRBAAAQCloNAEAAFCKgboHAFTFNnN5AQB6UkS47jGsBI0m1hTmjQUA9Bq7J3tM\nSew6BwAAQEloNAEAAFAKGk0AAACUgkYTAAAApeBkIACrsnmzNDpa9yiAldm0Sdq7t+5RAP3LnIWL\ntcJ2zF3faZJQp15ocniPoBf1wntrOWz37PRGNJpYM+ZrNG2piLdA07+M++1Dd7WKet0XU+Q6Uefr\nt5rnquznuenvu4wmvzereJ9kVPU6N/u1oNEEGq/MRrMpH8gLafr4ilL0F9JqvniKfM7rfP3qajSL\nei3Lah7WwmdHU8ZW1Tia8njnQ6MJ9IDVNJqr+dIr+ouuil/3Tf5lv5iivyiasjVvLTaaTW/kmj6+\nhdS1FbiKH211buEu+zOTRhPoAatpNJvwhVnW8uq6jzI06bmu+suxiVvumvC+aUKj2aQfqr34elbx\nOb1aZd93LzeanHUOADUaHS2uoUIzreY15nVFr2MeTaBAmzd3vhhmX6RHXrd5c73jBPrFfO+5pd5/\nvA+B6rBFEyhQdssFWymAYrC1EGg2tmgCa8xSW4CW2grE1iAAQBaNJrDGHNoCtNpLr89f2GvYRYxe\nwSFEmI1d5wDQA9hFjF7BIUSYjS2aAOa1mi1obLEAAEg0mgAWUMQudnavA3lF73JmFzaagF3nAAA0\nQNG7nNmFjSZgiyaAVVlsF7vErnUAWMvYoglgVThJBQCwELZoAoA4ng0AykCjCaASTW/ksic/cYIT\nAOSx6xxAJTgxAQDWHrZoAuhJTd9CCgCg0QTQo9jVPT8acABNQqMJAH2EBnx+NODIWioVbaVJaWt1\nijeO0QQA9D2OEUbWaqZsW661sL6xRRMAAACloNEEAAAo2Vo9fINd5wAAACVbq4dvsEUTAAAApaDR\nBAAAQCloNIEGW6vH9ABAWZaavkha29MRFY1jNIEGW6vH9ABAWVYzfRGftcvHFk0AAIA52KNUDBpN\nAMCq8aWMfkPKVjHYdQ4AWDUO8wAwH7ZoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0\nmgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAA\nACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgF\njSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYA\nAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABK\nQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJ\nAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACA\nUtBoAgAAoBQ0mgAAACgF
jSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBo\nAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAA\noBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0\nmgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAA\nACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgF\njSYAAABKQaMJAACAUtBoAgAAoBQ0mgAAACgFjSYAAABKQaMJAACAUjgi6h4DUAnbrOwAgJ4UEa57\nDCtBowkAAIBSsOscAAAApaDRBAAAQCloNAEAAFAKGk0AAACUgkYTAAAApRhY7Eamg8FqzZ6OwfaI\npOskDUsakvS5iHir7dMl/Q9JGyTdIek/RcSDRY+F9RkA0KsWmt5otd9tZU+btOj0RraD6Y+wUrYf\nsQLbXh8RB2wPSLpB0n+R9BeSfisirrf9WkmPi4g/KGE8seGxr0nU5Tb0rxs+KlU3ObU/Vdduj6Tq\nxif25ZbXGkrVrR85OlUnSWMTo7n7Tj4WJ3eqjE/mHvPQ4GGpuqmpA7m66fFUXVZ2fOuT61Z2eZPT\nB1N19+27NVUXMZOqW05tdn3duP74VJ2V++48OL43VTc5nVtnRgaPSNW5teh2nh/e79RDufsd2pSq\ny74eQ4MbUnX3P3hHqm7dSG6dHhvPfcZI0mHrH52qW7/15FSd90+m6mLjYKpOU7nn+varfnLJGvu0\nRRvNkS2vyo1pjrE7P1l6o8muc1QqIg59Wg9JaksalXRKRFzfvf4Lkn6ujrEBANCL7NaKLlWg0USl\nbLdsb5e0R9K1EXGLpFtsn9ctebmkLbUNEAAAFIZGE5WKiJmIOEPSiZKeY/tsSa+T9Ou2vyJpo6SJ\nGocIAEBPsVorulQhd5BITTZvlkbzh2ugh0TEPtufl/S0iHi3pBdKku1TJf1UrYMDAKCHVLUbfCUa\n3WiOjkqci9S77Ll/+2hJUxFxv+11kl4g6Y9tHxMR97rzTvk9SR+sfLAAAKBwjW400XeOl3Rpt6Fs\nSfpoRFxj+wLbv96t+UxEfLi2EQIA0GPYoglIiogdkp4yz/V/Kekvqx8RAAAoE40m1pTpmaXPM2ol\n57ebmMzNKT8++UCq7vijH5+quzc5j+ZMTKXqsnNjSlIrOddh9iDz7BhnZnJ12TlGBwdycwR6Jjev\nXma9kvJzIj6UfF5GknMTtpPLy86xeOTGrak6Sbp//x3p2oyB9nCqbmIy91xPTY+l6rLzfA4Obkwu\nLzcX40xy3cqug9n5f/clX7ejjvyxVF3L7VRd9j0sSTMxnSucexzXAmIwuVUweUhfZn5MSXr8C69f\numgJTj7GOjSu0eQEoP61SDLQpySd2i07UtL9EXFmTcMEAKDHsOs8bfYJQA1u0LECETFm+5zZyUC2\nnx0RrzxUY/vdku6vb5QAAKAojWs00d/mSQZ6OP/NnW3/r5B0Tg1DAwCgJzX5ZKDmjgx9aZ5koJ2z\nbv5JSXsi4lv1jA4AABSJRhOVWiAZ6JDzJX28loEBANCjmpx13uhd55s2cZxmv5qdDCTpi91jNl+m\neaY/AgAAC6sqTnIlGt1o7t27dA2aK5sM1L35+ZJ2RcT3Kh0kAAAoTaMbTfSdeZOBure9UtInahsZ\nAAA9qsknA9FoojILJQN1b3ttxcMBAAAlo9HEmpJJ/Rlsr08ta93w5lRdNrHhoYP3pOqyySjZ1Izh\nwSNSdZJ0cDx3PEv2uZmZziXWZH+tZ9NWlEzAyaatFC17vNXBsXtTdePJFKush8Zy66qUT9
TJOpB8\nzEODh6fqBtojqbp0+tPk/lTdeDKtaTqZTjXo3Nd5Ngks+zmz78E7UnUb1j0qVbec99zYeO6xbBjO\npRJpJPccfuuys1J1j3/x/5+734HVb41s8hbN5o4MAAAAPY0tmqiM7S2SPiLpUeqkxf51RLyPCEoA\nAFauyVs0aTRRpUlJb4qI7bY3Svqq7auJoAQAYOWs5s4FSaOJykTEbkm7u//eb3uXpEdL2iURQQkA\nQL8ppNHcvFkazR2TC0iSbG+VdKakG2ddTQQlAADL1Pe7zkdHpYgilpRLAqKx7W3d3eaXSbogImaf\nokkEJQAAfaQnd50X2diiPPP9aLA9KOkzkj4WEZfPup4ISgAAVqDvt2gCGd1jMC+StDMi3jvnZiIo\nAQBYgSY3ms0dGfrRsyS9WtI5trd1L+d2byOCEgCAPsMWTVQmIm7QAj9uqoqgnJlZOo1jUgdyyxrL\nJXtkTUw+kKqbnDpY6P0uJ+Ulm9rhidxUG61kmkk6lWUq99pl02CKFslEoqmZ8UKXl00aSo9veixV\nt5xlph9LcsuNlUzoSb7vMqliUvGvXbZufGJfqi77OAYHNuTudzKZNDSwLlWXXVclaSaZruTp3LF2\nt1/+jFTdyT//r7n7ncils3ky9xovrrnbDZs7snls3pw7WQgAAAD166lGk5OAepvti23vsb1j1nV/\nZPuueXalAwCABLu1oksVeqrRRM+7RNLcRjIkvScizuxe/r6GcQEA0LNoNAFJEXG9pPkO6OGACAAA\n+hCNJprgN23fZPsi20fWPRgAAHqJ1VrRpQo92Whu2tQ5KYhLsy9JH5T0OElnSPq+pD8vabUBAAAV\n68npjfburXsEyMg0mxHx8Nw6tj8k6W9LHBIAAH2HCduXYbGtleg/to+f9efLJO1YqBYAADyS7RVd\nFljWubZvtX2b7d+Z5/ZNtj/bPeTtRttPWmxsjduiudjWSprN3mb7E5KeK+lo23dK+kNJZ9s+Q52z\nz78j6fU1DhEAgDXLdlvS+9WJhb5b0pdtXxERu2aVvU3S1yLiZbZPk/SBbv28Vt1obt682iVgrYiI\n8+e5+uIqx5BJhCk6OWNy6qFU3dDg4am6rHYy/WY5KTnZhJ6RwSNSdWOTuTSTgZnkGNu5soHWcK4w\naXI6l9aUTUIaHMytW1PTuRSadIJKcvdbuzWUqpPy60x2PVw3fFSqbngotw622rnHMjaeS8DJpI9J\n+c+ZgXZuXXVy3ZpKrquRXGeyj6POXbu3X/7MVN3jX/ql3ALbya1eI7nnJoZXP0F4gc/vWZJuj4g7\nOsv1JyWdJ2l2o/kESe+QpIj4hu2tto+JiHvnW+CqRzaae+8BAACg2U6QdOesv+/qXjfbTZJ+VpJs\nnyXpsZJOXGiBjTtGE/3L9kj3eI7ttnfafvuc2/+z7RnbbCcHACCpwOmNMptX3yHpSNvbJL1B0jZJ\nCwa7N+4YTfSviBizfU5EHHBnP88Ntp8dETfY3iLpBZK+W/MwAQDoKdld52P7v62x/d9erORuSVtm\n/b1Fna2aD4uIByW97of37e9IWnChNJqoVEQc6P5zSJ0j6g6d/vUeSW+W9Lk6xgUAQL8b2XiSRjae\n9PDf+/ZcM7fkK5JOsb1V0vckvVLSj5xfYfsISQcjYsL2r0i6LiL2L3SfNJqolDs/u74m6WRJH4yI\nnbbPk3RXRNy80HQLAABgfkWdDBQRU7bfIOkqdTYGXRQRu2y/vnv7hZKeKOnDtkPS1yX98mLL7KlG\n89Acm+hdETEj6YzuL6KrbL9Y0lsl/YdZZbzKAADUICKulHTlnOsunPXvf5F0WnZ5PdVokgjUWxb7\nURAR+2x/XtJT1ImgvKm7NfNESV+1fdbs1CAAADC/qnLLV6K5I0PfsX207SO7/16nzsk//xIRx0bE\n4yLiceocdPwUmkwAAHpfT23RRM87XtKl3eM0W5I+Gh
Fzj0Re/cy1AACsJQ3OOqfRRGUiYoc6u8oX\nqzlpsdtXK5OQMjWZS86Yns4lnmSTUTqHry5tcurA0kWSBgcmU3UHx+9L1UnSUDIN6UAyHUXJx1z0\ncz3dytVlE3UiFpxCbkVCuedl/fDRqbrs6LKPI7sOSvmTFKamx1J1YxO5lJDsep293+zjyKY/Zdfp\n7HM9PHhYocuLZFJT9n6zyUrZdV+SHvj2m1N1h5/0rvQyM7Kv8cjwpuTykpFmi6gzeWkpzR0ZAAAA\nehqNJipj+2Lbe2zvmOc2UoEAAFgB2yu6VGFVjeZmWgIszyWSzp17JalAAAD0p1U1mqO5Qy4ASVJE\nXC9pvrXmUCoQAABYpgKzzgvHyUCoFalAAACsTpNPBiqk0SSxBythe72kt6mz2/zhq2saDgAAKFgh\njSaJPZhP4sfHyZK2ilQgAABWrsFb+9h1jtp059U89tDftr8j6akRwU8XAAD6QHN36qPv2P6EpH+W\ndKrtO22/dk4JqUAAACxXa4WXCrBFE5WJiPOXuL3UVKDOfWRSJ3K7IJaTYJFaXjIlJyubktNuDaWX\nOTU9nrzzZF1SNqGn6OXlX5Nid1tl02oOjP+g0OXl5X8TFp2aVPRjyZ5EMTOTS9rKfnln163s+MYn\nHyx0edkEoaI/Zx76zu+l6iRpw9b/lqor+uzqaSWTxcaK/pxZRIN3nbNFEwAAAKWg0URlbG+xfa3t\nW2x/3fYbu9efbvtfbN9s+wrbufBcAADQ2aK5kksFaDRRpUlJb4qIJ0l6pqTfsP0ESR+S9OaIeLKk\nz0r67RrHCAAACkKjicpExO6I2N79935JuySdIOmUbmqQJH1B0s/VNEQAAHpPg08GotFELWxvlXSm\npBsl3dJNCJKkl0vaUtOwAADoOWGv6FKFVZ91TioQlsv2RkmXSbogIh60/TpJ77P9+5KukJKn9AEA\ngEZbdaNJKhAWMt8PENuDkj4j6WMRcbkkRcQ3JL2we/upkn6qulECANDjGrzBj13nqIw7OZMXSdoZ\nEe+ddf0x3f+2JP2epA/WM0IAAFAkJmxHlZ4l6dWSbra9rXvd2ySdYvs3un9/JiI+XMfgAADoSa3m\nbtKk0URlIuIGzb8V/UpJ76t4OAAA9IcGnyxDo4k1JRO/1nLubTEyvClVN5GMhtsw8qhU3YMHvpeq\na7VyjyMbSSdJA+3h5H3n4ubWDeWew30P/VuqLhvlln1uZmZyEXLZSMtsFN7wYC6zoN0eydW1BlN1\nB8dzB90fdcRpqTpJ2vvA7am67Psu+9xMTD2Uu9/kuhDJdXqwvS5Vl49BzcV9tlrtVN1Ae32qbmxi\nNFW3fuSYVN3una9N1R1+0rtSdVLxn9XZz4/sOjiUrNt/cPfSNaklNRPHaKJyttu2t9n+2+7fm21f\nbfubtv/B9pF1jxEAgJ7hFV4qQKOJOlwgaaekQz/V3yLp6og4VdI13b8BAECPo9FEpWyfKOnF6sRO\nHvo99RJJl3b/famkl9YwNAAAelPLK7tUMbRK7gX4ob9QJ8t89sEwx0bEnu6/90g6tvJRAQCAwpV6\nMtDmzdJo7nhirAG2f1rSPRGxzfbZ89VERNjOHf0OAAD686zzzZuXrhkdlZInzKEPzbPe/4Skl9h+\nsaQRSYfb/qikPbaPi4jdto+XdE+1IwUAoIc1t89c+a5ztlRiuSLibRGxJSIeJ+lVkv4xIn5BnXzz\n13TLXiPp8rrGCAAAisM8mqjToe3d75D0adu/LOkOSa+obUQAAPQakoGAHxUR10m6rvvvvZKeX++I\nAABA0QprNDnxB/2i3c4lgEwmk0eydaFsAkixvw+npsfStROTuXyK9cNHpeqyKUfZMQ4kk3Impw6k\n6iKmU3VFOzA+kaobGtyYqpuezi0ve6DXuiOPSy5Pin3fTNW1B3Lvu2yizvRM7jFn64YGcs/1wWSi\nTjaRK5tWk133hw
YPT9VlU3eyiT/HPfGSVF329ZDyCT0Hxu5N1WWTwKamDqbqBgZyKUwHx+5L1S2q\nuRs0i5ve6NCJP7MvwFy2z7V9q+3bbP9O3eMBAKDXhb2iy3yW+p62fbTtv7e93fbXbf/SYmNjHk1U\nxnZb0vslnSvpiZLOt/2EekcFAACk9Pf0GyRti4gzJJ0t6c/thTeB02iiSmdJuj0i7oiISUmflHRe\nzWMCAKC3FZcMlPme/r6kQ8dgHC7pvoiFj2mh0USVTpB056y/7+peBwAA6pf5nv6fkp5k+3uSbpJ0\nwWILLPWs802bGj1ZParHkbsAABQt2Wsd/MGtOnjfrYuVZL6n3yZpe0ScbftkSVfbPj0iHpyvuNRG\nc+/eMpeOppvnR8bdkrbM+nuLOr+WAADASiW36q075glad8wPD7kcve1zc0sy39M/IelPJSkivmX7\nO5JOk/SV+e6TXeeo0lcknWJ7q+0hSa9UJxUIAADUL/M9fau6c1/bPladJvPbCy2wkC2amdxzICKm\nbL9B0lWS2pIuiohdNQ8LAIDeVlAy0ELf07Zf3739Qkl/JukS2zeps8Hyzd3glXkV0mgyUTuyIuJK\nSVfWPQ4AAPBI831PdxvMQ//+gaSfyS6PCEqsKZmUjWw6RNbgwIZUXbs1mKrLJmdkkz2y6RqSNDGZ\nSzmamhlP1bXcTtVlU1SyddnnJpfJkpdNbxlsr0vVRXKE+XU6d77eXd/9p+Ty8svMpk5lk7GyST6T\n07mUqImp3Piy62BWdnkD7eFUXTYl54FvvzlVd+TJ70nVZROdlvP5OzWd+5zJfsYtMhXkj1hkJp8f\nMfrA7am6Iw/bumTNkp+8DT7xmmM0UTnbbdvbbP9t3WMBAKDn2Su7VIBGE3W4QNJOMd0RAAB9jUYT\nlbJ9oqQXS/qQGr2xHwCAHsEWTeBhfyHpt1X84W8AAKBhCjsZiBQgLMX2T0u6JyK22T677vEAANAX\nGrzZsLBGkxQgzDXPD4+fkPQS2y+WNCLpcNsfiYhfrHpsAAD0jQZv6WtwD4x+ExFvi4gtEfE4Sa+S\n9I80mQAA9C/m0USdOOscAIDVau4GzdU1mhyXiZWKiOskXVf3OAAAQHlW1WgeOi6TZhO9ImJ6yZps\ngkU25SWb+JNNuXDyiJd8mk7+YyC7zOxjzmq3hlJ12XSZ7GvXSn5EZteZqemxVF06DaaVS4MZHMgl\nDWVTp9YNH5Wqk/JJNOnXJJkMlF0HJ6ZyE2Bk06SGBnNJYFmTUwdTdRvXn5Cq+7ebfj5Vd/hJ70rV\nrR85JlWXTfzJroNS8e+niGTS0NARqboN645P1Y1P7EvVLSYKyjovA8doojK2R2zfaHu77Z223969\n/k9s39S9/hrbW+oeKwAAWD0aTVQmIsYknRMRZ0h6sqRzbD9b0rsi4vTu9ZdL+sM6xwkAQE9p8ITt\nnAyESkXEge4/hyS1Je2NiAdnlWyU9IPKBwYAQK9q7p5zGk1Uy52DZb4m6WRJH4yInd3r/1TSL0g6\nIOmZ9Y0QAAAUpdJd55s3r3zrLpfeu8wnIma6u8hPlPScQwlBEfG7EfEYSR9WJ6YSAABktLyySwUq\n3aI5OioFMyeuGYsd/hER+2x/XtLTJH1x1k0fl/R3pQ4MAABUgpOBUBnbR9s+svvvdZJeIGmb7cfP\nKjtP0rY6xgcAQE8qctdjwThGE1U6XtKl3eM0W5I+GhHX2L7M9mmSpiV9S9Kv1TlIAAB6SjU944rQ\naKIyEbFD0lPmuT43gzAAAOgphTSaRFGiV9jtpWuSR5RMT+cSLIYGD0vVZdMhQrkkk4hc3cTkg0sX\ndWUTerLP4cTUQ6m6bPLOxvW5JI7JyWLvd7CVS97Jpj9lk3eyKSrTyQSVTHKWJI1NjKbqJCmbypJN\neRn0+vR9Z2TX6exzPZZ8H2eThrKyiT+POf2yVN1AO5c6lU9+yi0v+7klScPJz9ZW
8jXOfgZnU44O\njt+Xqsuug4vq92SgvXs7J/ksdcHaZnuL7Wtt32L767bf2L3+5d3rpm0/YosnAADoTew6R5UmJb0p\nIrbb3ijpq7avlrRD0sskXVjr6AAA6EUN3qJJo4nKRMRuSbu7/95ve5ekR0fENZJkjr8AAGDZosFf\nn0xvhFrY3irpTEk31jsSAABQlkq3aHLSECSpu9v8MkkXRMT+uscDAEBPY9d5x969Vd4b6jbfjwrb\ng5I+I+ljEXF51WMCAADV4RhNVMadgzAvkrQzIt67UFmFQwIAoPc1eHcxjSaq9CxJr5Z0s+1DMZNv\nkzQs6a8kHS3p87a3RcSLahojAAC9hV3ngBQRN2jhE9DYjQ4AQJ+h0QTmOGLjY1J1Dx28J1WXTd7J\nJmIMtEcKXd5yEkqGh45I1WXTkEaSaSv79v9bqi6bUpJNLsqaTCYcZZKpJGlsPJe8k50SLApOzFhO\nekt6PWzl1sPBdi6FaTz5vsu+n9IpUe1cctFhG05M1d158ytSdRu3/lmqLpvUlH28RX/OZMcn5V/j\nkeTnVva+s+/3bBpSdnmLavAcQg0eGvqN7RHbN9rebnun7bd3rycZCACAPsQWTVQmIsZsnxMRB2wP\nSLrB9rNFMhAAACvHyUBAR0Qc6P5zSFJb0t6IuFUiGQgAgBVp8MlA7DpHpWy3bG+XtEfStRGxs+4x\nAQCADtvn2r7V9m22f2ee2/+L7W3dyw7bU7aPXGh5lW7R3LxZGs0d444+FZ0jx8+wfYSkq2yfHRFf\nrHlYAAD0rChoj6A7Zyy+X9LzJd0t6cu2r4iIXQ/fV8S7Jb27W//Tkv7fiLh/oWVW2miOjkoFn/yI\nBltsvY+IfbY/L+lpkr5Y0ZAAAMDCzpJ0e0TcIUm2PynpPEm7Fqj/j5I+sdgC2XWOytg++tDmddvr\nJL1A0ra5ZZUPDACAXtZa4eWRTpB056y/7+pe9wi210t6oTqx0gviZCBU6XhJl7ozWVlL0kcj4hrb\nL5P0PpEMBABAaQ7euUNjd319sZLl7Hf+GUk3LLbbXKLRRIUiYoekR8yTGRGflfTZ6kcEAEAfSJ51\nvu6xT9a6xz754b/33fipuSV3S9oy6+8t6mzVnM+rtMRuc6nARpMTfdALIqaXrNn7wO2pZWWTTLLJ\nGetHjknVZVNjsg4/LJdQIkkPPLTQ582PGpvIjXFy6sDSRcuQTfZYN3JUqi7/XBd7xEcomxK1IVWX\nTx7JPY7lpLdk3nOdutxjnpoZT9W120Opuomp/am67GMeGd6Uqssm/mx58qdTdVmDA8UmK2Wfl3yK\nVT51Kps2ND0zmarLPpZ2azBVt2449zmTfa4XVdz0gF+RdIrtrZK+J+mVks5/5N35CEnPUecYzUUV\n1mhmTvTNAJR3AAAgAElEQVRhmsS1zfaIpOskDaszj+bnIuKt9Y4KAABIUkRM2X6DpKvUmev6oojY\nZfv13dsPBau8VNJVEXFwqWWy6xyVWSgZKCJuqHtsAAD0rAInbI+IKyVdOee6C+f8famkS1NDK2xk\nQMJ8yUA1DgcAAJSIRhOVIhkIAICCeYWXClS663zTJo7TXOtIBgIAoFjR4KzzShvNvewkXVNIBgIA\nYG1j1zkqk0wGAgAAy9Hyyi4VWPUWTebPxDLMmwxU85gAAEBJVt1oHpo/k2MvsZSFkoEAAMAqNLgJ\nYx5NrDHVvxmzaRMxM1Xo8rL2H9hd6PIkqeV2qq7dyqW3zETuucmmihSdrlS06emJVN3hGx6TqpuZ\nyS0vG3O8rPSWZIpKdpnZdSubBpNNl8navfO1qbrjnnhJqi67rmbfI0XLv26553k5n2/Z2pnkZ2s2\n7a3oRKLsZ/+iGnwgZIOHhn5je8T2jba3295p++3d68+y/a+2t9n+su2n1z1WAACwemzRRGUWSgaS\n9CeSfj8irrL9IknvknROrYMFAKBXNHjXOVs0
Ual5koFGJe2WdET3+iMl3V3D0AAAQMHYoolKdc84\n/5qkkyV9MCJusf0WdbZuvludHz//rs4xAgDQU9bChO2k/iBjvmQgSb8n6Y0R8VnbL5d0sTpzbAIA\ngKWshUaT1B/MtYxkoLMi4vndmy6T9KHyRwcAAMrGMZqozALJQNsl3W77ud2y50n6Zk1DBACg54S9\noksVOEYTVZovGegLtn9V0gdsD0s6KOlX6xwkAAAoBo0mKrNQMlBEfEXSM6ofEQAAfaDB+6dpNLGm\nZJIkskkXTr6zp2fGU3Vjk/tSdVnZ9Iqp6bFC71eSRoY2peompw6m6rLJHumkkMJTVLKJOtOpumzu\nzugDt+fuN7lEJ1N3BgfWp+okaWJyf6oum8qSTVuZTqYhZdOpHvj2m1N1G7f+Waou+5oUbWLyoVRd\n9nNwaGBjqm5yOvtez72+ktLNVdGfhdnPo+xjzq6ri2rw2dgN7oHRb2xvsX2t7Vtsf932G7vXb7Z9\nte1v2v6HQ8dxAgCA3kajiSpNSnpTRDxJ0jMl/YbtJ0h6i6SrI+JUSdd0/wYAABktr+xSxdAquRdA\nUkTsjojt3X/vl7RL0gmSXiLp0m7ZpZJeWs8IAQBAkThGE7WwvVXSmZJulHRsROzp3rRH0rE1DQsA\ngN7T4Anb2aKJytneKOkzki6IiAdn3xYRoeyZFQAAoNEK2aLZ4JOd0DC2B9VpMj8aEZd3r95j+7iI\n2G37eEn31DdCAAB6TIP7sEIazWD7E+Yx9weIbUu6SNLOiHjvrJuukPQaSe/s/vdyAQCAlGjwrnOO\n0USVniXp1ZJutr2te91bJb1D0qdt/7KkOyS9op7hAQCAItFoojIRcYMWPi74+VWOBQCAvtHgYxhp\nNLGmDLRHlqwZHNiQWtZMMs1hyLnlZRNPsqk72ZSLbCKLlE8LGWgPp+qOP/qpqbrv/eDLqbqswXYu\n2WZy+kByibkP+exznR1fNuEom2SSTavJrgdS/jFn3ptS/v256dgfS9Xd9k/npOoOP+ldqbqsopN3\nsuky2bpsytbEVC75KZsm1WoNpuqk/HM4MpL7zNx/4PupusHB3GPJfgZn3++9irPOURnbI7ZvtL3d\n9k7bb59122/a3tVNDHpnneMEAKCnNHjCdrZoojIRMWb7nIg4YHtA0g22ny1pUJ1J258cEZO2j6l3\npAAA9JDm7jlniyaqFRGH9kUOSWpLGpX0/0h6e0RMdmvurWl4AACgQDSaqJTtlu3t6iQAXRsRt0g6\nVdJzbH/J9hdtP63eUQIA0DtarZVdqsCuc1QqOkdvn2H7CElX2T5bnfVwU0Q80/bTJX1a0kk1DhMA\nABSgUY3m5s3S6Gjdo0AVImKf7c9LepqkuyT9Tff6L9uesX1URNxX6yABAOgBDZ7daHWNZtEPbHSU\nlKF+Mk8y0NGSpiLiftvrJL1A0h9LelDS8yRdZ/tUSUM0mQAA5BTZj9k+V9J71TmP4kMR8YiZYLp7\nI/9CnZN5fxARZy+0vFU1mhHN7qLROMdLutSdCdpa6uSdX2P7nyRdbHuHpAlJv1jnIAEAWItstyW9\nX50Qlbslfdn2FRGxa1bNkZI+IOmFEXFXdyPSghq16xz9LSJ2SHrKPNdPSvqF6kcEAEDvc3Fb/c6S\ndHtE3NFd7iclnSdp16ya/yjpMxFxlyRFxA8WWyCNJtaUTCrGzEQuRSWbeJJN4kgn/kzl0mrGJx9M\n1Q0PHpaqk6SJqYfStRkHRm9J1bVbQ6m6bAJOq9XO1c0kPyKTZ286WTiUfE3yyUC5dfDg+N5U3WRy\nHZTyiT9Z2TSYbOLPKc+5NlWXfR9n19W2c3XZdJn8+7jYpLJsMlD2vbkc2XVr3fBRqbqZ6dxrPDiY\nS2vKPuZ2Ig1pX2pJhThB0p2z/r5L0jPm1JwiadD2tZIOk/SXEfHRhRZYysntmzd3dqkv94L+Zvti\n23u6u8hn
X08qEAAAK7SSnmuBvitzpsygOnsnXyzphZJ+3/YpCxWXskVzpSf10Gz2vUsk/ZWkjxy6\nwvY5IhUIAIAVy/ZPB76xTQe/uX2xkrslbZn19xZ1tmrOdqc6JwAdlHSwe57F6ZJum2+B7DpHZSLi\nettb51z9ayIVCACA0q0/7UytP+3Mh/8e/fyH55Z8RdIp3e/q70l6paTz59R8TtL7uycODauza/09\nC90nyUCo2ykiFQgAgBVza2WXuSJiStIbJF0laaekT0XELtuvt/36bs2tkv5e0s2SbpT0PyNi50Jj\nY4sm6kYqEAAADRERV0q6cs51F875+92S3p1ZXqMazU2bOE5zDSIVCACAVWhy79SoRnNvbmYN9Ijk\nin+5SAUCAGDFWg1uNAs5RnPudEbAfGx/QtI/SzrV9p22XyvpYkkndac8+oRIBQIAoG8UskVz7nRG\nNJuYT0TMPXPtEFKBAABYoSb3XY3adQ6ULZMqMjyUS9jIJmdk0ysimfIyOXUwVTc0mEsAWY71I7lp\nTj3f6YzzGGyvS9WNTYym6rJJHNl0mWz6k6PYCTwmkqlO64Y3p+pya6rUSiSUSFLL+a+O7HM9nUxl\n2fet/5KqO+rUD6TqsutqOvEn+RzOxHSqLpv4Mzmd+1zIPt6iE4my68zYZD4DJ/tYxidyy8w+h6Hc\nOp39vJyaHk/V9SqmN0JlbJ9me9usyz7bb6x7XAAA9LICk4EKxxZNVCYiviHpTEly56fo3ZI+W+ug\nAABAaWg0UZfnS/pWRNxZ90AAAOhlbvBBmjSaqMurJH287kEAANDrkoer1qLBQ0O/sj0k6Wck/e+6\nxwIAAMpTyhZNEn6whBdJ+mpE3Fv3QAAA6HVN7rlW3Whu2tSZR3M2En4gLbrin6/O5OwAAKCPrbrR\n3Lu32Z00msX2BnVOBPqVuscCAEA/aHIfxslAqFREPCTp6LrHAQBAv6DRBHrIWDJFIpsUMj2TSzzJ\nprek0zCS6TLZ5CJJmprIJVgMJBN/ppOJGOOTD6Tq0nIBQukEkJlkSpSU+zbIJhJNHjiQqsum82S5\nlT+PNJswc/Df/ihVt3Hrn6Xq2u3c+zMr+z7OPt6sKeeWl03Fyr7fs8vLJhxlE83y7yVpMrleZ9f/\nyali30/3P3hHqq7fcdY5KmN7xPaNtrfb3mn77d3r/8j2XbMSg86te6wAAPSKlld2qQJbNFGZiBiz\nfU5EHLA9IOkG28+WFJLeExHvqXmIAACgQDSaqFREHNo3MSSpLenQnAUNPsIEAIDmavIxmuw6R6Vs\nt2xvl7RH0rURcUv3pt+0fZPti2wfWeMQAQDoKfbKLlWg0USlImImIs6QdKKk59g+W9IHJT1O0hmS\nvi/pz+sbIQAAKEohu85JAsJyRcQ+25+X9LSI+OKh621/SNLf1jYwAAB6jKs6s2cFCmk0SQLCfOb+\n+LB9tKSpiLjf9jpJL5D0x7aPi4jd3bKXSdpR6UABAEApOBkIVTpe0qXuTATZkvTRiLjG9kdsn6HO\n2effkfT6OgcJAEAvafJeZRpNVCYidkh6yjzX/2INwwEAoC/QaAINEckUi4yZyCVnZFMksmkw2cSO\nrCnlk0xazo0xWrkxZlNUsmlI2ec6m/KSvd/87FyRXFry8SaTi/LJQLnxHfP0c5LLk771qaen6tY9\n5o9SddlErnyyTbIuubz8+ziXgJN97bLraj65KLcuTEzuT9Xl30vFK/79nlP0d0Sv4qxzVMb2xbb3\n2N4x67r/z/au7tRGf2P7iDrHCABAr2F6I6DjEklz4yX/QdKTIuJ0Sd+U9NbKRwUAAEpBo4nKRMT1\n+mES0KHrro4f7je4UZ35NQEAQBJZ50DO6yR9ou5BAADQS5p8MhBbNNEItn9X0kREfLzusQAAgGL0\n5BbNzZul0dGl69AbbP+SpBdL+vc1DwUAgJ5T40n9S+rJRnN0VIrczAuoUW
ZTvu1zJf22pOdGRH6e\nHQAA0HgN7oHRb2x/QtI/SzrN9p22XyfpryRtlHS17W22/3utgwQAoMc0eXqjntyiid4UEefPc/XF\nlQ8EAABUgkYTa8zSP+GyyR5DAxtSdZNTB1N1I8ObUnUTkw+m6tIpPsl0GUkaTD7m6WT6yEB7JFU3\nPvlAqi772mVl01vycpsQik78KTpZKZv2I0knv/LLqbrBgfWpuuwYswk42fvNrquTUwdSda3WYKqu\n6Nd43fBRqboDY/em6k587HNSdQfv352qu2/fN1J1Uv65yb522QSh7OfMYesfnaq7f/8dqbrFuMGn\nna9q1/lKN9Wu9oLetEAy0Mtt32J72vYjctABAMDiiuynbJ9r+1bbt9n+nXluP9v2vu7hbtts/95i\nY1vVz/+6Tsih2exZl6hzTOZHZl23Q9LLJF1Yy4gAAIAkyXZb0vslPV/S3ZK+bPuKiNg1p/S6iHhJ\nZpnsOkdlIuJ621vnXHer1OzN/gAANFmBX6FnSbo9Iu7oLNeflHSepLmNZvoeOescAAAAknSCpDtn\n/X1X97rZQtJP2L7J9t/ZfuJiC2SLJgAAQA/LbtEcvflrun/HtsVKMgdFfk3Slog4YPtFki6XdOpC\nxT3ZaG7axHGaAAAAktRK9kRHnf4UHXX6D8+7/e7HL5lbcrekLbP+3qLOVs2HRcSDs/59pe3/bntz\nROyd7z57stHcO+9DQdOs4McAPx8AAKjPVySd0j2f4nuSXinpR+bAtn2spHsiImyfJckLNZlSjzaa\n6E3dZKDnSjra9p2S/lDSXnXORD9a0udtb4uIF9U4TAAAekp2i+ZSImLK9hskXSWpLemiiNhl+/Xd\n2y+U9POSfs32lKQDkl612DJpNFGZBZKBpM7xHQAAoGYRcaWkK+dcd+Gsf39A0geyy6PRBAAA6GEt\n1zSxeQKNJipje4s6k7U/Sp0z2/46It7XPcbj/ZIGJU1J+vWIyOXWLdvSb8Zs7OB4Mgoy66GDe5KV\nuQ+Uzry7xZqY3J+qy8b6ZSPfsoqPjCxa7rXLPo5sjGHWge/+Qapu3WP+aBlLLXZ9zUYtZu83G1WZ\nFTGdrMtGv+YeR/Z+D47fV+j93vXdfyp0ectT7GucfU0mIxczmo2WzK/TCytq13kZaDRRpUlJb4qI\n7bY3Svqq7aslvUvS70fEVd2pEt4l6Zw6BwoAAFaPRhOViYjdknZ3/73f9i51JoL9vqQjumVHqjO9\nAgAASGhy+g6NJmrRnTrhTElfknSbpBtsv1ud98u/q29kAACgKE1ugtGnurvNL5N0QUTsl3SRpDdG\nxGMkvUnSxXWODwCAXtJyrOhShVVt0SSdB8tle1DSZyR9LCIOTWt0VkQ8v/vvyyR9qJbBAQDQg/r2\nZKBo7tn0aIC5P0RsW52tlzsj4r2zbrrd9nMj4jpJz5P0zcoGCQAASsMxmqjSsyS9WtLNtrd1r3ub\npF+V9AHbw5IOdv8GAAAJTT4OkkYTlYmIG7Tw++EZVY4FAACUj0YTAACgh/XtMZrActgekXSdpGFJ\nQ5I+FxFvtb1Z0qckPVbSHZJeERH3lzOGpdNHsikN7dZQqi6bfrNuaFOq7uDEaKouO76hgQ2pOkma\nmhlP3ncusWawva7Q+x0aPCy3vKlcssfE1EOpuqyRwSOWLpK0edOP5RY4nEvTue3656XqNmz9b6m6\n5SSZ5BNwctYNH5Wqm0m+7yYmc69xKPc4RoZyr/F0Mv3JyZ2irVaxSWCTUwdTddnXd2gw9zmznMS1\n7LqQ1XKuJcquCxvXHZeqy6wLS31iucERlE3erY8+ExFjks6JiDMkPVnSObafLektkq6OiFMlXdP9\nGwAA9DgaTVQq4uGQ2CFJbUmjkl4i6dLu9ZdKemkNQwMAoCe1vLJLJWOr5m6ADtst29sl7ZF0bUTc\nIunYiNjTLdkj6djaBggAAArDMZqoVH
QO6DnD9hGSrrJ9zpzbw00+2AQAgIZp8lbDRjWamzdLo7nz\nHNDjImKf7c9LeqqkPbaPi4jdto+XdE/NwwMAAAVoVKM5OkraUD+ZJxnoaElTEXG/7XWSXiDpjyVd\nIek1kt7Z/e/lAgAAKVXllq9EoxpN9L3jJV3qztwoLUkfjYhruilBn7b9y+pOb1TjGAEA6CnMowlI\niogdkp4yz/V7JT2/+hEBAIAy0WgCAAD0sDV7MhAn96BpMmk5MzGVWtZAezhVl03OaLdHcnXJxJ9s\nnVv5j4ENydSTqelckk/W5HQupWQimSoymExDaifTW7LpT9k0mAf335Wq2/2vr03VnfKT/5iqK0Nr\nGetXRjZNKvdMS61Wbl2dmcl9LmxYd3yq7uD4fam66emxVF3282NmOreuDg7knufsOj3QXl/o8qT8\nujAT06m67Gd6NhlrYCD3mFvJ16RXldoEHzq5J3tBf7O9xfa1tm+x/XXbb5xz+3+2PdONpAQAAAlN\nnrCdXeeo0qSkN0XEdtsbJX3V9tURscv2FnXOQv9uvUMEAKC3NPms8ybv1kefiYjdEbG9++/9knZJ\nenT35vdIenNdYwMAAMVjiyZqYXurpDMl3Wj7PEl3RcTNnjv5JgAAWBTTGyVt2vTISb7Rf7q7zS+T\ndIGkGUlvU2e3+cMldYwLAAAUq1GN5t69dY8ARZrvR4PtQUmfkfSxiLjc9o9L2irppu7WzBPVOXbz\nrIggihIAgCU0+TjIQhtNpjPCYtzpJC+StDMi3is9PIn7sbNqviPpqd1J3AEAwBLWzMlAc6czAuZ4\nlqRXSzrH9rbu5UVzalhzAADoE43adY7+FhE3aIkfNxFxUkXDAQCgL3AyENCHsgkW2WSgrGxy0VD7\nsFTdcsb30FjusNmhweLvu8jlZdNWsstzwUdI7d6ZS/w57omXpOoO23hiqm4muU63WoOpOin/3IRy\nz/XB8dxRNdl1sOXc1+BU5NaZ0QduT9W128nkruTzl00Cy6ZnDbRyKTkTk/sLXd5U8r0pSQfGf5Cq\ny64L+w/mjv3LPpYDY/em6jaMHLt0UQ9r8vGj6DO2L7a9x/aOuscCAEC/aHIyEI0mqnSJpHPrHgQA\nAP2ktcJLVWMDKhER10tiXgIAABrK9rm2b7V9m+3fWaTu6banbP/sYsvjGE0AAIAeVtT0Rrbbkt4v\n6fmS7pb0ZdtXRMSueereKenvtUTISqmNJkk/AAAAPeMsSbdHxB2SZPuTks6TtGtO3W+qk/D39KUW\nuKpGc6kJ2kn6Wdv4kQEAQPkKPLHnBEl3zvr7LknPmF1g+wR1ms/nqdNoLro5dVWN5qEJ2n9456tZ\nGgAAAJYre8LNnV/drju/dtNiJZl98O+V9JaIiG7iX327zoHZbH9C0nMlHWX7Tkl/EBG5iQABAMCq\nbHnqGdry1DMe/vtfLvrI3JK7JW2Z/b+os1VztqdK+mSnx9TRkl5kezIirpjvPmk0UZmIOL/uMQAA\n0G8K3HX+FUmn2N4q6XuSXinpR767Zyf42b5E0t8u1GRKNJpYYzKpE3ZuJ0Q2wSKbLjM2kZv5Kbu8\n8Yl9qbrlaLVyHxnZ9JHp6YlUXTY1Jp3yMj2erMu9xtl15v5v/Vaq7siT35Oqy5pMprdkzczk0qk6\nsmfD5r4pZ1q5+86msmTXrYjpVF0262pmajnP4dKmZ3LvpWxdtLOPJPf6jic/E5azbk3M5JaZTVfK\n3vdkwYlmB8bvK3R5qxERU7bfIOkqSW1JF0XELtuv795+4XKXSaOJyti+WNJPSbonIn68e91mSZ+S\n9FhJd0h6RUTcX9sgAQDoMS5oeiNJiogrJV0557p5G8yIWDIvlwnbUaX5koHeIunqiDhV0jXdvwEA\nQB+g0URlFkgGeomkS7v/vlTSSysdFAAAPa7JWefsOkfdjo2IPd1/75F0bJ2DAQCg1zR5q2GhjSZJ\nQF
iN7pxcxR1oAgAAalVoo0kSEGZL/ujYY/u4iNht+3hJ95Q7KgAA+ktRWedlaPLWVqwNV0h6Tfff\nr5F0eY1jAQAABeIYTVRmVjLQ0YeSgSS9Q9Knbf+yutMb1TdCAAB6T1Un9qwEjSYqs0gy0PMrHQgA\nAH2ERhNojKWPY8km72TrsvJJQ7mEkkim1SzncUxP5VJF2q2h3H0XnPiTTT3JLm+gPZKqe/A7uelf\n1z/2v6bqsuPLPn+TuVVmGZZzPFj2G7DYY8yyz83yUo4S91vw50L2eRloD6fqsu+R7OdRVvb1KMNM\n5F7jbF3Bq6pisr7npgoco4nK2N5i+1rbt9j+uu03zrrtN23v6l7/zjrHCQBAL2mv8FIFtmiiSpOS\n3hQR221vlPRV21dLOk6didufHBGTto+pdZQAAKAQNJqoTETslrS7++/9tndJOkHSr0h6e0RMdm+7\nt75RAgDQW5jeCJjD9lZJZ0q6UdKpkp5j+0u2v2j7aXWODQCAXtLXEZQkAWG5urvNL5N0QUQ8aHtA\n0qaIeKbtp0v6tKSTah0kAABYtVU3mtHcrbWo2Xw/QmwPSvqMpI9FxKHJ2e+S9DeSFBFftj1j+6iI\nuK+qsQIA0KuaPL0Ru85RGduWdJGknRHx3lk3XS7ped2aUyUN0WQCAND7OBkIVXqWpFdLutn2tu51\nb5V0saSLbe+QNCHpF2saHwAAPafd4C2aNJqoTETcoIW3ov9ClWMBAKBfNHnXOY0m1pji3o3Z1Jhs\n2oSTR7JM15RcJOUTf7LPzcTU/lTdyMimVN3Y+Giq7tjNp6fqbv/qi1J1hz3uHam67GviVm5dKDox\nqQxOJlQVXZd9boo+gCx9v0nZdWZ6ZjJVlx1fOiUnKX2/yt+vnZtyfLC9LlVX9PsknZBWwmd1k3CM\nJipl+wLbO7oJQBfUPR4AAHpdy7GiSyVjq+ReAEm2/y9J/7ekp0s6XdJP2z653lEBAICy0GiiSj8m\n6caIGIuIaUnXSfrZmscEAEBPa/KE7TSaqNLXJf2k7c2210v6KUkn1jwmAAB6WnuFlyrUdjLQ5s3S\naO64ffSJiLjV9jsl/YOkhyRtk9TfR0EDALCG1dZojo6SKtTv5ksGioiL1Zk3U7b/TNK/VTsqAAD6\nC9MbAV22HxUR99h+jKSXSXpG3WMCAADloNFE1S6zfZSkSUm/HhEP1D0gAAB6WVVTFa0EjSYqFRHP\nqXsMAACgGqtuNDmpB70kkyqSTWkouq7Vyr0ds8kow4OHpeqmpsdTdcu573Z7OFU3mDwX7ODYfam6\n7HOYTfx5/FOvTNVlX+NsYtJA8vlrJ5c3PT2WqhubuD9V12oNpuokaWYml/SSTY5pJdNg2gO5Mc7M\nTKfqJqcPpOqyRoZzaVfpdT/5/IWTn0fJ9iB7Nmf6820mPxlO0SlM2eUNDWzILTD5eTlTQCJRX2ed\nr/SknvlOFEH/s32HpAckTUuajIiz6h0RAAC9jZOBgB8KSWdHxN66BwIAAMpFo4k6NPi3FwAAvaXJ\nWzRJBkLVQtIXbH/F9q/UPRgAAFCe2rZobtrEcZpr1LMi4vu2j5F0te1bI+L6ugcFAECvavIWzdoa\nzb0codf3FkgG+n73v/fa/qyksyTRaAIAsELtAufRtH2upPeqE4f+oYh455zbz5P0X9WZdGBG0m9H\nxD8utDx2naMyttfbPqz77w2S/oOkHfWOCgAASJLttqT3SzpX0hMlnW/7CXPKvhARp0fEmZJ+SdJf\nL7ZMTgZClY6V9Fl3NnUOSPpfEfEP9Q4JAIDeVuBWw7Mk3R4Rd0iS7U9KOk/SrkMFEfHQrPqNkn6w\n2AJpNFGZiPiOpDPqHgcAAJjXCZLunPX3XZKeMbfI9kslvV3S8ersnVzQqhrNTZtIBUL/WZdM7Mim\nshx5xMmpuu/v+ddUXTa9YnpmMlU3OLAuVSflU4Smk3XZlJdHPfoR
n3Pzuu2f/32q7vCT3pWqyyb+\nzEQu/abtoVRd9rWbiVyqzVAyJUrJZKAyTGfTUXJPdVoks22y68J05B7HgbF7U3VZg4O5tJrxAw+k\n6oYGN6bqWpFMEEomRGWfZyn/XE/N5D6PppIJWtnkrunp3Pt4eOiIVN1iCjwZKHWwZ0RcLuly2z8p\n6aOSTluodlWN5t69nDmOPNsXS/opSfdExI93rztd0v+QtEHSHZL+U0Q8WNsgAQDoMdlG85Yv3axb\nvnTzYiV3S9oy6+8t6mzVnFdEXG97wPZRETFvXiq7zlGlSyT9laSPzLruQ5J+q7uyvlbSb0v6gzoG\nBwBAP3vSM5+sJz3zyQ///b/f97/mlnxF0im2t0r6nqRXSjp/doHtkyV9OyLC9lMkaaEmU6LRRIW6\nzeTWOVefMmsezS9I+nvRaAIAkFbU9EYRMWX7DZKuUmd6o4siYpft13dvv1DSz0n6RduTkvZLetVi\ny6TRRN1usX1eRHxO0sv1o5vsAQBAhSLiSklXzrnuwln/fpek3IHuakijuXkzJxWtYa+T9D7bvy/p\nCknJMwIAAIBEMtCSRkelKG5SezRE5kSxiPiGpBd26n2qOicLAQCApCY3miQDoVbdzHPZbkn6PUkf\nrFj2+KcAABe3SURBVHdEAACgKI3Yoom1wfYnJD1X0tG275T0h5I22v6NbslnIuLDdY0PAIBe1OQt\nmjSaqExEnL/ATe+rdCAAAKASq240N21i0nb0js4e+sWNT+bmi/fkQ0sXSXro4J5U3UAyaSibXpFN\noZlIPg4p9/xJUrs9mLvvqdx9ZxN/TvmJa1J12ZSSdSNHperGxnNnMxadIJRNUZlIrtNS7sM8+/xJ\n+XUm+1iySVbZdKVsIlF2fK1W7ms1vbxkElg2aWhwYH2qLvs5k30c2c+37Poi5Z+b9PKSr1123Sr+\n/bmwdoP7sFUfo7l3b+dEntVcsDbYvtj2Hts7Zl33J7Zvsr3d9jW2md4IAIBlaDlWdKlkbJXcC9Bx\niaRz51z3rog4PSLOkHS5OsdtAgCAPsAxmqjMfMlAc3LNN0r6QZVjAgCg1zV5qyGNJmpn+08l/YKk\nA5KeWfNwAABAQRrRaHJC0doWEb8r6Xdtv0XSX0h6bc1DAgCgZzC90RL27q17BCjDCn48fFzS3xU/\nEgAAUIdGNJpYu2yfEhG3df88T9K2OscDAECvafL0RjSaqMwCyUAvtn2apGlJ35L0azUOEQCAnlPV\nVEUrQaOJyiyQDHRx5QMBAACVoNHEmpJJp9i47rhC77Po1JjBdi4Z5cjDTkrVLUd2jAPJ9JHbvvyC\nVN1hj3tH7n7bw6m6djuXvJNNdcqmmTg5CcnQ4GGpumwyynQy5WWqlasbbOdeXymfMDMwkEuOGRna\nlKo7OJ47+P/Yo09P1X3/B19N1WXXhenIJRIdsfExqboHHrorVZd+j7Ry6V6RTF05bP2jU3X3778j\nVSdJQwMbUnWt5Pt9oJV7bpxMEMoqIuGoyScDNXnqJfyf9u49WNK7rvP4+3uuM5MYMgOGSzJbERkv\nUCEEJclycURjVYwIWruKKVMuIEoJgUgprom1lFv6x4K3CCgVSEBiJCk3QBZLIma3gCUlxgRDEpnB\nGDCahAxhmcllMpfT55zv/tE9ycnhzDnfc6b7ebp73q+prurLd57+ne6n+/zO73l+v8+YiYjtEfGZ\niPhyRPxTRLytd7/pQJIkjSE7mmpSB3h7Zr6A7nqZb4mI78d0IEmSNmwiNnZpgofO1ZjM3APs6V3f\nHxG7gedk5u4lZaYDSZK0DsM8amhHU63oRVGeBdzSu206kCRJY6b1jua2bbCvNldCYyIiTgSuBy7J\nzP1gOpAkSRs1zOmKrXc09+2D4qQ1jZiVdvyImAY+BlyTmTes8N9MB5IkaUwM82F9jZmICOAqYFdm\nXr7k/h1LykwHkiRpHWKDlya0
PqKp48rLgIuAOyPiSGfyMuAXTQeSJGljPHQuAZl5MyuPot/YdFsk\nSdLgtdLRdAKQ2pK5uGbN7MzTSts6PPdI7UmLSSEnbHpmqW7z5u8s1T3y6L2lutliCg3UX5tq4s+O\nl9xUqqsmgCwsdkp1U8V0pWqSTzX9pppIVE2XqUrW3u8BJieq7asPn8xM1967KKajJLWT+qemau9x\nNf1py6ba564zv79UN1FMl5kovifVz0h1X1jMhVJdNU3n4NwAfukXPyebpmvfW/vnH6xtb7L2nVn9\nPqq+J6sZ5vMgW2nbkQlATgI6/kTEyRFxfUTsjohdEeFSRpIkjSkPnatpfwx8KjP/c3SHMGp/hkuS\npBVFDO/InR1NNSYinga8IjP/C0BmzgPF48+SJGklQzwXaKgP62v8fBfwzYj4cET8Y0R8MCK2tN0o\nSZI0GK2PaG7dOtzT8tVXU8CLgYsz89aIuBz4TeCd7TZLkqTRNcz9qNY7mnv3tt0CDcoKO/79wP2Z\neWvv9vV0O5qSJGkMeehcjcnMPcB9EfE9vbvOA77cYpMkSRp5JgPh2pl6wluBv4iIGbopQK9vuT2S\nJI20CQ+dP7l2Jgz3uQQarMy8A3hJ2+2QJEmD1/o5mlKTFhbn1qx5+LGvlbZVTTKZ6zxWqqsme7B4\n7CkST7GOFJp7vvjjpbrn/UAtVbTyfgDMzT9eqqsm6izOz9fqFmt11Z+jmgZT3WdO2HRKqe5wp7aK\nWGf+QKluZnMtxQrqCVrV12ZqclNfn7fq0OHaIbnqz1FV3ReqKTTVfbWSogYw16klIW2e3VqqW4/F\n4s9yqLj/V79nspg2U02nqv4cq+nn+F1EnA9cDkwCV2bmu5Y9/vPAb/Se9jHgVzLzzqNtz3M01ZiI\n2B4Rn4mIL0fEP0XE23r3/14vKeiOiPh4b71NSZLUoIiYBN4HnA88H7gwIr5/WdnXgB/KzBcCvwN8\nYLVt2tFUkzrA2zPzBcC5wFt6O/DfAi/IzDOBu4FLW2yjJEkjJWJjlxWcDdyTmfdmZge4DnjN0oLM\n/EJmHhkmvgU4bbW22dFUYzJzT2Z+qXd9P7AbeE5m3pRPHqdZc6eVJEkDcSpw35Lb9/fuO5pfBD61\n2gY9R1OtiIjTgbPodiyXegNwbdPtkSRpVFXP0fyHm+/iH26+a7WScmh6RLyS7u/sl61W10pH0zSg\n41tEnEh3sfZLeiObR+7/LWAuMz/aWuMkSRox1S7VOS8/g3NefsYTt//0Xd82rvMAsH3J7e10RzWf\n+nwRLwQ+CJyfmavOlGulo2ka0PFhpT8mImIa+BhwTWbesOT+1wEXAD/aUPMkSdJT3Qbs6B11/Drw\nWuDCpQUR8R+AjwMXZeY9a21wYB1NF2jXchERwFXArsy8fMn95wPvAHZm5qG22idJ0ijq14LtmTkf\nERcDn6a7vNFVmbk7It7Ue/wK4J3AVuD93V/rdDLz7KNtc2AdzaULtIOHygV0z+O4CLgzIm7v3XcZ\n8B5gBript9N+ITPf3E4TJUk6fmXmjcCNy+67Ysn1NwJvrG7PyUBqTGbezMorHexoui2SJI2LYR7L\ns6Op40oUVvSqJv5MT22pPWcxraaa7PHQvlVnDD5hy+wzSnX33fmzpTqA7S/8y1phMVVkZvo7apsr\nbm+6mMTRmT9YqlvMWjJQOZGomDQ0O1PLLKgmmUxNzJbqOlFLBuoUnxfqr+FE8dfRfDW9qJi0VX3v\nqiaK3x8nbnlWqW7/gT2lumriT/0zV3vfDndqn83q99t63o/q/j87c3KpbnJiplS3eXZbqa6ayLVp\n5thTkyLKk8Ub5zqaalREXNpLBrorIj4aEbXfgJIkaeTY0VRjerPYfgl4cWaeQfdE459rs02SJI26\n2OClCR46V5MepRtDuSUiFoAtdNfskiRJY8gRTTUmM/cCfwD8O931uR7OzP/dbqskSRptfcw677
vG\nRjRNA1JEfDfwq8DpwCPA/4yIn8/Mv2i1YZIkjbBhHjXsS0ezsji7aUDHnxX+sPhB4O8y81vdx+Pj\nwEsBO5qSJI2hvnSCjyzOvvQireArwLkRsbmXEnQesKvlNkmSNNKG+dD5MI+2asxk5h3A1XSzVO/s\n3f2B9lokSZIGyVnnalRmvht4d9vtkCRpXAzzFBg7mpIkSSNsmCdb29FUYyLiQ8BPAA/1FmwnIs4G\n3gdMA/PAmzPz1kG1oRKHd3iuFhtWjWibnNxUqpvrPFZ83lrk24O7Xleqe/bz/6xUB/UYuapqHN7k\nZC0abn7hcKluarIWSFWN9et3jGHVfDFKs/pzVFWiXI+oxm7Oc6hUNzkxXaqb6xTjCaMW91mPkq29\n1g8/dm+pbmKiv7+mq/Gh1e+ZamxjdXvr2Verzz2/UNu3qvvqgcP/r1RX/ZkPHKptb1R5jqaa9GHg\n/GX3vRv4b5l5FvBOPKwuSdK6DHMykB1NNSYzPw8sXwjrQeDIkMLJmBQkSdLY8NC52vabwM0R8ft0\n//D5jy23R5KkkTJxPJ6jaRKQiq4C3paZn4iInwE+BPxYy22SJGlkDHN3a2AdTZOAVPxD4+zMPK93\n/XrgyoE1SJIkNcpzNNW2eyJiZ+/6jwB3t9kYSZJGTURu6NKEYx7R3LatH83Q8SAirgV2As+IiPvo\nzjL/ZeBPImIWONi7LUmSxsAxdzT3LZ9DLB1FZl54lIfOabQhkiSNkWE+R9ND55IkSRoIlzdSYyJi\nO3A1cAqQwAcy8z0R8dvAG4Fv9kovzcy/GUQbKkkNMVH7+yui9vE5dLg27P/MM3+4VHfPDeeW6k56\nbm3t+/Wk2pyy9YxaYfE13LvvK6W6zvyB2tMW35NqUkg12SNzoVQXMVmqOzRX22eqP281baX6c1QS\ntgalU0xDmi2mTk0Vk7v6/Z5U9TuRKLK2vaSYDBS1dJ5q6k61bj211TSpqoni53i+2L6FPPbkrmFe\n5ceOpprUAd6emV+KiBOBL0bETXQ7nX+YmX/YbvMkSRo9Q9zPtKOp5mTmHmBP7/r+iNgNnNp7eJg/\nJ5IkaQM8R1OtiIjTgbOAv+/d9daIuCMiroqIk1trmCRJI2Zig5em2iY1qnfY/HrgkszcD7wf+C7g\nRXSzz/+gxeZJkqQ+GalD59u2uZzSqIuIaeBjwDWZeQNAZj605PErgb9qqXmSJI0cJwP1yb59kM0s\nZK8+WL7jR0TQzTbflZmXL7n/2Zn5YO/mTwN3NdVGSZJG3/D2NEeqo6mR9zLgIuDOiLi9d99lwIUR\n8SK6s8//FXhTS+2TJEl9ZEdTjcnMm1n5vOAbm26LJEnjIoZ4RNPJQJIkSRqIvoxobt063CeiajhE\nxIeAnwAeyswzevddB3xvr+Rk4OHMPGtQbaikgFQTNhaLdTNTJ5Tqqok/z/upv1+7iHpCyUwxQQVg\n/4EH1y4CJieLaSHFhJlqeks9KaTWvmoyUL9/jk0zW0t1VQvFJKQDh79VqlvPPlNPJaq91tPTm2tP\nXEzUqSb+VFU/7zFR+3zOdR4r1U1PbinVzc48rVQ3NTlbqltY7JTqqiNu+w/uKdVBPTVpZvqk8jb7\nub2tJ51Sqnv08fvXrNm/xuPrSXhrWl86mnv39mMra7MzO/I+DLyXbgwlAJn5c0euR8TvAw+30C5J\nkkbY8HaQhrcLrLGTmZ8HVhw+6M1I/1ng2kYbJUmSnhAR50fEVyLiXyLiv67w+PdFxBci4lBE/Npa\n23MykIbFK4BvZOZX226IJEmjpF+TgSJiEngfcB7wAHBrRHwyM3cvKfsW8FbgpyrbdERTw+JC4KNt\nN0KSpOPY2cA9mXlvZnaA64DXLC3IzG9m5m1A6QTdkRrRdNLReIqIKboLtb+47bZIkjR6+tY5OhW4\nb8nt+4FzjmWDI9XRbGrSkfpjHX8UnAfszsyvD641kiSNp+
qs889/7g5u/twdq5X0PX/xmDqa27b1\nqxk6HkTEtcBO4OkRcR/wzsz8MPBanAQkSdJAvWLnmbxi55lP3P4fv3vN8pIHgO1Lbm+nO6q5YcfU\n0dzX3+XHNOYy88Kj3P/6ptsiSdL46Nuh89uAHRFxOvB1ugNBK/7urj7pSB06lyRJ0mBk5nxEXAx8\nGpgErsrM3RHxpt7jV0TEs4BbgZOAxYi4BHh+Zq64rrwdTTWut3zCbcD9mfmTEfEzwG8D3we8JDP/\ncVDPPV9ISNky+/TaxornxHzr7reU6p6+472lurn5x0t1m2Zr6TLVVBuAgwf7mxxTTS86vFBLRzn1\nWS8t1T2w5+9KddXznqqpNpX9D+DAoW+W6qrtq77H1e0dOlw/nFVNa6o+dzWJZr64z8wW99VqQs+h\nziOlunIS0lQt8Wd6upZI1JlfK2Om62AxJapT/D6aLiYmVT8j69Hvz/Fc59FS3aOP37d2EbCpmNa0\nmn5mnWfmjcCNy+67Ysn1PTz18PqqXN5IbbgE2MWTJx3fRXfW+f9trUWSJI2o2OC/JtjRVKMi4jTg\nAuBKeud3ZOZXMvPuVhsmSZL6zkPnatofAe+ge26HJEk6ZsM7bji8LdPYiYhXAQ9l5u30cYqcJEka\nTgMZ0dy2zaWPtKKXAq+OiAuATcBJEXF1Zv5Cy+2SJGlkxRDHJg6ko7lvH2Tf15bXqFm+32fmZcBl\n3cdiJ/DrK3Qyh/fTIkmS1sVD52pTAkTET/eSgs4F/joiblz9v0mSpCfFBi+D52QgtSIzPwd8rnf9\nE8An2m2RJEmjqamlijbCEU1JkiQNRF9GNJ38o4qI2ER3FHMWmAH+V2ZeGhG/B7wKmAO+Crw+M2vx\nGutuw9p/W3UWDpa29ejXfqNU9/Tv+ZNSXTW9pZq6s1BM2JiZrq80ldO15Ix+JF0sdbiYyvLwI18t\n1VUTicrJO/Q3/WZycqZWNzFdqptfOFysq+0z01ObS3UAOV/bZ6rvSdXkRO01rKbBTE7M1rZHbXvV\nfSGK40EHD9WSfKr7VvV1qX4fVZOBFhbnSnVQf22qz11Nf9o0U0tdy+JklempE0t1qxveccO+tOzI\n5J8jF2klmXkIeGVmvgh4IfDKiHg58LfACzLzTOBu4NIWmylJkvrEczTVqMw80Ls6A0wCezNz15KS\nW4D/1HjDJEkaUZ6jKfVExEREfAn4BvCZZZ1MgDcAn2q+ZZIkjaaI2NClCXY01ajMXOwdOj8N+KGI\n+OEjj0XEbwFzmfnRttonSZL6ZyCHzrdu/fbFuqWlMvORiPhr4AeBz0bE64ALgB9ttWGSJI2c4e10\nbbijuXXr0Wea79270a1qnCz/YyMingHMZ+bDEbEZ+DHgv0fE+cA7gJ29CUOSJGkMbLijuXdvtyOx\nWodTWubZwEeiu67HBPDnmfl/IuJf6E4Ouql3zsgXMvPNLbZTkqSRUV3qqQ3HfOj8SIdTWktm3gW8\neIX7d7TQHEmSxsTwdsSGtwssSZKkkeY6mmpcREwCtwH3Z+ZPDlsyUDXx56TnvrtUNz1ZS1Gppt/0\nO/Ek1pHIksX0otliMtDhudrbXE1bqaYhrSd9pJ+q70k1oadaV5W5UKqbm3+8vM3FxU7tuaP22lRT\nieY6tTbWE2sG8pW0phNOPKVU98j+fy/VVfeZfic1HTpcO8ducbH2HdNVS4jpFPfX6vfCwcO1iSid\nhQNrF63jeVfT1FJFG+GIptpwCbCLJ78lTAaSJGkM2dFUoyLiNLrLGF1J76SSzLwpnxzquYXuGpuS\nJKkkNngZPA+dq2l/RHcpo5OO8vgbgGuba44kSaNtmGedD2/LNHYi4lXAQ5l5Oyv8KWUykCRJ46Uv\nI5omAanopcCrI+ICYBNwUkRcnZm/YDKQJEkbNbydsL50NE0C0kqW//GRmZcBl3Ufi53Ar/c6mSYD\nSZI0hjxHU20Jnpx1/l
5MBpIkaUNi3Ec0pfXKzM8Cn+1dNxlIkqQNch1NSZIkHXci8+gr60dErv44\nrPKwjnMRQWbGktvbgauBU+geNv9AZr6nwfZk5j+vWXfC6b9b2l415aVaNzW5qVRXTfaopCDB+hJA\nqgk91fSiahurP3P1ta6qb6/6RVgbdajuC1XV5JFqMtD6Jh7UXptuYFj/VPet6r7a/32wndel+h5X\nn7f6Olet5zPc7++4cmrSRG171ZSjyj74+L995Cm/T5eKiJxfvKv0XMtNTZxx1O32i4fO1aQO8PbM\n/FJEnAh8MSJuyszdbTdMkiT1nx1NNSYz9wB7etf3R8Ru4DmAHU1JkjZomCcDeY6mWhERpwNn0Y2c\nlCRJY8gRTTWud9j8euCSzNzfdnskSRptwzuieUwdTROBtF4RMQ18DLgmM29ouz2SJI26fi5v1AtR\nuRyYBK7MzHetUPMe4MeBA8DretHSKzqmjqaJQFrN8v0+up+Eq4BdmXl5G22SJEkri+5yA+8DzgMe\nAG6NiE8unbTbi5F+XmbuiIhzgPcD5x5tm56jqSa9DLgIeGVE3N67nN92oyRJGm0TG7x8m7OBezLz\n3szsANcBr1lW82rgIwCZeQtwckQ882gt8xxNNSYzb8Y/biRJGlanAvctuX0/cE6h5jTgGytt0I6m\nJEnSCOvj8kYbTZ846v9bs6M5zPmZ0npFfG/bTZAkqa/6+LvtAWD7ktvb6Y5YrlZzWu++Fa3a0Rx0\nLJHUJPdnSdK46fPvttuAHb21rr8OvBa4cFnNJ4GLgesi4lzg4cxc8bA5eOhckiRJQGbOR8TFwKfp\nLm90VWbujog39R6/IjM/FREXRMQ9wOPA61fbZmRWD8dLkiRJdc4AliRJ0kDY0ZQkSdJA2NGUJEnS\nQNjRlCRJ0kDY0ZQkSdJA2NGUJEnSQNjRlCRJ0kDY0ZQkSdJA/H/gPwPdadvI6wAAAABJRU5ErkJg\ngg==\n", "text": [ "" ] } ], "prompt_number": 92 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Optionally try with the book files too, using functions we made" ] }, { "cell_type": "code", "collapsed": false, "input": [ "books = !ls data/books\n", "books" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 56, "text": [ "['anderson.txt',\n", " 'grimms.txt',\n", " 'irishfairy.txt',\n", " 'lovecraft.txt',\n", " 'mrjames.txt',\n", " 'poe.txt']" ] } ], "prompt_number": 56 }, { "cell_type": "code", "collapsed": false, "input": [ "booktexts = load_texts(books, 'data/books/')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 57 }, { "cell_type": "code", "collapsed": false, "input": [ "bookdocs = make_pattern_docs(booktexts)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 58 }, { "cell_type": "code", "collapsed": false, "input": [ "booktfidf = Model(documents=bookdocs, 
weight=TFIDF)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 59 }, { "cell_type": "code", "collapsed": false, "input": [ "booktfidf.docs" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 60, "text": [ "[Document(id='P46Xguc-49', name='lovecraft.txt', type='l'),\n", " Document(id='P46Xguc-50', name='irishfairy.txt', type='i'),\n", " Document(id='P46Xguc-51', name='poe.txt', type='p'),\n", " Document(id='P46Xguc-52', name='anderson.txt', type='a'),\n", " Document(id='P46Xguc-53', name='grimms.txt', type='g'),\n", " Document(id='P46Xguc-54', name='mrjames.txt', type='m')]" ] } ], "prompt_number": 60 }, { "cell_type": "code", "collapsed": false, "input": [ "booknames = [doc.name for doc in booktfidf.docs]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 87 }, { "cell_type": "code", "collapsed": false, "input": [ "booktfidf.export('data/csv/books_tfidf.tsv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 61 }, { "cell_type": "code", "collapsed": false, "input": [ "bookweights = read_weka_tfidf('data/csv/books_tfidf.tsv')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 62 }, { "cell_type": "code", "collapsed": false, "input": [ "dist = make_dend(bookweights, labels=booknames)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAAZ8AAAFrCAYAAAAO4YSbAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAGgxJREFUeJzt3HmUZWV97vHvIy0BvAzd4gS0tqLxGi4gIIgiiiGC4BAV\nFa8GBQ0hIIgRDMqVYTkgcCNeM1wVRERERRIcQEXB0IJNI0PTdAOCAUkuBpdGaExLZP7dP2oXHooa\nTlX3eau6+vtZq1bts/e73/17N137Oe8+m5OqQpKklh433QVIktY+ho8kqTnDR5LUnOEjSWrO8JEk\nNWf4SJKamzPexiQ+hy1JU1BVme4aZrIJZz5VNemf4447bkr7zZaftX38ngPHv7aPXxPztpskqTnD\nR5LU3EDCZ7fddhtEt2uMtX384Dlw/LtNdwma4TLe/ckk5f1LSZqcJJQPHIzL226SpOYMH0lSc4aP\nJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn\n+EiSmjN8JEnNGT6SpOYMH0magZLskORT01zDxkkO7qPdtkn2mkzfho8kzTBJ5lTVNVV1+DSXMhc4\npI922wF7T6Zjw0eSBizJgiQ3JTkjyc1Jzk6yR5LLk/w0yY5Jjk9yVpIfAV9M8rIk53f779S1XZJk\nUZI/7Nbvn+QbSb6f5LYkhyY5smu3OMncrt2WSb6b5OoklyZ5brf+TUmWJ1ma5IejlH4isGWSa5Oc\nnOR1SS7u9n1aN5b5wIeBfbt2b+rnnMxZ5bMqSerHlsA+wI3AVcC+VfXiJK8FjgaWAs8Ddqmq+5Ls\n1rPvT4Bdq+qhJH8CnAC8sdu2FfB8YH3gVuD9VbV9klOAtwOfAk4FDqqqW5K8EPi/wO7AMcAeVfWL\nJBuNUvNRwFZVtd3wiiT7JDkU2BM4tqpuT3IMsENVvaffk2H4SFIbt1XVDQBJbgAu7tZfDyxgKHy+\nWVX3jbLvJgzNhp4NFI++dl9SVfcA9yS5Gzi/W78c2CbJE4AXA+cmGd5n3e73IuDMJF8DzhvluBll\n3WHADcDlVXVOT7vR2o5pjQyfefNgxYrprkKSJqU3VB4G7u9ZHr4W/9cY+34E+EFVvT7JM4CF4/R7\nX8/yHIY+XlnRO3sZVlUHJ9kJeBVwTZIdququCcYxH3gIeEqSVFUxFIiTskaGz4oVUJMeqiS1kUnN\nAfqyEXBHt3xAv2UAVNXK7vOgN1bVP2Zo+rN1VS1LsmVVXQlc2T2ttgXQGz4rgQ0f6TCZA5wOvAXY\nH3gf8ImR7frhAweS1MbIt8yjva4xXp8MfDzJEmCdnvWj7TPa/m8D3pVkKUO3+V473G+SZUmWA4u6\nQNosybcBqupOYFH3UMLJwAeBS6vqcoaC58+7hxcuAf5oMg8cpMaZQvx+RjWzJM58JM1cSaiqVZr/\nJNkHeHVV9TvTWaOskbfdJGk2656A+yj932Jb4zjzkaTVbHXMfGY7P/ORJDVn+EiSmjN8JEnNGT6S\npOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eS1Jzh\nI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKm5aQ2fefMgmfyPJGnNlqoae2NS421f5YMH\nptL9VPeTpBaSUFW+VR6Ht90kSc0ZPpKk5gwfSVJzho8kqTnDR5LUnOEjSWrO8JEkNWf4SJKaM3wk\nSc0ZPpKk5gwfSVJzho8kqTnDR5LUnOEjSWrO8JEkNWf4SJKaM3wkSc0ZPpKk5gwfSVJzho8kqTnD\nR5LUnOEjSWugJC9L8qI+2r0jydNa1DQZho8krZleDry4j3b7A5sNtpTJM3wkacCSLEhyU5IvJbkx\nyblJ1k+ye5IlSZYlOT3Jul37HZIsTHJ1kguTPHVkf8BBwF91+
78kyTeS7NdtP6g71j7AC4Czu3br\ntR352FJVY29Marztq3zwwFS6n+p+ktRCEqoqPa8XAD8DdqmqxUlOB24D/gL446q6JcmZwBLgH4BL\ngddU1Z1J9gX2qKp3jTjGccDKqjqle/1kYBHwTuBzwAur6u4klwBHVNWSwY56cuZMdwGStJa4vaoW\nd8tfAo4BflZVt3TrzgTeDVwMbAVcnARgHeCOMfp8JOCq6ldJjgX+GXhdVd09WruZYo0Mn7lzh2Y/\nkrQG6b1fE+Bu4Ikj1g3/vqGq+vk8Z6RtgF8Dm49z7BlhjQyfu+6a7gokaWxjvDl+epKdq+oK4K3A\n1cBBSbasqluB/YCFwM3Ak4bbJnk88JyqunFEfyuBjX5/zOwEvBLYHvhhku9X1b+ObDdT+MCBJLVx\nM/DuJDcCGwOnAAcA5yZZBjwIfKaqHgDeCJyUZClwLfAieORBgoO6/s4HXt89SPBS4FTggKr6BXAE\n8Pmu3ReAz/jAwaP698EBSbPPGA8cnF9VW09bUTOMMx9JasO32j2c+UjSajZy5qPHcuYjSWrO8JEk\nNWf4SJKaM3wkSc0ZPpKk5gwfSVJzho8kqTnDR5LU3EDDZ968of+RdKwfSdLaaaDfcDDRNxj4DQeS\nZiO/4WBi3naTJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8\nJEnNGT6SpOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5\nw0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiS\nmjN8JEnNGT6SNGBJfjvdNfRKsm6Si5MsSfLmJB8cp+17k6w/QX8bJzl4MjUYPpI0eNX6gEnWGWfz\n9kBV1fZV9TXg6HHaHg5sMMHh5gKHTKY+w0eSGsmQ/51keZJlSd7crf9Kkr172n0hyRuSPK5rf2WS\n65L8RU+bo7o+liY5oVu3MMknk1wFHJ7k1Umu6GY4FyV5cpInA18CdkxybZKvAet3y2eNqPc9wGbA\nJUl+kOTpSX6a5IldbZcleQXwcWDLro+T+joXVWMHcpIab/uEnQfG232i7ZK0JkpCVaXn9cqq2jDJ\nPsBBwJ7Ak4CrgBcCOwOvq6r9k6wL3AI8B3gH8KSq+liSPwB+BLwJeB7wIWD3qro3ySZVdXeSS4Ab\nqurQ7ribVNXd3fKfA/+9qo5M8jLgyKp6TW99Y4zlNmCHqrqre/2urv6rgGdV1cFJngFcUFVb93uO\n5vTbUJK0yl4CfLl7V/+rJD8EdgS+C3yqC569gB9W1X1J9gC2TvLGbv+NGAql3YHPV9W9AMMB0zmn\nZ3l+N7N5KrAu8LNufZiiqjq9m7EdBGw71f6mNXzmzh2a/UjSWqJ49IU6DH32cl+ShQzNKN4MfKWn\nzaFVdVFvJ0n2ZOwL/j09y38H/E1VXdDNdo5ftfIhyQbAFgyNZcMRx+vbtIbPXXdN59ElaTDGeVN9\nGXBQkjOBJwK7Akd0284BDgR2YOh2G8D3gEOSXFJVDyb5Q+DnwEXAsUnOrqrfJZlbVSuGD99zvI2A\nO7rl/ccp+YEkc6rqwVG2rez6Gb5inwScBfw/4DTgNV2bUW/bjcUHDiRp8Aqgqr4OLAOuA34AvL+q\nftW1+T7wUuCinhD4HHAjsCTJcuDTwDpV9T3gW8DVSa7l9wH2yLE6xwPnJrka+I+ebTWi3anAsuEH\nDpJ8O8lTe7Zd2D1w8FKGwvGkqvoycH+Sd1TVncCi7kGKmf/AgSTNRiMfONBjOfORJDVn+EiSmjN8\nJEnNGT6SpOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5\nw0eS1JzhI0lqzvCRJDVn+
EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiS\nmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aP\nJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eSZogkByXZb5qOfXQf\nbZ6R5H+uluNV1XgHqvG2T9h5YBV2l6Q1UhKqKpPcZ52qemhQNfVx/JVVteEEbXYDjqiq16zq8Zz5\nSFIDSY5JclOSy5J8OckRSS5J8skkVwGHJzk+yRFd+4VJTklyVZKfJNkxydeT/DTJR7o2C7o+z0hy\nc5Kzk+yRZFHXbseu3cuSXNv9LEny30bUdiKwfrf9rCQvSHJdkj9I8oQk1yfZCjgR2LVrd/iqnI85\nq7KzJGliXQi8AdgGWBdYAlzTbX58VQ2HxHHA8P2iAu6rqh2TvAf4JrAdsAK4NckpXbstgX2AG4Gr\ngH2rapckrwWOBl4PHAEcUlWLk2wA3NdbX1V9IMm7q2q7npq/BXwUWB84q6puSHIUcKQzH0laM+wC\nfKOq7q+q3wLn92w7Z5z9vtX9vh64vqp+WVX3Az8D5nfbbquqG7rPSG4ALu7ZZ0G3vAj4ZJLDgLl9\n3t77MLAH8ALg5G7dpG4ljmetmPnMmwcrVkx3FZLWYsXYF+7/Gme/4RnKwzx6tvIwv79+j1x//8g2\nVXVSkguAVwGLkuxZVTdPUPOmwBOAdRia/YxX56StFeGzYoUPPkhqJ4+NmUXAZ5N8HHg88Grg1LF2\nX/31ZMuqugG4obsF+FxgZPg8kGROVT3Yvf4s8CHgWcBJwGHASmDchxL6tVaEjyRNp6q6uvsMZRnw\nS2A58BuGZkQj3xqP9lZ5tHZjta9Rlg9P8nKGZkPXA98FSHJtz+c8pwLLkiwBvsfQ501fTfI44PLu\nSbcfAQ8lWQqcUVWfGmfY41orHrWeKXVIWjuM9qh1kidU1T3dB/4/BA6sqqXTU+H0c+YjSW2cmuSP\ngPWAL6zNwQPOfCRptZvK/2S6tvFRa0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6S\npOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eS1Jzh\nI0lqzvCRJDVn+EiSmpsz3QWMZt48WLFiuquQJA1KqmrsjUmNt33CzgNT2X2q+7XqT5LGk4SqynTX\nMZN5202S1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn\n+EiSmjN8JEnNGT6SpOYMH0lSc4aPJKk5w0eS1JzhI0lqzvCRJDVn+EiSmjN8JEnNGT6SNGBJFk1l\nW7f9t2Osf1KSHye5Jsku4+x/WpLn9V/tI/ttnOTgPtptm2SvyfZv+EjSgFXVY8IhyZyxto3cfYz1\nuwPLqmqHqhozwKrqwKr6ySjHn+j6Pxc4ZII2ANsBe/fR7lEMH0kasOHZS5LdklyW5JvA9SO2PS3J\npUmuTbK8dzaT5KNJliZZnOTJSZ4PnAT8aZIlSdZL8ukkVyW5PsnxPfsuTLL98LGS/E2SpcD/SvL1\nnnavSHJeT9knAlt29Zyc5HVJLu6p9eYk84EPA/t27d7U7zmZM8lzKEmavN7Zy3bAVlX1byO2vRW4\nsKpO6GYlG3TrnwAsrqoPJTkJOLCqPpbkWGCHqnoPQJKjq2pFknWAi5NsXVXLRxx7A+CKqjqy2+cn\nSZ5YVXcCBwCn97Q9qqtzu+EVSfZJciiwJ3BsVd2e5JjeOvrlzEeS2rqyJ3getR44IMlxwNZVNfxZ\nz/1V9e1u+RpgQbec7mfYvkmuAZYAWwGjfc7zEPBPPa/PAvZLsgmwM/Ddnm29fQ87DPggcG9
VnTNG\nHX1ZK2Y+c+dCJn1qJGkg7hltZVVdlmRX4NXAF5KcUlVnAQ/0NHuYUa7bSZ4JHAG8oKp+k+QMYL1R\nDnNvVfXOhM4AzgfuBb5WVQ9PUPt8hgLsKUnS9TXWZ1LjWivC5667prsCSWuTqbzZTfJ04N+r6nNJ\n1mPo9txZfe6+EUOh9p9JngLsBVwy0U5V9YskdwAfYugBhl4rgQ176pvD0G25twD7A+8DPjGyXb/W\nivCRpGlWYyz3vn45cGSSBxi6oL99jH1r5HJVXZfkWuAm4HbgR33UMezLwKZVdXOSzYDTqupVVXVn\nkkVJljN0O24lcGlVXZ5kGXBVkgsYCrkPdMc/oarOHec8PCKPnoGN2PjIrGpqEpjK7lPdT5JmgiRU\n1Rpxsz/J3wPXVNUZTY9r+EjS6rWmhE/3gMJK4BVV9cBE7VfrsQ0fSVq91pTwmU4+ai1Jas7wkSQ1\nZ/hIkpozfCRJzRk+kqTmDB9JUnOGjySpOcNHktTcQL/bzW+TliSNZqDfcDBVfsOBpDWZ33AwMW+7\nSZKaM3wkSc0ZPpKk5gwfSVJzho8kqTnDR5LUnOEjSWrO8JEkNWf4SJKaM3wkSc0ZPpKk5gwfSVJz\nho8kqTnDR5LUnOEjSWrO8JEkNWf4SJKaM3wkSc0ZPpKk5gwfSVJzho8kqTnDR5LUnOEjSWrO8JEk\nNWf4SJKaM3wkSc0ZPpKk5gwfSVJzho8kqTnDR5LUnOEjSWrO8JEkNWf4SJKaM3wkSc0ZPpKk5gwf\nSZrhkvx2BtSwcZKD+2i3bZK9Jmpn+EjSzFf9NkxnADXMBQ7po912wN4TNTJ8JKmBJF9PcnWS65Mc\n2K37bZKPJlmaZHGSJ3frn9m9XpbkoyP6eX+SK5Ncl+T4bt2CJDcnORNYDsxP8oUky7s+3tu1e36S\nK7p9z0uySbd+YZITk/y46+clowzhRGDLJNcmOTnJ65Jc3O3/tG6/+cCHgX27dm8a63wYPpLUxjur\n6gXAjsB7kswDNgAWV9XzgUuBA7u2nwL+oaq2Ae4Y7iDJHsCzq2onhmYYOyTZtdv87G6f/wE8Cdis\nqrbu+vh81+aLwPuraluGQuq4bn0B61TVC4H39qzvdRRwa1VtV1V/XVXfAH6R5FDgVODYqrodOAb4\natfu3LFOhuEjSW0cnmQpsBjYAngOcH9Vfbvbfg2woFt+MfCVbvlLPX3sAeyR5Nqu/XMZCh2Af6uq\nK7vlW4FnJfnbJHsCK5NsDGxcVZd1bc4EXtrT93nd7yU9dfQa7VbeYcAHgXur6pyedhPe9pszUYPp\nMHcuDOSOpSRNgyS7AbsDO1fVvUkuAdYDHuhp9jD9XZM/XlWnjuh/AXDP8OuqujvJNsArgb8E3gz8\n1ciyRry+r/v9UJ91AMzv2j8lSaqq6PPzqRkZPnfdNd0VSNLUjfLmeSNgRRc8zwN2nqCLRcBbgLOB\nt/Ws/x7wkSRnV9U9STYH7n/s8fNE4IGqOi/JT4EvVtV/JlmR5CVV9SNgP2DhJIa1Etiw5xhzgNO7\nOvcH3gd8YmS7sczI8JGkWeZC4C+T3AjczNCtN3j0LKF31nA48OUkRwHfHF5fVRd14bW4e6BtJfBn\nI/YF2Bw4I8nwRysf6H6/A/hMkg0YujV3wBj1FkCSzYDTqupVVXVnkkVJlgPf7Y59aVVdnmQZcFWS\nC4BLgA90twZPGOtznwzNkkb3+1mUJKlfSagqPzwYhw8cSJKaM3wkSc0ZPpKk5gwfSVJzho8kqTnD\nR5LUnOEjSWrO8JEkNWf4SJKaM3wkSc0ZPpKk5gwfSVJzho8kqTnDR5LUnOEjSWrO8JEkNWf4SJKa\nM3wkSc0ZPpKk5gwfSVJzAwmfhQsXDqLbNcbaPn7wHDj+hdNdgmY4w2cA1vbxg+fA8S+c7hI0w3nb\nTZLUnOEjSWouVTX2xmTsjZKkMVVVpruGmWzc8JEkaRC
87SZJas7wkSQ1N+XwSfL5JL9MsnycNn+b\n5F+SXJdku6kea6ZK8sokN3VjPGqU7ZsmuTDJ0iTXJ9l/GsocmInG37XZLcm13fgXNi5xoPoZf9du\nxyQPJnlDy/pa6ONv4G3d3/+yJIuSbDMddQ5Kn38Ds/o6OGVVNaUfYFdgO2D5GNv3Br7TLb8QuGKq\nx5qJP8A6wC3AAuDxwFLgeSPaHA98vFveFLgTmDPdtTcc/ybADcAWw+dguutuOf6edv8MXADsM911\nT8O/gRcBG3fLr5xN14E+xz+rr4Or8jPlmU9VXQasGKfJa4Ezu7Y/BjZJ8pSpHm8G2gm4par+taoe\nAL4K/OmINr8ANuqWNwLurKoHG9Y4SP2M/63AP1XVzwGq6teNaxykfsYPcBjwj8B/tCyukQnPQVUt\nrqrfdC9/DGzRuMZB6uffwGy/Dk7ZID/z2Ry4vef1z5ld//BGG9/mI9qcBmyV5A7gOuDwRrW10M/4\nnwPMS3JJkquT7NesusGbcPxJNmfoYvTpbtVse7S0n38Dvd4FfGegFbXVz/hn+3VwyuYMuP+Rz7nP\npj++fsZyNLC0qnZLsiVwUZJtq2rlgGtroZ/xPx7YHtgd2ABYnOSKqvqXgVbWRj/j/z/AB6qqkoTH\n/j2s6fr+e07ycuCdwC6DK6e5fsc/m6+DUzbI8Pl3YH7P6y26dbPFyPHNZ+hdTa8XAx8DqKpbk9wG\nPBe4ukmFg9XP+G8Hfl1VvwN+l+RSYFtgNoRPP+PfAfjqUO6wKbBXkgeq6lttShy4fs4B3UMGpwGv\nrKrxbtWvafoZ/2y/Dk7ZIG+7fQt4O0CSnYG7q+qXAzxea1cDz0myIMm6wL4MjbnXTcCfAHT3eZ8L\n/KxplYPTz/i/CbwkyTpJNmDoA9cbG9c5KBOOv6qeVVXPrKpnMvS5z8GzKHigj3OQ5OnAecCfVdUt\n01DjIPXzNzDbr4NTNuWZT5KvAC8DNk1yO3AcQ7dZqKrPVtV3kuyd5BbgHuCA1VHwTFFVDyY5FPge\nQ0+9nF5VP0lyULf9s8AJwBlJrmMo6P+6qu6atqJXo37GX1U3JbkQWAY8DJxWVbMifPr87z+r9XkO\njgXmAp/uZoAPVNVO01Xz6tTn38Csvg6uCr9eR5LUnN9wIElqzvCRJDVn+EiSmjN8JEnNGT6SpOYM\nH0lSc4aPJKk5w0eS1Nz/B7Ut48JymnrGAAAAAElFTkSuQmCC\n", "text": [ "" ] } ], "prompt_number": 89 }, { "cell_type": "code", "collapsed": false, "input": [ "make_heatmap_matrix(dist, method='complete')" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "display_data", "png": 
"iVBORw0KGgoAAAANSUhEUgAAApoAAAJaCAYAAACP5OdLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAH7FJREFUeJzt3X+wZnddH/D3Zy9GgoJkwcE2rKZKjGSGn7aLP1CxTevK\nVGNtpyHWOkVtM52JOp3apjKddpxpO+LYKbVBZqsRUVvTGa0Qq2uqFSlpERNJAsimJoV0NhtRYBcE\nSXQXPv3jXtbL5d67l8tz7nO++7xeM89wz/OcPc8nu5fd931/zzlPdXcAAGDRDi17AAAALk2CJgAA\nkxA0AQCYhKAJAMAkBE0AACYhaAIAMIknLHsAOChV5V5eAAypu2vZM+yHoMlKcd9YAEZTNWTGTGLp\nHACAiQiaAABMQtAEAGASgiYAAJNwMRCwUIcPJ2fPLnsKuDRdcUVy5syyp4C9K1fhsiqqqn2/T68q\n8dsM0/D/r9VUVcPe3sjSOQAAkxA0AQCYhHM0AbjkXMrnCg987+4dOff00uUcTVaGczQPhnPImAPf\nh2Px57U752gCAMAWls5hSSztjcXSHsCnz9I5K2NuS+eWisbiz2ss/rzG4s9rd5bOAQBgC0vnALBi\n5njqzpxOuXGqzOJYOmdlWDrnM+HPayz+vHbn92d3c/v9sXQOAABbCJoAAExC0AQAYBIuBgJmycUK\nu3OxAjACQROYpbNn53Uy/tzMKfQC7MTSOQAAkxA0AQCYhKAJAMAkBE0AACYhaAIAMAlBEwCASQia\nAABMQtAEAGASgiYAAJMQNAEAmISgCQDAJARNAAAmIWgCADAJQRMAgEkImgAATELQBABgEoImAACT\nEDQBAJiEoAkAwCQETQAAJiFoAgAwCUETAIBJCJoAAExC0AQAYBKCJgAAkxA0AQCYhKAJAMAkBE0A\nACYhaAIAMAlBEwCASQiaAABMQtAEAGASgiYAAJMQNAEAmISgCQDAJARNAAAmIWgCADAJQRMAgEkI\nmgAATELQBABgEoImAACTEDQBAJiEoAkAwCQETQAAJiFoAgAwCUETAIBJCJoAAExC0AQAYBKCJgAA\nkxA0AQCYhKAJAMAkBE0AACYhaAIAMAlBEwCASQiaAABMQtAEAGASgiYAAJMQNAEAmISgCQDAJARN\nAAAmIWgCADAJQRMAgEkImgAATELQBABgEoImAACTEDQBAJiEoAkAwCQETQAAJiFoAgAwCUETAIBJ\nCJoAAExC0AQAYBKCJgAAkxA0AQCYhKAJAMAkBE0AACYhaAIAMAlBEwCASQiaAABMQtAEAGASgiYA\nAJMQNAEAmISgCQDAJARNAAAmIWgCADAJQRMAgEkImgAATELQBABgEoImAACTEDQBAJiEoAkAwCQE\nTQAAJiFoAgAwCUETAIBJCJoAAExC0AQAYBKCJgAAkxA0AQCYhKAJAMAkBE0AACYhaAIAMAlBEwCA\nSQiaAABMQtAEAGASgiYAAJMQNAEAmISgCQDAJARNAAAmIWgCADAJQRMAgEkImgAATELQBABgEoIm\nAACTEDQBAJiEoAkAwCQETQAAJiFoAgAwCUETAIBJCJoAAExC0AQAYBKCJgAAk6juXvYMcCCqyjc7\nAEPq7lr2DPshaAIAMAlL5wAATELQBABgEoImAACTEDQBAJiEoAkAwCSesNuLbgfDZ2pOt2Pw/QzA\nqHb69/Qz/bdt6n+ndw2aGwNM+f5cwqpmkzEv+NyrXr7sEWbr0KGL/nWw0j72sT9d9giz5vtnd096\n4ucve4TZe++7/P28k6prdn39iUdetq/jPn7q9n39uk+HvxkAAAZWNd8zIec7GQAAQ9NoAgAMrGbc\nGwqaAAADs3QOAMDK0WgCAAxMowkAwMrRaAIADGyO963+BEETAGBo812gnu9kAAAMTaMJADAwFwMB\nALByNJoAAAObc6MpaAIADGzOH0E538kAABiaRhMAYGBzXjqf7
2QAAAxNowkAMDCNJgAAK0ejCQAw\nsDk3moImAMDAKrXsEXY03wgMAMDQNJoAAAOb89L5fCcDAGBoGk0AgIHNudEUNAEABjbnoDnfyQAA\nGJpGEwBgaPPtDec7GQAAQ9NocqCq6uEkf5TkY0nOdffR5U4EAGOb8zmagiYHrZO8pLvPLHsQALgU\nzDlozncyLmXz/awsAGBhBE0OWif59aq6p6r+/rKHAYDRVQ7t63EQLJ1z0L66u3+/qj4/ya9V1QPd\n/eZlDwUALN7kQfPw4eTs2anfhVF09+9v/O/7quoXkxxNImgCwD4t8hzNqjqW5FVJ1pL8RHe/csvr\nVyT5ySRfnOTxJN/Z3b+70/EmD5pnzybdU78Lc1S1dbuelGStuz9cVZ+T5K8l+cEljAYAl4za+g/u\n/o+zluTWJNclOZ3k7qq6o7tPbtrtFUne1t1/o6quSfLqjf235RxNDtIzkry5qu5L8tYk/627//uS\nZwIA1h1N8lB3P9zd55LcnuT6Lfs8O8kbk6S7/0+SqzZOh9uWczQ5MN39niTPX/YcAHApWeDS+ZVJ\nTm3afiTJi7bsc3+Sb01yV1UdTfJFSZ6Z5H3bHVCjCQBAsn5nmIv5oSRPrap7k9yc5N6sfwjLtjSa\nAAAD2+utih7/yLvz+Efes9sup5Mc2bR9JOut5gXd/eEk33nhvavek+TdOx1Q0AQAGNhel84vf/Kz\ncvmTn3Vh+4/+8De27nJPkqur6qokjya5IcmNn/xe9XlJHuvuP924H/abuvsjO72noAkAQLr7fFXd\nnOTOrN/e6LbuPllVN228fjzJtUl+qqo6yTuTfNduxxQ0AQAGtsj7aHb3iSQntjx3fNPXb0lyzV6P\n52IgAAAmodEEABjYQX1u+X7MdzIAAIam0QQAGNkCz9FcNEETAGBgi7wYaNHmOxkAAEPTaAIADKyq\nlj3CjjSaAABMQqMJADCwOd/eSNAEABiYi4EAAFg5Gk0AgJG5GAgAgFWj0QQAGNmMa0NBEwBgZJbO\nAQBYNRpNAICRaTQBAFg1Gk0AgJHNuDYUNAEABtaWzgEAWDUaTQCAkc230NRoAgAwDY0mAMDIDs23\n0hQ0AQBG5mIgAABWjUaTlbJ26LJljzBb5z/2+LJHYGBPWPvsZY8wa+9918uXPcLsfcG1r132COOa\nb6G5uEbz8OH15nbrAwCA1bSwRvPs2aT7U58XNgEAJjTji4GcowkAwCScowkAMLIZLx8LmgAAI5tv\nzrR0DgDANDSaAAAjczEQAACrRtAEABhZ7fOx3aGqjlXVA1X1YFXdss3rT6+qX62q+6rqnVX193Yb\nTdAEABhYV+3rsVVVrSW5NcmxJNcmubGqnr1lt5uT3Nvdz0/ykiT/tqp2PBVT0AQAIEmOJnmoux/u\n7nNJbk9y/ZZ9fj/JUza+fkqSD3T3+Z0O6GIgAICRLe5ioCuTnNq0/UiSF23Z58eT/EZVPZrkyUn+\n9m4HFDQBAFbAYx94II994IHddtnmw8Q/xSuS3NfdL6mqL0nya1X1vO7+8HY7C5oAACPbY6F5+dO/\nLJc//csubJ998A1bdzmd5Mim7SNZbzU3+6ok/zpJuvv/VtV7klyT5J7t3tM5mgAAI6va3+NT3ZPk\n6qq6qqouS3JDkju27PNAkuvW37aekfWQ+e6dRtNoAgCQ7j5fVTcnuTPJWpLbuvtkVd208frxJP8m\nyWur6v6sF5b/tLvP7HRMQRMAYGQL/GSg7j6R5MSW545v+vr9Sb5pz6MtbDIAANhEowkAMLL5ftS5\noAkAMLTtL+yZBUvnAABMQqMJADAyjSYAAKtGowkAMLIZ14aCJgDAyCydAwCwajSaAAAjm2+hqdEE\nAGAaGk0AgIH1Aj/rfNE0mgAATEKjCQAwMledQ1JVR6rqjVX1u1X1zqr63mXPBADDq30+DoBGk4N0\nLsk/6u77qupzk/xOVf1ad
59c9mAAwOIJmhyY7n5vkvdufP2RqjqZ5M8nETQBYL9cDASfrKquSvKC\nJG9d7iQAwFQ0mhy4jWXzn0/yfd39kWXPAwBDm/HFQJMHzSuumPV/Pwesqj4ryS8k+dnufv2y5wGA\n4c04Z00eNM+cmfodmKutP2BUVSW5Lcm7uvtVy5gJADg4ztHkIH11km9P8vVVde/G49iyhwKAoR2q\n/T0OgHM0OTDdfVf8cAMAK0PQBAAY2YxvbyRoAgAMrOebMy1jAgAwDY0mAMDIZrx0rtEEAGASGk0A\ngJHN+JNxBE0AgJFZOgcAYNVoNAEARjbj2nDGowEAMDKNJgDAyGZ8MZBGEwBgZIdqf49tVNWxqnqg\nqh6sqlu2ef37q+rejcc7qup8VT11x9EW+J8JAMCgqmotya1JjiW5NsmNVfXszft094909wu6+wVJ\nfiDJb3b3B3c6pqVzAICB9eKWzo8meai7H06Sqro9yfVJTu6w/7cl+bndDrjvRvPw4fVTAj7xAABg\naFcmObVp+5GN5z5FVT0pyTck+YXdDrjvRvPs2aR78xvu90gAAOzb4k6E7IvvcsE3Jblrt2XzxNI5\nAMBKeOzUO/L4I+/cbZfTSY5s2j6S9VZzOy/LRZbNE0ETAGBse/wIysu/6Lm5/Iuee2H7Q2/9L1t3\nuSfJ1VV1VZJHk9yQ5MatO1XV5yX52qyfo7krQRMAYGQLOn+xu89X1c1J7kyyluS27j5ZVTdtvH58\nY9dvSXJndz92sWMKmgAAJEm6+0SSE1ueO75l+3VJXreX4wmaAAAj2+PS+TK4YTsAAJPQaAIAjGy+\nhaagCQAwsrZ0DgDAqtFoAgCMTKMJAMCq0WgCAIxsQTdsn4KgCQAwshmvT894NAAARqbRBAAYmaVz\nmIc/OfehZY8Al6QzD75i2SPM2uGr//2yR5i98x/7k2WPwAQETQCAkc349kaCJgDAyGYcNF0MBADA\nJDSaAAAD6xlfDKTRBABgEhpNAICRzbg2FDQBAEZm6RwAgFWj0QQAGJnbGwEAsGo0mgAAI9NoAgCw\najSaAAAjm2+hKWgCAIysLZ0DALBqNJoAACNzw3YAAFaNRhMAYGQzPkdT0AQAGNl8c6alcwAApqHR\nBAAY2KEZ14YzHg0AgJFpNAEABjbjuxtpNAEARla1v8f2x6pjVfVAVT1YVbfssM9LqureqnpnVf3m\nbrNpNAEASFWtJbk1yXVJTie5u6ru6O6Tm/Z5apJXJ/mG7n6kqp6+2zEFTQCAgdXi1s6PJnmoux/e\nOO7tSa5PcnLTPt+W5Be6+5Ek6e7373ZAS+cAACTJlUlObdp+ZOO5za5Ocriq3lhV91TV393tgBpN\nAICB7bXQfOz37s1jv3ffbrv0Hg7zWUlemOSvJHlSkrdU1W9194Pb7SxoAgAMbK9B80nXvCBPuuYF\nF7bP/vJPbd3ldJIjm7aPZL3V3OxUkvd392NJHquq/5nkeUm2DZqWzgEASJJ7klxdVVdV1WVJbkhy\nx5Z93pDkxVW1VlVPSvKiJO/a6YAaTQCAgdWCasPuPl9VNye5M8laktu6+2RV3bTx+vHufqCqfjXJ\n25N8PMmPd7egyTxU1bEkr8r6N/BPdPcrlzwSALChu08kObHlueNbtn8kyY/s5XiWzjkwm+7PdSzJ\ntUlurKpnL3cqABjbIm/YvmgaTQ7SXu7PBQB8Gg75CEpIsrf7cwEAlwiNJgdpL/fnAgA+DQe1DL4f\nCwuaV1wx7/9QZmEv9+cCAC4RCwuaZ84s6khcKrb5wePC/bmSPJr1+3PdeKBDAcAlZs5Fn6VzDsxO\n9+da8lgAwEQETQ7UdvfnAgD2r2ZcaQqaAAADW9QnA01hxqMBADAyjSYAwMBmvHKu0QQAYBoaTQCA\ngc250RQ0AQAGNuegaekcAIBJaDQBAAZ2SKMJAMCq0WgCAAxszudoCpoAAAObc9C0dA4AwCQ
0mgAA\nA6sZXw2k0QQAYBIaTQCAgc35HE1BEwBgYHMOmpbOAQCYhEYTAGBgGk0AAFaORhMAYGAzvruRoAkA\nMDJL5wAArByNJgDAwGrGteGMRwMAYGQaTQCAgTlHEwCAlSNoAgAMrKr29djhWMeq6oGqerCqbtnm\n9ZdU1Yeq6t6Nxz/fbTZL5wAAA1vU0nlVrSW5Ncl1SU4nubuq7ujuk1t2fVN3f/NejqnRBAAgSY4m\neai7H+7uc0luT3L9NvvtOdoKmgAAA6va32MbVyY5tWn7kY3nNuskX1VV91fVr1TVtbvNZukcAIBk\nPURezNuSHOnuj1bVNyZ5fZIv3WlnQRMAYGB7PUfz7Nvflg++497ddjmd5Mim7SNZbzUv6O4Pb/r6\nRFX9WFUd7u4z2x1Q0GSlPO3zrln2CLP1WU/4nGWPMGvvuXe705T4hMu/8F8ue4RZWzv02csegUvY\noT0Gzac974V52vNeeGH7//3n127d5Z4kV1fVVUkeTXJDkhs371BVz0jyh93dVXU0Se0UMhNBEwCA\nJN19vqpuTnJnkrUkt3X3yaq6aeP140n+VpJ/WFXnk3w0yct2O6agCQAwsL02mnvR3SeSnNjy3PFN\nX786yav3PNviRgMAgD+j0QQAGNih2svF4sshaAIADGyRS+eLZukcAIBJaDQBAAY259ZwzrMBADAw\njSYAwMBcDAQAwCRcDAQAwMrRaAIADGzOreGcZwMAYGAaTQCAgc35HE1BEwBgYDXjq84tnQMAMAmN\nJgDAwOa8dK7RBABgEhpNAICBzbk1nPNsAAAMTKMJADAwn3UOAMAkXAwEAMDK0WgCAAxszq3hnGcD\nAGBgGk0AgIHN+RxNQRMAYGBzvurc0jkAAJPQaAIADGzOS+caTQAAJiFocuCqaq2q7q2qX1r2LAAw\nukP7fBwES+csw/cleVeSJy97EAAYnYuBYENVPTPJS5P8RJIZn1UCAHymNJoctH+X5J8kecqyBwGA\nS4GLgSBJVf31JH/Y3fdGmwkAl7ylNJqHDydnzy7jnVmyr0ryzVX10iRPTPKUqvrp7v6OJc8FAMOa\nc6O5lKB59mzS8z1vlQWpLd/43f2KJK9Yf62+Lsn3C5kA8JmZ8/L0nGfj0ufHDQCYkao6VlUPVNWD\nVXXLLvv9pao6X1XfutvxXAzEUnT3m5K8adlzAMDoFnV7o6paS3JrkuuSnE5yd1Xd0d0nt9nvlUl+\nNRe55kKjCQBAkhxN8lB3P9zd55LcnuT6bfb7niQ/n+R9FzugoAkAMLBDtb/HNq5McmrT9iMbz11Q\nVVdmPXy+ZuOpXetUS+cAAAPba2t46nfuy6m33b/bLntZg39Vkn/W3V1VlYssnQuaAAAr4MiXPz9H\nvvz5F7bfcttPb93ldJIjm39J1lvNzb48ye3rGTNPT/KNVXWuu+/Y7j0FTQCAgS3wPpr3JLm6qq5K\n8miSG5LcuHmH7v7iT3xdVa9N8ks7hcxE0AQAIEl3n6+qm5PcmWQtyW3dfbKqbtp4/fine0xBEwBg\nYLWg2xslSXefSHJiy3PbBszufvnFjueqcwAAJqHRBAAYmM86BwBgEnNenp7zbAAADEyjCQAwsEV9\n1vkUNJoAAExCowkAMDAXAwEAMIk5B01L5wAATEKjCQAwsLVlD7ALjSYAAJPQaAIADGzOtzcSNAEA\nBuZiIAAAVo5GEwBgYBpNAABWjkYTAGBgazNuNAVNAICBWToHAGDlaDQBAAY25/toajQBAJiERhMA\nYGBzPkdT0AQAGNjasgfYhaVzAAAmodEEABjYSi2dHz6cnD276KPCYjz2+AeWPcJsPfLgty17hFn7\nCy94w7JHgEva2tplyx6BCSw8aJ49m/RFrrKvGSdvAICRuL0RAAArxzmaAAAD81nnAABMYs4XA1k6\nBwBgEhpNAICBaTQBAFg5Gk0AgIFpNAEAmMRa9b4e26m
qY1X1QFU9WFW3bPP69VV1f1XdW1W/U1V/\nebfZNJoAAKSq1pLcmuS6JKeT3F1Vd3T3yU27/Xp3v2Fj/+ck+cUkz9rpmBpNAICBHdrnYxtHkzzU\n3Q9397kktye5fvMO3f3HmzY/N8n7LzYbAABcmeTUpu1HNp77JFX1LVV1MsmJJN+72wEtnQMADGyv\nFwP97m+9Pe9669t322VPH5re3a9P8vqq+pokP5Pkmp32FTQBAAa216D5nK98bp7zlc+9sP3zP/qf\ntu5yOsmRTdtHst5qbqu731xVT6iqp3X3B7adbW+jAQBwibsnydVVdVVVXZbkhiR3bN6hqr6kqmrj\n6xcmyU4hM9FoAgAMbadbFX26uvt8Vd2c5M4ka0lu6+6TVXXTxuvHk/zNJN9RVeeSfCTJy3Y7pqAJ\nAECSpLtPZP0in83PHd/09Q8n+eG9Hk/QBAAY2Jw/GUjQBAAY2JyDpouBAACYhEYTAGBgGk0AAFaO\nRhMAYGBrM240BU0AgIEdWtB9NKdg6RwAgEloNAEABjbn1nDOswEAMDCNJgDAwNzeCACAlaPRBAAY\n2Jxvb6TR5MBU1U9W1R9U1TuWPQsAXCoOVe/rcSCzHci7wLrXJjm27CEAgINh6ZwD091vrqqrlj0H\nAFxKXAwEAMDK0WgCAAxszo3mUoLmFVckNePfFACAUcx5eXopQfPMmWW8KwfNDxMAsNrmHIK5xFTV\nzyX530m+tKpOVdXLlz0TAIyuan+Pg+AcTQ5Md9+47BkAgIMjaAIADGzOZ6oJmgAAA5vzNRHO0QQA\nYBIaTQCAgc25NZzzbAAADEyjCQAwsKpe9gg7EjQBAAY242uBLJ0DADANjSYAwMDc3ggAgJWj0QQA\nGNiMC01BEwBgZIdmnDQtnQMAkCSpqmNV9UBVPVhVt2zz+t+pqvur6u1V9b+q6rm7HU+jCQAwsEUV\nmlW1luTWJNclOZ3k7qq6o7tPbtrt3Um+trs/VFXHkvzHJF+x0zH33WheccX6VU5bHwAADOlokoe6\n++HuPpfk9iTXb96hu9/S3R/a2HxrkmfudsB9N5pnzmz/vLAJAHBwFpi9rkxyatP2I0letMv+35Xk\nV3Y7oKVzAACSZM+fZVlVX5/kO5N89W77CZoAAAPba6H523e9I7991zt22+V0kiObto9kvdX85Pdb\nvwDox5Mc6+6zux1Q0AQAGNheg+aLXvycvOjFz7mw/WOv/Lmtu9yT5OqquirJo0luSHLjJ71X1Rcm\n+a9Jvr27H7rYewqaAACku89X1c1J7kyyluS27j5ZVTdtvH48yb9IckWS19T6yaHnuvvoTscUNAEA\nBrbIG7Z394kkJ7Y8d3zT19+d5Lv3PNviRgMAgD+j0QQAGNic7ywpaAIADKxqz3clOnCWzgEAmIRG\nEwBgYHNeOtdoAgAwCY0mAMDAFvhZ5wsnaAIADGzOy9Nzng0AgIFpNAEABjbnpXONJgAAk9BoslI+\n8OD3LHuE2Xra1f9h2SPM2uVPfNqyR2BgT1h74rJHmL2PffxPlz3CsGZcaAqaAAAjs3QOAMDK0WgC\nAAxsxoWmRhMAgGloNAEABnZoxpWmoAkAMLAZ50xL5wAATEOjCQAwsKpe9gg70mgCADAJjSYAwMCc\nowkAwMrRaAIADGzOH0EpaAIADGzGOdPSOQAA09BoAgAMbM6t4ZxnAwBgYBpNAICBuRgIAICJzDdp\nWjoHAGASGk0AgIGVRhMAgFWj0QQAGFjVfHtDQRMAYGiWzgEAmLmqOlZVD1TVg1V1yzavf1lVvaWq\nHq+qf3yx42k0AQAGtqiLgapqLcmtSa5LcjrJ3VV1R3ef3LTbB5J8T5Jv2csxNZoAACTJ0SQPdffD\n3X0uye1Jrt+8Q3e/r7vvSXJuLwcUNAEAhlb7fHyKK5Oc2rT9yMZz+2bpHABgYHu96vzNb7o/d73p\n/t126YUMtImgCQC
wAr7m656Xr/m6513Y/qF/9bNbdzmd5Mim7SNZbzX3zdI5AMDQFrZ0fk+Sq6vq\nqqq6LMkNSe7Y5U0vSqPJgamqJyZ5U5LPTnJZkjd09w8sdyoAIEm6+3xV3ZzkziRrSW7r7pNVddPG\n68er6guS3J3kKUk+XlXfl+Ta7v7IdscUNDkw3f14VX19d3+0qp6Q5K6qenF337Xs2QBgVIv8rPPu\nPpHkxJbnjm/6+r355OX1XQmaHKju/ujGl5dl/aelM0scBwCGt8iguWjO0eRAVdWhqrovyR8keWN3\nv2vZMwEA0xA0OVDd/fHufn6SZyb52qp6yZJHAoDBHdrnY3oLXzq/4oqk5tvgMhPd/aGq+uUkfzHJ\nby55HABgAgsPmmeccceGrT9wVNXTk5zv7g9W1eVJ/mqSH1zCaABwyagZN3wuBuIg/bkkr6v1jzA4\nlORnuvt/LHkmAGAigiYHprvfkeSFy54DAC4tGk0AACbg9kYAAKwcjSYAwNDm2xvOdzIAAIam0QQA\nGNicz9EUNAEABjbn+2haOgcAYBIaTQCAoWk0AQBYMRpNAICB1Yx7Q0ETAGBols4BAFgxGk0AgIG5\nvREAACtHowkAMLT5NpqCJgDAwOZ81fl8JwMAYGgaTQCAoc136VyjCQDAJDSaAAADqxk3moImAMDA\n3EcTAICVo9EEABjafHvD+U4GAMDQNJoAAAOb88VAGk0AACah0QQAGJpGEwCACVTVvh47HOtYVT1Q\nVQ9W1S077POjG6/fX1Uv2G02QRMAgFTVWpJbkxxLcm2SG6vq2Vv2eWmSZ3X31Un+QZLX7HZMQRMA\nYGiH9vn4FEeTPNTdD3f3uSS3J7l+yz7fnOR1SdLdb03y1Kp6xm6TAQDAlUlObdp+ZOO5i+3zzJ0O\n6GIgAICBLfD2Rr3nt9zjr7to0Jzz52fCp6vqmmWPAKygx5c9AJe0Bf7bdjrJkU3bR7LeWO62zzM3\nntvWrkGzu6VMLhm+nwG41Cz437Z7klxdVVcleTTJDUlu3LLPHUluTnJ7VX1Fkg929x/sdEBL5wAA\npLvPV9XNSe5Mspbktu4+WVU3bbx+vLt/papeWlUPJfnjJC/f7ZjVvdfleAAA2DtXnQMAMAlBEwCA\nSQiaAABMQtAEAGASgiYAAJMQNAEAmISgCQDAJARNAAAm8f8BX56fvJi/U/0AAAAASUVORK5CYII=\n", "text": [ "" ] } ], "prompt_number": 90 }, { "cell_type": "markdown", "metadata": {}, "source": [ "##KMeans clustering on the set is easy in pattern too..." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "kmeans = mtfidf.cluster(method=KMEANS, k=5)\n", "from pattern.vector import centroid\n", "import operator\n", "# For each cluster centroid, print 10 features with their weights.\n", "# Note: sorted() is ascending, so these are the 10 lowest-weighted features;\n", "# pass reverse=True to sorted() to list the most important features instead.\n", "for i in range(5):\n", " print i\n", " print sorted(centroid(kmeans[i]).items(), key=operator.itemgetter(1))[0:10]\n", " print" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "0\n", "[(u'angry', 0.00013166062899110716), (u'kingdom', 0.00013166062899110716), (u'directly', 0.00014974707549083753), (u'killed', 0.000159956090569813), (u'lifted', 0.0001711388013424306), (u'filled', 0.0001711388013424306), (u'happen', 0.0001711388013424306), (u'field', 0.00018069944297765), (u'power', 0.00018350071458059924), (u'creature', 0.00018350071458059924)]\n", "\n", "1\n", "[(u'pas', 2.976767638983525e-05), (u'kept', 3.171778571726749e-05), (u'thank', 3.8534321488072485e-05), (u'fallen', 4.122829938216276e-05), (u'aside', 4.753553535300498e-05), (u'built', 4.753553535300498e-05), (u'laugh', 4.753553535300498e-05), (u'take', 5.130985012857345e-05), (u'tower', 5.130985012857345e-05), (u'milk', 5.130985012857345e-05)]\n", "\n", "2\n", "[(u'stone', 2.7289350720252417e-05), (u'bush', 3.307147399482816e-05), (u'bottom', 3.5326123547017947e-05), (u'coat', 3.5326123547017947e-05), (u'promised', 3.7795812703190714e-05), (u'follow', 3.7795812703190714e-05), (u'fallen', 3.7795812703190714e-05), (u'hundred', 3.7795812703190714e-05), (u'straight', 3.7795812703190714e-05), (u'meant', 3.7795812703190714e-05)]\n", "\n", "3\n", "[(u'jumped', 0.00013121993719606768), (u'clothe', 0.00013809336247776646), (u'afraid', 0.00013979242907895495), (u'rich', 0.0001565430829707474), (u'fly', 0.00016324393598442835), (u'charming', 0.00016324393598442835), (u'joy', 0.00016676991539243746), (u'chamber', 0.00016698564596595334), (u'danced', 0.00017416680191162777), (u'dres', 0.00018604063317774515)]" ] }, { 
"output_type": "stream", "stream": "stdout", "text": [ "\n", "\n", "4\n", "[(u'glad', 0.00015192739693528455), (u'quickly', 0.00016198455051392364), (u'sitting', 0.00016593539320938522), (u'standing', 0.00017261670309720806), (u'tear', 0.00017261670309720806), (u'able', 0.0002006765668818562), (u'light', 0.0002031540553618791), (u'led', 0.00020873205745744166), (u'hard', 0.00020873205745744166), (u'warm', 0.00020873205745744166)]\n", "\n" ] } ], "prompt_number": 63 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### It would take some work to plot that, though. Exercise for the students!\n", "\n", "Relevant links:\n", "* http://stackoverflow.com/questions/20176590/plot-the-centroid-values-over-the-existing-plot-using-matplotlib\n", "* http://glowingpython.blogspot.jp/2012/04/k-means-clustering-with-scipy.html" ] }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 63 } ], "metadata": {} } ] }