{
 "metadata": {
  "gist_id": "7c54600b9c2af68914b3",
  "name": "",
  "signature": "sha256:47c50ff502afb39876f531313aa7c66924c6e9a01b725345c7f390300fccf3b2"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Clustering Related Posts"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This is a series of notebooks (in progress) to document my learning, and hopefully to help others learn machine learning. I would love suggestions / corrections / feedback for these notebooks.\n",
      "\n",
      "<a target=\"_parent\" href=\"http://rmdk.ca\">Visit my webpage for more</a>. \n",
      "\n",
      "Email me: <a href=\"mailto:email.ryan.kelly@gmail.com?Subject=Hey\" target=\"_top\">email.ryan.kelly@gmail.com</a>\n",
      "\n",
      "I'd love for you to share if you liked this post."
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "This notebook covers or includes:   "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Introduction to text processing\n",
      "* Natural Language Toolkit (NLTK)\n",
      "* KMeans clustering of text data"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Measuring Similarity Between Text Messages:"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\n",
      "This notebook will explore the idea of recommending news posts to a reader based on their search query. To do this, we also have to introduce basic text processing. Clustering can be defined as grouping unlabelled data by a measurement of similarity.\n",
      "\n",
      "One of the most robust methods to quantify meaning in textual data is the **bag-of-words** approach. For each word in a post, we count its number of occurrences and store the counts in a vector (vectorization). In this way the data can be stored in an efficient matrix structure."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Preprocessing:"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "First we have to convert the text into a `bag-of-words`. We can do this using scikit-learn's built-in `CountVectorizer`. The `min_df` parameter determines how the vectorizer treats words that are used infrequently. If set to an integer, all words occurring in fewer than that many documents will be dropped. If set to a fraction, all words that occur in less than that fraction of the documents will be dropped. There are also a lot of other options, which we will get into later."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.feature_extraction.text import CountVectorizer\n",
      "\n",
      "vect = CountVectorizer(min_df=1)\n",
      "print vect"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CountVectorizer(analyzer=word, binary=False, charset=None, charset_error=None,\n",
        "        decode_error=strict, dtype=<type 'numpy.int64'>, encoding=utf-8,\n",
        "        input=content, lowercase=True, max_df=1.0, max_features=None,\n",
        "        min_df=1, ngram_range=(1, 1), preprocessor=None, stop_words=None,\n",
        "        strip_accents=None, token_pattern=(?u)\\b\\w\\w+\\b, tokenizer=None,\n",
        "        vocabulary=None)\n"
       ]
      }
     ],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We see that for now the counting is done at the word level (`analyzer = word`)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "content = ['how to open a beer without a bottle opener', \n",
      "           'Beer bottles or beer cans',]\n",
      "X = vect.fit_transform(content)\n",
      "\n",
      "vect.get_feature_names()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 3,
       "text": [
        "[u'beer',\n",
        " u'bottle',\n",
        " u'bottles',\n",
        " u'cans',\n",
        " u'how',\n",
        " u'open',\n",
        " u'opener',\n",
        " u'or',\n",
        " u'to',\n",
        " u'without']"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Print the vectorized word occurrences\n",
      "print X\n",
      "print X.toarray()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "  (0, 0)\t1\n",
        "  (1, 0)\t2\n",
        "  (0, 1)\t1\n",
        "  (1, 2)\t1\n",
        "  (1, 3)\t1\n",
        "  (0, 4)\t1\n",
        "  (0, 5)\t1\n",
        "  (0, 6)\t1\n",
        "  (1, 7)\t1\n",
        "  (0, 8)\t1\n",
        "  (0, 9)\t1\n",
        "[[1 1 0 0 1 1 1 0 1 1]\n",
        " [2 0 1 1 0 0 0 1 0 0]]\n"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- Count vectors returned by `transform` are stored in a memory-efficient sparse matrix format, printed above as `(row, column)  count` entries. To work with the full dense array we have to call `toarray()`. \n",
      "\n",
      "Let's add some more data."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "posts = ['how to open a beer without a bottle opener', \n",
      "           'Do girls like beer bottles or beer cans?',\n",
      "           'where did all my beer go?',\n",
      "           'where did all my beer go? where did all my beer go?',\n",
      "           'recycling beer bottles and cans',\n",
      "           'Is it worth recycling?',\n",
      "           'do not bring bottles to my backyard party, only cans please.', \n",
      "           'This is useless']"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "X_train = vect.fit_transform(posts)\n",
      "\n",
      "num_samples, num_features = X_train.shape\n",
      "\n",
      "print '#samples: {}, #features: {}'.format(num_samples, num_features)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "#samples: 8, #features: 31\n"
       ]
      }
     ],
     "prompt_number": 6
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "- Unsurprisingly, we have 8 posts with a total of 31 different words. Now we can vectorize our data.\n",
      "\n",
      "Let's vectorize a new post, then see how similar it is to our existing corpus."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "new_post = 'Opening beer bottles and cans 101'\n",
      "new_post_vect = vect.transform([new_post])\n",
      "\n",
      "print(new_post_vect.toarray())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "[[0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]\n"
       ]
      }
     ],
     "prompt_number": 7
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import scipy as sp\n",
      "\n",
      "def dists(v1, v2):\n",
      "    # Euclidean distance between two sparse row vectors\n",
      "    delta = v1 - v2\n",
      "    return sp.linalg.norm(delta.toarray())\n",
      "\n",
      "def similarity(new_post_vector, corpus):\n",
      "    # Note: relies on the notebook-level variables `posts` and `new_post`\n",
      "    best_dist = float('inf')\n",
      "    best_i = None\n",
      "    \n",
      "    for i in xrange(corpus.shape[0]):\n",
      "        post = posts[i]\n",
      "        \n",
      "        # Skip the query itself if it is already in the corpus\n",
      "        if post == new_post:\n",
      "            continue\n",
      "        post_vec = corpus.getrow(i)\n",
      "        d = dists(post_vec, new_post_vector)\n",
      "        print 'Post %i with dist = %.2f: %s'%(i, d, post)\n",
      "        \n",
      "        # Keep track of the closest post seen so far\n",
      "        if d < best_dist:\n",
      "            best_dist = d\n",
      "            best_i = i\n",
      "    print 'Best post is {} with dist = {}'.format(best_i, best_dist)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "similarity(new_post_vect, X_train)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Post 0 with dist = 3.00: how to open a beer without a bottle opener\n",
        "Post 1 with dist = 2.45: Do girls like beer bottles or beer cans?\n",
        "Post 2 with dist = 2.83: where did all my beer go?\n",
        "Post 3 with dist = 4.90: where did all my beer go? where did all my beer go?\n",
        "Post 4 with dist = 1.00: recycling beer bottles and cans\n",
        "Post 5 with dist = 2.83: Is it worth recycling?\n",
        "Post 6 with dist = 3.32: do not bring bottles to my backyard party, only cans please.\n",
        "Post 7 with dist = 2.65: This is useless\n",
        "Best post is 4 with dist = 1.0\n"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Great, our first text similarity measurement! We can see here that post 4 is most similar to our new post. However, we can also see that `post 2` is \"closer\" to our new post than `post 3` (2.83 versus 4.90), even though `post 3` is simply `post 2` doubled. Raw word counts are clearly too simple. The next step is to normalize the word-count vectors to unit length to avoid this problem."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Update our dists function\n",
      "def dists(v1, v2):\n",
      "    v1_norm = v1/sp.linalg.norm(v1.toarray())\n",
      "    v2_norm = v2/sp.linalg.norm(v2.toarray())\n",
      "    delta = v1_norm-v2_norm\n",
      "    # Calculate Euclidean \"norm\" distance\n",
      "    return sp.linalg.norm(delta.toarray())"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 10
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "similarity(new_post_vect, X_train)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Post 0 with dist = 1.27: how to open a beer without a bottle opener\n",
        "Post 1 with dist = 0.86: Do girls like beer bottles or beer cans?\n",
        "Post 2 with dist = 1.26: where did all my beer go?\n",
        "Post 3 with dist = 1.26: where did all my beer go? where did all my beer go?\n",
        "Post 4 with dist = 0.46: recycling beer bottles and cans\n",
        "Post 5 with dist = 1.41: Is it worth recycling?\n",
        "Post 6 with dist = 1.18: do not bring bottles to my backyard party, only cans please.\n",
        "Post 7 with dist = 1.41: This is useless\n",
        "Best post is 4 with dist = 0.459505841095\n"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Great, posts 2 & 3 are now equally similar to our new post."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Removing Less Important Words:"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "There are many words in a language that do not carry much meaning in terms of the overall interpretation of the message. Words like \"it\" should be much less meaningful than \"beer\" in our current context. These less important words are called `stop words`, and can be removed from the posts since they do not help us distinguish between different posts."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#Add english stop words to our vectorizer object.\n",
      "vect = CountVectorizer(min_df=1, stop_words='english')\n",
      "#Display a sample\n",
      "print sorted(vect.get_stop_words())[80:-150]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "['empty', 'enough', 'etc', 'even', 'ever', 'every', 'everyone', 'everything', 'everywhere', 'except', 'few', 'fifteen', 'fify', 'fill', 'find', 'fire', 'first', 'five', 'for', 'former', 'formerly', 'forty', 'found', 'four', 'from', 'front', 'full', 'further', 'get', 'give', 'go', 'had', 'has', 'hasnt', 'have', 'he', 'hence', 'her', 'here', 'hereafter', 'hereby', 'herein', 'hereupon', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'however', 'hundred', 'i', 'ie', 'if', 'in', 'inc', 'indeed', 'interest', 'into', 'is', 'it', 'its', 'itself', 'keep', 'last', 'latter', 'latterly', 'least', 'less', 'ltd', 'made', 'many', 'may', 'me', 'meanwhile', 'might', 'mill', 'mine', 'more', 'moreover', 'most', 'mostly', 'move', 'much', 'must', 'my', 'myself', 'name']\n"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "If you already have a list of words in mind that you wish to `stop`, you can simply pass them as a list to the `stop_words` argument."
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Stemming"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We also need to consider that similar words, such as \"girl\" and \"girls\", should probably be treated as the same word. Thus we need a function that reduces each word to its 'word stem'. We can do this with the **Natural Language Toolkit (NLTK)**. After installing NLTK, import the library and try out the stemmer for English."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import nltk.stem\n",
      "\n",
      "s = nltk.stem.SnowballStemmer('english')\n",
      "\n",
      "print s.stem('bottles')\n",
      "print s.stem('bottle')\n",
      "\n",
      "print s.stem('perception')\n",
      "print s.stem('perceptive')\n",
      "\n",
      "print s.stem('crashing')\n",
      "print s.stem('crashed')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "bottl\n",
        "bottl\n",
        "percept\n",
        "percept\n",
        "crash\n",
        "crash\n"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Extending the vectorizer with NLTK stemming"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We need to stem the posts before we feed them into the `CountVectorizer`. The best way to do this is to override the method `build_analyzer`. \n",
      "\n",
      "By doing this we reuse the preprocessing in the parent class, which converts the raw posts to lower case and tokenizes them into words. We then convert each word into its stemmed version."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import nltk.stem\n",
      "\n",
      "english_stemmer = nltk.stem.SnowballStemmer('english')\n",
      "\n",
      "class StemmedCountVectorizer(CountVectorizer):\n",
      "    def build_analyzer(self):\n",
      "        analyzer = super(StemmedCountVectorizer, self).build_analyzer()\n",
      "        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))\n",
      "    \n",
      "vectorizer = StemmedCountVectorizer(min_df=1, stop_words='english')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "X = vectorizer.fit_transform(posts)\n",
      "vectorizer.get_feature_names()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 15,
       "text": [
        "[u'backyard',\n",
        " u'beer',\n",
        " u'bottl',\n",
        " u'bring',\n",
        " u'can',\n",
        " u'did',\n",
        " u'girl',\n",
        " u'like',\n",
        " u'open',\n",
        " u'parti',\n",
        " u'recycl',\n",
        " u'useless',\n",
        " u'worth']"
       ]
      }
     ],
     "prompt_number": 15
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Restate the new vectorizer on the data\n",
      "X_train = vectorizer.fit_transform(posts)\n",
      "new_post_vect = vectorizer.transform([new_post])\n",
      "\n",
      "similarity(new_post_vect, X_train)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Post 0 with dist = 0.61: how to open a beer without a bottle opener\n",
        "Post 1 with dist = 0.77: Do girls like beer bottles or beer cans?\n",
        "Post 2 with dist = 1.14: where did all my beer go?\n",
        "Post 3 with dist = 1.14: where did all my beer go? where did all my beer go?\n",
        "Post 4 with dist = 0.71: recycling beer bottles and cans\n",
        "Post 5 with dist = 1.41: Is it worth recycling?\n",
        "Post 6 with dist = 1.05: do not bring bottles to my backyard party, only cans please.\n",
        "Post 7 with dist = 1.41: This is useless\n",
        "Best post is 0 with dist = 0.605810893055\n"
       ]
      }
     ],
     "prompt_number": 16
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We see now that post 0 is most similar to our new post, because bottles and bottle are now treated as the same word."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print new_post\n",
      "print posts[0]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Opening beer bottles and cans 101\n",
        "how to open a beer without a bottle opener\n"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Thinking a bit deeper about relevant post features"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "So far we have assumed that a higher occurrence of certain words in a post equates to a greater importance of those words in the post. While this is true to some extent, there are very frequent words that carry almost no meaning. For example, the word \"Subject\" appears in every blog post, so it does not communicate anything important and does not help us distinguish between posts.\n",
      "\n",
      "We could perhaps set a 90% occurrence cutoff in our vectorizer (the `max_df` parameter), such that words that occur in more than 90% of the posts are excluded. However, we would still run into the problem of border cases, say where a word occurs in 89% of the posts.\n",
      "\n",
      "To solve these problems we count the term frequencies for every post **while** discounting those words that appear in many posts. This is the concept of **term frequency - inverse document frequency (TF-IDF)**. We can implement this using scikit-learn's `TfidfVectorizer`."
     ]
    },
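    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To make the idea concrete, here is a minimal sketch of a plain textbook TF-IDF computed by hand. The function `tfidf` and the toy corpus are made up for illustration, and the formula uses no smoothing or normalization, so the numbers will not match what scikit-learn's `TfidfVectorizer` returns with its default settings. A word that appears in every document gets a weight of zero, while a rare word counts for more each time it occurs."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import math\n",
      "\n",
      "def tfidf(term, doc, corpus):\n",
      "    # Illustrative helper (not from scikit-learn); assumes `term` occurs\n",
      "    # in at least one document of `corpus`.\n",
      "    # Term frequency: share of the tokens in `doc` that are `term`\n",
      "    tf = doc.count(term) / float(len(doc))\n",
      "    # Number of documents in the corpus that contain `term`\n",
      "    num_docs_with_term = len([d for d in corpus if term in d])\n",
      "    # Inverse document frequency (plain form, no smoothing)\n",
      "    idf = math.log(len(corpus) / float(num_docs_with_term))\n",
      "    return tf * idf\n",
      "\n",
      "# Hypothetical toy corpus of already-tokenized documents\n",
      "a, abb, abc = ['a'], ['a', 'b', 'b'], ['a', 'b', 'c']\n",
      "toy_corpus = [a, abb, abc]\n",
      "\n",
      "print(tfidf('a', a, toy_corpus))    # 'a' is in every document\n",
      "print(tfidf('b', abb, toy_corpus))  # 'b' is frequent in this document\n",
      "print(tfidf('c', abc, toy_corpus))  # 'c' is the rarest term overall"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },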
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.feature_extraction.text import TfidfVectorizer\n",
      "# Subclass TfidfVectorizer so that it uses our stemmer\n",
      "\n",
      "class StemmedTfidfVectorizer(TfidfVectorizer):\n",
      "    def build_analyzer(self):\n",
      "        analyzer = super(StemmedTfidfVectorizer, self).build_analyzer()\n",
      "        \n",
      "        return lambda doc: (english_stemmer.stem(w) for w in analyzer(doc))\n",
      "\n",
      "vectorizer = StemmedTfidfVectorizer(min_df=1, stop_words='english', decode_error='ignore')\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now instead of counts, our document vectors will contain individual TF-IDF values per term (token)."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Restate new vectorizer\n",
      "X_train = vectorizer.fit_transform(posts)\n",
      "new_post_vect = vectorizer.transform([new_post])\n",
      "\n",
      "similarity(new_post_vect, X_train)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Post 0 with dist = 0.57: how to open a beer without a bottle opener\n",
        "Post 1 with dist = 0.99: Do girls like beer bottles or beer cans?\n",
        "Post 2 with dist = 1.26: where did all my beer go?\n",
        "Post 3 with dist = 1.26: where did all my beer go? where did all my beer go?\n",
        "Post 4 with dist = 0.90: recycling beer bottles and cans\n",
        "Post 5 with dist = 1.41: Is it worth recycling?\n",
        "Post 6 with dist = 1.17: do not bring bottles to my backyard party, only cans please.\n",
        "Post 7 with dist = 1.41: This is useless\n",
        "Best post is 0 with dist = 0.572957858071\n"
       ]
      }
     ],
     "prompt_number": 19
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Recap"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "So far we have:\n",
      "\n",
      "1. Tokenized the text\n",
      "2. Discarded words that occur too often and don't help us detect relevant posts\n",
      "3. Thrown away very uncommon words\n",
      "4. Counted the remaining words\n",
      "5. Calculated TF-IDF values from the counts, considering the whole text corpus.\n",
      "\n",
      "**Limitations of the bag-of-words approach**\n",
      "\n",
      "* It does not cover word relations: \"Car hits wall\" and \"Wall hits car\" will both have the same feature vector.\n",
      "* It does not capture negations well: \"I will eat soup\" and \"I will *not* eat soup\" will have very similar feature vectors. This can be remedied to some extent by also counting bigrams and trigrams (two or three words in a row together).\n",
      "* It fails completely on misspelled words."
     ]
    },
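    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a small illustration of the first two limitations, the sketch below counts n-grams with a hypothetical helper `bag_of_words` (plain Python with whitespace tokenization only, not part of scikit-learn). With single words the two sentences are indistinguishable; with bigrams they differ. In `CountVectorizer` the corresponding setting is the `ngram_range` parameter that appeared in the vectorizer's printout earlier."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from collections import Counter\n",
      "\n",
      "def bag_of_words(text, n=1):\n",
      "    # Illustrative helper: lowercase, split on whitespace and count\n",
      "    # n-grams (n=1 gives single words, n=2 gives bigrams)\n",
      "    tokens = text.lower().split()\n",
      "    ngrams = [' '.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]\n",
      "    return Counter(ngrams)\n",
      "\n",
      "s1, s2 = 'Car hits wall', 'Wall hits car'\n",
      "\n",
      "# Single words: both sentences produce the same counts\n",
      "print(bag_of_words(s1) == bag_of_words(s2))\n",
      "# Bigrams keep some word order, so the two sentences now differ\n",
      "print(bag_of_words(s1, n=2) == bag_of_words(s2, n=2))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },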
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Clustering"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now that we can represent our blog posts quantitatively, to some degree, our goal is to cluster similar posts. There are two main types of clustering algorithms: **flat** and **hierarchical**. \n",
      "\n",
      "**Flat clustering** divides the posts into a set of clusters that minimizes the difference _within_ clusters and maximizes the difference _between_ clusters. Generally we have to specify the number of clusters upfront.\n",
      "\n",
      "**Hierarchical clustering** does not require the number of clusters as an input. It creates a hierarchy of clusters where very similar posts are grouped together, and similar clusters are then grouped recursively until one cluster is left that contains all the data. Once completed, the user can discern the optimal number of clusters."
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "KMeans"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "KMeans is probably the most common **flat** clustering algorithm. First you must specify the number of desired clusters (k). The algorithm then picks k random _seeds_ within the data and assigns each post to the closest seed centroid. Next, each seed is relocated to the mean of the posts currently assigned to it. The process is then repeated, with the posts reassigned to whichever seed is now closest. This continues as long as the seed centroids move a considerable amount. After some _n_ iterations the movements fall below a threshold, and the algorithm is considered converged."
     ]
    },
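    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The steps above can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the helper `kmeans`, its parameters and the toy 2-D points are made up here, and the seeds are picked from the data points. It is not how scikit-learn's `KMeans` is implemented."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import random\n",
      "\n",
      "def kmeans(points, k, n_iter=100, tol=1e-6, seed=0):\n",
      "    rng = random.Random(seed)\n",
      "    # Pick k distinct data points as the initial seeds (centroids)\n",
      "    centroids = rng.sample(points, k)\n",
      "    for _ in range(n_iter):\n",
      "        # Assignment step: attach every point to its closest centroid\n",
      "        clusters = [[] for _ in range(k)]\n",
      "        for p in points:\n",
      "            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]\n",
      "            clusters[d2.index(min(d2))].append(p)\n",
      "        # Update step: move each centroid to the mean of its points\n",
      "        new_centroids = []\n",
      "        for c, members in zip(centroids, clusters):\n",
      "            if members:\n",
      "                new_centroids.append(tuple(sum(x) / float(len(members)) for x in zip(*members)))\n",
      "            else:\n",
      "                new_centroids.append(c)  # an empty cluster keeps its centroid\n",
      "        # Largest distance any centroid moved in this iteration\n",
      "        shift = max(sum((a - b) ** 2 for a, b in zip(c, n)) ** 0.5\n",
      "                    for c, n in zip(centroids, new_centroids))\n",
      "        centroids = new_centroids\n",
      "        # Converged once the centroids have (almost) stopped moving\n",
      "        if shift < tol:\n",
      "            break\n",
      "    return centroids, clusters\n",
      "\n",
      "# Hypothetical toy data: two well-separated groups of 2-D points\n",
      "toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),\n",
      "              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]\n",
      "centroids, clusters = kmeans(toy_points, k=2)\n",
      "print(sorted(centroids))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },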
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Get some test data "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We will utilize a machine learning dataset that contains 18,828 posts from 20 different newsgroups. There are many topics including technology, politics, and religion. However, for now we will only use the technical groups.\n",
      "\n",
      "One question we could ask is whether we can effectively cluster the posts so that the clusters correspond to the newsgroups they were published in.\n",
      "\n",
      "\n",
      "This data is already split into testing and training data. Once the dataset has been downloaded to a local folder, we can load it using sklearn."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import sklearn.datasets\n",
      "\n",
      "save_dir = '/users/ryankelly/downloads/' # Your save file path\n",
      "\n",
      "# Load the data (already downloaded into save_dir) using sklearn\n",
      "df = sklearn.datasets.load_mlcomp(\"20news-18828\", mlcomp_root=save_dir)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 20
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Data files\n",
      "print df.filenames\n",
      "print len(df.filenames)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "['/users/ryankelly/downloads/379/raw/comp.graphics/1190-38614'\n",
        " '/users/ryankelly/downloads/379/raw/comp.graphics/1383-38616'\n",
        " '/users/ryankelly/downloads/379/raw/alt.atheism/487-53344' ...,\n",
        " '/users/ryankelly/downloads/379/raw/rec.sport.hockey/10215-54303'\n",
        " '/users/ryankelly/downloads/379/raw/sci.crypt/10799-15660'\n",
        " '/users/ryankelly/downloads/379/raw/comp.os.ms-windows.misc/2732-10871']\n",
        "18828\n"
       ]
      }
     ],
     "prompt_number": 22
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Data Topics\n",
      "df.target_names"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 23,
       "text": [
        "['alt.atheism',\n",
        " 'comp.graphics',\n",
        " 'comp.os.ms-windows.misc',\n",
        " 'comp.sys.ibm.pc.hardware',\n",
        " 'comp.sys.mac.hardware',\n",
        " 'comp.windows.x',\n",
        " 'misc.forsale',\n",
        " 'rec.autos',\n",
        " 'rec.motorcycles',\n",
        " 'rec.sport.baseball',\n",
        " 'rec.sport.hockey',\n",
        " 'sci.crypt',\n",
        " 'sci.electronics',\n",
        " 'sci.med',\n",
        " 'sci.space',\n",
        " 'soc.religion.christian',\n",
        " 'talk.politics.guns',\n",
        " 'talk.politics.mideast',\n",
        " 'talk.politics.misc',\n",
        " 'talk.religion.misc']"
       ]
      }
     ],
     "prompt_number": 23
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Restrict data to only 'tech' categories\n",
      "group = ['comp.graphics', 'comp.os.ms-windows.misc', \n",
      "         'comp.sys.ibm.pc.hardware', 'comp.sys.ma c.hardware', \n",
      "         'comp.windows.x', 'sci.space']\n",
      "# Reload in only training data with the desired categories\n",
      "train_data = sklearn.datasets.load_mlcomp('20news-18828', 'train', \n",
      "                                          mlcomp_root=save_dir, \n",
      "                                          categories=group)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 24
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "print(len(train_data.filenames))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "3414\n"
       ]
      }
     ],
     "prompt_number": 25
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Clustering posts"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "While initializing our `vectorizer` we have to remember that we are working with real data, which is noisy. In this case the posts contain invalid characters that cannot be decoded, so we tell the vectorizer to ignore them with `decode_error='ignore'`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "vec = StemmedTfidfVectorizer(min_df=10, max_df=0.5,\n",
      "                              stop_words='english', decode_error='ignore')\n",
      "\n",
      "vecData = vec.fit_transform(train_data.data)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 26
    },
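    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As a minimal sketch of what `decode_error='ignore'` does, the toy cell below runs scikit-learn's plain `TfidfVectorizer` on two made-up byte strings, one of which contains bytes that are not valid UTF-8. The names `docs`, `vec_demo` and `X_demo` are illustrative only and are not used elsewhere in this notebook. With the default `decode_error='strict'` the same input should raise a `UnicodeDecodeError`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.feature_extraction.text import TfidfVectorizer\n",
      "\n",
      "# Two hypothetical posts; the second contains invalid UTF-8 bytes\n",
      "docs = [b'valid text about graphics',\n",
      "        b'broken \\xff\\xfe bytes about space']\n",
      "\n",
      "# 'ignore' drops the undecodable bytes instead of raising an error\n",
      "vec_demo = TfidfVectorizer(decode_error='ignore')\n",
      "X_demo = vec_demo.fit_transform(docs)\n",
      "\n",
      "# One row per post, one column per distinct term\n",
      "print(X_demo.shape)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },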
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "num_samples, num_features = vecData.shape\n",
      "print('#samples: {}, #features: {}').format(num_samples, num_features)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "#samples: 3414, #features: 4331\n"
       ]
      }
     ],
     "prompt_number": 27
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This is the information we will use as input for KMeans clustering. Since we know there are 5 topic groups in these data, it makes sense that there could be 5 clusters in the data, so we will try this first."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "num_clusters = 5\n",
      "from sklearn.cluster import KMeans\n",
      "\n",
      "km = KMeans(n_clusters=num_clusters, init='random', n_init=1, verbose=1)\n",
      "km.fit(vecData)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Initialization complete\n",
        "Iteration  0, inertia 6434.212\n",
        "Iteration  1, inertia 3302.138\n",
        "Iteration  2, inertia 3286.234\n",
        "Iteration  3, inertia 3278.006\n",
        "Iteration  4, inertia 3274.039"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3271.234\n",
        "Iteration  6, inertia 3268.856\n",
        "Iteration  7, inertia 3267.609\n",
        "Iteration  8, inertia 3266.964\n",
        "Iteration  9, inertia 3266.352"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3265.901\n",
        "Iteration 11, inertia 3265.509\n",
        "Iteration 12, inertia 3264.970\n",
        "Iteration 13, inertia 3263.969\n",
        "Iteration 14, inertia 3261.887"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3259.657\n",
        "Iteration 16, inertia 3258.196\n",
        "Iteration 17, inertia 3257.560\n",
        "Iteration 18, inertia 3256.997\n",
        "Iteration 19, inertia 3256.714"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3256.482\n",
        "Iteration 21, inertia 3256.326\n",
        "Iteration 22, inertia 3256.126\n",
        "Iteration 23, inertia 3255.998\n",
        "Iteration 24, inertia 3255.918"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3255.870\n",
        "Iteration 26, inertia 3255.826\n",
        "Iteration 27, inertia 3255.768\n",
        "Iteration 28, inertia 3255.658\n",
        "Iteration 29, inertia 3255.574"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 30, inertia 3255.550\n",
        "Iteration 31, inertia 3255.533\n",
        "Iteration 32, inertia 3255.527\n",
        "Iteration 33, inertia 3255.522\n",
        "Iteration 34, inertia 3255.513"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 35, inertia 3255.508\n",
        "Iteration 36, inertia 3255.503\n",
        "Converged at iteration 36\n"
       ]
      },
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 110,
       "text": [
        "KMeans(copy_x=True, init='random', max_iter=300, n_clusters=5, n_init=1,\n",
        "    n_jobs=1, precompute_distances=True, random_state=None, tol=0.0001,\n",
        "    verbose=1)"
       ]
      }
     ],
     "prompt_number": 110
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "After fitting, we can get the cluster assignment of each post from the `labels_` attribute, and the cluster centers from `cluster_centers_`. We then compute the completeness score, which ranges from 0 to 1 and measures how well posts from the same newsgroup end up in the same cluster."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn import metrics\n",
      "\n",
      "metrics.completeness_score(train_data.target, km.labels_)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 111,
       "text": [
        "0.40904043798434664"
       ]
      }
     ],
     "prompt_number": 111
    },
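    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To get a feel for what this score rewards, here is a small sketch on made-up labels (the lists `truth`, `merged` and `split` are hypothetical, not taken from the newsgroup data). It compares completeness with its counterpart, homogeneity, using scikit-learn's `metrics` module. Lumping everything into one cluster is perfectly complete, and giving every sample its own cluster is perfectly homogeneous, so neither score on its own tells the whole story."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn import metrics\n",
      "\n",
      "# Hypothetical ground-truth classes and two extreme clusterings\n",
      "truth = [0, 0, 1, 1]\n",
      "merged = [0, 0, 0, 0]  # every sample in a single cluster\n",
      "split = [0, 1, 2, 3]   # every sample in its own cluster\n",
      "\n",
      "# Completeness: do all members of a class share one cluster?\n",
      "print(metrics.completeness_score(truth, merged))\n",
      "print(metrics.completeness_score(truth, split))\n",
      "\n",
      "# Homogeneity: does each cluster contain only one class?\n",
      "print(metrics.homogeneity_score(truth, merged))\n",
      "print(metrics.homogeneity_score(truth, split))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },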
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "39% accuracy isn't the best, but this could be because although there are five different topics, the contents are related between them, why dont we test several `k` values and see the prediction scores. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sklearn.cluster import KMeans\n",
      "\n",
      "def best_k():\n",
      "    # Track the best k and its score across all candidate values\n",
      "    top_k = 0\n",
      "    top_score = 0\n",
      "    for i in range(2, 40):\n",
      "        # Fit with i clusters rather than the fixed num_clusters\n",
      "        km = KMeans(n_clusters=i, init='random', n_init=1, verbose=1)\n",
      "        km.fit(vecData)\n",
      "        score = metrics.completeness_score(train_data.target, km.labels_)\n",
      "        if score > top_score:\n",
      "            top_k = i\n",
      "            top_score = score\n",
      "    return [top_k, top_score]\n",
      "\n",
      "best_k()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Initialization complete\n",
        "Iteration  0, inertia 6445.479\n",
        "Iteration  1, inertia 3292.339\n",
        "Iteration  2, inertia 3275.461\n",
        "Iteration  3, inertia 3270.621\n",
        "Iteration  4, inertia 3268.049"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3266.777\n",
        "Iteration  6, inertia 3266.141\n",
        "Iteration  7, inertia 3265.889\n",
        "Iteration  8, inertia 3265.754\n",
        "Iteration  9, inertia 3265.668"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3265.602\n",
        "Iteration 11, inertia 3265.509\n",
        "Iteration 12, inertia 3265.367\n",
        "Iteration 13, inertia 3265.151\n",
        "Iteration 14, inertia 3264.775"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3264.314\n",
        "Iteration 16, inertia 3263.827\n",
        "Iteration 17, inertia 3263.243\n",
        "Iteration 18, inertia 3262.592\n",
        "Iteration 19, inertia 3262.179"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3261.991\n",
        "Iteration 21, inertia 3261.915\n",
        "Iteration 22, inertia 3261.842\n",
        "Iteration 23, inertia 3261.741\n",
        "Iteration 24, inertia 3261.661"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3261.614\n",
        "Iteration 26, inertia 3261.582\n",
        "Iteration 27, inertia 3261.569\n",
        "Iteration 28, inertia 3261.557\n",
        "Iteration 29, inertia 3261.539"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 30, inertia 3261.525\n",
        "Iteration 31, inertia 3261.499\n",
        "Converged at iteration 31\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6524.930\n",
        "Iteration  1, inertia 3308.247\n",
        "Iteration  2, inertia 3292.389\n",
        "Iteration  3, inertia 3283.365\n",
        "Iteration  4, inertia 3278.358"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3276.421\n",
        "Iteration  6, inertia 3275.128\n",
        "Iteration  7, inertia 3273.981\n",
        "Iteration  8, inertia 3272.630\n",
        "Iteration  9, inertia 3270.863"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3268.894\n",
        "Iteration 11, inertia 3267.018\n",
        "Iteration 12, inertia 3265.305\n",
        "Iteration 13, inertia 3263.985\n",
        "Iteration 14, inertia 3263.395"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3262.957\n",
        "Iteration 16, inertia 3262.720\n",
        "Iteration 17, inertia 3262.581\n",
        "Iteration 18, inertia 3262.501\n",
        "Iteration 19, inertia 3262.414"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3262.318\n",
        "Iteration 21, inertia 3262.253\n",
        "Iteration 22, inertia 3262.192\n",
        "Iteration 23, inertia 3262.085\n",
        "Iteration 24, inertia 3261.962"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3261.815\n",
        "Iteration 26, inertia 3261.625\n",
        "Iteration 27, inertia 3261.492\n",
        "Iteration 28, inertia 3261.394\n",
        "Iteration 29, inertia 3261.278"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 30, inertia 3261.206\n",
        "Iteration 31, inertia 3261.134\n",
        "Iteration 32, inertia 3261.077\n",
        "Iteration 33, inertia 3261.018\n",
        "Iteration 34, inertia 3260.997"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 35, inertia 3260.975\n",
        "Iteration 36, inertia 3260.958\n",
        "Iteration 37, inertia 3260.949\n",
        "Converged at iteration 37\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6392.513\n",
        "Iteration  1, inertia 3298.129\n",
        "Iteration  2, inertia 3286.500\n",
        "Iteration  3, inertia 3280.842\n",
        "Iteration  4, inertia 3277.803"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3276.304\n",
        "Iteration  6, inertia 3274.915\n",
        "Iteration  7, inertia 3273.931\n",
        "Iteration  8, inertia 3273.201\n",
        "Iteration  9, inertia 3272.640"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3272.355\n",
        "Iteration 11, inertia 3272.069\n",
        "Iteration 12, inertia 3271.870\n",
        "Iteration 13, inertia 3271.619\n",
        "Iteration 14, inertia 3271.328"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3271.052\n",
        "Iteration 16, inertia 3270.824\n",
        "Iteration 17, inertia 3270.511\n",
        "Iteration 18, inertia 3270.053\n",
        "Iteration 19, inertia 3269.612"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3269.327\n",
        "Iteration 21, inertia 3269.190\n",
        "Iteration 22, inertia 3269.089\n",
        "Iteration 23, inertia 3269.024\n",
        "Iteration 24, inertia 3268.943"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3268.846\n",
        "Iteration 26, inertia 3268.764\n",
        "Iteration 27, inertia 3268.697\n",
        "Iteration 28, inertia 3268.597\n",
        "Iteration 29, inertia 3268.465"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 30, inertia 3268.295\n",
        "Iteration 31, inertia 3268.120\n",
        "Iteration 32, inertia 3267.779\n",
        "Iteration 33, inertia 3267.203\n",
        "Iteration 34, inertia 3266.515\n",
        "Iteration 35, inertia 3265.992"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 36, inertia 3265.674\n",
        "Iteration 37, inertia 3265.235\n",
        "Iteration 38, inertia 3264.315\n",
        "Iteration 39, inertia 3263.987\n",
        "Iteration 40, inertia 3263.929"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 41, inertia 3263.905\n",
        "Iteration 42, inertia 3263.885\n",
        "Iteration 43, inertia 3263.866\n",
        "Iteration 44, inertia 3263.859\n",
        "Iteration 45, inertia 3263.852\n",
        "Converged at iteration 45\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6326.529\n",
        "Iteration  1, inertia 3294.746\n",
        "Iteration  2, inertia 3282.371\n",
        "Iteration  3, inertia 3276.461\n",
        "Iteration  4, inertia 3273.181"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3271.013\n",
        "Iteration  6, inertia 3268.783\n",
        "Iteration  7, inertia 3266.648\n",
        "Iteration  8, inertia 3265.133\n",
        "Iteration  9, inertia 3264.077"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3263.566\n",
        "Iteration 11, inertia 3263.328\n",
        "Iteration 12, inertia 3263.232\n",
        "Iteration 13, inertia 3263.172\n",
        "Iteration 14, inertia 3263.125"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3263.087\n",
        "Iteration 16, inertia 3263.064\n",
        "Iteration 17, inertia 3263.053\n",
        "Iteration 18, inertia 3263.047\n",
        "Iteration 19, inertia 3263.044"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Converged at iteration 19\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6396.511\n",
        "Iteration  1, inertia 3292.367\n",
        "Iteration  2, inertia 3280.269\n",
        "Iteration  3, inertia 3275.911\n",
        "Iteration  4, inertia 3272.600"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3270.273\n",
        "Iteration  6, inertia 3269.109\n",
        "Iteration  7, inertia 3268.377\n",
        "Iteration  8, inertia 3267.638\n",
        "Iteration  9, inertia 3266.541"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3265.821\n",
        "Iteration 11, inertia 3265.175\n",
        "Iteration 12, inertia 3264.720\n",
        "Iteration 13, inertia 3264.471\n",
        "Iteration 14, inertia 3264.307"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3264.199\n",
        "Iteration 16, inertia 3264.110\n",
        "Iteration 17, inertia 3264.035\n",
        "Iteration 18, inertia 3263.980\n",
        "Iteration 19, inertia 3263.934"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3263.922\n",
        "Iteration 21, inertia 3263.906\n",
        "Iteration 22, inertia 3263.890\n",
        "Iteration 23, inertia 3263.867\n",
        "Iteration 24, inertia 3263.857"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3263.845\n",
        "Iteration 26, inertia 3263.827\n",
        "Iteration 27, inertia 3263.818\n",
        "Iteration 28, inertia 3263.816\n",
        "Converged at iteration 28\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6431.988\n",
        "Iteration  1, inertia 3293.092\n",
        "Iteration  2, inertia 3278.216\n",
        "Iteration  3, inertia 3269.663\n",
        "Iteration  4, inertia 3265.719"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3263.092\n",
        "Iteration  6, inertia 3261.218\n",
        "Iteration  7, inertia 3260.260\n",
        "Iteration  8, inertia 3259.782\n",
        "Iteration  9, inertia 3259.574"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3259.506\n",
        "Iteration 11, inertia 3259.466\n",
        "Iteration 12, inertia 3259.449\n",
        "Iteration 13, inertia 3259.435\n",
        "Iteration 14, inertia 3259.422"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Converged at iteration 14\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6434.113\n",
        "Iteration  1, inertia 3296.655\n",
        "Iteration  2, inertia 3278.784\n",
        "Iteration  3, inertia 3272.196\n",
        "Iteration  4, inertia 3270.036"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3268.580\n",
        "Iteration  6, inertia 3266.836\n",
        "Iteration  7, inertia 3265.345\n",
        "Iteration  8, inertia 3264.172\n",
        "Iteration  9, inertia 3263.147"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3262.455\n",
        "Iteration 11, inertia 3261.793\n",
        "Iteration 12, inertia 3261.236\n",
        "Iteration 13, inertia 3260.754\n",
        "Iteration 14, inertia 3260.035\n",
        "Iteration 15, inertia 3259.548"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 16, inertia 3259.407\n",
        "Iteration 17, inertia 3259.335\n",
        "Iteration 18, inertia 3259.323\n",
        "Iteration 19, inertia 3259.319\n",
        "Iteration 20, inertia 3259.313\n",
        "Iteration 21, inertia 3259.307"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 22, inertia 3259.302\n",
        "Iteration 23, inertia 3259.298\n",
        "Iteration 24, inertia 3259.296\n",
        "Converged at iteration 24\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6421.814\n",
        "Iteration  1, inertia 3300.660\n",
        "Iteration  2, inertia 3287.858\n",
        "Iteration  3, inertia 3281.381\n",
        "Iteration  4, inertia 3276.546"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3271.531\n",
        "Iteration  6, inertia 3267.330\n",
        "Iteration  7, inertia 3264.234\n",
        "Iteration  8, inertia 3263.418\n",
        "Iteration  9, inertia 3262.728"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3262.077\n",
        "Iteration 11, inertia 3261.563\n",
        "Iteration 12, inertia 3261.202\n",
        "Iteration 13, inertia 3260.836\n",
        "Iteration 14, inertia 3260.469"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3260.095\n",
        "Iteration 16, inertia 3259.766\n",
        "Iteration 17, inertia 3259.590\n",
        "Iteration 18, inertia 3259.492\n",
        "Iteration 19, inertia 3259.396"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3259.263\n",
        "Iteration 21, inertia 3259.172\n",
        "Iteration 22, inertia 3259.122\n",
        "Iteration 23, inertia 3259.087\n",
        "Iteration 24, inertia 3259.059"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3259.021\n",
        "Iteration 26, inertia 3258.983\n",
        "Iteration 27, inertia 3258.919\n",
        "Iteration 28, inertia 3258.870\n",
        "Iteration 29, inertia 3258.826"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 30, inertia 3258.756\n",
        "Iteration 31, inertia 3258.694\n",
        "Iteration 32, inertia 3258.621\n",
        "Iteration 33, inertia 3258.534\n",
        "Iteration 34, inertia 3258.440"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 35, inertia 3258.277\n",
        "Iteration 36, inertia 3258.160\n",
        "Iteration 37, inertia 3258.098\n",
        "Iteration 38, inertia 3258.041\n",
        "Iteration 39, inertia 3257.966"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 40, inertia 3257.909\n",
        "Iteration 41, inertia 3257.860\n",
        "Iteration 42, inertia 3257.774\n",
        "Iteration 43, inertia 3257.727\n",
        "Iteration 44, inertia 3257.694"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 45, inertia 3257.666\n",
        "Iteration 46, inertia 3257.593\n",
        "Iteration 47, inertia 3257.551\n",
        "Iteration 48, inertia 3257.537\n",
        "Converged at iteration 48\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6373.464\n",
        "Iteration  1, inertia 3297.963\n",
        "Iteration  2, inertia 3287.660\n",
        "Iteration  3, inertia 3282.323\n",
        "Iteration  4, inertia 3279.099"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3277.759\n",
        "Iteration  6, inertia 3277.064\n",
        "Iteration  7, inertia 3276.650\n",
        "Iteration  8, inertia 3276.232\n",
        "Iteration  9, inertia 3275.737\n",
        "Iteration 10, inertia 3275.473"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 11, inertia 3275.339\n",
        "Iteration 12, inertia 3275.253\n",
        "Iteration 13, inertia 3275.199\n",
        "Iteration 14, inertia 3275.158\n",
        "Iteration 15, inertia 3275.128\n",
        "Iteration 16, inertia 3275.107"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 17, inertia 3275.076\n",
        "Iteration 18, inertia 3275.055\n",
        "Iteration 19, inertia 3275.041\n",
        "Iteration 20, inertia 3275.022\n",
        "Iteration 21, inertia 3274.999\n",
        "Iteration 22, inertia 3274.979"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 23, inertia 3274.960\n",
        "Iteration 24, inertia 3274.942\n",
        "Iteration 25, inertia 3274.931\n",
        "Iteration 26, inertia 3274.926\n",
        "Iteration 27, inertia 3274.922\n",
        "Iteration 28, inertia 3274.920"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Converged at iteration 28\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6289.281\n",
        "Iteration  1, inertia 3304.450\n",
        "Iteration  2, inertia 3288.473\n",
        "Iteration  3, inertia 3282.639\n",
        "Iteration  4, inertia 3280.544\n",
        "Iteration  5, inertia 3279.671"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  6, inertia 3279.139\n",
        "Iteration  7, inertia 3278.606\n",
        "Iteration  8, inertia 3278.196\n",
        "Iteration  9, inertia 3277.723\n",
        "Iteration 10, inertia 3277.261"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 11, inertia 3276.925\n",
        "Iteration 12, inertia 3276.570\n",
        "Iteration 13, inertia 3276.117\n",
        "Iteration 14, inertia 3275.711\n",
        "Iteration 15, inertia 3275.582"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 16, inertia 3275.538\n",
        "Iteration 17, inertia 3275.526\n",
        "Iteration 18, inertia 3275.517\n",
        "Iteration 19, inertia 3275.509\n",
        "Converged at iteration 19\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6390.941\n",
        "Iteration  1, inertia 3290.356\n",
        "Iteration  2, inertia 3274.869\n",
        "Iteration  3, inertia 3268.843\n",
        "Iteration  4, inertia 3265.737"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3264.341\n",
        "Iteration  6, inertia 3263.580\n",
        "Iteration  7, inertia 3262.989\n",
        "Iteration  8, inertia 3262.543\n",
        "Iteration  9, inertia 3262.156"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3261.898\n",
        "Iteration 11, inertia 3261.653\n",
        "Iteration 12, inertia 3261.429\n",
        "Iteration 13, inertia 3261.209\n",
        "Iteration 14, inertia 3260.992"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3260.760\n",
        "Iteration 16, inertia 3260.407\n",
        "Iteration 17, inertia 3259.996\n",
        "Iteration 18, inertia 3259.382\n",
        "Iteration 19, inertia 3258.432"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3257.154\n",
        "Iteration 21, inertia 3256.723\n",
        "Iteration 22, inertia 3256.546\n",
        "Iteration 23, inertia 3256.446\n",
        "Iteration 24, inertia 3256.391"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3256.362\n",
        "Iteration 26, inertia 3256.344\n",
        "Iteration 27, inertia 3256.339\n",
        "Converged at iteration 27\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6432.035\n",
        "Iteration  1, inertia 3304.822\n",
        "Iteration  2, inertia 3291.831\n",
        "Iteration  3, inertia 3281.480\n",
        "Iteration  4, inertia 3275.025"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3270.730\n",
        "Iteration  6, inertia 3266.021\n",
        "Iteration  7, inertia 3261.621\n",
        "Iteration  8, inertia 3259.239\n",
        "Iteration  9, inertia 3258.382"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3257.763\n",
        "Iteration 11, inertia 3257.227\n",
        "Iteration 12, inertia 3256.768\n",
        "Iteration 13, inertia 3256.410\n",
        "Iteration 14, inertia 3256.245"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3256.139\n",
        "Iteration 16, inertia 3256.045\n",
        "Iteration 17, inertia 3256.003\n",
        "Iteration 18, inertia 3255.975\n",
        "Iteration 19, inertia 3255.955"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3255.938\n",
        "Iteration 21, inertia 3255.926\n",
        "Iteration 22, inertia 3255.919\n",
        "Iteration 23, inertia 3255.906\n",
        "Iteration 24, inertia 3255.901"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3255.899\n",
        "Iteration 26, inertia 3255.897\n",
        "Converged at iteration 26\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6439.297\n",
        "Iteration  1, inertia 3292.780\n",
        "Iteration  2, inertia 3279.272\n",
        "Iteration  3, inertia 3275.342\n",
        "Iteration  4, inertia 3271.297"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3266.407\n",
        "Iteration  6, inertia 3264.193\n",
        "Iteration  7, inertia 3262.548\n",
        "Iteration  8, inertia 3261.671\n",
        "Iteration  9, inertia 3260.768"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3259.996\n",
        "Iteration 11, inertia 3259.212\n",
        "Iteration 12, inertia 3258.566\n",
        "Iteration 13, inertia 3258.245\n",
        "Iteration 14, inertia 3258.081"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3257.916\n",
        "Iteration 16, inertia 3257.788\n",
        "Iteration 17, inertia 3257.724\n",
        "Iteration 18, inertia 3257.663\n",
        "Iteration 19, inertia 3257.642"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3257.620\n",
        "Iteration 21, inertia 3257.606\n",
        "Iteration 22, inertia 3257.599\n",
        "Iteration 23, inertia 3257.597\n",
        "Iteration 24, inertia 3257.592"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Converged at iteration 24\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6437.135\n",
        "Iteration  1, inertia 3308.062\n",
        "Iteration  2, inertia 3296.359\n",
        "Iteration  3, inertia 3288.127\n",
        "Iteration  4, inertia 3284.844"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3282.816\n",
        "Iteration  6, inertia 3280.496\n",
        "Iteration  7, inertia 3277.755\n",
        "Iteration  8, inertia 3274.709\n",
        "Iteration  9, inertia 3271.397"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3269.900\n",
        "Iteration 11, inertia 3269.041\n",
        "Iteration 12, inertia 3268.558\n",
        "Iteration 13, inertia 3268.149\n",
        "Iteration 14, inertia 3267.920"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3267.757\n",
        "Iteration 16, inertia 3267.569\n",
        "Iteration 17, inertia 3267.379\n",
        "Iteration 18, inertia 3267.232\n",
        "Iteration 19, inertia 3267.083"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3266.887\n",
        "Iteration 21, inertia 3266.684\n",
        "Iteration 22, inertia 3266.575\n",
        "Iteration 23, inertia 3266.486\n",
        "Iteration 24, inertia 3266.413"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3266.331\n",
        "Iteration 26, inertia 3266.293\n",
        "Iteration 27, inertia 3266.268\n",
        "Iteration 28, inertia 3266.235\n",
        "Iteration 29, inertia 3266.214"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 30, inertia 3266.203\n",
        "Iteration 31, inertia 3266.192\n",
        "Iteration 32, inertia 3266.186\n",
        "Iteration 33, inertia 3266.183\n",
        "Converged at iteration 33\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6493.097\n",
        "Iteration  1, inertia 3302.676\n",
        "Iteration  2, inertia 3285.066\n",
        "Iteration  3, inertia 3278.241\n",
        "Iteration  4, inertia 3274.562"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3270.829\n",
        "Iteration  6, inertia 3265.238\n",
        "Iteration  7, inertia 3261.167\n",
        "Iteration  8, inertia 3259.118\n",
        "Iteration  9, inertia 3258.502"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3258.201\n",
        "Iteration 11, inertia 3257.948\n",
        "Iteration 12, inertia 3257.797\n",
        "Iteration 13, inertia 3257.716\n",
        "Iteration 14, inertia 3257.673\n",
        "Iteration 15, inertia 3257.666"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Converged at iteration 15\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6359.146\n",
        "Iteration  1, inertia 3291.541\n",
        "Iteration  2, inertia 3279.445\n",
        "Iteration  3, inertia 3275.558\n",
        "Iteration  4, inertia 3273.488"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3272.191\n",
        "Iteration  6, inertia 3271.287\n",
        "Iteration  7, inertia 3270.702\n",
        "Iteration  8, inertia 3270.374\n",
        "Iteration  9, inertia 3270.197"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3269.949\n",
        "Iteration 11, inertia 3269.697\n",
        "Iteration 12, inertia 3269.348\n",
        "Iteration 13, inertia 3268.820\n",
        "Iteration 14, inertia 3267.955"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3266.767\n",
        "Iteration 16, inertia 3265.877\n",
        "Iteration 17, inertia 3265.359\n",
        "Iteration 18, inertia 3264.872\n",
        "Iteration 19, inertia 3264.386"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3263.777\n",
        "Iteration 21, inertia 3263.350\n",
        "Iteration 22, inertia 3262.954\n",
        "Iteration 23, inertia 3262.645\n",
        "Iteration 24, inertia 3262.343"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3262.119\n",
        "Iteration 26, inertia 3262.012\n",
        "Iteration 27, inertia 3261.943\n",
        "Iteration 28, inertia 3261.875\n",
        "Iteration 29, inertia 3261.808"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 30, inertia 3261.770\n",
        "Iteration 31, inertia 3261.744\n",
        "Iteration 32, inertia 3261.707\n",
        "Iteration 33, inertia 3261.679\n",
        "Iteration 34, inertia 3261.674"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 35, inertia 3261.669\n",
        "Iteration 36, inertia 3261.667\n",
        "Converged at iteration 36\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6373.946\n",
        "Iteration  1, inertia 3294.749\n",
        "Iteration  2, inertia 3278.626\n",
        "Iteration  3, inertia 3273.958\n",
        "Iteration  4, inertia 3271.969"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3270.800\n",
        "Iteration  6, inertia 3269.873\n",
        "Iteration  7, inertia 3269.060\n",
        "Iteration  8, inertia 3268.193\n",
        "Iteration  9, inertia 3267.473"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3266.822\n",
        "Iteration 11, inertia 3266.335\n",
        "Iteration 12, inertia 3266.065\n",
        "Iteration 13, inertia 3265.876\n",
        "Iteration 14, inertia 3265.720\n",
        "Iteration 15, inertia 3265.663"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 16, inertia 3265.627\n",
        "Iteration 17, inertia 3265.610\n",
        "Iteration 18, inertia 3265.577\n",
        "Iteration 19, inertia 3265.549\n",
        "Iteration 20, inertia 3265.523"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 21, inertia 3265.513\n",
        "Iteration 22, inertia 3265.503\n",
        "Iteration 23, inertia 3265.497\n",
        "Converged at iteration 23\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6454.118\n",
        "Iteration  1, inertia 3303.824\n",
        "Iteration  2, inertia 3288.688\n",
        "Iteration  3, inertia 3282.998\n",
        "Iteration  4, inertia 3279.922"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3278.183\n",
        "Iteration  6, inertia 3276.889\n",
        "Iteration  7, inertia 3275.991\n",
        "Iteration  8, inertia 3275.039\n",
        "Iteration  9, inertia 3273.694"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3272.089\n",
        "Iteration 11, inertia 3270.481\n",
        "Iteration 12, inertia 3269.142\n",
        "Iteration 13, inertia 3267.853\n",
        "Iteration 14, inertia 3266.220"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3264.370\n",
        "Iteration 16, inertia 3262.774\n",
        "Iteration 17, inertia 3261.495\n",
        "Iteration 18, inertia 3260.136\n",
        "Iteration 19, inertia 3258.555"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3256.940\n",
        "Iteration 21, inertia 3256.170\n",
        "Iteration 22, inertia 3255.746\n",
        "Iteration 23, inertia 3255.497\n",
        "Iteration 24, inertia 3255.385"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3255.340\n",
        "Iteration 26, inertia 3255.299\n",
        "Iteration 27, inertia 3255.283\n",
        "Converged at iteration 27\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6362.969\n",
        "Iteration  1, inertia 3296.454\n",
        "Iteration  2, inertia 3282.673\n",
        "Iteration  3, inertia 3275.059\n",
        "Iteration  4, inertia 3269.156"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3264.227\n",
        "Iteration  6, inertia 3259.917\n",
        "Iteration  7, inertia 3257.108\n",
        "Iteration  8, inertia 3256.442\n",
        "Iteration  9, inertia 3256.069"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3255.857\n",
        "Iteration 11, inertia 3255.774\n",
        "Iteration 12, inertia 3255.708\n",
        "Iteration 13, inertia 3255.674\n",
        "Iteration 14, inertia 3255.650"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3255.635\n",
        "Iteration 16, inertia 3255.631\n",
        "Iteration 17, inertia 3255.629\n",
        "Converged at iteration 17\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6476.107\n",
        "Iteration  1, inertia 3296.095\n",
        "Iteration  2, inertia 3282.579\n",
        "Iteration  3, inertia 3276.890\n",
        "Iteration  4, inertia 3272.801\n",
        "Iteration  5, inertia 3268.908"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  6, inertia 3266.789\n",
        "Iteration  7, inertia 3265.977\n",
        "Iteration  8, inertia 3265.409\n",
        "Iteration  9, inertia 3264.982\n",
        "Iteration 10, inertia 3264.650\n",
        "Iteration 11, inertia 3264.401"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 12, inertia 3264.138\n",
        "Iteration 13, inertia 3263.900\n",
        "Iteration 14, inertia 3263.748\n",
        "Iteration 15, inertia 3263.628\n",
        "Iteration 16, inertia 3263.528"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 17, inertia 3263.422\n",
        "Iteration 18, inertia 3263.345\n",
        "Iteration 19, inertia 3263.335\n",
        "Iteration 20, inertia 3263.326\n",
        "Iteration 21, inertia 3263.324"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Converged at iteration 21\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6467.892\n",
        "Iteration  1, inertia 3299.482\n",
        "Iteration  2, inertia 3284.474\n",
        "Iteration  3, inertia 3276.773\n",
        "Iteration  4, inertia 3273.421"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3271.134\n",
        "Iteration  6, inertia 3269.243\n",
        "Iteration  7, inertia 3268.631\n",
        "Iteration  8, inertia 3268.409\n",
        "Iteration  9, inertia 3268.296"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3268.184\n",
        "Iteration 11, inertia 3268.000\n",
        "Iteration 12, inertia 3267.834\n",
        "Iteration 13, inertia 3267.674\n",
        "Iteration 14, inertia 3267.473"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3267.362\n",
        "Iteration 16, inertia 3267.273\n",
        "Iteration 17, inertia 3267.147\n",
        "Iteration 18, inertia 3267.035\n",
        "Iteration 19, inertia 3266.914"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3266.829\n",
        "Iteration 21, inertia 3266.699\n",
        "Iteration 22, inertia 3266.545\n",
        "Iteration 23, inertia 3266.270\n",
        "Iteration 24, inertia 3265.958"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3265.560\n",
        "Iteration 26, inertia 3265.069\n",
        "Iteration 27, inertia 3264.684\n",
        "Iteration 28, inertia 3264.510\n",
        "Iteration 29, inertia 3264.421"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 30, inertia 3264.306\n",
        "Iteration 31, inertia 3264.165\n",
        "Iteration 32, inertia 3264.036\n",
        "Iteration 33, inertia 3263.952\n",
        "Iteration 34, inertia 3263.910"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 35, inertia 3263.856\n",
        "Iteration 36, inertia 3263.814\n",
        "Iteration 37, inertia 3263.778\n",
        "Iteration 38, inertia 3263.729\n",
        "Iteration 39, inertia 3263.623"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 40, inertia 3263.525\n",
        "Iteration 41, inertia 3263.408\n",
        "Iteration 42, inertia 3263.292\n",
        "Iteration 43, inertia 3263.134\n",
        "Iteration 44, inertia 3262.944"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 45, inertia 3262.742\n",
        "Iteration 46, inertia 3262.450\n",
        "Iteration 47, inertia 3261.958\n",
        "Iteration 48, inertia 3260.961\n",
        "Iteration 49, inertia 3259.360"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 50, inertia 3258.312\n",
        "Iteration 51, inertia 3257.919\n",
        "Iteration 52, inertia 3257.750\n",
        "Iteration 53, inertia 3257.643\n",
        "Iteration 54, inertia 3257.588\n",
        "Iteration 55, inertia 3257.580"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Converged at iteration 55\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6420.248\n",
        "Iteration  1, inertia 3304.572\n",
        "Iteration  2, inertia 3289.501\n",
        "Iteration  3, inertia 3282.402\n",
        "Iteration  4, inertia 3278.539"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3276.338\n",
        "Iteration  6, inertia 3274.250\n",
        "Iteration  7, inertia 3272.702\n",
        "Iteration  8, inertia 3270.959\n",
        "Iteration  9, inertia 3269.232\n",
        "Iteration 10, inertia 3267.949"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 11, inertia 3266.887\n",
        "Iteration 12, inertia 3265.973\n",
        "Iteration 13, inertia 3265.242\n",
        "Iteration 14, inertia 3264.568\n",
        "Iteration 15, inertia 3264.087\n",
        "Iteration 16, inertia 3263.834"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 17, inertia 3263.631\n",
        "Iteration 18, inertia 3263.505\n",
        "Iteration 19, inertia 3263.451\n",
        "Iteration 20, inertia 3263.379\n",
        "Iteration 21, inertia 3263.328\n",
        "Iteration 22, inertia 3263.294"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 23, inertia 3263.249\n",
        "Iteration 24, inertia 3263.226\n",
        "Iteration 25, inertia 3263.212\n",
        "Iteration 26, inertia 3263.198\n",
        "Iteration 27, inertia 3263.185\n",
        "Iteration 28, inertia 3263.176"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 29, inertia 3263.173\n",
        "Iteration 30, inertia 3263.171\n",
        "Converged at iteration 30\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6400.961\n",
        "Iteration  1, inertia 3298.251\n",
        "Iteration  2, inertia 3280.432\n",
        "Iteration  3, inertia 3275.345\n",
        "Iteration  4, inertia 3273.142\n",
        "Iteration  5, inertia 3271.588"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  6, inertia 3269.971\n",
        "Iteration  7, inertia 3268.344\n",
        "Iteration  8, inertia 3267.296\n",
        "Iteration  9, inertia 3266.664\n",
        "Iteration 10, inertia 3265.748"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 11, inertia 3264.808\n",
        "Iteration 12, inertia 3263.649\n",
        "Iteration 13, inertia 3262.882\n",
        "Iteration 14, inertia 3262.461\n",
        "Iteration 15, inertia 3262.228"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 16, inertia 3262.058\n",
        "Iteration 17, inertia 3261.915\n",
        "Iteration 18, inertia 3261.792\n",
        "Iteration 19, inertia 3261.680\n",
        "Iteration 20, inertia 3261.592"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 21, inertia 3261.520\n",
        "Iteration 22, inertia 3261.401\n",
        "Iteration 23, inertia 3261.279\n",
        "Iteration 24, inertia 3261.215\n",
        "Iteration 25, inertia 3261.126"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 26, inertia 3261.046\n",
        "Iteration 27, inertia 3260.992\n",
        "Iteration 28, inertia 3260.953\n",
        "Iteration 29, inertia 3260.912\n",
        "Iteration 30, inertia 3260.862"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 31, inertia 3260.815\n",
        "Iteration 32, inertia 3260.791\n",
        "Iteration 33, inertia 3260.779\n",
        "Iteration 34, inertia 3260.773\n",
        "Iteration 35, inertia 3260.761"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 36, inertia 3260.741\n",
        "Iteration 37, inertia 3260.718\n",
        "Iteration 38, inertia 3260.701\n",
        "Iteration 39, inertia 3260.698\n",
        "Iteration 40, inertia 3260.688\n",
        "Iteration 41, inertia 3260.677"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 42, inertia 3260.672\n",
        "Converged at iteration 42\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6472.336\n",
        "Iteration  1, inertia 3303.275\n",
        "Iteration  2, inertia 3284.138\n",
        "Iteration  3, inertia 3274.154\n",
        "Iteration  4, inertia 3268.411"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3265.076\n",
        "Iteration  6, inertia 3262.077\n",
        "Iteration  7, inertia 3261.408\n",
        "Iteration  8, inertia 3260.914\n",
        "Iteration  9, inertia 3260.573"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3260.182\n",
        "Iteration 11, inertia 3259.746\n",
        "Iteration 12, inertia 3259.141\n",
        "Iteration 13, inertia 3258.615\n",
        "Iteration 14, inertia 3258.188"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3257.699\n",
        "Iteration 16, inertia 3257.071\n",
        "Iteration 17, inertia 3256.768\n",
        "Iteration 18, inertia 3256.620\n",
        "Iteration 19, inertia 3256.475"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3256.358\n",
        "Iteration 21, inertia 3256.205\n",
        "Iteration 22, inertia 3256.133\n",
        "Iteration 23, inertia 3256.099\n",
        "Iteration 24, inertia 3256.074"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3256.063\n",
        "Iteration 26, inertia 3256.057\n",
        "Iteration 27, inertia 3256.055\n",
        "Iteration 28, inertia 3256.053\n",
        "Converged at iteration 28\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6409.635\n",
        "Iteration  1, inertia 3306.802\n",
        "Iteration  2, inertia 3292.770\n",
        "Iteration  3, inertia 3282.228\n",
        "Iteration  4, inertia 3274.919"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3269.284\n",
        "Iteration  6, inertia 3265.479\n",
        "Iteration  7, inertia 3262.476\n",
        "Iteration  8, inertia 3260.595\n",
        "Iteration  9, inertia 3259.696"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3259.101\n",
        "Iteration 11, inertia 3258.481\n",
        "Iteration 12, inertia 3258.167\n",
        "Iteration 13, inertia 3257.964\n",
        "Iteration 14, inertia 3257.725"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3257.538\n",
        "Iteration 16, inertia 3257.429\n",
        "Iteration 17, inertia 3257.344\n",
        "Iteration 18, inertia 3257.202\n",
        "Iteration 19, inertia 3257.062"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3256.865\n",
        "Iteration 21, inertia 3256.692\n",
        "Iteration 22, inertia 3256.549\n",
        "Iteration 23, inertia 3256.403\n",
        "Iteration 24, inertia 3256.245"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 25, inertia 3256.127\n",
        "Iteration 26, inertia 3256.025\n",
        "Iteration 27, inertia 3255.952\n",
        "Iteration 28, inertia 3255.853\n",
        "Iteration 29, inertia 3255.769"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 30, inertia 3255.630\n",
        "Iteration 31, inertia 3255.571\n",
        "Iteration 32, inertia 3255.543\n",
        "Iteration 33, inertia 3255.516\n",
        "Iteration 34, inertia 3255.496"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 35, inertia 3255.489\n",
        "Converged at iteration 35\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6414.364\n",
        "Iteration  1, inertia 3292.636\n",
        "Iteration  2, inertia 3274.091\n",
        "Iteration  3, inertia 3266.486\n",
        "Iteration  4, inertia 3263.416"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3261.789\n",
        "Iteration  6, inertia 3260.794\n",
        "Iteration  7, inertia 3260.258\n",
        "Iteration  8, inertia 3259.941\n",
        "Iteration  9, inertia 3259.658\n",
        "Iteration 10, inertia 3259.351"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 11, inertia 3258.914\n",
        "Iteration 12, inertia 3258.190\n",
        "Iteration 13, inertia 3257.195\n",
        "Iteration 14, inertia 3256.270\n",
        "Iteration 15, inertia 3255.707"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 16, inertia 3255.556\n",
        "Iteration 17, inertia 3255.521\n",
        "Iteration 18, inertia 3255.483\n",
        "Iteration 19, inertia 3255.469\n",
        "Iteration 20, inertia 3255.460\n",
        "Iteration 21, inertia 3255.456"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Converged at iteration 21\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6324.895\n",
        "Iteration  1, inertia 3293.965\n",
        "Iteration  2, inertia 3275.830\n",
        "Iteration  3, inertia 3267.741\n",
        "Iteration  4, inertia 3263.209"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3261.451\n",
        "Iteration  6, inertia 3260.725\n",
        "Iteration  7, inertia 3260.367\n",
        "Iteration  8, inertia 3260.137\n",
        "Iteration  9, inertia 3259.991"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3259.924\n",
        "Iteration 11, inertia 3259.890\n",
        "Iteration 12, inertia 3259.877\n",
        "Iteration 13, inertia 3259.861\n",
        "Converged at iteration 13\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6439.038\n",
        "Iteration  1, inertia 3291.442\n",
        "Iteration  2, inertia 3276.028\n",
        "Iteration  3, inertia 3271.637\n",
        "Iteration  4, inertia 3269.695"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3268.859\n",
        "Iteration  6, inertia 3268.340\n",
        "Iteration  7, inertia 3267.780\n",
        "Iteration  8, inertia 3267.261\n",
        "Iteration  9, inertia 3266.530"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3265.668\n",
        "Iteration 11, inertia 3264.816\n",
        "Iteration 12, inertia 3263.986\n",
        "Iteration 13, inertia 3263.582\n",
        "Iteration 14, inertia 3263.172"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3262.976\n",
        "Iteration 16, inertia 3262.861\n",
        "Iteration 17, inertia 3262.783\n",
        "Iteration 18, inertia 3262.751\n",
        "Iteration 19, inertia 3262.726"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3262.708\n",
        "Iteration 21, inertia 3262.699\n",
        "Converged at iteration 21\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6458.746\n",
        "Iteration  1, inertia 3309.368\n",
        "Iteration  2, inertia 3296.435\n",
        "Iteration  3, inertia 3288.927\n",
        "Iteration  4, inertia 3282.518"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3275.289\n",
        "Iteration  6, inertia 3267.311\n",
        "Iteration  7, inertia 3264.367\n",
        "Iteration  8, inertia 3263.004\n",
        "Iteration  9, inertia 3262.378\n",
        "Iteration 10, inertia 3261.967"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 11, inertia 3261.658\n",
        "Iteration 12, inertia 3261.507\n",
        "Iteration 13, inertia 3261.294\n",
        "Iteration 14, inertia 3261.093\n",
        "Iteration 15, inertia 3260.902\n",
        "Iteration 16, inertia 3260.740"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 17, inertia 3260.652\n",
        "Iteration 18, inertia 3260.585\n",
        "Iteration 19, inertia 3260.539\n",
        "Iteration 20, inertia 3260.491\n",
        "Iteration 21, inertia 3260.454\n",
        "Iteration 22, inertia 3260.426"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 23, inertia 3260.412\n",
        "Iteration 24, inertia 3260.405\n",
        "Iteration 25, inertia 3260.402\n",
        "Iteration 26, inertia 3260.398\n",
        "Iteration 27, inertia 3260.390\n",
        "Iteration 28, inertia 3260.382"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 29, inertia 3260.380\n",
        "Iteration 30, inertia 3260.376\n",
        "Converged at iteration 30\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6350.535\n",
        "Iteration  1, inertia 3291.919\n",
        "Iteration  2, inertia 3279.374\n",
        "Iteration  3, inertia 3273.346\n",
        "Iteration  4, inertia 3269.117\n",
        "Iteration  5, inertia 3266.915"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  6, inertia 3265.431\n",
        "Iteration  7, inertia 3264.712\n",
        "Iteration  8, inertia 3264.349\n",
        "Iteration  9, inertia 3264.067\n",
        "Iteration 10, inertia 3263.850\n",
        "Iteration 11, inertia 3263.726"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 12, inertia 3263.650\n",
        "Iteration 13, inertia 3263.619\n",
        "Iteration 14, inertia 3263.607\n",
        "Iteration 15, inertia 3263.597\n",
        "Converged at iteration 15\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6456.248\n",
        "Iteration  1, inertia 3300.444\n",
        "Iteration  2, inertia 3283.503\n",
        "Iteration  3, inertia 3276.788\n",
        "Iteration  4, inertia 3274.204"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3272.677\n",
        "Iteration  6, inertia 3271.439\n",
        "Iteration  7, inertia 3270.415\n",
        "Iteration  8, inertia 3269.341\n",
        "Iteration  9, inertia 3268.165\n",
        "Iteration 10, inertia 3267.504"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 11, inertia 3267.135\n",
        "Iteration 12, inertia 3266.829\n",
        "Iteration 13, inertia 3266.572\n",
        "Iteration 14, inertia 3266.337\n",
        "Iteration 15, inertia 3266.077"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 16, inertia 3265.841\n",
        "Iteration 17, inertia 3265.544\n",
        "Iteration 18, inertia 3265.359\n",
        "Iteration 19, inertia 3265.181\n",
        "Iteration 20, inertia 3265.045"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 21, inertia 3264.936\n",
        "Iteration 22, inertia 3264.811\n",
        "Iteration 23, inertia 3264.654\n",
        "Iteration 24, inertia 3264.496\n",
        "Iteration 25, inertia 3264.081"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 26, inertia 3263.339\n",
        "Iteration 27, inertia 3261.533\n",
        "Iteration 28, inertia 3258.654\n",
        "Iteration 29, inertia 3256.621\n",
        "Iteration 30, inertia 3255.979"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 31, inertia 3255.643\n",
        "Iteration 32, inertia 3255.477\n",
        "Iteration 33, inertia 3255.403\n",
        "Iteration 34, inertia 3255.360\n",
        "Iteration 35, inertia 3255.335"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Converged at iteration 35\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6451.563\n",
        "Iteration  1, inertia 3304.684\n",
        "Iteration  2, inertia 3285.713\n",
        "Iteration  3, inertia 3279.365\n",
        "Iteration  4, inertia 3277.067"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3275.508\n",
        "Iteration  6, inertia 3274.519\n",
        "Iteration  7, inertia 3273.507\n",
        "Iteration  8, inertia 3272.746\n",
        "Iteration  9, inertia 3272.162\n",
        "Iteration 10, inertia 3271.657"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 11, inertia 3271.264\n",
        "Iteration 12, inertia 3270.956\n",
        "Iteration 13, inertia 3270.540\n",
        "Iteration 14, inertia 3270.082\n",
        "Iteration 15, inertia 3269.869"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 16, inertia 3269.726\n",
        "Iteration 17, inertia 3269.584\n",
        "Iteration 18, inertia 3269.468\n",
        "Iteration 19, inertia 3269.352\n",
        "Iteration 20, inertia 3269.178"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 21, inertia 3269.011\n",
        "Iteration 22, inertia 3268.723\n",
        "Iteration 23, inertia 3268.353\n",
        "Iteration 24, inertia 3267.843\n",
        "Iteration 25, inertia 3267.215\n",
        "Iteration 26, inertia 3266.362"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 27, inertia 3265.584\n",
        "Iteration 28, inertia 3265.157\n",
        "Iteration 29, inertia 3264.786\n",
        "Iteration 30, inertia 3264.364\n",
        "Iteration 31, inertia 3263.901\n",
        "Iteration 32, inertia 3263.552"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 33, inertia 3263.260\n",
        "Iteration 34, inertia 3262.937\n",
        "Iteration 35, inertia 3262.485\n",
        "Iteration 36, inertia 3261.695\n",
        "Iteration 37, inertia 3261.107\n",
        "Iteration 38, inertia 3260.828"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 39, inertia 3260.594\n",
        "Iteration 40, inertia 3260.428\n",
        "Iteration 41, inertia 3260.389\n",
        "Iteration 42, inertia 3260.367\n",
        "Iteration 43, inertia 3260.365"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 44, inertia 3260.359\n",
        "Converged at iteration 44\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6405.600\n",
        "Iteration  1, inertia 3302.004\n",
        "Iteration  2, inertia 3283.203\n",
        "Iteration  3, inertia 3276.145\n",
        "Iteration  4, inertia 3273.083\n",
        "Iteration  5, inertia 3271.498"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  6, inertia 3270.418\n",
        "Iteration  7, inertia 3269.699\n",
        "Iteration  8, inertia 3268.915\n",
        "Iteration  9, inertia 3267.884\n",
        "Iteration 10, inertia 3266.646"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 11, inertia 3265.083\n",
        "Iteration 12, inertia 3263.472\n",
        "Iteration 13, inertia 3262.431\n",
        "Iteration 14, inertia 3261.918\n",
        "Iteration 15, inertia 3261.636"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 16, inertia 3261.445\n",
        "Iteration 17, inertia 3261.310\n",
        "Iteration 18, inertia 3261.224\n",
        "Iteration 19, inertia 3261.135\n",
        "Iteration 20, inertia 3261.059\n",
        "Iteration 21, inertia 3261.018"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 22, inertia 3260.983\n",
        "Iteration 23, inertia 3260.947\n",
        "Iteration 24, inertia 3260.900\n",
        "Iteration 25, inertia 3260.840\n",
        "Iteration 26, inertia 3260.790"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 27, inertia 3260.764\n",
        "Iteration 28, inertia 3260.743\n",
        "Iteration 29, inertia 3260.738\n",
        "Converged at iteration 29\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6448.216\n",
        "Iteration  1, inertia 3298.831\n",
        "Iteration  2, inertia 3279.635\n",
        "Iteration  3, inertia 3269.284\n",
        "Iteration  4, inertia 3263.260"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3259.594\n",
        "Iteration  6, inertia 3257.439\n",
        "Iteration  7, inertia 3256.139\n",
        "Iteration  8, inertia 3255.675\n",
        "Iteration  9, inertia 3255.538"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3255.445\n",
        "Iteration 11, inertia 3255.393\n",
        "Iteration 12, inertia 3255.364\n",
        "Iteration 13, inertia 3255.356\n",
        "Iteration 14, inertia 3255.344"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3255.334\n",
        "Converged at iteration 15\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6455.246\n",
        "Iteration  1, inertia 3306.953\n",
        "Iteration  2, inertia 3294.150\n",
        "Iteration  3, inertia 3287.016\n",
        "Iteration  4, inertia 3283.105"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3280.206\n",
        "Iteration  6, inertia 3277.649\n",
        "Iteration  7, inertia 3275.314\n",
        "Iteration  8, inertia 3273.816\n",
        "Iteration  9, inertia 3272.719"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3271.792\n",
        "Iteration 11, inertia 3270.814\n",
        "Iteration 12, inertia 3270.039\n",
        "Iteration 13, inertia 3269.696\n",
        "Iteration 14, inertia 3269.384"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3269.025\n",
        "Iteration 16, inertia 3268.540\n",
        "Iteration 17, inertia 3268.051\n",
        "Iteration 18, inertia 3267.514\n",
        "Iteration 19, inertia 3267.302\n",
        "Iteration 20, inertia 3267.222"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 21, inertia 3267.177\n",
        "Iteration 22, inertia 3267.135\n",
        "Iteration 23, inertia 3267.080\n",
        "Iteration 24, inertia 3266.960\n",
        "Iteration 25, inertia 3266.678\n",
        "Iteration 26, inertia 3265.716"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 27, inertia 3262.812\n",
        "Iteration 28, inertia 3257.784\n",
        "Iteration 29, inertia 3256.421\n",
        "Iteration 30, inertia 3255.818\n",
        "Iteration 31, inertia 3255.614\n",
        "Iteration 32, inertia 3255.518"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 33, inertia 3255.469\n",
        "Iteration 34, inertia 3255.443\n",
        "Iteration 35, inertia 3255.435\n",
        "Iteration 36, inertia 3255.429\n",
        "Iteration 37, inertia 3255.420\n",
        "Iteration 38, inertia 3255.416"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 39, inertia 3255.409\n",
        "Iteration 40, inertia 3255.399\n",
        "Iteration 41, inertia 3255.378\n",
        "Iteration 42, inertia 3255.365\n",
        "Iteration 43, inertia 3255.355"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 44, inertia 3255.347\n",
        "Iteration 45, inertia 3255.345\n",
        "Iteration 46, inertia 3255.342\n",
        "Iteration 47, inertia 3255.340\n",
        "Converged at iteration 47\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6373.585\n",
        "Iteration  1, inertia 3295.265\n",
        "Iteration  2, inertia 3276.429\n",
        "Iteration  3, inertia 3270.790\n",
        "Iteration  4, inertia 3269.210"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3268.392\n",
        "Iteration  6, inertia 3267.849\n",
        "Iteration  7, inertia 3267.406\n",
        "Iteration  8, inertia 3267.006\n",
        "Iteration  9, inertia 3266.540"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3266.094\n",
        "Iteration 11, inertia 3265.727\n",
        "Iteration 12, inertia 3265.176\n",
        "Iteration 13, inertia 3264.168\n",
        "Iteration 14, inertia 3262.569"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3261.010\n",
        "Iteration 16, inertia 3260.253\n",
        "Iteration 17, inertia 3260.028\n",
        "Iteration 18, inertia 3259.907\n",
        "Iteration 19, inertia 3259.861"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3259.830\n",
        "Iteration 21, inertia 3259.785\n",
        "Iteration 22, inertia 3259.758\n",
        "Iteration 23, inertia 3259.755\n",
        "Converged at iteration 23\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6354.581\n",
        "Iteration  1, inertia 3307.480\n",
        "Iteration  2, inertia 3294.591\n",
        "Iteration  3, inertia 3286.870\n",
        "Iteration  4, inertia 3283.171"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3280.286\n",
        "Iteration  6, inertia 3277.624\n",
        "Iteration  7, inertia 3275.121\n",
        "Iteration  8, inertia 3272.140\n",
        "Iteration  9, inertia 3269.519"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3267.144\n",
        "Iteration 11, inertia 3264.701\n",
        "Iteration 12, inertia 3262.442\n",
        "Iteration 13, inertia 3260.466\n",
        "Iteration 14, inertia 3258.164"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 15, inertia 3257.111\n",
        "Iteration 16, inertia 3256.494\n",
        "Iteration 17, inertia 3255.938\n",
        "Iteration 18, inertia 3255.690\n",
        "Iteration 19, inertia 3255.623"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 20, inertia 3255.598\n",
        "Iteration 21, inertia 3255.591\n",
        "Iteration 22, inertia 3255.587\n",
        "Iteration 23, inertia 3255.583\n",
        "Converged at iteration 23\n",
        "Initialization complete"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  0, inertia 6456.341\n",
        "Iteration  1, inertia 3299.840\n",
        "Iteration  2, inertia 3286.698\n",
        "Iteration  3, inertia 3281.930\n",
        "Iteration  4, inertia 3279.365"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration  5, inertia 3275.912\n",
        "Iteration  6, inertia 3271.700\n",
        "Iteration  7, inertia 3268.976\n",
        "Iteration  8, inertia 3267.243\n",
        "Iteration  9, inertia 3266.373"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 10, inertia 3265.959\n",
        "Iteration 11, inertia 3265.614\n",
        "Iteration 12, inertia 3265.320\n",
        "Iteration 13, inertia 3265.040\n",
        "Iteration 14, inertia 3264.620\n"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "Iteration 15, inertia 3264.257\n",
        "Iteration 16, inertia 3264.017\n",
        "Iteration 17, inertia 3263.875\n",
        "Iteration 18, inertia 3263.794\n",
        "Iteration 19, inertia 3263.725\n",
        "Iteration 20, inertia 3263.691"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 21, inertia 3263.666\n",
        "Iteration 22, inertia 3263.640\n",
        "Iteration 23, inertia 3263.625\n",
        "Iteration 24, inertia 3263.621\n",
        "Iteration 25, inertia 3263.610\n",
        "Iteration 26, inertia 3263.607"
       ]
      },
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "\n",
        "Iteration 27, inertia 3263.604\n",
        "Converged at iteration 27\n"
       ]
      },
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 109,
       "text": [
        "[39, 0.40027932557045898]"
       ]
      }
     ],
     "prompt_number": 109
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "40% accuracy using 39 clusters is only marginally better than our model with 5 clusters, we will definately choose the simpler model moving forward. Remember though that these results are still `in sample` error, and are probably better than we can expect on real data. "
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Solve a real problem"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now we are at the stage where we can recommend similar articles to the user. This could be implemented as part of the serach algorithm, or simply recommended posts to read after the current page.\n",
      "\n",
      "We first need to vectorize the new post before we predict it's label."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "new_post = '''hard drives can fail at any time,\n",
      "                    it is important to always backup your data.'''\n",
      "                    \n",
      "new_post_vec = vec.transform([new_post])\n",
      "new_post_label = km.predict(new_post_vec)[0] # predict the class it belongs to"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 114
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Select all posts with the same cluster label as the new post vector\n",
      "similar_label = (km.labels_ == new_post_label).nonzero()[0]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 115
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, between the records we know are similar, we build a new list of similarity scores, similar to what we did above in earlier examples."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "similar = []\n",
      "for i in similar_label:\n",
      "    dist = sp.linalg.norm((new_post_vec - vecData[i].toarray()))\n",
      "    similar.append((dist, train_data.target[i], train_data.data[i]))\n",
      "similar = sorted(similar)\n",
      "print(len(similar))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "175\n"
       ]
      }
     ],
     "prompt_number": 116
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Present the most similar posts\n",
      "print similar[0]"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "(1.1757159813728066, 2, 'From: gjp@sei.cmu.edu (George Pandelios)\\nSubject:  Help me select a Backup Solution\\n\\n\\nHi Netters!\\n\\nI\\'m looking at purchasing some sort of backup solution.  After you read about\\nmy situation, I\\'d like your opinion.  Here\\'s the scenario:\\n\\n1.  There are two computers in the house.  One is a small 286 (40MB IDE drive).\\n    The other is a 386DX (213 SCSI drive w/ Adaptec 1522 controller).  Both \\n    systems have PC TOOLS and will use Central Point Backup as the backup / \\n    restore program.  Both systems have 3.5\" and 5.25\" floppies.\\n\\n2.  The computers are not networked (nor will they be anytime soon).\\n\\nFrom what I have seen so far, there appear to be at least 4 possible\\nsolutions (I\\'m sure there are others I haven\\'t thought about).  For these \\noptions, I would appreciate hearing from anyone who has tried them or sees \\nany flaws (drive type X won\\'t coexist with device Y, etc.) in my thinking \\n(I don\\'t know very much about these beasts):\\n\\n1.  Put 2.88MB floppy drives (or a combination drive) on each system.\\n    Can someone supply cost and brand information?  What\\'s a good brand?\\n    What do the floppies themselves cost?\\n\\n\\n2.  Put an internal tape backup unit on the 386 using my SCSI adapter, and\\n    continue to back up the 286 with floppies.  Again, can someone recommend a\\n    few manufacturers?  The only brand I remember is Colorado Memories.  Any\\n    happy or unhappy users (I know about the compression controversy)?\\n \\n\\n3.  Connect an external tape backup unit on the 386 using my SCSI adapter, and\\n    (maybe?) connect it to the 286 somehow (any suggestions?)\\n\\n\\n4.  Install a Floptical drive in each machine.  Again, any gotcha\\'s or \\n    recommendations for manufacturers?  \\n\\nI appreciate your help.  You may either post or send me e-mail.  
I will\\nsummarize all responses for the net.\\n\\nThanks,\\n\\nGeorge\\n=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\\n  George J. Pandelios\\t\\t\\t\\tInternet:  gjp@sei.cmu.edu\\n  Software Engineering Institute\\t\\tusenet:\\t   sei!gjp\\n  4500 Fifth Avenue\\t\\t\\t\\tVoice:\\t   (412) 268-7186\\n  Pittsburgh, PA 15213\\t\\t\\t\\tFAX:\\t   (412) 268-5758\\n=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\\nDisclaimer:  These opinions are my own and do not reflect those of the\\n\\t     Software Engineering Institute, its sponsors, customers, \\n\\t     clients, affiliates, or Carnegie Mellon University.  In fact,\\n\\t     any resemblence of these opinions to any individual, living\\n\\t     or dead, fictional or real, is purely coincidental.  So there.\\n=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=\\n')\n"
       ]
      }
     ],
     "prompt_number": 117
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.core.display import HTML\n",
      "\n",
      "\n",
      "def css_styling():\n",
      "    styles = open(\"/users/ryankelly/desktop/custom_notebook.css\", \"r\").read()\n",
      "\n",
      "    return HTML(styles)\n",
      "css_styling()\n"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "\n",
        "<style>\n",
        "body {\n",
        "    font-family: Century Gothic, sans;\n",
        "\n",
        "}\n",
        "\n",
        "\n",
        "div.text_cell_render h1 { /* Main titles bigger, centered */\n",
        "font-size: 2.2em;\n",
        "line-height:1.4em;\n",
        "text-align:center;\n",
        "}\n",
        "\n",
        "/*Input and output cells formatting*/\n",
        "div.prompt.input_prompt, div.prompt.output_prompt {\n",
        "    visibility: hidden;\n",
        "    /*font-family: Consolas;*/\n",
        "    color: #575748;\n",
        "    /*background-color: #CCCCCC;*/\n",
        "    border: 0px;\n",
        "    width: 6.5em;\n",
        "    float:left;\n",
        "}\n",
        "\n",
        "\n",
        "div.output_subarea.output_text.output_stream.output_stdout,div.output_subarea.output_text {\n",
        "    margin-left: 1.5em;\n",
        "    padding-top: 1em;\n",
        "    padding-bottom: 0.5em;\n",
        "    margin-top: 8px; /*This is for getting the box-shadow property of the parent to display properly;*/\n",
        "}\n",
        "\n",
        "div.cell { /* Tunes the space between cells */\n",
        "margin-top:1em;\n",
        "margin-bottom:1em;\n",
        "width:100%;\n",
        "margin-right:auto;\n",
        "overflow-x:hidden;\n",
        "}\n",
        "\n",
        "div.text_cell_render{\n",
        "    overflow-x:hidden;\n",
        "      \n",
        "}\n",
        "\n",
        "\n",
        "div.input{\n",
        "margin-right:1%;\n",
        "}\n",
        "\n",
        "</style>\n",
        " \n",
        "\n",
        "\n",
        "\n"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 122,
       "text": [
        "<IPython.core.display.HTML at 0x11f0b5a10>"
       ]
      }
     ],
     "prompt_number": 122
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def social():\n",
      "    code = \"\"\"\n",
      "    <a style='float:left; margin-right:5px;' href=\"https://twitter.com/share\" class=\"twitter-share-button\" data-text=\"Check this out\" data-via=\"Ryanmdk\">Tweet</a>\n",
      "<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>\n",
      "    <a style='float:left; margin-right:5px;' href=\"https://twitter.com/Ryanmdk\" class=\"twitter-follow-button\" data-show-count=\"false\">Follow @Ryanmdk</a>\n",
      "<script>!function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0],p=/^http:/.test(d.location)?'http':'https';if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src=p+'://platform.twitter.com/widgets.js';fjs.parentNode.insertBefore(js,fjs);}}(document, 'script', 'twitter-wjs');</script>\n",
      "    <a style='float:left; margin-right:5px;'target='_parent' href=\"http://www.reddit.com/submit\" onclick=\"window.location = 'http://www.reddit.com/submit?url=' + encodeURIComponent(window.location); return false\"> <img src=\"http://www.reddit.com/static/spreddit7.gif\" alt=\"submit to reddit\" border=\"0\" /> </a>\n",
      "<script src=\"//platform.linkedin.com/in.js\" type=\"text/javascript\">\n",
      "  lang: en_US\n",
      "</script>\n",
      "<script type=\"IN/Share\"></script>\n",
      "\"\"\"\n",
      "    return HTML(code)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 121
    }
   ],
   "metadata": {}
  }
 ]
}