{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Demonstration of the `u_mass` topic coherence using the topic coherence pipeline" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import numpy as np\n", "import logging\n", "import pyLDAvis.gensim\n", "import json\n", "import warnings\n", "warnings.filterwarnings('ignore')  # Ignore warnings here, to keep the notebook output readable\n", "\n", "from gensim.models.coherencemodel import CoherenceModel\n", "from gensim.models.ldamodel import LdaModel\n", "from gensim.corpora.dictionary import Dictionary\n", "from numpy import array" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up logging" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true }, "outputs": [], "source": [ "logger = logging.getLogger()\n", "logger.setLevel(logging.DEBUG)\n", "logging.debug(\"test\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up corpus" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As stated in Table 2 of [this](http://www.cs.bham.ac.uk/~pxt/IDA/lsa_ind.pdf) paper, this corpus essentially contains two classes of documents: the first five are about human-computer interaction and the last four are about graphs. Let's see how our LDA models interpret them."
] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "texts = [['human', 'interface', 'computer'],\n", "         ['survey', 'user', 'computer', 'system', 'response', 'time'],\n", "         ['eps', 'user', 'interface', 'system'],\n", "         ['system', 'human', 'system', 'eps'],\n", "         ['user', 'response', 'time'],\n", "         ['trees'],\n", "         ['graph', 'trees'],\n", "         ['graph', 'minors', 'trees'],\n", "         ['graph', 'minors', 'survey']]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "dictionary = Dictionary(texts)\n", "corpus = [dictionary.doc2bow(text) for text in texts]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Set up two topic models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll set up two different LDA topic models: a good one and a bad one. To build the \"good\" topic model, we'll simply train it with more iterations than the bad one. The `u_mass` coherence should therefore, in theory, be higher for the good model than for the bad one, since it should produce more \"human-interpretable\" topics."
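] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a rough sketch of what `u_mass` measures under the hood: for each pair of top topic words, it scores the smoothed log conditional probability `log((D(w_i, w_j) + 1) / D(w_j))`, where `D(.)` counts the documents containing the given word(s) and `w_j` is the higher-ranked word. The cell below is a simplified, hypothetical illustration of this idea on a few toy documents; `simple_u_mass` is not a gensim function, and gensim's actual pipeline additionally handles segmentation, probability estimation and aggregation for us." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import math\n", "\n", "def simple_u_mass(topic_words, docs):\n", "    # Sum log((D(w_i, w_j) + 1) / D(w_j)) over all word pairs,\n", "    # conditioning each word on every higher-ranked word before it\n", "    score = 0.0\n", "    for i in range(1, len(topic_words)):\n", "        for j in range(i):\n", "            w_i, w_j = topic_words[i], topic_words[j]\n", "            d_j = sum(1 for d in docs if w_j in d)\n", "            d_ij = sum(1 for d in docs if w_i in d and w_j in d)\n", "            score += math.log((d_ij + 1.0) / d_j)\n", "    return score\n", "\n", "# Toy documents echoing the graph-related class above\n", "sample_docs = [['graph', 'trees'], ['graph', 'minors', 'trees'], ['graph', 'minors', 'survey'], ['trees']]\n", "simple_u_mass(['graph', 'trees', 'minors'], sample_docs)"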
] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "goodLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=50, num_topics=2)\n", "badLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=1, num_topics=2)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [], "source": [ "goodcm = CoherenceModel(model=goodLdaModel, corpus=corpus, dictionary=dictionary, coherence='u_mass')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "badcm = CoherenceModel(model=badLdaModel, corpus=corpus, dictionary=dictionary, coherence='u_mass')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View the pipeline parameters for one coherence model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Following are the pipeline parameters for `u_mass` coherence. By pipeline parameters, we mean the functions used to calculate segmentation, probability estimation, confirmation measure and aggregation, as shown in Figure 1 of [this](http://svn.aksw.org/papers/2015/WSDM_Topic_Evaluation/public.pdf) paper." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CoherenceModel(segmentation=<function s_one_pre>, probability estimation=<function p_boolean_document>, confirmation measure=<function log_conditional_probability>, aggregation=<function arithmetic_mean>)\n" ] } ], "source": [ "print goodcm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualize topic models" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [], "source": [ "pyLDAvis.enable_notebook()" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "" ], "text/plain": [ "PreparedData(topic_coordinates= Freq cluster topics x y\n", "topic \n", "0 53.784345 1 1 -0.042457 -0.0\n", "1 46.215655 1 2 0.042457 -0.0, topic_info= Category Freq Term Total loglift logprob\n", "term \n", "2 Default 3.000000 system 3.000000 12.0000 12.0000\n", "4 Default 2.000000 eps 2.000000 11.0000 11.0000\n", "10 Default 2.000000 interface 2.000000 10.0000 10.0000\n", "8 Default 2.000000 human 2.000000 9.0000 9.0000\n", "1 Default 2.000000 graph 2.000000 8.0000 8.0000\n", "3 Default 2.000000 trees 2.000000 7.0000 7.0000\n", "0 Default 2.000000 minors 2.000000 6.0000 6.0000\n", "11 Default 2.000000 response 2.000000 5.0000 5.0000\n", "9 Default 2.000000 time 2.000000 4.0000 4.0000\n", "6 Default 2.000000 survey 2.000000 3.0000 3.0000\n", "5 Default 2.000000 computer 2.000000 2.0000 2.0000\n", "7 Default 2.000000 user 2.000000 1.0000 1.0000\n", "1 Topic1 2.160895 graph 2.887064 0.3305 -1.9766\n", "3 Topic1 2.155307 trees 2.886559 0.3281 -1.9792\n", "0 Topic1 1.602852 minors 2.163687 0.3202 -2.2753\n", "11 Topic1 1.494527 response 2.153892 0.2547 -2.3453\n", "9 Topic1 1.477177 time 2.152323 0.2438 -2.3570\n", "6 Topic1 1.394609 survey 2.144858 0.1897 -2.4145\n", "7 Topic1 1.644094 user 2.840335 0.0735 -2.2499\n", "5 Topic1 1.058064 computer 2.114427 -0.0722 -2.6907\n", "8 Topic1 0.665682 human 2.078948 -0.5186 -3.1541\n", "10 Topic1 0.617191 interface 2.074563 -0.5921 -3.2297\n", "4 Topic1 0.500226 eps 2.063987 -0.7971 -3.4398\n", "2 Topic1 0.826837 system 3.439357 -0.8052 -2.9373\n", "2 Topic2 2.612520 system 3.439357 0.4969 -1.6351\n", "4 Topic2 1.563761 eps 2.063987 0.4943 -2.1484\n", "10 Topic2 1.457372 interface 2.074563 0.4187 -2.2188\n", "8 Topic2 1.413266 human 2.078948 0.3859 -2.2495\n", "5 Topic2 1.056363 computer 2.114427 0.0779 -2.5406\n", "7 Topic2 1.196241 user 2.840335 -0.0929 -2.4163\n", "6 Topic2 0.750248 survey 2.144858 -0.2786 -2.8828\n", "9 Topic2 0.675147 time 2.152323 -0.3875 -2.9883\n", "11 Topic2 0.659366 
response 2.153892 -0.4119 -3.0119\n", "0 Topic2 0.560835 minors 2.163687 -0.5783 -3.1738\n", "3 Topic2 0.731252 trees 2.886559 -0.6012 -2.9084\n", "1 Topic2 0.726170 graph 2.887064 -0.6084 -2.9154, token_table= Topic Freq Term\n", "term \n", "5 1 0.472941 computer\n", "5 2 0.472941 computer\n", "4 1 0.484499 eps\n", "4 2 0.968998 eps\n", "1 1 0.692745 graph\n", "1 2 0.346373 graph\n", "8 1 0.481013 human\n", "8 2 0.481013 human\n", "10 1 0.482029 interface\n", "10 2 0.482029 interface\n", "0 1 0.924348 minors\n", "0 2 0.462174 minors\n", "11 1 0.464276 response\n", "11 2 0.464276 response\n", "6 1 0.466231 survey\n", "6 2 0.466231 survey\n", "2 1 0.290752 system\n", "2 2 0.872256 system\n", "9 1 0.464614 time\n", "9 2 0.464614 time\n", "3 1 0.692866 trees\n", "3 2 0.346433 trees\n", "7 1 0.704142 user\n", "7 2 0.352071 user, R=12, lambda_step=0.01, plot_opts={'xlab': 'PC1', 'ylab': 'PC2'}, topic_order=[1, 2])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pyLDAvis.gensim.prepare(goodLdaModel, corpus, dictionary)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "" ], "text/plain": [ "PreparedData(topic_coordinates= Freq cluster topics x y\n", "topic \n", "0 54.719704 1 1 -0.003725 -0.0\n", "1 45.280296 1 2 0.003725 -0.0, topic_info= Category Freq Term Total loglift logprob\n", "term \n", "3 Default 2.000000 trees 2.000000 12.0000 12.0000\n", "7 Default 2.000000 user 2.000000 11.0000 11.0000\n", "9 Default 2.000000 time 2.000000 10.0000 10.0000\n", "1 Default 2.000000 graph 2.000000 9.0000 9.0000\n", "11 Default 2.000000 response 2.000000 8.0000 8.0000\n", "6 Default 2.000000 survey 2.000000 7.0000 7.0000\n", "10 Default 2.000000 interface 2.000000 6.0000 6.0000\n", "2 Default 3.000000 system 3.000000 5.0000 5.0000\n", "0 Default 2.000000 minors 2.000000 4.0000 4.0000\n", "5 Default 2.000000 computer 2.000000 3.0000 3.0000\n", "4 Default 2.000000 eps 2.000000 2.0000 2.0000\n", "8 Default 2.000000 human 2.000000 1.0000 1.0000\n", "6 Topic1 1.403463 survey 2.166565 0.1687 -2.4254\n", "5 Topic1 1.323382 computer 2.151822 0.1168 -2.4842\n", "4 Topic1 1.283536 eps 2.144487 0.0897 -2.5147\n", "8 Topic1 1.260209 human 2.140192 0.0733 -2.5331\n", "0 Topic1 1.208988 minors 2.130763 0.0362 -2.5746\n", "2 Topic1 2.003470 system 3.549152 0.0311 -2.0695\n", "10 Topic1 1.169065 interface 2.123413 0.0061 -2.6081\n", "11 Topic1 1.124464 response 2.115202 -0.0289 -2.6470\n", "1 Topic1 1.497503 graph 2.819941 -0.0300 -2.3606\n", "7 Topic1 1.420522 user 2.805769 -0.0777 -2.4133\n", "9 Topic1 1.045865 time 2.100732 -0.0945 -2.7195\n", "3 Topic1 1.128246 trees 2.751962 -0.2887 -2.6437\n", "3 Topic2 1.623715 trees 2.751962 0.2647 -2.0903\n", "9 Topic2 1.054867 time 2.100732 0.1034 -2.5216\n", "7 Topic2 1.385247 user 2.805769 0.0865 -2.2491\n", "1 Topic2 1.322438 graph 2.819941 0.0351 -2.2955\n", "11 Topic2 0.990738 response 2.115202 0.0338 -2.5843\n", "10 Topic2 0.954348 interface 2.123413 -0.0075 -2.6217\n", "2 Topic2 1.545682 system 3.549152 -0.0389 -2.1395\n", "0 Topic2 0.921775 minors 2.130763 -0.0456 -2.6565\n", "8 Topic2 0.879983 
human 2.140192 -0.0965 -2.7029\n", "4 Topic2 0.860950 eps 2.144487 -0.1203 -2.7247\n", "5 Topic2 0.828441 computer 2.151822 -0.1622 -2.7632\n", "6 Topic2 0.763102 survey 2.166565 -0.2512 -2.8454, token_table= Topic Freq Term\n", "term \n", "5 1 0.464722 computer\n", "5 2 0.464722 computer\n", "4 1 0.466312 eps\n", "4 2 0.466312 eps\n", "1 1 0.354617 graph\n", "1 2 0.354617 graph\n", "8 1 0.467248 human\n", "8 2 0.467248 human\n", "10 1 0.470940 interface\n", "10 2 0.470940 interface\n", "0 1 0.469316 minors\n", "0 2 0.469316 minors\n", "11 1 0.472768 response\n", "11 2 0.472768 response\n", "6 1 0.461560 survey\n", "6 2 0.461560 survey\n", "2 1 0.563515 system\n", "2 2 0.563515 system\n", "9 1 0.476025 time\n", "9 2 0.476025 time\n", "3 1 0.363377 trees\n", "3 2 0.726754 trees\n", "7 1 0.356409 user\n", "7 2 0.356409 user, R=12, lambda_step=0.01, plot_opts={'xlab': 'PC1', 'ylab': 'PC2'}, topic_order=[1, 2])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pyLDAvis.gensim.prepare(badLdaModel, corpus, dictionary)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-13.8048438862\n" ] } ], "source": [ "print goodcm.get_coherence()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-15.5467907012\n" ] } ], "source": [ "print badcm.get_coherence()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Conclusion" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Hence, as we can see, the `u_mass` coherence for the good LDA model is higher (better) than that for the bad LDA model. 
This is because the good LDA model generally produces better topics that are more human-interpretable.\n", "For the first topic, the goodLdaModel rightly emphasizes \"graph\", \"trees\" and \"user\", corresponding to the second class of documents.\n", "For the second topic, it emphasizes words such as \"system\", \"eps\", \"interface\" and \"human\", which signify human-computer interaction.\n", "The badLdaModel, however, fails to distinguish between these two topics: both of its topics are mostly graph-based, and neither is clear to a human. The `u_mass` topic coherence captures this nicely by assigning a higher score to the more interpretable model, as seen above. Hence this coherence measure can be used to compare different topic models based on their human-interpretability." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }