{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Topic Modeling of Twitter Followers\n", "\n", "This notebook accompanies [this article on my blog](http://alexperrier.github.io/jekyll/update/2015/09/04/topic-modeling-of-twitter-followers.html).\n", "\n", "We use LDAvis to visualize several LDA models of the followers of the [@alexip](https://twitter.com/alexip) account.\n", "\n", "The LDA models were trained with the following parameters:\n", "\n", "* 10 topics, 10 passes, alpha = 0.001\n", "* 50 topics, 50 passes, alpha = 0.01\n", "* 40 topics, 100 passes, alpha = 0.001\n", "\n", "The data was extracted from Twitter with [this Python 2 script](https://github.com/alexperrier/datatalks/tree/master/twitter),\n", "and the dictionary and corpus were created with [this one](https://github.com/alexperrier/datatalks/tree/master/twitter).\n", "\n", "For the best results, set lambda between 0.5 and 0.6. Lowering lambda gives more weight to words that are discriminative for the selected topic, that is, the words that best define it.\n", "\n", "You can skip the first two models and jump to the last one (40 topics), which is the best.\n", "\n", "A working version of this notebook is available on [nbviewer](http://nbviewer.ipython.org/github/alexperrier/datatalks/blob/master/twitter/LDAvis.ipynb).\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Load the corpus and dictionary\n", "from gensim import corpora, models\n", "import pyLDAvis.gensim\n", "\n", "corpus = corpora.MmCorpus('data/alexip_followers_py27.mm')\n", "dictionary = corpora.Dictionary.load('data/alexip_followers_py27.dict')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "
\n", "" ], "text/plain": [ "