{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial  Outline\n",
    "\n",
    "- [Introduction](#Introduction)\n",
    "- [Preprerequisites](#Preprerequisites)\n",
    "- [How does it work?](#How-does-it-work?)\n",
    "- [Authentication](#Authentication)\n",
    " - [Authentication keys](#Authentication-keys)\n",
    "- [MongoDB Collection](#MongoDB-Collection)\n",
    "- [Starting a Stream](#Starting-a-Stream)\n",
    " - [Stream Listener](#Stream-Listener)\n",
    " - [Connect to a streaming API](#Connect-to-a-streaming-API)\n",
    "- [Data Access and Analysis](#Data-Access-and-Analysis)\n",
    " - [Load results to a DataFrame](#Load-results-to-a-DataFrame)\n",
    "- [Visualization](#Visualization)\n",
    "\n",
    "# Introduction\n",
    "\n",
    "Twitter provides two types of API to access their data:\n",
    "\n",
    "- RESTful API: Used to get data about existing data objects like statuses \"tweets\", user, ... etc\n",
    "- Streaming API: Used to get live statuses \"tweets\" as they are sent\n",
    "\n",
    "The reason why you would like to use streaming API:\n",
    "\n",
    "- Capture large amount of data because RESTful API has limited access to older data\n",
    "- Real-time analysis like monitoring social discussion about a live event\n",
    "- In house archive like archiving social discussion about your brand(s)\n",
    "- AI response system for a twitter account like automated reply and filing questions or providing answers\n",
    "\n",
    "# Preprerequisites\n",
    "\n",
    "- Python 2 or 3\n",
    "- Jupyter /w IPyWidgets\n",
    "- Pandas\n",
    "- Numpy\n",
    "- Matplotlib\n",
    "- MogoDB Installtion\n",
    "- Pymongo\n",
    "- Scikit-learn\n",
    "- Tweepy\n",
    "- Twitter account\n",
    "\n",
    "\n",
    "# How does it work?\n",
    "\n",
    "Twitter streaming API can provide data through a streaming HTTP response. This is very similar to downloading a file where you read a number of bytes and store it to disk and repeat until the end of file. The only difference is this stream is endless. The only things that could stop this stream are:\n",
    "\n",
    "- If you closed your connection to the streaming response\n",
    "- If your connection speed is not capable of receiving data and the servers buffer is filling up\n",
    "\n",
    "This means that this process will be using the thread that it was launched from until it is stopped. In production, you should always start this in a different thread or process to make sure your software doesn't freeze until you stop the stream.\n",
    "\n",
    "# Authentication\n",
    "\n",
    "You will need four numbers from twitter development to start using streaming API. First, let's import some important libraries for dealing with twitter API, data analysis, data storage ... etc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import tweepy\n",
    "import matplotlib.pyplot as plt\n",
    "import pymongo\n",
    "import ipywidgets as wgt\n",
    "from IPython.display import display\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "import re\n",
    "from datetime import datetime\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Authentication keys\n",
    "\n",
    "1. Go to https://apps.twitter.com/\n",
    "2. Create an App (if you don't have one yet)\n",
    "3. Grant read-only access to your account\n",
    "4. Copy the four keys and paste them here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "api_key = \"yP0yoCitoUNgD63ebMerGyJaE\" # <---- Add your API Key\n",
    "api_secret = \"kLO5YUtlth3cd4lOHLy8nlLHW5npVQgUfO4FhsyCn6wCMIz5E6\" # <---- Add your API Secret\n",
    "access_token = \"259862037-iMXNjfL8JBApm4LVcdfwc3FcMm7Xta4TKg5cd44K\" # <---- Add your access token\n",
    "access_token_secret = \"UIgh08dtmavzlvlWWukIXwN5HDIQD0wNwyn5sPzhrynBf\" # <---- Add your access token secret\n",
    "\n",
    "auth = tweepy.OAuthHandler(api_key, api_secret)\n",
    "auth.set_access_token(access_token, access_token_secret)\n",
    "\n",
    "api = tweepy.API(auth)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MongoDB Collection\n",
    "\n",
    "Connect to MongoDB and create/get a collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2251"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col = pymongo.MongoClient()[\"tweets\"][\"StreamingTutorial\"]\n",
    "col.count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Starting a Stream\n",
    "\n",
    "We need a listener which should extend `tweepy.StreamListener` class. There is a number of methods that you can extend to instruct the listener class to perform functionality. Some of the important methods are:\n",
    "\n",
    "- `on_status(self, status)`: This will pass a status \"tweet\" object when a tweet is received\n",
    "- `on_data(self, raw_data)`: Called when any any data is received and the raw data will be passed\n",
    "- `on_error(self, status_code)`: Called when you get a response with code other than 200 (ok)\n",
    "\n",
    "## Stream Listener"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "class MyStreamListener(tweepy.StreamListener):\n",
    "    \n",
    "    counter = 0\n",
    "    \n",
    "    def __init__(self, max_tweets=1000, *args, **kwargs):\n",
    "        self.max_tweets = max_tweets\n",
    "        self.counter = 0\n",
    "        super().__init__(*args, **kwargs)\n",
    "    \n",
    "    def on_connect(self):\n",
    "        self.counter = 0\n",
    "        self.start_time = datetime.now()\n",
    "    \n",
    "    def on_status(self, status):\n",
    "        # Increment counter\n",
    "        self.counter += 1\n",
    "        \n",
    "        # Store tweet to MongoDB\n",
    "        col.insert_one(status._json)\n",
    "        \n",
    "        \n",
    "        if self.counter % 1 == 0:\n",
    "            value = int(100.00 * self.counter / self.max_tweets)\n",
    "            mining_time = datetime.now() - self.start_time\n",
    "            progress_bar.value = value\n",
    "            html_value = \"\"\"<span class=\"label label-primary\">Tweets/Sec: %.1f</span>\"\"\" % (self.counter / max([1,mining_time.seconds]))\n",
    "            html_value += \"\"\" <span class=\"label label-success\">Progress: %.1f%%</span>\"\"\" % (self.counter / self.max_tweets * 100.0)\n",
    "            html_value += \"\"\" <span class=\"label label-info\">ETA: %.1f Sec</span>\"\"\" % ((self.max_tweets - self.counter) / (self.counter / max([1,mining_time.seconds])))\n",
    "            wgt_status.value = html_value\n",
    "            #print(\"%s/%s\" % (self.counter, self.max_tweets))\n",
    "            if self.counter >= self.max_tweets:\n",
    "                myStream.disconnect()\n",
    "                print(\"Finished\")\n",
    "                print(\"Total Mining Time: %s\" % (mining_time))\n",
    "                print(\"Tweets/Sec: %.1f\" % (self.max_tweets / mining_time.seconds))\n",
    "                progress_bar.value = 0\n",
    "                \n",
    "    \n",
    "myStreamListener = MyStreamListener(max_tweets=100)\n",
    "myStream = tweepy.Stream(auth = api.auth, listener=myStreamListener)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to a streaming API\n",
    "\n",
    "There are two methods to connect to a stream:\n",
    "\n",
    "- `filter(follow=None, track=None, async=False, locations=None, stall_warnings=False, languages=None, encoding='utf8', filter_level=None)`\n",
    "- `firehose(count=None, async=False)`\n",
    "\n",
    "Firehose captures everything. You should make sure that you have connection speed that can handle the stream and you have the storage capacity that can store these tweets at the same rate. We cannot really use firehose for this tutorial but we'll be using `filter`.\n",
    "\n",
    "You have to specify one of two things to filter:\n",
    "\n",
    "- `follow`: A list of user ID to follow. This will stream all their tweets, retweets, and others retweeting their tweets. This doesn't include mentions and manual retweets where the user doesn't press the retweet button.\n",
    "- `track`: A string or list of string to be used for filtering. If you use multiple words separated by spaces, this will be used for AND operator. If you use multiple words in a string separated by commas or pass a list of words this will be treated as OR operator.\n",
    "\n",
    "**Note**: `track` is case insensitive.\n",
    "\n",
    "### What to track?\n",
    "I want to collect all tweets that contains any of these words:\n",
    "\n",
    "- Jupyter\n",
    "- Python\n",
    "- Data Mining\n",
    "- Machine Learning\n",
    "- Data Science\n",
    "- Big Data\n",
    "- IoT\n",
    "- #R\n",
    "\n",
    "This could be done with a string or a list. It is easier to to it with a list to make your code clear to read."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Finished\n",
      "Total Mining Time: 0:01:21.477351\n",
      "Tweets/Sec: 1.2\n",
      "Tweets collected: 100\n",
      "Total tweets in collection: 2351\n"
     ]
    }
   ],
   "source": [
    "keywords = [\"Jupyter\",\n",
    "            \"Python\",\n",
    "            \"Data Mining\",\n",
    "            \"Machine Learning\",\n",
    "            \"Data Science\",\n",
    "            \"Big Data\",\n",
    "            \"DataMining\",\n",
    "            \"MachineLearning\",\n",
    "            \"DataScience\",\n",
    "            \"BigData\",\n",
    "            \"IoT\",\n",
    "            \"#R\",\n",
    "           ]\n",
    "\n",
    "# Visualize a progress bar to track progress\n",
    "progress_bar = wgt.IntProgress(value=0)\n",
    "display(progress_bar)\n",
    "wgt_status = wgt.HTML(value=\"\"\"<span class=\"label label-primary\">Tweets/Sec: 0.0</span>\"\"\")\n",
    "display(wgt_status)\n",
    "\n",
    "# Start a filter with an error counter of 20\n",
    "for error_counter in range(20):\n",
    "    try:\n",
    "        myStream.filter(track=keywords)\n",
    "        print(\"Tweets collected: %s\" % myStream.listener.counter)\n",
    "        print(\"Total tweets in collection: %s\" % col.count())\n",
    "        break\n",
    "    except:\n",
    "        print(\"ERROR# %s\" % (error_counter + 1))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Access and Analysis\n",
    "\n",
    "Now that we have stored all these tweets in a MongoDB collection, let's take a look at one of these tweets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'_id': ObjectId('56937d2e105f1970314720e2'),\n",
       " 'contributors': None,\n",
       " 'coordinates': None,\n",
       " 'created_at': 'Mon Jan 11 10:00:14 +0000 2016',\n",
       " 'entities': {'hashtags': [{'indices': [22, 27], 'text': 'Rã®æ³•å‰‡'}],\n",
       "  'symbols': [],\n",
       "  'urls': [],\n",
       "  'user_mentions': []},\n",
       " 'favorite_count': 0,\n",
       " 'favorited': False,\n",
       " 'filter_level': 'low',\n",
       " 'geo': None,\n",
       " 'id': 686487772970942466,\n",
       " 'id_str': '686487772970942466',\n",
       " 'in_reply_to_screen_name': None,\n",
       " 'in_reply_to_status_id': None,\n",
       " 'in_reply_to_status_id_str': None,\n",
       " 'in_reply_to_user_id': None,\n",
       " 'in_reply_to_user_id_str': None,\n",
       " 'is_quote_status': False,\n",
       " 'lang': 'ja',\n",
       " 'place': None,\n",
       " 'retweet_count': 0,\n",
       " 'retweeted': False,\n",
       " 'source': '<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>',\n",
       " 'text': 'ä½“åŠ›è½ã¡ã¦ãã¦ãŠã°ã•ã‚“ã¿ãŸã„ã«ãªã£ã¦ããŸã€‚\\n#Rã®æ³•å‰‡',\n",
       " 'timestamp_ms': '1452506414059',\n",
       " 'truncated': False,\n",
       " 'user': {'contributors_enabled': False,\n",
       "  'created_at': 'Tue Aug 18 16:19:16 +0000 2015',\n",
       "  'default_profile': True,\n",
       "  'default_profile_image': False,\n",
       "  'description': 'â˜® é–¢ã‚¸ãƒ£ãƒ‹âˆž ï¼† å±±ç”°æ¶¼ä»‹ ï¼† Justin Bieber ï¼† Benjamin Lasnier ï¼† Selena Gomez â˜®',\n",
       "  'favourites_count': 1121,\n",
       "  'follow_request_sent': None,\n",
       "  'followers_count': 121,\n",
       "  'following': None,\n",
       "  'friends_count': 92,\n",
       "  'geo_enabled': True,\n",
       "  'id': 3318871652,\n",
       "  'id_str': '3318871652',\n",
       "  'is_translator': False,\n",
       "  'lang': 'en',\n",
       "  'listed_count': 0,\n",
       "  'location': 'The land of dreams',\n",
       "  'name': 'rena',\n",
       "  'notifications': None,\n",
       "  'profile_background_color': 'C0DEED',\n",
       "  'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',\n",
       "  'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',\n",
       "  'profile_background_tile': False,\n",
       "  'profile_banner_url': 'https://pbs.twimg.com/profile_banners/3318871652/1452436374',\n",
       "  'profile_image_url': 'http://pbs.twimg.com/profile_images/683964013558931456/Q1rx1s5b_normal.jpg',\n",
       "  'profile_image_url_https': 'https://pbs.twimg.com/profile_images/683964013558931456/Q1rx1s5b_normal.jpg',\n",
       "  'profile_link_color': '0084B4',\n",
       "  'profile_sidebar_border_color': 'C0DEED',\n",
       "  'profile_sidebar_fill_color': 'DDEEF6',\n",
       "  'profile_text_color': '333333',\n",
       "  'profile_use_background_image': True,\n",
       "  'protected': False,\n",
       "  'screen_name': 'Q2HpiJwCX1huBwf',\n",
       "  'statuses_count': 497,\n",
       "  'time_zone': None,\n",
       "  'url': None,\n",
       "  'utc_offset': None,\n",
       "  'verified': False}}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "col.find_one()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load results to a DataFrame"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>created_at</th>\n",
       "      <th>source</th>\n",
       "      <th>text</th>\n",
       "      <th>user</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Mon Jan 11 10:00:14 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td>\n",
       "      <td>ä½“åŠ›è½ã¡ã¦ãã¦ãŠã°ã•ã‚“ã¿ãŸã„ã«ãªã£ã¦ããŸã€‚\\n#Rã®æ³•å‰‡</td>\n",
       "      <td>@Q2HpiJwCX1huBwf</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mon Jan 11 10:09:26 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/android\" ...</td>\n",
       "      <td>çš†ã«ãŠã°ã•ã‚“ã¨è¨€ã‚ã‚Œã¦ã†ã‚Œã—ãŒã£ã¦ã‚‹ #Rã®æ³•å‰‡</td>\n",
       "      <td>@Tamutamu1017</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mon Jan 11 10:00:10 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://trendkeyword.blog.jp/\" rel=\"no...</td>\n",
       "      <td>ã€R.I.Pã€‘æ€¥ä¸Šæ˜‡ãƒ¯ãƒ¼ãƒ‰ã€ŒR.I.Pã€ã®ã¾ã¨ã‚é€Ÿå ± https://t.co/yi1yfC...</td>\n",
       "      <td>@pickword_matome</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Mon Jan 11 10:00:10 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>#Rã®æ³•å‰‡ \\nã©ã‚Œã‚‚ãŠã°ã•ã‚“è‡­ã„ã‘ã‚Œã©ã‚„ã£ã±ã‚Šé»„è‰²ãŒä¸€ç•ªã ãªã</td>\n",
       "      <td>@kakinotise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Mon Jan 11 10:00:10 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://bufferapp.com\" rel=\"nofollow\"&gt;...</td>\n",
       "      <td>The New Best Thing HP ATP - Vertica Big Data S...</td>\n",
       "      <td>@DataCentreNews1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Mon Jan 11 10:00:11 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://dlvr.it\" rel=\"nofollow\"&gt;dlvr.i...</td>\n",
       "      <td>IoT Now: lâ€™Internet of Things Ã¨ qui, ora https...</td>\n",
       "      <td>@datamanager_it</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Mon Jan 11 10:00:11 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://trendkeyword.doorblog.jp/\" rel...</td>\n",
       "      <td>ä»Šè©±é¡Œã®ã€ŒR.I.Pã€ã¾ã¨ã‚ https://t.co/VOc5cwK5hg #R.I.P ...</td>\n",
       "      <td>@buzz_wadai</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Mon Jan 11 10:00:11 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitterfeed.com\" rel=\"nofollow...</td>\n",
       "      <td>#oldham #stockport VIDEO: Snake thief hides py...</td>\n",
       "      <td>@Labour_is_PIE</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Mon Jan 11 10:00:11 +0000 2016</td>\n",
       "      <td>&lt;a href=\"https://about.twitter.com/products/tw...</td>\n",
       "      <td>Las #startup pioneras de #machinelearning ofre...</td>\n",
       "      <td>@techreview_es</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Mon Jan 11 10:00:12 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://www.linkedin.com/\" rel=\"nofoll...</td>\n",
       "      <td>Lets talk about how to harness the power of ma...</td>\n",
       "      <td>@jansmit1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Mon Jan 11 10:00:13 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://catalystfive.com\" rel=\"nofollo...</td>\n",
       "      <td>Business Intelligence and Big Data Consulting ...</td>\n",
       "      <td>@Catalyst5Jobs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Mon Jan 11 10:00:13 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/NewsICT\" rel=\"nofo...</td>\n",
       "      <td>[æƒ…å ±é€šä¿¡]2016å¹´å°åŒ—å›½éš›ã‚³ãƒ³ãƒ”ãƒ¥ãƒ¼ã‚¿ãƒ¼è¦‹æœ¬å¸‚ãŒæ–°ã—ã„ä½ç½®ã¥ã‘ã¨æ–°ã—ã„å±•ç¤ºã§è£…ã„æ–°ãŸã«ï¼...</td>\n",
       "      <td>@NewsICT</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Mon Jan 11 10:02:10 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://dlvr.it\" rel=\"nofollow\"&gt;dlvr.i...</td>\n",
       "      <td>#bonplan Parties de Laser Quest entre amis Ã  2...</td>\n",
       "      <td>@Bons_Plans_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Mon Jan 11 10:02:10 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://dlvr.it\" rel=\"nofollow\"&gt;dlvr.i...</td>\n",
       "      <td>Parties de Laser Quest entre amis Ã  22.00â‚¬ au ...</td>\n",
       "      <td>@keepmymindfree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Mon Jan 11 10:02:10 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @jose_garde: Why a Simple Data Analytics St...</td>\n",
       "      <td>@martingeldish</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Mon Jan 11 10:02:11 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td>\n",
       "      <td>èŠ¸èƒ½äººã®äººãŸãã•ã‚“æ‰‹ãŸãŸãç¬‘ã„ã—ã¦ã‚‹ã‹ã‚‰ãŠã°ã•ã‚“ãŸãã•ã‚“ã«ãªã£ã¡ã‚ƒã†ã‚ˆwww\\n\\n#Rã®æ³•å‰‡</td>\n",
       "      <td>@YK__0704</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Mon Jan 11 10:02:12 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://dlvr.it\" rel=\"nofollow\"&gt;dlvr.i...</td>\n",
       "      <td>DÃ©couvrez le jeu Pure Mission entre amis Ã  22....</td>\n",
       "      <td>@keepmymindfree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Mon Jan 11 10:02:12 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://dlvr.it\" rel=\"nofollow\"&gt;dlvr.i...</td>\n",
       "      <td>#bonplan Parties de bowling pour 4 Ã  #POINCY :...</td>\n",
       "      <td>@Bons_Plans_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Mon Jan 11 10:02:12 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://dlvr.it\" rel=\"nofollow\"&gt;dlvr.i...</td>\n",
       "      <td>20 min de vol dÃ©couverte ULM pour 1 ou 2 Ã  79....</td>\n",
       "      <td>@CrationSiteWeb</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Mon Jan 11 10:02:12 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @hortonworks: Paris is the city of love but...</td>\n",
       "      <td>@bigdataparis</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Mon Jan 11 10:02:12 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @hynek: So #emacs / @spacemacs nerds: is th...</td>\n",
       "      <td>@fdiesch</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Mon Jan 11 10:02:12 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://www.hootsuite.com\" rel=\"nofoll...</td>\n",
       "      <td>.@QonexCyber founder member of @IoT_SF is orga...</td>\n",
       "      <td>@QonexCyber</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Mon Jan 11 10:02:13 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://dlvr.it\" rel=\"nofollow\"&gt;dlvr.i...</td>\n",
       "      <td>Parties de bowling pour 4 Ã  #POINCY : 35.00â‚¬ a...</td>\n",
       "      <td>@keepmymindfree</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Mon Jan 11 10:02:13 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://dlvr.it\" rel=\"nofollow\"&gt;dlvr.i...</td>\n",
       "      <td>#bonplan 30 sÃ©ances de Squash Ã  #LISSES : 39.9...</td>\n",
       "      <td>@Bons_Plans_</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Mon Jan 11 10:02:13 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>App: ExZeus 2 â€“ free to play   https://t.co/ZT...</td>\n",
       "      <td>@UniversalConsol</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Mon Jan 11 10:02:13 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://dlvr.it\" rel=\"nofollow\"&gt;dlvr.i...</td>\n",
       "      <td>#discount Parties de bowling pour 4 Ã  #POINCY ...</td>\n",
       "      <td>@PromosPromos</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Mon Jan 11 10:02:13 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://www.google.com/\" rel=\"nofollow...</td>\n",
       "      <td>spiegel.de :  Tier macht Sachen: Python beiÃŸt ...</td>\n",
       "      <td>@arminfischer_de</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Mon Jan 11 10:09:02 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/android\" ...</td>\n",
       "      <td>è‹¥ã„ã£ã¦ã„ã„ã­ãˆâ€¦ã£ã¦ã‚ˆãè¨€ã†w #Rã®æ³•å‰‡</td>\n",
       "      <td>@naco75x</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Mon Jan 11 10:09:17 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td>\n",
       "      <td>æœ€è¿‘ã®è‹¥ã„å­ã¯æœ€è¿‘ä½¿ã£ãŸ\\n#Rã®æ³•å‰‡</td>\n",
       "      <td>@K1224West</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Mon Jan 11 10:09:19 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/android\" ...</td>\n",
       "      <td>#Rã®æ³•å‰‡\\nè‡ªåˆ†ã‚‚è‹¥è€…ãªã®ã«ç¬‘</td>\n",
       "      <td>@V6ZRRT7Q22BZ1cF</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2221</th>\n",
       "      <td>Mon Jan 11 10:29:40 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://201512291327-7430af.bitnamiapp...</td>\n",
       "      <td>https://t.co/BTAAq6HuuJ - pcgamer - #machinele...</td>\n",
       "      <td>@vinceyue</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2222</th>\n",
       "      <td>Mon Jan 11 10:29:40 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @jose_garde: The Big Data Analytics Softwar...</td>\n",
       "      <td>@LJ_Blanchard</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2223</th>\n",
       "      <td>Mon Jan 11 10:29:41 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://www.linkedin.com/\" rel=\"nofoll...</td>\n",
       "      <td>What is data mining? Do you have to be a mathe...</td>\n",
       "      <td>@ednuwan</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2224</th>\n",
       "      <td>Mon Jan 11 10:29:41 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @AgroKnow: A #BigData platform for the futu...</td>\n",
       "      <td>@albertspijkers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2225</th>\n",
       "      <td>Mon Jan 11 10:29:42 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://www.linkedin.com/\" rel=\"nofoll...</td>\n",
       "      <td>Big Data: Is It A Tsunami, The New Oil, Or Sim...</td>\n",
       "      <td>@Summerlovegrove</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2226</th>\n",
       "      <td>Mon Jan 11 10:29:43 +0000 2016</td>\n",
       "      <td>&lt;a href=\"https://www.jobfindly.com/php-jobs.ht...</td>\n",
       "      <td>Sr Software Engineer C Php Python Linux Jobs i...</td>\n",
       "      <td>@jobfindlyphpdev</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2227</th>\n",
       "      <td>Mon Jan 11 10:29:43 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td>\n",
       "      <td>RT @bigdataparis: #Bigdata bang : un marchÃ© en...</td>\n",
       "      <td>@LifeIsWeb</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2228</th>\n",
       "      <td>Mon Jan 11 10:29:43 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>Learn from the best professors in India .. cou...</td>\n",
       "      <td>@ashwaniapex</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2229</th>\n",
       "      <td>Mon Jan 11 10:29:45 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://getsmoup.com\" rel=\"nofollow\"&gt;S...</td>\n",
       "      <td>RT @rebrandtoday: #startup or #rebrand -Buy Cr...</td>\n",
       "      <td>@SmartData_Fr</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2230</th>\n",
       "      <td>Mon Jan 11 10:29:45 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://www.ajaymatharu.com/\" rel=\"nof...</td>\n",
       "      <td>Â¿CÃ³mo serÃ¡ el futuro del Big Data? https://t.c...</td>\n",
       "      <td>@eduardogarsanch</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2231</th>\n",
       "      <td>Mon Jan 11 10:29:46 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @jose_garde: 3 Ways to Transform Your Compa...</td>\n",
       "      <td>@LJ_Blanchard</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2232</th>\n",
       "      <td>Mon Jan 11 10:29:46 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://publicize.wp.com/\" rel=\"nofoll...</td>\n",
       "      <td>Woman Tries To Kiss Python, Gets Bitten In The...</td>\n",
       "      <td>@NAIJA_VIBEZ</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2233</th>\n",
       "      <td>Mon Jan 11 10:29:46 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://www.itknowingness.com\" rel=\"no...</td>\n",
       "      <td>RT @jose_garde: 3 Ways to Transform Your Compa...</td>\n",
       "      <td>@itknowingness</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2234</th>\n",
       "      <td>Mon Jan 11 10:29:48 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td>\n",
       "      <td>RT @ErikaPauwels: Building a #BigData platform...</td>\n",
       "      <td>@impulsater</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2235</th>\n",
       "      <td>Mon Jan 11 10:29:49 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/android\" ...</td>\n",
       "      <td>En 2016 j'aimerais moins rÃ¢ler. #rÃ©solution. S...</td>\n",
       "      <td>@ce1ce2makarenko</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2236</th>\n",
       "      <td>Mon Jan 11 10:29:51 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @jose_garde: How Marketing Can Be Better Au...</td>\n",
       "      <td>@LJ_Blanchard</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2237</th>\n",
       "      <td>Mon Jan 11 10:29:51 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>Hey @Pontifex accueille ces rÃ©fugiÃ©es dans ta ...</td>\n",
       "      <td>@Atmosfive</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2238</th>\n",
       "      <td>Mon Jan 11 10:29:51 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @Ubixr: .@GroupeLaPoste choisit le toulousa...</td>\n",
       "      <td>@The_Nextwork</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2239</th>\n",
       "      <td>Mon Jan 11 10:29:56 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitterfeed.com\" rel=\"nofollow...</td>\n",
       "      <td>Thanks @hackplayers Blade: un webshell en Pyth...</td>\n",
       "      <td>@Navarmedia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2240</th>\n",
       "      <td>Mon Jan 11 10:29:56 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @jose_garde: Which big data personality are...</td>\n",
       "      <td>@LJ_Blanchard</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2241</th>\n",
       "      <td>Mon Jan 11 10:29:56 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://ifttt.com\" rel=\"nofollow\"&gt;IFTT...</td>\n",
       "      <td>Cybersecurity Forum tackles challenges with th...</td>\n",
       "      <td>@wulfsec</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2242</th>\n",
       "      <td>Mon Jan 11 10:29:56 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://publicize.wp.com/\" rel=\"nofoll...</td>\n",
       "      <td>Woman Tries To Kiss Python, Gets Bitten In The...</td>\n",
       "      <td>@Lola2Records</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2243</th>\n",
       "      <td>Mon Jan 11 10:29:56 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @TeamAnodot: Join David Drai, CEO of Anodot...</td>\n",
       "      <td>@iottechexpo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2244</th>\n",
       "      <td>Mon Jan 11 10:29:57 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://getsmoup.com\" rel=\"nofollow\"&gt;A...</td>\n",
       "      <td>RT @rebrandtoday: #startup or #rebrand -Buy Cr...</td>\n",
       "      <td>@AI__news</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2245</th>\n",
       "      <td>Mon Jan 11 10:29:57 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://www.twitter.com\" rel=\"nofollow...</td>\n",
       "      <td>RT @Matthis__VERNON: \"@Fred_Poquet Sans #confi...</td>\n",
       "      <td>@sibueta</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2246</th>\n",
       "      <td>Mon Jan 11 10:29:57 +0000 2016</td>\n",
       "      <td>&lt;a href=\"https://social.zoho.com\" rel=\"nofollo...</td>\n",
       "      <td>The right place for #BigData is #Cloud #Storag...</td>\n",
       "      <td>@TyroneSystems</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2247</th>\n",
       "      <td>Mon Jan 11 10:29:59 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @PEBlanrue: Il a toujours le mot pour rire,...</td>\n",
       "      <td>@lesroisduring</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2248</th>\n",
       "      <td>Mon Jan 11 10:29:58 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>RT @DigitalAgendaEU: â‚¬15 million for a #IoT so...</td>\n",
       "      <td>@ImproveNPA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2249</th>\n",
       "      <td>Mon Jan 11 10:29:59 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>Neat IoT innovation https://t.co/atARX0m5Bj</td>\n",
       "      <td>@sherwinnovator</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2250</th>\n",
       "      <td>Mon Jan 11 10:30:00 +0000 2016</td>\n",
       "      <td>&lt;a href=\"http://www.hubspot.com/\" rel=\"nofollo...</td>\n",
       "      <td>Check out our #Mobile App Predicitions for 201...</td>\n",
       "      <td>@B60uk</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2251 rows Ã— 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                          created_at  \\\n",
       "0     Mon Jan 11 10:00:14 +0000 2016   \n",
       "1     Mon Jan 11 10:09:26 +0000 2016   \n",
       "2     Mon Jan 11 10:00:10 +0000 2016   \n",
       "3     Mon Jan 11 10:00:10 +0000 2016   \n",
       "4     Mon Jan 11 10:00:10 +0000 2016   \n",
       "5     Mon Jan 11 10:00:11 +0000 2016   \n",
       "6     Mon Jan 11 10:00:11 +0000 2016   \n",
       "7     Mon Jan 11 10:00:11 +0000 2016   \n",
       "8     Mon Jan 11 10:00:11 +0000 2016   \n",
       "9     Mon Jan 11 10:00:12 +0000 2016   \n",
       "10    Mon Jan 11 10:00:13 +0000 2016   \n",
       "11    Mon Jan 11 10:00:13 +0000 2016   \n",
       "12    Mon Jan 11 10:02:10 +0000 2016   \n",
       "13    Mon Jan 11 10:02:10 +0000 2016   \n",
       "14    Mon Jan 11 10:02:10 +0000 2016   \n",
       "15    Mon Jan 11 10:02:11 +0000 2016   \n",
       "16    Mon Jan 11 10:02:12 +0000 2016   \n",
       "17    Mon Jan 11 10:02:12 +0000 2016   \n",
       "18    Mon Jan 11 10:02:12 +0000 2016   \n",
       "19    Mon Jan 11 10:02:12 +0000 2016   \n",
       "20    Mon Jan 11 10:02:12 +0000 2016   \n",
       "21    Mon Jan 11 10:02:12 +0000 2016   \n",
       "22    Mon Jan 11 10:02:13 +0000 2016   \n",
       "23    Mon Jan 11 10:02:13 +0000 2016   \n",
       "24    Mon Jan 11 10:02:13 +0000 2016   \n",
       "25    Mon Jan 11 10:02:13 +0000 2016   \n",
       "26    Mon Jan 11 10:02:13 +0000 2016   \n",
       "27    Mon Jan 11 10:09:02 +0000 2016   \n",
       "28    Mon Jan 11 10:09:17 +0000 2016   \n",
       "29    Mon Jan 11 10:09:19 +0000 2016   \n",
       "...                              ...   \n",
       "2221  Mon Jan 11 10:29:40 +0000 2016   \n",
       "2222  Mon Jan 11 10:29:40 +0000 2016   \n",
       "2223  Mon Jan 11 10:29:41 +0000 2016   \n",
       "2224  Mon Jan 11 10:29:41 +0000 2016   \n",
       "2225  Mon Jan 11 10:29:42 +0000 2016   \n",
       "2226  Mon Jan 11 10:29:43 +0000 2016   \n",
       "2227  Mon Jan 11 10:29:43 +0000 2016   \n",
       "2228  Mon Jan 11 10:29:43 +0000 2016   \n",
       "2229  Mon Jan 11 10:29:45 +0000 2016   \n",
       "2230  Mon Jan 11 10:29:45 +0000 2016   \n",
       "2231  Mon Jan 11 10:29:46 +0000 2016   \n",
       "2232  Mon Jan 11 10:29:46 +0000 2016   \n",
       "2233  Mon Jan 11 10:29:46 +0000 2016   \n",
       "2234  Mon Jan 11 10:29:48 +0000 2016   \n",
       "2235  Mon Jan 11 10:29:49 +0000 2016   \n",
       "2236  Mon Jan 11 10:29:51 +0000 2016   \n",
       "2237  Mon Jan 11 10:29:51 +0000 2016   \n",
       "2238  Mon Jan 11 10:29:51 +0000 2016   \n",
       "2239  Mon Jan 11 10:29:56 +0000 2016   \n",
       "2240  Mon Jan 11 10:29:56 +0000 2016   \n",
       "2241  Mon Jan 11 10:29:56 +0000 2016   \n",
       "2242  Mon Jan 11 10:29:56 +0000 2016   \n",
       "2243  Mon Jan 11 10:29:56 +0000 2016   \n",
       "2244  Mon Jan 11 10:29:57 +0000 2016   \n",
       "2245  Mon Jan 11 10:29:57 +0000 2016   \n",
       "2246  Mon Jan 11 10:29:57 +0000 2016   \n",
       "2247  Mon Jan 11 10:29:59 +0000 2016   \n",
       "2248  Mon Jan 11 10:29:58 +0000 2016   \n",
       "2249  Mon Jan 11 10:29:59 +0000 2016   \n",
       "2250  Mon Jan 11 10:30:00 +0000 2016   \n",
       "\n",
       "                                                 source  \\\n",
       "0     <a href=\"http://twitter.com/download/iphone\" r...   \n",
       "1     <a href=\"http://twitter.com/download/android\" ...   \n",
       "2     <a href=\"http://trendkeyword.blog.jp/\" rel=\"no...   \n",
       "3     <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "4     <a href=\"http://bufferapp.com\" rel=\"nofollow\">...   \n",
       "5     <a href=\"http://dlvr.it\" rel=\"nofollow\">dlvr.i...   \n",
       "6     <a href=\"http://trendkeyword.doorblog.jp/\" rel...   \n",
       "7     <a href=\"http://twitterfeed.com\" rel=\"nofollow...   \n",
       "8     <a href=\"https://about.twitter.com/products/tw...   \n",
       "9     <a href=\"http://www.linkedin.com/\" rel=\"nofoll...   \n",
       "10    <a href=\"http://catalystfive.com\" rel=\"nofollo...   \n",
       "11    <a href=\"http://twitter.com/NewsICT\" rel=\"nofo...   \n",
       "12    <a href=\"http://dlvr.it\" rel=\"nofollow\">dlvr.i...   \n",
       "13    <a href=\"http://dlvr.it\" rel=\"nofollow\">dlvr.i...   \n",
       "14    <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "15    <a href=\"http://twitter.com/download/iphone\" r...   \n",
       "16    <a href=\"http://dlvr.it\" rel=\"nofollow\">dlvr.i...   \n",
       "17    <a href=\"http://dlvr.it\" rel=\"nofollow\">dlvr.i...   \n",
       "18    <a href=\"http://dlvr.it\" rel=\"nofollow\">dlvr.i...   \n",
       "19    <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "20    <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "21    <a href=\"http://www.hootsuite.com\" rel=\"nofoll...   \n",
       "22    <a href=\"http://dlvr.it\" rel=\"nofollow\">dlvr.i...   \n",
       "23    <a href=\"http://dlvr.it\" rel=\"nofollow\">dlvr.i...   \n",
       "24    <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "25    <a href=\"http://dlvr.it\" rel=\"nofollow\">dlvr.i...   \n",
       "26    <a href=\"http://www.google.com/\" rel=\"nofollow...   \n",
       "27    <a href=\"http://twitter.com/download/android\" ...   \n",
       "28    <a href=\"http://twitter.com/download/iphone\" r...   \n",
       "29    <a href=\"http://twitter.com/download/android\" ...   \n",
       "...                                                 ...   \n",
       "2221  <a href=\"http://201512291327-7430af.bitnamiapp...   \n",
       "2222  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2223  <a href=\"http://www.linkedin.com/\" rel=\"nofoll...   \n",
       "2224  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2225  <a href=\"http://www.linkedin.com/\" rel=\"nofoll...   \n",
       "2226  <a href=\"https://www.jobfindly.com/php-jobs.ht...   \n",
       "2227  <a href=\"http://twitter.com/download/iphone\" r...   \n",
       "2228  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2229  <a href=\"http://getsmoup.com\" rel=\"nofollow\">S...   \n",
       "2230  <a href=\"http://www.ajaymatharu.com/\" rel=\"nof...   \n",
       "2231  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2232  <a href=\"http://publicize.wp.com/\" rel=\"nofoll...   \n",
       "2233  <a href=\"http://www.itknowingness.com\" rel=\"no...   \n",
       "2234  <a href=\"http://twitter.com/download/iphone\" r...   \n",
       "2235  <a href=\"http://twitter.com/download/android\" ...   \n",
       "2236  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2237  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2238  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2239  <a href=\"http://twitterfeed.com\" rel=\"nofollow...   \n",
       "2240  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2241  <a href=\"http://ifttt.com\" rel=\"nofollow\">IFTT...   \n",
       "2242  <a href=\"http://publicize.wp.com/\" rel=\"nofoll...   \n",
       "2243  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2244  <a href=\"http://getsmoup.com\" rel=\"nofollow\">A...   \n",
       "2245  <a href=\"http://www.twitter.com\" rel=\"nofollow...   \n",
       "2246  <a href=\"https://social.zoho.com\" rel=\"nofollo...   \n",
       "2247  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2248  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2249  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "2250  <a href=\"http://www.hubspot.com/\" rel=\"nofollo...   \n",
       "\n",
       "                                                   text              user  \n",
       "0                          ä½“åŠ›è½ã¡ã¦ãã¦ãŠã°ã•ã‚“ã¿ãŸã„ã«ãªã£ã¦ããŸã€‚\\n#Rã®æ³•å‰‡  @Q2HpiJwCX1huBwf  \n",
       "1                              çš†ã«ãŠã°ã•ã‚“ã¨è¨€ã‚ã‚Œã¦ã†ã‚Œã—ãŒã£ã¦ã‚‹ #Rã®æ³•å‰‡     @Tamutamu1017  \n",
       "2     ã€R.I.Pã€‘æ€¥ä¸Šæ˜‡ãƒ¯ãƒ¼ãƒ‰ã€ŒR.I.Pã€ã®ã¾ã¨ã‚é€Ÿå ± https://t.co/yi1yfC...  @pickword_matome  \n",
       "3                      #Rã®æ³•å‰‡ \\nã©ã‚Œã‚‚ãŠã°ã•ã‚“è‡­ã„ã‘ã‚Œã©ã‚„ã£ã±ã‚Šé»„è‰²ãŒä¸€ç•ªã ãªã       @kakinotise  \n",
       "4     The New Best Thing HP ATP - Vertica Big Data S...  @DataCentreNews1  \n",
       "5     IoT Now: lâ€™Internet of Things Ã¨ qui, ora https...   @datamanager_it  \n",
       "6     ä»Šè©±é¡Œã®ã€ŒR.I.Pã€ã¾ã¨ã‚ https://t.co/VOc5cwK5hg #R.I.P ...       @buzz_wadai  \n",
       "7     #oldham #stockport VIDEO: Snake thief hides py...    @Labour_is_PIE  \n",
       "8     Las #startup pioneras de #machinelearning ofre...    @techreview_es  \n",
       "9     Lets talk about how to harness the power of ma...         @jansmit1  \n",
       "10    Business Intelligence and Big Data Consulting ...    @Catalyst5Jobs  \n",
       "11    [æƒ…å ±é€šä¿¡]2016å¹´å°åŒ—å›½éš›ã‚³ãƒ³ãƒ”ãƒ¥ãƒ¼ã‚¿ãƒ¼è¦‹æœ¬å¸‚ãŒæ–°ã—ã„ä½ç½®ã¥ã‘ã¨æ–°ã—ã„å±•ç¤ºã§è£…ã„æ–°ãŸã«ï¼...          @NewsICT  \n",
       "12    #bonplan Parties de Laser Quest entre amis Ã  2...      @Bons_Plans_  \n",
       "13    Parties de Laser Quest entre amis Ã  22.00â‚¬ au ...   @keepmymindfree  \n",
       "14    RT @jose_garde: Why a Simple Data Analytics St...    @martingeldish  \n",
       "15      èŠ¸èƒ½äººã®äººãŸãã•ã‚“æ‰‹ãŸãŸãç¬‘ã„ã—ã¦ã‚‹ã‹ã‚‰ãŠã°ã•ã‚“ãŸãã•ã‚“ã«ãªã£ã¡ã‚ƒã†ã‚ˆwww\\n\\n#Rã®æ³•å‰‡         @YK__0704  \n",
       "16    DÃ©couvrez le jeu Pure Mission entre amis Ã  22....   @keepmymindfree  \n",
       "17    #bonplan Parties de bowling pour 4 Ã  #POINCY :...      @Bons_Plans_  \n",
       "18    20 min de vol dÃ©couverte ULM pour 1 ou 2 Ã  79....   @CrationSiteWeb  \n",
       "19    RT @hortonworks: Paris is the city of love but...     @bigdataparis  \n",
       "20    RT @hynek: So #emacs / @spacemacs nerds: is th...          @fdiesch  \n",
       "21    .@QonexCyber founder member of @IoT_SF is orga...       @QonexCyber  \n",
       "22    Parties de bowling pour 4 Ã  #POINCY : 35.00â‚¬ a...   @keepmymindfree  \n",
       "23    #bonplan 30 sÃ©ances de Squash Ã  #LISSES : 39.9...      @Bons_Plans_  \n",
       "24    App: ExZeus 2 â€“ free to play   https://t.co/ZT...  @UniversalConsol  \n",
       "25    #discount Parties de bowling pour 4 Ã  #POINCY ...     @PromosPromos  \n",
       "26    spiegel.de :  Tier macht Sachen: Python beiÃŸt ...  @arminfischer_de  \n",
       "27                               è‹¥ã„ã£ã¦ã„ã„ã­ãˆâ€¦ã£ã¦ã‚ˆãè¨€ã†w #Rã®æ³•å‰‡          @naco75x  \n",
       "28                                  æœ€è¿‘ã®è‹¥ã„å­ã¯æœ€è¿‘ä½¿ã£ãŸ\\n#Rã®æ³•å‰‡        @K1224West  \n",
       "29                                     #Rã®æ³•å‰‡\\nè‡ªåˆ†ã‚‚è‹¥è€…ãªã®ã«ç¬‘  @V6ZRRT7Q22BZ1cF  \n",
       "...                                                 ...               ...  \n",
       "2221  https://t.co/BTAAq6HuuJ - pcgamer - #machinele...         @vinceyue  \n",
       "2222  RT @jose_garde: The Big Data Analytics Softwar...     @LJ_Blanchard  \n",
       "2223  What is data mining? Do you have to be a mathe...          @ednuwan  \n",
       "2224  RT @AgroKnow: A #BigData platform for the futu...   @albertspijkers  \n",
       "2225  Big Data: Is It A Tsunami, The New Oil, Or Sim...  @Summerlovegrove  \n",
       "2226  Sr Software Engineer C Php Python Linux Jobs i...  @jobfindlyphpdev  \n",
       "2227  RT @bigdataparis: #Bigdata bang : un marchÃ© en...        @LifeIsWeb  \n",
       "2228  Learn from the best professors in India .. cou...      @ashwaniapex  \n",
       "2229  RT @rebrandtoday: #startup or #rebrand -Buy Cr...     @SmartData_Fr  \n",
       "2230  Â¿CÃ³mo serÃ¡ el futuro del Big Data? https://t.c...  @eduardogarsanch  \n",
       "2231  RT @jose_garde: 3 Ways to Transform Your Compa...     @LJ_Blanchard  \n",
       "2232  Woman Tries To Kiss Python, Gets Bitten In The...      @NAIJA_VIBEZ  \n",
       "2233  RT @jose_garde: 3 Ways to Transform Your Compa...    @itknowingness  \n",
       "2234  RT @ErikaPauwels: Building a #BigData platform...       @impulsater  \n",
       "2235  En 2016 j'aimerais moins rÃ¢ler. #rÃ©solution. S...  @ce1ce2makarenko  \n",
       "2236  RT @jose_garde: How Marketing Can Be Better Au...     @LJ_Blanchard  \n",
       "2237  Hey @Pontifex accueille ces rÃ©fugiÃ©es dans ta ...        @Atmosfive  \n",
       "2238  RT @Ubixr: .@GroupeLaPoste choisit le toulousa...     @The_Nextwork  \n",
       "2239  Thanks @hackplayers Blade: un webshell en Pyth...       @Navarmedia  \n",
       "2240  RT @jose_garde: Which big data personality are...     @LJ_Blanchard  \n",
       "2241  Cybersecurity Forum tackles challenges with th...          @wulfsec  \n",
       "2242  Woman Tries To Kiss Python, Gets Bitten In The...     @Lola2Records  \n",
       "2243  RT @TeamAnodot: Join David Drai, CEO of Anodot...      @iottechexpo  \n",
       "2244  RT @rebrandtoday: #startup or #rebrand -Buy Cr...         @AI__news  \n",
       "2245  RT @Matthis__VERNON: \"@Fred_Poquet Sans #confi...          @sibueta  \n",
       "2246  The right place for #BigData is #Cloud #Storag...    @TyroneSystems  \n",
       "2247  RT @PEBlanrue: Il a toujours le mot pour rire,...    @lesroisduring  \n",
       "2248  RT @DigitalAgendaEU: â‚¬15 million for a #IoT so...       @ImproveNPA  \n",
       "2249        Neat IoT innovation https://t.co/atARX0m5Bj   @sherwinnovator  \n",
       "2250  Check out our #Mobile App Predicitions for 201...            @B60uk  \n",
       "\n",
       "[2251 rows x 4 columns]"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dataset = [{\"created_at\": item[\"created_at\"],\n",
    "            \"text\": item[\"text\"],\n",
    "            \"user\": \"@%s\" % item[\"user\"][\"screen_name\"],\n",
    "            \"source\": item[\"source\"],\n",
    "           } for item in col.find()]\n",
    "\n",
    "dataset = pd.DataFrame(dataset)\n",
    "dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Checking the highest used words"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>word</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>https</td>\n",
       "      <td>1986</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>co</td>\n",
       "      <td>1907</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>rt</td>\n",
       "      <td>804</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>de</td>\n",
       "      <td>550</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>rã®æ³•å‰‡</td>\n",
       "      <td>408</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>iot</td>\n",
       "      <td>374</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>the</td>\n",
       "      <td>358</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>bigdata</td>\n",
       "      <td>293</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>00</td>\n",
       "      <td>275</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>data</td>\n",
       "      <td>250</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>in</td>\n",
       "      <td>234</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>python</td>\n",
       "      <td>219</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>to</td>\n",
       "      <td>212</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>au</td>\n",
       "      <td>199</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>of</td>\n",
       "      <td>188</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>lieu</td>\n",
       "      <td>168</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>rÃ©duction</td>\n",
       "      <td>166</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>big</td>\n",
       "      <td>157</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>on</td>\n",
       "      <td>143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>is</td>\n",
       "      <td>142</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>and</td>\n",
       "      <td>140</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>for</td>\n",
       "      <td>136</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>analytics</td>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>le</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>via</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>you</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>thingsexpo</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>by</td>\n",
       "      <td>85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>2016</td>\n",
       "      <td>84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>snake</td>\n",
       "      <td>80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>en</td>\n",
       "      <td>76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>bowie</td>\n",
       "      <td>75</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>la</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>thief</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>video</td>\n",
       "      <td>73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>m2m</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>jose_garde</td>\n",
       "      <td>68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>19</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>david</td>\n",
       "      <td>66</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>with</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>how</td>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>it</td>\n",
       "      <td>60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>will</td>\n",
       "      <td>55</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>un</td>\n",
       "      <td>54</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>amp</td>\n",
       "      <td>53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45</th>\n",
       "      <td>des</td>\n",
       "      <td>53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46</th>\n",
       "      <td>rÃ©paration</td>\n",
       "      <td>53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47</th>\n",
       "      <td>new</td>\n",
       "      <td>52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48</th>\n",
       "      <td>39</td>\n",
       "      <td>52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>at</td>\n",
       "      <td>51</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          word  count\n",
       "0        https   1986\n",
       "1           co   1907\n",
       "2           rt    804\n",
       "3           de    550\n",
       "4         rã®æ³•å‰‡    408\n",
       "5          iot    374\n",
       "6          the    358\n",
       "7      bigdata    293\n",
       "8           00    275\n",
       "9         data    250\n",
       "10          in    234\n",
       "11      python    219\n",
       "12          to    212\n",
       "13          au    199\n",
       "14          of    188\n",
       "15        lieu    168\n",
       "16   rÃ©duction    166\n",
       "17         big    157\n",
       "18          on    143\n",
       "19          is    142\n",
       "20         and    140\n",
       "21         for    136\n",
       "22   analytics    107\n",
       "23          le     89\n",
       "24         via     86\n",
       "25         you     86\n",
       "26  thingsexpo     86\n",
       "27          by     85\n",
       "28        2016     84\n",
       "29       snake     80\n",
       "30          en     76\n",
       "31       bowie     75\n",
       "32          la     74\n",
       "33       thief     74\n",
       "34       video     73\n",
       "35         m2m     70\n",
       "36  jose_garde     68\n",
       "37          19     67\n",
       "38       david     66\n",
       "39        with     63\n",
       "40         how     61\n",
       "41          it     60\n",
       "42        will     55\n",
       "43          un     54\n",
       "44         amp     53\n",
       "45         des     53\n",
       "46  rÃ©paration     53\n",
       "47         new     52\n",
       "48          39     52\n",
       "49          at     51"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cv = CountVectorizer()\n",
    "count_matrix = cv.fit_transform(dataset.text)\n",
    "\n",
    "word_count = pd.DataFrame(cv.get_feature_names(), columns=[\"word\"])\n",
    "word_count[\"count\"] = count_matrix.sum(axis=0).tolist()[0]\n",
    "word_count = word_count.sort_values(\"count\", ascending=False).reset_index(drop=True)\n",
    "word_count[:50]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Visualization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def get_source_name(x):\n",
    "    value = re.findall(pattern=\"<[^>]+>([^<]+)</a>\", string=x)\n",
    "    if len(value) > 0:\n",
    "        return value[0]\n",
    "    else:\n",
    "        return \"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Facebook                25\n",
       "TweetDeck               37\n",
       "RoundTeam               41\n",
       "Hootsuite               46\n",
       "twitterfeed             81\n",
       "IFTTT                  134\n",
       "dlvr.it                200\n",
       "Twitter Web Client     388\n",
       "Twitter for Android    392\n",
       "Twitter for iPhone     515\n",
       "Name: source, dtype: int64"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAegAAAD7CAYAAAChZQeNAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3Xm8HEW9///Xm7AlAUGEqyIgyOJNlCVEEEFwEEX0yiYo\niAq4gQuLV+C6gJroVVD44VVUVNw1AVEQQZDVHIPs2VlEjBq/ICCIoEHAQPL+/dF1ks4wZ8nkhEzm\nvJ+Pxzymu7qqumpO4DNVXdMt20RERERnWW1lNyAiIiKeLgE6IiKiAyVAR0REdKAE6IiIiA6UAB0R\nEdGBEqAjIiI60OoruwGxapCU3+NFRLTBttoplxF0DJrtrn196lOfWultSN/Sv/Sv+17LIwE6IiKi\nAyVAR0REdCAt7xA8hgdJ9qSV3YoVp+cOaIxd2a1YMbq5b5D+reoG7N9hq3aMkoTbvAadAB2D0u0B\nOiI61DAO0JnijoiI6EAJ0BERER0oAToiIqIDLXOAlvQcSTPL6z5J95TtGZJa3vhE0tGS3lG2j5T0\n/NqxD0ka2X4XQNJGkm6SNF3Sbm3WMVHSq8t2j6Q7Jc2S9BtJ25T0eZI2WJ62LmObJpV23Crp272f\nr6S3SZotaY6k6yRt10f5V5fP5FZJ35M0oqQfJOk2SVN7+yNpS0nnPVN9i4iI/i1zgLb9kO1xtscB\nXwfOLPs72n6qjzLfsP3DsnsEsHHt8PHAqGVpg6Tmdu8FzLE93vZ17dRh+1O2f9W7Cxxmewfg+8Dp\ntfS2Lva36Ue2/9P2tsBI4D0l/Y/AHra3Az4DfLO5YOnf94BDSvk/U332AMcALwO+ARxW0j4DnLyC\n+hEREctoKKa4R0iaBiBpe0mLJG1S9udKGilpgqQTJB1EFRgmlVH3cVTBeoqka0qZvSVdX0Z+50sa\nXdLnSTpN0nTg4N6TS9oB+DywfxnFry3prWV0eauk02p5H5V0hqRZwC71TpQR5kEt+nctsFVt/9jS\ntjmSXlzKbiDpojKqvUHStiV9gqTvSJoi6Q+Sjq2d7+1l1D9T0tdbfOnA9i9ru7cAm5T0G2z/o6Tf\n1Jve5DnAAttzy/7VQG//FgFrA6OBBZJ2B+6z/YcW9URExEowFAF6EbCWpHWB3akCyR6SXgg8YPtx\nqpGnbV8ATKManY6z/WXgXqBhey9JG1KN4vayPR6YDny4nMfA38oo+fzek9ueBXwSOM/2jsAGwGnA\nnsAOwE6S9i/ZRwE32t7B9vVN/XB59eodKe8LzKmlP1jadjZwYkmbCEy3vT3wceAHtfzbAHsDOwOf\nkjRC0hjgLcCuZSZiEfC2vj5gSWsAbwd+2eLwu4HLWqT/DVhd0viyfzCwadk+lSpg/xdwHnAK1Qg6\nIiI6xFA9LOMGYDeqAH0qsA9VgJvaR/6+pol3AcYC10sCWBOoB9If91Nfb507AVNsPwTVdVxgD+Dn\nwELggoG7g6hG+Y8DfwKOrR27sLzPAN5Utnfr3bY9pVynX5cq4F9q+0ngIUkPAM+jmpIfD0wr/RwJ\n3N9Pe74G/Lp5+l7SnsC7yvmXYtuSDgW+KGkt4MrSf2xfTTWTgaTDgUuB/5R0AvAwcHz5YrWUCbVP\nrjGmu2+eEBHRjp6eHnp6eoakrqEK0FOpguBmVIHwo1TB6Rd95O/vl+dX2T6sj2P/GkR9zdeJVTv+\nhAd3Z5bea9AzWhz7d3lfyNKfX19fOhbUtutlvm/74wM1RNKngOfYfm9T+nbAOcA+th9uVdb2jVR/\nFyTtDWzdVMcoquvSr6P6Wx0IvJlqNP+t5vomtLoAEBERizUaDRqNxuL9iRMntl3XUP3M6lqqKdjf\nlwD4d+ANwG9qeXoD2HzgWbX0+v5NwG6StgSQNFrSUkGlD/XgeAvwqjKKHQEcCvx6GfvTXOdArqVM\nUUtqUE2Dz++jDgPXAAdL2qiU2UDSZk9rgPQequnxw5rSN6Mayb+9do356R1YUv9awP9QLeqrOwn4\nUlnc17uS3rXtiIhYSYYiQNv2n8t275T2tcDDtYVMsGQU+z3g670LuqhWIF8u6RrbDwJHAudKmk01\nvf3iwbSht37b91GN4KcAs4Bpti9pasOg+jVAWv2a9QRgfGnz51iyWrr5ujaljb+luu57ZSlzJdXU\nd7Ozgf8AbiiLyU4p6Z8Eng2cXdJv7i0g6VJJvXWdJOkOYDZwse2eWr6NgZ1sX1ySzqL6cnMUMLlF\nWyIi4hmUe3HHoCj34o6IlSH34o6IiIhOkgAdERHRgRKgIyIiOlACdERERAfKIrEYFEmD/Al5RET0\nyiKxiIiILpMAHRER0YESoCMiIjrQUN2LO4aDyc/ko7AjhqlV/MYcMXQygo6IiOhACdAREREdKAE6\nIiKiAyVAR0REdKB+A3R5pvLM8rpP0j1le4aklgvMJB0t6R1l+0hJz68d+5Ck5XrWsKSNJN0kabqk\n3ZajntUlPSjp1DbLz5O0wTLk31jST/o41iNpfIv0YyTNlbSofi5JJ9b+LrdKekrS+v2c+8uS5tf2\nD5J0m6SpvfVK2lLSeYPtT0RErFj9BmjbD9keZ3sc8HXgzLK/o+2n+ijzDds/LLtHABvXDh8PjFqW\nBkpqbuNewBzb421f12YdAK8FpgMHLUt7ap621FJFy8z2vbbf3E9drZZu/oaqv39eKrN9Ru3v8jGg\nx/YjrSqW9DJg/ab6jwFeBnwDOKykfQY4uY/2RUTEM2xZp7hHSJoGIGn7MrLbpOzPlTRS0gRJJ0g6\niCoITCojveOogvUUSdeUMntLur6Mhs+XNLqkz5N0mqTpwMG9J5e0A/B5YP8yil9b0lslzSkjydNq\neR+VdIakWcAuLfpyKHA28EdJr6iVm1f6ML3U++KS/hxJV5aR5zmASvrmkn4n6fvArcCmkk4v7Zkj\n6S21fLeW7ZGSzpN0h6QLgZG99dXZnmX7z83pTQ4Dzm11QNII4AvA/zTVvwhYGxgNLJC0O3Cf7T8M\ncK6IiHiGLGuAXgSsJWldYHfgFmAPSS8EHrD9OGU0aPsCYBpwWBntfRm4F2jY3kvShlQjtr1sj6ca\nzX64nMfA38oo+fzek9ueBXwSOM/2jsAGwGnAnsAOwE6S9i/ZRwE32t7B9vX1TkhaG3g18EvgfOCt\ntcMGHixtOhs4saR/Cphq+6XAz4DNamW2Ar5aju0EbA9sB7wGOF3Sc5s+x/cDj9oeW+odT+sRdL8k\njQJeB1zQR5ZjgJ/bvr8p/VTgauC/gPOAU6hG0BER0SHauVHJDcBuVAH6VGAfqtHZ1D7y93V3i12A\nscD1ZVZ4TaAeSH/cT329de4ETLH9EICkScAewM+BhfQduN5INS28QNJFwARJx9eeBnFheZ8BvKls\n7w4cCGD7MkkP1+r7s+2by/ZuwORS1wOSfg3sTDW6plbXl0pdt0qa00c7B7Iv8JtW09uSNqaafWg0\nT7vbvppqdgNJhwOXAv8p6QTgYeD48mVrKRNqn2ZjDDTGttnqiIgu1dPTQ09Pz5DU1U6AnkoVBDej\nCoQfpRr9/aKP/P2NDK+yfVgfx/41iPrM0l8AVDv+RD+PX3orsJukP5X9Daiu9V5d9v9d3hey9GfU\n15eN5rY252vVjqG4Ldeh9DG9TTWjsBUwt+yPknSX7W0WN6AagR9BNQr/BdUXkDcDbwO+1VzhhHav\n1kdEDBONRoNGo7F4f+LEiW3X1c7PrK4F3g78vgTAvwNvoFrQ1Ks3+MwHnlVLr+/fRBUktwSQNFrS\n1oM4fz2w3QK8qlwfHkEVsH7db2HpWcArgU1tb2F7C6qp4Lf2V47qi8lhpY7XA8/uI9+1wCGSVpO0\nEdWXmZub8tTreinVdPhAlgroktZjyWzB09i+zPbza318rB6ci5OAL5UFf72r613bjoiIlWRZA7Rr\ni5Z6p7SvBR62/Y96vvL+PeDrvQu6gG8Cl0u6xvaDwJHAuZJmU01vv3gwbeit3/Z9VCP4KcAsYJrt\nS5ra0OwA4BrbT9bSLgbeKGnNvs4FTKS63n4b1Ujzz035KG36GTAHmA1cA5xk+4GmfGcD60i6o9Q7\nrVVDJR0n6W7gBcAcSd9s6scVzVPRki6V9LwW1bkp38bATrYvLklnUX3hOQqY3Ko9ERHxzFHfs8AR\nS0iyJ63sVkQMA3lYRleRhO22LmnmTmIREREdKAE6IiKiAyVAR0REdKAE6IiIiA6URWIxKJL6+Vl5\nRES0kkViERERXSYBOiIiogMlQEdERHSgBOiIiIgO1M7DMmK4mjwUz/eIiKfJ3cOihYygIyIiOlAC\ndERERAdKgI6IiOhAgwrQ5XnLM8vrPkn3lO0Zklpex5Z0tKR3lO0jJT2/duxDktp+5rCk9SX9rbb/\nCkmLyiMUkbSepIf6Kd+QdElfx2v51pB0mqS7JE2XdL2kfcqxeZI2KNvXLUdfjqh/Nk3H3izpdkkL\nJY2vpR9W+3vMLMef9kxpSadL+q2k2ZIuLM+QRtJuJe0WSVuVtPUlXdFuPyIiYmgNKkDbfsj2ONvj\ngK8DZ5b9HW0/1UeZb9j+Ydk9Ati4dvh4YNSyNFTS4rbafgS4T9KYkrQrMAPYrezvAty0LPX34TPA\nc4GX2B5P9QzmdXubUWvPbi3KDtaRLP3Z1N1K9ezpqU3nm1z7e7wD+KPtOS3KX1navj1wF/Cxkv5h\n4PXAh4D3lbRTgM8uRz8iImIItTvFPULSNABJ25fR6yZlf66kkZImSDpB0kHAy4BJZbR3HFVAmiLp\nmlJm7zI6nS7pfEmjS/q8MoKdDhzc1IbrqQIzwCuA/6vt7wpcJ2m1Moq8uYwYj6qVf5akX0i6U9LZ\nkpZaoixpFPAe4FjbTwLYfsD2T5o/DEmP1rZPqp1vQknbvIxkvynpNklXSFpb0sG1z2aGpLXr9dq+\n0/ZdA/wtDgPOa3XA9lW2F5Xdm4BNyvaTwOjyWiBpS2AT21MHOFdERDxD2g3Qi4C1JK0L7A7cAuwh\n6YXAA7Yfpxrx2fYFwDTgsDLq+zJwL9CwvZekDYGTgb3KKHU61QiPUsffbI+3fX5TG65jSUB+EfAT\nqmAHVcC+nirAPmJ7Z2Bn4L2SNi95dgaOAcYCWwJvaqp/K+D/2X6UgRmqLxrAVuV844Dxknav1fcV\n2y8FHgEOsv3T2mezo+0nBnGuZm8Bzh1EvncBl5XtU4EfAB8Bvgr8L9XfICIiOsTy/A76Bqop5d2p\n/oe/DyCq6dhW+voR7S5UQfL6Mohdkyq49vpxH+WuBz5WAu482/9WZTQwHrgZ+CCwbRmpAjyLKlA+\nBdxsex6ApHOBVwIX9N3dQdkb2FvSzLI/upzvbuBPtWno6cDmtXJt/cBY0suBx2zfMUC+k4EFticD\n2J5N9SUGSXtQfWFaTdKPgQXACbYfaKdNERExNJYnQE8F9gA2A34OfJRqJPmLPvL390v8q2wf1sex\nf7WszJ4raX1gX5YE9OlUI8U/2f5XCfjH2L6qXlZSo6k9atG+ucBmkta1Pb+ftjc71fY3m863OfDv\nWtJCoD6d3e5dCg4FJveXQdKRwBuAvVocE9XI+VDgLOBEYAvgOKpr0kuZUPv60hgDjbFttjoiokv1\n9PTQ09MzJHUtT4C+Fvgc0GPbkv5OFQg+WsvTOzKcTzV6pWn/71TXRr8qaUvbfygj4I1t/34QbbiR\nasHZEWX/BqqFTr1fEq4APiBpiu2nJG0D3FOO7VwC5/+jmib+Rr1i249J+jbwJUlH235S0kbAq8rU\ndCtXAJ+RNKl8QXgB1Yi0lb4+m740XyNfDXgz1ci/dYFqxflJpc2tps8PBy61/XC55u7yarmAb8JB\ng2hlRMQw1mg0aDQai/cnTpzYdl3tXoO27T+X7d4p7WuBh23/o56vvH8P+HptIdQ3gcslXWP7QaqV\nzOdKmk01Gn7xINtxHdXCp2ll/0aqEWDviPpbwB3ADEm3AmdTfSkx1XXzr5TjfwQualH/KcCDwB2l\n/CXAP1rkM1SLsqhGtDdImgOcD6zT9FnQtP89lv5sFpN0oKS7qS4DXCrpl7XDe1BdI5/XVOYcSTuW\n3bPK+a8qC/S+Vss3iuqLzVdL0plU16jPpPqcIiJiJZKde8DGwCTZk1Z2KyK6VO7F3bUkYbutdUa5\nk1hEREQHSoCOiIjoQAnQERERHSgBOiIiogMlQEdERHSgrOKOQZHk/FuJiFg2WcUdERHRZRKgIyIi\nOlACdERERAdanntxx3Azua3LKLGy5S5VEaukjKAjIiI6UAJ0REREB0qAjoiI6EAJ0BERER0oAXoI\nSJog6YSy/T1JB7VZz76SPlK2D5A0po98W0m6tjzjebak1/eR75By/DZJp9XSj5V0q6RLJa1R0l4p\n6cx22h0REUMvAXpouLyatwckafHfwPYltj9fdg8AxvZR7BTgR7bHAYcCX2tR73OALwCvtv1S4HmS\nXl0OH2Z7W+B64HWSVOr89GDbHRERK1YCdJsknSzpd5KuBV789MN6naTzawkNSZeU7UclnSFpFrBL\nLc+Rks6S9ApgX+D0Mkp+UVP99wHrle31gb+0aOKLgN/bfqjsXwP0juwlaS1gFPAk8HbgMtuPLNun\nEBERK0p+B90GSeOBQ4DtgTWAGcC0WhYDVwPflDTS9uMl/7nl+CjgRtsnNlVtANs3SLoYuMT2hS2a\ncCpwg6RjgdHAXi3yzAVeLOmFVAH8AJb8vb8C3ADcBlwH/BzYe5Ddj4iIZ0ACdHt2By60/QTwRAmm\nS7G9UNLlwH6SLgDeAPQG5IXABYM4T193BjkT+JbtL0raBfgR8JKm8z8s6f3Aj4FFVNPZW5ZjPypl\nkPRJ4EvAf0l6B3A3cEKrJ2NMqLW4MQYafU3AR0QMUz09PfT09AxJXQnQ7TFLB8++Aul5wDHA34Fp\ntv9V0p8Y5KOh+sqzK/ApANs3Slpb0oa2/7ZUYfsXwC8AJB0FPFU/LmljYCfbn5bUA+wJfIJqRH51\n80kntLX0LSJi+Gg0GjQajcX7EydObLuuXINuz1TggBIY1wXe2HRctXw7Au9lyfR2f+qBfj7wrD7y\n3Qm8BqCs9F67OTiXY/9R3p8NvB/4VlOWz1AFZICR5fwu2xERsRIlQLfB9kyqqePZwGXAzc1ZSr6F\nVCPYfcr7UscBJB0t6ehaeu+x84CTJE1vsUjsJOCdZZHZZOCIWn0za/n+T9LtwG+AU23PreXbAVhk\ne1ZJmgzMAV4BXD7ghxARESuUBjfTGsOdJHvSym5FtCUPy4hYaSRhu60nDWUEHRER0YESoCMiIjpQ\nAnREREQHSoCOiIjoQFkkFoMiaZA/3Y6IiF5ZJBYREdFlEqAjIiI6UAJ0REREB8q9uGPwJrd1GWX4\nyY1BImIIZAQdERHRgRKgIyIiOlACdERERAdKgI6IiOhACdBDQNKj5X1zSY9Lmll7fbK2vbC2vai8\n3y7psRbpv5f0SC19lxbnPVbSbyXdJunzfbRtH0l3lvo+Ukv/vKTZkr5fS3u7pONXxGcUERHLJqu4\nh0Z92e5c2+Oajn8aQNL85mOSXgj8okX6q4ATbe/b6oSS9gT2A7az/aSkjVrkGQF8BXgN8BfgFkkX\nA/cC42xvL+kcSS8F/gAcCbxusJ2OiIgVJyPola+v3y4N9Jum9wOn2n4SwPaDLfLsTPWFYV7Jdx6w\nP7AQWEOSgFHAk8CJwJdtL2yjDxERMcQSoIfelrVp6bNW4Hm2BvaQdKOkHkkva5HnBcDdtf17gBfY\nfhS4DJhBNZr+J7Cz7YtXYHsjImIZZIp76P2hxRT3irA68Gzbu0jaCTgfeFFTnj7vmGH7dOB0AEnn\nAJ+Q9B7gtcAc259tLjPhgiXbjTHQGLu8XYiI6C49PT309PQMSV0J0Kuue4ALAWzfUhaXPcf2Q7U8\nfwE2re1vWsotJqn3y8RdwGm295H0HUlb2Z5bzzvhoCHvQ0REV2k0GjQajcX7EydObLuuTHGvui4C\nXg0gaRtgzabgDDAN2LqsLl8TOARonsb+NPAJYE1gRElbBIxcUQ2PiIiBJUAPDfex3V++gdI9QF3f\nAV4k6VbgXOBwAEkbS7oUwPZTwDHAFcAdwI9t/7a3Akn7A7fYvt/2I8AsSXOAtWzf2s+5IyJiBZOd\nG/vHwCTZk1Z2K1YReVhGRBSSsN3Wk4Yygo6IiOhACdAREREdKAE6IiKiAyVAR0REdKAsEotBkeT8\nW4mIWDZZJBYREdFlEqAjIiI6UAJ0REREB0qAjoiI6EB5WEYM3uS21jmsPLmjV0SswjKCjoiI6EAJ\n0BERER0oAToiIqIDdVWAlrSepPcvY5nryvsLJb21lr69pNe32Y7TJd0m6fPtlG+qq0fS+Bbp35Y0\nS9IcST+TtF4f5TeTdKWkOyTdLmmzkj5J0mxJn63lPaU8gjIiIlayrgrQwLOBDyxLAdu7lc0tgMNq\nh8YBb1iWuiSNKJvvBba1/ZFlKd+Hvp4L/SHbO9jeDvgjcGwf5X8AfN72WGAn4EFJ2wGP2d4e2EnS\nupKeD+xs++dD0OaIiFhO3baK+zRgS0kzgauAUcAVti+R9DPg77bfLeldwItsnyLpUdvrlLL/Wcqe\nC3wQGCnplcDngMuAs4CXAGsAE2xfLOlI4E3AaGCEpH8A6wAzJJ0KTAHOBjYrbfyQ7eslje6jvpHA\nd4HtgDuBkcDTlk/bng8gSSXP75vzSBoLjLB9TSnzWElfUPq2Wjn3IuDTwCeX9QOPiIgVo9sC9EeA\nl9geByDpEGB34BLgBcBzS77dgcll27WyJ9ret5T9KzDe9nFl/3PANbbfJWl94CZJV5ey46hGzI+U\nvPNrbZgMfNH2dWV6+XJgLHByH/W9D3jU9lhJ2wIzaD2CRtJ3gdcDc4HjWmTZBnhE0gVUMwRXAx+1\nfaekB4HpVCPsranuyz5rwE84IiKeEd0WoJtHmtcCH5I0BrgdWF/S84BdgGMGKKumtL2BfSWdWPbX\nohoVG7iqNzi38BpgTDXQBWDdMnruq77dgS8B2L5V0py+Omv7nWUU/BWqgD+xKcvqpb4dgLuBHwNH\nAt+x/d+LOypdDBwl6WSqkftVtr/V13kjImLF67YAvRTb95bR6T7AVGAD4BCqEeq/BireIu1Ntpea\nSpb0cqC/ugS83PaCpnJ91ddbZlBsL5J0HvA/LQ7fDcyyPa/UfRHVl5Pv1M63PzANWJdq2v8QSZdL\nmmT78XplEy5Yst0YA42xg21lRMTw0NPTQ09Pz5DU1W0Bej5VoKm7EfgQsCewIXABcP4gyjbvX0E1\njXwsgKRxtmcycDC9spQ7o5Tb3vbsfuqbSrVYbYqkl1KNaJ9G0la255Zr0PsBM1tkm0Y1a7Ch7b8B\newE31+pYAzieajHcNiz5UjKC6tr00gH6oAF6GhExzDUaDRqNxuL9iRObJzYHr6tWcdt+CLhO0q21\nnzhdS7VQ6o9UQezZJW1xsfI+G1hYfrp0PNXirrGSZkp6M/AZYI3ys6bbWDKd3GqVdX3/OOBl5SdN\ntwNHl/S+6jsbWEfSHSVtWnM/S1D+Xpn+nk01M/C5cmy8pHPK57EQOBG4puQ1cE6tqg8A37P9hO05\nwKiSb5rtfz7tA46IiGeM7NyvOAYmyZ60sluxjHIv7ohYySRhu60HGXTVCDoiIqJbJEBHRER0oATo\niIiIDpQAHRER0YESoCMiIjpQVnHHoEhy/q1ERCybrOKOiIjoMgnQERERHSgBOiIiogN12724Y0Wa\n3NZllMHLnb8iIhbLCDoiIqIDJUBHRER0oAToiIiIDpQAHRER0YGGVYCW9GjT/pGSzmqzrv0ljVmO\ntlxX3l8o6a3LWHZTSVMk3S7pNknH9ZP3y5J+X55HPa6kbSTpN+W52fvX8l4k6Xnt9ikiIobOsArQ\nQPMy4eVZNnwgMLbthti7lc0tgMOWsfiTwH/bfgmwC/DBVl8WJL0B2Mr21sBRwNnl0FuBrwE7Ax8q\nefcFZti+f1n7EhERQ2+4Behmi383JGlzSb8qI82rJW3aV7qkXYF9gdMlzZD0IknHlRHtbEmTS9kJ\nkk6oneM2SZuV7d7R/GnA7pJmSjpe0mqSTpd0c6nrqOZG277f9qyy/SjwW2DjFv3bD/h+yXcTsH4Z\nIS8ARgNrAwsljQCOB76wHJ9lREQMoeH2O+iRkmbW9jcAfl62zwK+a/uHkt4JfJlqlPy0dNsHSroY\nuMT2hQCSPgJsbvtJSc8qdfY3Yu/d/ghwou19Sz1HAY/Y3lnSWsBvJF1pe16rDknaHBgH3NTi8AuA\nu2v791AF8snldRTwP8AHgR/YfqLVOSIi4pk33AL047bH9e5IOgJ4WdndBTigbP+IJaPJvtKhNgIH\n5gCTJV0EXLQMbWq++8fewLaSDi77zwK2AuY9raC0DvBT4Pgykh5M/dj+J/DGUsezgY8BB0o6B1gf\n+P9s39hcbsIFS7YbY6DR9gR/RER36unpoaenZ0jqGm4Bullz8OrrVll9pddHxP8F7EE19X2ypG2B\np1j6MsLag2zXMbav6i+DpDWAC4Af2e7rC8FfgE1r+5uUtLpPAP9LdR18aqnzQmCf5somHDSotkdE\nDFuNRoNGo7F4f+LEiW3XNdyvQdddDxxatt9GFaz6S59PNbpFkoDNbPcAHwXWo7rGOw/YseTZkWpB\nWLP5wLq1/SuAD0havZTbRtKoeoFyvm8Dd9j+v376dDFweCmzC9XU+V9r9WwNbGx7KjCSJV84RvZT\nZ0REPAOG2wi61TXh3rRjge9KOgl4AHjnAOnnAedIOpZqVfS3Ja1HNdr+ku1/SroAOFzSbVTXiH/X\noi2zqRZqzQK+S3Xte3NgRgnED1BdC6/bDXg7MKd2Tf1jti+XdDSA7W/YvkzSGyTNBf5Va3uv/wU+\nXrbPpZqa/yjVqDoiIlYi2XlAQQxMkj1pBZ8kD8uIiC4jCdttPWkoU9wREREdKAE6IiKiAyVAR0RE\ndKAE6IiIiA6URWIxKJKcfysREcsmi8QiIiK6TAJ0REREB0qAjoiI6EDD7U5isTwmt3UZZWC5QUlE\nxNNkBB0REdGBEqAjIiI6UAJ0REREB0qAjoiI6EBdEaAlLZQ0U9IcSRdKWmcFn+9ISWdJ+ng578xa\nG2ZKOmbOXEc5AAAPmklEQVQFn38DSVMkzZd0Vj/5dpZ0c2nTLZJ2Kum7SZpd0rYqaetLumJFtjsi\nIgavKwI08Jjtcba3A/4JHP1MnNT258p5x9XaMM72V1bwqZ8ATgFOHCDfF4BPlPZ9suwDfBh4PfAh\n4H0l7RTgs0Pf1IiIaEe3BOi6G4AtASTtIOnGMlq8UNL6Jb1H0viyvaGkP5XtI0u+X0q6S9LneyuV\n9E5Jv5N0E7BrXyeXNELS6WXkOlvSUSV9HUlXS5peRvr7lfTNJd0p6bul/kmS9pZ0XWnDTs3nsP2Y\n7euAfw/wWdwHrFe21wf+UrafBEaX1wJJWwKb2J46QH0REfEM6arfQUsaAewNXFOSfgB80Pa1kiYC\nnwL+G3B5tbI9sAOwAPidpC8Di4AJwI5UI/QpwIw+yr8beMT2zpLWAn4j6UrgbuBA2/MlbUj1ReLi\nUmZL4CDgDuAW4BDbu5Ug/nHgwD7ONdAPiD9azn8G1ZexV5T0U6k+m8eAw4EzgJMHqCsiIp5B3RKg\nR0qaCbwAmAd8XdJ6wHq2ry15vg/8ZBB1XWN7PoCkO4DNgY2AHtsPlfQfA9v0UX5vYFtJB5f9ZwFb\nAfcAp0ranSrgbyzpP0qeP9m+vdR9O3B1Sb+tnL9d3waOs/0zSW8GvgO81vZsSrCWtAdwL7Ba6dcC\n4ATbDzRXNuGCJduNMdAYuxwti4joQj09PfT09AxJXd0SoB+3PU7SSOAKYH+WjKJ71W+D9RRLpvfX\nbspXnzZeSPUZNY9UB7ql1jG2r1qqgHQksCGwo+2FZVq999z1cy6iCpK928vzN9rZ9mvK9k+BbzW1\nSVQj50OBs6iuaW8BHEd1TXopEw5ajpZERAwDjUaDRqOxeH/ixIlt19VV16BtP04VXD4LzAcelvTK\ncvgdQE/Znge8rGwfTP8M3AS8qqyeXgN4cz/5rwA+IGl1AEnbSBpFNZJ+oATnPYEXLkvf+jDQF4W5\nkl5Vtl8N3NV0/HDgUtsPA6NYMvU/agjaFhERy6FbRtCLR7i2Z0maC7wFOIJqunsU8AfgnSXbGcD5\nZQHXpbXyLa9N275f0gSq68aPADNb5Ovd/xbVtPSMMkJ9ADgAmARcImkOMA34bav2t9hveZ1Z0jxg\nXWBNSQdQTV3fKekc4Ou2pwNHAV8t18IfL/u95UeVz+e1JelM4DKq0fxhrc4ZERHPHNl5UEEMTJI9\naQVVnodlRESXkoTttp401FVT3BEREd0iAToiIqIDJUBHRER0oAToiIiIDpRFYjEokpx/KxERyyaL\nxCIiIrpMAnREREQHSoCOiIjoQAnQERERHahbbvUZz4TJba1zaC13D4uI6FdG0BERER0oAToiIqID\nJUBHRER0oK4L0JKeI2lmed0n6Z6yPaP3Gc1DcI7tJb2+tn+kpAfLOe6SdLmkV7RZ9+aSbh1Evssl\nPSzpkgHyvUXS7ZJukzSppL1Y0nRJsyXtUtJWl3SVpLXbaXdERAytrlskZvshYByApE8B822fOcSn\nGQeMB37Ze1rgXNvHlfM2gAsl7Wn7ziE+d68vAKOAo/vKIGlr4KPArrb/IWnDcugo4Fjgz8CXgIOB\n9wM/tP3ECmpvREQsg64bQbcwQtI0WDzyXSRpk7L/B0lrS9pI0k8l3Vxeu5bjoyV9R9JNZXS8n6Q1\ngE8Dh5SR+VvKeRYvcbbdA3yTKhAiaUtJv5Q0TdJUSS8u6c+V9DNJs8prl3rDJb2onHd8c6ds/wp4\ndIC+vxf4iu1/lDJ/K+lPAqPLa4Gk9YA32v7B4D7SiIhY0bpuBN3CImAtSesCuwO3AHtIug74q+0n\nJH0H+KLt6yRtBlwOjAVOBq6x/S5J6wM3AVcDnwDG10bMR7Q470xKgKYK1kfbnivp5cDXgL2ALwNT\nbB8oaTVgHWCDUueLgXOBI2wPOOXdh60BS/oNMAKYYPsK4KvAD4A1gfcBnwQ+2+Y5IiJiBRgOARrg\nBmA3qgB9KrAP1Yh3ajn+GmCMtHgQvK6k0cDewL6STizpawGblbID/ShYUI3CgV2Bn9TqX7O87wm8\nHcD2IuCfkjYA/gO4CDhwOafIVwe2Al4FbApMlbSt7bvLuZG0FfAC4E5JPwTWAD5h+/fLcd6IiFhO\nwyVATwX2oAquP6e6LmvgF+W4gJfbXlAvVALqm5qDVRkFD2QccAfVZYSHbY/rI1+rQP8I1fXh3YH+\nAvRAd/u4B7jJ9kJgnqS7qAL29Fqe/6WaKTieaqT/Z+BzlC8OdRMuWLLdGAONsQOcPSJimOnp6aGn\np2dI6houAfpaqqDTY9uS/g68gSpQA1wJHAecAdW1atuzgStK+rElfZztmcB8YN1a/UsFWUmvorr+\n27A9X9KfJB1s+6eqov62tucA11AtzvqSpBFU14QBFgBvAq6Q9Kjtc/vo10Cj+IuAtwLfKwvEtgH+\n2NTOv9j+g6SRVAHfVIvPnmbCQQOcLSJimGs0GjQajcX7EydObLuu4bBIzLb/XLZ7p7SvpRrV/qPs\nHwe8rPzs6HaWrIz+DLCGpDmSbgN6P+kpwNjaIjGzZNHY76gC/5ts/67kfxvwbkmzgNuA/Ur68cCe\nkuYA04AxtTY/BrwR+G9Jb2zulKRrgfOBvSTdLem1JX2ipH1LJVcAD5U+/Qo40fbDJZ+oRs6fKVV+\nk2pF9yXA6YP6ZCMiYoWRnXsix8Ak2ZOGsMLcizsihgFJ2G7rQQbDYQQdERGxykmAjoiI6EAJ0BER\nER0oAToiIqIDZZFYDIok599KRMSyySKxiIiILpMAHRER0YESoCMiIjpQAnREREQHGi734o6hMLmN\ndQ65Y1hERFsygo6IiOhACdAREREdKAE6IiKiAw27AC1pYXksZO9rsyGqd4KkE4agnoakSwbIs4Gk\nKZLmSzqrn3xvlnR76fP4Wvpu5dGat0jaqqStL+mK5W1/REQMjeG4SOwx2+NWQL3P5GqoJ4BTgJeW\nV19uBQ4EvsHS7fsw8HpgC+B9wImlvs+uiMZGRMSyG3Yj6GaSRku6WtJ0SXMk7Vc7dngZac6S9IOS\ntpGkn0q6ubx2rVW3vaTrJd0l6T0lvySdLunWUv9b+ktvattOkmZI2qKebvsx29cB/+6vb7bvtH1X\ni0NPAqPLa4GkLYFNbE8d1IcWEREr3HAcQY+UNLNs/xF4C3Cg7fmSNgRuAC6W9BLgZOAVtv8uaf1S\n5kvAF21fV6bHLwfGAgK2A14OrAPMlHQpsCuwfTm2EXCLpKnAbn2kA1AC/5eB/Wzf00df2h21nwr8\nAHgMOBw4o/Q1IiI6xHAM0I/Xp7glrQGcKml3YBGwsaTnAq8Gzrf9dwDbj5QirwHGSIt/E7yupNFU\nwfIi2/8G/i1pCrAzVSCeXJ408YCkXwM79ZP+T6qA/w3gtbbvH+oPwPZs4BWl/3sA9wKrSfoxsAA4\nwfYDQ33eiIgYvOEYoJu9DdgQ2NH2Qkl/AtamCrit7swh4OW2FyyVqJY38egd4fZ1h4/m9N789wJr\nATsClw3UgXapavTJwKHAWVTXorcAjqO6Jr2UCRcs2W6MgcbYFdWyiIhVU09PDz09PUNSVwI0PAt4\noATnPYEXUgXKXwE/k3RmmeJ+tu2HgSupAtgZAJJ2sD2LKtjuL+lUqinuBvARYARwtKTvA88B9qAK\nhKv3kT4WeAR4N3CVpH/Z/nUfbV+WW3u1yns4cKnthyWNKv02MKpVBRMOWoazRUQMQ41Gg0ajsXh/\n4sSJbdc1HAN083XbScAlkuYA04DfAti+Q9JngV9LWgjMAN5FFZy/Kmk21ef3a+ADpd45wBSqEfmn\ny/T0zyS9Aphd8pxUpo9bpksaU53eD0h6I/BLSe+0fUu90ZLmAesCa0o6gGo6/E5J5wBftz1d0oFU\n17E3BC6VNNP260v5UcARwGtLlWdSjdb/DRy2HJ9vREQMAVWXQCP6J8me1EbB3Is7IoYxSdhu40EG\n+ZlVRERER0qAjoAhW9TRibq5b5D+req6vX/LIwE6gu7+n0Q39w3Sv1Vdt/dveSRAR0REdKAE6IiI\niA6UVdwxKJLyDyUiog3truJOgI6IiOhAmeKOiIjoQAnQERERHSgBOvolaR9Jd0r6vaSPrOz2tEPS\ndyT9VdKttbQNJF1Vnt19Ze1xokj6WOnvnZL2XjmtHjxJm0qaIul2SbdJOq6kr/J9lLS2pJvKM9nv\nKPe674q+1UkaIWmmpEvKftf0T9K88sz7mZJuLmnd1L/1Jf1U0m/Lv9GXD1n/bOeVV8sX1YM+5gKb\nA2sAs4AxK7tdbfRjd2AccGst7QvA/5TtjwCnle2xpZ9rlH7PBVZb2X0YoH/PA3Yo2+sAvwPGdEsf\ngVHlfXXgRuCV3dK3Wh8/TPVcgIu78N/nn4ANmtK6qX/fB95VtlcH1huq/mUEHf3ZGZhre57tJ4Hz\ngP1XcpuWme1rgYebkvej+g+L8n5A2d4fONf2k7bnUf0HtPMz0c522b7f1RPVsP0o1QNfXkCX9NH2\nY2VzTaovjQ/TJX0DkLQJ8AbgWyx56lzX9K9oXsXcFf2TtB6wu+3vANh+yvY/GKL+JUBHf14A3F3b\nv6ekdYPn2v5r2f4r8NyyvTFVP3utUn2WtDnVbMFNdEkfJa0maRZVH6bYvp0u6VvxReAkYFEtrZv6\nZ+BqSdMkvbekdUv/tgAelPRdSTMknSNpNEPUvwTo6M+w+A2eq7mn/vq6SnwOktYBLgCOtz2/fmxV\n7qPtRbZ3ADYB9ijPba8fX2X7Vh4p+4DtmfTxfPdVuX/FbrbHAa8HPihp9/rBVbx/qwM7Al+zvSPw\nL+Cj9QzL078E6OjPX4BNa/ubsvS3v1XZXyU9D0DS84EHSnpznzcpaR1N0hpUwfmHti8qyV3VxzJ1\neCkwnu7p267AfpL+BJwLvFrSD+me/mH7vvL+IPAzqindbunfPcA9tm8p+z+lCtj3D0X/EqCjP9OA\nrSVtLmlN4BDg4pXcpqFyMXBE2T4CuKiWfqikNSVtAWwN3LwS2jdokgR8G7jD9v/VDq3yfZS0Ye8K\nWEkjgdcCM+mCvgHY/rjtTW1vARwK/Mr2O+iS/kkaJWndsj0a2Bu4lS7pn+37gbslbVOSXgPcDlzC\nUPRvZa+Ay6uzX1TTUr+jWszwsZXdnjb7cC5wL7CA6pr6O4ENgKuBu4ArgfVr+T9e+nsn8LqV3f5B\n9O+VVNcvZ1EFr5nAPt3QR2BbYEbp2xzgpJK+yvetRV9fxZJV3F3RP6prtLPK67be/4d0S/9Ke7cH\nbgFmAxdSreIekv7lVp8REREdKFPcERERHSgBOiIiogMlQEdERHSgBOiIiIgOlAAdERHRgRKgIyIi\nOlACdERERAdKgI6IiOhA/z8aaHKp8XyaNwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x7fec53d81550>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "dataset.source_name = dataset.source.apply(get_source_name)\n",
    "\n",
    "source_counts = dataset.source_name.value_counts().sort_values()[-10:]\n",
    "\n",
    "bottom = [index for index, item in enumerate(source_counts.index)]\n",
    "plt.barh(bottom, width=source_counts, color=\"orange\", linewidth=0)\n",
    "\n",
    "y_labels = [\"%s %.1f%%\" % (item, 100.0*source_counts[item]/len(dataset)) for index,item in enumerate(source_counts.index)]\n",
    "plt.yticks(np.array(bottom)+0.4, y_labels)\n",
    "\n",
    "source_counts"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}