{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import seaborn as sns\n",
    "from IPython import display\n",
    "\n",
    "import pandas as pd\n",
    "import twitter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A basic twitter grab and do something. \n",
    "\n",
    "## make a twitter dev account and get api keys\n",
    "\n",
    "First, we need access to the twitter api, which one gets over at [twitter's dev site](https://dev.twitter.com/). Sign up as a dev, then [go to the twitter apps site](https://apps.twitter.com/) and click create a new app. This gives you four, yes four thingamjigs u need to access the API. Why four? why can't it just one thing? \n",
    "\n",
    "Now this notebook is in github, so step 1 is to put all four of the secret codes in a file which doesn't get uploaded to github. Twitter has a [built in module called configparser](https://docs.python.org/3/library/configparser.html) which parses config files, so I have a config.ini txt file which looks like:\n",
    "\n",
    "```\n",
    "[twitter]\n",
    "\n",
    "c_key = this_is_a_fake_to_be_replaced_by_real_thingamajig\n",
    "c_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig \n",
    "\n",
    "a_token = this_is_a_fake_to_be_replaced_by_real_thingamajig\n",
    "a_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig\n",
    "```\n",
    "\n",
    "### Now to read the keys into our python script/notebook"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The config file has the following sections: ['twitter']\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "['c_key', 'c_secret', 'a_token', 'a_secret']"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# api keys are in config.ini to keep them outside of this public notebook\n",
    "import configparser\n",
    "config = configparser.ConfigParser()\n",
    "config.read('config.ini')\n",
    "\n",
    "print(f'The config file has the following sections: {config.sections()}')\n",
    "\n",
    "if \"twitter\" in config:\n",
    "    twit = config['twitter']\n",
    "\n",
    "# check to see if we got all the keys needed to access the twitter api\n",
    "[key for key in twit]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## using python to access the twitter api\n",
    "\n",
    "Now, there are many [twitter api libraries](https://dev.twitter.com/resources/twitter-libraries) but \n",
    "I'm using the [python-twitter module](https://github.com/bear/python-twitter), just cause it seems popular and is the first one listed under python libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 237,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "User(ID=7914, ScreenName=KO)"
      ]
     },
     "execution_count": 237,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## define the necessary keys\n",
    "cKey = twit[\"c_key\"]\n",
    "cSecret = twit[\"c_secret\"]\n",
    "aKey = twit[\"a_token\"]\n",
    "aSecret = twit[\"a_secret\"]\n",
    "\n",
    "## create the api object with the twitter-python library\n",
    "api = twitter.Api(consumer_key=cKey,\n",
    "                  consumer_secret=cSecret,\n",
    "                  access_token_key=aKey,\n",
    "                  access_token_secret=aSecret)\n",
    "api.VerifyCredentials()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All right! we have a succesful api connection to twitter!\n",
    "\n",
    "### get tweets from a user\n",
    "\n",
    "this grabs the tweets alongs with a bunch of metadata for each tweet:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 238,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "so we got 20 statuses, printing the first:\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "Status(ID=895177279470489601, ScreenName=KO, Created=Wed Aug 09 06:57:49 +0000 2017, Text='RT @Pinboard: This letter to Google from a potential recruit is a stand on principle, but I’m stuck on the first paragraph. Damn. https://t…')"
      ]
     },
     "execution_count": 238,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## get the user timeline with screen_name = 'KO'\n",
    "statuses = api.GetUserTimeline(screen_name = 'KO')\n",
    "print(f\"so we got {len(statuses)} statuses, printing the first:\")\n",
    "status = [s for s in statuses][0]\n",
    "status"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So each status is an [object holding all the info about a tweet](http://python-twitter.readthedocs.io/en/latest/twitter.html#twitter.models.Status).\n",
    "\n",
    "Now, the status object can be resturned as a dictionary, which is handy since we can use that to build a pandas dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 239,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>created_at</th>\n",
       "      <th>favorite_count</th>\n",
       "      <th>favorited</th>\n",
       "      <th>hashtags</th>\n",
       "      <th>id</th>\n",
       "      <th>id_str</th>\n",
       "      <th>in_reply_to_screen_name</th>\n",
       "      <th>in_reply_to_user_id</th>\n",
       "      <th>lang</th>\n",
       "      <th>media</th>\n",
       "      <th>...</th>\n",
       "      <th>quoted_status_id</th>\n",
       "      <th>quoted_status_id_str</th>\n",
       "      <th>retweet_count</th>\n",
       "      <th>retweeted</th>\n",
       "      <th>retweeted_status</th>\n",
       "      <th>source</th>\n",
       "      <th>text</th>\n",
       "      <th>urls</th>\n",
       "      <th>user</th>\n",
       "      <th>user_mentions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Wed Aug 09 06:57:49 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895177279470489601</td>\n",
       "      <td>895177279470489601</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>8.946695e+17</td>\n",
       "      <td>894669466675621889</td>\n",
       "      <td>15.0</td>\n",
       "      <td>True</td>\n",
       "      <td>{'created_at': 'Wed Aug 09 06:15:01 +0000 2017...</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/#!/download/ipad\" ...</td>\n",
       "      <td>RT @Pinboard: This letter to Google from a pot...</td>\n",
       "      <td>[]</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 55525953, 'name': 'Pinboard', 'screen_...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Wed Aug 09 06:57:20 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895177159039430656</td>\n",
       "      <td>895177159039430656</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4.0</td>\n",
       "      <td>True</td>\n",
       "      <td>{'created_at': 'Wed Aug 09 06:28:50 +0000 2017...</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/#!/download/ipad\" ...</td>\n",
       "      <td>RT @glcarlstrom: .@TheEconomist scenario of nu...</td>\n",
       "      <td>[]</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 14346260, 'name': 'Gregg Carlstrom', '...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Wed Aug 09 06:55:08 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895176604950855680</td>\n",
       "      <td>895176604950855680</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>73.0</td>\n",
       "      <td>True</td>\n",
       "      <td>{'created_at': 'Tue Aug 08 22:22:25 +0000 2017...</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/#!/download/ipad\" ...</td>\n",
       "      <td>RT @jonathanshainin: I'm biased, but this is o...</td>\n",
       "      <td>[]</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 46073276, 'name': 'Jonathan Shainin', ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Wed Aug 09 06:53:36 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895176215631462400</td>\n",
       "      <td>895176215631462400</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>50.0</td>\n",
       "      <td>True</td>\n",
       "      <td>{'created_at': 'Wed Aug 09 03:56:50 +0000 2017...</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/#!/download/ipad\" ...</td>\n",
       "      <td>RT @Pinboard: Unpopular but correct opinion: t...</td>\n",
       "      <td>[]</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 55525953, 'name': 'Pinboard', 'screen_...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Wed Aug 09 06:36:08 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895171819946356736</td>\n",
       "      <td>895171819946356736</td>\n",
       "      <td>WorkingCopyApp</td>\n",
       "      <td>7.993167e+17</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>@WorkingCopyApp can the app display jupyter no...</td>\n",
       "      <td>[{'expanded_url': 'http://nbviewer.jupyter.org...</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 799316732280274944, 'name': 'Working C...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 21 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                       created_at  favorite_count favorited hashtags  \\\n",
       "0  Wed Aug 09 06:57:49 +0000 2017             NaN       NaN       []   \n",
       "1  Wed Aug 09 06:57:20 +0000 2017             NaN       NaN       []   \n",
       "2  Wed Aug 09 06:55:08 +0000 2017             NaN       NaN       []   \n",
       "3  Wed Aug 09 06:53:36 +0000 2017             NaN       NaN       []   \n",
       "4  Wed Aug 09 06:36:08 +0000 2017             NaN       NaN       []   \n",
       "\n",
       "                   id              id_str in_reply_to_screen_name  \\\n",
       "0  895177279470489601  895177279470489601                     NaN   \n",
       "1  895177159039430656  895177159039430656                     NaN   \n",
       "2  895176604950855680  895176604950855680                     NaN   \n",
       "3  895176215631462400  895176215631462400                     NaN   \n",
       "4  895171819946356736  895171819946356736          WorkingCopyApp   \n",
       "\n",
       "   in_reply_to_user_id lang media  \\\n",
       "0                  NaN   en   NaN   \n",
       "1                  NaN   en   NaN   \n",
       "2                  NaN   en   NaN   \n",
       "3                  NaN   en   NaN   \n",
       "4         7.993167e+17   en   NaN   \n",
       "\n",
       "                         ...                         quoted_status_id  \\\n",
       "0                        ...                             8.946695e+17   \n",
       "1                        ...                                      NaN   \n",
       "2                        ...                                      NaN   \n",
       "3                        ...                                      NaN   \n",
       "4                        ...                                      NaN   \n",
       "\n",
       "   quoted_status_id_str retweet_count  retweeted  \\\n",
       "0    894669466675621889          15.0       True   \n",
       "1                   NaN           4.0       True   \n",
       "2                   NaN          73.0       True   \n",
       "3                   NaN          50.0       True   \n",
       "4                   NaN           NaN        NaN   \n",
       "\n",
       "                                    retweeted_status  \\\n",
       "0  {'created_at': 'Wed Aug 09 06:15:01 +0000 2017...   \n",
       "1  {'created_at': 'Wed Aug 09 06:28:50 +0000 2017...   \n",
       "2  {'created_at': 'Tue Aug 08 22:22:25 +0000 2017...   \n",
       "3  {'created_at': 'Wed Aug 09 03:56:50 +0000 2017...   \n",
       "4                                                NaN   \n",
       "\n",
       "                                              source  \\\n",
       "0  <a href=\"http://twitter.com/#!/download/ipad\" ...   \n",
       "1  <a href=\"http://twitter.com/#!/download/ipad\" ...   \n",
       "2  <a href=\"http://twitter.com/#!/download/ipad\" ...   \n",
       "3  <a href=\"http://twitter.com/#!/download/ipad\" ...   \n",
       "4  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "\n",
       "                                                text  \\\n",
       "0  RT @Pinboard: This letter to Google from a pot...   \n",
       "1  RT @glcarlstrom: .@TheEconomist scenario of nu...   \n",
       "2  RT @jonathanshainin: I'm biased, but this is o...   \n",
       "3  RT @Pinboard: Unpopular but correct opinion: t...   \n",
       "4  @WorkingCopyApp can the app display jupyter no...   \n",
       "\n",
       "                                                urls  \\\n",
       "0                                                 []   \n",
       "1                                                 []   \n",
       "2                                                 []   \n",
       "3                                                 []   \n",
       "4  [{'expanded_url': 'http://nbviewer.jupyter.org...   \n",
       "\n",
       "                                                user  \\\n",
       "0  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "1  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "2  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "3  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "4  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "\n",
       "                                       user_mentions  \n",
       "0  [{'id': 55525953, 'name': 'Pinboard', 'screen_...  \n",
       "1  [{'id': 14346260, 'name': 'Gregg Carlstrom', '...  \n",
       "2  [{'id': 46073276, 'name': 'Jonathan Shainin', ...  \n",
       "3  [{'id': 55525953, 'name': 'Pinboard', 'screen_...  \n",
       "4  [{'id': 799316732280274944, 'name': 'Working C...  \n",
       "\n",
       "[5 rows x 21 columns]"
      ]
     },
     "execution_count": 239,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "## create a data frame\n",
    "## first get a list of panda Series\n",
    "tweets = [t.AsDict() for t in statuses]\n",
    "\n",
    "## then create the data frame\n",
    "data = pd.DataFrame(tweets)\n",
    "\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, there is a bunch of columns, most of which we probably won't need, so for analysis can probably drop some of them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 240,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['created_at', 'favorite_count', 'favorited', 'hashtags', 'id', 'id_str',\n",
       "       'in_reply_to_screen_name', 'in_reply_to_user_id', 'lang', 'media',\n",
       "       'quoted_status', 'quoted_status_id', 'quoted_status_id_str',\n",
       "       'retweet_count', 'retweeted', 'retweeted_status', 'source', 'text',\n",
       "       'urls', 'user', 'user_mentions'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 240,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.columns"
   ]
  },
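  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of trimming the frame down, the next cell keeps a handful of columns and parses the timestamps. Which columns are worth keeping is an assumption made for illustration, and `keep` and `slim` are names introduced here; it relies on `data` and `pd` from the cells above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# columns kept here are an illustrative choice, adjust to the analysis at hand\n",
    "keep = [\"created_at\", \"id\", \"text\", \"retweet_count\", \"favorite_count\"]\n",
    "\n",
    "# reindex returns a new frame and tolerates a column that happens to be absent\n",
    "slim = data.reindex(columns=keep)\n",
    "\n",
    "# created_at arrives as text like 'Wed Aug 09 06:57:49 +0000 2017'\n",
    "slim[\"created_at\"] = pd.to_datetime(slim[\"created_at\"], format=\"%a %b %d %H:%M:%S %z %Y\")\n",
    "slim.head()"
   ]
  },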
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## grabbing more tweets\n",
    "\n",
    "See [twitter timeline doc](https://dev.twitter.com/rest/public/timelines) - this says you can grab at most 200 tweets in one request, for a max of 3,200 tweets altogether.\n",
    "\n",
    "Now we only grabbed the first 20 tweets with the above, so we need a function which keeps making requests for tweets until we hit twitters 3,200 tweet limit:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 241,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_tweets(user=\"KO\", limit=20):\n",
    "    # initial batch of tweets\n",
    "    statuses = api.GetUserTimeline(screen_name = user, count=limit)\n",
    "    \n",
    "    ## create a data frame\n",
    "    ## first get a list of panda Series\n",
    "    pdSeriesList = [t.AsDict() for t in statuses]\n",
    "\n",
    "    ## then create the data frame\n",
    "    tweets = pd.DataFrame(pdSeriesList)\n",
    "\n",
    "    # now to grab the older ones\n",
    "    \n",
    "    while len(statuses) >= 20:\n",
    "        # get the last tweet id and subtract one to make sure we don't get a duplicate tweet\n",
    "        last_tweet_id = tweets.tail(1)[\"id\"].values[0] -1\n",
    "        statuses = api.GetUserTimeline(screen_name = 'KO', max_id=last_tweet_id, count=limit)\n",
    "        \n",
    "        pdSeriesList = [t.AsDict() for t in statuses]\n",
    "        tweets = tweets.append(pdSeriesList, ignore_index=True)\n",
    "        \n",
    "    return tweets\n",
    "\n",
    "tweets = get_tweets()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 242,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(499, 23)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>created_at</th>\n",
       "      <th>favorite_count</th>\n",
       "      <th>favorited</th>\n",
       "      <th>hashtags</th>\n",
       "      <th>id</th>\n",
       "      <th>id_str</th>\n",
       "      <th>in_reply_to_screen_name</th>\n",
       "      <th>in_reply_to_status_id</th>\n",
       "      <th>in_reply_to_user_id</th>\n",
       "      <th>lang</th>\n",
       "      <th>...</th>\n",
       "      <th>quoted_status_id_str</th>\n",
       "      <th>retweet_count</th>\n",
       "      <th>retweeted</th>\n",
       "      <th>retweeted_status</th>\n",
       "      <th>source</th>\n",
       "      <th>text</th>\n",
       "      <th>truncated</th>\n",
       "      <th>urls</th>\n",
       "      <th>user</th>\n",
       "      <th>user_mentions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Wed Aug 09 06:57:49 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895177279470489601</td>\n",
       "      <td>895177279470489601</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>en</td>\n",
       "      <td>...</td>\n",
       "      <td>894669466675621889</td>\n",
       "      <td>15.0</td>\n",
       "      <td>True</td>\n",
       "      <td>{'created_at': 'Wed Aug 09 06:15:01 +0000 2017...</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/#!/download/ipad\" ...</td>\n",
       "      <td>RT @Pinboard: This letter to Google from a pot...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 55525953, 'name': 'Pinboard', 'screen_...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Wed Aug 09 06:57:20 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895177159039430656</td>\n",
       "      <td>895177159039430656</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>en</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4.0</td>\n",
       "      <td>True</td>\n",
       "      <td>{'created_at': 'Wed Aug 09 06:28:50 +0000 2017...</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/#!/download/ipad\" ...</td>\n",
       "      <td>RT @glcarlstrom: .@TheEconomist scenario of nu...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 14346260, 'name': 'Gregg Carlstrom', '...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Wed Aug 09 06:55:08 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895176604950855680</td>\n",
       "      <td>895176604950855680</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>en</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>73.0</td>\n",
       "      <td>True</td>\n",
       "      <td>{'created_at': 'Tue Aug 08 22:22:25 +0000 2017...</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/#!/download/ipad\" ...</td>\n",
       "      <td>RT @jonathanshainin: I'm biased, but this is o...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 46073276, 'name': 'Jonathan Shainin', ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Wed Aug 09 06:53:36 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895176215631462400</td>\n",
       "      <td>895176215631462400</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>en</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>50.0</td>\n",
       "      <td>True</td>\n",
       "      <td>{'created_at': 'Wed Aug 09 03:56:50 +0000 2017...</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/#!/download/ipad\" ...</td>\n",
       "      <td>RT @Pinboard: Unpopular but correct opinion: t...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 55525953, 'name': 'Pinboard', 'screen_...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Wed Aug 09 06:36:08 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895171819946356736</td>\n",
       "      <td>895171819946356736</td>\n",
       "      <td>WorkingCopyApp</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7.993167e+17</td>\n",
       "      <td>en</td>\n",
       "      <td>...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>@WorkingCopyApp can the app display jupyter no...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'expanded_url': 'http://nbviewer.jupyter.org...</td>\n",
       "      <td>{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...</td>\n",
       "      <td>[{'id': 799316732280274944, 'name': 'Working C...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 23 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                       created_at  favorite_count favorited hashtags  \\\n",
       "0  Wed Aug 09 06:57:49 +0000 2017             NaN       NaN       []   \n",
       "1  Wed Aug 09 06:57:20 +0000 2017             NaN       NaN       []   \n",
       "2  Wed Aug 09 06:55:08 +0000 2017             NaN       NaN       []   \n",
       "3  Wed Aug 09 06:53:36 +0000 2017             NaN       NaN       []   \n",
       "4  Wed Aug 09 06:36:08 +0000 2017             NaN       NaN       []   \n",
       "\n",
       "                   id              id_str in_reply_to_screen_name  \\\n",
       "0  895177279470489601  895177279470489601                     NaN   \n",
       "1  895177159039430656  895177159039430656                     NaN   \n",
       "2  895176604950855680  895176604950855680                     NaN   \n",
       "3  895176215631462400  895176215631462400                     NaN   \n",
       "4  895171819946356736  895171819946356736          WorkingCopyApp   \n",
       "\n",
       "   in_reply_to_status_id  in_reply_to_user_id lang  \\\n",
       "0                    NaN                  NaN   en   \n",
       "1                    NaN                  NaN   en   \n",
       "2                    NaN                  NaN   en   \n",
       "3                    NaN                  NaN   en   \n",
       "4                    NaN         7.993167e+17   en   \n",
       "\n",
       "                         ...                         quoted_status_id_str  \\\n",
       "0                        ...                           894669466675621889   \n",
       "1                        ...                                          NaN   \n",
       "2                        ...                                          NaN   \n",
       "3                        ...                                          NaN   \n",
       "4                        ...                                          NaN   \n",
       "\n",
       "  retweet_count  retweeted                                   retweeted_status  \\\n",
       "0          15.0       True  {'created_at': 'Wed Aug 09 06:15:01 +0000 2017...   \n",
       "1           4.0       True  {'created_at': 'Wed Aug 09 06:28:50 +0000 2017...   \n",
       "2          73.0       True  {'created_at': 'Tue Aug 08 22:22:25 +0000 2017...   \n",
       "3          50.0       True  {'created_at': 'Wed Aug 09 03:56:50 +0000 2017...   \n",
       "4           NaN        NaN                                                NaN   \n",
       "\n",
       "                                              source  \\\n",
       "0  <a href=\"http://twitter.com/#!/download/ipad\" ...   \n",
       "1  <a href=\"http://twitter.com/#!/download/ipad\" ...   \n",
       "2  <a href=\"http://twitter.com/#!/download/ipad\" ...   \n",
       "3  <a href=\"http://twitter.com/#!/download/ipad\" ...   \n",
       "4  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "\n",
       "                                                text truncated  \\\n",
       "0  RT @Pinboard: This letter to Google from a pot...       NaN   \n",
       "1  RT @glcarlstrom: .@TheEconomist scenario of nu...       NaN   \n",
       "2  RT @jonathanshainin: I'm biased, but this is o...       NaN   \n",
       "3  RT @Pinboard: Unpopular but correct opinion: t...       NaN   \n",
       "4  @WorkingCopyApp can the app display jupyter no...       NaN   \n",
       "\n",
       "                                                urls  \\\n",
       "0                                                 []   \n",
       "1                                                 []   \n",
       "2                                                 []   \n",
       "3                                                 []   \n",
       "4  [{'expanded_url': 'http://nbviewer.jupyter.org...   \n",
       "\n",
       "                                                user  \\\n",
       "0  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "1  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "2  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "3  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "4  {'created_at': 'Tue Oct 10 08:35:25 +0000 2006...   \n",
       "\n",
       "                                       user_mentions  \n",
       "0  [{'id': 55525953, 'name': 'Pinboard', 'screen_...  \n",
       "1  [{'id': 14346260, 'name': 'Gregg Carlstrom', '...  \n",
       "2  [{'id': 46073276, 'name': 'Jonathan Shainin', ...  \n",
       "3  [{'id': 55525953, 'name': 'Pinboard', 'screen_...  \n",
       "4  [{'id': 799316732280274944, 'name': 'Working C...  \n",
       "\n",
       "[5 rows x 23 columns]"
      ]
     },
     "execution_count": 242,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(tweets.shape)\n",
    "tweets.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## we got tweets in a dataframe! \n",
    "\n",
    "Now we can do some analysis. Say we put all the tweets in a list so we can do something with them:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 243,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['RT @Pinboard: This letter to Google from a potential recruit is a stand on principle, but I’m stuck on the first paragraph. Damn. https://t…',\n",
       " 'RT @glcarlstrom: .@TheEconomist scenario of nuclear war seems far more plausible now than when it was published (a whole week ago!). https:…',\n",
       " \"RT @jonathanshainin: I'm biased, but this is one of the best things I've ever read about the psychology of American exceptionalism: https:/…\"]"
      ]
     },
     "execution_count": 243,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Collect the text of every tweet into a plain Python list\n",
    "t = tweets['text'].tolist()\n",
    "t[:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 245,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "499"
      ]
     },
     "execution_count": 245,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
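    "## Searching\n",
    "\n",
    "The api can also search for tweets matching a term. Here we search for \"pakistan\" and load the results into a dataframe, same as before."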
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 246,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Search recent tweets for the term \"pakistan\"\n",
    "pk_search = api.GetSearch(term=\"pakistan\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 250,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(15, 18)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>created_at</th>\n",
       "      <th>favorite_count</th>\n",
       "      <th>hashtags</th>\n",
       "      <th>id</th>\n",
       "      <th>id_str</th>\n",
       "      <th>lang</th>\n",
       "      <th>media</th>\n",
       "      <th>quoted_status</th>\n",
       "      <th>quoted_status_id</th>\n",
       "      <th>quoted_status_id_str</th>\n",
       "      <th>retweet_count</th>\n",
       "      <th>retweeted_status</th>\n",
       "      <th>source</th>\n",
       "      <th>text</th>\n",
       "      <th>truncated</th>\n",
       "      <th>urls</th>\n",
       "      <th>user</th>\n",
       "      <th>user_mentions</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tue Aug 08 06:04:24 +0000 2017</td>\n",
       "      <td>15925.0</td>\n",
       "      <td>[]</td>\n",
       "      <td>894801449384910848</td>\n",
       "      <td>894801449384910848</td>\n",
       "      <td>en</td>\n",
       "      <td>[{'display_url': 'pic.twitter.com/DOcW7STnt6',...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5116.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/android\" ...</td>\n",
       "      <td>It is so satisfying for me to see the reffores...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>{'created_at': 'Fri Mar 12 19:28:06 +0000 2010...</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mon Aug 07 21:51:54 +0000 2017</td>\n",
       "      <td>1113.0</td>\n",
       "      <td>[]</td>\n",
       "      <td>894677507370254336</td>\n",
       "      <td>894677507370254336</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>585.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td>\n",
       "      <td>The Guardian view on Pakistan and the Panama P...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'expanded_url': 'https://www.theguardian.com...</td>\n",
       "      <td>{'created_at': 'Thu Nov 27 16:37:52 +0000 2008...</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mon Aug 07 17:51:05 +0000 2017</td>\n",
       "      <td>897.0</td>\n",
       "      <td>[]</td>\n",
       "      <td>894616901887840257</td>\n",
       "      <td>894616901887840257</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'created_at': 'Mon Aug 07 13:14:19 +0000 2017...</td>\n",
       "      <td>8.945473e+17</td>\n",
       "      <td>894547250860482561</td>\n",
       "      <td>326.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/iphone\" r...</td>\n",
       "      <td>Is that why Pakistan's per capita rape ratio i...</td>\n",
       "      <td>True</td>\n",
       "      <td>[{'expanded_url': 'https://twitter.com/i/web/s...</td>\n",
       "      <td>{'created_at': 'Mon Jul 25 11:10:59 +0000 2011...</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Wed Aug 09 07:26:33 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'text': 'Pakistan'}, {'text': 'CPEC'}, {'tex...</td>\n",
       "      <td>895184511482376192</td>\n",
       "      <td>895184511482376192</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>&lt;a href=\"http://twitter.com\" rel=\"nofollow\"&gt;Tw...</td>\n",
       "      <td>#Pakistan urges South Korea to invest in #CPEC...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'expanded_url': 'http://www.cpecinfo.com/cpe...</td>\n",
       "      <td>{'created_at': 'Tue Jan 26 06:23:32 +0000 2016...</td>\n",
       "      <td>[{'id': 4848532433, 'name': 'CPEC Official', '...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Wed Aug 09 07:26:32 +0000 2017</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[]</td>\n",
       "      <td>895184505815912450</td>\n",
       "      <td>895184505815912450</td>\n",
       "      <td>en</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>&lt;a href=\"http://twitter.com/download/android\" ...</td>\n",
       "      <td>A Pakistan army major and three soldiers sacri...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>[{'expanded_url': 'https://paktimes.pk/pakista...</td>\n",
       "      <td>{'created_at': 'Mon Dec 05 06:12:04 +0000 2016...</td>\n",
       "      <td>[]</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       created_at  favorite_count  \\\n",
       "0  Tue Aug 08 06:04:24 +0000 2017         15925.0   \n",
       "1  Mon Aug 07 21:51:54 +0000 2017          1113.0   \n",
       "2  Mon Aug 07 17:51:05 +0000 2017           897.0   \n",
       "3  Wed Aug 09 07:26:33 +0000 2017             NaN   \n",
       "4  Wed Aug 09 07:26:32 +0000 2017             NaN   \n",
       "\n",
       "                                            hashtags                  id  \\\n",
       "0                                                 []  894801449384910848   \n",
       "1                                                 []  894677507370254336   \n",
       "2                                                 []  894616901887840257   \n",
       "3  [{'text': 'Pakistan'}, {'text': 'CPEC'}, {'tex...  895184511482376192   \n",
       "4                                                 []  895184505815912450   \n",
       "\n",
       "               id_str lang                                              media  \\\n",
       "0  894801449384910848   en  [{'display_url': 'pic.twitter.com/DOcW7STnt6',...   \n",
       "1  894677507370254336   en                                                NaN   \n",
       "2  894616901887840257   en                                                NaN   \n",
       "3  895184511482376192   en                                                NaN   \n",
       "4  895184505815912450   en                                                NaN   \n",
       "\n",
       "                                       quoted_status  quoted_status_id  \\\n",
       "0                                                NaN               NaN   \n",
       "1                                                NaN               NaN   \n",
       "2  {'created_at': 'Mon Aug 07 13:14:19 +0000 2017...      8.945473e+17   \n",
       "3                                                NaN               NaN   \n",
       "4                                                NaN               NaN   \n",
       "\n",
       "  quoted_status_id_str  retweet_count retweeted_status  \\\n",
       "0                  NaN         5116.0              NaN   \n",
       "1                  NaN          585.0              NaN   \n",
       "2   894547250860482561          326.0              NaN   \n",
       "3                  NaN            NaN              NaN   \n",
       "4                  NaN            NaN              NaN   \n",
       "\n",
       "                                              source  \\\n",
       "0  <a href=\"http://twitter.com/download/android\" ...   \n",
       "1  <a href=\"http://twitter.com/download/iphone\" r...   \n",
       "2  <a href=\"http://twitter.com/download/iphone\" r...   \n",
       "3  <a href=\"http://twitter.com\" rel=\"nofollow\">Tw...   \n",
       "4  <a href=\"http://twitter.com/download/android\" ...   \n",
       "\n",
       "                                                text truncated  \\\n",
       "0  It is so satisfying for me to see the reffores...       NaN   \n",
       "1  The Guardian view on Pakistan and the Panama P...       NaN   \n",
       "2  Is that why Pakistan's per capita rape ratio i...      True   \n",
       "3  #Pakistan urges South Korea to invest in #CPEC...       NaN   \n",
       "4  A Pakistan army major and three soldiers sacri...       NaN   \n",
       "\n",
       "                                                urls  \\\n",
       "0                                                 []   \n",
       "1  [{'expanded_url': 'https://www.theguardian.com...   \n",
       "2  [{'expanded_url': 'https://twitter.com/i/web/s...   \n",
       "3  [{'expanded_url': 'http://www.cpecinfo.com/cpe...   \n",
       "4  [{'expanded_url': 'https://paktimes.pk/pakista...   \n",
       "\n",
       "                                                user  \\\n",
       "0  {'created_at': 'Fri Mar 12 19:28:06 +0000 2010...   \n",
       "1  {'created_at': 'Thu Nov 27 16:37:52 +0000 2008...   \n",
       "2  {'created_at': 'Mon Jul 25 11:10:59 +0000 2011...   \n",
       "3  {'created_at': 'Tue Jan 26 06:23:32 +0000 2016...   \n",
       "4  {'created_at': 'Mon Dec 05 06:12:04 +0000 2016...   \n",
       "\n",
       "                                       user_mentions  \n",
       "0                                                 []  \n",
       "1                                                 []  \n",
       "2                                                 []  \n",
       "3  [{'id': 4848532433, 'name': 'CPEC Official', '...  \n",
       "4                                                 []  "
      ]
     },
     "execution_count": 250,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Convert each returned status to a dict and build a dataframe from them\n",
    "pk = pd.DataFrame([s.AsDict() for s in pk_search])\n",
    "print(pk.shape)\n",
    "pk.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 253,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "#Pakistan urges South Korea to invest in #CPEC #SEZs \n",
      "https://t.co/FLa5LjS1jg via @CPEC_Official @zlj517\n"
     ]
    }
   ],
   "source": [
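    "# Print only the search results that mention CPEC\n",
    "# (loop variable is named `text` so it does not overwrite the list `t` above)\n",
    "for text in pk['text'].values:\n",
    "    if \"CPEC\" in text:\n",
    "        print(text)"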
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}