In [27]:
%matplotlib inline
import matplotlib.pyplot as plt

import seaborn as sns
from IPython import display

import pandas as pd
import twitter

A basic example: grab some tweets from the Twitter API and do something with them.

## make a twitter dev account and get api keys

First, we need access to the Twitter API, which you get over at [twitter's dev site](https://dev.twitter.com/). Sign up as a dev, then [go to the twitter apps site](https://apps.twitter.com/) and click create a new app. This gives you four, yes four, thingamajigs you need to access the API: a consumer key, a consumer secret, an access token and an access token secret. Why four? Why can't it be just one thing?

Now this notebook is in github, so step 1 is to put all four of the secret codes in a file which doesn't get uploaded to github. Python has a [built-in module called configparser](https://docs.python.org/3/library/configparser.html) which parses config files, so I have a `config.ini` text file which looks like:

```
[twitter]

c_key = this_is_a_fake_to_be_replaced_by_real_thingamajig
c_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig 

a_token = this_is_a_fake_to_be_replaced_by_real_thingamajig
a_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig
```

### Now to read the keys into our python script/notebook

In [17]:
# api keys are in config.ini to keep them outside of this public notebook
import configparser
config = configparser.ConfigParser()
config.read('config.ini')

print(f'The config file has the following sections: {config.sections()}')

if "twitter" in config:
    twit = config['twitter']

# check to see if we got all the keys needed to access the twitter api
[key for key in twit]

The config file has the following sections: ['twitter']


['c_key', 'c_secret', 'a_token', 'a_secret']
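
One gap in the cell above: if `config.ini` is missing or has no `[twitter]` section, `twit` is never defined and the problem only shows up later as a `NameError`. A minimal sketch of a stricter loader, assuming the four option names used in this notebook. `load_twitter_keys` and `REQUIRED_KEYS` are names made up for illustration, and the demo parses an in-memory string with placeholder values rather than the real file:

```python
import configparser

# The four option names used in this notebook's config.ini
REQUIRED_KEYS = ("c_key", "c_secret", "a_token", "a_secret")

def load_twitter_keys(config):
    """Return the [twitter] options as a dict, or raise if any are missing."""
    if "twitter" not in config:
        raise KeyError("config has no [twitter] section")
    section = config["twitter"]
    missing = [k for k in REQUIRED_KEYS if k not in section]
    if missing:
        raise KeyError(f"missing keys in [twitter]: {missing}")
    return {k: section[k] for k in REQUIRED_KEYS}

# Demo with placeholder values instead of real secrets
sample = """
[twitter]
c_key = fake
c_secret = fake
a_token = fake
a_secret = fake
"""
config = configparser.ConfigParser()
config.read_string(sample)
keys = load_twitter_keys(config)
print(sorted(keys))
```

Failing at load time keeps the error next to its cause instead of surfacing it in a later cell.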

## using python to access the twitter api

Now, there are many [twitter api libraries](https://dev.twitter.com/resources/twitter-libraries), but I'm using the [python-twitter module](https://github.com/bear/python-twitter), just because it seems popular and is the first one listed under python libraries.

In [237]:
## define the necessary keys
cKey = twit["c_key"]
cSecret = twit["c_secret"]
aKey = twit["a_token"]
aSecret = twit["a_secret"]

## create the api object with the python-twitter library
api = twitter.Api(consumer_key=cKey,
                  consumer_secret=cSecret,
                  access_token_key=aKey,
                  access_token_secret=aSecret)
api.VerifyCredentials()

User(ID=7914, ScreenName=KO)

All right! We have a successful API connection to Twitter!

### get tweets from a user

This grabs the tweets along with a bunch of metadata for each tweet:

In [238]:
## get the user timeline with screen_name = 'KO'
statuses = api.GetUserTimeline(screen_name = 'KO')
print(f"so we got {len(statuses)} statuses, printing the first:")
status = statuses[0]
status

so we got 20 statuses, printing the first:


Status(ID=895177279470489601, ScreenName=KO, Created=Wed Aug 09 06:57:49 +0000 2017, Text='RT @Pinboard: This letter to Google from a potential recruit is a stand on principle, but I’m stuck on the first paragraph. Damn. https://t…')

So each status is an [object holding all the info about a tweet](http://python-twitter.readthedocs.io/en/latest/twitter.html#twitter.models.Status).

Now, the status object can be returned as a dictionary, which is handy since we can use that to build a pandas dataframe:

In [239]:
## create a data frame
## first get a list of dicts, one per tweet
tweets = [t.AsDict() for t in statuses]

## then create the data frame
data = pd.DataFrame(tweets)

data.head()

,created_at,favorite_count,favorited,hashtags,id,id_str,in_reply_to_screen_name,in_reply_to_user_id,lang,media,...,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,urls,user,user_mentions
0,Wed Aug 09 06:57:49 +0000 2017,,,[],895177279470489601,895177279470489601,,,en,,...,8.946695e+17,8.946694666756218e+17,15.0,True,{'created_at': 'Wed Aug 09 06:15:01 +0000 2017...,"<a href=""http://twitter.com/#!/download/ipad"" ...",RT @Pinboard: This letter to Google from a pot...,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 55525953, 'name': 'Pinboard', 'screen_..."
1,Wed Aug 09 06:57:20 +0000 2017,,,[],895177159039430656,895177159039430656,,,en,,...,,,4.0,True,{'created_at': 'Wed Aug 09 06:28:50 +0000 2017...,"<a href=""http://twitter.com/#!/download/ipad"" ...",RT @glcarlstrom: .@TheEconomist scenario of nu...,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 14346260, 'name': 'Gregg Carlstrom', '..."
2,Wed Aug 09 06:55:08 +0000 2017,,,[],895176604950855680,895176604950855680,,,en,,...,,,73.0,True,{'created_at': 'Tue Aug 08 22:22:25 +0000 2017...,"<a href=""http://twitter.com/#!/download/ipad"" ...","RT @jonathanshainin: I'm biased, but this is o...",[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 46073276, 'name': 'Jonathan Shainin', ..."
3,Wed Aug 09 06:53:36 +0000 2017,,,[],895176215631462400,895176215631462400,,,en,,...,,,50.0,True,{'created_at': 'Wed Aug 09 03:56:50 +0000 2017...,"<a href=""http://twitter.com/#!/download/ipad"" ...",RT @Pinboard: Unpopular but correct opinion: t...,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 55525953, 'name': 'Pinboard', 'screen_..."
4,Wed Aug 09 06:36:08 +0000 2017,,,[],895171819946356736,895171819946356736,WorkingCopyApp,7.993167e+17,en,,...,,,,,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",@WorkingCopyApp can the app display jupyter no...,[{'expanded_url': 'http://nbviewer.jupyter.org...,{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 799316732280274944, 'name': 'Working C..."


Now, there are a bunch of columns, most of which we probably won't need, so for analysis we can probably drop some of them:

In [240]:
data.columns

Index(['created_at', 'favorite_count', 'favorited', 'hashtags', 'id', 'id_str',
       'in_reply_to_screen_name', 'in_reply_to_user_id', 'lang', 'media',
       'quoted_status', 'quoted_status_id', 'quoted_status_id_str',
       'retweet_count', 'retweeted', 'retweeted_status', 'source', 'text',
       'urls', 'user', 'user_mentions'],
      dtype='object')
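
As a rough sketch of that trimming step, assuming only a handful of columns matter for a first look. The choice of columns is mine, not something the API dictates, and two made-up rows stand in for the real `data` frame:

```python
import pandas as pd

# Stand-in for the real frame; the values are invented for illustration
sample = pd.DataFrame([
    {"id": 2, "created_at": "Wed Aug 09 06:57:49 +0000 2017",
     "text": "second tweet", "retweet_count": 15.0, "lang": "en", "source": "web"},
    {"id": 1, "created_at": "Wed Aug 09 06:36:08 +0000 2017",
     "text": "first tweet", "retweet_count": None, "lang": "en", "source": "web"},
])

# Hypothetical subset of columns worth keeping
keep = ["id", "created_at", "text", "retweet_count"]

# Select only the columns that exist, so a missing one doesn't raise
slim = sample[[c for c in keep if c in sample.columns]].copy()

# Timestamps look like "Wed Aug 09 06:57:49 +0000 2017"
slim["created_at"] = pd.to_datetime(slim["created_at"], format="%a %b %d %H:%M:%S %z %Y")

# Assumption: a missing retweet count means zero retweets
slim["retweet_count"] = slim["retweet_count"].fillna(0).astype(int)

print(slim.dtypes)
```

Parsing `created_at` up front makes later sorting and grouping by date straightforward.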

## grabbing more tweets

See [twitter timeline doc](https://dev.twitter.com/rest/public/timelines) - this says you can grab at most 200 tweets in one request, for a max of 3,200 tweets altogether.

Now we only grabbed the first 20 tweets with the above, so we need a function which keeps making requests for older tweets until we run out or hit Twitter's 3,200 tweet limit:

In [241]:
def get_tweets(user="KO", limit=20):
    """Page backwards through a user's timeline and return the tweets as a data frame."""
    # initial batch of tweets
    statuses = api.GetUserTimeline(screen_name=user, count=limit)

    # each status becomes a plain dict; collect them all and build the frame once
    # (DataFrame.append was removed in pandas 2.0)
    rows = [t.AsDict() for t in statuses]

    # now to grab the older ones; a batch shorter than `limit` is taken to mean
    # we have reached the end of the timeline
    while len(statuses) >= limit:
        # get the last tweet id and subtract one to make sure we don't get a duplicate tweet
        last_tweet_id = rows[-1]["id"] - 1
        # use the `user` argument here rather than a hard-coded screen name
        statuses = api.GetUserTimeline(screen_name=user, max_id=last_tweet_id, count=limit)
        rows.extend(t.AsDict() for t in statuses)

    return pd.DataFrame(rows)

tweets = get_tweets()
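
The paging logic can be checked without touching the network. The sketch below is not the python-twitter API: `fake_timeline` is a stand-in I made up that mimics `max_id` paging over a list of integer ids, and `fetch_all` follows the same walk-backwards idea as `get_tweets`, with an explicit `cap` parameter for the 3,200 tweet ceiling mentioned above.

```python
def fake_timeline(all_ids, max_id=None, count=20):
    """Stand-in for a timeline call: newest-first ids no greater than max_id."""
    eligible = [i for i in all_ids if max_id is None or i <= max_id]
    return sorted(eligible, reverse=True)[:count]

def fetch_all(all_ids, count=20, cap=3200):
    """Walk backwards through the timeline until it runs dry or cap is hit."""
    collected = []
    max_id = None
    while len(collected) < cap:
        batch = fake_timeline(all_ids, max_id=max_id, count=count)
        if not batch:
            break  # nothing older left
        collected.extend(batch)
        # oldest id seen so far, minus one, so the next page has no overlap
        max_id = batch[-1] - 1
    return collected[:cap]

ids = list(range(1, 51))           # 50 pretend tweets with ids 1..50
result = fetch_all(ids, count=20)
print(len(result), result[0], result[-1])   # 50 50 1
```

This version stops on an empty batch rather than a short one, which costs one extra request but does not end early if a page happens to come back with fewer items than requested.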

In [242]:
print(tweets.shape)
tweets.head()

(499, 23)


,created_at,favorite_count,favorited,hashtags,id,id_str,in_reply_to_screen_name,in_reply_to_status_id,in_reply_to_user_id,lang,...,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,urls,user,user_mentions
0,Wed Aug 09 06:57:49 +0000 2017,,,[],895177279470489601,895177279470489601,,,,en,...,8.946694666756218e+17,15.0,True,{'created_at': 'Wed Aug 09 06:15:01 +0000 2017...,"<a href=""http://twitter.com/#!/download/ipad"" ...",RT @Pinboard: This letter to Google from a pot...,,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 55525953, 'name': 'Pinboard', 'screen_..."
1,Wed Aug 09 06:57:20 +0000 2017,,,[],895177159039430656,895177159039430656,,,,en,...,,4.0,True,{'created_at': 'Wed Aug 09 06:28:50 +0000 2017...,"<a href=""http://twitter.com/#!/download/ipad"" ...",RT @glcarlstrom: .@TheEconomist scenario of nu...,,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 14346260, 'name': 'Gregg Carlstrom', '..."
2,Wed Aug 09 06:55:08 +0000 2017,,,[],895176604950855680,895176604950855680,,,,en,...,,73.0,True,{'created_at': 'Tue Aug 08 22:22:25 +0000 2017...,"<a href=""http://twitter.com/#!/download/ipad"" ...","RT @jonathanshainin: I'm biased, but this is o...",,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 46073276, 'name': 'Jonathan Shainin', ..."
3,Wed Aug 09 06:53:36 +0000 2017,,,[],895176215631462400,895176215631462400,,,,en,...,,50.0,True,{'created_at': 'Wed Aug 09 03:56:50 +0000 2017...,"<a href=""http://twitter.com/#!/download/ipad"" ...",RT @Pinboard: Unpopular but correct opinion: t...,,[],{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 55525953, 'name': 'Pinboard', 'screen_..."
4,Wed Aug 09 06:36:08 +0000 2017,,,[],895171819946356736,895171819946356736,WorkingCopyApp,,7.993167e+17,en,...,,,,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",@WorkingCopyApp can the app display jupyter no...,,[{'expanded_url': 'http://nbviewer.jupyter.org...,{'created_at': 'Tue Oct 10 08:35:25 +0000 2006...,"[{'id': 799316732280274944, 'name': 'Working C..."


## we got tweets in a dataframe! 

Now we can do some analysis. Say we put all the tweets in a list so we can do something with them:

In [243]:
t = tweets['text'].tolist()
t[:3]

['RT @Pinboard: This letter to Google from a potential recruit is a stand on principle, but I’m stuck on the first paragraph. Damn. https://t…',
 'RT @glcarlstrom: .@TheEconomist scenario of nuclear war seems far more plausible now than when it was published (a whole week ago!). https:…',
 "RT @jonathanshainin: I'm biased, but this is one of the best things I've ever read about the psychology of American exceptionalism: https:/…"]

499
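
As one example of "doing something" with that list, here is a small sketch that counts which accounts get retweeted. It assumes retweets show up as text starting with `RT @name:`, as they do in the output above, and uses the three printed tweets in place of the full list:

```python
import re
from collections import Counter

# The three tweets printed above, standing in for the full list
texts = [
    "RT @Pinboard: This letter to Google from a potential recruit is a stand on principle, but I’m stuck on the first paragraph. Damn. https://t…",
    "RT @glcarlstrom: .@TheEconomist scenario of nuclear war seems far more plausible now than when it was published (a whole week ago!). https:…",
    "RT @jonathanshainin: I'm biased, but this is one of the best things I've ever read about the psychology of American exceptionalism: https:/…",
]

# A retweet's text starts with "RT @name:"; capture the name
rt_pattern = re.compile(r"^RT @(\w+):")

retweeted = Counter()
for text in texts:
    match = rt_pattern.match(text)
    if match:
        retweeted[match.group(1)] += 1

print(retweeted.most_common())
```

On the full list, `most_common(10)` would give the ten most retweeted accounts.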

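## Searching

The api object can also search tweets. `GetSearch` returns a list of statuses, which goes into a dataframe the same way as before: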


In [246]:
pk_search = api.GetSearch("pakistan")

In [250]:
pk = pd.DataFrame([s.AsDict() for s in pk_search])
print(pk.shape)
pk.head()

(15, 18)


,created_at,favorite_count,hashtags,id,id_str,lang,media,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted_status,source,text,truncated,urls,user,user_mentions
0,Tue Aug 08 06:04:24 +0000 2017,15925.0,[],894801449384910848,894801449384910848,en,"[{'display_url': 'pic.twitter.com/DOcW7STnt6',...",,,,5116.0,,"<a href=""http://twitter.com/download/android"" ...",It is so satisfying for me to see the reffores...,,[],{'created_at': 'Fri Mar 12 19:28:06 +0000 2010...,[]
1,Mon Aug 07 21:51:54 +0000 2017,1113.0,[],894677507370254336,894677507370254336,en,,,,,585.0,,"<a href=""http://twitter.com/download/iphone"" r...",The Guardian view on Pakistan and the Panama P...,,[{'expanded_url': 'https://www.theguardian.com...,{'created_at': 'Thu Nov 27 16:37:52 +0000 2008...,[]
2,Mon Aug 07 17:51:05 +0000 2017,897.0,[],894616901887840257,894616901887840257,en,,{'created_at': 'Mon Aug 07 13:14:19 +0000 2017...,8.945473e+17,8.945472508604826e+17,326.0,,"<a href=""http://twitter.com/download/iphone"" r...",Is that why Pakistan's per capita rape ratio i...,True,[{'expanded_url': 'https://twitter.com/i/web/s...,{'created_at': 'Mon Jul 25 11:10:59 +0000 2011...,[]
3,Wed Aug 09 07:26:33 +0000 2017,,"[{'text': 'Pakistan'}, {'text': 'CPEC'}, {'tex...",895184511482376192,895184511482376192,en,,,,,,,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",#Pakistan urges South Korea to invest in #CPEC...,,[{'expanded_url': 'http://www.cpecinfo.com/cpe...,{'created_at': 'Tue Jan 26 06:23:32 +0000 2016...,"[{'id': 4848532433, 'name': 'CPEC Official', '..."
4,Wed Aug 09 07:26:32 +0000 2017,,[],895184505815912450,895184505815912450,en,,,,,,,"<a href=""http://twitter.com/download/android"" ...",A Pakistan army major and three soldiers sacri...,,[{'expanded_url': 'https://paktimes.pk/pakista...,{'created_at': 'Mon Dec 05 06:12:04 +0000 2016...,[]


In [253]:
for t in pk['text'].values:
    if "CPEC" in t:
        print(t)

#Pakistan urges South Korea to invest in #CPEC #SEZs 
https://t.co/FLa5LjS1jg via @CPEC_Official @zlj517
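
The substring test above also matches mentions such as `@CPEC_Official`. If only the hashtag should count, the `hashtags` column can be used instead; in the output above it holds a list of `{'text': ...}` dicts per tweet. A small sketch over plain dicts shaped like the `AsDict()` rows, with sample rows written by hand and abbreviated, and `has_hashtag` a helper name of my own:

```python
def has_hashtag(row, tag):
    """True if the row's hashtags include tag, ignoring case."""
    wanted = tag.lower()
    # fall back to an empty list in case the key is absent
    return any(h.get("text", "").lower() == wanted for h in row.get("hashtags", []))

# Hand-written sample rows; the hashtag lists are abbreviated
rows = [
    {"text": "#Pakistan urges South Korea to invest in #CPEC #SEZs",
     "hashtags": [{"text": "Pakistan"}, {"text": "CPEC"}]},
    {"text": "A tweet that mentions @CPEC_Official but has no hashtag",
     "hashtags": []},
    {"text": "A tweet with no hashtags key at all"},
]

matches = [r["text"] for r in rows if has_hashtag(r, "cpec")]
print(matches)
```

Only the first row is selected; the mention-only tweet is skipped even though its text contains "CPEC".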
