{ "cells": [ { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "import seaborn as sns\n", "from IPython import display\n", "\n", "import pandas as pd\n", "import twitter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A basic example: grab some tweets from twitter and do something with them. \n", "\n", "## make a twitter dev account and get api keys\n", "\n", "First, we need access to the twitter api, which one gets over at [twitter's dev site](https://dev.twitter.com/). Sign up as a dev, then [go to the twitter apps site](https://apps.twitter.com/) and click create a new app. This gives you four, yes four, thingamajigs you need to access the API. Why four? Why can't it just be one thing? \n", "\n", "Now this notebook is on github, so step 1 is to put all four of the secret codes in a file which doesn't get uploaded to github. Python has a [built-in module called configparser](https://docs.python.org/3/library/configparser.html) which parses config files, so I have a config.ini text file which looks like:\n", "\n", "```\n", "[twitter]\n", "\n", "c_key = this_is_a_fake_to_be_replaced_by_real_thingamajig\n", "c_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig \n", "\n", "a_token = this_is_a_fake_to_be_replaced_by_real_thingamajig\n", "a_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig\n", "```\n", "\n", "### Now to read the keys into our python script/notebook" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The config file has the following sections: ['twitter']\n" ] }, { "data": { "text/plain": [ "['c_key', 'c_secret', 'a_token', 'a_secret']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# api keys are in config.ini to keep them outside of this public notebook\n", "import configparser\n", "config = 
configparser.ConfigParser()\n", "config.read('config.ini')\n", "\n", "print(f'The config file has the following sections: {config.sections()}')\n", "\n", "if \"twitter\" in config:\n", "    twit = config['twitter']\n", "\n", "# check to see if we got all the keys needed to access the twitter api\n", "[key for key in twit]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## using python to access the twitter api\n", "\n", "Now, there are many [twitter api libraries](https://dev.twitter.com/resources/twitter-libraries) but \n", "I'm using the [python-twitter module](https://github.com/bear/python-twitter), just because it seems popular and is the first one listed under python libraries." ] }, { "cell_type": "code", "execution_count": 237, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "User(ID=7914, ScreenName=KO)" ] }, "execution_count": 237, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## define the necessary keys\n", "cKey = twit[\"c_key\"]\n", "cSecret = twit[\"c_secret\"]\n", "aKey = twit[\"a_token\"]\n", "aSecret = twit[\"a_secret\"]\n", "\n", "## create the api object with the python-twitter library\n", "api = twitter.Api(consumer_key=cKey,\n", "                  consumer_secret=cSecret,\n", "                  access_token_key=aKey,\n", "                  access_token_secret=aSecret)\n", "api.VerifyCredentials()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All right! We have a successful api connection to twitter!\n", "\n", "### get tweets from a user\n", "\n", "This grabs the tweets along with a bunch of metadata for each tweet:" ] }, { "cell_type": "code", "execution_count": 238, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "so we got 20 statuses, printing the first:\n" ] }, { "data": { "text/plain": [ "Status(ID=895177279470489601, ScreenName=KO, Created=Wed Aug 09 06:57:49 +0000 2017, Text='RT @Pinboard: This letter to Google from a potential recruit is a stand on principle, but I’m stuck on the first paragraph. 
Damn. https://t…')" ] }, "execution_count": 238, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## get the user timeline with screen_name = 'KO'\n", "statuses = api.GetUserTimeline(screen_name = 'KO')\n", "print(f\"so we got {len(statuses)} statuses, printing the first:\")\n", "status = statuses[0]\n", "status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So each status is an [object holding all the info about a tweet](http://python-twitter.readthedocs.io/en/latest/twitter.html#twitter.models.Status).\n", "\n", "Now, the status object can be returned as a dictionary, which is handy since we can use that to build a pandas dataframe:" ] }, { "cell_type": "code", "execution_count": 239, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " created_at favorite_count favorited hashtags \\\n", "0 Wed Aug 09 06:57:49 +0000 2017 NaN NaN [] \n", "1 Wed Aug 09 06:57:20 +0000 2017 NaN NaN [] \n", "2 Wed Aug 09 06:55:08 +0000 2017 NaN NaN [] \n", "3 Wed Aug 09 06:53:36 +0000 2017 NaN NaN [] \n", "4 Wed Aug 09 06:36:08 +0000 2017 NaN NaN [] \n", "\n", " id id_str in_reply_to_screen_name \\\n", "0 895177279470489601 895177279470489601 NaN \n", "1 895177159039430656 895177159039430656 NaN \n", "2 895176604950855680 895176604950855680 NaN \n", "3 895176215631462400 895176215631462400 NaN \n", "4 895171819946356736 895171819946356736 WorkingCopyApp \n", "\n", " in_reply_to_user_id lang media \\\n", "0 NaN en NaN \n", "1 NaN en NaN \n", "2 NaN en NaN \n", "3 NaN en NaN \n", "4 7.993167e+17 en NaN \n", "\n", " ... quoted_status_id \\\n", "0 ... 8.946695e+17 \n", "1 ... NaN \n", "2 ... NaN \n", "3 ... NaN \n", "4 ... NaN \n", "\n", " quoted_status_id_str retweet_count retweeted \\\n", "0 894669466675621889 15.0 True \n", "1 NaN 4.0 True \n", "2 NaN 73.0 True \n", "3 NaN 50.0 True \n", "4 NaN NaN NaN \n", "\n", " retweeted_status \\\n", "0 {'created_at': 'Wed Aug 09 06:15:01 +0000 2017... \n", "1 {'created_at': 'Wed Aug 09 06:28:50 +0000 2017... \n", "2 {'created_at': 'Tue Aug 08 22:22:25 +0000 2017... \n", "3 {'created_at': 'Wed Aug 09 03:56:50 +0000 2017... \n", "4 NaN \n", "\n", " source \\\n", "0 Tw... \n", "\n", " text \\\n", "0 RT @Pinboard: This letter to Google from a pot... \n", "1 RT @glcarlstrom: .@TheEconomist scenario of nu... \n", "2 RT @jonathanshainin: I'm biased, but this is o... \n", "3 RT @Pinboard: Unpopular but correct opinion: t... \n", "4 @WorkingCopyApp can the app display jupyter no... \n", "\n", " urls \\\n", "0 [] \n", "1 [] \n", "2 [] \n", "3 [] \n", "4 [{'expanded_url': 'http://nbviewer.jupyter.org... \n", "\n", " user \\\n", "0 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... \n", "1 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... 
\n", "2 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... \n", "3 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... \n", "4 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... \n", "\n", " user_mentions \n", "0 [{'id': 55525953, 'name': 'Pinboard', 'screen_... \n", "1 [{'id': 14346260, 'name': 'Gregg Carlstrom', '... \n", "2 [{'id': 46073276, 'name': 'Jonathan Shainin', ... \n", "3 [{'id': 55525953, 'name': 'Pinboard', 'screen_... \n", "4 [{'id': 799316732280274944, 'name': 'Working C... \n", "\n", "[5 rows x 21 columns]" ] }, "execution_count": 239, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## create a data frame\n", "## first get a list of dicts, one per tweet\n", "tweets = [t.AsDict() for t in statuses]\n", "\n", "## then create the data frame\n", "data = pd.DataFrame(tweets)\n", "\n", "data.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, there are a bunch of columns, most of which we probably won't need, so for analysis we can probably drop some of them:" ] }, { "cell_type": "code", "execution_count": 240, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['created_at', 'favorite_count', 'favorited', 'hashtags', 'id', 'id_str',\n", " 'in_reply_to_screen_name', 'in_reply_to_user_id', 'lang', 'media',\n", " 'quoted_status', 'quoted_status_id', 'quoted_status_id_str',\n", " 'retweet_count', 'retweeted', 'retweeted_status', 'source', 'text',\n", " 'urls', 'user', 'user_mentions'],\n", " dtype='object')" ] }, "execution_count": 240, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## grabbing more tweets\n", "\n", "See the [twitter timeline doc](https://dev.twitter.com/rest/public/timelines) - this says you can grab at most 200 tweets in one request, for a max of 3,200 tweets altogether.\n", "\n", "Now we only grabbed the first 20 tweets with the above, so we need a function which keeps making requests for tweets until we hit 
twitter's 3,200 tweet limit:" ] }, { "cell_type": "code", "execution_count": 241, "metadata": {}, "outputs": [], "source": [ "def get_tweets(user=\"KO\", limit=20):\n", "    # initial batch of tweets\n", "    statuses = api.GetUserTimeline(screen_name=user, count=limit)\n", "    \n", "    ## create a data frame\n", "    ## first get a list of dicts, one per tweet\n", "    pdSeriesList = [t.AsDict() for t in statuses]\n", "\n", "    ## then create the data frame\n", "    tweets = pd.DataFrame(pdSeriesList)\n", "\n", "    # now to grab the older ones\n", "    \n", "    while len(statuses) >= 20:\n", "        # get the last tweet id and subtract one to make sure we don't get a duplicate tweet\n", "        last_tweet_id = tweets.tail(1)[\"id\"].values[0] - 1\n", "        # request the next (older) batch for the same user\n", "        statuses = api.GetUserTimeline(screen_name=user, max_id=last_tweet_id, count=limit)\n", "        \n", "        pdSeriesList = [t.AsDict() for t in statuses]\n", "        # pd.concat replaces DataFrame.append, which was removed in pandas 2.0\n", "        tweets = pd.concat([tweets, pd.DataFrame(pdSeriesList)], ignore_index=True)\n", "    \n", "    return tweets\n", "\n", "tweets = get_tweets()" ] }, { "cell_type": "code", "execution_count": 242, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(499, 23)\n" ] }, { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " created_at favorite_count favorited hashtags \\\n", "0 Wed Aug 09 06:57:49 +0000 2017 NaN NaN [] \n", "1 Wed Aug 09 06:57:20 +0000 2017 NaN NaN [] \n", "2 Wed Aug 09 06:55:08 +0000 2017 NaN NaN [] \n", "3 Wed Aug 09 06:53:36 +0000 2017 NaN NaN [] \n", "4 Wed Aug 09 06:36:08 +0000 2017 NaN NaN [] \n", "\n", " id id_str in_reply_to_screen_name \\\n", "0 895177279470489601 895177279470489601 NaN \n", "1 895177159039430656 895177159039430656 NaN \n", "2 895176604950855680 895176604950855680 NaN \n", "3 895176215631462400 895176215631462400 NaN \n", "4 895171819946356736 895171819946356736 WorkingCopyApp \n", "\n", " in_reply_to_status_id in_reply_to_user_id lang \\\n", "0 NaN NaN en \n", "1 NaN NaN en \n", "2 NaN NaN en \n", "3 NaN NaN en \n", "4 NaN 7.993167e+17 en \n", "\n", " ... quoted_status_id_str \\\n", "0 ... 894669466675621889 \n", "1 ... NaN \n", "2 ... NaN \n", "3 ... NaN \n", "4 ... NaN \n", "\n", " retweet_count retweeted retweeted_status \\\n", "0 15.0 True {'created_at': 'Wed Aug 09 06:15:01 +0000 2017... \n", "1 4.0 True {'created_at': 'Wed Aug 09 06:28:50 +0000 2017... \n", "2 73.0 True {'created_at': 'Tue Aug 08 22:22:25 +0000 2017... \n", "3 50.0 True {'created_at': 'Wed Aug 09 03:56:50 +0000 2017... \n", "4 NaN NaN NaN \n", "\n", " source \\\n", "0
Tw... \n", "\n", " text truncated \\\n", "0 RT @Pinboard: This letter to Google from a pot... NaN \n", "1 RT @glcarlstrom: .@TheEconomist scenario of nu... NaN \n", "2 RT @jonathanshainin: I'm biased, but this is o... NaN \n", "3 RT @Pinboard: Unpopular but correct opinion: t... NaN \n", "4 @WorkingCopyApp can the app display jupyter no... NaN \n", "\n", " urls \\\n", "0 [] \n", "1 [] \n", "2 [] \n", "3 [] \n", "4 [{'expanded_url': 'http://nbviewer.jupyter.org... \n", "\n", " user \\\n", "0 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... \n", "1 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... \n", "2 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... \n", "3 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... \n", "4 {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... \n", "\n", " user_mentions \n", "0 [{'id': 55525953, 'name': 'Pinboard', 'screen_... \n", "1 [{'id': 14346260, 'name': 'Gregg Carlstrom', '... \n", "2 [{'id': 46073276, 'name': 'Jonathan Shainin', ... \n", "3 [{'id': 55525953, 'name': 'Pinboard', 'screen_... \n", "4 [{'id': 799316732280274944, 'name': 'Working C... \n", "\n", "[5 rows x 23 columns]" ] }, "execution_count": 242, "metadata": {}, "output_type": "execute_result" } ], "source": [ "print(tweets.shape)\n", "tweets.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## we got tweets in a dataframe! \n", "\n", "Now we can do some analysis. Say we put all the tweets in a list so we can do something with them:" ] }, { "cell_type": "code", "execution_count": 243, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['RT @Pinboard: This letter to Google from a potential recruit is a stand on principle, but I’m stuck on the first paragraph. Damn. https://t…',\n", " 'RT @glcarlstrom: .@TheEconomist scenario of nuclear war seems far more plausible now than when it was published (a whole week ago!). 
https:…',\n", " \"RT @jonathanshainin: I'm biased, but this is one of the best things I've ever read about the psychology of American exceptionalism: https:/…\"]" ] }, "execution_count": 243, "metadata": {}, "output_type": "execute_result" } ], "source": [ "t = [u for u in tweets['text'].values]\n", "t[:3]" ] }, { "cell_type": "code", "execution_count": 245, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "499" ] }, "execution_count": 245, "metadata": {}, "output_type": "execute_result" } ], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Searching \n" ] }, { "cell_type": "code", "execution_count": 246, "metadata": { "collapsed": true }, "outputs": [], "source": [ "pk_search = api.GetSearch(\"pakistan\")" ] }, { "cell_type": "code", "execution_count": 250, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(15, 18)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
created_atfavorite_counthashtagsidid_strlangmediaquoted_statusquoted_status_idquoted_status_id_strretweet_countretweeted_statussourcetexttruncatedurlsuseruser_mentions
0Tue Aug 08 06:04:24 +0000 201715925.0[]894801449384910848894801449384910848en[{'display_url': 'pic.twitter.com/DOcW7STnt6',...NaNNaNNaN5116.0NaN<a href=\"http://twitter.com/download/android\" ...It is so satisfying for me to see the reffores...NaN[]{'created_at': 'Fri Mar 12 19:28:06 +0000 2010...[]
1Mon Aug 07 21:51:54 +0000 20171113.0[]894677507370254336894677507370254336enNaNNaNNaNNaN585.0NaN<a href=\"http://twitter.com/download/iphone\" r...The Guardian view on Pakistan and the Panama P...NaN[{'expanded_url': 'https://www.theguardian.com...{'created_at': 'Thu Nov 27 16:37:52 +0000 2008...[]
2Mon Aug 07 17:51:05 +0000 2017897.0[]894616901887840257894616901887840257enNaN{'created_at': 'Mon Aug 07 13:14:19 +0000 2017...8.945473e+17894547250860482561326.0NaN<a href=\"http://twitter.com/download/iphone\" r...Is that why Pakistan's per capita rape ratio i...True[{'expanded_url': 'https://twitter.com/i/web/s...{'created_at': 'Mon Jul 25 11:10:59 +0000 2011...[]
3Wed Aug 09 07:26:33 +0000 2017NaN[{'text': 'Pakistan'}, {'text': 'CPEC'}, {'tex...895184511482376192895184511482376192enNaNNaNNaNNaNNaNNaN<a href=\"http://twitter.com\" rel=\"nofollow\">Tw...#Pakistan urges South Korea to invest in #CPEC...NaN[{'expanded_url': 'http://www.cpecinfo.com/cpe...{'created_at': 'Tue Jan 26 06:23:32 +0000 2016...[{'id': 4848532433, 'name': 'CPEC Official', '...
4Wed Aug 09 07:26:32 +0000 2017NaN[]895184505815912450895184505815912450enNaNNaNNaNNaNNaNNaN<a href=\"http://twitter.com/download/android\" ...A Pakistan army major and three soldiers sacri...NaN[{'expanded_url': 'https://paktimes.pk/pakista...{'created_at': 'Mon Dec 05 06:12:04 +0000 2016...[]
\n", "
" ], "text/plain": [ " created_at favorite_count \\\n", "0 Tue Aug 08 06:04:24 +0000 2017 15925.0 \n", "1 Mon Aug 07 21:51:54 +0000 2017 1113.0 \n", "2 Mon Aug 07 17:51:05 +0000 2017 897.0 \n", "3 Wed Aug 09 07:26:33 +0000 2017 NaN \n", "4 Wed Aug 09 07:26:32 +0000 2017 NaN \n", "\n", " hashtags id \\\n", "0 [] 894801449384910848 \n", "1 [] 894677507370254336 \n", "2 [] 894616901887840257 \n", "3 [{'text': 'Pakistan'}, {'text': 'CPEC'}, {'tex... 895184511482376192 \n", "4 [] 895184505815912450 \n", "\n", " id_str lang media \\\n", "0 894801449384910848 en [{'display_url': 'pic.twitter.com/DOcW7STnt6',... \n", "1 894677507370254336 en NaN \n", "2 894616901887840257 en NaN \n", "3 895184511482376192 en NaN \n", "4 895184505815912450 en NaN \n", "\n", " quoted_status quoted_status_id \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 {'created_at': 'Mon Aug 07 13:14:19 +0000 2017... 8.945473e+17 \n", "3 NaN NaN \n", "4 NaN NaN \n", "\n", " quoted_status_id_str retweet_count retweeted_status \\\n", "0 NaN 5116.0 NaN \n", "1 NaN 585.0 NaN \n", "2 894547250860482561 326.0 NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " source \\\n", "0
Tw... \n", "4