{ "cells": [ { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "\n", "import seaborn as sns\n", "from IPython import display\n", "\n", "import pandas as pd\n", "import twitter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A basic notebook: grab some tweets from twitter and do something with them.\n", "\n", "## make a twitter dev account and get api keys\n", "\n", "First, we need access to the twitter api, which one gets over at [twitter's dev site](https://dev.twitter.com/). Sign up as a dev, then [go to the twitter apps site](https://apps.twitter.com/) and click create a new app. This gives you four, yes four, thingamajigs you need to access the API. Why four? Why can't it just be one thing?\n", "\n", "Now this notebook is on github, so step 1 is to put all four of the secret codes in a file which doesn't get uploaded to github. Python has a [built-in module called configparser](https://docs.python.org/3/library/configparser.html) which parses config files, so I have a config.ini text file which looks like:\n", "\n", "```\n", "[twitter]\n", "\n", "c_key = this_is_a_fake_to_be_replaced_by_real_thingamajig\n", "c_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig\n", "\n", "a_token = this_is_a_fake_to_be_replaced_by_real_thingamajig\n", "a_secret = this_is_a_fake_to_be_replaced_by_real_thingamajig\n", "```\n", "\n", "### Now to read the keys into our python script/notebook" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The config file has the following sections: ['twitter']\n" ] }, { "data": { "text/plain": [ "['c_key', 'c_secret', 'a_token', 'a_secret']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# api keys are in config.ini to keep them outside of this public notebook\n", "import configparser\n", "config = 
configparser.ConfigParser()\n", "config.read('config.ini')\n", "\n", "print(f'The config file has the following sections: {config.sections()}')\n", "\n", "if \"twitter\" in config:\n", "    twit = config['twitter']\n", "\n", "# check to see if we got all the keys needed to access the twitter api\n", "list(twit)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## using python to access the twitter api\n", "\n", "Now, there are many [twitter api libraries](https://dev.twitter.com/resources/twitter-libraries) but \n", "I'm using the [python-twitter module](https://github.com/bear/python-twitter), just because it seems popular and is the first one listed under python libraries." ] }, { "cell_type": "code", "execution_count": 237, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "User(ID=7914, ScreenName=KO)" ] }, "execution_count": 237, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## define the necessary keys\n", "cKey = twit[\"c_key\"]\n", "cSecret = twit[\"c_secret\"]\n", "aKey = twit[\"a_token\"]\n", "aSecret = twit[\"a_secret\"]\n", "\n", "## create the api object with the python-twitter library\n", "api = twitter.Api(consumer_key=cKey,\n", "                  consumer_secret=cSecret,\n", "                  access_token_key=aKey,\n", "                  access_token_secret=aSecret)\n", "api.VerifyCredentials()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All right! We have a successful api connection to twitter!\n", "\n", "### get tweets from a user\n", "\n", "This grabs the tweets along with a bunch of metadata for each tweet:" ] }, { "cell_type": "code", "execution_count": 238, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "so we got 20 statuses, printing the first:\n" ] }, { "data": { "text/plain": [ "Status(ID=895177279470489601, ScreenName=KO, Created=Wed Aug 09 06:57:49 +0000 2017, Text='RT @Pinboard: This letter to Google from a potential recruit is a stand on principle, but I’m stuck on the first paragraph. 
Damn. https://t…')" ] }, "execution_count": 238, "metadata": {}, "output_type": "execute_result" } ], "source": [ "## get the user timeline with screen_name = 'KO'\n", "statuses = api.GetUserTimeline(screen_name='KO')\n", "print(f\"so we got {len(statuses)} statuses, printing the first:\")\n", "status = statuses[0]\n", "status" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So each status is an [object holding all the info about a tweet](http://python-twitter.readthedocs.io/en/latest/twitter.html#twitter.models.Status).\n", "\n", "Now, the status object can be returned as a dictionary, which is handy since we can use that to build a pandas dataframe:" ] }, { "cell_type": "code", "execution_count": 239, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 | created_at | favorite_count | favorited | hashtags | id | id_str | in_reply_to_screen_name | in_reply_to_user_id | lang | media | ... | quoted_status_id | quoted_status_id_str | retweet_count | retweeted | retweeted_status | source | text | urls | user | user_mentions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Wed Aug 09 06:57:49 +0000 2017 | NaN | NaN | [] | 895177279470489601 | 895177279470489601 | NaN | NaN | en | NaN | ... | 8.946695e+17 | 894669466675621889 | 15.0 | True | {'created_at': 'Wed Aug 09 06:15:01 +0000 2017... | <a href=\"http://twitter.com/#!/download/ipad\" ... | RT @Pinboard: This letter to Google from a pot... | [] | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 55525953, 'name': 'Pinboard', 'screen_... |
1 | Wed Aug 09 06:57:20 +0000 2017 | NaN | NaN | [] | 895177159039430656 | 895177159039430656 | NaN | NaN | en | NaN | ... | NaN | NaN | 4.0 | True | {'created_at': 'Wed Aug 09 06:28:50 +0000 2017... | <a href=\"http://twitter.com/#!/download/ipad\" ... | RT @glcarlstrom: .@TheEconomist scenario of nu... | [] | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 14346260, 'name': 'Gregg Carlstrom', '... |
2 | Wed Aug 09 06:55:08 +0000 2017 | NaN | NaN | [] | 895176604950855680 | 895176604950855680 | NaN | NaN | en | NaN | ... | NaN | NaN | 73.0 | True | {'created_at': 'Tue Aug 08 22:22:25 +0000 2017... | <a href=\"http://twitter.com/#!/download/ipad\" ... | RT @jonathanshainin: I'm biased, but this is o... | [] | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 46073276, 'name': 'Jonathan Shainin', ... |
3 | Wed Aug 09 06:53:36 +0000 2017 | NaN | NaN | [] | 895176215631462400 | 895176215631462400 | NaN | NaN | en | NaN | ... | NaN | NaN | 50.0 | True | {'created_at': 'Wed Aug 09 03:56:50 +0000 2017... | <a href=\"http://twitter.com/#!/download/ipad\" ... | RT @Pinboard: Unpopular but correct opinion: t... | [] | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 55525953, 'name': 'Pinboard', 'screen_... |
4 | Wed Aug 09 06:36:08 +0000 2017 | NaN | NaN | [] | 895171819946356736 | 895171819946356736 | WorkingCopyApp | 7.993167e+17 | en | NaN | ... | NaN | NaN | NaN | NaN | NaN | <a href=\"http://twitter.com\" rel=\"nofollow\">Tw... | @WorkingCopyApp can the app display jupyter no... | [{'expanded_url': 'http://nbviewer.jupyter.org... | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 799316732280274944, 'name': 'Working C... |
5 rows × 21 columns
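
The `created_at` column above is plain text, so sorting or subtracting those values won't behave like dates until they are parsed. Here is a minimal sketch using only the standard library, assuming every value follows the layout seen in the table (`Wed Aug 09 06:57:49 +0000 2017`) and that English day and month names are in effect; `TWITTER_TIME_FORMAT` and `parse_created_at` are names made up for this example, not part of python-twitter.

```python
from datetime import datetime

# Layout of the created_at strings shown in the table above,
# e.g. 'Wed Aug 09 06:57:49 +0000 2017' (assumed to hold for every row).
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(raw):
    """Turn a created_at string into a timezone-aware datetime."""
    return datetime.strptime(raw, TWITTER_TIME_FORMAT)


# The first and last created_at values from the five rows above.
newest = parse_created_at("Wed Aug 09 06:57:49 +0000 2017")
oldest = parse_created_at("Wed Aug 09 06:36:08 +0000 2017")

print(newest.isoformat())
print("time spanned by these five tweets:", newest - oldest)
```

On a whole dataframe column, the same format string could probably be handed to `pd.to_datetime(..., format=...)` instead of looping row by row.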
 | created_at | favorite_count | favorited | hashtags | id | id_str | in_reply_to_screen_name | in_reply_to_status_id | in_reply_to_user_id | lang | ... | quoted_status_id_str | retweet_count | retweeted | retweeted_status | source | text | truncated | urls | user | user_mentions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Wed Aug 09 06:57:49 +0000 2017 | NaN | NaN | [] | 895177279470489601 | 895177279470489601 | NaN | NaN | NaN | en | ... | 894669466675621889 | 15.0 | True | {'created_at': 'Wed Aug 09 06:15:01 +0000 2017... | <a href=\"http://twitter.com/#!/download/ipad\" ... | RT @Pinboard: This letter to Google from a pot... | NaN | [] | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 55525953, 'name': 'Pinboard', 'screen_... |
1 | Wed Aug 09 06:57:20 +0000 2017 | NaN | NaN | [] | 895177159039430656 | 895177159039430656 | NaN | NaN | NaN | en | ... | NaN | 4.0 | True | {'created_at': 'Wed Aug 09 06:28:50 +0000 2017... | <a href=\"http://twitter.com/#!/download/ipad\" ... | RT @glcarlstrom: .@TheEconomist scenario of nu... | NaN | [] | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 14346260, 'name': 'Gregg Carlstrom', '... |
2 | Wed Aug 09 06:55:08 +0000 2017 | NaN | NaN | [] | 895176604950855680 | 895176604950855680 | NaN | NaN | NaN | en | ... | NaN | 73.0 | True | {'created_at': 'Tue Aug 08 22:22:25 +0000 2017... | <a href=\"http://twitter.com/#!/download/ipad\" ... | RT @jonathanshainin: I'm biased, but this is o... | NaN | [] | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 46073276, 'name': 'Jonathan Shainin', ... |
3 | Wed Aug 09 06:53:36 +0000 2017 | NaN | NaN | [] | 895176215631462400 | 895176215631462400 | NaN | NaN | NaN | en | ... | NaN | 50.0 | True | {'created_at': 'Wed Aug 09 03:56:50 +0000 2017... | <a href=\"http://twitter.com/#!/download/ipad\" ... | RT @Pinboard: Unpopular but correct opinion: t... | NaN | [] | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 55525953, 'name': 'Pinboard', 'screen_... |
4 | Wed Aug 09 06:36:08 +0000 2017 | NaN | NaN | [] | 895171819946356736 | 895171819946356736 | WorkingCopyApp | NaN | 7.993167e+17 | en | ... | NaN | NaN | NaN | NaN | <a href=\"http://twitter.com\" rel=\"nofollow\">Tw... | @WorkingCopyApp can the app display jupyter no... | NaN | [{'expanded_url': 'http://nbviewer.jupyter.org... | {'created_at': 'Tue Oct 10 08:35:25 +0000 2006... | [{'id': 799316732280274944, 'name': 'Working C... |
5 rows × 23 columns
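
The column count moves between 21 and 23 across these two tables, and the cells are full of NaN. A likely reason, as far as I can tell, is that each tweet's dictionary only carries the fields that were actually set for that tweet, so the table ends up with the union of all keys seen and gaps wherever a tweet lacked one. The sketch below imitates that with plain Python on made-up dictionaries. The `tweets` data is hypothetical and heavily trimmed, and it uses `None` where pandas would show NaN.

```python
# Hypothetical, trimmed-down stand-ins for per-tweet dictionaries.
# Real ones have many more fields; the point is that the keys differ per tweet.
tweets = [
    {"id": 1, "text": "a retweet", "retweet_count": 15, "retweeted": True},
    {"id": 2, "text": "a reply", "in_reply_to_screen_name": "WorkingCopyApp"},
]

# The column set is the union of every key seen, kept in first-seen order.
columns = []
for tweet in tweets:
    for key in tweet:
        if key not in columns:
            columns.append(key)

# Each row gets a value for every column; missing fields become None.
rows = [[tweet.get(col) for col in columns] for tweet in tweets]

print(columns)
for row in rows:
    print(row)
```

So grabbing a different batch of tweets can add or drop columns, simply because a field such as `in_reply_to_status_id` or `truncated` shows up in at least one tweet of that batch.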
 | created_at | favorite_count | hashtags | id | id_str | lang | media | quoted_status | quoted_status_id | quoted_status_id_str | retweet_count | retweeted_status | source | text | truncated | urls | user | user_mentions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Tue Aug 08 06:04:24 +0000 2017 | 15925.0 | [] | 894801449384910848 | 894801449384910848 | en | [{'display_url': 'pic.twitter.com/DOcW7STnt6',... | NaN | NaN | NaN | 5116.0 | NaN | <a href=\"http://twitter.com/download/android\" ... | It is so satisfying for me to see the reffores... | NaN | [] | {'created_at': 'Fri Mar 12 19:28:06 +0000 2010... | [] |
1 | Mon Aug 07 21:51:54 +0000 2017 | 1113.0 | [] | 894677507370254336 | 894677507370254336 | en | NaN | NaN | NaN | NaN | 585.0 | NaN | <a href=\"http://twitter.com/download/iphone\" r... | The Guardian view on Pakistan and the Panama P... | NaN | [{'expanded_url': 'https://www.theguardian.com... | {'created_at': 'Thu Nov 27 16:37:52 +0000 2008... | [] |
2 | Mon Aug 07 17:51:05 +0000 2017 | 897.0 | [] | 894616901887840257 | 894616901887840257 | en | NaN | {'created_at': 'Mon Aug 07 13:14:19 +0000 2017... | 8.945473e+17 | 894547250860482561 | 326.0 | NaN | <a href=\"http://twitter.com/download/iphone\" r... | Is that why Pakistan's per capita rape ratio i... | True | [{'expanded_url': 'https://twitter.com/i/web/s... | {'created_at': 'Mon Jul 25 11:10:59 +0000 2011... | [] |
3 | Wed Aug 09 07:26:33 +0000 2017 | NaN | [{'text': 'Pakistan'}, {'text': 'CPEC'}, {'tex... | 895184511482376192 | 895184511482376192 | en | NaN | NaN | NaN | NaN | NaN | NaN | <a href=\"http://twitter.com\" rel=\"nofollow\">Tw... | #Pakistan urges South Korea to invest in #CPEC... | NaN | [{'expanded_url': 'http://www.cpecinfo.com/cpe... | {'created_at': 'Tue Jan 26 06:23:32 +0000 2016... | [{'id': 4848532433, 'name': 'CPEC Official', '... |
4 | Wed Aug 09 07:26:32 +0000 2017 | NaN | [] | 895184505815912450 | 895184505815912450 | en | NaN | NaN | NaN | NaN | NaN | NaN | <a href=\"http://twitter.com/download/android\" ... | A Pakistan army major and three soldiers sacri... | NaN | [{'expanded_url': 'https://paktimes.pk/pakista... | {'created_at': 'Mon Dec 05 06:12:04 +0000 2016... | [] |