{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "This is this first in a series of notebooks designed to show you how to download social media data. I assume you have already acquired your API keys from the Twitter developer site, as shown in this tutorial:\n", "\n", "- Setting up Access to the Twitter API\n", "\n", "If you are new to Python, you may wish to go through a series of tutorials I have created in order. If you don't wish to do all the tutorials you should at least ensure you have your Twitter API key and you've set up Python on your computer as shown in the tutorial Setting up Your Computer to Use My Python Code.\n", "\n", "In this notebook I will show you how to use the API keys you've acquired. I'll also show you the difference between *OAuth1* and *OAuth2* authentication.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Import Twython\n", "I use the *twython* package as my Python interface with the Twitter API: https://twython.readthedocs.io/en/latest/usage/starting_out.html\n", "\n", "Let's import the twython package, which we installed earlier using *pip install twython* from the command line." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from twython import Twython" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# OAuth1 Authentication (*user* authentication)\n", "Twitter and other social media platforms use a form of authentication known as 'OAuth' authentication. There are two types. *OAuth1* authentication on Twitter is a user-level authentication. For this you'll use the four-part password you generated in the tutorial Setting up Access to the Twitter API" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Generated Automatically -- Copy from Your 'Keys and Access Tokens' Page\n", "Your *APP_KEY* and *APP_SECRET* will not change" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": true }, "outputs": [], "source": [ "APP_KEY = 'YOUR APP KEY'\n", "APP_SECRET = 'YOUR APP SECRET'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### You Generate these two variables from you 'Keys and Access Tokens' Page\n", "These two change whenever you 'regenerate' them. You can use the values you generated from the prior tutorial noted above. " ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": true }, "outputs": [], "source": [ "OAUTH_TOKEN = 'YOUR OAUTH TOKEN'\n", "OAUTH_TOKEN_SECRET = 'YOUR OAUTH SECRET'\n", "twitter = Twython(APP_KEY, APP_SECRET,\n", " OAUTH_TOKEN, OAUTH_TOKEN_SECRET)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Check our rate limit\n", "We can now use our API keys to access the Twitter API. We'll do this with a simple example. Namely, let's check our API *rate limit.* Most APIs apply a window-based rate limit -- a limit on the amount of data you can download in any time time span. On Twitter, these are 15-minute windows. \n", "\n", "What I do in the following code block is use the *twitter* variable set above (which contains our four-part password) combined with the *get_application_rate_limit_status()* function in order to see how many API calls we have remaining in the current window. There are many different parts of the API; I'll access one of them here -- the 'search' API, which you would use, for example, to download all tweets containing a specific search term or hashtag, etc. Don't worry about learning all of this for now; instead, just go with the flow. For details on what *twython* is doing you can look here: https://twython.readthedocs.io/en/latest/usage/starting_out.html\n", "\n", "Note the rate limit of 180 API calls. This means we can make 180 different calls to the API within the current 15-minute window. With the search API we can access 100 tweets per call. This means that, if we were downloading tweets with a specific hashtag, such as *#arnova16*, we could download 180 $\\times$ 100 or 18,000 tweets per window. We will eventually get into this in a later tutorial." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{u'/search/tweets': {u'limit': 180, u'remaining': 180, u'reset': 1479421815}}" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter.get_application_rate_limit_status()['resources']['search']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# OAuth2 Authentication (*app* authentication)\n", "The other authentication method is called *OAuth2*; this is *app-level* authention. It's a simpler method and will generally also allow you better rate limits on Twitter. First, though, you'll have to generate an *ACCESS_TOKEN.*" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": true }, "outputs": [], "source": [ "APP_KEY = 'YOUR APP KEY' #SAME AS ABOVE\n", "APP_SECRET = 'YOUR APP SECRET' #SAME AS ABOVE\n", "twitter = Twython(APP_KEY, APP_SECRET, oauth_version=2)\n", "ACCESS_TOKEN = twitter.obtain_access_token()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Print out your ACCESS_TOKEN \n", "I don't show it here, but you're *ACCESS_TOKEN* will print out. Copy and paste into the 'YOUR ACCESS TOKEN' code block below." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [], "source": [ "print ACCESS_TOKEN" ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "APP_KEY = 'YOUR APP KEY' #SAME AS ABOVE\n", "ACCESS_TOKEN = 'YOUR ACCESS TOKEN' #COPY AND PASTE FROM OUTPUT FROM ABOVE COMMAND\n", "twitter = Twython(APP_KEY, access_token=ACCESS_TOKEN)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Note the rate limit of 450 API calls\n", "\n", "OAuth1 will give you *user* access to the API, whereas OAuth2 will give you *app* access. For academic use the rate limits are generally better for *OAuth2* (app) authentication, with a few exceptions. For a chart showing the API limits for user and app authentication for the various parts of the Twitter API, see this chart: https://dev.twitter.com/rest/public/rate-limits\n", "\n", "Running the code block below shows that we now have a rate limit of 450 API calls. This means we can make 450 different calls to the API within the current 15-minute window. With the search API we can access 100 tweets per call. This means that, if we were downloading tweets with a specific hashtag, such as *#arnova16*, we could download 450 $\\times$ 100 or 45,000 tweets per window. This is much better than the 18,000 tweets we could access using the OAuth1 or user authentication." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "{u'/search/tweets': {u'limit': 450, u'remaining': 450, u'reset': 1479692706}}" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "twitter.get_application_rate_limit_status()['resources']['search']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "For more Notebooks as well as additional Python and Big Data tutorials, please visit http://social-metrics.org or follow me on Twitter @gregorysaxton" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }