{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the third in a series of notebooks designed to show you how to analyze social media data. For demonstration purposes we are looking at tweets sent in 2013 by the CSR-related Twitter accounts (accounts related to ethics, equality, the environment, etc.) of Fortune 200 firms. We assume you have already downloaded the data and completed the steps in Chapter 1 and Chapter 2. In this third notebook I will show you how to conduct various temporal analyses of the Twitter data. Essentially, we will be taking the tweet-level data and aggregating it by time period (for example by day, day of the week, hour, and month)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Chapter 3: Analyze Twitter Data by Time Period"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we will import several necessary Python packages and set some options for viewing the data. As with Chapter 1 and Chapter 2, we will be using the Python Data Analysis Library, or PANDAS, extensively for our data manipulations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import packages and set viewing options"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"from pandas import DataFrame\n",
"from pandas import Series"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"#Set PANDAS to show all columns in DataFrame\n",
"pd.set_option('display.max_columns', None)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I'm using version 0.16.2 of PANDAS."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"'0.16.2'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.__version__"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll use the calendar package for one of our temporal manipulations."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import calendar"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Import graphing packages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll be producing some figures at the end of this tutorial so we need to import various graphing capabilities. The default Matplotlib library is solid. "
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.4.3\n"
]
}
],
"source": [
"import matplotlib\n",
"print(matplotlib.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Necessary for the xticks option, etc.\n",
"from pylab import *"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One of the great innovations of IPython Notebook is the ability to see output and graphics \"inline,\" that is, on the same page and immediately below the code that produced them. To enable this feature for graphics we run the following line."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We will be using Seaborn to help pretty up the default Matplotlib graphics. Seaborn does not come installed with Anaconda Python, so you will have to open up a terminal and run `pip install seaborn`."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.6.0\n"
]
}
],
"source": [
"import seaborn as sns\n",
"print(sns.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following line will set the default plots to be bigger."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"plt.rcParams['figure.figsize'] = (15, 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Version 1.4 of matplotlib enables specific plotting styles. Let's check which ones are available so we can play around with them later."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import matplotlib as mpl"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"[u'dark_background', u'bmh', u'grayscale', u'ggplot', u'fivethirtyeight']"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"mpl.style.available"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read in data"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In Chapter 1 we deleted tweets from one unneeded Twitter account and also omitted several unnecessary columns (variables). We then saved, or \"pickled,\" the updated dataframe. Let's now open this saved file. As we can see in the operations below, this dataframe contains 54 variables for 32,330 tweets."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"32330\n"
]
},
{
"data": {
"text/html": [
"
| | rowid | query | tweet_id_str | inserted_date | language | coordinates | retweeted_status | created_at | month | year | content | from_user_screen_name | from_user_id | from_user_followers_count | from_user_friends_count | from_user_listed_count | from_user_favourites_count | from_user_statuses_count | from_user_description | from_user_location | from_user_created_at | retweet_count | favorite_count | entities_urls | entities_urls_count | entities_hashtags | entities_hashtags_count | entities_mentions | entities_mentions_count | in_reply_to_screen_name | in_reply_to_status_id | source | entities_expanded_urls | entities_media_count | media_expanded_url | media_url | media_type | video_link | photo_link | twitpic | num_characters | num_words | retweeted_user | retweeted_user_description | retweeted_user_screen_name | retweeted_user_followers_count | retweeted_user_listed_count | retweeted_user_statuses_count | retweeted_user_location | retweeted_tweet_created_at | Fortune_2012_rank | Company | CSR_sustainability | specific_project_initiative_area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 67340 | humanavitality | 306897327585652736 | 2014-03-09 13:46:50.222857 | en | NaN | NaN | 2013-02-27 22:43:19.000000 | 2 | 2013 | @louloushive (Tweet 2) We encourage other empl... | humanavitality | 274041023 | 2859 | 440 | 38 | 25 | 1766 | This is the official Twitter account for Human... | NaN | Tue Mar 29 16:23:02 +0000 2011 | 0 | 0 | NaN | 0 | NaN | 0 | louloushive | 1 | louloushive | 3.062183e+17 | web | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 121 | 19 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 79 | Humana | 0 | 1 |
| 1 | 39454 | FundacionPfizer | 308616393706844160 | 2014-03-09 13:38:20.679967 | es | NaN | NaN | 2013-03-04 16:34:17.000000 | 3 | 2013 | ¿Sabes por qué la #vacuna contra la #neumonía ... | FundacionPfizer | 188384056 | 2464 | 597 | 50 | 11 | 2400 | Noticias sobre Responsabilidad Social y Fundac... | México | Wed Sep 08 16:14:11 +0000 2010 | 1 | 0 | NaN | 0 | vacuna, neumonía | 2 | NaN | 0 | NaN | NaN | web | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 138 | 20 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40 | Pfizer | 0 | 1 |

| created_at | rowid | query | tweet_id_str | inserted_date | language | coordinates | retweeted_status | month | year | content | from_user_screen_name | from_user_id | from_user_followers_count | from_user_friends_count | from_user_listed_count | from_user_favourites_count | from_user_statuses_count | from_user_description | from_user_location | from_user_created_at | retweet_count | favorite_count | entities_urls | entities_urls_count | entities_hashtags | entities_hashtags_count | entities_mentions | entities_mentions_count | in_reply_to_screen_name | in_reply_to_status_id | source | entities_expanded_urls | entities_media_count | media_expanded_url | media_url | media_type | video_link | photo_link | twitpic | num_characters | num_words | retweeted_user | retweeted_user_description | retweeted_user_screen_name | retweeted_user_followers_count | retweeted_user_listed_count | retweeted_user_statuses_count | retweeted_user_location | retweeted_tweet_created_at | Fortune_2012_rank | Company | CSR_sustainability | specific_project_initiative_area |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2013-02-27 22:43:19 | 67340 | humanavitality | 306897327585652736 | 2014-03-09 13:46:50.222857 | en | NaN | NaN | 2 | 2013 | @louloushive (Tweet 2) We encourage other empl... | humanavitality | 274041023 | 2859 | 440 | 38 | 25 | 1766 | This is the official Twitter account for Human... | NaN | Tue Mar 29 16:23:02 +0000 2011 | 0 | 0 | NaN | 0 | NaN | 0 | louloushive | 1 | louloushive | 3.062183e+17 | web | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 121 | 19 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 79 | Humana | 0 | 1 |
| 2013-03-04 16:34:17 | 39454 | FundacionPfizer | 308616393706844160 | 2014-03-09 13:38:20.679967 | es | NaN | NaN | 3 | 2013 | ¿Sabes por qué la #vacuna contra la #neumonía ... | FundacionPfizer | 188384056 | 2464 | 597 | 50 | 11 | 2400 | Noticias sobre Responsabilidad Social y Fundac... | México | Wed Sep 08 16:14:11 +0000 2010 | 1 | 0 | NaN | 0 | vacuna, neumonía | 2 | NaN | 0 | NaN | NaN | web | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 138 | 20 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 40 | Pfizer | 0 | 1 |
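
The second table above shows `created_at` moved from an ordinary column into the row index. The code for that step is not shown here, so the following is only a minimal sketch of one way to do it, assuming the column holds timestamp strings like those in the first table. The two-row frame is an illustrative stand-in for the full data (most columns omitted), and the name `df` is chosen for this example.

```python
import pandas as pd

# Two rows taken from the table above; most columns are omitted for brevity.
df = pd.DataFrame({
    'created_at': ['2013-02-27 22:43:19.000000', '2013-03-04 16:34:17.000000'],
    'from_user_screen_name': ['humanavitality', 'FundacionPfizer'],
    'Company': ['Humana', 'Pfizer'],
})

# Parse the timestamp strings into datetimes, then use them as the row index.
df['created_at'] = pd.to_datetime(df['created_at'])
df = df.set_index('created_at')

print(type(df.index).__name__)
print(df)
```

With a datetime index in place, the time-based groupings in the rest of the chapter can read the day, hour, or month directly from the index.
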

| | Number_of_tweets |
|---|---|
| 2013-01-01 | 24 |
| 2013-01-02 | 71 |
| 2013-01-03 | 92 |
| 2013-01-04 | 94 |
| 2013-01-05 | 38 |

| date | Number_of_tweets |
|---|---|
| 2013-01-01 | 24 |
| 2013-01-02 | 71 |
| 2013-01-03 | 92 |
| 2013-01-04 | 94 |
| 2013-01-05 | 38 |

| date | Number_of_tweets |
|---|---|
| 2013-12-27 | 35 |
| 2013-12-28 | 13 |
| 2013-12-29 | 8 |
| 2013-12-30 | 25 |
| 2013-12-31 | 234 |
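
The tables above show the number of tweets per day, for the first and last five days of 2013. A minimal sketch of how such daily counts could be produced from a frame indexed by timestamp follows. The timestamps and the names `tweets` and `daily` are made up for illustration, and the call style assumes a recent version of pandas; version 0.16.2, used in this notebook, expressed the aggregation through the `how` argument of `resample` instead.

```python
import pandas as pd

# Hypothetical timestamps standing in for the tweets' created_at values.
stamps = pd.to_datetime([
    '2013-01-01 09:15:00', '2013-01-01 17:40:00',
    '2013-01-02 08:05:00',
    '2013-01-04 12:30:00', '2013-01-04 12:31:00', '2013-01-04 23:59:00',
])
tweets = pd.DataFrame({'content': ['a', 'b', 'c', 'd', 'e', 'f']}, index=stamps)

# Count rows per calendar day; days without tweets get a count of 0.
daily = tweets.resample('D').size().to_frame('Number_of_tweets')
daily.index.name = 'date'

print(daily)
```

Resampling, rather than grouping on the date, keeps empty days in the result, which matters if the counts are later plotted as a continuous time series.
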

| | Number_of_tweets |
|---|---|
| 0 | 5306 |
| 1 | 6467 |
| 2 | 6715 |
| 3 | 6108 |
| 4 | 5264 |
| 5 | 1513 |
| 6 | 957 |

| | Number_of_tweets | day |
|---|---|---|
| 0 | 5306 | Monday |
| 1 | 6467 | Tuesday |
| 2 | 6715 | Wednesday |
| 3 | 6108 | Thursday |
| 4 | 5264 | Friday |
| 5 | 1513 | Saturday |
| 6 | 957 | Sunday |
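
The second table above adds a `day` column of weekday names to the counts indexed 0 through 6. One way to get from the first table to the second is the `calendar` package imported earlier, sketched here under the assumption that the index follows the Python convention of 0 for Monday. The counts are copied from the table; the name `by_weekday` is chosen for this example.

```python
import calendar
import pandas as pd

# Counts copied from the table above; the index 0-6 is the weekday number.
by_weekday = pd.DataFrame(
    {'Number_of_tweets': [5306, 6467, 6715, 6108, 5264, 1513, 957]}
)

# calendar.day_name maps 0 to Monday through 6 to Sunday
# (the names are in English under the default locale).
by_weekday['day'] = [calendar.day_name[i] for i in by_weekday.index]

print(by_weekday)
print(by_weekday['Number_of_tweets'].sum())
```

The seven counts sum to 32,330, the total number of tweets in the dataframe, so every tweet is accounted for in the weekday breakdown.
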

| | Number_of_tweets |
|---|---|
| 0 | 1020 |
| 1 | 743 |
| 2 | 420 |
| 3 | 250 |
| 4 | 189 |
| 5 | 238 |
| 6 | 144 |
| 7 | 93 |
| 8 | 117 |
| 9 | 99 |
| 10 | 183 |
| 11 | 267 |
| 12 | 663 |
| 13 | 1713 |
| 14 | 2888 |
| 15 | 3155 |
| 16 | 3140 |
| 17 | 2925 |
| 18 | 2944 |
| 19 | 3114 |
| 20 | 2937 |
| 21 | 2362 |
| 22 | 1612 |
| 23 | 1114 |
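
The table above counts tweets by hour of the day (0 through 23). A minimal sketch of that kind of grouping, assuming the frame is indexed by timestamp, is shown below. The timestamps and the names `tweets` and `by_hour` are hypothetical and only illustrate the technique.

```python
import pandas as pd

# Hypothetical timestamps standing in for the tweets' created_at values.
stamps = pd.to_datetime([
    '2013-01-01 09:15:00', '2013-03-12 09:50:00',
    '2013-05-20 14:05:00',
    '2013-07-04 15:30:00', '2013-07-05 15:31:00', '2013-11-30 15:59:00',
])
tweets = pd.DataFrame({'content': ['a', 'b', 'c', 'd', 'e', 'f']}, index=stamps)

# Group on the hour component of the index and count the rows in each group.
by_hour = tweets.groupby(tweets.index.hour).size().to_frame('Number_of_tweets')

print(by_hour)
```

Unlike resampling, grouping on `index.hour` pools the same hour across all days, and hours with no tweets are simply absent from the result.
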

| | Number_of_tweets |
|---|---|
| 1 | 3203 |
| 2 | 3056 |
| 3 | 2973 |
| 4 | 3162 |
| 5 | 2784 |
| 6 | 2366 |
| 7 | 2314 |
| 8 | 2314 |
| 9 | 2485 |
| 10 | 3207 |
| 11 | 2382 |
| 12 | 2084 |
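
The table above counts tweets by month (1 through 12). Because the data already carries a `month` column, one plausible route is to group on that column directly rather than on the index. This is a sketch under that assumption, not necessarily the approach used to build the table; the rows and the names `tweets` and `by_month` are hypothetical.

```python
import pandas as pd

# Hypothetical rows; only the month and year columns from the data are kept.
tweets = pd.DataFrame({
    'month': [1, 1, 2, 10, 10, 10],
    'year': [2013] * 6,
})

# Count the number of rows (tweets) that fall in each month.
by_month = tweets.groupby('month').size().to_frame('Number_of_tweets')

print(by_month)
```
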

| | Number_of_tweets |
|---|---|
| 0 | 3126 |
| 1 | 634 |
| 2 | 600 |
| 3 | 557 |
| 4 | 471 |

| | Number_of_tweets |
|---|---|
| 0 | 1227 |
| 1 | 1626 |
| 2 | 1720 |
| 3 | 1141 |
| 4 | 920 |