{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gender dynamics"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tweet data prep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the tweets"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Loading from tweets/642bf140607547cb9d4c6b1fc49772aa_001.json.gz\n",
"DEBUG:root:Loaded 50000\n",
"DEBUG:root:Loaded 100000\n",
"DEBUG:root:Loaded 150000\n",
"DEBUG:root:Loaded 200000\n",
"DEBUG:root:Loaded 250000\n",
"INFO:root:Loading from tweets/9f7ed17c16a1494c8690b4053609539d_001.json.gz\n",
"DEBUG:root:Loaded 300000\n",
"DEBUG:root:Loaded 350000\n",
"DEBUG:root:Loaded 400000\n",
"DEBUG:root:Loaded 450000\n",
"DEBUG:root:Loaded 500000\n",
"INFO:root:Loading from tweets/41feff28312c433ab004cd822212f4c2_001.json.gz\n",
"DEBUG:root:Loaded 550000\n",
"DEBUG:root:Loaded 600000\n",
"DEBUG:root:Loaded 650000\n",
"DEBUG:root:Loaded 700000\n",
"DEBUG:root:Loaded 750000\n",
"DEBUG:root:Loaded 800000\n"
]
},
{
"data": {
"text/plain": [
"tweet_id 817136\n",
"user_id 817136\n",
"screen_name 817136\n",
"tweet_created_at 817136\n",
"tweet_type 817136\n",
"dtype: int64"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%matplotlib inline\n",
"import pandas as pd\n",
"import numpy as np\n",
"import logging\n",
"from dateutil.parser import parse as date_parse\n",
"from utils import load_tweet_df, tweet_type\n",
"import matplotlib.pyplot as plt\n",
"\n",
"\n",
"logger = logging.getLogger()\n",
"logger.setLevel(logging.DEBUG)\n",
"\n",
"# Set float format so doesn't display scientific notation\n",
"pd.options.display.float_format = '{:20,.2f}'.format\n",
"\n",
"def tweet_transform(tweet):\n",
" return {\n",
" 'tweet_id': tweet['id_str'], \n",
" 'tweet_created_at': date_parse(tweet['created_at']),\n",
" 'user_id': tweet['user']['id_str'],\n",
" 'screen_name': tweet['user']['screen_name'],\n",
" 'tweet_type': tweet_type(tweet)\n",
" }\n",
"\n",
"tweet_df = load_tweet_df(tweet_transform, ['tweet_id', 'user_id', 'screen_name', 'tweet_created_at', 'tweet_type'], dedupe_columns=['tweet_id'])\n",
"tweet_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" tweet_id | \n",
" user_id | \n",
" screen_name | \n",
" tweet_created_at | \n",
" tweet_type | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 872631046088601600 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 2017-06-08 01:47:08+00:00 | \n",
" retweet | \n",
"
\n",
" \n",
" 1 | \n",
" 872610483647516673 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 2017-06-08 00:25:26+00:00 | \n",
" retweet | \n",
"
\n",
" \n",
" 2 | \n",
" 872609618626826240 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 2017-06-08 00:22:00+00:00 | \n",
" retweet | \n",
"
\n",
" \n",
" 3 | \n",
" 872605974699311104 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 2017-06-08 00:07:31+00:00 | \n",
" retweet | \n",
"
\n",
" \n",
" 4 | \n",
" 872603191518646276 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 2017-06-07 23:56:27+00:00 | \n",
" retweet | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" tweet_id user_id screen_name tweet_created_at \\\n",
"0 872631046088601600 327862439 jonathanvswan 2017-06-08 01:47:08+00:00 \n",
"1 872610483647516673 327862439 jonathanvswan 2017-06-08 00:25:26+00:00 \n",
"2 872609618626826240 327862439 jonathanvswan 2017-06-08 00:22:00+00:00 \n",
"3 872605974699311104 327862439 jonathanvswan 2017-06-08 00:07:31+00:00 \n",
"4 872603191518646276 327862439 jonathanvswan 2017-06-07 23:56:27+00:00 \n",
"\n",
" tweet_type \n",
"0 retweet \n",
"1 retweet \n",
"2 retweet \n",
"3 retweet \n",
"4 retweet "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tweet_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tweeter data prep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Prepare the tweeter data\n",
"This comes from the following sources:\n",
"1. User lookup: These are lists of users exported from SFM. These are the final set of beltway journalists. Accounts that were suspended or deleted have been removed from this list. Also, this list will include users that did not tweet (i.e., have no tweets in dataset).\n",
"2. Tweets in the dataset: Used to generate tweet counts per tweeter. However, since some beltway journalists may not have tweeted, this may be a subset of the user lookup. Also, it may include the tweets of some users that were later excluded because their accounts were suspended or deleted or determined to not be beltway journalists.\n",
"3. User info lookup: Information on users that was manually coded in the beltway journalist spreadsheet or looked up from Twitter's API. This includes some accounts that were excluded from data collection for various reasons such as working for a foreign news organization or no longer working as a beltway journalist. Thus, these are a superset of the user lookup.\n",
"\n",
"Thus, the tweeter data should include tweet and user info data only from users in the user lookup."
]
},
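{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the rule above, using made-up toy frames (the IDs, names, and counts are hypothetical and not taken from the dataset). A left join from the user lookup keeps exactly the lookup's users, ignores extra users that appear only in the user info or the tweet counts, and leaves a gap for lookup users with no tweets, which is then filled with 0.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data; IDs and screen names are illustrative only.\n",
"lookup = pd.DataFrame({'screen_name': ['a', 'b']},\n",
"                      index=pd.Index(['1', '2'], name='user_id'))\n",
"info = pd.DataFrame({'gender': ['F', 'M', 'F']},\n",
"                    index=pd.Index(['1', '2', '3'], name='user_id'))\n",
"counts = pd.DataFrame({'tweets_in_dataset': [5, 7]},\n",
"                      index=pd.Index(['1', '4'], name='user_id'))\n",
"\n",
"# Left joins keep every user in the lookup and nothing else.\n",
"summary = lookup.join(info, how='left').join(counts, how='left')\n",
"# Users in the lookup who never tweeted get a count of 0.\n",
"summary['tweets_in_dataset'] = summary['tweets_in_dataset'].fillna(0)\n",
"```\n",
"\n",
"Users `3` and `4` exist only in the secondary sources, so they do not appear in `summary`."
]
},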
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load user lookup"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"screen_name 2487\n",
"dtype: int64"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_lookup_filepaths = ('lookups/senate_press_lookup.csv',\n",
" 'lookups/periodical_press_lookup.csv',\n",
" 'lookups/radio_and_television_lookup.csv')\n",
"user_lookup_df = pd.concat((pd.read_csv(user_lookup_filepath, usecols=['Uid', 'Token'], dtype={'Uid': str}) for user_lookup_filepath in user_lookup_filepaths))\n",
"user_lookup_df.set_index('Uid', inplace=True)\n",
"user_lookup_df.rename(columns={'Token': 'screen_name'}, inplace=True)\n",
"user_lookup_df.index.names = ['user_id']\n",
"# Some users may be in multiple lists, so need to drop duplicates\n",
"user_lookup_df = user_lookup_df[~user_lookup_df.index.duplicated()]\n",
"\n",
"user_lookup_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" screen_name | \n",
"
\n",
" \n",
" user_id | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 23455653 | \n",
" abettel | \n",
"
\n",
" \n",
" 33919343 | \n",
" AshleyRParker | \n",
"
\n",
" \n",
" 18580432 | \n",
" b_fung | \n",
"
\n",
" \n",
" 399225358 | \n",
" b_muzz | \n",
"
\n",
" \n",
" 18834692 | \n",
" becca_milfeld | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" screen_name\n",
"user_id \n",
"23455653 abettel\n",
"33919343 AshleyRParker\n",
"18580432 b_fung\n",
"399225358 b_muzz\n",
"18834692 becca_milfeld"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_lookup_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load user info"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"name 2506\n",
"organization 2477\n",
"position 2503\n",
"gender 2505\n",
"followers_count 2506\n",
"following_count 2506\n",
"tweet_count 2506\n",
"user_created_at 2506\n",
"verified 2506\n",
"protected 2506\n",
"dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_info_df = pd.read_csv('source_data/user_info_lookup.csv', names=['user_id', 'name', 'organization', 'position',\n",
" 'gender', 'followers_count', 'following_count', 'tweet_count',\n",
" 'user_created_at', 'verified', 'protected'],\n",
" dtype={'user_id': str}).set_index(['user_id'])\n",
"user_info_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" organization | \n",
" position | \n",
" gender | \n",
" followers_count | \n",
" following_count | \n",
" tweet_count | \n",
" user_created_at | \n",
" verified | \n",
" protected | \n",
"
\n",
" \n",
" user_id | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 20711445 | \n",
" Glinski, Nina | \n",
" NaN | \n",
" Freelance Reporter | \n",
" F | \n",
" 963 | \n",
" 507 | \n",
" 909 | \n",
" Thu Feb 12 20:00:53 +0000 2009 | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
" 258917371 | \n",
" Enders, David | \n",
" NaN | \n",
" Journalist | \n",
" M | \n",
" 1444 | \n",
" 484 | \n",
" 6296 | \n",
" Mon Feb 28 19:52:03 +0000 2011 | \n",
" True | \n",
" False | \n",
"
\n",
" \n",
" 297046834 | \n",
" Barakat, Matthew | \n",
" Associated Press | \n",
" Northern Virginia Correspondent | \n",
" M | \n",
" 759 | \n",
" 352 | \n",
" 631 | \n",
" Wed May 11 20:55:24 +0000 2011 | \n",
" True | \n",
" False | \n",
"
\n",
" \n",
" 455585786 | \n",
" Atkins, Kimberly | \n",
" Boston Herald | \n",
" Chief Washington Reporter/Columnist | \n",
" F | \n",
" 2944 | \n",
" 2691 | \n",
" 6277 | \n",
" Thu Jan 05 08:26:46 +0000 2012 | \n",
" True | \n",
" False | \n",
"
\n",
" \n",
" 42584840 | \n",
" Vlahou, Toula | \n",
" CQ Roll Call | \n",
" Editor & Podcast Producer | \n",
" F | \n",
" 2703 | \n",
" 201 | \n",
" 6366 | \n",
" Tue May 26 07:41:38 +0000 2009 | \n",
" False | \n",
" False | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name organization \\\n",
"user_id \n",
"20711445 Glinski, Nina NaN \n",
"258917371 Enders, David NaN \n",
"297046834 Barakat, Matthew Associated Press \n",
"455585786 Atkins, Kimberly Boston Herald \n",
"42584840 Vlahou, Toula CQ Roll Call \n",
"\n",
" position gender followers_count \\\n",
"user_id \n",
"20711445 Freelance Reporter F 963 \n",
"258917371 Journalist M 1444 \n",
"297046834 Northern Virginia Correspondent M 759 \n",
"455585786 Chief Washington Reporter/Columnist F 2944 \n",
"42584840 Editor & Podcast Producer F 2703 \n",
"\n",
" following_count tweet_count user_created_at \\\n",
"user_id \n",
"20711445 507 909 Thu Feb 12 20:00:53 +0000 2009 \n",
"258917371 484 6296 Mon Feb 28 19:52:03 +0000 2011 \n",
"297046834 352 631 Wed May 11 20:55:24 +0000 2011 \n",
"455585786 2691 6277 Thu Jan 05 08:26:46 +0000 2012 \n",
"42584840 201 6366 Tue May 26 07:41:38 +0000 2009 \n",
"\n",
" verified protected \n",
"user_id \n",
"20711445 False False \n",
"258917371 True False \n",
"297046834 True False \n",
"455585786 True False \n",
"42584840 False False "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_info_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"user_tweet_count_df = tweet_df[['user_id', 'tweet_type']].groupby(['user_id', 'tweet_type']).size().unstack()\n",
"user_tweet_count_df.fillna(0, inplace=True)\n",
"user_tweet_count_df['tweets_in_dataset'] = user_tweet_count_df.original + user_tweet_count_df.quote + user_tweet_count_df.reply + user_tweet_count_df.retweet"
]
},
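{
"cell_type": "markdown",
"metadata": {},
"source": [
"The per-user counts above rely on `groupby(...).size().unstack()`. As a hedged illustration on made-up data (the toy frame below is hypothetical), the same pattern turns one row per tweet into one row per user with one column per tweet type. This sketch passes `fill_value=0` and sums across columns, which is an alternative to naming each tweet type and assumes nothing about which types are present.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy tweets; the notebook builds the real tweet_df with load_tweet_df.\n",
"toy_tweet_df = pd.DataFrame({\n",
"    'user_id': ['1', '1', '1', '2'],\n",
"    'tweet_type': ['original', 'reply', 'reply', 'retweet'],\n",
"})\n",
"\n",
"# One row per user, one column per tweet type; missing combinations become 0.\n",
"toy_counts = toy_tweet_df.groupby(['user_id', 'tweet_type']).size().unstack(fill_value=0)\n",
"# Summing across the type columns gives the total tweets per user.\n",
"toy_counts['tweets_in_dataset'] = toy_counts.sum(axis=1)\n",
"```\n",
"\n",
"Summing across columns keeps working if a tweet type is absent from the data, whereas adding named columns would fail on a missing one."
]
},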
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"screen_name 2487\n",
"name 2487\n",
"organization 2487\n",
"position 2484\n",
"gender 2486\n",
"followers_count 2487\n",
"following_count 2487\n",
"tweet_count 2487\n",
"user_created_at 2487\n",
"verified 2487\n",
"protected 2487\n",
"original 2487\n",
"quote 2487\n",
"reply 2487\n",
"retweet 2487\n",
"tweets_in_dataset 2487\n",
"dtype: int64"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_summary_df = user_lookup_df.join((user_info_df, user_tweet_count_df), how='left')\n",
"# Fill Nans\n",
"user_summary_df['organization'].fillna('', inplace=True)\n",
"user_summary_df['original'].fillna(0, inplace=True)\n",
"user_summary_df['quote'].fillna(0, inplace=True)\n",
"user_summary_df['reply'].fillna(0, inplace=True)\n",
"user_summary_df['retweet'].fillna(0, inplace=True)\n",
"user_summary_df['tweets_in_dataset'].fillna(0, inplace=True)\n",
"user_summary_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" screen_name | \n",
" name | \n",
" organization | \n",
" position | \n",
" gender | \n",
" followers_count | \n",
" following_count | \n",
" tweet_count | \n",
" user_created_at | \n",
" verified | \n",
" protected | \n",
" original | \n",
" quote | \n",
" reply | \n",
" retweet | \n",
" tweets_in_dataset | \n",
"
\n",
" \n",
" user_id | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 23455653 | \n",
" abettel | \n",
" Bettelheim, Adriel | \n",
" Politico | \n",
" Health Care Editor | \n",
" F | \n",
" 2664 | \n",
" 1055 | \n",
" 15990 | \n",
" Mon Mar 09 16:32:20 +0000 2009 | \n",
" True | \n",
" False | \n",
" 289.00 | \n",
" 12.00 | \n",
" 6.00 | \n",
" 52.00 | \n",
" 359.00 | \n",
"
\n",
" \n",
" 33919343 | \n",
" AshleyRParker | \n",
" Parker, Ashley | \n",
" Washington Post | \n",
" White House Reporter | \n",
" F | \n",
" 122382 | \n",
" 2342 | \n",
" 12433 | \n",
" Tue Apr 21 14:28:57 +0000 2009 | \n",
" True | \n",
" False | \n",
" 172.00 | \n",
" 67.00 | \n",
" 11.00 | \n",
" 120.00 | \n",
" 370.00 | \n",
"
\n",
" \n",
" 18580432 | \n",
" b_fung | \n",
" Fung, Brian | \n",
" Washington Post | \n",
" Tech Reporter | \n",
" M | \n",
" 16558 | \n",
" 2062 | \n",
" 44799 | \n",
" Sat Jan 03 15:15:57 +0000 2009 | \n",
" True | \n",
" False | \n",
" 257.00 | \n",
" 85.00 | \n",
" 205.00 | \n",
" 82.00 | \n",
" 629.00 | \n",
"
\n",
" \n",
" 399225358 | \n",
" b_muzz | \n",
" Murray, Brendan | \n",
" Bloomberg News | \n",
" Managing Editor, U.S. Economy | \n",
" M | \n",
" 624 | \n",
" 382 | \n",
" 360 | \n",
" Thu Oct 27 05:34:05 +0000 2011 | \n",
" True | \n",
" False | \n",
" 3.00 | \n",
" 0.00 | \n",
" 0.00 | \n",
" 5.00 | \n",
" 8.00 | \n",
"
\n",
" \n",
" 18834692 | \n",
" becca_milfeld | \n",
" Milfeld, Becca | \n",
" Agence France-Presse | \n",
" English Desk Editor and Journalist | \n",
" F | \n",
" 483 | \n",
" 993 | \n",
" 1484 | \n",
" Sat Jan 10 13:58:43 +0000 2009 | \n",
" False | \n",
" False | \n",
" 3.00 | \n",
" 14.00 | \n",
" 0.00 | \n",
" 7.00 | \n",
" 24.00 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" screen_name name organization \\\n",
"user_id \n",
"23455653 abettel Bettelheim, Adriel Politico \n",
"33919343 AshleyRParker Parker, Ashley Washington Post \n",
"18580432 b_fung Fung, Brian Washington Post \n",
"399225358 b_muzz Murray, Brendan Bloomberg News \n",
"18834692 becca_milfeld Milfeld, Becca Agence France-Presse \n",
"\n",
" position gender followers_count \\\n",
"user_id \n",
"23455653 Health Care Editor F 2664 \n",
"33919343 White House Reporter F 122382 \n",
"18580432 Tech Reporter M 16558 \n",
"399225358 Managing Editor, U.S. Economy M 624 \n",
"18834692 English Desk Editor and Journalist F 483 \n",
"\n",
" following_count tweet_count user_created_at \\\n",
"user_id \n",
"23455653 1055 15990 Mon Mar 09 16:32:20 +0000 2009 \n",
"33919343 2342 12433 Tue Apr 21 14:28:57 +0000 2009 \n",
"18580432 2062 44799 Sat Jan 03 15:15:57 +0000 2009 \n",
"399225358 382 360 Thu Oct 27 05:34:05 +0000 2011 \n",
"18834692 993 1484 Sat Jan 10 13:58:43 +0000 2009 \n",
"\n",
" verified protected original quote \\\n",
"user_id \n",
"23455653 True False 289.00 12.00 \n",
"33919343 True False 172.00 67.00 \n",
"18580432 True False 257.00 85.00 \n",
"399225358 True False 3.00 0.00 \n",
"18834692 False False 3.00 14.00 \n",
"\n",
" reply retweet tweets_in_dataset \n",
"user_id \n",
"23455653 6.00 52.00 359.00 \n",
"33919343 11.00 120.00 370.00 \n",
"18580432 205.00 82.00 629.00 \n",
"399225358 0.00 5.00 8.00 \n",
"18834692 0.00 7.00 24.00 "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_summary_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"### Remove users with no tweets in dataset"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"screen_name 195\n",
"name 195\n",
"organization 195\n",
"position 195\n",
"gender 194\n",
"followers_count 195\n",
"following_count 195\n",
"tweet_count 195\n",
"user_created_at 195\n",
"verified 195\n",
"protected 195\n",
"original 195\n",
"quote 195\n",
"reply 195\n",
"retweet 195\n",
"tweets_in_dataset 195\n",
"dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_summary_df[user_summary_df.tweets_in_dataset == 0].count()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"screen_name 2292\n",
"name 2292\n",
"organization 2292\n",
"position 2289\n",
"gender 2292\n",
"followers_count 2292\n",
"following_count 2292\n",
"tweet_count 2292\n",
"user_created_at 2292\n",
"verified 2292\n",
"protected 2292\n",
"original 2292\n",
"quote 2292\n",
"reply 2292\n",
"retweet 2292\n",
"tweets_in_dataset 2292\n",
"dtype: int64"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_summary_df = user_summary_df[user_summary_df.tweets_in_dataset != 0]\n",
"user_summary_df.count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Gender"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" count | \n",
" percentage | \n",
"
\n",
" \n",
" \n",
" \n",
" M | \n",
" 1299 | \n",
" 56.7% | \n",
"
\n",
" \n",
" F | \n",
" 993 | \n",
" 43.3% | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" count percentage\n",
"M 1299 56.7%\n",
"F 993 43.3%"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalist_gender_summary_df = pd.DataFrame({'count':user_summary_df.gender.value_counts(), 'percentage':user_summary_df.gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
"journalist_gender_summary_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reply data prep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load replies from tweets"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:root:Loading from tweets/642bf140607547cb9d4c6b1fc49772aa_001.json.gz\n",
"DEBUG:root:Loaded 50000\n",
"DEBUG:root:Loaded 100000\n",
"DEBUG:root:Loaded 150000\n",
"DEBUG:root:Loaded 200000\n",
"DEBUG:root:Loaded 250000\n",
"INFO:root:Loading from tweets/9f7ed17c16a1494c8690b4053609539d_001.json.gz\n",
"DEBUG:root:Loaded 300000\n",
"DEBUG:root:Loaded 350000\n",
"DEBUG:root:Loaded 400000\n",
"DEBUG:root:Loaded 450000\n",
"DEBUG:root:Loaded 500000\n",
"INFO:root:Loading from tweets/41feff28312c433ab004cd822212f4c2_001.json.gz\n",
"DEBUG:root:Loaded 550000\n",
"DEBUG:root:Loaded 600000\n",
"DEBUG:root:Loaded 650000\n",
"DEBUG:root:Loaded 700000\n",
"DEBUG:root:Loaded 750000\n",
"DEBUG:root:Loaded 800000\n"
]
},
{
"data": {
"text/plain": [
"tweet_id 126254\n",
"user_id 126254\n",
"screen_name 126254\n",
"reply_to_user_id 126254\n",
"reply_to_screen_name 126254\n",
"tweet_created_at 126254\n",
"dtype: int64"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Simply the tweet on load\n",
"def reply_transform(tweet):\n",
" if tweet_type(tweet) == 'reply': \n",
" return {\n",
" 'tweet_id': tweet['id_str'],\n",
" 'user_id': tweet['user']['id_str'],\n",
" 'screen_name': tweet['user']['screen_name'],\n",
" 'reply_to_user_id': tweet['in_reply_to_user_id_str'],\n",
" 'reply_to_screen_name': tweet['in_reply_to_screen_name'],\n",
" 'tweet_created_at': date_parse(tweet['created_at']) \n",
" }\n",
" return None\n",
"\n",
"base_reply_df = load_tweet_df(reply_transform, ['tweet_id', 'user_id', 'screen_name', 'reply_to_user_id',\n",
" 'reply_to_screen_name', 'tweet_created_at'],\n",
" dedupe_columns=['tweet_id'])\n",
"\n",
"base_reply_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" tweet_id | \n",
" user_id | \n",
" screen_name | \n",
" reply_to_user_id | \n",
" reply_to_screen_name | \n",
" tweet_created_at | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 872495244062978048 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 59331128 | \n",
" PhilipRucker | \n",
" 2017-06-07 16:47:31+00:00 | \n",
"
\n",
" \n",
" 1 | \n",
" 872473152160399361 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 2856617865 | \n",
" RPhuket | \n",
" 2017-06-07 15:19:43+00:00 | \n",
"
\n",
" \n",
" 2 | \n",
" 872266930341728256 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 1854392378 | \n",
" hrm_1973 | \n",
" 2017-06-07 01:40:16+00:00 | \n",
"
\n",
" \n",
" 3 | \n",
" 872250430109175809 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 390985197 | \n",
" MikeBastasch | \n",
" 2017-06-07 00:34:42+00:00 | \n",
"
\n",
" \n",
" 4 | \n",
" 872218322187767808 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 407013776 | \n",
" burgessev | \n",
" 2017-06-06 22:27:07+00:00 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" tweet_id user_id screen_name reply_to_user_id \\\n",
"0 872495244062978048 327862439 jonathanvswan 59331128 \n",
"1 872473152160399361 327862439 jonathanvswan 2856617865 \n",
"2 872266930341728256 327862439 jonathanvswan 1854392378 \n",
"3 872250430109175809 327862439 jonathanvswan 390985197 \n",
"4 872218322187767808 327862439 jonathanvswan 407013776 \n",
"\n",
" reply_to_screen_name tweet_created_at \n",
"0 PhilipRucker 2017-06-07 16:47:31+00:00 \n",
"1 RPhuket 2017-06-07 15:19:43+00:00 \n",
"2 hrm_1973 2017-06-07 01:40:16+00:00 \n",
"3 MikeBastasch 2017-06-07 00:34:42+00:00 \n",
"4 burgessev 2017-06-06 22:27:07+00:00 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"base_reply_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add gender of replier"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tweet_id 126254\n",
"user_id 126254\n",
"screen_name 126254\n",
"reply_to_user_id 126254\n",
"reply_to_screen_name 126254\n",
"tweet_created_at 126254\n",
"gender 126254\n",
"dtype: int64"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reply_df = base_reply_df.join(user_summary_df['gender'], on='user_id')\n",
"reply_df.count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### How may user have been replied to by journalists?"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"31034"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"reply_df['reply_to_user_id'].unique().size"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Limit to beltway journalists"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tweet_id 43390\n",
"user_id 43390\n",
"screen_name 43390\n",
"reply_to_user_id 43390\n",
"reply_to_screen_name 43390\n",
"tweet_created_at 43390\n",
"gender 43390\n",
"reply_to_gender 43390\n",
"dtype: int64"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalists_reply_df = reply_df.join(user_summary_df['gender'], how='inner', on='reply_to_user_id', rsuffix='_reply')\n",
"journalists_reply_df.rename(columns = {'gender_reply': 'reply_to_gender'}, inplace=True)\n",
"journalists_reply_df.count()"
]
},
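{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two joins above can be sketched on made-up data (all IDs and genders below are hypothetical). The first join looks up the replier's gender by `user_id`; the second is an inner join on `reply_to_user_id`, so replies to accounts outside the journalist list are dropped, and `rsuffix` keeps the second `gender` column from colliding with the first.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data: two journalists and three replies.\n",
"genders = pd.Series(['M', 'F'], index=pd.Index(['1', '2'], name='user_id'), name='gender')\n",
"toy_replies = pd.DataFrame({\n",
"    'user_id': ['1', '1', '2'],\n",
"    'reply_to_user_id': ['2', '9', '1'],  # '9' is not in the journalist list\n",
"})\n",
"\n",
"# First join: gender of the replier, looked up by user_id.\n",
"toy_replies = toy_replies.join(genders, on='user_id')\n",
"# Second join: gender of the account replied to. The inner join drops the reply to '9'.\n",
"toy_journalist_replies = toy_replies.join(genders, on='reply_to_user_id',\n",
"                                          how='inner', rsuffix='_reply')\n",
"toy_journalist_replies = toy_journalist_replies.rename(columns={'gender_reply': 'reply_to_gender'})\n",
"```\n",
"\n",
"Only the replies between the two toy journalists remain, each with both a `gender` and a `reply_to_gender` value."
]
},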
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" tweet_id | \n",
" user_id | \n",
" screen_name | \n",
" reply_to_user_id | \n",
" reply_to_screen_name | \n",
" tweet_created_at | \n",
" gender | \n",
" reply_to_gender | \n",
"
\n",
" \n",
" \n",
" \n",
" 4 | \n",
" 872218322187767808 | \n",
" 327862439 | \n",
" jonathanvswan | \n",
" 407013776 | \n",
" burgessev | \n",
" 2017-06-06 22:27:07+00:00 | \n",
" M | \n",
" M | \n",
"
\n",
" \n",
" 234 | \n",
" 871795694020984833 | \n",
" 195840597 | \n",
" JNicholsonInDC | \n",
" 407013776 | \n",
" burgessev | \n",
" 2017-06-05 18:27:45+00:00 | \n",
" M | \n",
" M | \n",
"
\n",
" \n",
" 572 | \n",
" 870371176866041856 | \n",
" 163589845 | \n",
" PoliticoKevin | \n",
" 407013776 | \n",
" burgessev | \n",
" 2017-06-01 20:07:13+00:00 | \n",
" M | \n",
" M | \n",
"
\n",
" \n",
" 728 | \n",
" 870659438901940224 | \n",
" 115564212 | \n",
" IsaacDovere | \n",
" 407013776 | \n",
" burgessev | \n",
" 2017-06-02 15:12:40+00:00 | \n",
" M | \n",
" M | \n",
"
\n",
" \n",
" 731 | \n",
" 872473152143667201 | \n",
" 167024520 | \n",
" rachaelmbade | \n",
" 407013776 | \n",
" burgessev | \n",
" 2017-06-07 15:19:43+00:00 | \n",
" F | \n",
" M | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" tweet_id user_id screen_name reply_to_user_id \\\n",
"4 872218322187767808 327862439 jonathanvswan 407013776 \n",
"234 871795694020984833 195840597 JNicholsonInDC 407013776 \n",
"572 870371176866041856 163589845 PoliticoKevin 407013776 \n",
"728 870659438901940224 115564212 IsaacDovere 407013776 \n",
"731 872473152143667201 167024520 rachaelmbade 407013776 \n",
"\n",
" reply_to_screen_name tweet_created_at gender reply_to_gender \n",
"4 burgessev 2017-06-06 22:27:07+00:00 M M \n",
"234 burgessev 2017-06-05 18:27:45+00:00 M M \n",
"572 burgessev 2017-06-01 20:07:13+00:00 M M \n",
"728 burgessev 2017-06-02 15:12:40+00:00 M M \n",
"731 burgessev 2017-06-07 15:19:43+00:00 F M "
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalists_reply_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Functions for summarizing replies by beltway journalists"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [],
"source": [
"# Gender of beltway journalists replied to by beltway journalists\n",
"def journalist_reply_gender_summary(reply_df):\n",
" gender_summary_df = pd.DataFrame({'count':reply_df.reply_to_gender.value_counts(), \n",
" 'percentage': reply_df.reply_to_gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
" gender_summary_df.reset_index(inplace=True)\n",
" gender_summary_df['avg_replies'] = gender_summary_df.apply(lambda row: row['count'] / journalist_gender_summary_df.loc[row['index']]['count'], axis=1) \n",
" gender_summary_df.set_index('index', inplace=True, drop=True)\n",
" return gender_summary_df\n",
"\n",
"# Reply to beltway journalists by beltway journalists\n",
"def journalist_reply_summary(reply_df):\n",
" # Reply to count\n",
" reply_count_df = pd.DataFrame(reply_df.reply_to_user_id.value_counts().rename('reply_to_count'))\n",
" \n",
" # Replying to users. That is, the number of unique users replying to each user.\n",
" reply_to_user_id_per_user_df = reply_df[['reply_to_user_id', 'user_id']].drop_duplicates()\n",
" replying_to_user_count_df = pd.DataFrame(reply_to_user_id_per_user_df.groupby('reply_to_user_id').size(), columns=['replying_count'])\n",
" replying_to_user_count_df.index.name = 'user_id'\n",
" \n",
" # Join with user summary\n",
" journalist_reply_summary_df = user_summary_df.join([reply_count_df, replying_to_user_count_df])\n",
" journalist_reply_summary_df.fillna(0, inplace=True)\n",
" journalist_reply_summary_df = journalist_reply_summary_df.sort_values(['reply_to_count', 'replying_count', 'followers_count'], ascending=False)\n",
" return journalist_reply_summary_df\n",
"\n",
"# Gender of top journalists replied to by beltway journalists\n",
"def top_journalist_reply_gender_summary(reply_summary_df, replying_count_threshold=0, head=100):\n",
" top_reply_summary_df = reply_summary_df[reply_summary_df.replying_count > replying_count_threshold].head(head)\n",
" return pd.DataFrame({'count': top_reply_summary_df.gender.value_counts(), \n",
" 'percentage': top_reply_summary_df.gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
"\n",
"# Fields for displaying journalist mention summaries\n",
"journalist_reply_summary_fields = ['screen_name', 'name', 'organization', 'gender', 'followers_count', 'reply_to_count', 'replying_count']\n"
]
},
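{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two counts computed in `journalist_reply_summary` measure different things, which a small made-up example may make clearer (the toy replies below are hypothetical). `reply_to_count` counts every reply a user received, while `replying_count` counts distinct repliers, so repeated replies from the same person are collapsed first.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy replies: user '1' replies to '2' twice, '3' replies to '2' once.\n",
"toy_replies = pd.DataFrame({\n",
"    'user_id': ['1', '1', '2', '3'],\n",
"    'reply_to_user_id': ['2', '2', '1', '2'],\n",
"})\n",
"\n",
"# reply_to_count: total replies received, counting repeats from the same replier.\n",
"reply_to_count = toy_replies['reply_to_user_id'].value_counts()\n",
"# replying_count: distinct repliers per recipient, so duplicates are dropped first.\n",
"replying_count = (toy_replies[['reply_to_user_id', 'user_id']]\n",
"                  .drop_duplicates()\n",
"                  .groupby('reply_to_user_id')\n",
"                  .size())\n",
"```\n",
"\n",
"Here user `2` receives three replies from two distinct repliers, so the two measures diverge."
]
},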
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reply analysis\n",
"*Note that for each of these, the complete list is being written to CSV in the output directory.*\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Of replies by journalists, how many are by males / females?"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" reply | \n",
" percentage | \n",
" avg_replies | \n",
"
\n",
" \n",
" gender | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" F | \n",
" 31,831.00 | \n",
" 25.2% | \n",
" 32.06 | \n",
"
\n",
" \n",
" M | \n",
" 94,423.00 | \n",
" 74.8% | \n",
" 72.69 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" reply percentage avg_replies\n",
"gender \n",
"F 31,831.00 25.2% 32.06\n",
"M 94,423.00 74.8% 72.69"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"replies_by_gender_df = user_summary_df[['gender', 'reply']].groupby('gender').sum()\n",
"replies_by_gender_df['percentage'] = replies_by_gender_df.reply.div(replies_by_gender_df.reply.sum()).mul(100).round(1).astype(str) + '%'\n",
"replies_by_gender_df.reset_index(inplace=True)\n",
"replies_by_gender_df['avg_replies'] = replies_by_gender_df.apply(lambda row: row['reply'] / journalist_gender_summary_df.loc[row['gender']]['count'], axis=1) \n",
"replies_by_gender_df.set_index('gender', inplace=True, drop=True)\n",
"# return gender_summary_df\n",
"replies_by_gender_df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Which journalists reply the most?"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name organization \\\n",
"user_id \n",
"3817401 ericgeller Geller, Eric Politico \n",
"22891564 chrisgeidner Geidner, Chris BuzzFeed \n",
"118130765 dylanlscott Scott, Dylan L. Stat News \n",
"19576571 JaredRizzi Rizzi, Jared Sirius XM Satellite Radio \n",
"275207082 AlexParkerDC Parker, Alexander M. Bloomberg BNA \n",
"63717541 phillyrich1 Weinstein, Richard C–SPAN \n",
"203226736 SharylAttkisson Attkisson, Sharyl Sinclair Broadcast Group \n",
"16812908 crousselle Rousselle, Christine Townhall \n",
"14529929 jaketapper Tapper, Jake CNN \n",
"46557945 StevenTDennis Dennis, Steven T. Bloomberg News \n",
"27882000 jamiedupree Dupree, Jamie Cox Broadcasting \n",
"3372900155 samtayrey Reyes, Samantha CNN \n",
"132482136 Yaro_RT Yaroshevsky, Alexey RTTV America \n",
"46955476 GrahamDavidA Graham, David A. The Atlantic \n",
"16459325 ryanbeckwith Beckwith, Ryan Teague Time Magazine \n",
"25702314 EricMGarcia Garcia, Eric M. CQ Roll Call \n",
"12245632 jackshafer Shafer, Jack Politico \n",
"273540698 MKTWgoldstein Goldstein, Steven MarketWatch \n",
"19847765 sahilkapur Kapur, Sahil Bloomberg News \n",
"6904552 juliemason Mason, Julie Sirius XM Satellite Radio \n",
"225265639 ddale8 Dale, Daniel Toronto Star \n",
"15837659 jbendery Bendery, Jennifer Huffington Post \n",
"15146659 JSwiftTWS Swift, James A. Weekly Standard \n",
"227790723 RichardRubinDC Rubin, Richard Bloomberg News \n",
"14517538 derekwillis Willis, Derek ProPublica \n",
"\n",
" gender followers_count tweet_count reply \\\n",
"user_id \n",
"3817401 M 58173 208763 9,033.00 \n",
"22891564 M 83316 205504 3,917.00 \n",
"118130765 M 20122 42497 2,040.00 \n",
"19576571 M 13545 41620 1,949.00 \n",
"275207082 M 3828 142150 1,714.00 \n",
"63717541 M 3827 27341 1,532.00 \n",
"203226736 F 132973 24539 1,458.00 \n",
"16812908 F 5327 118713 1,089.00 \n",
"14529929 M 1305680 148143 1,040.00 \n",
"46557945 M 55762 67526 1,026.00 \n",
"27882000 M 140848 46181 993.00 \n",
"3372900155 F 10344 4783 933.00 \n",
"132482136 M 12968 26795 910.00 \n",
"46955476 M 22112 93391 908.00 \n",
"16459325 M 20947 92203 901.00 \n",
"25702314 M 3094 44783 863.00 \n",
"12245632 M 73996 44726 861.00 \n",
"273540698 M 10185 41497 857.00 \n",
"19847765 M 69086 51628 853.00 \n",
"6904552 F 31276 29214 852.00 \n",
"225265639 M 180671 69807 848.00 \n",
"15837659 M 41000 65406 844.00 \n",
"15146659 M 5691 84245 830.00 \n",
"227790723 M 13015 17796 807.00 \n",
"14517538 M 18049 79502 781.00 \n",
"\n",
" tweets_in_dataset \n",
"user_id \n",
"3817401 11,432.00 \n",
"22891564 6,244.00 \n",
"118130765 3,960.00 \n",
"19576571 5,567.00 \n",
"275207082 3,983.00 \n",
"63717541 2,261.00 \n",
"203226736 2,154.00 \n",
"16812908 2,351.00 \n",
"14529929 5,078.00 \n",
"46557945 3,066.00 \n",
"27882000 2,108.00 \n",
"3372900155 1,349.00 \n",
"132482136 1,199.00 \n",
"46955476 1,566.00 \n",
"16459325 5,187.00 \n",
"25702314 3,584.00 \n",
"12245632 2,016.00 \n",
"273540698 1,897.00 \n",
"19847765 2,022.00 \n",
"6904552 1,213.00 \n",
"225265639 2,496.00 \n",
"15837659 2,600.00 \n",
"15146659 2,612.00 \n",
"227790723 1,312.00 \n",
"14517538 1,811.00 "
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_summary_df[['screen_name', 'name', 'organization', 'gender', 'followers_count', 'tweet_count', 'reply', 'tweets_in_dataset']].sort_values(['reply'], ascending=False).head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Replies to all accounts (not just journalists)\n",
"This is based on screen name, which could have changed during the collection period. However, for the accounts at the top of this list, a change seems unlikely."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of journalists replying to other accounts, who do they reply to the most?"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" reply_to_count replying_count\n",
"ericgeller 1980 75\n",
"chrisgeidner 1901 37\n",
"dylanlscott 1091 65\n",
"JaredRizzi 750 46\n",
"StevenTDennis 745 93\n",
"AlexParkerDC 720 23\n",
"sahilkapur 662 35\n",
"jseldin 653 2\n",
"MEPFuller 522 92\n",
"amaxsmith 498 6\n",
"ddale8 495 20\n",
"CraigCaplan 388 8\n",
"ChuckWendig 372 1\n",
"pbump 355 43\n",
"kelmej 340 29\n",
"benjamin_oc 322 11\n",
"KimberlyRobinsn 321 7\n",
"darth 315 32\n",
"ZoeTillman 311 8\n",
"RichardRubinDC 305 41\n",
"sdonnan 304 7\n",
"AaronMehta 304 35\n",
"MikeSacksEsq 299 18\n",
"heathdwilliams 298 1\n",
"ryanbeckwith 297 49"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Number of replies received by each screen name\n",
"reply_to_count_screen_name_df = pd.DataFrame(reply_df.reply_to_screen_name.value_counts().rename('reply_to_count'))\n",
"\n",
"# Number of distinct users replying to each screen name\n",
"reply_to_user_id_per_user_screen_name_df = reply_df[['reply_to_screen_name', 'user_id']].drop_duplicates()\n",
"replying_count_screen_name_df = pd.DataFrame(reply_to_user_id_per_user_screen_name_df.groupby('reply_to_screen_name').size(), columns=['replying_count'])\n",
"replying_count_screen_name_df.index.name = 'screen_name'\n",
"\n",
"all_replied_to_df = reply_to_count_screen_name_df.join(replying_count_screen_name_df)\n",
"all_replied_to_df.to_csv('output/all_replied_to_by_journalists.csv')\n",
"all_replied_to_df.head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Journalists replying to other journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of journalists replying to other journalists, who do they reply to the most?"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name \\\n",
"user_id \n",
"3817401 ericgeller Geller, Eric \n",
"22891564 chrisgeidner Geidner, Chris \n",
"118130765 dylanlscott Scott, Dylan L. \n",
"19576571 JaredRizzi Rizzi, Jared \n",
"46557945 StevenTDennis Dennis, Steven T. \n",
"275207082 AlexParkerDC Parker, Alexander M. \n",
"19847765 sahilkapur Kapur, Sahil \n",
"583821006 jseldin Seldin, Jeff \n",
"398088661 MEPFuller Fuller, Matt E. \n",
"44951698 amaxsmith Smith, Max \n",
"225265639 ddale8 Dale, Daniel \n",
"317980134 CraigCaplan Caplan, Craig \n",
"16061946 kelmej Mejdrich, Kellie \n",
"15365623 benjamin_oc O’Connell, Benjamin \n",
"906734342 KimberlyRobinsn Robinson, Kimberly S. \n",
"52392666 ZoeTillman Tillman, Zoe \n",
"227790723 RichardRubinDC Rubin, Richard \n",
"103016675 AaronMehta Mehta, Aaron \n",
"21810329 sdonnan Donnan, Shawn \n",
"90478926 MikeSacksEsq Sacks, Mike \n",
"16459325 ryanbeckwith Beckwith, Ryan Teague \n",
"21252618 JakeSherman Sherman, Jacob S. \n",
"11771512 OKnox Knox, Olivier \n",
"21696279 brianbeutler Beutler, Brian Alfred \n",
"21212087 Olivianuzzi Nuzzi, Olivia \n",
"\n",
" organization gender followers_count \\\n",
"user_id \n",
"3817401 Politico M 58173 \n",
"22891564 BuzzFeed M 83316 \n",
"118130765 Stat News M 20122 \n",
"19576571 Sirius XM Satellite Radio M 13545 \n",
"46557945 Bloomberg News M 55762 \n",
"275207082 Bloomberg BNA M 3828 \n",
"19847765 Bloomberg News M 69086 \n",
"583821006 Voice of America M 5365 \n",
"398088661 Huffington Post M 77919 \n",
"44951698 WTOP Radio M 4726 \n",
"225265639 Toronto Star M 180671 \n",
"317980134 C–SPAN M 6143 \n",
"16061946 CQ Roll Call F 4146 \n",
"15365623 C–SPAN M 1455 \n",
"906734342 Bloomberg BNA F 7170 \n",
"52392666 BuzzFeed F 15246 \n",
"227790723 Bloomberg News M 13015 \n",
"103016675 Sightline Media Group M 11124 \n",
"21810329 Financial Times M 12311 \n",
"90478926 Scripps Howard News Service M 9289 \n",
"16459325 Time Magazine M 20947 \n",
"21252618 Politico M 81762 \n",
"11771512 Yahoo News M 44715 \n",
"21696279 New Republic M 74435 \n",
"21212087 New York F 136276 \n",
"\n",
" reply_to_count replying_count \n",
"user_id \n",
"3817401 1,980.00 75.00 \n",
"22891564 1,901.00 37.00 \n",
"118130765 1,091.00 65.00 \n",
"19576571 750.00 46.00 \n",
"46557945 745.00 93.00 \n",
"275207082 720.00 23.00 \n",
"19847765 662.00 35.00 \n",
"583821006 653.00 2.00 \n",
"398088661 522.00 92.00 \n",
"44951698 498.00 6.00 \n",
"225265639 495.00 20.00 \n",
"317980134 388.00 8.00 \n",
"16061946 340.00 29.00 \n",
"15365623 322.00 11.00 \n",
"906734342 321.00 7.00 \n",
"52392666 311.00 8.00 \n",
"227790723 305.00 41.00 \n",
"103016675 304.00 35.00 \n",
"21810329 304.00 7.00 \n",
"90478926 299.00 18.00 \n",
"16459325 297.00 49.00 \n",
"21252618 283.00 72.00 \n",
"11771512 269.00 45.00 \n",
"21696279 269.00 34.00 \n",
"21212087 243.00 25.00 "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalists_reply_summary_df = journalist_reply_summary(journalists_reply_df)\n",
"journalists_reply_summary_df.to_csv('output/journalists_replied_to_by_journalists.csv')\n",
"journalists_reply_summary_df[journalist_reply_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of journalists replying to other journalists, how many of their replies go to male / female journalists?"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" count percentage avg_replies\n",
"index \n",
"M 33178 76.5% 25.54\n",
"F 10212 23.5% 10.28"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalist_reply_gender_summary(journalists_reply_df)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### On average, how many times do journalists reply to each journalist?"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" reply_to_count\n",
"count 2,292.00\n",
"mean 18.93\n",
"std 81.76\n",
"min 0.00\n",
"25% 0.00\n",
"50% 1.00\n",
"75% 8.00\n",
"max 1,980.00"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalists_reply_summary_df[['reply_to_count']].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Journalists replying to female journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of journalists replying to female journalists, which female journalists are replied to the most?"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name organization \\\n",
"user_id \n",
"16061946 kelmej Mejdrich, Kellie CQ Roll Call \n",
"906734342 KimberlyRobinsn Robinson, Kimberly S. Bloomberg BNA \n",
"52392666 ZoeTillman Tillman, Zoe BuzzFeed \n",
"21212087 Olivianuzzi Nuzzi, Olivia New York \n",
"83462293 SarahMMimms Mimms, Sarah BuzzFeed \n",
"19186003 seungminkim Kim, Seung Min Politico \n",
"3372900155 samtayrey Reyes, Samantha CNN \n",
"18825339 CahnEmily Cahn, Emily Mic \n",
"1132012321 DaniellaMicaela Diaz, Daniella CNN \n",
"158072303 ValerieInsinna Insinna, Valerie Defense News \n",
"36607254 Oriana0214 Pawlyk, Oriana Military.com \n",
"96405362 laurenonthehill Camera, Lauren S. U.S. News & World Report \n",
"16812908 crousselle Rousselle, Christine Townhall \n",
"47758416 marissaaevans Evans, Marissa Texas Tribune \n",
"45399148 jeneps Epstein, Jennifer Bloomberg News \n",
"16434028 gabbilevy Levy, Gabrielle F. U.S. News & World Report \n",
"14870670 KateNocera Nocera, Kate BuzzFeed \n",
"18501487 leighmunsil Munsil, Leigh CNN \n",
"313545488 LauraLitvan Litvan, Laura Bloomberg News \n",
"116341480 RosieGray Gray, Rosie The Atlantic \n",
"82151660 kelsey_snell Snell, Kelse Washington Post \n",
"70511174 Hadas_Gold Gold, Hadas Politico \n",
"38855868 brennawilliams Williams, Brenna CNN \n",
"273700859 kpolantz Polantz, Katelyn J. National Law Journal \n",
"3273220608 KatherineBScott Scott, Katherine Bloomberg Government \n",
"\n",
" gender followers_count reply_to_count replying_count \n",
"user_id \n",
"16061946 F 4146 340.00 29.00 \n",
"906734342 F 7170 321.00 7.00 \n",
"52392666 F 15246 311.00 8.00 \n",
"21212087 F 136276 243.00 25.00 \n",
"83462293 F 6216 236.00 24.00 \n",
"19186003 F 33980 233.00 84.00 \n",
"3372900155 F 10344 219.00 18.00 \n",
"18825339 F 16980 212.00 48.00 \n",
"1132012321 F 14612 181.00 36.00 \n",
"158072303 F 4572 175.00 20.00 \n",
"36607254 F 6397 174.00 21.00 \n",
"96405362 F 3396 162.00 6.00 \n",
"16812908 F 5327 149.00 5.00 \n",
"47758416 F 6850 137.00 1.00 \n",
"45399148 F 61242 134.00 23.00 \n",
"16434028 F 2209 132.00 4.00 \n",
"14870670 F 27714 116.00 36.00 \n",
"18501487 F 11059 107.00 30.00 \n",
"313545488 F 4468 104.00 12.00 \n",
"116341480 F 96935 99.00 31.00 \n",
"82151660 F 8108 96.00 44.00 \n",
"70511174 F 45221 95.00 47.00 \n",
"38855868 F 7299 93.00 22.00 \n",
"273700859 F 2483 91.00 6.00 \n",
"3273220608 F 1841 85.00 14.00 "
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"female_journalists_reply_summary_df = journalists_reply_summary_df[journalists_reply_summary_df.gender == 'F']\n",
"female_journalists_reply_summary_df.to_csv('output/female_journalists_replied_to_by_journalists.csv')\n",
"female_journalists_reply_summary_df[journalist_reply_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### On average, how many times do journalists reply to each female journalist?"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" reply_to_count\n",
"count 993.00\n",
"mean 10.28\n",
"std 31.00\n",
"min 0.00\n",
"25% 0.00\n",
"50% 1.00\n",
"75% 6.00\n",
"max 340.00"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"female_journalists_reply_summary_df[['reply_to_count']].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Journalists replying to male journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of journalists replying to male journalists, which male journalists are replied to the most?"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name organization \\\n",
"user_id \n",
"3817401 ericgeller Geller, Eric Politico \n",
"22891564 chrisgeidner Geidner, Chris BuzzFeed \n",
"118130765 dylanlscott Scott, Dylan L. Stat News \n",
"19576571 JaredRizzi Rizzi, Jared Sirius XM Satellite Radio \n",
"46557945 StevenTDennis Dennis, Steven T. Bloomberg News \n",
"275207082 AlexParkerDC Parker, Alexander M. Bloomberg BNA \n",
"19847765 sahilkapur Kapur, Sahil Bloomberg News \n",
"583821006 jseldin Seldin, Jeff Voice of America \n",
"398088661 MEPFuller Fuller, Matt E. Huffington Post \n",
"44951698 amaxsmith Smith, Max WTOP Radio \n",
"225265639 ddale8 Dale, Daniel Toronto Star \n",
"317980134 CraigCaplan Caplan, Craig C–SPAN \n",
"15365623 benjamin_oc O’Connell, Benjamin C–SPAN \n",
"227790723 RichardRubinDC Rubin, Richard Bloomberg News \n",
"103016675 AaronMehta Mehta, Aaron Sightline Media Group \n",
"21810329 sdonnan Donnan, Shawn Financial Times \n",
"90478926 MikeSacksEsq Sacks, Mike Scripps Howard News Service \n",
"16459325 ryanbeckwith Beckwith, Ryan Teague Time Magazine \n",
"21252618 JakeSherman Sherman, Jacob S. Politico \n",
"11771512 OKnox Knox, Olivier Yahoo News \n",
"21696279 brianbeutler Beutler, Brian Alfred New Republic \n",
"190360266 connorobrienNH O’Brien, Connor Politico \n",
"63717541 phillyrich1 Weinstein, Richard C–SPAN \n",
"407013776 burgessev Everett, John B. Politico \n",
"80111587 JeffYoung Young, Jeffrey Huffington Post \n",
"\n",
" gender followers_count reply_to_count replying_count \n",
"user_id \n",
"3817401 M 58173 1,980.00 75.00 \n",
"22891564 M 83316 1,901.00 37.00 \n",
"118130765 M 20122 1,091.00 65.00 \n",
"19576571 M 13545 750.00 46.00 \n",
"46557945 M 55762 745.00 93.00 \n",
"275207082 M 3828 720.00 23.00 \n",
"19847765 M 69086 662.00 35.00 \n",
"583821006 M 5365 653.00 2.00 \n",
"398088661 M 77919 522.00 92.00 \n",
"44951698 M 4726 498.00 6.00 \n",
"225265639 M 180671 495.00 20.00 \n",
"317980134 M 6143 388.00 8.00 \n",
"15365623 M 1455 322.00 11.00 \n",
"227790723 M 13015 305.00 41.00 \n",
"103016675 M 11124 304.00 35.00 \n",
"21810329 M 12311 304.00 7.00 \n",
"90478926 M 9289 299.00 18.00 \n",
"16459325 M 20947 297.00 49.00 \n",
"21252618 M 81762 283.00 72.00 \n",
"11771512 M 44715 269.00 45.00 \n",
"21696279 M 74435 269.00 34.00 \n",
"190360266 M 6158 241.00 35.00 \n",
"63717541 M 3827 241.00 4.00 \n",
"407013776 M 31010 238.00 79.00 \n",
"80111587 M 26497 238.00 31.00 "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"male_journalists_reply_summary_df = journalists_reply_summary_df[journalists_reply_summary_df.gender == 'M']\n",
"male_journalists_reply_summary_df.to_csv('output/male_journalists_replied_to_by_journalists.csv')\n",
"male_journalists_reply_summary_df[journalist_reply_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### On average, how often do journalists reply to each male journalist?"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" reply_to_count\n",
"count 1,299.00\n",
"mean 25.54\n",
"std 104.71\n",
"min 0.00\n",
"25% 0.00\n",
"50% 1.00\n",
"75% 11.00\n",
"max 1,980.00"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"male_journalists_reply_summary_df[['reply_to_count']].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Female journalists replying to journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of female journalists replying to journalists, who do they reply to the most?"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name organization \\\n",
"user_id \n",
"906734342 KimberlyRobinsn Robinson, Kimberly S. Bloomberg BNA \n",
"52392666 ZoeTillman Tillman, Zoe BuzzFeed \n",
"16061946 kelmej Mejdrich, Kellie CQ Roll Call \n",
"83462293 SarahMMimms Mimms, Sarah BuzzFeed \n",
"21212087 Olivianuzzi Nuzzi, Olivia New York \n",
"3372900155 samtayrey Reyes, Samantha CNN \n",
"96405362 laurenonthehill Camera, Lauren S. U.S. News & World Report \n",
"18825339 CahnEmily Cahn, Emily Mic \n",
"1132012321 DaniellaMicaela Diaz, Daniella CNN \n",
"16812908 crousselle Rousselle, Christine Townhall \n",
"47758416 marissaaevans Evans, Marissa Texas Tribune \n",
"36607254 Oriana0214 Pawlyk, Oriana Military.com \n",
"16434028 gabbilevy Levy, Gabrielle F. U.S. News & World Report \n",
"19186003 seungminkim Kim, Seung Min Politico \n",
"45399148 jeneps Epstein, Jennifer Bloomberg News \n",
"158072303 ValerieInsinna Insinna, Valerie Defense News \n",
"313545488 LauraLitvan Litvan, Laura Bloomberg News \n",
"18501487 leighmunsil Munsil, Leigh CNN \n",
"273700859 kpolantz Polantz, Katelyn J. National Law Journal \n",
"114670081 rebleber Leber, Rebecca J. Mother Jones \n",
"407013776 burgessev Everett, John B. Politico \n",
"118130765 dylanlscott Scott, Dylan L. Stat News \n",
"116341480 RosieGray Gray, Rosie The Atlantic \n",
"103016675 AaronMehta Mehta, Aaron Sightline Media Group \n",
"48038024 karentravers Travers, Karen ABC News \n",
"\n",
" gender followers_count reply_to_count replying_count \n",
"user_id \n",
"906734342 F 7170 313.00 2.00 \n",
"52392666 F 15246 305.00 3.00 \n",
"16061946 F 4146 295.00 15.00 \n",
"83462293 F 6216 195.00 7.00 \n",
"21212087 F 136276 190.00 9.00 \n",
"3372900155 F 10344 179.00 7.00 \n",
"96405362 F 3396 159.00 5.00 \n",
"18825339 F 16980 148.00 18.00 \n",
"1132012321 F 14612 144.00 22.00 \n",
"16812908 F 5327 144.00 3.00 \n",
"47758416 F 6850 137.00 1.00 \n",
"36607254 F 6397 133.00 5.00 \n",
"16434028 F 2209 130.00 2.00 \n",
"19186003 F 33980 108.00 36.00 \n",
"45399148 F 61242 103.00 7.00 \n",
"158072303 F 4572 97.00 8.00 \n",
"313545488 F 4468 97.00 5.00 \n",
"18501487 F 11059 88.00 13.00 \n",
"273700859 F 2483 84.00 2.00 \n",
"114670081 F 16467 79.00 3.00 \n",
"407013776 M 31010 78.00 30.00 \n",
"118130765 M 20122 78.00 20.00 \n",
"116341480 F 96935 73.00 13.00 \n",
"103016675 M 11124 72.00 10.00 \n",
"48038024 F 17155 71.00 7.00 "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalists_replied_to_by_female_summary_df = journalist_reply_summary(journalists_reply_df[journalists_reply_df.gender == 'F'])\n",
"journalists_replied_to_by_female_summary_df.to_csv('output/journalists_replied_to_by_female_journalists.csv')\n",
"journalists_replied_to_by_female_summary_df[journalist_reply_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of female journalists replying to journalists, how many of their replies go to males / females?"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" count percentage avg_replies\n",
"index \n",
"F 7412 72.1% 7.46\n",
"M 2864 27.9% 2.20"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalist_reply_gender_summary(journalists_reply_df[journalists_reply_df.gender == 'F'])\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Male journalists replying to journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of male journalists replying to journalists, who do they reply to the most?"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name organization \\\n",
"user_id \n",
"3817401 ericgeller Geller, Eric Politico \n",
"22891564 chrisgeidner Geidner, Chris BuzzFeed \n",
"118130765 dylanlscott Scott, Dylan L. Stat News \n",
"19576571 JaredRizzi Rizzi, Jared Sirius XM Satellite Radio \n",
"275207082 AlexParkerDC Parker, Alexander M. Bloomberg BNA \n",
"46557945 StevenTDennis Dennis, Steven T. Bloomberg News \n",
"583821006 jseldin Seldin, Jeff Voice of America \n",
"19847765 sahilkapur Kapur, Sahil Bloomberg News \n",
"44951698 amaxsmith Smith, Max WTOP Radio \n",
"225265639 ddale8 Dale, Daniel Toronto Star \n",
"398088661 MEPFuller Fuller, Matt E. Huffington Post \n",
"317980134 CraigCaplan Caplan, Craig C–SPAN \n",
"15365623 benjamin_oc O’Connell, Benjamin C–SPAN \n",
"21810329 sdonnan Donnan, Shawn Financial Times \n",
"90478926 MikeSacksEsq Sacks, Mike Scripps Howard News Service \n",
"227790723 RichardRubinDC Rubin, Richard Bloomberg News \n",
"21696279 brianbeutler Beutler, Brian Alfred New Republic \n",
"21252618 JakeSherman Sherman, Jacob S. Politico \n",
"16459325 ryanbeckwith Beckwith, Ryan Teague Time Magazine \n",
"11771512 OKnox Knox, Olivier Yahoo News \n",
"63717541 phillyrich1 Weinstein, Richard C–SPAN \n",
"103016675 AaronMehta Mehta, Aaron Sightline Media Group \n",
"26559241 fordm Ford, Matt S. The Atlantic \n",
"437019753 TimothyNoah1 Noah, Timothy R. Politico \n",
"23332846 mattzap Zapotosky, Matt Washington Post \n",
"\n",
" gender followers_count reply_to_count replying_count \n",
"user_id \n",
"3817401 M 58173 1,926.00 58.00 \n",
"22891564 M 83316 1,864.00 28.00 \n",
"118130765 M 20122 1,013.00 45.00 \n",
"19576571 M 13545 726.00 35.00 \n",
"275207082 M 3828 709.00 20.00 \n",
"46557945 M 55762 686.00 61.00 \n",
"583821006 M 5365 653.00 2.00 \n",
"19847765 M 69086 646.00 24.00 \n",
"44951698 M 4726 495.00 4.00 \n",
"225265639 M 180671 490.00 16.00 \n",
"398088661 M 77919 456.00 64.00 \n",
"317980134 M 6143 388.00 8.00 \n",
"15365623 M 1455 318.00 8.00 \n",
"21810329 M 12311 303.00 6.00 \n",
"90478926 M 9289 294.00 13.00 \n",
"227790723 M 13015 284.00 33.00 \n",
"21696279 M 74435 262.00 29.00 \n",
"21252618 M 81762 249.00 52.00 \n",
"16459325 M 20947 241.00 30.00 \n",
"11771512 M 44715 240.00 35.00 \n",
"63717541 M 3827 240.00 3.00 \n",
"103016675 M 11124 232.00 25.00 \n",
"26559241 M 27571 232.00 15.00 \n",
"437019753 M 15090 231.00 12.00 \n",
"23332846 M 56887 230.00 7.00 "
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalists_replied_to_by_male_summary_df = journalist_reply_summary(journalists_reply_df[journalists_reply_df.gender == 'M'])\n",
"journalists_replied_to_by_male_summary_df.to_csv('output/journalists_replied_to_by_male_journalists.csv')\n",
"journalists_replied_to_by_male_summary_df[journalist_reply_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of male journalists replying to journalists, how many of their replies go to males / females?"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" count percentage avg_replies\n",
"index \n",
"M 30314 91.5% 23.34\n",
"F 2800 8.5% 2.82"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalist_reply_gender_summary(journalists_reply_df[journalists_reply_df.gender == 'M'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Following data prep"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load following\n",
"Users that are followed by beltway journalists"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"follower_user_id 3417018\n",
"followed_user_id 3417018\n",
"dtype: int64"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"base_follower_to_followed_df = pd.read_csv('source_data/follower_to_followed.csv',\n",
"                                           names=['follower_user_id', 'followed_user_id'],\n",
"                                           dtype={'follower_user_id': str, 'followed_user_id': str})\n",
"base_follower_to_followed_df.drop_duplicates(inplace=True)\n",
"base_follower_to_followed_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" follower_user_id followed_user_id\n",
"0 91156486 3092427779\n",
"1 91156486 36953109\n",
"2 91156486 424274008\n",
"3 91156486 779044378929168384\n",
"4 91156486 339834914"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"base_follower_to_followed_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" name organization \\\n",
"user_id \n",
"20711445 Glinski, Nina NaN \n",
"258917371 Enders, David NaN \n",
"297046834 Barakat, Matthew Associated Press \n",
"455585786 Atkins, Kimberly Boston Herald \n",
"42584840 Vlahou, Toula CQ Roll Call \n",
"\n",
" position gender followers_count \\\n",
"user_id \n",
"20711445 Freelance Reporter F 963 \n",
"258917371 Journalist M 1444 \n",
"297046834 Northern Virginia Correspondent M 759 \n",
"455585786 Chief Washington Reporter/Columnist F 2944 \n",
"42584840 Editor & Podcast Producer F 2703 \n",
"\n",
" following_count tweet_count user_created_at \\\n",
"user_id \n",
"20711445 507 909 Thu Feb 12 20:00:53 +0000 2009 \n",
"258917371 484 6296 Mon Feb 28 19:52:03 +0000 2011 \n",
"297046834 352 631 Wed May 11 20:55:24 +0000 2011 \n",
"455585786 2691 6277 Thu Jan 05 08:26:46 +0000 2012 \n",
"42584840 201 6366 Tue May 26 07:41:38 +0000 2009 \n",
"\n",
" verified protected \n",
"user_id \n",
"20711445 False False \n",
"258917371 True False \n",
"297046834 True False \n",
"455585786 True False \n",
"42584840 False False "
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"user_info_df.head()"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"follower_user_id 3311406\n",
"followed_user_id 3311406\n",
"gender 3311406\n",
"dtype: int64"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Inner join on follower_user_id: drops follow relationships whose follower is a journalist with no tweets\n",
"follower_to_followed_df = base_follower_to_followed_df.join(user_summary_df['gender'], on='follower_user_id', how='inner')\n",
"follower_to_followed_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" follower_user_id followed_user_id gender\n",
"261 15219888 3291076716 F\n",
"262 15219888 119175339 F\n",
"263 15219888 418837047 F\n",
"264 15219888 259817885 F\n",
"265 15219888 287263845 F"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"follower_to_followed_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load followed users"
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name\n",
"user_id \n",
"17665874 onlinehigh\n",
"2389275799 HLSPOLICY\n",
"314728983 Veolia_NA\n",
"239409802 fishingbuk\n",
"522799320 GoldsmithBev"
]
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"followed_screen_name_lookup_df = pd.read_csv('source_data/followed.csv',\n",
"                                             names=['screen_name', 'user_id'],\n",
"                                             dtype={'user_id': str}).set_index(['user_id'])\n",
"followed_screen_name_lookup_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Limit to beltway journalists"
]
},
{
"cell_type": "code",
"execution_count": 40,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"follower_user_id 280340\n",
"followed_user_id 280340\n",
"gender 280340\n",
"followed_gender 280340\n",
"dtype: int64"
]
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"follower_to_journalist_followed_df = follower_to_followed_df.join(user_summary_df['gender'], how='inner', on='followed_user_id', rsuffix='_followed')\n",
"follower_to_journalist_followed_df.rename(columns = {'gender_followed': 'followed_gender'}, inplace=True)\n",
"follower_to_journalist_followed_df.count()"
]
},
{
"cell_type": "code",
"execution_count": 41,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" follower_user_id followed_user_id gender followed_gender\n",
"287 15219888 46582653 F M\n",
"21810 15780280 46582653 M M\n",
"24153 14245722 46582653 M M\n",
"40694 37865281 46582653 F M\n",
"66585 165204211 46582653 M M"
]
},
"execution_count": 41,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"follower_to_journalist_followed_df.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Functions for summarizing following by beltway journalists"
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"# Gender of beltway journalists followed by beltway journalists\n",
"def journalist_followed_gender_summary(follower_to_followed_df):\n",
"    followed_gender = follower_to_followed_df.followed_gender\n",
"    gender_summary_df = pd.DataFrame({\n",
"        'count': followed_gender.value_counts(),\n",
"        'percentage': followed_gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
"    # Average follows received per journalist of each gender; the division aligns on the gender index\n",
"    gender_summary_df['avg_followed'] = gender_summary_df['count'] / journalist_gender_summary_df['count']\n",
"    gender_summary_df.index.name = 'index'\n",
"    return gender_summary_df\n",
"\n",
"def journalist_following_summary(follower_to_followed_df):\n",
"    # Number of beltway journalists following each account\n",
"    following_count_df = pd.DataFrame(follower_to_followed_df.followed_user_id.value_counts().rename('journalist_follower_count'))\n",
"\n",
"    # Join with user summary; journalists followed by no one get a count of 0\n",
"    journalist_following_summary_df = user_summary_df.join(following_count_df)\n",
"    journalist_following_summary_df.fillna(0, inplace=True)\n",
"    journalist_following_summary_df = journalist_following_summary_df.sort_values(['journalist_follower_count', 'followers_count'], ascending=False)\n",
"    return journalist_following_summary_df\n",
"\n",
"# Gender of top journalists followed by beltway journalists\n",
"def top_journalist_followed_gender_summary(followed_summary_df, head=100):\n",
"    top_followed_summary_df = followed_summary_df.head(head)\n",
"    return pd.DataFrame({\n",
"        'count': top_followed_summary_df.gender.value_counts(),\n",
"        'percentage': top_followed_summary_df.gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
"\n",
"# Fields for displaying journalist following summaries\n",
"journalist_following_summary_fields = ['screen_name', 'name', 'organization', 'gender', 'followers_count', 'journalist_follower_count']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Following analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Journalists following all accounts"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of journalists following all accounts, who do they follow the most?"
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"text/plain": [
" following_count screen_name\n",
"813286 1671 BarackObama\n",
"51241574 1629 AP\n",
"25073877 1613 realDonaldTrump\n",
"807095 1581 nytimes\n",
"2467791 1532 washingtonpost\n",
"1339835893 1531 HillaryClinton\n",
"818927131883356161 1522 PressSec\n",
"822215673812119553 1507 WhiteHouse\n",
"822215679726100480 1488 POTUS\n",
"9300262 1457 politico\n",
"30313925 1402 ObamaWhiteHouse\n",
"14246001 1384 mikeallen\n",
"93069110 1368 maggieNYT\n",
"14529929 1337 jaketapper\n",
"428333 1289 cnnbrk\n",
"3108351 1279 WSJ\n",
"1536791610 1279 POTUS44\n",
"50325797 1258 chucktodd\n",
"113420831 1258 PressSec44\n",
"16017475 1234 NateSilver538\n",
"18622869 1231 ezraklein\n",
"86129724 1173 costareports\n",
"1652541 1144 Reuters\n",
"1330457336 1128 billclinton\n",
"5392522 1124 NPR"
]
},
"execution_count": 43,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Following count\n",
"all_followed_df = pd.DataFrame(follower_to_followed_df.followed_user_id.value_counts().rename('following_count')).join(followed_screen_name_lookup_df)\n",
"all_followed_df.to_csv('output/all_followed_by_journalists.csv')\n",
"all_followed_df.head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Journalists following journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of all journalists followed by journalists, who is followed the most?"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name organization gender \\\n",
"user_id \n",
"14529929 jaketapper Tapper, Jake CNN M \n",
"50325797 chucktodd Todd, Chuck NBC News M \n",
"19107878 GlennThrush Thrush, Glenn H. New York Times M \n",
"31127446 markknoller Knoller, Mark CBS News M \n",
"13524182 daveweigel Weigel, David Washington Post M \n",
"61734492 Fahrenthold Fahrenthold, David Washington Post M \n",
"18678924 jmartNYT Martin, Jonathan New York Times M \n",
"39155029 mkraju Raju, Manu K. CNN M \n",
"16930125 edatpost O’Keefe, Edward Washington Post M \n",
"85131054 jeffzeleny Zeleny, Jeff CNN M \n",
"21316253 ZekeJMiller Miller, Zeke J. Time Magazine M \n",
"89820928 mitchellreports Mitchell, Andrea NBC News F \n",
"59676104 danbalz Balz, Daniel Washington Post M \n",
"108617810 DanaBashCNN Bash, Dana CNN F \n",
"15463671 samstein Stein, Sam Huffington Post M \n",
"130945778 mollyesque Ball, Molly The Atlantic F \n",
"46176168 MajorCBS Garrett, Major CBS News M \n",
"21252618 JakeSherman Sherman, Jacob S. Politico M \n",
"16187637 ChadPergram Pergram, Chad Fox News M \n",
"22771961 Acosta Acosta, Jim CNN M \n",
"12354832 kasie Hunt, Kasie NBC News F \n",
"123327472 peterbakernyt Baker, Peter New York Times M \n",
"15931637 jonkarl Karl, Jonathan ABC News M \n",
"11771512 OKnox Knox, Olivier Yahoo News M \n",
"259395895 JohnJHarwood Harwood, John CNBC M \n",
"\n",
" followers_count journalist_follower_count \n",
"user_id \n",
"14529929 1305680 1,337.00 \n",
"50325797 1781247 1,258.00 \n",
"19107878 308181 1,116.00 \n",
"31127446 301474 1,107.00 \n",
"13524182 332344 1,106.00 \n",
"61734492 451778 1,082.00 \n",
"18678924 197322 1,032.00 \n",
"39155029 88366 977.00 \n",
"16930125 58670 973.00 \n",
"85131054 244114 970.00 \n",
"21316253 198517 915.00 \n",
"89820928 1388543 909.00 \n",
"59676104 90819 892.00 \n",
"108617810 281861 884.00 \n",
"15463671 313211 880.00 \n",
"130945778 116857 877.00 \n",
"46176168 178640 872.00 \n",
"21252618 81762 868.00 \n",
"16187637 59305 866.00 \n",
"22771961 350650 860.00 \n",
"12354832 187357 860.00 \n",
"123327472 96956 856.00 \n",
"15931637 183467 830.00 \n",
"11771512 44715 788.00 \n",
"259395895 149040 783.00 "
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"follower_to_journalist_followed_summary_df = journalist_following_summary(follower_to_journalist_followed_df)\n",
"follower_to_journalist_followed_summary_df.to_csv('output/journalists_followed_by_journalists.csv')\n",
"follower_to_journalist_followed_summary_df[journalist_following_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of journalists following journalists, how many of the followed journalists are male / female?"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" count percentage avg_followed\n",
"index \n",
"M 174283 62.2% 134.17\n",
"F 106057 37.8% 106.80"
]
},
"execution_count": 45,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalist_followed_gender_summary(follower_to_journalist_followed_df)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### On average, how many journalists follow each journalist?"
]
},
{
"cell_type": "code",
"execution_count": 46,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" journalist_follower_count\n",
"count 2,292.00\n",
"mean 122.31\n",
"std 161.53\n",
"min 0.00\n",
"25% 26.00\n",
"50% 64.00\n",
"75% 145.00\n",
"max 1,337.00"
]
},
"execution_count": 46,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"follower_to_journalist_followed_summary_df[['journalist_follower_count']].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Journalists following female journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of journalists following female journalists, which female journalists do they follow the most?"
]
},
{
"cell_type": "code",
"execution_count": 47,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name \\\n",
"user_id \n",
"89820928 mitchellreports Mitchell, Andrea \n",
"108617810 DanaBashCNN Bash, Dana \n",
"130945778 mollyesque Ball, Molly \n",
"12354832 kasie Hunt, Kasie \n",
"33919343 AshleyRParker Parker, Ashley \n",
"28181835 jpaceDC Pace, Julie \n",
"70511174 Hadas_Gold Gold, Hadas \n",
"21307076 SusanPage Page, Susan \n",
"19186003 seungminkim Kim, Seung Min \n",
"45399148 jeneps Epstein, Jennifer \n",
"224320485 KellyO O’Donnell, Kelly \n",
"20776497 BFischerMartin Fischer Martin, Betsy \n",
"77032777 apalmerdc Palmer, Anna A. \n",
"116341480 RosieGray Gray, Rosie \n",
"237477771 juliehdavis Davis, Julie \n",
"58869089 margarettalev Talev, Margaret \n",
"14870670 KateNocera Nocera, Kate \n",
"46817943 brikeilarcnn Keilar, Brianna \n",
"22772264 carolelee Lee, Carol \n",
"15159913 JFKucinich Kucinich, Jacqueline \n",
"297532865 kwelkernbc Welker, Kristen \n",
"15727317 aterkel Terkel, Amanda \n",
"17881467 rebeccagberg Berg, Rebecca \n",
"151444950 DaviSusan Davis, Susan \n",
"27055034 SabrinaSiddiqui Siddiqui, Sabrina \n",
"\n",
" organization gender followers_count \\\n",
"user_id \n",
"89820928 NBC News F 1388543 \n",
"108617810 CNN F 281861 \n",
"130945778 The Atlantic F 116857 \n",
"12354832 NBC News F 187357 \n",
"33919343 Washington Post F 122382 \n",
"28181835 Associated Press F 46017 \n",
"70511174 Politico F 45221 \n",
"21307076 USA Today F 48675 \n",
"19186003 Politico F 33980 \n",
"45399148 Bloomberg News F 61242 \n",
"224320485 NBC News F 148476 \n",
"20776497 Bloomberg News F 50890 \n",
"77032777 Politico F 30523 \n",
"116341480 The Atlantic F 96935 \n",
"237477771 New York Times F 49821 \n",
"58869089 Bloomberg News F 19588 \n",
"14870670 BuzzFeed F 27714 \n",
"46817943 CNN F 105276 \n",
"22772264 Wall Street Journal / Dow Jones F 31840 \n",
"15159913 Daily Beast F 31210 \n",
"297532865 NBC News F 99234 \n",
"15727317 Huffington Post F 78736 \n",
"17881467 RealClearPolitics F 48798 \n",
"151444950 National Public Radio F 27297 \n",
"27055034 Guardian US F 53835 \n",
"\n",
" journalist_follower_count \n",
"user_id \n",
"89820928 909.00 \n",
"108617810 884.00 \n",
"130945778 877.00 \n",
"12354832 860.00 \n",
"33919343 777.00 \n",
"28181835 738.00 \n",
"70511174 679.00 \n",
"21307076 670.00 \n",
"19186003 664.00 \n",
"45399148 631.00 \n",
"224320485 630.00 \n",
"20776497 609.00 \n",
"77032777 591.00 \n",
"116341480 589.00 \n",
"237477771 570.00 \n",
"58869089 569.00 \n",
"14870670 567.00 \n",
"46817943 557.00 \n",
"22772264 552.00 \n",
"15159913 549.00 \n",
"297532865 537.00 \n",
"15727317 527.00 \n",
"17881467 516.00 \n",
"151444950 506.00 \n",
"27055034 474.00 "
]
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"follower_to_female_journalist_followed_df = follower_to_journalist_followed_summary_df[follower_to_journalist_followed_summary_df.gender == 'F']\n",
"follower_to_female_journalist_followed_df.to_csv('output/female_journalists_followed_by_journalists.csv')\n",
"follower_to_female_journalist_followed_df[journalist_following_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### On average, how many journalists follow each female journalist?"
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" journalist_follower_count\n",
"count 993.00\n",
"mean 106.80\n",
"std 131.81\n",
"min 0.00\n",
"25% 24.00\n",
"50% 59.00\n",
"75% 131.00\n",
"max 909.00"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"follower_to_female_journalist_followed_df[['journalist_follower_count']].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Journalists following male journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of journalists following male journalists, which male journalists do they follow the most?"
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name organization gender \\\n",
"user_id \n",
"14529929 jaketapper Tapper, Jake CNN M \n",
"50325797 chucktodd Todd, Chuck NBC News M \n",
"19107878 GlennThrush Thrush, Glenn H. New York Times M \n",
"31127446 markknoller Knoller, Mark CBS News M \n",
"13524182 daveweigel Weigel, David Washington Post M \n",
"61734492 Fahrenthold Fahrenthold, David Washington Post M \n",
"18678924 jmartNYT Martin, Jonathan New York Times M \n",
"39155029 mkraju Raju, Manu K. CNN M \n",
"16930125 edatpost O’Keefe, Edward Washington Post M \n",
"85131054 jeffzeleny Zeleny, Jeff CNN M \n",
"21316253 ZekeJMiller Miller, Zeke J. Time Magazine M \n",
"59676104 danbalz Balz, Daniel Washington Post M \n",
"15463671 samstein Stein, Sam Huffington Post M \n",
"46176168 MajorCBS Garrett, Major CBS News M \n",
"21252618 JakeSherman Sherman, Jacob S. Politico M \n",
"16187637 ChadPergram Pergram, Chad Fox News M \n",
"22771961 Acosta Acosta, Jim CNN M \n",
"123327472 peterbakernyt Baker, Peter New York Times M \n",
"15931637 jonkarl Karl, Jonathan ABC News M \n",
"11771512 OKnox Knox, Olivier Yahoo News M \n",
"259395895 JohnJHarwood Harwood, John CNBC M \n",
"46557945 StevenTDennis Dennis, Steven T. Bloomberg News M \n",
"18172905 rickklein Klein, Richard ABC News M \n",
"21768766 jonathanweisman Weisman, Jonathan New York Times M \n",
"997684836 pkcapitol Kane, Paul Washington Post M \n",
"\n",
" followers_count journalist_follower_count \n",
"user_id \n",
"14529929 1305680 1,337.00 \n",
"50325797 1781247 1,258.00 \n",
"19107878 308181 1,116.00 \n",
"31127446 301474 1,107.00 \n",
"13524182 332344 1,106.00 \n",
"61734492 451778 1,082.00 \n",
"18678924 197322 1,032.00 \n",
"39155029 88366 977.00 \n",
"16930125 58670 973.00 \n",
"85131054 244114 970.00 \n",
"21316253 198517 915.00 \n",
"59676104 90819 892.00 \n",
"15463671 313211 880.00 \n",
"46176168 178640 872.00 \n",
"21252618 81762 868.00 \n",
"16187637 59305 866.00 \n",
"22771961 350650 860.00 \n",
"123327472 96956 856.00 \n",
"15931637 183467 830.00 \n",
"11771512 44715 788.00 \n",
"259395895 149040 783.00 \n",
"46557945 55762 781.00 \n",
"18172905 109170 737.00 \n",
"21768766 57549 728.00 \n",
"997684836 31300 728.00 "
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"follower_to_male_journalist_followed_df = follower_to_journalist_followed_summary_df[follower_to_journalist_followed_summary_df.gender == 'M']\n",
"follower_to_male_journalist_followed_df.to_csv('output/male_journalists_followed_by_journalists.csv')\n",
"follower_to_male_journalist_followed_df[journalist_following_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### On average, how many journalists follow each male journalist?"
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" journalist_follower_count\n",
"count 1,299.00\n",
"mean 134.17\n",
"std 180.14\n",
"min 0.00\n",
"25% 28.00\n",
"50% 67.00\n",
"75% 156.00\n",
"max 1,337.00"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"follower_to_male_journalist_followed_df[['journalist_follower_count']].describe()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Female journalists following journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of female journalists following journalists, who do they follow the most?"
]
},
{
"cell_type": "code",
"execution_count": 51,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name organization gender \\\n",
"user_id \n",
"14529929 jaketapper Tapper, Jake CNN M \n",
"50325797 chucktodd Todd, Chuck NBC News M \n",
"31127446 markknoller Knoller, Mark CBS News M \n",
"19107878 GlennThrush Thrush, Glenn H. New York Times M \n",
"13524182 daveweigel Weigel, David Washington Post M \n",
"61734492 Fahrenthold Fahrenthold, David Washington Post M \n",
"18678924 jmartNYT Martin, Jonathan New York Times M \n",
"16930125 edatpost O’Keefe, Edward Washington Post M \n",
"89820928 mitchellreports Mitchell, Andrea NBC News F \n",
"85131054 jeffzeleny Zeleny, Jeff CNN M \n",
"39155029 mkraju Raju, Manu K. CNN M \n",
"108617810 DanaBashCNN Bash, Dana CNN F \n",
"21316253 ZekeJMiller Miller, Zeke J. Time Magazine M \n",
"22771961 Acosta Acosta, Jim CNN M \n",
"15463671 samstein Stein, Sam Huffington Post M \n",
"16187637 ChadPergram Pergram, Chad Fox News M \n",
"21252618 JakeSherman Sherman, Jacob S. Politico M \n",
"46176168 MajorCBS Garrett, Major CBS News M \n",
"15931637 jonkarl Karl, Jonathan ABC News M \n",
"130945778 mollyesque Ball, Molly The Atlantic F \n",
"59676104 danbalz Balz, Daniel Washington Post M \n",
"123327472 peterbakernyt Baker, Peter New York Times M \n",
"12354832 kasie Hunt, Kasie NBC News F \n",
"11771512 OKnox Knox, Olivier Yahoo News M \n",
"33919343 AshleyRParker Parker, Ashley Washington Post F \n",
"\n",
" followers_count journalist_follower_count \n",
"user_id \n",
"14529929 1305680 619.00 \n",
"50325797 1781247 569.00 \n",
"31127446 301474 505.00 \n",
"19107878 308181 490.00 \n",
"13524182 332344 484.00 \n",
"61734492 451778 474.00 \n",
"18678924 197322 445.00 \n",
"16930125 58670 444.00 \n",
"89820928 1388543 441.00 \n",
"85131054 244114 435.00 \n",
"39155029 88366 434.00 \n",
"108617810 281861 430.00 \n",
"21316253 198517 420.00 \n",
"22771961 350650 402.00 \n",
"15463671 313211 398.00 \n",
"16187637 59305 397.00 \n",
"21252618 81762 394.00 \n",
"46176168 178640 390.00 \n",
"15931637 183467 389.00 \n",
"130945778 116857 386.00 \n",
"59676104 90819 382.00 \n",
"123327472 96956 379.00 \n",
"12354832 187357 366.00 \n",
"11771512 44715 354.00 \n",
"33919343 122382 339.00 "
]
},
"execution_count": 51,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Summarize the journalists followed by female journalists, export the summary to CSV, and show the first 25 rows\n",
"female_follower_to_journalist_followed_df = journalist_following_summary(follower_to_journalist_followed_df[follower_to_journalist_followed_df.gender == 'F'])\n",
"female_follower_to_journalist_followed_df.to_csv('output/journalists_followed_by_female_journalists.csv')\n",
"female_follower_to_journalist_followed_df[journalist_following_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of female journalists following journalists, how many of the followed journalists are male / female?"
]
},
{
"cell_type": "code",
"execution_count": 52,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" count percentage avg_followed\n",
"index \n",
"M 73950 62.0% 56.93\n",
"F 45300 38.0% 45.62"
]
},
"execution_count": 52,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalist_followed_gender_summary(follower_to_journalist_followed_df[follower_to_journalist_followed_df.gender == 'F'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Male journalists following journalists"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of male journalists following journalists, who do they follow the most?"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" screen_name name organization gender \\\n",
"user_id \n",
"14529929 jaketapper Tapper, Jake CNN M \n",
"50325797 chucktodd Todd, Chuck NBC News M \n",
"19107878 GlennThrush Thrush, Glenn H. New York Times M \n",
"13524182 daveweigel Weigel, David Washington Post M \n",
"61734492 Fahrenthold Fahrenthold, David Washington Post M \n",
"31127446 markknoller Knoller, Mark CBS News M \n",
"18678924 jmartNYT Martin, Jonathan New York Times M \n",
"39155029 mkraju Raju, Manu K. CNN M \n",
"85131054 jeffzeleny Zeleny, Jeff CNN M \n",
"16930125 edatpost O’Keefe, Edward Washington Post M \n",
"59676104 danbalz Balz, Daniel Washington Post M \n",
"21316253 ZekeJMiller Miller, Zeke J. Time Magazine M \n",
"12354832 kasie Hunt, Kasie NBC News F \n",
"130945778 mollyesque Ball, Molly The Atlantic F \n",
"15463671 samstein Stein, Sam Huffington Post M \n",
"46176168 MajorCBS Garrett, Major CBS News M \n",
"123327472 peterbakernyt Baker, Peter New York Times M \n",
"21252618 JakeSherman Sherman, Jacob S. Politico M \n",
"16187637 ChadPergram Pergram, Chad Fox News M \n",
"89820928 mitchellreports Mitchell, Andrea NBC News F \n",
"259395895 JohnJHarwood Harwood, John CNBC M \n",
"22771961 Acosta Acosta, Jim CNN M \n",
"108617810 DanaBashCNN Bash, Dana CNN F \n",
"46557945 StevenTDennis Dennis, Steven T. Bloomberg News M \n",
"15931637 jonkarl Karl, Jonathan ABC News M \n",
"\n",
" followers_count journalist_follower_count \n",
"user_id \n",
"14529929 1305680 718.00 \n",
"50325797 1781247 689.00 \n",
"19107878 308181 626.00 \n",
"13524182 332344 622.00 \n",
"61734492 451778 608.00 \n",
"31127446 301474 602.00 \n",
"18678924 197322 587.00 \n",
"39155029 88366 543.00 \n",
"85131054 244114 535.00 \n",
"16930125 58670 529.00 \n",
"59676104 90819 510.00 \n",
"21316253 198517 495.00 \n",
"12354832 187357 494.00 \n",
"130945778 116857 491.00 \n",
"15463671 313211 482.00 \n",
"46176168 178640 482.00 \n",
"123327472 96956 477.00 \n",
"21252618 81762 474.00 \n",
"16187637 59305 469.00 \n",
"89820928 1388543 468.00 \n",
"259395895 149040 464.00 \n",
"22771961 350650 458.00 \n",
"108617810 281861 454.00 \n",
"46557945 55762 446.00 \n",
"15931637 183467 441.00 "
]
},
"execution_count": 53,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Summarize the journalists followed by male journalists, export the summary to CSV, and show the first 25 rows\n",
"male_follower_to_journalist_followed_df = journalist_following_summary(follower_to_journalist_followed_df[follower_to_journalist_followed_df.gender == 'M'])\n",
"male_follower_to_journalist_followed_df.to_csv('output/journalists_followed_by_male_journalists.csv')\n",
"male_follower_to_journalist_followed_df[journalist_following_summary_fields].head(25)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Of male journalists following journalists, how many of the followed journalists are male / female?"
]
},
{
"cell_type": "code",
"execution_count": 54,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"text/plain": [
" count percentage avg_followed\n",
"index \n",
"M 100333 62.3% 77.24\n",
"F 60757 37.7% 61.19"
]
},
"execution_count": 54,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"journalist_followed_gender_summary(follower_to_journalist_followed_df[follower_to_journalist_followed_df.gender == 'M'])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
},
"toc": {
"nav_menu": {
"height": "512px",
"width": "252px"
},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"toc_cell": true,
"toc_position": {
"height": "586px",
"left": "0px",
"right": "1088px",
"top": "112px",
"width": "343px"
},
"toc_section_display": "block",
"toc_window_display": true
}
},
"nbformat": 4,
"nbformat_minor": 2
}