{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Gender dynamics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tweet data prep" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the tweets" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:Loading from tweets/642bf140607547cb9d4c6b1fc49772aa_001.json.gz\n", "DEBUG:root:Loaded 50000\n", "DEBUG:root:Loaded 100000\n", "DEBUG:root:Loaded 150000\n", "DEBUG:root:Loaded 200000\n", "DEBUG:root:Loaded 250000\n", "INFO:root:Loading from tweets/9f7ed17c16a1494c8690b4053609539d_001.json.gz\n", "DEBUG:root:Loaded 300000\n", "DEBUG:root:Loaded 350000\n", "DEBUG:root:Loaded 400000\n", "DEBUG:root:Loaded 450000\n", "DEBUG:root:Loaded 500000\n", "INFO:root:Loading from tweets/41feff28312c433ab004cd822212f4c2_001.json.gz\n", "DEBUG:root:Loaded 550000\n", "DEBUG:root:Loaded 600000\n", "DEBUG:root:Loaded 650000\n", "DEBUG:root:Loaded 700000\n", "DEBUG:root:Loaded 750000\n", "DEBUG:root:Loaded 800000\n" ] }, { "data": { "text/plain": [ "tweet_id 817136\n", "user_id 817136\n", "screen_name 817136\n", "tweet_created_at 817136\n", "tweet_type 817136\n", "dtype: int64" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%matplotlib inline\n", "import pandas as pd\n", "import numpy as np\n", "import logging\n", "from dateutil.parser import parse as date_parse\n", "from utils import load_tweet_df, tweet_type\n", "import matplotlib.pyplot as plt\n", "\n", "\n", "logger = logging.getLogger()\n", "logger.setLevel(logging.DEBUG)\n", "\n", "# Set float format so doesn't display scientific notation\n", "pd.options.display.float_format = '{:20,.2f}'.format\n", "\n", "def tweet_transform(tweet):\n", " return {\n", " 'tweet_id': tweet['id_str'], \n", " 'tweet_created_at': date_parse(tweet['created_at']),\n", " 'user_id': tweet['user']['id_str'],\n", " 
'screen_name': tweet['user']['screen_name'],\n", " 'tweet_type': tweet_type(tweet)\n", " }\n", "\n", "tweet_df = load_tweet_df(tweet_transform, ['tweet_id', 'user_id', 'screen_name', 'tweet_created_at', 'tweet_type'], dedupe_columns=['tweet_id'])\n", "tweet_df.count()" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id user_id screen_name tweet_created_at \\\n", "0 872631046088601600 327862439 jonathanvswan 2017-06-08 01:47:08+00:00 \n", "1 872610483647516673 327862439 jonathanvswan 2017-06-08 00:25:26+00:00 \n", "2 872609618626826240 327862439 jonathanvswan 2017-06-08 00:22:00+00:00 \n", "3 872605974699311104 327862439 jonathanvswan 2017-06-08 00:07:31+00:00 \n", "4 872603191518646276 327862439 jonathanvswan 2017-06-07 23:56:27+00:00 \n", "\n", " tweet_type \n", "0 retweet \n", "1 retweet \n", "2 retweet \n", "3 retweet \n", "4 retweet " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tweet_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tweeter data prep" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prepare the tweeter data\n", "The tweeter data comes from the following sources:\n", "1. User lookup: Lists of users exported from SFM. This is the final set of beltway journalists. Accounts that were suspended or deleted have been removed from this list. The list includes users that did not tweet (i.e., have no tweets in the dataset).\n", "2. Tweets in the dataset: Used to generate tweet counts per tweeter. Since some beltway journalists may not have tweeted, the tweeters in the dataset may be a subset of the user lookup. The dataset may also include tweets from users that were later excluded because their accounts were suspended or deleted, or because they were determined not to be beltway journalists.\n", "3. User info lookup: Information on users that was manually coded in the beltway journalist spreadsheet or looked up from Twitter's API. This includes some accounts that were excluded from data collection for various reasons, such as working for a foreign news organization or no longer working as a beltway journalist. Thus, this is a superset of the user lookup.\n", "\n", "Consequently, the tweeter data should include tweet and user info data only for users in the user lookup." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load user lookup" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "screen_name 2487\n", "dtype: int64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_lookup_filepaths = ('lookups/senate_press_lookup.csv',\n", " 'lookups/periodical_press_lookup.csv',\n", " 'lookups/radio_and_television_lookup.csv')\n", "user_lookup_df = pd.concat((pd.read_csv(user_lookup_filepath, usecols=['Uid', 'Token'], dtype={'Uid': str}) for user_lookup_filepath in user_lookup_filepaths))\n", "user_lookup_df.set_index('Uid', inplace=True)\n", "user_lookup_df.rename(columns={'Token': 'screen_name'}, inplace=True)\n", "user_lookup_df.index.names = ['user_id']\n", "# Some users may be in multiple lists, so need to drop duplicates\n", "user_lookup_df = user_lookup_df[~user_lookup_df.index.duplicated()]\n", "\n", "user_lookup_df.count()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " screen_name\n", "user_id \n", "23455653 abettel\n", "33919343 AshleyRParker\n", "18580432 b_fung\n", "399225358 b_muzz\n", "18834692 becca_milfeld" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_lookup_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load user info" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "name 2506\n", "organization 2477\n", "position 2503\n", "gender 2505\n", "followers_count 2506\n", "following_count 2506\n", "tweet_count 2506\n", "user_created_at 2506\n", "verified 2506\n", "protected 2506\n", "dtype: int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_info_df = pd.read_csv('source_data/user_info_lookup.csv', names=['user_id', 'name', 'organization', 'position',\n", " 'gender', 'followers_count', 'following_count', 'tweet_count',\n", " 'user_created_at', 'verified', 'protected'],\n", " dtype={'user_id': str}).set_index(['user_id'])\n", "user_info_df.count()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name organization \\\n", "user_id \n", "20711445 Glinski, Nina NaN \n", "258917371 Enders, David NaN \n", "297046834 Barakat, Matthew Associated Press \n", "455585786 Atkins, Kimberly Boston Herald \n", "42584840 Vlahou, Toula CQ Roll Call \n", "\n", " position gender followers_count \\\n", "user_id \n", "20711445 Freelance Reporter F 963 \n", "258917371 Journalist M 1444 \n", "297046834 Northern Virginia Correspondent M 759 \n", "455585786 Chief Washington Reporter/Columnist F 2944 \n", "42584840 Editor & Podcast Producer F 2703 \n", "\n", " following_count tweet_count user_created_at \\\n", "user_id \n", "20711445 507 909 Thu Feb 12 20:00:53 +0000 2009 \n", "258917371 484 6296 Mon Feb 28 19:52:03 +0000 2011 \n", "297046834 352 631 Wed May 11 20:55:24 +0000 2011 \n", "455585786 2691 6277 Thu Jan 05 08:26:46 +0000 2012 \n", "42584840 201 6366 Tue May 26 07:41:38 +0000 2009 \n", "\n", " verified protected \n", "user_id \n", "20711445 False False \n", "258917371 True False \n", "297046834 True False \n", "455585786 True False \n", "42584840 False False " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_info_df.head()" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "user_tweet_count_df = tweet_df[['user_id', 'tweet_type']].groupby(['user_id', 'tweet_type']).size().unstack()\n", "user_tweet_count_df.fillna(0, inplace=True)\n", "user_tweet_count_df['tweets_in_dataset'] = user_tweet_count_df.original + user_tweet_count_df.quote + user_tweet_count_df.reply + user_tweet_count_df.retweet" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "screen_name 2487\n", "name 2487\n", "organization 2487\n", "position 2484\n", "gender 2486\n", "followers_count 2487\n", "following_count 2487\n", "tweet_count 2487\n", "user_created_at 2487\n", "verified 2487\n", "protected 2487\n", "original 2487\n", "quote 2487\n", 
"reply 2487\n", "retweet 2487\n", "tweets_in_dataset 2487\n", "dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_summary_df = user_lookup_df.join((user_info_df, user_tweet_count_df), how='left')\n", "# Fill Nans\n", "user_summary_df['organization'].fillna('', inplace=True)\n", "user_summary_df['original'].fillna(0, inplace=True)\n", "user_summary_df['quote'].fillna(0, inplace=True)\n", "user_summary_df['reply'].fillna(0, inplace=True)\n", "user_summary_df['retweet'].fillna(0, inplace=True)\n", "user_summary_df['tweets_in_dataset'].fillna(0, inplace=True)\n", "user_summary_df.count()" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " screen_name name organization \\\n", "user_id \n", "23455653 abettel Bettelheim, Adriel Politico \n", "33919343 AshleyRParker Parker, Ashley Washington Post \n", "18580432 b_fung Fung, Brian Washington Post \n", "399225358 b_muzz Murray, Brendan Bloomberg News \n", "18834692 becca_milfeld Milfeld, Becca Agence France-Presse \n", "\n", " position gender followers_count \\\n", "user_id \n", "23455653 Health Care Editor F 2664 \n", "33919343 White House Reporter F 122382 \n", "18580432 Tech Reporter M 16558 \n", "399225358 Managing Editor, U.S. Economy M 624 \n", "18834692 English Desk Editor and Journalist F 483 \n", "\n", " following_count tweet_count user_created_at \\\n", "user_id \n", "23455653 1055 15990 Mon Mar 09 16:32:20 +0000 2009 \n", "33919343 2342 12433 Tue Apr 21 14:28:57 +0000 2009 \n", "18580432 2062 44799 Sat Jan 03 15:15:57 +0000 2009 \n", "399225358 382 360 Thu Oct 27 05:34:05 +0000 2011 \n", "18834692 993 1484 Sat Jan 10 13:58:43 +0000 2009 \n", "\n", " verified protected original quote \\\n", "user_id \n", "23455653 True False 289.00 12.00 \n", "33919343 True False 172.00 67.00 \n", "18580432 True False 257.00 85.00 \n", "399225358 True False 3.00 0.00 \n", "18834692 False False 3.00 14.00 \n", "\n", " reply retweet tweets_in_dataset \n", "user_id \n", "23455653 6.00 52.00 359.00 \n", "33919343 11.00 120.00 370.00 \n", "18580432 205.00 82.00 629.00 \n", "399225358 0.00 5.00 8.00 \n", "18834692 0.00 7.00 24.00 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_summary_df.head()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "### Remove users with no tweets in dataset" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "screen_name 195\n", "name 195\n", "organization 195\n", "position 195\n", "gender 194\n", "followers_count 195\n", "following_count 195\n", "tweet_count 195\n", 
"user_created_at 195\n", "verified 195\n", "protected 195\n", "original 195\n", "quote 195\n", "reply 195\n", "retweet 195\n", "tweets_in_dataset 195\n", "dtype: int64" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_summary_df[user_summary_df.tweets_in_dataset == 0].count()" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "screen_name 2292\n", "name 2292\n", "organization 2292\n", "position 2289\n", "gender 2292\n", "followers_count 2292\n", "following_count 2292\n", "tweet_count 2292\n", "user_created_at 2292\n", "verified 2292\n", "protected 2292\n", "original 2292\n", "quote 2292\n", "reply 2292\n", "retweet 2292\n", "tweets_in_dataset 2292\n", "dtype: int64" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_summary_df = user_summary_df[user_summary_df.tweets_in_dataset != 0]\n", "user_summary_df.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Gender" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count percentage\n", "M 1299 56.7%\n", "F 993 43.3%" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalist_gender_summary_df = pd.DataFrame({'count':user_summary_df.gender.value_counts(), 'percentage':user_summary_df.gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n", "journalist_gender_summary_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reply data prep" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load replies from tweets" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:root:Loading from tweets/642bf140607547cb9d4c6b1fc49772aa_001.json.gz\n", "DEBUG:root:Loaded 50000\n", "DEBUG:root:Loaded 100000\n", "DEBUG:root:Loaded 150000\n", "DEBUG:root:Loaded 200000\n", "DEBUG:root:Loaded 250000\n", "INFO:root:Loading from tweets/9f7ed17c16a1494c8690b4053609539d_001.json.gz\n", "DEBUG:root:Loaded 300000\n", "DEBUG:root:Loaded 350000\n", "DEBUG:root:Loaded 400000\n", "DEBUG:root:Loaded 450000\n", "DEBUG:root:Loaded 500000\n", "INFO:root:Loading from tweets/41feff28312c433ab004cd822212f4c2_001.json.gz\n", "DEBUG:root:Loaded 550000\n", "DEBUG:root:Loaded 600000\n", "DEBUG:root:Loaded 650000\n", "DEBUG:root:Loaded 700000\n", "DEBUG:root:Loaded 750000\n", "DEBUG:root:Loaded 800000\n" ] }, { "data": { "text/plain": [ "tweet_id 126254\n", "user_id 126254\n", "screen_name 126254\n", "reply_to_user_id 126254\n", "reply_to_screen_name 126254\n", "tweet_created_at 126254\n", "dtype: int64" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Simplify the tweet on load\n", "def reply_transform(tweet):\n", " if tweet_type(tweet) == 'reply': \n", " return {\n", " 'tweet_id': tweet['id_str'],\n", " 'user_id': tweet['user']['id_str'],\n", " 'screen_name': tweet['user']['screen_name'],\n", " 'reply_to_user_id': 
tweet['in_reply_to_user_id_str'],\n", " 'reply_to_screen_name': tweet['in_reply_to_screen_name'],\n", " 'tweet_created_at': date_parse(tweet['created_at']) \n", " }\n", " return None\n", "\n", "base_reply_df = load_tweet_df(reply_transform, ['tweet_id', 'user_id', 'screen_name', 'reply_to_user_id',\n", " 'reply_to_screen_name', 'tweet_created_at'],\n", " dedupe_columns=['tweet_id'])\n", "\n", "base_reply_df.count()" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id user_id screen_name reply_to_user_id \\\n", "0 872495244062978048 327862439 jonathanvswan 59331128 \n", "1 872473152160399361 327862439 jonathanvswan 2856617865 \n", "2 872266930341728256 327862439 jonathanvswan 1854392378 \n", "3 872250430109175809 327862439 jonathanvswan 390985197 \n", "4 872218322187767808 327862439 jonathanvswan 407013776 \n", "\n", " reply_to_screen_name tweet_created_at \n", "0 PhilipRucker 2017-06-07 16:47:31+00:00 \n", "1 RPhuket 2017-06-07 15:19:43+00:00 \n", "2 hrm_1973 2017-06-07 01:40:16+00:00 \n", "3 MikeBastasch 2017-06-07 00:34:42+00:00 \n", "4 burgessev 2017-06-06 22:27:07+00:00 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "base_reply_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Add gender of replier" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tweet_id 126254\n", "user_id 126254\n", "screen_name 126254\n", "reply_to_user_id 126254\n", "reply_to_screen_name 126254\n", "tweet_created_at 126254\n", "gender 126254\n", "dtype: int64" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reply_df = base_reply_df.join(user_summary_df['gender'], on='user_id')\n", "reply_df.count()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How many users have been replied to by journalists?" 
] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "31034" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "reply_df['reply_to_user_id'].unique().size" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Limit to beltway journalists" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "tweet_id 43390\n", "user_id 43390\n", "screen_name 43390\n", "reply_to_user_id 43390\n", "reply_to_screen_name 43390\n", "tweet_created_at 43390\n", "gender 43390\n", "reply_to_gender 43390\n", "dtype: int64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalists_reply_df = reply_df.join(user_summary_df['gender'], how='inner', on='reply_to_user_id', rsuffix='_reply')\n", "journalists_reply_df.rename(columns = {'gender_reply': 'reply_to_gender'}, inplace=True)\n", "journalists_reply_df.count()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " tweet_id user_id screen_name reply_to_user_id \\\n", "4 872218322187767808 327862439 jonathanvswan 407013776 \n", "234 871795694020984833 195840597 JNicholsonInDC 407013776 \n", "572 870371176866041856 163589845 PoliticoKevin 407013776 \n", "728 870659438901940224 115564212 IsaacDovere 407013776 \n", "731 872473152143667201 167024520 rachaelmbade 407013776 \n", "\n", " reply_to_screen_name tweet_created_at gender reply_to_gender \n", "4 burgessev 2017-06-06 22:27:07+00:00 M M \n", "234 burgessev 2017-06-05 18:27:45+00:00 M M \n", "572 burgessev 2017-06-01 20:07:13+00:00 M M \n", "728 burgessev 2017-06-02 15:12:40+00:00 M M \n", "731 burgessev 2017-06-07 15:19:43+00:00 F M " ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalists_reply_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Functions for summarizing replies by beltway journalists" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "# Gender of beltway journalists replied to by beltway journalists\n", "def journalist_reply_gender_summary(reply_df):\n", " gender_summary_df = pd.DataFrame({'count':reply_df.reply_to_gender.value_counts(), \n", " 'percentage': reply_df.reply_to_gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n", " gender_summary_df.reset_index(inplace=True)\n", " gender_summary_df['avg_replies'] = gender_summary_df.apply(lambda row: row['count'] / journalist_gender_summary_df.loc[row['index']]['count'], axis=1) \n", " gender_summary_df.set_index('index', inplace=True, drop=True)\n", " return gender_summary_df\n", "\n", "# Reply to beltway journalists by beltway journalists\n", "def journalist_reply_summary(reply_df):\n", " # Reply to count\n", " reply_count_df = pd.DataFrame(reply_df.reply_to_user_id.value_counts().rename('reply_to_count'))\n", " \n", " # Replying to users. 
That is, the number of unique users replying to each user.\n", " reply_to_user_id_per_user_df = reply_df[['reply_to_user_id', 'user_id']].drop_duplicates()\n", " replying_to_user_count_df = pd.DataFrame(reply_to_user_id_per_user_df.groupby('reply_to_user_id').size(), columns=['replying_count'])\n", " replying_to_user_count_df.index.name = 'user_id'\n", " \n", " # Join with user summary\n", " journalist_reply_summary_df = user_summary_df.join([reply_count_df, replying_to_user_count_df])\n", " journalist_reply_summary_df.fillna(0, inplace=True)\n", " journalist_reply_summary_df = journalist_reply_summary_df.sort_values(['reply_to_count', 'replying_count', 'followers_count'], ascending=False)\n", " return journalist_reply_summary_df\n", "\n", "# Gender of top journalists replied to by beltway journalists\n", "def top_journalist_reply_gender_summary(reply_summary_df, replying_count_threshold=0, head=100):\n", " top_reply_summary_df = reply_summary_df[reply_summary_df.replying_count > replying_count_threshold].head(head)\n", " return pd.DataFrame({'count': top_reply_summary_df.gender.value_counts(), \n", " 'percentage': top_reply_summary_df.gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n", "\n", "# Fields for displaying journalist reply summaries\n", "journalist_reply_summary_fields = ['screen_name', 'name', 'organization', 'gender', 'followers_count', 'reply_to_count', 'replying_count']\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reply analysis\n", "*Note that for each of these, the complete list is written to CSV in the output directory.*\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Of replies by journalists, how many are by males / females?" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " reply percentage avg_replies\n", "gender \n", "F 31,831.00 25.2% 32.06\n", "M 94,423.00 74.8% 72.69" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "replies_by_gender_df = user_summary_df[['gender', 'reply']].groupby('gender').sum()\n", "replies_by_gender_df['percentage'] = replies_by_gender_df.reply.div(replies_by_gender_df.reply.sum()).mul(100).round(1).astype(str) + '%'\n", "replies_by_gender_df.reset_index(inplace=True)\n", "replies_by_gender_df['avg_replies'] = replies_by_gender_df.apply(lambda row: row['reply'] / journalist_gender_summary_df.loc[row['gender']]['count'], axis=1) \n", "replies_by_gender_df.set_index('gender', inplace=True, drop=True)\n", "# return gender_summary_df\n", "replies_by_gender_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Which journalists reply the most?" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
screen_name name organization gender followers_count tweet_count reply tweets_in_dataset
user_id
3817401ericgellerGeller, EricPoliticoM581732087639,033.0011,432.00
22891564chrisgeidnerGeidner, ChrisBuzzFeedM833162055043,917.006,244.00
118130765dylanlscottScott, Dylan L.Stat NewsM20122424972,040.003,960.00
19576571JaredRizziRizzi, JaredSirius XM Satellite RadioM13545416201,949.005,567.00
275207082AlexParkerDCParker, Alexander M.Bloomberg BNAM38281421501,714.003,983.00
63717541phillyrich1Weinstein, RichardC–SPANM3827273411,532.002,261.00
203226736SharylAttkissonAttkisson, SharylSinclair Broadcast GroupF132973245391,458.002,154.00
16812908crousselleRousselle, ChristineTownhallF53271187131,089.002,351.00
14529929jaketapperTapper, JakeCNNM13056801481431,040.005,078.00
46557945StevenTDennisDennis, Steven T.Bloomberg NewsM55762675261,026.003,066.00
27882000jamiedupreeDupree, JamieCox BroadcastingM14084846181993.002,108.00
3372900155samtayreyReyes, SamanthaCNNF103444783933.001,349.00
132482136Yaro_RTYaroshevsky, AlexeyRTTV AmericaM1296826795910.001,199.00
46955476GrahamDavidAGraham, David A.The AtlanticM2211293391908.001,566.00
16459325ryanbeckwithBeckwith, Ryan TeagueTime MagazineM2094792203901.005,187.00
25702314EricMGarciaGarcia, Eric M.CQ Roll CallM309444783863.003,584.00
12245632jackshaferShafer, JackPoliticoM7399644726861.002,016.00
273540698MKTWgoldsteinGoldstein, StevenMarketWatchM1018541497857.001,897.00
19847765sahilkapurKapur, SahilBloomberg NewsM6908651628853.002,022.00
6904552juliemasonMason, JulieSirius XM Satellite RadioF3127629214852.001,213.00
225265639ddale8Dale, DanielToronto StarM18067169807848.002,496.00
15837659jbenderyBendery, JenniferHuffington PostM4100065406844.002,600.00
15146659JSwiftTWSSwift, James A.Weekly StandardM569184245830.002,612.00
227790723RichardRubinDCRubin, RichardBloomberg NewsM1301517796807.001,312.00
14517538derekwillisWillis, DerekProPublicaM1804979502781.001,811.00
\n", "
" ], "text/plain": [ " screen_name name organization \\\n", "user_id \n", "3817401 ericgeller Geller, Eric Politico \n", "22891564 chrisgeidner Geidner, Chris BuzzFeed \n", "118130765 dylanlscott Scott, Dylan L. Stat News \n", "19576571 JaredRizzi Rizzi, Jared Sirius XM Satellite Radio \n", "275207082 AlexParkerDC Parker, Alexander M. Bloomberg BNA \n", "63717541 phillyrich1 Weinstein, Richard C–SPAN \n", "203226736 SharylAttkisson Attkisson, Sharyl Sinclair Broadcast Group \n", "16812908 crousselle Rousselle, Christine Townhall \n", "14529929 jaketapper Tapper, Jake CNN \n", "46557945 StevenTDennis Dennis, Steven T. Bloomberg News \n", "27882000 jamiedupree Dupree, Jamie Cox Broadcasting \n", "3372900155 samtayrey Reyes, Samantha CNN \n", "132482136 Yaro_RT Yaroshevsky, Alexey RTTV America \n", "46955476 GrahamDavidA Graham, David A. The Atlantic \n", "16459325 ryanbeckwith Beckwith, Ryan Teague Time Magazine \n", "25702314 EricMGarcia Garcia, Eric M. CQ Roll Call \n", "12245632 jackshafer Shafer, Jack Politico \n", "273540698 MKTWgoldstein Goldstein, Steven MarketWatch \n", "19847765 sahilkapur Kapur, Sahil Bloomberg News \n", "6904552 juliemason Mason, Julie Sirius XM Satellite Radio \n", "225265639 ddale8 Dale, Daniel Toronto Star \n", "15837659 jbendery Bendery, Jennifer Huffington Post \n", "15146659 JSwiftTWS Swift, James A. 
Weekly Standard \n", "227790723 RichardRubinDC Rubin, Richard Bloomberg News \n", "14517538 derekwillis Willis, Derek ProPublica \n", "\n", " gender followers_count tweet_count reply \\\n", "user_id \n", "3817401 M 58173 208763 9,033.00 \n", "22891564 M 83316 205504 3,917.00 \n", "118130765 M 20122 42497 2,040.00 \n", "19576571 M 13545 41620 1,949.00 \n", "275207082 M 3828 142150 1,714.00 \n", "63717541 M 3827 27341 1,532.00 \n", "203226736 F 132973 24539 1,458.00 \n", "16812908 F 5327 118713 1,089.00 \n", "14529929 M 1305680 148143 1,040.00 \n", "46557945 M 55762 67526 1,026.00 \n", "27882000 M 140848 46181 993.00 \n", "3372900155 F 10344 4783 933.00 \n", "132482136 M 12968 26795 910.00 \n", "46955476 M 22112 93391 908.00 \n", "16459325 M 20947 92203 901.00 \n", "25702314 M 3094 44783 863.00 \n", "12245632 M 73996 44726 861.00 \n", "273540698 M 10185 41497 857.00 \n", "19847765 M 69086 51628 853.00 \n", "6904552 F 31276 29214 852.00 \n", "225265639 M 180671 69807 848.00 \n", "15837659 M 41000 65406 844.00 \n", "15146659 M 5691 84245 830.00 \n", "227790723 M 13015 17796 807.00 \n", "14517538 M 18049 79502 781.00 \n", "\n", " tweets_in_dataset \n", "user_id \n", "3817401 11,432.00 \n", "22891564 6,244.00 \n", "118130765 3,960.00 \n", "19576571 5,567.00 \n", "275207082 3,983.00 \n", "63717541 2,261.00 \n", "203226736 2,154.00 \n", "16812908 2,351.00 \n", "14529929 5,078.00 \n", "46557945 3,066.00 \n", "27882000 2,108.00 \n", "3372900155 1,349.00 \n", "132482136 1,199.00 \n", "46955476 1,566.00 \n", "16459325 5,187.00 \n", "25702314 3,584.00 \n", "12245632 2,016.00 \n", "273540698 1,897.00 \n", "19847765 2,022.00 \n", "6904552 1,213.00 \n", "225265639 2,496.00 \n", "15837659 2,600.00 \n", "15146659 2,612.00 \n", "227790723 1,312.00 \n", "14517538 1,811.00 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_summary_df[['screen_name', 'name', 'organization', 'gender', 'followers_count', 'tweet_count', 'reply', 
'tweets_in_dataset']].sort_values(['reply'], ascending=False).head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Replies to all accounts (not just journalists)\n", "This is based on screen name, which could have changed during the collection period. However, for the users at the top of this list, that seems unlikely." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of journalists replying to other accounts, who do they reply to the most?" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
reply_to_countreplying_count
ericgeller198075
chrisgeidner190137
dylanlscott109165
JaredRizzi75046
StevenTDennis74593
AlexParkerDC72023
sahilkapur66235
jseldin6532
MEPFuller52292
amaxsmith4986
ddale849520
CraigCaplan3888
ChuckWendig3721
pbump35543
kelmej34029
benjamin_oc32211
KimberlyRobinsn3217
darth31532
ZoeTillman3118
RichardRubinDC30541
sdonnan3047
AaronMehta30435
MikeSacksEsq29918
heathdwilliams2981
ryanbeckwith29749
\n", "
" ], "text/plain": [ " reply_to_count replying_count\n", "ericgeller 1980 75\n", "chrisgeidner 1901 37\n", "dylanlscott 1091 65\n", "JaredRizzi 750 46\n", "StevenTDennis 745 93\n", "AlexParkerDC 720 23\n", "sahilkapur 662 35\n", "jseldin 653 2\n", "MEPFuller 522 92\n", "amaxsmith 498 6\n", "ddale8 495 20\n", "CraigCaplan 388 8\n", "ChuckWendig 372 1\n", "pbump 355 43\n", "kelmej 340 29\n", "benjamin_oc 322 11\n", "KimberlyRobinsn 321 7\n", "darth 315 32\n", "ZoeTillman 311 8\n", "RichardRubinDC 305 41\n", "sdonnan 304 7\n", "AaronMehta 304 35\n", "MikeSacksEsq 299 18\n", "heathdwilliams 298 1\n", "ryanbeckwith 297 49" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Reply to count\n", "reply_to_count_screen_name_df = pd.DataFrame(reply_df.reply_to_screen_name.value_counts().rename('reply_to_count'))\n", "\n", "# Count of replying users\n", "reply_to_user_id_per_user_screen_name_df = reply_df[['reply_to_screen_name', 'user_id']].drop_duplicates()\n", "replying_count_screen_name_df = pd.DataFrame(reply_to_user_id_per_user_screen_name_df.groupby('reply_to_screen_name').size(), columns=['replying_count'])\n", "replying_count_screen_name_df.index.name = 'screen_name'\n", "\n", "all_replied_to_df = reply_to_count_screen_name_df.join(replying_count_screen_name_df)\n", "all_replied_to_df.to_csv('output/all_replied_to_by_journalists.csv')\n", "all_replied_to_df.head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Journalists replying to other journalists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of journalists replying to other journalists, who do they reply to the most?" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countreply_to_countreplying_count
user_id
3817401ericgellerGeller, EricPoliticoM581731,980.0075.00
22891564chrisgeidnerGeidner, ChrisBuzzFeedM833161,901.0037.00
118130765dylanlscottScott, Dylan L.Stat NewsM201221,091.0065.00
19576571JaredRizziRizzi, JaredSirius XM Satellite RadioM13545750.0046.00
46557945StevenTDennisDennis, Steven T.Bloomberg NewsM55762745.0093.00
275207082AlexParkerDCParker, Alexander M.Bloomberg BNAM3828720.0023.00
19847765sahilkapurKapur, SahilBloomberg NewsM69086662.0035.00
583821006jseldinSeldin, JeffVoice of AmericaM5365653.002.00
398088661MEPFullerFuller, Matt E.Huffington PostM77919522.0092.00
44951698amaxsmithSmith, MaxWTOP RadioM4726498.006.00
225265639ddale8Dale, DanielToronto StarM180671495.0020.00
317980134CraigCaplanCaplan, CraigC–SPANM6143388.008.00
16061946kelmejMejdrich, KellieCQ Roll CallF4146340.0029.00
15365623benjamin_ocO’Connell, BenjaminC–SPANM1455322.0011.00
906734342KimberlyRobinsnRobinson, Kimberly S.Bloomberg BNAF7170321.007.00
52392666ZoeTillmanTillman, ZoeBuzzFeedF15246311.008.00
227790723RichardRubinDCRubin, RichardBloomberg NewsM13015305.0041.00
103016675AaronMehtaMehta, AaronSightline Media GroupM11124304.0035.00
21810329sdonnanDonnan, ShawnFinancial TimesM12311304.007.00
90478926MikeSacksEsqSacks, MikeScripps Howard News ServiceM9289299.0018.00
16459325ryanbeckwithBeckwith, Ryan TeagueTime MagazineM20947297.0049.00
21252618JakeShermanSherman, Jacob S.PoliticoM81762283.0072.00
11771512OKnoxKnox, OlivierYahoo NewsM44715269.0045.00
21696279brianbeutlerBeutler, Brian AlfredNew RepublicM74435269.0034.00
21212087OlivianuzziNuzzi, OliviaNew YorkF136276243.0025.00
\n", "
" ], "text/plain": [ " screen_name name \\\n", "user_id \n", "3817401 ericgeller Geller, Eric \n", "22891564 chrisgeidner Geidner, Chris \n", "118130765 dylanlscott Scott, Dylan L. \n", "19576571 JaredRizzi Rizzi, Jared \n", "46557945 StevenTDennis Dennis, Steven T. \n", "275207082 AlexParkerDC Parker, Alexander M. \n", "19847765 sahilkapur Kapur, Sahil \n", "583821006 jseldin Seldin, Jeff \n", "398088661 MEPFuller Fuller, Matt E. \n", "44951698 amaxsmith Smith, Max \n", "225265639 ddale8 Dale, Daniel \n", "317980134 CraigCaplan Caplan, Craig \n", "16061946 kelmej Mejdrich, Kellie \n", "15365623 benjamin_oc O’Connell, Benjamin \n", "906734342 KimberlyRobinsn Robinson, Kimberly S. \n", "52392666 ZoeTillman Tillman, Zoe \n", "227790723 RichardRubinDC Rubin, Richard \n", "103016675 AaronMehta Mehta, Aaron \n", "21810329 sdonnan Donnan, Shawn \n", "90478926 MikeSacksEsq Sacks, Mike \n", "16459325 ryanbeckwith Beckwith, Ryan Teague \n", "21252618 JakeSherman Sherman, Jacob S. \n", "11771512 OKnox Knox, Olivier \n", "21696279 brianbeutler Beutler, Brian Alfred \n", "21212087 Olivianuzzi Nuzzi, Olivia \n", "\n", " organization gender followers_count \\\n", "user_id \n", "3817401 Politico M 58173 \n", "22891564 BuzzFeed M 83316 \n", "118130765 Stat News M 20122 \n", "19576571 Sirius XM Satellite Radio M 13545 \n", "46557945 Bloomberg News M 55762 \n", "275207082 Bloomberg BNA M 3828 \n", "19847765 Bloomberg News M 69086 \n", "583821006 Voice of America M 5365 \n", "398088661 Huffington Post M 77919 \n", "44951698 WTOP Radio M 4726 \n", "225265639 Toronto Star M 180671 \n", "317980134 C–SPAN M 6143 \n", "16061946 CQ Roll Call F 4146 \n", "15365623 C–SPAN M 1455 \n", "906734342 Bloomberg BNA F 7170 \n", "52392666 BuzzFeed F 15246 \n", "227790723 Bloomberg News M 13015 \n", "103016675 Sightline Media Group M 11124 \n", "21810329 Financial Times M 12311 \n", "90478926 Scripps Howard News Service M 9289 \n", "16459325 Time Magazine M 20947 \n", "21252618 Politico M 81762 \n", 
"11771512 Yahoo News M 44715 \n", "21696279 New Republic M 74435 \n", "21212087 New York F 136276 \n", "\n", " reply_to_count replying_count \n", "user_id \n", "3817401 1,980.00 75.00 \n", "22891564 1,901.00 37.00 \n", "118130765 1,091.00 65.00 \n", "19576571 750.00 46.00 \n", "46557945 745.00 93.00 \n", "275207082 720.00 23.00 \n", "19847765 662.00 35.00 \n", "583821006 653.00 2.00 \n", "398088661 522.00 92.00 \n", "44951698 498.00 6.00 \n", "225265639 495.00 20.00 \n", "317980134 388.00 8.00 \n", "16061946 340.00 29.00 \n", "15365623 322.00 11.00 \n", "906734342 321.00 7.00 \n", "52392666 311.00 8.00 \n", "227790723 305.00 41.00 \n", "103016675 304.00 35.00 \n", "21810329 304.00 7.00 \n", "90478926 299.00 18.00 \n", "16459325 297.00 49.00 \n", "21252618 283.00 72.00 \n", "11771512 269.00 45.00 \n", "21696279 269.00 34.00 \n", "21212087 243.00 25.00 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalists_reply_summary_df = journalist_reply_summary(journalists_reply_df)\n", "journalists_reply_summary_df.to_csv('output/journalists_replied_to_by_journalists.csv')\n", "journalists_reply_summary_df[journalist_reply_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of journalists replying to other journalists, how many that they reply to are male / female?" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countpercentageavg_replies
index
M3317876.5%25.54
F1021223.5%10.28
\n", "
" ], "text/plain": [ " count percentage avg_replies\n", "index \n", "M 33178 76.5% 25.54\n", "F 10212 23.5% 10.28" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalist_reply_gender_summary(journalists_reply_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### On average, how many times do journalists reply to each journalists?" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
reply_to_count
count2,292.00
mean18.93
std81.76
min0.00
25%0.00
50%1.00
75%8.00
max1,980.00
\n", "
" ], "text/plain": [ " reply_to_count\n", "count 2,292.00\n", "mean 18.93\n", "std 81.76\n", "min 0.00\n", "25% 0.00\n", "50% 1.00\n", "75% 8.00\n", "max 1,980.00" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalists_reply_summary_df[['reply_to_count']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Journalists replying to female journalists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of journalists replying to female journalists, which female journalists are replied to the most?" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countreply_to_countreplying_count
user_id
16061946kelmejMejdrich, KellieCQ Roll CallF4146340.0029.00
906734342KimberlyRobinsnRobinson, Kimberly S.Bloomberg BNAF7170321.007.00
52392666ZoeTillmanTillman, ZoeBuzzFeedF15246311.008.00
21212087OlivianuzziNuzzi, OliviaNew YorkF136276243.0025.00
83462293SarahMMimmsMimms, SarahBuzzFeedF6216236.0024.00
19186003seungminkimKim, Seung MinPoliticoF33980233.0084.00
3372900155samtayreyReyes, SamanthaCNNF10344219.0018.00
18825339CahnEmilyCahn, EmilyMicF16980212.0048.00
1132012321DaniellaMicaelaDiaz, DaniellaCNNF14612181.0036.00
158072303ValerieInsinnaInsinna, ValerieDefense NewsF4572175.0020.00
36607254Oriana0214Pawlyk, OrianaMilitary.comF6397174.0021.00
96405362laurenonthehillCamera, Lauren S.U.S. News & World ReportF3396162.006.00
16812908crousselleRousselle, ChristineTownhallF5327149.005.00
47758416marissaaevansEvans, MarissaTexas TribuneF6850137.001.00
45399148jenepsEpstein, JenniferBloomberg NewsF61242134.0023.00
16434028gabbilevyLevy, Gabrielle F.U.S. News & World ReportF2209132.004.00
14870670KateNoceraNocera, KateBuzzFeedF27714116.0036.00
18501487leighmunsilMunsil, LeighCNNF11059107.0030.00
313545488LauraLitvanLitvan, LauraBloomberg NewsF4468104.0012.00
116341480RosieGrayGray, RosieThe AtlanticF9693599.0031.00
82151660kelsey_snellSnell, KelseWashington PostF810896.0044.00
70511174Hadas_GoldGold, HadasPoliticoF4522195.0047.00
38855868brennawilliamsWilliams, BrennaCNNF729993.0022.00
273700859kpolantzPolantz, Katelyn J.National Law JournalF248391.006.00
3273220608KatherineBScottScott, KatherineBloomberg GovernmentF184185.0014.00
\n", "
" ], "text/plain": [ " screen_name name organization \\\n", "user_id \n", "16061946 kelmej Mejdrich, Kellie CQ Roll Call \n", "906734342 KimberlyRobinsn Robinson, Kimberly S. Bloomberg BNA \n", "52392666 ZoeTillman Tillman, Zoe BuzzFeed \n", "21212087 Olivianuzzi Nuzzi, Olivia New York \n", "83462293 SarahMMimms Mimms, Sarah BuzzFeed \n", "19186003 seungminkim Kim, Seung Min Politico \n", "3372900155 samtayrey Reyes, Samantha CNN \n", "18825339 CahnEmily Cahn, Emily Mic \n", "1132012321 DaniellaMicaela Diaz, Daniella CNN \n", "158072303 ValerieInsinna Insinna, Valerie Defense News \n", "36607254 Oriana0214 Pawlyk, Oriana Military.com \n", "96405362 laurenonthehill Camera, Lauren S. U.S. News & World Report \n", "16812908 crousselle Rousselle, Christine Townhall \n", "47758416 marissaaevans Evans, Marissa Texas Tribune \n", "45399148 jeneps Epstein, Jennifer Bloomberg News \n", "16434028 gabbilevy Levy, Gabrielle F. U.S. News & World Report \n", "14870670 KateNocera Nocera, Kate BuzzFeed \n", "18501487 leighmunsil Munsil, Leigh CNN \n", "313545488 LauraLitvan Litvan, Laura Bloomberg News \n", "116341480 RosieGray Gray, Rosie The Atlantic \n", "82151660 kelsey_snell Snell, Kelse Washington Post \n", "70511174 Hadas_Gold Gold, Hadas Politico \n", "38855868 brennawilliams Williams, Brenna CNN \n", "273700859 kpolantz Polantz, Katelyn J. 
National Law Journal \n", "3273220608 KatherineBScott Scott, Katherine Bloomberg Government \n", "\n", " gender followers_count reply_to_count replying_count \n", "user_id \n", "16061946 F 4146 340.00 29.00 \n", "906734342 F 7170 321.00 7.00 \n", "52392666 F 15246 311.00 8.00 \n", "21212087 F 136276 243.00 25.00 \n", "83462293 F 6216 236.00 24.00 \n", "19186003 F 33980 233.00 84.00 \n", "3372900155 F 10344 219.00 18.00 \n", "18825339 F 16980 212.00 48.00 \n", "1132012321 F 14612 181.00 36.00 \n", "158072303 F 4572 175.00 20.00 \n", "36607254 F 6397 174.00 21.00 \n", "96405362 F 3396 162.00 6.00 \n", "16812908 F 5327 149.00 5.00 \n", "47758416 F 6850 137.00 1.00 \n", "45399148 F 61242 134.00 23.00 \n", "16434028 F 2209 132.00 4.00 \n", "14870670 F 27714 116.00 36.00 \n", "18501487 F 11059 107.00 30.00 \n", "313545488 F 4468 104.00 12.00 \n", "116341480 F 96935 99.00 31.00 \n", "82151660 F 8108 96.00 44.00 \n", "70511174 F 45221 95.00 47.00 \n", "38855868 F 7299 93.00 22.00 \n", "273700859 F 2483 91.00 6.00 \n", "3273220608 F 1841 85.00 14.00 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "female_journalists_reply_summary_df = journalists_reply_summary_df[journalists_reply_summary_df.gender == 'F']\n", "female_journalists_reply_summary_df.to_csv('output/female_journalists_replied_to_by_journalists.csv')\n", "female_journalists_reply_summary_df[journalist_reply_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### On average, how many times do journalists reply to each female journalist?" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
reply_to_count
count993.00
mean10.28
std31.00
min0.00
25%0.00
50%1.00
75%6.00
max340.00
\n", "
" ], "text/plain": [ " reply_to_count\n", "count 993.00\n", "mean 10.28\n", "std 31.00\n", "min 0.00\n", "25% 0.00\n", "50% 1.00\n", "75% 6.00\n", "max 340.00" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "female_journalists_reply_summary_df[['reply_to_count']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Journalists replying to male journalists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of journalists replying to male journalists, which male journalists are replied to the most?" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countreply_to_countreplying_count
user_id
3817401ericgellerGeller, EricPoliticoM581731,980.0075.00
22891564chrisgeidnerGeidner, ChrisBuzzFeedM833161,901.0037.00
118130765dylanlscottScott, Dylan L.Stat NewsM201221,091.0065.00
19576571JaredRizziRizzi, JaredSirius XM Satellite RadioM13545750.0046.00
46557945StevenTDennisDennis, Steven T.Bloomberg NewsM55762745.0093.00
275207082AlexParkerDCParker, Alexander M.Bloomberg BNAM3828720.0023.00
19847765sahilkapurKapur, SahilBloomberg NewsM69086662.0035.00
583821006jseldinSeldin, JeffVoice of AmericaM5365653.002.00
398088661MEPFullerFuller, Matt E.Huffington PostM77919522.0092.00
44951698amaxsmithSmith, MaxWTOP RadioM4726498.006.00
225265639ddale8Dale, DanielToronto StarM180671495.0020.00
317980134CraigCaplanCaplan, CraigC–SPANM6143388.008.00
15365623benjamin_ocO’Connell, BenjaminC–SPANM1455322.0011.00
227790723RichardRubinDCRubin, RichardBloomberg NewsM13015305.0041.00
103016675AaronMehtaMehta, AaronSightline Media GroupM11124304.0035.00
21810329sdonnanDonnan, ShawnFinancial TimesM12311304.007.00
90478926MikeSacksEsqSacks, MikeScripps Howard News ServiceM9289299.0018.00
16459325ryanbeckwithBeckwith, Ryan TeagueTime MagazineM20947297.0049.00
21252618JakeShermanSherman, Jacob S.PoliticoM81762283.0072.00
11771512OKnoxKnox, OlivierYahoo NewsM44715269.0045.00
21696279brianbeutlerBeutler, Brian AlfredNew RepublicM74435269.0034.00
190360266connorobrienNHO’Brien, ConnorPoliticoM6158241.0035.00
63717541phillyrich1Weinstein, RichardC–SPANM3827241.004.00
407013776burgessevEverett, John B.PoliticoM31010238.0079.00
80111587JeffYoungYoung, JeffreyHuffington PostM26497238.0031.00
\n", "
" ], "text/plain": [ " screen_name name organization \\\n", "user_id \n", "3817401 ericgeller Geller, Eric Politico \n", "22891564 chrisgeidner Geidner, Chris BuzzFeed \n", "118130765 dylanlscott Scott, Dylan L. Stat News \n", "19576571 JaredRizzi Rizzi, Jared Sirius XM Satellite Radio \n", "46557945 StevenTDennis Dennis, Steven T. Bloomberg News \n", "275207082 AlexParkerDC Parker, Alexander M. Bloomberg BNA \n", "19847765 sahilkapur Kapur, Sahil Bloomberg News \n", "583821006 jseldin Seldin, Jeff Voice of America \n", "398088661 MEPFuller Fuller, Matt E. Huffington Post \n", "44951698 amaxsmith Smith, Max WTOP Radio \n", "225265639 ddale8 Dale, Daniel Toronto Star \n", "317980134 CraigCaplan Caplan, Craig C–SPAN \n", "15365623 benjamin_oc O’Connell, Benjamin C–SPAN \n", "227790723 RichardRubinDC Rubin, Richard Bloomberg News \n", "103016675 AaronMehta Mehta, Aaron Sightline Media Group \n", "21810329 sdonnan Donnan, Shawn Financial Times \n", "90478926 MikeSacksEsq Sacks, Mike Scripps Howard News Service \n", "16459325 ryanbeckwith Beckwith, Ryan Teague Time Magazine \n", "21252618 JakeSherman Sherman, Jacob S. Politico \n", "11771512 OKnox Knox, Olivier Yahoo News \n", "21696279 brianbeutler Beutler, Brian Alfred New Republic \n", "190360266 connorobrienNH O’Brien, Connor Politico \n", "63717541 phillyrich1 Weinstein, Richard C–SPAN \n", "407013776 burgessev Everett, John B. 
Politico \n", "80111587 JeffYoung Young, Jeffrey Huffington Post \n", "\n", " gender followers_count reply_to_count replying_count \n", "user_id \n", "3817401 M 58173 1,980.00 75.00 \n", "22891564 M 83316 1,901.00 37.00 \n", "118130765 M 20122 1,091.00 65.00 \n", "19576571 M 13545 750.00 46.00 \n", "46557945 M 55762 745.00 93.00 \n", "275207082 M 3828 720.00 23.00 \n", "19847765 M 69086 662.00 35.00 \n", "583821006 M 5365 653.00 2.00 \n", "398088661 M 77919 522.00 92.00 \n", "44951698 M 4726 498.00 6.00 \n", "225265639 M 180671 495.00 20.00 \n", "317980134 M 6143 388.00 8.00 \n", "15365623 M 1455 322.00 11.00 \n", "227790723 M 13015 305.00 41.00 \n", "103016675 M 11124 304.00 35.00 \n", "21810329 M 12311 304.00 7.00 \n", "90478926 M 9289 299.00 18.00 \n", "16459325 M 20947 297.00 49.00 \n", "21252618 M 81762 283.00 72.00 \n", "11771512 M 44715 269.00 45.00 \n", "21696279 M 74435 269.00 34.00 \n", "190360266 M 6158 241.00 35.00 \n", "63717541 M 3827 241.00 4.00 \n", "407013776 M 31010 238.00 79.00 \n", "80111587 M 26497 238.00 31.00 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "male_journalists_reply_summary_df = journalists_reply_summary_df[journalists_reply_summary_df.gender == 'M']\n", "male_journalists_reply_summary_df.to_csv('output/male_journalists_replied_to_by_journalists.csv')\n", "male_journalists_reply_summary_df[journalist_reply_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### On average, how often do journalists reply to each male journalist?" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
reply_to_count
count1,299.00
mean25.54
std104.71
min0.00
25%0.00
50%1.00
75%11.00
max1,980.00
\n", "
" ], "text/plain": [ " reply_to_count\n", "count 1,299.00\n", "mean 25.54\n", "std 104.71\n", "min 0.00\n", "25% 0.00\n", "50% 1.00\n", "75% 11.00\n", "max 1,980.00" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "male_journalists_reply_summary_df[['reply_to_count']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Female journalists replying to journalists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of female journalists replying to journalists, who do they reply to the most?" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countreply_to_countreplying_count
user_id
906734342KimberlyRobinsnRobinson, Kimberly S.Bloomberg BNAF7170313.002.00
52392666ZoeTillmanTillman, ZoeBuzzFeedF15246305.003.00
16061946kelmejMejdrich, KellieCQ Roll CallF4146295.0015.00
83462293SarahMMimmsMimms, SarahBuzzFeedF6216195.007.00
21212087OlivianuzziNuzzi, OliviaNew YorkF136276190.009.00
3372900155samtayreyReyes, SamanthaCNNF10344179.007.00
96405362laurenonthehillCamera, Lauren S.U.S. News & World ReportF3396159.005.00
18825339CahnEmilyCahn, EmilyMicF16980148.0018.00
1132012321DaniellaMicaelaDiaz, DaniellaCNNF14612144.0022.00
16812908crousselleRousselle, ChristineTownhallF5327144.003.00
47758416marissaaevansEvans, MarissaTexas TribuneF6850137.001.00
36607254Oriana0214Pawlyk, OrianaMilitary.comF6397133.005.00
16434028gabbilevyLevy, Gabrielle F.U.S. News & World ReportF2209130.002.00
19186003seungminkimKim, Seung MinPoliticoF33980108.0036.00
45399148jenepsEpstein, JenniferBloomberg NewsF61242103.007.00
158072303ValerieInsinnaInsinna, ValerieDefense NewsF457297.008.00
313545488LauraLitvanLitvan, LauraBloomberg NewsF446897.005.00
18501487leighmunsilMunsil, LeighCNNF1105988.0013.00
273700859kpolantzPolantz, Katelyn J.National Law JournalF248384.002.00
114670081rebleberLeber, Rebecca J.Mother JonesF1646779.003.00
407013776burgessevEverett, John B.PoliticoM3101078.0030.00
118130765dylanlscottScott, Dylan L.Stat NewsM2012278.0020.00
116341480RosieGrayGray, RosieThe AtlanticF9693573.0013.00
103016675AaronMehtaMehta, AaronSightline Media GroupM1112472.0010.00
48038024karentraversTravers, KarenABC NewsF1715571.007.00
\n", "
" ], "text/plain": [ " screen_name name organization \\\n", "user_id \n", "906734342 KimberlyRobinsn Robinson, Kimberly S. Bloomberg BNA \n", "52392666 ZoeTillman Tillman, Zoe BuzzFeed \n", "16061946 kelmej Mejdrich, Kellie CQ Roll Call \n", "83462293 SarahMMimms Mimms, Sarah BuzzFeed \n", "21212087 Olivianuzzi Nuzzi, Olivia New York \n", "3372900155 samtayrey Reyes, Samantha CNN \n", "96405362 laurenonthehill Camera, Lauren S. U.S. News & World Report \n", "18825339 CahnEmily Cahn, Emily Mic \n", "1132012321 DaniellaMicaela Diaz, Daniella CNN \n", "16812908 crousselle Rousselle, Christine Townhall \n", "47758416 marissaaevans Evans, Marissa Texas Tribune \n", "36607254 Oriana0214 Pawlyk, Oriana Military.com \n", "16434028 gabbilevy Levy, Gabrielle F. U.S. News & World Report \n", "19186003 seungminkim Kim, Seung Min Politico \n", "45399148 jeneps Epstein, Jennifer Bloomberg News \n", "158072303 ValerieInsinna Insinna, Valerie Defense News \n", "313545488 LauraLitvan Litvan, Laura Bloomberg News \n", "18501487 leighmunsil Munsil, Leigh CNN \n", "273700859 kpolantz Polantz, Katelyn J. National Law Journal \n", "114670081 rebleber Leber, Rebecca J. Mother Jones \n", "407013776 burgessev Everett, John B. Politico \n", "118130765 dylanlscott Scott, Dylan L. 
Stat News \n", "116341480 RosieGray Gray, Rosie The Atlantic \n", "103016675 AaronMehta Mehta, Aaron Sightline Media Group \n", "48038024 karentravers Travers, Karen ABC News \n", "\n", " gender followers_count reply_to_count replying_count \n", "user_id \n", "906734342 F 7170 313.00 2.00 \n", "52392666 F 15246 305.00 3.00 \n", "16061946 F 4146 295.00 15.00 \n", "83462293 F 6216 195.00 7.00 \n", "21212087 F 136276 190.00 9.00 \n", "3372900155 F 10344 179.00 7.00 \n", "96405362 F 3396 159.00 5.00 \n", "18825339 F 16980 148.00 18.00 \n", "1132012321 F 14612 144.00 22.00 \n", "16812908 F 5327 144.00 3.00 \n", "47758416 F 6850 137.00 1.00 \n", "36607254 F 6397 133.00 5.00 \n", "16434028 F 2209 130.00 2.00 \n", "19186003 F 33980 108.00 36.00 \n", "45399148 F 61242 103.00 7.00 \n", "158072303 F 4572 97.00 8.00 \n", "313545488 F 4468 97.00 5.00 \n", "18501487 F 11059 88.00 13.00 \n", "273700859 F 2483 84.00 2.00 \n", "114670081 F 16467 79.00 3.00 \n", "407013776 M 31010 78.00 30.00 \n", "118130765 M 20122 78.00 20.00 \n", "116341480 F 96935 73.00 13.00 \n", "103016675 M 11124 72.00 10.00 \n", "48038024 F 17155 71.00 7.00 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalists_replied_to_by_female_summary_df = journalist_reply_summary(journalists_reply_df[journalists_reply_df.gender == 'F'])\n", "journalists_replied_to_by_female_summary_df.to_csv('output/journalists_replied_to_by_female_journalists.csv')\n", "journalists_replied_to_by_female_summary_df[journalist_reply_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of female journalists replying to journalists, how many males / females do they reply to?" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countpercentageavg_replies
index
F741272.1%7.46
M286427.9%2.20
\n", "
" ], "text/plain": [ " count percentage avg_replies\n", "index \n", "F 7412 72.1% 7.46\n", "M 2864 27.9% 2.20" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalist_reply_gender_summary(journalists_reply_df[journalists_reply_df.gender == 'F'])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Male journalists replying to journalists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of male journalists replying to journalists, who do they reply to the most?" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countreply_to_countreplying_count
user_id
3817401ericgellerGeller, EricPoliticoM581731,926.0058.00
22891564chrisgeidnerGeidner, ChrisBuzzFeedM833161,864.0028.00
118130765dylanlscottScott, Dylan L.Stat NewsM201221,013.0045.00
19576571JaredRizziRizzi, JaredSirius XM Satellite RadioM13545726.0035.00
275207082AlexParkerDCParker, Alexander M.Bloomberg BNAM3828709.0020.00
46557945StevenTDennisDennis, Steven T.Bloomberg NewsM55762686.0061.00
583821006jseldinSeldin, JeffVoice of AmericaM5365653.002.00
19847765sahilkapurKapur, SahilBloomberg NewsM69086646.0024.00
44951698amaxsmithSmith, MaxWTOP RadioM4726495.004.00
225265639ddale8Dale, DanielToronto StarM180671490.0016.00
398088661MEPFullerFuller, Matt E.Huffington PostM77919456.0064.00
317980134CraigCaplanCaplan, CraigC–SPANM6143388.008.00
15365623benjamin_ocO’Connell, BenjaminC–SPANM1455318.008.00
21810329sdonnanDonnan, ShawnFinancial TimesM12311303.006.00
90478926MikeSacksEsqSacks, MikeScripps Howard News ServiceM9289294.0013.00
227790723RichardRubinDCRubin, RichardBloomberg NewsM13015284.0033.00
21696279brianbeutlerBeutler, Brian AlfredNew RepublicM74435262.0029.00
21252618JakeShermanSherman, Jacob S.PoliticoM81762249.0052.00
16459325ryanbeckwithBeckwith, Ryan TeagueTime MagazineM20947241.0030.00
11771512OKnoxKnox, OlivierYahoo NewsM44715240.0035.00
63717541phillyrich1Weinstein, RichardC–SPANM3827240.003.00
103016675AaronMehtaMehta, AaronSightline Media GroupM11124232.0025.00
26559241fordmFord, Matt S.The AtlanticM27571232.0015.00
437019753TimothyNoah1Noah, Timothy R.PoliticoM15090231.0012.00
23332846mattzapZapotosky, MattWashington PostM56887230.007.00
\n", "
" ], "text/plain": [ " screen_name name organization \\\n", "user_id \n", "3817401 ericgeller Geller, Eric Politico \n", "22891564 chrisgeidner Geidner, Chris BuzzFeed \n", "118130765 dylanlscott Scott, Dylan L. Stat News \n", "19576571 JaredRizzi Rizzi, Jared Sirius XM Satellite Radio \n", "275207082 AlexParkerDC Parker, Alexander M. Bloomberg BNA \n", "46557945 StevenTDennis Dennis, Steven T. Bloomberg News \n", "583821006 jseldin Seldin, Jeff Voice of America \n", "19847765 sahilkapur Kapur, Sahil Bloomberg News \n", "44951698 amaxsmith Smith, Max WTOP Radio \n", "225265639 ddale8 Dale, Daniel Toronto Star \n", "398088661 MEPFuller Fuller, Matt E. Huffington Post \n", "317980134 CraigCaplan Caplan, Craig C–SPAN \n", "15365623 benjamin_oc O’Connell, Benjamin C–SPAN \n", "21810329 sdonnan Donnan, Shawn Financial Times \n", "90478926 MikeSacksEsq Sacks, Mike Scripps Howard News Service \n", "227790723 RichardRubinDC Rubin, Richard Bloomberg News \n", "21696279 brianbeutler Beutler, Brian Alfred New Republic \n", "21252618 JakeSherman Sherman, Jacob S. Politico \n", "16459325 ryanbeckwith Beckwith, Ryan Teague Time Magazine \n", "11771512 OKnox Knox, Olivier Yahoo News \n", "63717541 phillyrich1 Weinstein, Richard C–SPAN \n", "103016675 AaronMehta Mehta, Aaron Sightline Media Group \n", "26559241 fordm Ford, Matt S. The Atlantic \n", "437019753 TimothyNoah1 Noah, Timothy R. 
Politico \n", "23332846 mattzap Zapotosky, Matt Washington Post \n", "\n", " gender followers_count reply_to_count replying_count \n", "user_id \n", "3817401 M 58173 1,926.00 58.00 \n", "22891564 M 83316 1,864.00 28.00 \n", "118130765 M 20122 1,013.00 45.00 \n", "19576571 M 13545 726.00 35.00 \n", "275207082 M 3828 709.00 20.00 \n", "46557945 M 55762 686.00 61.00 \n", "583821006 M 5365 653.00 2.00 \n", "19847765 M 69086 646.00 24.00 \n", "44951698 M 4726 495.00 4.00 \n", "225265639 M 180671 490.00 16.00 \n", "398088661 M 77919 456.00 64.00 \n", "317980134 M 6143 388.00 8.00 \n", "15365623 M 1455 318.00 8.00 \n", "21810329 M 12311 303.00 6.00 \n", "90478926 M 9289 294.00 13.00 \n", "227790723 M 13015 284.00 33.00 \n", "21696279 M 74435 262.00 29.00 \n", "21252618 M 81762 249.00 52.00 \n", "16459325 M 20947 241.00 30.00 \n", "11771512 M 44715 240.00 35.00 \n", "63717541 M 3827 240.00 3.00 \n", "103016675 M 11124 232.00 25.00 \n", "26559241 M 27571 232.00 15.00 \n", "437019753 M 15090 231.00 12.00 \n", "23332846 M 56887 230.00 7.00 " ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalists_replied_to_by_male_summary_df = journalist_reply_summary(journalists_reply_df[journalists_reply_df.gender == 'M'])\n", "journalists_replied_to_by_male_summary_df.to_csv('output/journalists_replied_to_by_male_journalists.csv')\n", "journalists_replied_to_by_male_summary_df[journalist_reply_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of male journalists replying to journalists, how many males / females do they reply to?" ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countpercentageavg_replies
index
M3031491.5%23.34
F28008.5%2.82
\n", "
" ], "text/plain": [ " count percentage avg_replies\n", "index \n", "M 30314 91.5% 23.34\n", "F 2800 8.5% 2.82" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalist_reply_gender_summary(journalists_reply_df[journalists_reply_df.gender == 'M'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Following data prep" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load following\n", "Users that are followed by beltway journalists" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "follower_user_id 3417018\n", "followed_user_id 3417018\n", "dtype: int64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "base_follower_to_followed_df = pd.read_csv('source_data/follower_to_followed.csv', \n", " names=['follower_user_id', 'followed_user_id'],\n", " dtype={'follower_user_id': str, 'followed_user_id': str})\n", "base_follower_to_followed_df.drop_duplicates(inplace=True)\n", "base_follower_to_followed_df.count()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
follower_user_idfollowed_user_id
0911564863092427779
19115648636953109
291156486424274008
391156486779044378929168384
491156486339834914
\n", "
" ], "text/plain": [ " follower_user_id followed_user_id\n", "0 91156486 3092427779\n", "1 91156486 36953109\n", "2 91156486 424274008\n", "3 91156486 779044378929168384\n", "4 91156486 339834914" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "base_follower_to_followed_df.head()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameorganizationpositiongenderfollowers_countfollowing_counttweet_countuser_created_atverifiedprotected
user_id
20711445Glinski, NinaNaNFreelance ReporterF963507909Thu Feb 12 20:00:53 +0000 2009FalseFalse
258917371Enders, DavidNaNJournalistM14444846296Mon Feb 28 19:52:03 +0000 2011TrueFalse
297046834Barakat, MatthewAssociated PressNorthern Virginia CorrespondentM759352631Wed May 11 20:55:24 +0000 2011TrueFalse
455585786Atkins, KimberlyBoston HeraldChief Washington Reporter/ColumnistF294426916277Thu Jan 05 08:26:46 +0000 2012TrueFalse
42584840Vlahou, ToulaCQ Roll CallEditor & Podcast ProducerF27032016366Tue May 26 07:41:38 +0000 2009FalseFalse
\n", "
" ], "text/plain": [ " name organization \\\n", "user_id \n", "20711445 Glinski, Nina NaN \n", "258917371 Enders, David NaN \n", "297046834 Barakat, Matthew Associated Press \n", "455585786 Atkins, Kimberly Boston Herald \n", "42584840 Vlahou, Toula CQ Roll Call \n", "\n", " position gender followers_count \\\n", "user_id \n", "20711445 Freelance Reporter F 963 \n", "258917371 Journalist M 1444 \n", "297046834 Northern Virginia Correspondent M 759 \n", "455585786 Chief Washington Reporter/Columnist F 2944 \n", "42584840 Editor & Podcast Producer F 2703 \n", "\n", " following_count tweet_count user_created_at \\\n", "user_id \n", "20711445 507 909 Thu Feb 12 20:00:53 +0000 2009 \n", "258917371 484 6296 Mon Feb 28 19:52:03 +0000 2011 \n", "297046834 352 631 Wed May 11 20:55:24 +0000 2011 \n", "455585786 2691 6277 Thu Jan 05 08:26:46 +0000 2012 \n", "42584840 201 6366 Tue May 26 07:41:38 +0000 2009 \n", "\n", " verified protected \n", "user_id \n", "20711445 False False \n", "258917371 True False \n", "297046834 True False \n", "455585786 True False \n", "42584840 False False " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "user_info_df.head()" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "follower_user_id 3311406\n", "followed_user_id 3311406\n", "gender 3311406\n", "dtype: int64" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This will drop followers of journalists that have no tweets\n", "follower_to_followed_df = base_follower_to_followed_df.join(user_summary_df['gender'], on='follower_user_id', how='inner')\n", "follower_to_followed_df.count()" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
follower_user_idfollowed_user_idgender
261152198883291076716F
26215219888119175339F
26315219888418837047F
26415219888259817885F
26515219888287263845F
\n", "
" ], "text/plain": [ " follower_user_id followed_user_id gender\n", "261 15219888 3291076716 F\n", "262 15219888 119175339 F\n", "263 15219888 418837047 F\n", "264 15219888 259817885 F\n", "265 15219888 287263845 F" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "follower_to_followed_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load followed users" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_name
user_id
17665874onlinehigh
2389275799HLSPOLICY
314728983Veolia_NA
239409802fishingbuk
522799320GoldsmithBev
\n", "
" ], "text/plain": [ " screen_name\n", "user_id \n", "17665874 onlinehigh\n", "2389275799 HLSPOLICY\n", "314728983 Veolia_NA\n", "239409802 fishingbuk\n", "522799320 GoldsmithBev" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "followed_screen_name_lookup_df = pd.read_csv('source_data/followed.csv', \n", " names=['screen_name', 'user_id'],\n", " dtype={'user_id': str}).set_index(['user_id'])\n", "followed_screen_name_lookup_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Limit to beltway journalists" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "follower_user_id 280340\n", "followed_user_id 280340\n", "gender 280340\n", "followed_gender 280340\n", "dtype: int64" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "follower_to_journalist_followed_df = follower_to_followed_df.join(user_summary_df['gender'], how='inner', on='followed_user_id', rsuffix='_followed')\n", "follower_to_journalist_followed_df.rename(columns = {'gender_followed': 'followed_gender'}, inplace=True)\n", "follower_to_journalist_followed_df.count()" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
follower_user_idfollowed_user_idgenderfollowed_gender
2871521988846582653FM
218101578028046582653MM
241531424572246582653MM
406943786528146582653FM
6658516520421146582653MM
\n", "
" ], "text/plain": [ " follower_user_id followed_user_id gender followed_gender\n", "287 15219888 46582653 F M\n", "21810 15780280 46582653 M M\n", "24153 14245722 46582653 M M\n", "40694 37865281 46582653 F M\n", "66585 165204211 46582653 M M" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "follower_to_journalist_followed_df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Functions for summarizing following by beltway journalists" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "# Gender of beltway journalists followed by beltway journalists\n", "def journalist_followed_gender_summary(follower_to_followed_df):\n", " gender_summary_df = pd.DataFrame({'count':follower_to_followed_df.followed_gender.value_counts(), \n", " 'percentage': follower_to_followed_df.followed_gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n", " gender_summary_df.reset_index(inplace=True)\n", " gender_summary_df['avg_followed'] = gender_summary_df.apply(lambda row: row['count'] / journalist_gender_summary_df.loc[row['index']]['count'], axis=1)\n", " gender_summary_df.set_index('index', inplace=True, drop=True)\n", " return gender_summary_df\n", "\n", "def journalist_following_summary(follower_to_followed_df):\n", " # Following count\n", " following_count_df = pd.DataFrame(follower_to_followed_df.followed_user_id.value_counts().rename('journalist_follower_count'))\n", "\n", " # Join with user summary\n", " journalist_following_summary_df = user_summary_df.join(following_count_df)\n", " journalist_following_summary_df.fillna(0, inplace=True)\n", " journalist_following_summary_df = journalist_following_summary_df.sort_values(['journalist_follower_count', 'followers_count'], ascending=False)\n", " return journalist_following_summary_df\n", "\n", "# Gender of top journalists followed by beltway journalists\n", "def 
top_journalist_followed_gender_summary(followed_summary_df, head=100):\n", " top_followed_summary_df = followed_summary_df.head(head)\n", " return pd.DataFrame({'count': top_followed_summary_df.gender.value_counts(), \n", " 'percentage': top_followed_summary_df.gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n", "\n", "# Fields for displaying journalist following summaries\n", "journalist_following_summary_fields = ['screen_name', 'name', 'organization', 'gender', 'followers_count', 'journalist_follower_count']\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Following analysis" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Journalists following all accounts" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of journalists following all accounts, who do they follow the most?" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
following_countscreen_name
8132861671BarackObama
512415741629AP
250738771613realDonaldTrump
8070951581nytimes
24677911532washingtonpost
13398358931531HillaryClinton
8189271318833561611522PressSec
8222156738121195531507WhiteHouse
8222156797261004801488POTUS
93002621457politico
303139251402ObamaWhiteHouse
142460011384mikeallen
930691101368maggieNYT
145299291337jaketapper
4283331289cnnbrk
31083511279WSJ
15367916101279POTUS44
503257971258chucktodd
1134208311258PressSec44
160174751234NateSilver538
186228691231ezraklein
861297241173costareports
16525411144Reuters
13304573361128billclinton
53925221124NPR
\n", "
" ], "text/plain": [ " following_count screen_name\n", "813286 1671 BarackObama\n", "51241574 1629 AP\n", "25073877 1613 realDonaldTrump\n", "807095 1581 nytimes\n", "2467791 1532 washingtonpost\n", "1339835893 1531 HillaryClinton\n", "818927131883356161 1522 PressSec\n", "822215673812119553 1507 WhiteHouse\n", "822215679726100480 1488 POTUS\n", "9300262 1457 politico\n", "30313925 1402 ObamaWhiteHouse\n", "14246001 1384 mikeallen\n", "93069110 1368 maggieNYT\n", "14529929 1337 jaketapper\n", "428333 1289 cnnbrk\n", "3108351 1279 WSJ\n", "1536791610 1279 POTUS44\n", "50325797 1258 chucktodd\n", "113420831 1258 PressSec44\n", "16017475 1234 NateSilver538\n", "18622869 1231 ezraklein\n", "86129724 1173 costareports\n", "1652541 1144 Reuters\n", "1330457336 1128 billclinton\n", "5392522 1124 NPR" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Following count\n", "all_followed_df = pd.DataFrame(follower_to_followed_df.followed_user_id.value_counts().rename('following_count')).join(followed_screen_name_lookup_df)\n", "all_followed_df.to_csv('output/all_followed_by_journalists.csv')\n", "all_followed_df.head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Journalists following journalists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of all journalists followed by journalists, who is followed the most?" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countjournalist_follower_count
user_id
14529929jaketapperTapper, JakeCNNM13056801,337.00
50325797chucktoddTodd, ChuckNBC NewsM17812471,258.00
19107878GlennThrushThrush, Glenn H.New York TimesM3081811,116.00
31127446markknollerKnoller, MarkCBS NewsM3014741,107.00
13524182daveweigelWeigel, DavidWashington PostM3323441,106.00
61734492FahrentholdFahrenthold, DavidWashington PostM4517781,082.00
18678924jmartNYTMartin, JonathanNew York TimesM1973221,032.00
39155029mkrajuRaju, Manu K.CNNM88366977.00
16930125edatpostO’Keefe, EdwardWashington PostM58670973.00
85131054jeffzelenyZeleny, JeffCNNM244114970.00
21316253ZekeJMillerMiller, Zeke J.Time MagazineM198517915.00
89820928mitchellreportsMitchell, AndreaNBC NewsF1388543909.00
59676104danbalzBalz, DanielWashington PostM90819892.00
108617810DanaBashCNNBash, DanaCNNF281861884.00
15463671samsteinStein, SamHuffington PostM313211880.00
130945778mollyesqueBall, MollyThe AtlanticF116857877.00
46176168MajorCBSGarrett, MajorCBS NewsM178640872.00
21252618JakeShermanSherman, Jacob S.PoliticoM81762868.00
16187637ChadPergramPergram, ChadFox NewsM59305866.00
22771961AcostaAcosta, JimCNNM350650860.00
12354832kasieHunt, KasieNBC NewsF187357860.00
123327472peterbakernytBaker, PeterNew York TimesM96956856.00
15931637jonkarlKarl, JonathanABC NewsM183467830.00
11771512OKnoxKnox, OlivierYahoo NewsM44715788.00
259395895JohnJHarwoodHarwood, JohnCNBCM149040783.00
\n", "
" ], "text/plain": [ " screen_name name organization gender \\\n", "user_id \n", "14529929 jaketapper Tapper, Jake CNN M \n", "50325797 chucktodd Todd, Chuck NBC News M \n", "19107878 GlennThrush Thrush, Glenn H. New York Times M \n", "31127446 markknoller Knoller, Mark CBS News M \n", "13524182 daveweigel Weigel, David Washington Post M \n", "61734492 Fahrenthold Fahrenthold, David Washington Post M \n", "18678924 jmartNYT Martin, Jonathan New York Times M \n", "39155029 mkraju Raju, Manu K. CNN M \n", "16930125 edatpost O’Keefe, Edward Washington Post M \n", "85131054 jeffzeleny Zeleny, Jeff CNN M \n", "21316253 ZekeJMiller Miller, Zeke J. Time Magazine M \n", "89820928 mitchellreports Mitchell, Andrea NBC News F \n", "59676104 danbalz Balz, Daniel Washington Post M \n", "108617810 DanaBashCNN Bash, Dana CNN F \n", "15463671 samstein Stein, Sam Huffington Post M \n", "130945778 mollyesque Ball, Molly The Atlantic F \n", "46176168 MajorCBS Garrett, Major CBS News M \n", "21252618 JakeSherman Sherman, Jacob S. 
Politico M \n", "16187637 ChadPergram Pergram, Chad Fox News M \n", "22771961 Acosta Acosta, Jim CNN M \n", "12354832 kasie Hunt, Kasie NBC News F \n", "123327472 peterbakernyt Baker, Peter New York Times M \n", "15931637 jonkarl Karl, Jonathan ABC News M \n", "11771512 OKnox Knox, Olivier Yahoo News M \n", "259395895 JohnJHarwood Harwood, John CNBC M \n", "\n", " followers_count journalist_follower_count \n", "user_id \n", "14529929 1305680 1,337.00 \n", "50325797 1781247 1,258.00 \n", "19107878 308181 1,116.00 \n", "31127446 301474 1,107.00 \n", "13524182 332344 1,106.00 \n", "61734492 451778 1,082.00 \n", "18678924 197322 1,032.00 \n", "39155029 88366 977.00 \n", "16930125 58670 973.00 \n", "85131054 244114 970.00 \n", "21316253 198517 915.00 \n", "89820928 1388543 909.00 \n", "59676104 90819 892.00 \n", "108617810 281861 884.00 \n", "15463671 313211 880.00 \n", "130945778 116857 877.00 \n", "46176168 178640 872.00 \n", "21252618 81762 868.00 \n", "16187637 59305 866.00 \n", "22771961 350650 860.00 \n", "12354832 187357 860.00 \n", "123327472 96956 856.00 \n", "15931637 183467 830.00 \n", "11771512 44715 788.00 \n", "259395895 149040 783.00 " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "follower_to_journalist_followed_summary_df = journalist_following_summary(follower_to_journalist_followed_df)\n", "follower_to_journalist_followed_summary_df.to_csv('output/journalists_followed_by_journalists.csv')\n", "follower_to_journalist_followed_summary_df[journalist_following_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of journalists following journalists, how many of the followed journalists are male / female?" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countpercentageavg_followed
index
M17428362.2%134.17
F10605737.8%106.80
\n", "
" ], "text/plain": [ " count percentage avg_followed\n", "index \n", "M 174283 62.2% 134.17\n", "F 106057 37.8% 106.80" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalist_followed_gender_summary(follower_to_journalist_followed_df)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### On average, how many journalists follow each journalist?" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
journalist_follower_count
count2,292.00
mean122.31
std161.53
min0.00
25%26.00
50%64.00
75%145.00
max1,337.00
\n", "
" ], "text/plain": [ " journalist_follower_count\n", "count 2,292.00\n", "mean 122.31\n", "std 161.53\n", "min 0.00\n", "25% 26.00\n", "50% 64.00\n", "75% 145.00\n", "max 1,337.00" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "follower_to_journalist_followed_summary_df[['journalist_follower_count']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Journalists following female journalists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of journalists following female journalists, which female journalists do they follow the most?" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countjournalist_follower_count
user_id
89820928mitchellreportsMitchell, AndreaNBC NewsF1388543909.00
108617810DanaBashCNNBash, DanaCNNF281861884.00
130945778mollyesqueBall, MollyThe AtlanticF116857877.00
12354832kasieHunt, KasieNBC NewsF187357860.00
33919343AshleyRParkerParker, AshleyWashington PostF122382777.00
28181835jpaceDCPace, JulieAssociated PressF46017738.00
70511174Hadas_GoldGold, HadasPoliticoF45221679.00
21307076SusanPagePage, SusanUSA TodayF48675670.00
19186003seungminkimKim, Seung MinPoliticoF33980664.00
45399148jenepsEpstein, JenniferBloomberg NewsF61242631.00
224320485KellyOO’Donnell, KellyNBC NewsF148476630.00
20776497BFischerMartinFischer Martin, BetsyBloomberg NewsF50890609.00
77032777apalmerdcPalmer, Anna A.PoliticoF30523591.00
116341480RosieGrayGray, RosieThe AtlanticF96935589.00
237477771juliehdavisDavis, JulieNew York TimesF49821570.00
58869089margarettalevTalev, MargaretBloomberg NewsF19588569.00
14870670KateNoceraNocera, KateBuzzFeedF27714567.00
46817943brikeilarcnnKeilar, BriannaCNNF105276557.00
22772264caroleleeLee, CarolWall Street Journal / Dow JonesF31840552.00
15159913JFKucinichKucinich, JacquelineDaily BeastF31210549.00
297532865kwelkernbcWelker, KristenNBC NewsF99234537.00
15727317aterkelTerkel, AmandaHuffington PostF78736527.00
17881467rebeccagbergBerg, RebeccaRealClearPoliticsF48798516.00
151444950DaviSusanDavis, SusanNational Public RadioF27297506.00
27055034SabrinaSiddiquiSiddiqui, SabrinaGuardian USF53835474.00
\n", "
" ], "text/plain": [ " screen_name name \\\n", "user_id \n", "89820928 mitchellreports Mitchell, Andrea \n", "108617810 DanaBashCNN Bash, Dana \n", "130945778 mollyesque Ball, Molly \n", "12354832 kasie Hunt, Kasie \n", "33919343 AshleyRParker Parker, Ashley \n", "28181835 jpaceDC Pace, Julie \n", "70511174 Hadas_Gold Gold, Hadas \n", "21307076 SusanPage Page, Susan \n", "19186003 seungminkim Kim, Seung Min \n", "45399148 jeneps Epstein, Jennifer \n", "224320485 KellyO O’Donnell, Kelly \n", "20776497 BFischerMartin Fischer Martin, Betsy \n", "77032777 apalmerdc Palmer, Anna A. \n", "116341480 RosieGray Gray, Rosie \n", "237477771 juliehdavis Davis, Julie \n", "58869089 margarettalev Talev, Margaret \n", "14870670 KateNocera Nocera, Kate \n", "46817943 brikeilarcnn Keilar, Brianna \n", "22772264 carolelee Lee, Carol \n", "15159913 JFKucinich Kucinich, Jacqueline \n", "297532865 kwelkernbc Welker, Kristen \n", "15727317 aterkel Terkel, Amanda \n", "17881467 rebeccagberg Berg, Rebecca \n", "151444950 DaviSusan Davis, Susan \n", "27055034 SabrinaSiddiqui Siddiqui, Sabrina \n", "\n", " organization gender followers_count \\\n", "user_id \n", "89820928 NBC News F 1388543 \n", "108617810 CNN F 281861 \n", "130945778 The Atlantic F 116857 \n", "12354832 NBC News F 187357 \n", "33919343 Washington Post F 122382 \n", "28181835 Associated Press F 46017 \n", "70511174 Politico F 45221 \n", "21307076 USA Today F 48675 \n", "19186003 Politico F 33980 \n", "45399148 Bloomberg News F 61242 \n", "224320485 NBC News F 148476 \n", "20776497 Bloomberg News F 50890 \n", "77032777 Politico F 30523 \n", "116341480 The Atlantic F 96935 \n", "237477771 New York Times F 49821 \n", "58869089 Bloomberg News F 19588 \n", "14870670 BuzzFeed F 27714 \n", "46817943 CNN F 105276 \n", "22772264 Wall Street Journal / Dow Jones F 31840 \n", "15159913 Daily Beast F 31210 \n", "297532865 NBC News F 99234 \n", "15727317 Huffington Post F 78736 \n", "17881467 RealClearPolitics F 48798 \n", "151444950 
National Public Radio F 27297 \n", "27055034 Guardian US F 53835 \n", "\n", " journalist_follower_count \n", "user_id \n", "89820928 909.00 \n", "108617810 884.00 \n", "130945778 877.00 \n", "12354832 860.00 \n", "33919343 777.00 \n", "28181835 738.00 \n", "70511174 679.00 \n", "21307076 670.00 \n", "19186003 664.00 \n", "45399148 631.00 \n", "224320485 630.00 \n", "20776497 609.00 \n", "77032777 591.00 \n", "116341480 589.00 \n", "237477771 570.00 \n", "58869089 569.00 \n", "14870670 567.00 \n", "46817943 557.00 \n", "22772264 552.00 \n", "15159913 549.00 \n", "297532865 537.00 \n", "15727317 527.00 \n", "17881467 516.00 \n", "151444950 506.00 \n", "27055034 474.00 " ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "follower_to_female_journalist_followed_df = follower_to_journalist_followed_summary_df[follower_to_journalist_followed_summary_df.gender == 'F']\n", "follower_to_female_journalist_followed_df.to_csv('output/female_journalists_followed_by_journalists.csv')\n", "follower_to_female_journalist_followed_df[journalist_following_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### On average, how many journalists follow each female journalist?" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
journalist_follower_count
count993.00
mean106.80
std131.81
min0.00
25%24.00
50%59.00
75%131.00
max909.00
\n", "
" ], "text/plain": [ " journalist_follower_count\n", "count 993.00\n", "mean 106.80\n", "std 131.81\n", "min 0.00\n", "25% 24.00\n", "50% 59.00\n", "75% 131.00\n", "max 909.00" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "follower_to_female_journalist_followed_df[['journalist_follower_count']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Journalists following male journalists" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countjournalist_follower_count
user_id
14529929jaketapperTapper, JakeCNNM13056801,337.00
50325797chucktoddTodd, ChuckNBC NewsM17812471,258.00
19107878GlennThrushThrush, Glenn H.New York TimesM3081811,116.00
31127446markknollerKnoller, MarkCBS NewsM3014741,107.00
13524182daveweigelWeigel, DavidWashington PostM3323441,106.00
61734492FahrentholdFahrenthold, DavidWashington PostM4517781,082.00
18678924jmartNYTMartin, JonathanNew York TimesM1973221,032.00
39155029mkrajuRaju, Manu K.CNNM88366977.00
16930125edatpostO’Keefe, EdwardWashington PostM58670973.00
85131054jeffzelenyZeleny, JeffCNNM244114970.00
21316253ZekeJMillerMiller, Zeke J.Time MagazineM198517915.00
59676104danbalzBalz, DanielWashington PostM90819892.00
15463671samsteinStein, SamHuffington PostM313211880.00
46176168MajorCBSGarrett, MajorCBS NewsM178640872.00
21252618JakeShermanSherman, Jacob S.PoliticoM81762868.00
16187637ChadPergramPergram, ChadFox NewsM59305866.00
22771961AcostaAcosta, JimCNNM350650860.00
123327472peterbakernytBaker, PeterNew York TimesM96956856.00
15931637jonkarlKarl, JonathanABC NewsM183467830.00
11771512OKnoxKnox, OlivierYahoo NewsM44715788.00
259395895JohnJHarwoodHarwood, JohnCNBCM149040783.00
46557945StevenTDennisDennis, Steven T.Bloomberg NewsM55762781.00
18172905rickkleinKlein, RichardABC NewsM109170737.00
21768766jonathanweismanWeisman, JonathanNew York TimesM57549728.00
997684836pkcapitolKane, PaulWashington PostM31300728.00
\n", "
" ], "text/plain": [ " screen_name name organization gender \\\n", "user_id \n", "14529929 jaketapper Tapper, Jake CNN M \n", "50325797 chucktodd Todd, Chuck NBC News M \n", "19107878 GlennThrush Thrush, Glenn H. New York Times M \n", "31127446 markknoller Knoller, Mark CBS News M \n", "13524182 daveweigel Weigel, David Washington Post M \n", "61734492 Fahrenthold Fahrenthold, David Washington Post M \n", "18678924 jmartNYT Martin, Jonathan New York Times M \n", "39155029 mkraju Raju, Manu K. CNN M \n", "16930125 edatpost O’Keefe, Edward Washington Post M \n", "85131054 jeffzeleny Zeleny, Jeff CNN M \n", "21316253 ZekeJMiller Miller, Zeke J. Time Magazine M \n", "59676104 danbalz Balz, Daniel Washington Post M \n", "15463671 samstein Stein, Sam Huffington Post M \n", "46176168 MajorCBS Garrett, Major CBS News M \n", "21252618 JakeSherman Sherman, Jacob S. Politico M \n", "16187637 ChadPergram Pergram, Chad Fox News M \n", "22771961 Acosta Acosta, Jim CNN M \n", "123327472 peterbakernyt Baker, Peter New York Times M \n", "15931637 jonkarl Karl, Jonathan ABC News M \n", "11771512 OKnox Knox, Olivier Yahoo News M \n", "259395895 JohnJHarwood Harwood, John CNBC M \n", "46557945 StevenTDennis Dennis, Steven T. 
Bloomberg News M \n", "18172905 rickklein Klein, Richard ABC News M \n", "21768766 jonathanweisman Weisman, Jonathan New York Times M \n", "997684836 pkcapitol Kane, Paul Washington Post M \n", "\n", " followers_count journalist_follower_count \n", "user_id \n", "14529929 1305680 1,337.00 \n", "50325797 1781247 1,258.00 \n", "19107878 308181 1,116.00 \n", "31127446 301474 1,107.00 \n", "13524182 332344 1,106.00 \n", "61734492 451778 1,082.00 \n", "18678924 197322 1,032.00 \n", "39155029 88366 977.00 \n", "16930125 58670 973.00 \n", "85131054 244114 970.00 \n", "21316253 198517 915.00 \n", "59676104 90819 892.00 \n", "15463671 313211 880.00 \n", "46176168 178640 872.00 \n", "21252618 81762 868.00 \n", "16187637 59305 866.00 \n", "22771961 350650 860.00 \n", "123327472 96956 856.00 \n", "15931637 183467 830.00 \n", "11771512 44715 788.00 \n", "259395895 149040 783.00 \n", "46557945 55762 781.00 \n", "18172905 109170 737.00 \n", "21768766 57549 728.00 \n", "997684836 31300 728.00 " ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "follower_to_male_journalist_followed_df = follower_to_journalist_followed_summary_df[follower_to_journalist_followed_summary_df.gender == 'M']\n", "follower_to_male_journalist_followed_df.to_csv('output/male_journalists_followed_by_journalists.csv')\n", "follower_to_male_journalist_followed_df[journalist_following_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### On average, how many journalists follow each male journalist?" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
journalist_follower_count
count1,299.00
mean134.17
std180.14
min0.00
25%28.00
50%67.00
75%156.00
max1,337.00
\n", "
" ], "text/plain": [ " journalist_follower_count\n", "count 1,299.00\n", "mean 134.17\n", "std 180.14\n", "min 0.00\n", "25% 28.00\n", "50% 67.00\n", "75% 156.00\n", "max 1,337.00" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "follower_to_male_journalist_followed_df[['journalist_follower_count']].describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Female journalists following journalists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of female journalists following journalists, who do they follow the most?" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countjournalist_follower_count
user_id
14529929jaketapperTapper, JakeCNNM1305680619.00
50325797chucktoddTodd, ChuckNBC NewsM1781247569.00
31127446markknollerKnoller, MarkCBS NewsM301474505.00
19107878GlennThrushThrush, Glenn H.New York TimesM308181490.00
13524182daveweigelWeigel, DavidWashington PostM332344484.00
61734492FahrentholdFahrenthold, DavidWashington PostM451778474.00
18678924jmartNYTMartin, JonathanNew York TimesM197322445.00
16930125edatpostO’Keefe, EdwardWashington PostM58670444.00
89820928mitchellreportsMitchell, AndreaNBC NewsF1388543441.00
85131054jeffzelenyZeleny, JeffCNNM244114435.00
39155029mkrajuRaju, Manu K.CNNM88366434.00
108617810DanaBashCNNBash, DanaCNNF281861430.00
21316253ZekeJMillerMiller, Zeke J.Time MagazineM198517420.00
22771961AcostaAcosta, JimCNNM350650402.00
15463671samsteinStein, SamHuffington PostM313211398.00
16187637ChadPergramPergram, ChadFox NewsM59305397.00
21252618JakeShermanSherman, Jacob S.PoliticoM81762394.00
46176168MajorCBSGarrett, MajorCBS NewsM178640390.00
15931637jonkarlKarl, JonathanABC NewsM183467389.00
130945778mollyesqueBall, MollyThe AtlanticF116857386.00
59676104danbalzBalz, DanielWashington PostM90819382.00
123327472peterbakernytBaker, PeterNew York TimesM96956379.00
12354832kasieHunt, KasieNBC NewsF187357366.00
11771512OKnoxKnox, OlivierYahoo NewsM44715354.00
33919343AshleyRParkerParker, AshleyWashington PostF122382339.00
\n", "
" ], "text/plain": [ " screen_name name organization gender \\\n", "user_id \n", "14529929 jaketapper Tapper, Jake CNN M \n", "50325797 chucktodd Todd, Chuck NBC News M \n", "31127446 markknoller Knoller, Mark CBS News M \n", "19107878 GlennThrush Thrush, Glenn H. New York Times M \n", "13524182 daveweigel Weigel, David Washington Post M \n", "61734492 Fahrenthold Fahrenthold, David Washington Post M \n", "18678924 jmartNYT Martin, Jonathan New York Times M \n", "16930125 edatpost O’Keefe, Edward Washington Post M \n", "89820928 mitchellreports Mitchell, Andrea NBC News F \n", "85131054 jeffzeleny Zeleny, Jeff CNN M \n", "39155029 mkraju Raju, Manu K. CNN M \n", "108617810 DanaBashCNN Bash, Dana CNN F \n", "21316253 ZekeJMiller Miller, Zeke J. Time Magazine M \n", "22771961 Acosta Acosta, Jim CNN M \n", "15463671 samstein Stein, Sam Huffington Post M \n", "16187637 ChadPergram Pergram, Chad Fox News M \n", "21252618 JakeSherman Sherman, Jacob S. Politico M \n", "46176168 MajorCBS Garrett, Major CBS News M \n", "15931637 jonkarl Karl, Jonathan ABC News M \n", "130945778 mollyesque Ball, Molly The Atlantic F \n", "59676104 danbalz Balz, Daniel Washington Post M \n", "123327472 peterbakernyt Baker, Peter New York Times M \n", "12354832 kasie Hunt, Kasie NBC News F \n", "11771512 OKnox Knox, Olivier Yahoo News M \n", "33919343 AshleyRParker Parker, Ashley Washington Post F \n", "\n", " followers_count journalist_follower_count \n", "user_id \n", "14529929 1305680 619.00 \n", "50325797 1781247 569.00 \n", "31127446 301474 505.00 \n", "19107878 308181 490.00 \n", "13524182 332344 484.00 \n", "61734492 451778 474.00 \n", "18678924 197322 445.00 \n", "16930125 58670 444.00 \n", "89820928 1388543 441.00 \n", "85131054 244114 435.00 \n", "39155029 88366 434.00 \n", "108617810 281861 430.00 \n", "21316253 198517 420.00 \n", "22771961 350650 402.00 \n", "15463671 313211 398.00 \n", "16187637 59305 397.00 \n", "21252618 81762 394.00 \n", "46176168 178640 390.00 \n", "15931637 
183467 389.00 \n", "130945778 116857 386.00 \n", "59676104 90819 382.00 \n", "123327472 96956 379.00 \n", "12354832 187357 366.00 \n", "11771512 44715 354.00 \n", "33919343 122382 339.00 " ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "female_follower_to_journalist_followed_df = journalist_following_summary(follower_to_journalist_followed_df[follower_to_journalist_followed_df.gender == 'F'])\n", "female_follower_to_journalist_followed_df.to_csv('output/journalists_followed_by_female_journalists.csv')\n", "female_follower_to_journalist_followed_df[journalist_following_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of female journalists following journalists, how many of the followed journalists are male / female?" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countpercentageavg_followed
index
M7395062.0%56.93
F4530038.0%45.62
\n", "
" ], "text/plain": [ " count percentage avg_followed\n", "index \n", "M 73950 62.0% 56.93\n", "F 45300 38.0% 45.62" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalist_followed_gender_summary(follower_to_journalist_followed_df[follower_to_journalist_followed_df.gender == 'F'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Male journalists following journalists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of male journalists following journalists, who do they follow the most?" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
screen_namenameorganizationgenderfollowers_countjournalist_follower_count
user_id
14529929jaketapperTapper, JakeCNNM1305680718.00
50325797chucktoddTodd, ChuckNBC NewsM1781247689.00
19107878GlennThrushThrush, Glenn H.New York TimesM308181626.00
13524182daveweigelWeigel, DavidWashington PostM332344622.00
61734492FahrentholdFahrenthold, DavidWashington PostM451778608.00
31127446markknollerKnoller, MarkCBS NewsM301474602.00
18678924jmartNYTMartin, JonathanNew York TimesM197322587.00
39155029mkrajuRaju, Manu K.CNNM88366543.00
85131054jeffzelenyZeleny, JeffCNNM244114535.00
16930125edatpostO’Keefe, EdwardWashington PostM58670529.00
59676104danbalzBalz, DanielWashington PostM90819510.00
21316253ZekeJMillerMiller, Zeke J.Time MagazineM198517495.00
12354832kasieHunt, KasieNBC NewsF187357494.00
130945778mollyesqueBall, MollyThe AtlanticF116857491.00
15463671samsteinStein, SamHuffington PostM313211482.00
46176168MajorCBSGarrett, MajorCBS NewsM178640482.00
123327472peterbakernytBaker, PeterNew York TimesM96956477.00
21252618JakeShermanSherman, Jacob S.PoliticoM81762474.00
16187637ChadPergramPergram, ChadFox NewsM59305469.00
89820928mitchellreportsMitchell, AndreaNBC NewsF1388543468.00
259395895JohnJHarwoodHarwood, JohnCNBCM149040464.00
22771961AcostaAcosta, JimCNNM350650458.00
108617810DanaBashCNNBash, DanaCNNF281861454.00
46557945StevenTDennisDennis, Steven T.Bloomberg NewsM55762446.00
15931637jonkarlKarl, JonathanABC NewsM183467441.00
\n", "
" ], "text/plain": [ " screen_name name organization gender \\\n", "user_id \n", "14529929 jaketapper Tapper, Jake CNN M \n", "50325797 chucktodd Todd, Chuck NBC News M \n", "19107878 GlennThrush Thrush, Glenn H. New York Times M \n", "13524182 daveweigel Weigel, David Washington Post M \n", "61734492 Fahrenthold Fahrenthold, David Washington Post M \n", "31127446 markknoller Knoller, Mark CBS News M \n", "18678924 jmartNYT Martin, Jonathan New York Times M \n", "39155029 mkraju Raju, Manu K. CNN M \n", "85131054 jeffzeleny Zeleny, Jeff CNN M \n", "16930125 edatpost O’Keefe, Edward Washington Post M \n", "59676104 danbalz Balz, Daniel Washington Post M \n", "21316253 ZekeJMiller Miller, Zeke J. Time Magazine M \n", "12354832 kasie Hunt, Kasie NBC News F \n", "130945778 mollyesque Ball, Molly The Atlantic F \n", "15463671 samstein Stein, Sam Huffington Post M \n", "46176168 MajorCBS Garrett, Major CBS News M \n", "123327472 peterbakernyt Baker, Peter New York Times M \n", "21252618 JakeSherman Sherman, Jacob S. Politico M \n", "16187637 ChadPergram Pergram, Chad Fox News M \n", "89820928 mitchellreports Mitchell, Andrea NBC News F \n", "259395895 JohnJHarwood Harwood, John CNBC M \n", "22771961 Acosta Acosta, Jim CNN M \n", "108617810 DanaBashCNN Bash, Dana CNN F \n", "46557945 StevenTDennis Dennis, Steven T. 
Bloomberg News M \n", "15931637 jonkarl Karl, Jonathan ABC News M \n", "\n", " followers_count journalist_follower_count \n", "user_id \n", "14529929 1305680 718.00 \n", "50325797 1781247 689.00 \n", "19107878 308181 626.00 \n", "13524182 332344 622.00 \n", "61734492 451778 608.00 \n", "31127446 301474 602.00 \n", "18678924 197322 587.00 \n", "39155029 88366 543.00 \n", "85131054 244114 535.00 \n", "16930125 58670 529.00 \n", "59676104 90819 510.00 \n", "21316253 198517 495.00 \n", "12354832 187357 494.00 \n", "130945778 116857 491.00 \n", "15463671 313211 482.00 \n", "46176168 178640 482.00 \n", "123327472 96956 477.00 \n", "21252618 81762 474.00 \n", "16187637 59305 469.00 \n", "89820928 1388543 468.00 \n", "259395895 149040 464.00 \n", "22771961 350650 458.00 \n", "108617810 281861 454.00 \n", "46557945 55762 446.00 \n", "15931637 183467 441.00 " ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "male_follower_to_journalist_followed_df = journalist_following_summary(follower_to_journalist_followed_df[follower_to_journalist_followed_df.gender == 'M'])\n", "male_follower_to_journalist_followed_df.to_csv('output/journalists_followed_by_male_journalists.csv')\n", "male_follower_to_journalist_followed_df[journalist_following_summary_fields].head(25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Of the journalists followed by male journalists, how many are male / female?" ] }, { "cell_type": "code", "execution_count": 54, "metadata": { "scrolled": false }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
count percentage avg_followed
index
M 100333 62.3% 77.24
F 60757 37.7% 61.19
\n", "
" ], "text/plain": [ " count percentage avg_followed\n", "index \n", "M 100333 62.3% 77.24\n", "F 60757 37.7% 61.19" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "journalist_followed_gender_summary(follower_to_journalist_followed_df[follower_to_journalist_followed_df.gender == 'M'])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" }, "toc": { "nav_menu": { "height": "512px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "toc_cell": true, "toc_position": { "height": "586px", "left": "0px", "right": "1088px", "top": "112px", "width": "343px" }, "toc_section_display": "block", "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }