{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\" style=\"margin-top: 1em;\"><ul class=\"toc-item\"><li><span><a href=\"#Gender-dynamics\" data-toc-modified-id=\"Gender-dynamics-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Gender dynamics</a></span><ul class=\"toc-item\"><li><span><a href=\"#Tweet-data-prep\" data-toc-modified-id=\"Tweet-data-prep-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Tweet data prep</a></span><ul class=\"toc-item\"><li><span><a href=\"#Load-the-tweets\" data-toc-modified-id=\"Load-the-tweets-1.1.1\"><span class=\"toc-item-num\">1.1.1&nbsp;&nbsp;</span>Load the tweets</a></span></li></ul></li><li><span><a href=\"#Tweet-analysis\" data-toc-modified-id=\"Tweet-analysis-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>Tweet analysis</a></span><ul class=\"toc-item\"><li><span><a href=\"#What-are-the-first-and-last-tweets-in-the-dataset?\" data-toc-modified-id=\"What-are-the-first-and-last-tweets-in-the-dataset?-1.2.1\"><span class=\"toc-item-num\">1.2.1&nbsp;&nbsp;</span>What are the first and last tweets in the dataset?</a></span></li><li><span><a href=\"#How-many-retweets,-original-tweets,-replies,-and-quotes-are-in-dataset?\" data-toc-modified-id=\"How-many-retweets,-original-tweets,-replies,-and-quotes-are-in-dataset?-1.2.2\"><span class=\"toc-item-num\">1.2.2&nbsp;&nbsp;</span>How many retweets, original tweets, replies, and quotes are in dataset?</a></span></li></ul></li><li><span><a href=\"#Tweeter-data-prep\" data-toc-modified-id=\"Tweeter-data-prep-1.3\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>Tweeter data prep</a></span><ul class=\"toc-item\"><li><span><a href=\"#Load-user-lookup\" data-toc-modified-id=\"Load-user-lookup-1.3.1\"><span class=\"toc-item-num\">1.3.1&nbsp;&nbsp;</span>Load user lookup</a></span></li><li><span><a href=\"#Tweets-in-dataset-per-tweeter\" data-toc-modified-id=\"Tweets-in-dataset-per-tweeter-1.3.2\"><span class=\"toc-item-num\">1.3.2&nbsp;&nbsp;</span>Tweets in dataset per 
tweeter</a></span></li><li><span><a href=\"#Load-user-info\" data-toc-modified-id=\"Load-user-info-1.3.3\"><span class=\"toc-item-num\">1.3.3&nbsp;&nbsp;</span>Load user info</a></span></li><li><span><a href=\"#Remove-users-with-no-tweets-in-dataset\" data-toc-modified-id=\"Remove-users-with-no-tweets-in-dataset-1.3.4\"><span class=\"toc-item-num\">1.3.4&nbsp;&nbsp;</span>Remove users with no tweets in dataset</a></span></li></ul></li><li><span><a href=\"#Tweeter-analysis\" data-toc-modified-id=\"Tweeter-analysis-1.4\"><span class=\"toc-item-num\">1.4&nbsp;&nbsp;</span>Tweeter analysis</a></span><ul class=\"toc-item\"><li><span><a href=\"#How-many-of-the-journalists-are-male-/-female?\" data-toc-modified-id=\"How-many-of-the-journalists-are-male-/-female?-1.4.1\"><span class=\"toc-item-num\">1.4.1&nbsp;&nbsp;</span>How many of the journalists are male / female?</a></span></li><li><span><a href=\"#Summary\" data-toc-modified-id=\"Summary-1.4.2\"><span class=\"toc-item-num\">1.4.2&nbsp;&nbsp;</span>Summary</a></span><ul class=\"toc-item\"><li><span><a href=\"#All\" data-toc-modified-id=\"All-1.4.2.1\"><span class=\"toc-item-num\">1.4.2.1&nbsp;&nbsp;</span>All</a></span></li><li><span><a href=\"#Female\" data-toc-modified-id=\"Female-1.4.2.2\"><span class=\"toc-item-num\">1.4.2.2&nbsp;&nbsp;</span>Female</a></span></li><li><span><a href=\"#Male\" data-toc-modified-id=\"Male-1.4.2.3\"><span class=\"toc-item-num\">1.4.2.3&nbsp;&nbsp;</span>Male</a></span></li></ul></li><li><span><a href=\"#Verified\" data-toc-modified-id=\"Verified-1.4.3\"><span class=\"toc-item-num\">1.4.3&nbsp;&nbsp;</span>Verified</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-all-journalists,-how-many-are-verified?\" data-toc-modified-id=\"Of-all-journalists,-how-many-are-verified?-1.4.3.1\"><span class=\"toc-item-num\">1.4.3.1&nbsp;&nbsp;</span>Of all journalists, how many are verified?</a></span></li><li><span><a href=\"#Of-female-journalists,-how-many-are-verified?\" 
data-toc-modified-id=\"Of-female-journalists,-how-many-are-verified?-1.4.3.2\"><span class=\"toc-item-num\">1.4.3.2&nbsp;&nbsp;</span>Of female journalists, how many are verified?</a></span></li><li><span><a href=\"#Of-male-journalists,-how-many-are-verified?\" data-toc-modified-id=\"Of-male-journalists,-how-many-are-verified?-1.4.3.3\"><span class=\"toc-item-num\">1.4.3.3&nbsp;&nbsp;</span>Of male journalists, how many are verified?</a></span></li></ul></li></ul></li><li><span><a href=\"#Mention-data-prep\" data-toc-modified-id=\"Mention-data-prep-1.5\"><span class=\"toc-item-num\">1.5&nbsp;&nbsp;</span>Mention data prep</a></span><ul class=\"toc-item\"><li><span><a href=\"#Load-mentions-from-tweets\" data-toc-modified-id=\"Load-mentions-from-tweets-1.5.1\"><span class=\"toc-item-num\">1.5.1&nbsp;&nbsp;</span>Load mentions from tweets</a></span></li><li><span><a href=\"#Add-gender-of-mentioner\" data-toc-modified-id=\"Add-gender-of-mentioner-1.5.2\"><span class=\"toc-item-num\">1.5.2&nbsp;&nbsp;</span>Add gender of mentioner</a></span></li><li><span><a href=\"#How-many-tweets-have-mentions?\" data-toc-modified-id=\"How-many-tweets-have-mentions?-1.5.3\"><span class=\"toc-item-num\">1.5.3&nbsp;&nbsp;</span>How many tweets have mentions?</a></span></li><li><span><a href=\"#How-many-users-are-mentioned?-(All-users,-not-just-journalists)\" data-toc-modified-id=\"How-many-users-are-mentioned?-(All-users,-not-just-journalists)-1.5.4\"><span class=\"toc-item-num\">1.5.4&nbsp;&nbsp;</span>How many users are mentioned? 
(All users, not just journalists)</a></span></li><li><span><a href=\"#Limit-to-mentions-of-journalists\" data-toc-modified-id=\"Limit-to-mentions-of-journalists-1.5.5\"><span class=\"toc-item-num\">1.5.5&nbsp;&nbsp;</span>Limit to mentions of journalists</a></span></li><li><span><a href=\"#Functions-for-summarizing-mentions-by-beltway-journalists\" data-toc-modified-id=\"Functions-for-summarizing-mentions-by-beltway-journalists-1.5.6\"><span class=\"toc-item-num\">1.5.6&nbsp;&nbsp;</span>Functions for summarizing mentions by beltway journalists</a></span></li></ul></li><li><span><a href=\"#Mentioned-analysis\" data-toc-modified-id=\"Mentioned-analysis-1.6\"><span class=\"toc-item-num\">1.6&nbsp;&nbsp;</span>Mentioned analysis</a></span><ul class=\"toc-item\"><li><span><a href=\"#Original-tweets-(since-mentions-are-extracted-from-original-tweets)\" data-toc-modified-id=\"Original-tweets-(since-mentions-are-extracted-from-original-tweets)-1.6.1\"><span class=\"toc-item-num\">1.6.1&nbsp;&nbsp;</span>Original tweets (since mentions are extracted from original tweets)</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-the-original-tweets,-how-many-were-posted-by-male-journalists-/-female-journalists?\" data-toc-modified-id=\"Of-the-original-tweets,-how-many-were-posted-by-male-journalists-/-female-journalists?-1.6.1.1\"><span class=\"toc-item-num\">1.6.1.1&nbsp;&nbsp;</span>Of the original tweets, how many were posted by male journalists / female journalists?</a></span></li><li><span><a href=\"#Who-posted-the-most-original-tweets?\" data-toc-modified-id=\"Who-posted-the-most-original-tweets?-1.6.1.2\"><span class=\"toc-item-num\">1.6.1.2&nbsp;&nbsp;</span>Who posted the most original tweets?</a></span></li><li><span><a href=\"#Mentions-of-all-accounts-(not-just-journalists)\" data-toc-modified-id=\"Mentions-of-all-accounts-(not-just-journalists)-1.6.1.3\"><span class=\"toc-item-num\">1.6.1.3&nbsp;&nbsp;</span>Mentions of all accounts (not just 
journalists)</a></span></li><li><span><a href=\"#Of-journalists-mentioning-accounts,-which-are-mentioned-the-most?\" data-toc-modified-id=\"Of-journalists-mentioning-accounts,-which-are-mentioned-the-most?-1.6.1.4\"><span class=\"toc-item-num\">1.6.1.4&nbsp;&nbsp;</span>Of journalists mentioning accounts, which are mentioned the most?</a></span></li><li><span><a href=\"#Same,-but-ordered-by-the-number-of-journalists-mentioning-the-account\" data-toc-modified-id=\"Same,-but-ordered-by-the-number-of-journalists-mentioning-the-account-1.6.1.5\"><span class=\"toc-item-num\">1.6.1.5&nbsp;&nbsp;</span>Same, but ordered by the number of journalists mentioning the account</a></span></li></ul></li><li><span><a href=\"#Journalists-mentioning-journalists\" data-toc-modified-id=\"Journalists-mentioning-journalists-1.6.2\"><span class=\"toc-item-num\">1.6.2&nbsp;&nbsp;</span>Journalists mentioning journalists</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-journalists-mentioning-journalists,-who-is-mentioned-the-most?\" data-toc-modified-id=\"Of-journalists-mentioning-journalists,-who-is-mentioned-the-most?-1.6.2.1\"><span class=\"toc-item-num\">1.6.2.1&nbsp;&nbsp;</span>Of journalists mentioning journalists, who is mentioned the most?</a></span></li><li><span><a href=\"#Same,-but-ordered-by-number-of-journalists-mentioning\" data-toc-modified-id=\"Same,-but-ordered-by-number-of-journalists-mentioning-1.6.2.2\"><span class=\"toc-item-num\">1.6.2.2&nbsp;&nbsp;</span>Same, but ordered by number of journalists mentioning</a></span></li><li><span><a href=\"#Of-journalists-mentioning-other-journalists,-how-many-are-male-/-female?\" data-toc-modified-id=\"Of-journalists-mentioning-other-journalists,-how-many-are-male-/-female?-1.6.2.3\"><span class=\"toc-item-num\">1.6.2.3&nbsp;&nbsp;</span>Of journalists mentioning other journalists, how many are male / female?</a></span></li><li><span><a href=\"#On-average-how-many-times-are-journalists-mentioned-by-other-journalists?\" 
data-toc-modified-id=\"On-average-how-many-times-are-journalists-mentioned-by-other-journalists?-1.6.2.4\"><span class=\"toc-item-num\">1.6.2.4&nbsp;&nbsp;</span>On average how many times are journalists mentioned by other journalists?</a></span></li></ul></li><li><span><a href=\"#Journalists-mentioning-female-journalists\" data-toc-modified-id=\"Journalists-mentioning-female-journalists-1.6.3\"><span class=\"toc-item-num\">1.6.3&nbsp;&nbsp;</span>Journalists mentioning female journalists</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-journalists-mentioning-female-journalists-who-is-mentioned-the-most?\" data-toc-modified-id=\"Of-journalists-mentioning-female-journalists-who-is-mentioned-the-most?-1.6.3.1\"><span class=\"toc-item-num\">1.6.3.1&nbsp;&nbsp;</span>Of journalists mentioning female journalists who is mentioned the most?</a></span></li><li><span><a href=\"#On-average,-how-many-times-are-female-journalists-mentioned-by-journalists?\" data-toc-modified-id=\"On-average,-how-many-times-are-female-journalists-mentioned-by-journalists?-1.6.3.2\"><span class=\"toc-item-num\">1.6.3.2&nbsp;&nbsp;</span>On average, how many times are female journalists mentioned by journalists?</a></span></li></ul></li><li><span><a href=\"#Journalists-mentioning-male-journalists\" data-toc-modified-id=\"Journalists-mentioning-male-journalists-1.6.4\"><span class=\"toc-item-num\">1.6.4&nbsp;&nbsp;</span>Journalists mentioning male journalists</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-journalists-mentioning-male-journalists,-who-do-they-mention-the-most?\" data-toc-modified-id=\"Of-journalists-mentioning-male-journalists,-who-do-they-mention-the-most?-1.6.4.1\"><span class=\"toc-item-num\">1.6.4.1&nbsp;&nbsp;</span>Of journalists mentioning male journalists, who do they mention the most?</a></span></li><li><span><a href=\"#On-average,-how-many-times-are-male-journalists-mentioned-by-journalists?\" 
data-toc-modified-id=\"On-average,-how-many-times-are-male-journalists-mentioned-by-journalists?-1.6.4.2\"><span class=\"toc-item-num\">1.6.4.2&nbsp;&nbsp;</span>On average, how many times are male journalists mentioned by journalists?</a></span></li></ul></li><li><span><a href=\"#Female-journalists-mentioning-other-journalists\" data-toc-modified-id=\"Female-journalists-mentioning-other-journalists-1.6.5\"><span class=\"toc-item-num\">1.6.5&nbsp;&nbsp;</span>Female journalists mentioning other journalists</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-female-journalists-mentioning-other-journalists,-who-do-they-mention-the-most?\" data-toc-modified-id=\"Of-female-journalists-mentioning-other-journalists,-who-do-they-mention-the-most?-1.6.5.1\"><span class=\"toc-item-num\">1.6.5.1&nbsp;&nbsp;</span>Of female journalists mentioning other journalists, who do they mention the most?</a></span></li><li><span><a href=\"#Of-female-journalists-mentioning-journalists,-how-many-are-male-/-female?\" data-toc-modified-id=\"Of-female-journalists-mentioning-journalists,-how-many-are-male-/-female?-1.6.5.2\"><span class=\"toc-item-num\">1.6.5.2&nbsp;&nbsp;</span>Of female journalists mentioning journalists, how many are male / female?</a></span></li></ul></li><li><span><a href=\"#Male-journalists-mentioning-other-journalists\" data-toc-modified-id=\"Male-journalists-mentioning-other-journalists-1.6.6\"><span class=\"toc-item-num\">1.6.6&nbsp;&nbsp;</span>Male journalists mentioning other journalists</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-male-journalists-mentioning-other-journalists,-who-do-they-mention-the-most?\" data-toc-modified-id=\"Of-male-journalists-mentioning-other-journalists,-who-do-they-mention-the-most?-1.6.6.1\"><span class=\"toc-item-num\">1.6.6.1&nbsp;&nbsp;</span>Of male journalists mentioning other journalists, who do they mention the most?</a></span></li><li><span><a 
href=\"#Of-male-journalists-mentioning-other-journalists,-how-many-are-male-/-female?\" data-toc-modified-id=\"Of-male-journalists-mentioning-other-journalists,-how-many-are-male-/-female?-1.6.6.2\"><span class=\"toc-item-num\">1.6.6.2&nbsp;&nbsp;</span>Of male journalists mentioning other journalists, how many are male / female?</a></span></li></ul></li></ul></li><li><span><a href=\"#Retweet-data-prep\" data-toc-modified-id=\"Retweet-data-prep-1.7\"><span class=\"toc-item-num\">1.7&nbsp;&nbsp;</span>Retweet data prep</a></span><ul class=\"toc-item\"><li><span><a href=\"#Load-retweets-from-tweets\" data-toc-modified-id=\"Load-retweets-from-tweets-1.7.1\"><span class=\"toc-item-num\">1.7.1&nbsp;&nbsp;</span>Load retweets from tweets</a></span></li><li><span><a href=\"#Add-gender-of-retweeter\" data-toc-modified-id=\"Add-gender-of-retweeter-1.7.2\"><span class=\"toc-item-num\">1.7.2&nbsp;&nbsp;</span>Add gender of retweeter</a></span></li><li><span><a href=\"#How-many-users-have-been-retweeted-by-journalists?\" data-toc-modified-id=\"How-many-users-have-been-retweeted-by-journalists?-1.7.3\"><span class=\"toc-item-num\">1.7.3&nbsp;&nbsp;</span>How many users have been retweeted by journalists?</a></span></li><li><span><a href=\"#Limit-to-retweeted-journalists\" data-toc-modified-id=\"Limit-to-retweeted-journalists-1.7.4\"><span class=\"toc-item-num\">1.7.4&nbsp;&nbsp;</span>Limit to retweeted journalists</a></span></li><li><span><a href=\"#Functions-for-summarizing-retweets-by-beltway-journalists\" data-toc-modified-id=\"Functions-for-summarizing-retweets-by-beltway-journalists-1.7.5\"><span class=\"toc-item-num\">1.7.5&nbsp;&nbsp;</span>Functions for summarizing retweets by beltway journalists</a></span></li></ul></li><li><span><a href=\"#Retweet-analysis\" data-toc-modified-id=\"Retweet-analysis-1.8\"><span class=\"toc-item-num\">1.8&nbsp;&nbsp;</span>Retweet analysis</a></span><ul class=\"toc-item\"><li><span><a 
href=\"#Retweets-of-all-accounts-(not-just-journalists)\" data-toc-modified-id=\"Retweets-of-all-accounts-(not-just-journalists)-1.8.1\"><span class=\"toc-item-num\">1.8.1&nbsp;&nbsp;</span>Retweets of all accounts (not just journalists)</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-journalists-retweeting-other-accounts,-how-many-of-the-retweets-are-from-males-/-females?\" data-toc-modified-id=\"Of-journalists-retweeting-other-accounts,-how-many-of-the-retweets-are-from-males-/-females?-1.8.1.1\"><span class=\"toc-item-num\">1.8.1.1&nbsp;&nbsp;</span>Of journalists retweeting other accounts, how many of the retweets are from males / females?</a></span></li><li><span><a href=\"#Of-journalists-retweeting-other-accounts,-who-retweets-the-most?\" data-toc-modified-id=\"Of-journalists-retweeting-other-accounts,-who-retweets-the-most?-1.8.1.2\"><span class=\"toc-item-num\">1.8.1.2&nbsp;&nbsp;</span>Of journalists retweeting other accounts, who retweets the most?</a></span></li><li><span><a href=\"#Of-journalists-retweeting-other-accounts,-who-is-retweeted-the-most?\" data-toc-modified-id=\"Of-journalists-retweeting-other-accounts,-who-is-retweeted-the-most?-1.8.1.3\"><span class=\"toc-item-num\">1.8.1.3&nbsp;&nbsp;</span>Of journalists retweeting other accounts, who is retweeted the most?</a></span></li></ul></li><li><span><a href=\"#Journalists-retweeting-other-journalists\" data-toc-modified-id=\"Journalists-retweeting-other-journalists-1.8.2\"><span class=\"toc-item-num\">1.8.2&nbsp;&nbsp;</span>Journalists retweeting other journalists</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-journalists-retweeting-other-journalists,-who-is-retweeted-the-most?\" data-toc-modified-id=\"Of-journalists-retweeting-other-journalists,-who-is-retweeted-the-most?-1.8.2.1\"><span class=\"toc-item-num\">1.8.2.1&nbsp;&nbsp;</span>Of journalists retweeting other journalists, who is retweeted the most?</a></span></li><li><span><a 
href=\"#Of-journalists-retweeting-other-journalists,-how-many-of-the-retweets-are-of-males-/-females?\" data-toc-modified-id=\"Of-journalists-retweeting-other-journalists,-how-many-of-the-retweets-are-of-males-/-females?-1.8.2.2\"><span class=\"toc-item-num\">1.8.2.2&nbsp;&nbsp;</span>Of journalists retweeting other journalists, how many of the retweets are of males / females?</a></span></li><li><span><a href=\"#On-average,-how-many-times-are-journalists-retweeted-by-other-journalists?\" data-toc-modified-id=\"On-average,-how-many-times-are-journalists-retweeted-by-other-journalists?-1.8.2.3\"><span class=\"toc-item-num\">1.8.2.3&nbsp;&nbsp;</span>On average, how many times are journalists retweeted by other journalists?</a></span></li></ul></li><li><span><a href=\"#Journalists-retweeting-female-journalists\" data-toc-modified-id=\"Journalists-retweeting-female-journalists-1.8.3\"><span class=\"toc-item-num\">1.8.3&nbsp;&nbsp;</span>Journalists retweeting female journalists</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-journalists-retweeting-female-journalists,-who-is-retweeted-the-most?\" data-toc-modified-id=\"Of-journalists-retweeting-female-journalists,-who-is-retweeted-the-most?-1.8.3.1\"><span class=\"toc-item-num\">1.8.3.1&nbsp;&nbsp;</span>Of journalists retweeting female journalists, who is retweeted the most?</a></span></li><li><span><a href=\"#On-average,-how-many-times-are-female-journalists-retweeted-by-other-journalists?\" data-toc-modified-id=\"On-average,-how-many-times-are-female-journalists-retweeted-by-other-journalists?-1.8.3.2\"><span class=\"toc-item-num\">1.8.3.2&nbsp;&nbsp;</span>On average, how many times are female journalists retweeted by other journalists?</a></span></li></ul></li><li><span><a href=\"#Journalists-retweeting-male-journalists\" data-toc-modified-id=\"Journalists-retweeting-male-journalists-1.8.4\"><span class=\"toc-item-num\">1.8.4&nbsp;&nbsp;</span>Journalists retweeting male journalists</a></span><ul 
class=\"toc-item\"><li><span><a href=\"#Of-journalists-retweeting-male-journalists,-who-is-retweeted-the-most?\" data-toc-modified-id=\"Of-journalists-retweeting-male-journalists,-who-is-retweeted-the-most?-1.8.4.1\"><span class=\"toc-item-num\">1.8.4.1&nbsp;&nbsp;</span>Of journalists retweeting male journalists, who is retweeted the most?</a></span></li><li><span><a href=\"#On-average,-how-many-times-are-male-journalists-retweeted-by-other-journalists?\" data-toc-modified-id=\"On-average,-how-many-times-are-male-journalists-retweeted-by-other-journalists?-1.8.4.2\"><span class=\"toc-item-num\">1.8.4.2&nbsp;&nbsp;</span>On average, how many times are male journalists retweeted by other journalists?</a></span></li></ul></li><li><span><a href=\"#Female-journalists-retweeting-other-journalists\" data-toc-modified-id=\"Female-journalists-retweeting-other-journalists-1.8.5\"><span class=\"toc-item-num\">1.8.5&nbsp;&nbsp;</span>Female journalists retweeting other journalists</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-female-journalists-retweeting-other-journalists,-who-is-retweeted-the-most?\" data-toc-modified-id=\"Of-female-journalists-retweeting-other-journalists,-who-is-retweeted-the-most?-1.8.5.1\"><span class=\"toc-item-num\">1.8.5.1&nbsp;&nbsp;</span>Of female journalists retweeting other journalists, who is retweeted the most?</a></span></li><li><span><a href=\"#Of-female-journalists-retweeting-other-journalists,-how-many-are-male-/-female?\" data-toc-modified-id=\"Of-female-journalists-retweeting-other-journalists,-how-many-are-male-/-female?-1.8.5.2\"><span class=\"toc-item-num\">1.8.5.2&nbsp;&nbsp;</span>Of female journalists retweeting other journalists, how many are male / female?</a></span></li><li><span><a href=\"#On-average,-how-many-times-do-female-journalists-retweet-male-/-female-/-all-journalists?\" data-toc-modified-id=\"On-average,-how-many-times-do-female-journalists-retweet-male-/-female-/-all-journalists?-1.8.5.3\"><span 
class=\"toc-item-num\">1.8.5.3&nbsp;&nbsp;</span>On average, how many times do female journalists retweet male / female / all journalists?</a></span></li></ul></li><li><span><a href=\"#Male-journalists-retweeting-other-journalists\" data-toc-modified-id=\"Male-journalists-retweeting-other-journalists-1.8.6\"><span class=\"toc-item-num\">1.8.6&nbsp;&nbsp;</span>Male journalists retweeting other journalists</a></span><ul class=\"toc-item\"><li><span><a href=\"#Of-male-journalists-retweeting-other-journalists,-who-is-retweeted-the-most?\" data-toc-modified-id=\"Of-male-journalists-retweeting-other-journalists,-who-is-retweeted-the-most?-1.8.6.1\"><span class=\"toc-item-num\">1.8.6.1&nbsp;&nbsp;</span>Of male journalists retweeting other journalists, who is retweeted the most?</a></span></li><li><span><a href=\"#Of-male--journalists-retweeting-other-journalists,-how-many-are-male-/-female?\" data-toc-modified-id=\"Of-male--journalists-retweeting-other-journalists,-how-many-are-male-/-female?-1.8.6.2\"><span class=\"toc-item-num\">1.8.6.2&nbsp;&nbsp;</span>Of male  journalists retweeting other journalists, how many are male / female?</a></span></li><li><span><a href=\"#On-average,-how-many-times-do-male-journalists-retweet-male-/-female-/-all-journalists?\" data-toc-modified-id=\"On-average,-how-many-times-do-male-journalists-retweet-male-/-female-/-all-journalists?-1.8.6.3\"><span class=\"toc-item-num\">1.8.6.3&nbsp;&nbsp;</span>On average, how many times do male journalists retweet male / female / all journalists?</a></span></li></ul></li></ul></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Gender dynamics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tweet data prep"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load the tweets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Loading from tweets/642bf140607547cb9d4c6b1fc49772aa_001.json.gz\n",
      "DEBUG:root:Loaded 50000\n",
      "DEBUG:root:Loaded 100000\n",
      "DEBUG:root:Loaded 150000\n",
      "DEBUG:root:Loaded 200000\n",
      "DEBUG:root:Loaded 250000\n",
      "INFO:root:Loading from tweets/9f7ed17c16a1494c8690b4053609539d_001.json.gz\n",
      "DEBUG:root:Loaded 300000\n",
      "DEBUG:root:Loaded 350000\n",
      "DEBUG:root:Loaded 400000\n",
      "DEBUG:root:Loaded 450000\n",
      "DEBUG:root:Loaded 500000\n",
      "INFO:root:Loading from tweets/41feff28312c433ab004cd822212f4c2_001.json.gz\n",
      "DEBUG:root:Loaded 550000\n",
      "DEBUG:root:Loaded 600000\n",
      "DEBUG:root:Loaded 650000\n",
      "DEBUG:root:Loaded 700000\n",
      "DEBUG:root:Loaded 750000\n",
      "DEBUG:root:Loaded 800000\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tweet_id            817136\n",
       "user_id             817136\n",
       "screen_name         817136\n",
       "tweet_created_at    817136\n",
       "tweet_type          817136\n",
       "dtype: int64"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import logging\n",
    "from dateutil.parser import parse as date_parse\n",
    "from utils import load_tweet_df, tweet_type\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.DEBUG)\n",
    "\n",
    "# Set float format so pandas doesn't display scientific notation\n",
    "pd.options.display.float_format = '{:20,.2f}'.format\n",
    "\n",
    "def tweet_transform(tweet):\n",
    "    # Keep only the fields needed for this analysis from the raw tweet JSON\n",
    "    return {\n",
    "        'tweet_id': tweet['id_str'],\n",
    "        'tweet_created_at': date_parse(tweet['created_at']),\n",
    "        'user_id': tweet['user']['id_str'],\n",
    "        'screen_name': tweet['user']['screen_name'],\n",
    "        'tweet_type': tweet_type(tweet)\n",
    "    }\n",
    "\n",
    "# Load the tweets, deduplicating on tweet_id\n",
    "tweet_df = load_tweet_df(tweet_transform,\n",
    "                         ['tweet_id', 'user_id', 'screen_name', 'tweet_created_at', 'tweet_type'],\n",
    "                         dedupe_columns=['tweet_id'])\n",
    "tweet_df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tweet_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>screen_name</th>\n",
       "      <th>tweet_created_at</th>\n",
       "      <th>tweet_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>872631046088601600</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>2017-06-08 01:47:08+00:00</td>\n",
       "      <td>retweet</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>872610483647516673</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>2017-06-08 00:25:26+00:00</td>\n",
       "      <td>retweet</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>872609618626826240</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>2017-06-08 00:22:00+00:00</td>\n",
       "      <td>retweet</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>872605974699311104</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>2017-06-08 00:07:31+00:00</td>\n",
       "      <td>retweet</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>872603191518646276</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>2017-06-07 23:56:27+00:00</td>\n",
       "      <td>retweet</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             tweet_id    user_id    screen_name          tweet_created_at  \\\n",
       "0  872631046088601600  327862439  jonathanvswan 2017-06-08 01:47:08+00:00   \n",
       "1  872610483647516673  327862439  jonathanvswan 2017-06-08 00:25:26+00:00   \n",
       "2  872609618626826240  327862439  jonathanvswan 2017-06-08 00:22:00+00:00   \n",
       "3  872605974699311104  327862439  jonathanvswan 2017-06-08 00:07:31+00:00   \n",
       "4  872603191518646276  327862439  jonathanvswan 2017-06-07 23:56:27+00:00   \n",
       "\n",
       "  tweet_type  \n",
       "0    retweet  \n",
       "1    retweet  \n",
       "2    retweet  \n",
       "3    retweet  \n",
       "4    retweet  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tweet_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tweet analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What are the first and last tweets in the dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2017-06-01 04:00:01+0000', tz='UTC')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tweet_df.tweet_created_at.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Timestamp('2017-08-01 03:59:58+0000', tz='UTC')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tweet_df.tweet_created_at.max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How many retweets, original tweets, replies, and quotes are in dataset?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>retweet</th>\n",
       "      <td>345266</td>\n",
       "      <td>42.3%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>original</th>\n",
       "      <td>233926</td>\n",
       "      <td>28.6%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>reply</th>\n",
       "      <td>126254</td>\n",
       "      <td>15.5%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>quote</th>\n",
       "      <td>111690</td>\n",
       "      <td>13.7%</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           count percentage\n",
       "retweet   345266      42.3%\n",
       "original  233926      28.6%\n",
       "reply     126254      15.5%\n",
       "quote     111690      13.7%"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame({'count':tweet_df.tweet_type.value_counts(), \n",
    "              'percentage':tweet_df.tweet_type.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tweeter data prep"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tweeter data is assembled from the following sources:\n",
    "1. User lookup: Lists of users exported from SFM. These are the final set of beltway journalists. Accounts that were suspended or deleted have been removed from this list. The list also includes users that did not tweet (i.e., have no tweets in the dataset).\n",
    "2. Tweets in the dataset: Used to generate tweet counts per tweeter. Since some beltway journalists may not have tweeted, this may not cover every user in the user lookup. It may also include tweets from users that were later excluded because their accounts were suspended or deleted, or because they were determined not to be beltway journalists.\n",
    "3. User info lookup: Information on users that was manually coded in the beltway journalist spreadsheet or looked up from Twitter's API. This includes some accounts that were excluded from data collection for various reasons, such as working for a foreign news organization or no longer working as a beltway journalist. Thus, it is a superset of the user lookup.\n",
    "\n",
    "Therefore, the tweeter data should include tweet and user info data only for users in the user lookup."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load user lookup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "screen_name    2487\n",
       "dtype: int64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_lookup_filepaths = ('lookups/senate_press_lookup.csv',\n",
    "                         'lookups/periodical_press_lookup.csv',\n",
    "                         'lookups/radio_and_television_lookup.csv')\n",
    "user_lookup_df = pd.concat((pd.read_csv(user_lookup_filepath, usecols=['Uid', 'Token'], dtype={'Uid': str}) for user_lookup_filepath in user_lookup_filepaths))\n",
    "user_lookup_df.set_index('Uid', inplace=True)\n",
    "user_lookup_df.rename(columns={'Token': 'screen_name'}, inplace=True)\n",
    "user_lookup_df.index.names = ['user_id']\n",
    "# Some users may be in multiple lists, so need to drop duplicates\n",
    "user_lookup_df = user_lookup_df[~user_lookup_df.index.duplicated()]\n",
    "\n",
    "user_lookup_df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>23455653</th>\n",
       "      <td>abettel</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33919343</th>\n",
       "      <td>AshleyRParker</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18580432</th>\n",
       "      <td>b_fung</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>399225358</th>\n",
       "      <td>b_muzz</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18834692</th>\n",
       "      <td>becca_milfeld</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             screen_name\n",
       "user_id                 \n",
       "23455653         abettel\n",
       "33919343   AshleyRParker\n",
       "18580432          b_fung\n",
       "399225358         b_muzz\n",
       "18834692   becca_milfeld"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_lookup_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tweets in dataset per tweeter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tweet_type\n",
       "original             2292\n",
       "quote                2292\n",
       "reply                2292\n",
       "retweet              2292\n",
       "tweets_in_dataset    2292\n",
       "dtype: int64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
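   ],
   "source": [
    "# Count tweets per user and tweet type: one row per user, one column per tweet type\n",
    "user_tweet_count_df = tweet_df[['user_id', 'tweet_type']].groupby(['user_id', 'tweet_type']).size().unstack()\n",
    "# Users with no tweets of a given type get NaN from unstack(), so replace NaN with 0\n",
    "user_tweet_count_df.fillna(0, inplace=True)\n",
    "user_tweet_count_df['tweets_in_dataset'] = user_tweet_count_df.original + user_tweet_count_df.quote + user_tweet_count_df.reply + user_tweet_count_df.retweet\n",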
    "user_tweet_count_df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>tweet_type</th>\n",
       "      <th>original</th>\n",
       "      <th>quote</th>\n",
       "      <th>reply</th>\n",
       "      <th>retweet</th>\n",
       "      <th>tweets_in_dataset</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1001991865</th>\n",
       "      <td>13.00</td>\n",
       "      <td>3.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>31.00</td>\n",
       "      <td>48.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1002229862</th>\n",
       "      <td>48.00</td>\n",
       "      <td>20.00</td>\n",
       "      <td>3.00</td>\n",
       "      <td>118.00</td>\n",
       "      <td>189.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100270054</th>\n",
       "      <td>1.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100802089</th>\n",
       "      <td>4.00</td>\n",
       "      <td>7.00</td>\n",
       "      <td>12.00</td>\n",
       "      <td>17.00</td>\n",
       "      <td>40.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100860790</th>\n",
       "      <td>102.00</td>\n",
       "      <td>26.00</td>\n",
       "      <td>4.00</td>\n",
       "      <td>166.00</td>\n",
       "      <td>298.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "tweet_type             original                quote                reply  \\\n",
       "user_id                                                                     \n",
       "1001991865                13.00                 3.00                 1.00   \n",
       "1002229862                48.00                20.00                 3.00   \n",
       "100270054                  1.00                 0.00                 0.00   \n",
       "100802089                  4.00                 7.00                12.00   \n",
       "100860790                102.00                26.00                 4.00   \n",
       "\n",
       "tweet_type              retweet    tweets_in_dataset  \n",
       "user_id                                               \n",
       "1001991865                31.00                48.00  \n",
       "1002229862               118.00               189.00  \n",
       "100270054                  0.00                 1.00  \n",
       "100802089                 17.00                40.00  \n",
       "100860790                166.00               298.00  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_tweet_count_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load user info"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "name               2506\n",
       "organization       2477\n",
       "position           2503\n",
       "gender             2505\n",
       "followers_count    2506\n",
       "following_count    2506\n",
       "tweet_count        2506\n",
       "user_created_at    2506\n",
       "verified           2506\n",
       "protected          2506\n",
       "dtype: int64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_df = pd.read_csv('source_data/user_info_lookup.csv', names=['user_id', 'name', 'organization', 'position',\n",
    "                                            'gender', 'followers_count', 'following_count', 'tweet_count',\n",
    "                                            'user_created_at', 'verified', 'protected'],\n",
    "                          dtype={'user_id': str}).set_index(['user_id'])\n",
    "user_info_df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>position</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>following_count</th>\n",
       "      <th>tweet_count</th>\n",
       "      <th>user_created_at</th>\n",
       "      <th>verified</th>\n",
       "      <th>protected</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>20711445</th>\n",
       "      <td>Glinski, Nina</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Freelance Reporter</td>\n",
       "      <td>F</td>\n",
       "      <td>963</td>\n",
       "      <td>507</td>\n",
       "      <td>909</td>\n",
       "      <td>Thu Feb 12 20:00:53 +0000 2009</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>258917371</th>\n",
       "      <td>Enders, David</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Journalist</td>\n",
       "      <td>M</td>\n",
       "      <td>1444</td>\n",
       "      <td>484</td>\n",
       "      <td>6296</td>\n",
       "      <td>Mon Feb 28 19:52:03 +0000 2011</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>297046834</th>\n",
       "      <td>Barakat, Matthew</td>\n",
       "      <td>Associated Press</td>\n",
       "      <td>Northern Virginia Correspondent</td>\n",
       "      <td>M</td>\n",
       "      <td>759</td>\n",
       "      <td>352</td>\n",
       "      <td>631</td>\n",
       "      <td>Wed May 11 20:55:24 +0000 2011</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>455585786</th>\n",
       "      <td>Atkins, Kimberly</td>\n",
       "      <td>Boston Herald</td>\n",
       "      <td>Chief Washington Reporter/Columnist</td>\n",
       "      <td>F</td>\n",
       "      <td>2944</td>\n",
       "      <td>2691</td>\n",
       "      <td>6277</td>\n",
       "      <td>Thu Jan 05 08:26:46 +0000 2012</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42584840</th>\n",
       "      <td>Vlahou, Toula</td>\n",
       "      <td>CQ Roll Call</td>\n",
       "      <td>Editor &amp; Podcast Producer</td>\n",
       "      <td>F</td>\n",
       "      <td>2703</td>\n",
       "      <td>201</td>\n",
       "      <td>6366</td>\n",
       "      <td>Tue May 26 07:41:38 +0000 2009</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       name      organization  \\\n",
       "user_id                                         \n",
       "20711445      Glinski, Nina               NaN   \n",
       "258917371     Enders, David               NaN   \n",
       "297046834  Barakat, Matthew  Associated Press   \n",
       "455585786  Atkins, Kimberly     Boston Herald   \n",
       "42584840      Vlahou, Toula      CQ Roll Call   \n",
       "\n",
       "                                      position gender  followers_count  \\\n",
       "user_id                                                                  \n",
       "20711445                    Freelance Reporter      F              963   \n",
       "258917371                           Journalist      M             1444   \n",
       "297046834      Northern Virginia Correspondent      M              759   \n",
       "455585786  Chief Washington Reporter/Columnist      F             2944   \n",
       "42584840             Editor & Podcast Producer      F             2703   \n",
       "\n",
       "           following_count  tweet_count                 user_created_at  \\\n",
       "user_id                                                                   \n",
       "20711445               507          909  Thu Feb 12 20:00:53 +0000 2009   \n",
       "258917371              484         6296  Mon Feb 28 19:52:03 +0000 2011   \n",
       "297046834              352          631  Wed May 11 20:55:24 +0000 2011   \n",
       "455585786             2691         6277  Thu Jan 05 08:26:46 +0000 2012   \n",
       "42584840               201         6366  Tue May 26 07:41:38 +0000 2009   \n",
       "\n",
       "           verified  protected  \n",
       "user_id                         \n",
       "20711445      False      False  \n",
       "258917371      True      False  \n",
       "297046834      True      False  \n",
       "455585786      True      False  \n",
       "42584840      False      False  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_info_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "screen_name          2487\n",
       "name                 2487\n",
       "organization         2487\n",
       "position             2484\n",
       "gender               2486\n",
       "followers_count      2487\n",
       "following_count      2487\n",
       "tweet_count          2487\n",
       "user_created_at      2487\n",
       "verified             2487\n",
       "protected            2487\n",
       "original             2487\n",
       "quote                2487\n",
       "reply                2487\n",
       "retweet              2487\n",
       "tweets_in_dataset    2487\n",
       "dtype: int64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_summary_df = user_lookup_df.join((user_info_df, user_tweet_count_df), how='left')\n",
    "# Fill NaNs: a missing organization becomes an empty string, and users with no\n",
    "# tweets in the dataset get zero counts. Assigning the result back avoids relying\n",
    "# on in-place fillna on a selected column, which newer pandas versions may not apply.\n",
    "user_summary_df = user_summary_df.fillna({'organization': '', 'original': 0, 'quote': 0,\n",
    "                                          'reply': 0, 'retweet': 0, 'tweets_in_dataset': 0})\n",
    "user_summary_df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>position</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>following_count</th>\n",
       "      <th>tweet_count</th>\n",
       "      <th>user_created_at</th>\n",
       "      <th>verified</th>\n",
       "      <th>protected</th>\n",
       "      <th>original</th>\n",
       "      <th>quote</th>\n",
       "      <th>reply</th>\n",
       "      <th>retweet</th>\n",
       "      <th>tweets_in_dataset</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>23455653</th>\n",
       "      <td>abettel</td>\n",
       "      <td>Bettelheim, Adriel</td>\n",
       "      <td>Politico</td>\n",
       "      <td>Health Care Editor</td>\n",
       "      <td>F</td>\n",
       "      <td>2664</td>\n",
       "      <td>1055</td>\n",
       "      <td>15990</td>\n",
       "      <td>Mon Mar 09 16:32:20 +0000 2009</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>289.00</td>\n",
       "      <td>12.00</td>\n",
       "      <td>6.00</td>\n",
       "      <td>52.00</td>\n",
       "      <td>359.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33919343</th>\n",
       "      <td>AshleyRParker</td>\n",
       "      <td>Parker, Ashley</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>White House Reporter</td>\n",
       "      <td>F</td>\n",
       "      <td>122382</td>\n",
       "      <td>2342</td>\n",
       "      <td>12433</td>\n",
       "      <td>Tue Apr 21 14:28:57 +0000 2009</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>172.00</td>\n",
       "      <td>67.00</td>\n",
       "      <td>11.00</td>\n",
       "      <td>120.00</td>\n",
       "      <td>370.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18580432</th>\n",
       "      <td>b_fung</td>\n",
       "      <td>Fung, Brian</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>Tech Reporter</td>\n",
       "      <td>M</td>\n",
       "      <td>16558</td>\n",
       "      <td>2062</td>\n",
       "      <td>44799</td>\n",
       "      <td>Sat Jan 03 15:15:57 +0000 2009</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>257.00</td>\n",
       "      <td>85.00</td>\n",
       "      <td>205.00</td>\n",
       "      <td>82.00</td>\n",
       "      <td>629.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>399225358</th>\n",
       "      <td>b_muzz</td>\n",
       "      <td>Murray, Brendan</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>Managing Editor, U.S. Economy</td>\n",
       "      <td>M</td>\n",
       "      <td>624</td>\n",
       "      <td>382</td>\n",
       "      <td>360</td>\n",
       "      <td>Thu Oct 27 05:34:05 +0000 2011</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>3.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>5.00</td>\n",
       "      <td>8.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18834692</th>\n",
       "      <td>becca_milfeld</td>\n",
       "      <td>Milfeld, Becca</td>\n",
       "      <td>Agence France-Presse</td>\n",
       "      <td>English Desk Editor and Journalist</td>\n",
       "      <td>F</td>\n",
       "      <td>483</td>\n",
       "      <td>993</td>\n",
       "      <td>1484</td>\n",
       "      <td>Sat Jan 10 13:58:43 +0000 2009</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>3.00</td>\n",
       "      <td>14.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>7.00</td>\n",
       "      <td>24.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             screen_name                name          organization  \\\n",
       "user_id                                                              \n",
       "23455653         abettel  Bettelheim, Adriel              Politico   \n",
       "33919343   AshleyRParker      Parker, Ashley       Washington Post   \n",
       "18580432          b_fung         Fung, Brian       Washington Post   \n",
       "399225358         b_muzz     Murray, Brendan        Bloomberg News   \n",
       "18834692   becca_milfeld      Milfeld, Becca  Agence France-Presse   \n",
       "\n",
       "                                     position gender  followers_count  \\\n",
       "user_id                                                                 \n",
       "23455653                   Health Care Editor      F             2664   \n",
       "33919343                 White House Reporter      F           122382   \n",
       "18580432                        Tech Reporter      M            16558   \n",
       "399225358       Managing Editor, U.S. Economy      M              624   \n",
       "18834692   English Desk Editor and Journalist      F              483   \n",
       "\n",
       "           following_count  tweet_count                 user_created_at  \\\n",
       "user_id                                                                   \n",
       "23455653              1055        15990  Mon Mar 09 16:32:20 +0000 2009   \n",
       "33919343              2342        12433  Tue Apr 21 14:28:57 +0000 2009   \n",
       "18580432              2062        44799  Sat Jan 03 15:15:57 +0000 2009   \n",
       "399225358              382          360  Thu Oct 27 05:34:05 +0000 2011   \n",
       "18834692               993         1484  Sat Jan 10 13:58:43 +0000 2009   \n",
       "\n",
       "           verified  protected             original                quote  \\\n",
       "user_id                                                                    \n",
       "23455653       True      False               289.00                12.00   \n",
       "33919343       True      False               172.00                67.00   \n",
       "18580432       True      False               257.00                85.00   \n",
       "399225358      True      False                 3.00                 0.00   \n",
       "18834692      False      False                 3.00                14.00   \n",
       "\n",
       "                         reply              retweet    tweets_in_dataset  \n",
       "user_id                                                                   \n",
       "23455653                  6.00                52.00               359.00  \n",
       "33919343                 11.00               120.00               370.00  \n",
       "18580432                205.00                82.00               629.00  \n",
       "399225358                 0.00                 5.00                 8.00  \n",
       "18834692                  0.00                 7.00                24.00  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_summary_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### Remove users with no tweets in dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "screen_name          195\n",
       "name                 195\n",
       "organization         195\n",
       "position             195\n",
       "gender               194\n",
       "followers_count      195\n",
       "following_count      195\n",
       "tweet_count          195\n",
       "user_created_at      195\n",
       "verified             195\n",
       "protected            195\n",
       "original             195\n",
       "quote                195\n",
       "reply                195\n",
       "retweet              195\n",
       "tweets_in_dataset    195\n",
       "dtype: int64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_summary_df[user_summary_df.tweets_in_dataset == 0].count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "screen_name          2292\n",
       "name                 2292\n",
       "organization         2292\n",
       "position             2289\n",
       "gender               2292\n",
       "followers_count      2292\n",
       "following_count      2292\n",
       "tweet_count          2292\n",
       "user_created_at      2292\n",
       "verified             2292\n",
       "protected            2292\n",
       "original             2292\n",
       "quote                2292\n",
       "reply                2292\n",
       "retweet              2292\n",
       "tweets_in_dataset    2292\n",
       "dtype: int64"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_summary_df = user_summary_df[user_summary_df.tweets_in_dataset != 0]\n",
    "user_summary_df.count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tweeter analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How many of the journalists are male / female?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>1299</td>\n",
       "      <td>56.7%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>993</td>\n",
       "      <td>43.3%</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   count percentage\n",
       "M   1299      56.7%\n",
       "F    993      43.3%"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalist_gender_summary_df = pd.DataFrame({'count':user_summary_df.gender.value_counts(), 'percentage':user_summary_df.gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
    "journalist_gender_summary_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary\n",
    "\n",
    "* 25%, 50%, and 75% are percentiles. (min is equivalent to the 0th percentile, max to the 100th, and 50% is the median.)\n",
    "* std is the sample standard deviation, normalized by N-1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### All"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>followers_count</th>\n",
       "      <th>following_count</th>\n",
       "      <th>tweet_count</th>\n",
       "      <th>original</th>\n",
       "      <th>quote</th>\n",
       "      <th>reply</th>\n",
       "      <th>retweet</th>\n",
       "      <th>tweets_in_dataset</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>2,292.00</td>\n",
       "      <td>2,292.00</td>\n",
       "      <td>2,292.00</td>\n",
       "      <td>2,292.00</td>\n",
       "      <td>2,292.00</td>\n",
       "      <td>2,292.00</td>\n",
       "      <td>2,292.00</td>\n",
       "      <td>2,292.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>16,467.62</td>\n",
       "      <td>1,444.83</td>\n",
       "      <td>9,619.69</td>\n",
       "      <td>102.06</td>\n",
       "      <td>48.73</td>\n",
       "      <td>55.08</td>\n",
       "      <td>150.64</td>\n",
       "      <td>356.52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>91,886.90</td>\n",
       "      <td>3,003.00</td>\n",
       "      <td>16,618.09</td>\n",
       "      <td>169.43</td>\n",
       "      <td>135.90</td>\n",
       "      <td>249.18</td>\n",
       "      <td>585.08</td>\n",
       "      <td>833.76</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>6.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>831.75</td>\n",
       "      <td>505.75</td>\n",
       "      <td>1,449.50</td>\n",
       "      <td>10.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>8.00</td>\n",
       "      <td>32.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>2,419.50</td>\n",
       "      <td>998.50</td>\n",
       "      <td>4,211.50</td>\n",
       "      <td>41.00</td>\n",
       "      <td>9.00</td>\n",
       "      <td>5.00</td>\n",
       "      <td>39.00</td>\n",
       "      <td>122.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>7,348.75</td>\n",
       "      <td>1,713.50</td>\n",
       "      <td>10,817.25</td>\n",
       "      <td>124.25</td>\n",
       "      <td>43.00</td>\n",
       "      <td>30.00</td>\n",
       "      <td>129.00</td>\n",
       "      <td>375.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>2,176,578.00</td>\n",
       "      <td>96,194.00</td>\n",
       "      <td>208,763.00</td>\n",
       "      <td>2,693.00</td>\n",
       "      <td>3,069.00</td>\n",
       "      <td>9,033.00</td>\n",
       "      <td>21,524.00</td>\n",
       "      <td>21,547.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           followers_count      following_count          tweet_count  \\\n",
       "count             2,292.00             2,292.00             2,292.00   \n",
       "mean             16,467.62             1,444.83             9,619.69   \n",
       "std              91,886.90             3,003.00            16,618.09   \n",
       "min                   6.00                 0.00                 1.00   \n",
       "25%                 831.75               505.75             1,449.50   \n",
       "50%               2,419.50               998.50             4,211.50   \n",
       "75%               7,348.75             1,713.50            10,817.25   \n",
       "max           2,176,578.00            96,194.00           208,763.00   \n",
       "\n",
       "                  original                quote                reply  \\\n",
       "count             2,292.00             2,292.00             2,292.00   \n",
       "mean                102.06                48.73                55.08   \n",
       "std                 169.43               135.90               249.18   \n",
       "min                   0.00                 0.00                 0.00   \n",
       "25%                  10.00                 1.00                 1.00   \n",
       "50%                  41.00                 9.00                 5.00   \n",
       "75%                 124.25                43.00                30.00   \n",
       "max               2,693.00             3,069.00             9,033.00   \n",
       "\n",
       "                   retweet    tweets_in_dataset  \n",
       "count             2,292.00             2,292.00  \n",
       "mean                150.64               356.52  \n",
       "std                 585.08               833.76  \n",
       "min                   0.00                 1.00  \n",
       "25%                   8.00                32.00  \n",
       "50%                  39.00               122.00  \n",
       "75%                 129.00               375.00  \n",
       "max              21,524.00            21,547.00  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_summary_df[['followers_count', 'following_count', 'tweet_count', 'original', 'quote', 'reply', 'retweet', 'tweets_in_dataset']].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Female"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>followers_count</th>\n",
       "      <th>following_count</th>\n",
       "      <th>tweet_count</th>\n",
       "      <th>original</th>\n",
       "      <th>quote</th>\n",
       "      <th>reply</th>\n",
       "      <th>retweet</th>\n",
       "      <th>tweets_in_dataset</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>993.00</td>\n",
       "      <td>993.00</td>\n",
       "      <td>993.00</td>\n",
       "      <td>993.00</td>\n",
       "      <td>993.00</td>\n",
       "      <td>993.00</td>\n",
       "      <td>993.00</td>\n",
       "      <td>993.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>11,609.53</td>\n",
       "      <td>1,314.07</td>\n",
       "      <td>7,498.74</td>\n",
       "      <td>83.84</td>\n",
       "      <td>39.27</td>\n",
       "      <td>32.06</td>\n",
       "      <td>135.55</td>\n",
       "      <td>290.72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>65,563.72</td>\n",
       "      <td>1,250.56</td>\n",
       "      <td>11,312.72</td>\n",
       "      <td>124.86</td>\n",
       "      <td>135.05</td>\n",
       "      <td>94.73</td>\n",
       "      <td>724.92</td>\n",
       "      <td>833.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>6.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>825.00</td>\n",
       "      <td>567.00</td>\n",
       "      <td>1,393.00</td>\n",
       "      <td>8.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>9.00</td>\n",
       "      <td>32.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>2,327.00</td>\n",
       "      <td>1,034.00</td>\n",
       "      <td>4,055.00</td>\n",
       "      <td>39.00</td>\n",
       "      <td>9.00</td>\n",
       "      <td>4.00</td>\n",
       "      <td>37.00</td>\n",
       "      <td>111.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>6,340.00</td>\n",
       "      <td>1,659.00</td>\n",
       "      <td>8,983.00</td>\n",
       "      <td>111.00</td>\n",
       "      <td>33.00</td>\n",
       "      <td>21.00</td>\n",
       "      <td>115.00</td>\n",
       "      <td>314.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1,388,543.00</td>\n",
       "      <td>18,197.00</td>\n",
       "      <td>118,713.00</td>\n",
       "      <td>1,440.00</td>\n",
       "      <td>3,069.00</td>\n",
       "      <td>1,458.00</td>\n",
       "      <td>21,524.00</td>\n",
       "      <td>21,547.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           followers_count      following_count          tweet_count  \\\n",
       "count               993.00               993.00               993.00   \n",
       "mean             11,609.53             1,314.07             7,498.74   \n",
       "std              65,563.72             1,250.56            11,312.72   \n",
       "min                   6.00                 1.00                 1.00   \n",
       "25%                 825.00               567.00             1,393.00   \n",
       "50%               2,327.00             1,034.00             4,055.00   \n",
       "75%               6,340.00             1,659.00             8,983.00   \n",
       "max           1,388,543.00            18,197.00           118,713.00   \n",
       "\n",
       "                  original                quote                reply  \\\n",
       "count               993.00               993.00               993.00   \n",
       "mean                 83.84                39.27                32.06   \n",
       "std                 124.86               135.05                94.73   \n",
       "min                   0.00                 0.00                 0.00   \n",
       "25%                   8.00                 1.00                 1.00   \n",
       "50%                  39.00                 9.00                 4.00   \n",
       "75%                 111.00                33.00                21.00   \n",
       "max               1,440.00             3,069.00             1,458.00   \n",
       "\n",
       "                   retweet    tweets_in_dataset  \n",
       "count               993.00               993.00  \n",
       "mean                135.55               290.72  \n",
       "std                 724.92               833.07  \n",
       "min                   0.00                 1.00  \n",
       "25%                   9.00                32.00  \n",
       "50%                  37.00               111.00  \n",
       "75%                 115.00               314.00  \n",
       "max              21,524.00            21,547.00  "
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_summary_df[user_summary_df.gender == 'F'][['followers_count', 'following_count', 'tweet_count', 'original', 'quote', 'reply', 'retweet', 'tweets_in_dataset']].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Male"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>followers_count</th>\n",
       "      <th>following_count</th>\n",
       "      <th>tweet_count</th>\n",
       "      <th>original</th>\n",
       "      <th>quote</th>\n",
       "      <th>reply</th>\n",
       "      <th>retweet</th>\n",
       "      <th>tweets_in_dataset</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1,299.00</td>\n",
       "      <td>1,299.00</td>\n",
       "      <td>1,299.00</td>\n",
       "      <td>1,299.00</td>\n",
       "      <td>1,299.00</td>\n",
       "      <td>1,299.00</td>\n",
       "      <td>1,299.00</td>\n",
       "      <td>1,299.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>20,181.31</td>\n",
       "      <td>1,544.78</td>\n",
       "      <td>11,241.02</td>\n",
       "      <td>115.99</td>\n",
       "      <td>55.96</td>\n",
       "      <td>72.69</td>\n",
       "      <td>162.17</td>\n",
       "      <td>406.81</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>107,635.37</td>\n",
       "      <td>3,833.89</td>\n",
       "      <td>19,584.46</td>\n",
       "      <td>195.72</td>\n",
       "      <td>136.16</td>\n",
       "      <td>319.41</td>\n",
       "      <td>449.75</td>\n",
       "      <td>831.10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>10.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>5.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>857.50</td>\n",
       "      <td>472.00</td>\n",
       "      <td>1,477.00</td>\n",
       "      <td>12.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>6.00</td>\n",
       "      <td>33.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>2,498.00</td>\n",
       "      <td>953.00</td>\n",
       "      <td>4,401.00</td>\n",
       "      <td>44.00</td>\n",
       "      <td>9.00</td>\n",
       "      <td>6.00</td>\n",
       "      <td>40.00</td>\n",
       "      <td>131.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>8,341.50</td>\n",
       "      <td>1,763.00</td>\n",
       "      <td>12,584.50</td>\n",
       "      <td>140.00</td>\n",
       "      <td>50.50</td>\n",
       "      <td>38.50</td>\n",
       "      <td>142.00</td>\n",
       "      <td>428.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>2,176,578.00</td>\n",
       "      <td>96,194.00</td>\n",
       "      <td>208,763.00</td>\n",
       "      <td>2,693.00</td>\n",
       "      <td>1,955.00</td>\n",
       "      <td>9,033.00</td>\n",
       "      <td>7,528.00</td>\n",
       "      <td>11,432.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           followers_count      following_count          tweet_count  \\\n",
       "count             1,299.00             1,299.00             1,299.00   \n",
       "mean             20,181.31             1,544.78            11,241.02   \n",
       "std             107,635.37             3,833.89            19,584.46   \n",
       "min                  10.00                 0.00                 5.00   \n",
       "25%                 857.50               472.00             1,477.00   \n",
       "50%               2,498.00               953.00             4,401.00   \n",
       "75%               8,341.50             1,763.00            12,584.50   \n",
       "max           2,176,578.00            96,194.00           208,763.00   \n",
       "\n",
       "                  original                quote                reply  \\\n",
       "count             1,299.00             1,299.00             1,299.00   \n",
       "mean                115.99                55.96                72.69   \n",
       "std                 195.72               136.16               319.41   \n",
       "min                   0.00                 0.00                 0.00   \n",
       "25%                  12.00                 0.00                 1.00   \n",
       "50%                  44.00                 9.00                 6.00   \n",
       "75%                 140.00                50.50                38.50   \n",
       "max               2,693.00             1,955.00             9,033.00   \n",
       "\n",
       "                   retweet    tweets_in_dataset  \n",
       "count             1,299.00             1,299.00  \n",
       "mean                162.17               406.81  \n",
       "std                 449.75               831.10  \n",
       "min                   0.00                 1.00  \n",
       "25%                   6.00                33.00  \n",
       "50%                  40.00               131.00  \n",
       "75%                 142.00               428.00  \n",
       "max               7,528.00            11,432.00  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_summary_df[user_summary_df.gender == 'M'][['followers_count', 'following_count', 'tweet_count', 'original', 'quote', 'reply', 'retweet', 'tweets_in_dataset']].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Verified"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of all journalists, how many are verified?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>True</th>\n",
       "      <td>1240</td>\n",
       "      <td>54.1%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>False</th>\n",
       "      <td>1052</td>\n",
       "      <td>45.9%</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       count percentage\n",
       "True    1240      54.1%\n",
       "False   1052      45.9%"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame({'count':user_summary_df.verified.value_counts(), 'percentage':user_summary_df.verified.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of female journalists, how many are verified?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>True</th>\n",
       "      <td>512</td>\n",
       "      <td>51.6%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>False</th>\n",
       "      <td>481</td>\n",
       "      <td>48.4%</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       count percentage\n",
       "True     512      51.6%\n",
       "False    481      48.4%"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame({'count':user_summary_df[user_summary_df.gender == 'F'].verified.value_counts(), 'percentage':user_summary_df[user_summary_df.gender == 'F'].verified.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of male journalists, how many are verified?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>True</th>\n",
       "      <td>728</td>\n",
       "      <td>56.0%</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>False</th>\n",
       "      <td>571</td>\n",
       "      <td>44.0%</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       count percentage\n",
       "True     728      56.0%\n",
       "False    571      44.0%"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame({'count':user_summary_df[user_summary_df.gender == 'M'].verified.value_counts(), 'percentage':user_summary_df[user_summary_df.gender == 'M'].verified.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Mention data prep"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load mentions from tweets\n",
    "Only mentions in original tweets are included."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Loading from tweets/642bf140607547cb9d4c6b1fc49772aa_001.json.gz\n",
      "DEBUG:root:Loaded 50000\n",
      "DEBUG:root:Loaded 100000\n",
      "DEBUG:root:Loaded 150000\n",
      "DEBUG:root:Loaded 200000\n",
      "DEBUG:root:Loaded 250000\n",
      "INFO:root:Loading from tweets/9f7ed17c16a1494c8690b4053609539d_001.json.gz\n",
      "DEBUG:root:Loaded 300000\n",
      "DEBUG:root:Loaded 350000\n",
      "DEBUG:root:Loaded 400000\n",
      "DEBUG:root:Loaded 450000\n",
      "DEBUG:root:Loaded 500000\n",
      "INFO:root:Loading from tweets/41feff28312c433ab004cd822212f4c2_001.json.gz\n",
      "DEBUG:root:Loaded 550000\n",
      "DEBUG:root:Loaded 600000\n",
      "DEBUG:root:Loaded 650000\n",
      "DEBUG:root:Loaded 700000\n",
      "DEBUG:root:Loaded 750000\n",
      "DEBUG:root:Loaded 800000\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tweet_id               118210\n",
       "user_id                118210\n",
       "screen_name            118210\n",
       "mention_user_id        118210\n",
       "mention_screen_name    118210\n",
       "tweet_created_at       118210\n",
       "dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%matplotlib inline\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import logging\n",
    "from dateutil.parser import parse as date_parse\n",
    "from utils import load_tweet_df, tweet_type\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "\n",
    "logger = logging.getLogger()\n",
    "logger.setLevel(logging.DEBUG)\n",
    "\n",
    "# Set float format so that pandas doesn't display scientific notation\n",
    "pd.options.display.float_format = '{:20,.2f}'.format\n",
    "\n",
    "# Simplify each tweet on load: keep one record per user mention\n",
    "def mention_transform(tweet):\n",
    "    mentions = []\n",
    "    if tweet_type(tweet) == 'original':\n",
    "        for mention in tweet.get('entities', {}).get('user_mentions', []):\n",
    "            mentions.append({\n",
    "                'tweet_id': tweet['id_str'],\n",
    "                'user_id': tweet['user']['id_str'],\n",
    "                'screen_name': tweet['user']['screen_name'],\n",
    "                'mention_user_id': mention['id_str'],\n",
    "                'mention_screen_name': mention['screen_name'],\n",
    "                'tweet_created_at': date_parse(tweet['created_at'])\n",
    "            })\n",
    "    return mentions\n",
    "\n",
    "base_mention_df = load_tweet_df(mention_transform, ['tweet_id', 'user_id', 'screen_name', 'mention_user_id',\n",
    "                                                    'mention_screen_name', 'tweet_created_at'],\n",
    "                                dedupe_columns=['tweet_id', 'mention_user_id'])\n",
    "base_mention_df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tweet_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>screen_name</th>\n",
       "      <th>mention_user_id</th>\n",
       "      <th>mention_screen_name</th>\n",
       "      <th>tweet_created_at</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>872522339962978307</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>800707492346925056</td>\n",
       "      <td>axios</td>\n",
       "      <td>2017-06-07 18:35:11+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>872484939530461184</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>17494010</td>\n",
       "      <td>SenSchumer</td>\n",
       "      <td>2017-06-07 16:06:34+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>872475140575170562</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>2836421</td>\n",
       "      <td>MSNBC</td>\n",
       "      <td>2017-06-07 15:27:37+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>872475140575170562</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>800707492346925056</td>\n",
       "      <td>axios</td>\n",
       "      <td>2017-06-07 15:27:37+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>872459457946673154</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>800707492346925056</td>\n",
       "      <td>axios</td>\n",
       "      <td>2017-06-07 14:25:18+00:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             tweet_id    user_id    screen_name     mention_user_id  \\\n",
       "0  872522339962978307  327862439  jonathanvswan  800707492346925056   \n",
       "1  872484939530461184  327862439  jonathanvswan            17494010   \n",
       "2  872475140575170562  327862439  jonathanvswan             2836421   \n",
       "3  872475140575170562  327862439  jonathanvswan  800707492346925056   \n",
       "4  872459457946673154  327862439  jonathanvswan  800707492346925056   \n",
       "\n",
       "  mention_screen_name          tweet_created_at  \n",
       "0               axios 2017-06-07 18:35:11+00:00  \n",
       "1          SenSchumer 2017-06-07 16:06:34+00:00  \n",
       "2               MSNBC 2017-06-07 15:27:37+00:00  \n",
       "3               axios 2017-06-07 15:27:37+00:00  \n",
       "4               axios 2017-06-07 14:25:18+00:00  "
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "base_mention_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add gender of mentioner"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tweet_id               118210\n",
       "user_id                118210\n",
       "screen_name            118210\n",
       "mention_user_id        118210\n",
       "mention_screen_name    118210\n",
       "tweet_created_at       118210\n",
       "gender                 118210\n",
       "dtype: int64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mention_df = base_mention_df.join(user_summary_df['gender'], on='user_id')\n",
    "mention_df.count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How many tweets have mentions?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "84942"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mention_df['tweet_id'].unique().size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How many users are mentioned? (All users, not just journalists)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "17730"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mention_df['mention_user_id'].unique().size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Limit to mentions of journalists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tweet_id               14298\n",
       "user_id                14298\n",
       "screen_name            14298\n",
       "mention_user_id        14298\n",
       "mention_screen_name    14298\n",
       "tweet_created_at       14298\n",
       "gender                 14298\n",
       "mention_gender         14298\n",
       "dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_mention_df = mention_df.join(user_summary_df['gender'], how='inner', on='mention_user_id', rsuffix='_mention')\n",
    "journalists_mention_df.rename(columns = {'gender_mention': 'mention_gender'}, inplace=True)\n",
    "journalists_mention_df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tweet_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>screen_name</th>\n",
       "      <th>mention_user_id</th>\n",
       "      <th>mention_screen_name</th>\n",
       "      <th>tweet_created_at</th>\n",
       "      <th>gender</th>\n",
       "      <th>mention_gender</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>870408075878027268</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>16031927</td>\n",
       "      <td>greta</td>\n",
       "      <td>2017-06-01 22:33:51+00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>283</th>\n",
       "      <td>872581449861541893</td>\n",
       "      <td>19847765</td>\n",
       "      <td>sahilkapur</td>\n",
       "      <td>16031927</td>\n",
       "      <td>greta</td>\n",
       "      <td>2017-06-07 22:30:04+00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2202</th>\n",
       "      <td>872578055910371328</td>\n",
       "      <td>21252618</td>\n",
       "      <td>JakeSherman</td>\n",
       "      <td>16031927</td>\n",
       "      <td>greta</td>\n",
       "      <td>2017-06-07 22:16:34+00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15977</th>\n",
       "      <td>880841069243629568</td>\n",
       "      <td>70511174</td>\n",
       "      <td>Hadas_Gold</td>\n",
       "      <td>16031927</td>\n",
       "      <td>greta</td>\n",
       "      <td>2017-06-30 17:30:50+00:00</td>\n",
       "      <td>F</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17258</th>\n",
       "      <td>880183952018886661</td>\n",
       "      <td>90077282</td>\n",
       "      <td>politicoalex</td>\n",
       "      <td>16031927</td>\n",
       "      <td>greta</td>\n",
       "      <td>2017-06-28 21:59:41+00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>F</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 tweet_id    user_id    screen_name mention_user_id  \\\n",
       "16     870408075878027268  327862439  jonathanvswan        16031927   \n",
       "283    872581449861541893   19847765     sahilkapur        16031927   \n",
       "2202   872578055910371328   21252618    JakeSherman        16031927   \n",
       "15977  880841069243629568   70511174     Hadas_Gold        16031927   \n",
       "17258  880183952018886661   90077282   politicoalex        16031927   \n",
       "\n",
       "      mention_screen_name          tweet_created_at gender mention_gender  \n",
       "16                  greta 2017-06-01 22:33:51+00:00      M              F  \n",
       "283                 greta 2017-06-07 22:30:04+00:00      M              F  \n",
       "2202                greta 2017-06-07 22:16:34+00:00      M              F  \n",
       "15977               greta 2017-06-30 17:30:50+00:00      F              F  \n",
       "17258               greta 2017-06-28 21:59:41+00:00      M              F  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_mention_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Functions for summarizing mentions by beltway journalists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Gender of beltway journalists mentioned by beltway journalists\n",
    "def journalist_mention_gender_summary(mention_df):\n",
    "    gender_summary_df = pd.DataFrame({'count': mention_df.mention_gender.value_counts(), \n",
    "                  'percentage': mention_df.mention_gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
    "    gender_summary_df.reset_index(inplace=True)\n",
    "    gender_summary_df['avg_mentions'] = gender_summary_df.apply(lambda row: row['count'] / journalist_gender_summary_df.loc[row['index']]['count'], axis=1)    \n",
    "    gender_summary_df.set_index('index', inplace=True, drop=True)\n",
    "    return gender_summary_df\n",
    "\n",
    "def journalist_mention_summary(mention_df):\n",
    "    # Mention count\n",
    "    mention_count_df = pd.DataFrame(mention_df.mention_user_id.value_counts().rename('mention_count'))\n",
    "\n",
    "    # Mentioning users. That is, the number of unique users mentioning each user.\n",
    "    mention_user_id_per_user_df = mention_df[['mention_user_id', 'user_id']].drop_duplicates()\n",
    "    mentioning_user_count_df = pd.DataFrame(mention_user_id_per_user_df.groupby('mention_user_id').size(), columns=['mentioning_count'])\n",
    "    mentioning_user_count_df.index.name = 'user_id'\n",
    "\n",
    "    # Join with user summary\n",
    "    journalist_mention_summary_df = user_summary_df.join([mention_count_df, mentioning_user_count_df])\n",
    "    journalist_mention_summary_df.fillna(0, inplace=True)\n",
    "    journalist_mention_summary_df = journalist_mention_summary_df.sort_values(['mention_count', 'mentioning_count', 'followers_count'], ascending=False)\n",
    "    return journalist_mention_summary_df\n",
    "\n",
    "# Gender of top journalists mentioned by beltway journalists\n",
    "def top_journalist_mention_gender_summary(mention_summary_df, mentioning_count_threshold=0, head=100):\n",
    "    top_mention_summary_df = mention_summary_df[mention_summary_df.mentioning_count > mentioning_count_threshold].head(head)\n",
    "    return pd.DataFrame({'count': top_mention_summary_df.gender.value_counts(), \n",
    "                  'percentage': top_mention_summary_df.gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
    "\n",
    "\n",
    "# Fields for displaying journalist mention summaries\n",
    "journalist_mention_summary_fields = ['screen_name', 'name', 'organization', 'gender', 'followers_count', 'mention_count', 'mentioning_count']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Mentioned analysis\n",
    "*Note that for each of these, the complete list is being written to CSV in the output directory.*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Original tweets (since mentions are extracted from original tweets)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of the original tweets, how many were posted by male journalists / female journalists?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>original</th>\n",
       "      <th>percentage</th>\n",
       "      <th>avg_original</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gender</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>83,251.00</td>\n",
       "      <td>35.6%</td>\n",
       "      <td>83.84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>150,675.00</td>\n",
       "      <td>64.4%</td>\n",
       "      <td>115.99</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   original percentage         avg_original\n",
       "gender                                                     \n",
       "F                 83,251.00      35.6%                83.84\n",
       "M                150,675.00      64.4%               115.99"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "original_tweets_by_gender_df = user_summary_df[['gender', 'original']].groupby('gender').sum()\n",
    "original_tweets_by_gender_df['percentage'] = original_tweets_by_gender_df.original.div(user_summary_df.original.sum()).mul(100).round(1).astype(str) + '%'\n",
    "original_tweets_by_gender_df.reset_index(inplace=True)\n",
    "original_tweets_by_gender_df['avg_original'] = original_tweets_by_gender_df.apply(lambda row: row['original'] / journalist_gender_summary_df.loc[row['gender']]['count'], axis=1)\n",
    "original_tweets_by_gender_df.set_index('gender', inplace=True, drop=True)\n",
    "original_tweets_by_gender_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Who posted the most original tweets?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>tweet_count</th>\n",
       "      <th>original</th>\n",
       "      <th>tweets_in_dataset</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>16187637</th>\n",
       "      <td>ChadPergram</td>\n",
       "      <td>Pergram, Chad</td>\n",
       "      <td>Fox News</td>\n",
       "      <td>M</td>\n",
       "      <td>59305</td>\n",
       "      <td>61461</td>\n",
       "      <td>2,693.00</td>\n",
       "      <td>2,693.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31127446</th>\n",
       "      <td>markknoller</td>\n",
       "      <td>Knoller, Mark</td>\n",
       "      <td>CBS News</td>\n",
       "      <td>M</td>\n",
       "      <td>301474</td>\n",
       "      <td>115132</td>\n",
       "      <td>1,858.00</td>\n",
       "      <td>2,089.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16459325</th>\n",
       "      <td>ryanbeckwith</td>\n",
       "      <td>Beckwith, Ryan Teague</td>\n",
       "      <td>Time Magazine</td>\n",
       "      <td>M</td>\n",
       "      <td>20947</td>\n",
       "      <td>92203</td>\n",
       "      <td>1,534.00</td>\n",
       "      <td>5,187.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19580890</th>\n",
       "      <td>LeeCamp</td>\n",
       "      <td>Camp, Lee</td>\n",
       "      <td>RTTV America</td>\n",
       "      <td>M</td>\n",
       "      <td>67601</td>\n",
       "      <td>52051</td>\n",
       "      <td>1,517.00</td>\n",
       "      <td>3,708.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18825339</th>\n",
       "      <td>CahnEmily</td>\n",
       "      <td>Cahn, Emily</td>\n",
       "      <td>Mic</td>\n",
       "      <td>F</td>\n",
       "      <td>16980</td>\n",
       "      <td>100803</td>\n",
       "      <td>1,440.00</td>\n",
       "      <td>8,196.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>593813785</th>\n",
       "      <td>DonnaYoungDC</td>\n",
       "      <td>Young, Donna</td>\n",
       "      <td>S&amp;P Global Market Intelligence</td>\n",
       "      <td>F</td>\n",
       "      <td>5894</td>\n",
       "      <td>49967</td>\n",
       "      <td>1,332.00</td>\n",
       "      <td>4,414.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>148143</td>\n",
       "      <td>1,316.00</td>\n",
       "      <td>5,078.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21316253</th>\n",
       "      <td>ZekeJMiller</td>\n",
       "      <td>Miller, Zeke J.</td>\n",
       "      <td>Time Magazine</td>\n",
       "      <td>M</td>\n",
       "      <td>198517</td>\n",
       "      <td>161148</td>\n",
       "      <td>1,271.00</td>\n",
       "      <td>2,106.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36246939</th>\n",
       "      <td>malbertnews</td>\n",
       "      <td>Albert, Mark</td>\n",
       "      <td>The Voyage Report</td>\n",
       "      <td>M</td>\n",
       "      <td>3575</td>\n",
       "      <td>28230</td>\n",
       "      <td>1,078.00</td>\n",
       "      <td>1,151.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>117467779</th>\n",
       "      <td>palbergo</td>\n",
       "      <td>Albergo, Paul F.</td>\n",
       "      <td>Bloomberg BNA</td>\n",
       "      <td>M</td>\n",
       "      <td>1191</td>\n",
       "      <td>18083</td>\n",
       "      <td>1,043.00</td>\n",
       "      <td>1,236.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102171691</th>\n",
       "      <td>rlocker12</td>\n",
       "      <td>Locker, Ray</td>\n",
       "      <td>USA Today</td>\n",
       "      <td>M</td>\n",
       "      <td>3665</td>\n",
       "      <td>41194</td>\n",
       "      <td>1,038.00</td>\n",
       "      <td>2,496.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15486163</th>\n",
       "      <td>SimonMarksFSN</td>\n",
       "      <td>Marks, Simon</td>\n",
       "      <td>Feature Story News</td>\n",
       "      <td>M</td>\n",
       "      <td>7767</td>\n",
       "      <td>41541</td>\n",
       "      <td>984.00</td>\n",
       "      <td>3,432.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>275207082</th>\n",
       "      <td>AlexParkerDC</td>\n",
       "      <td>Parker, Alexander M.</td>\n",
       "      <td>Bloomberg BNA</td>\n",
       "      <td>M</td>\n",
       "      <td>3828</td>\n",
       "      <td>142150</td>\n",
       "      <td>972.00</td>\n",
       "      <td>3,983.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>190360266</th>\n",
       "      <td>connorobrienNH</td>\n",
       "      <td>O’Brien, Connor</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>6158</td>\n",
       "      <td>17242</td>\n",
       "      <td>954.00</td>\n",
       "      <td>1,944.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16031927</th>\n",
       "      <td>greta</td>\n",
       "      <td>Van Susteren, Greta</td>\n",
       "      <td>MSNBC</td>\n",
       "      <td>F</td>\n",
       "      <td>1186850</td>\n",
       "      <td>116645</td>\n",
       "      <td>907.00</td>\n",
       "      <td>4,792.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>300497193</th>\n",
       "      <td>tackettdc</td>\n",
       "      <td>Tackett, R. Michael</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>16857</td>\n",
       "      <td>38620</td>\n",
       "      <td>896.00</td>\n",
       "      <td>1,041.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>191964162</th>\n",
       "      <td>SamLitzinger</td>\n",
       "      <td>Litzinger, Sam</td>\n",
       "      <td>CBS News</td>\n",
       "      <td>M</td>\n",
       "      <td>2329</td>\n",
       "      <td>95236</td>\n",
       "      <td>891.00</td>\n",
       "      <td>7,537.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>118130765</th>\n",
       "      <td>dylanlscott</td>\n",
       "      <td>Scott, Dylan L.</td>\n",
       "      <td>Stat News</td>\n",
       "      <td>M</td>\n",
       "      <td>20122</td>\n",
       "      <td>42497</td>\n",
       "      <td>885.00</td>\n",
       "      <td>3,960.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3817401</th>\n",
       "      <td>ericgeller</td>\n",
       "      <td>Geller, Eric</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>58173</td>\n",
       "      <td>208763</td>\n",
       "      <td>871.00</td>\n",
       "      <td>11,432.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259395895</th>\n",
       "      <td>JohnJHarwood</td>\n",
       "      <td>Harwood, John</td>\n",
       "      <td>CNBC</td>\n",
       "      <td>M</td>\n",
       "      <td>149040</td>\n",
       "      <td>78015</td>\n",
       "      <td>846.00</td>\n",
       "      <td>6,377.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27882000</th>\n",
       "      <td>jamiedupree</td>\n",
       "      <td>Dupree, Jamie</td>\n",
       "      <td>Cox Broadcasting</td>\n",
       "      <td>M</td>\n",
       "      <td>140848</td>\n",
       "      <td>46181</td>\n",
       "      <td>841.00</td>\n",
       "      <td>2,108.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>407013776</th>\n",
       "      <td>burgessev</td>\n",
       "      <td>Everett, John B.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>31010</td>\n",
       "      <td>27294</td>\n",
       "      <td>836.00</td>\n",
       "      <td>1,673.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104299137</th>\n",
       "      <td>DavidMDrucker</td>\n",
       "      <td>Drucker, David</td>\n",
       "      <td>Washington Examiner</td>\n",
       "      <td>M</td>\n",
       "      <td>35033</td>\n",
       "      <td>104613</td>\n",
       "      <td>824.00</td>\n",
       "      <td>4,907.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63149389</th>\n",
       "      <td>hbwx</td>\n",
       "      <td>Bernstein, Howard</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>M</td>\n",
       "      <td>8337</td>\n",
       "      <td>48025</td>\n",
       "      <td>822.00</td>\n",
       "      <td>1,604.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13262862</th>\n",
       "      <td>HowardMortman</td>\n",
       "      <td>Mortman, Howard</td>\n",
       "      <td>C–SPAN</td>\n",
       "      <td>M</td>\n",
       "      <td>6211</td>\n",
       "      <td>38406</td>\n",
       "      <td>819.00</td>\n",
       "      <td>1,289.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              screen_name                   name  \\\n",
       "user_id                                            \n",
       "16187637      ChadPergram          Pergram, Chad   \n",
       "31127446      markknoller          Knoller, Mark   \n",
       "16459325     ryanbeckwith  Beckwith, Ryan Teague   \n",
       "19580890          LeeCamp              Camp, Lee   \n",
       "18825339        CahnEmily            Cahn, Emily   \n",
       "593813785    DonnaYoungDC           Young, Donna   \n",
       "14529929       jaketapper           Tapper, Jake   \n",
       "21316253      ZekeJMiller        Miller, Zeke J.   \n",
       "36246939      malbertnews           Albert, Mark   \n",
       "117467779        palbergo       Albergo, Paul F.   \n",
       "102171691       rlocker12            Locker, Ray   \n",
       "15486163    SimonMarksFSN           Marks, Simon   \n",
       "275207082    AlexParkerDC   Parker, Alexander M.   \n",
       "190360266  connorobrienNH        O’Brien, Connor   \n",
       "16031927            greta    Van Susteren, Greta   \n",
       "300497193       tackettdc    Tackett, R. Michael   \n",
       "191964162    SamLitzinger         Litzinger, Sam   \n",
       "118130765     dylanlscott        Scott, Dylan L.   \n",
       "3817401        ericgeller           Geller, Eric   \n",
       "259395895    JohnJHarwood          Harwood, John   \n",
       "27882000      jamiedupree          Dupree, Jamie   \n",
       "407013776       burgessev       Everett, John B.   \n",
       "104299137   DavidMDrucker         Drucker, David   \n",
       "63149389             hbwx      Bernstein, Howard   \n",
       "13262862    HowardMortman        Mortman, Howard   \n",
       "\n",
       "                             organization gender  followers_count  \\\n",
       "user_id                                                             \n",
       "16187637                         Fox News      M            59305   \n",
       "31127446                         CBS News      M           301474   \n",
       "16459325                    Time Magazine      M            20947   \n",
       "19580890                     RTTV America      M            67601   \n",
       "18825339                              Mic      F            16980   \n",
       "593813785  S&P Global Market Intelligence      F             5894   \n",
       "14529929                              CNN      M          1305680   \n",
       "21316253                    Time Magazine      M           198517   \n",
       "36246939                The Voyage Report      M             3575   \n",
       "117467779                   Bloomberg BNA      M             1191   \n",
       "102171691                       USA Today      M             3665   \n",
       "15486163               Feature Story News      M             7767   \n",
       "275207082                   Bloomberg BNA      M             3828   \n",
       "190360266                        Politico      M             6158   \n",
       "16031927                            MSNBC      F          1186850   \n",
       "300497193                  New York Times      M            16857   \n",
       "191964162                        CBS News      M             2329   \n",
       "118130765                       Stat News      M            20122   \n",
       "3817401                          Politico      M            58173   \n",
       "259395895                            CNBC      M           149040   \n",
       "27882000                 Cox Broadcasting      M           140848   \n",
       "407013776                        Politico      M            31010   \n",
       "104299137             Washington Examiner      M            35033   \n",
       "63149389                          WUSA–TV      M             8337   \n",
       "13262862                           C–SPAN      M             6211   \n",
       "\n",
       "           tweet_count             original    tweets_in_dataset  \n",
       "user_id                                                           \n",
       "16187637         61461             2,693.00             2,693.00  \n",
       "31127446        115132             1,858.00             2,089.00  \n",
       "16459325         92203             1,534.00             5,187.00  \n",
       "19580890         52051             1,517.00             3,708.00  \n",
       "18825339        100803             1,440.00             8,196.00  \n",
       "593813785        49967             1,332.00             4,414.00  \n",
       "14529929        148143             1,316.00             5,078.00  \n",
       "21316253        161148             1,271.00             2,106.00  \n",
       "36246939         28230             1,078.00             1,151.00  \n",
       "117467779        18083             1,043.00             1,236.00  \n",
       "102171691        41194             1,038.00             2,496.00  \n",
       "15486163         41541               984.00             3,432.00  \n",
       "275207082       142150               972.00             3,983.00  \n",
       "190360266        17242               954.00             1,944.00  \n",
       "16031927        116645               907.00             4,792.00  \n",
       "300497193        38620               896.00             1,041.00  \n",
       "191964162        95236               891.00             7,537.00  \n",
       "118130765        42497               885.00             3,960.00  \n",
       "3817401         208763               871.00            11,432.00  \n",
       "259395895        78015               846.00             6,377.00  \n",
       "27882000         46181               841.00             2,108.00  \n",
       "407013776        27294               836.00             1,673.00  \n",
       "104299137       104613               824.00             4,907.00  \n",
       "63149389         48025               822.00             1,604.00  \n",
       "13262862         38406               819.00             1,289.00  "
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "user_summary_df[['screen_name', 'name', 'organization', 'gender', 'followers_count', 'tweet_count', 'original', 'tweets_in_dataset']].sort_values(['original'], ascending=False).head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Mentions of all accounts (not just journalists)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists mentioning accounts, which are mentioned the most?\n",
    "This is based on screen name, which could have changed during collection period. However, for the users that would be at the top of this list, seems unlikely."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mention_count</th>\n",
       "      <th>mentioning_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>realDonaldTrump</th>\n",
       "      <td>2876</td>\n",
       "      <td>452</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>POTUS</th>\n",
       "      <td>2265</td>\n",
       "      <td>253</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>wusa9</th>\n",
       "      <td>2111</td>\n",
       "      <td>41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AP</th>\n",
       "      <td>1948</td>\n",
       "      <td>143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>USATODAY</th>\n",
       "      <td>1235</td>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>nbcwashington</th>\n",
       "      <td>1230</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>WSJ</th>\n",
       "      <td>1227</td>\n",
       "      <td>152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>dcexaminer</th>\n",
       "      <td>1034</td>\n",
       "      <td>53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SHSanders45</th>\n",
       "      <td>927</td>\n",
       "      <td>148</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>nytimes</th>\n",
       "      <td>829</td>\n",
       "      <td>289</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BloombergBNA</th>\n",
       "      <td>759</td>\n",
       "      <td>45</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>politico</th>\n",
       "      <td>747</td>\n",
       "      <td>181</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SpeakerRyan</th>\n",
       "      <td>700</td>\n",
       "      <td>181</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Scaramucci</th>\n",
       "      <td>657</td>\n",
       "      <td>198</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PressSec</th>\n",
       "      <td>654</td>\n",
       "      <td>178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>CNN</th>\n",
       "      <td>628</td>\n",
       "      <td>186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ABC7News</th>\n",
       "      <td>604</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SenJohnMcCain</th>\n",
       "      <td>599</td>\n",
       "      <td>231</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>WTOP</th>\n",
       "      <td>529</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BloombergLaw</th>\n",
       "      <td>517</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>VP</th>\n",
       "      <td>506</td>\n",
       "      <td>140</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SteveScalise</th>\n",
       "      <td>505</td>\n",
       "      <td>150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MSNBC</th>\n",
       "      <td>486</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Reuters</th>\n",
       "      <td>483</td>\n",
       "      <td>84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>bpolitics</th>\n",
       "      <td>432</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 mention_count  mentioning_count\n",
       "realDonaldTrump           2876               452\n",
       "POTUS                     2265               253\n",
       "wusa9                     2111                41\n",
       "AP                        1948               143\n",
       "USATODAY                  1235               105\n",
       "nbcwashington             1230                70\n",
       "WSJ                       1227               152\n",
       "dcexaminer                1034                53\n",
       "SHSanders45                927               148\n",
       "nytimes                    829               289\n",
       "BloombergBNA               759                45\n",
       "politico                   747               181\n",
       "SpeakerRyan                700               181\n",
       "Scaramucci                 657               198\n",
       "PressSec                   654               178\n",
       "CNN                        628               186\n",
       "ABC7News                   604                24\n",
       "SenJohnMcCain              599               231\n",
       "WTOP                       529                43\n",
       "BloombergLaw               517                15\n",
       "VP                         506               140\n",
       "SteveScalise               505               150\n",
       "MSNBC                      486                92\n",
       "Reuters                    483                84\n",
       "bpolitics                  432                69"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Total number of times each account is mentioned\n",
    "mention_count_screen_name_df = pd.DataFrame(mention_df.mention_screen_name.value_counts().rename('mention_count'))\n",
    "\n",
    "# Number of distinct users mentioning each account (one row per account/user pair after drop_duplicates)\n",
    "mention_user_id_per_user_screen_name_df = mention_df[['mention_screen_name', 'user_id']].drop_duplicates()\n",
    "mentioning_count_screen_name_df = pd.DataFrame(mention_user_id_per_user_screen_name_df.groupby('mention_screen_name').size(), columns=['mentioning_count'])\n",
    "mentioning_count_screen_name_df.index.name = 'screen_name'\n",
    "\n",
    "# Combine both counts per mentioned account, save the full table, and show the top 25 by mention count\n",
    "all_mentioned_df = mention_count_screen_name_df.join(mentioning_count_screen_name_df)\n",
    "all_mentioned_df.to_csv('output/all_mentioned_by_journalists.csv')\n",
    "all_mentioned_df.head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Same, but ordered by the number of journalists mentioning the account"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mention_count</th>\n",
       "      <th>mentioning_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>realDonaldTrump</th>\n",
       "      <td>2876</td>\n",
       "      <td>452</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>nytimes</th>\n",
       "      <td>829</td>\n",
       "      <td>289</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>POTUS</th>\n",
       "      <td>2265</td>\n",
       "      <td>253</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SenJohnMcCain</th>\n",
       "      <td>599</td>\n",
       "      <td>231</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Scaramucci</th>\n",
       "      <td>657</td>\n",
       "      <td>198</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>CNN</th>\n",
       "      <td>628</td>\n",
       "      <td>186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>politico</th>\n",
       "      <td>747</td>\n",
       "      <td>181</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SpeakerRyan</th>\n",
       "      <td>700</td>\n",
       "      <td>181</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PressSec</th>\n",
       "      <td>654</td>\n",
       "      <td>178</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>washingtonpost</th>\n",
       "      <td>413</td>\n",
       "      <td>154</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>WSJ</th>\n",
       "      <td>1227</td>\n",
       "      <td>152</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SteveScalise</th>\n",
       "      <td>505</td>\n",
       "      <td>150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SHSanders45</th>\n",
       "      <td>927</td>\n",
       "      <td>148</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AP</th>\n",
       "      <td>1948</td>\n",
       "      <td>143</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>VP</th>\n",
       "      <td>506</td>\n",
       "      <td>140</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SenateMajLdr</th>\n",
       "      <td>412</td>\n",
       "      <td>120</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>DonaldJTrumpJr</th>\n",
       "      <td>199</td>\n",
       "      <td>110</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>RandPaul</th>\n",
       "      <td>206</td>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>USATODAY</th>\n",
       "      <td>1235</td>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>LindseyGrahamSC</th>\n",
       "      <td>253</td>\n",
       "      <td>105</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SenSchumer</th>\n",
       "      <td>265</td>\n",
       "      <td>97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>NancyPelosi</th>\n",
       "      <td>266</td>\n",
       "      <td>95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MSNBC</th>\n",
       "      <td>486</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>CNNPolitics</th>\n",
       "      <td>329</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MarkWarner</th>\n",
       "      <td>204</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 mention_count  mentioning_count\n",
       "realDonaldTrump           2876               452\n",
       "nytimes                    829               289\n",
       "POTUS                     2265               253\n",
       "SenJohnMcCain              599               231\n",
       "Scaramucci                 657               198\n",
       "CNN                        628               186\n",
       "politico                   747               181\n",
       "SpeakerRyan                700               181\n",
       "PressSec                   654               178\n",
       "washingtonpost             413               154\n",
       "WSJ                       1227               152\n",
       "SteveScalise               505               150\n",
       "SHSanders45                927               148\n",
       "AP                        1948               143\n",
       "VP                         506               140\n",
       "SenateMajLdr               412               120\n",
       "DonaldJTrumpJr             199               110\n",
       "RandPaul                   206               107\n",
       "USATODAY                  1235               105\n",
       "LindseyGrahamSC            253               105\n",
       "SenSchumer                 265                97\n",
       "NancyPelosi                266                95\n",
       "MSNBC                      486                92\n",
       "CNNPolitics                329                91\n",
       "MarkWarner                 204                89"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Rank by number of distinct mentioning users, breaking ties by total mention count\n",
    "all_mentioned_df.sort_values(['mentioning_count', 'mention_count'], ascending=False).head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Journalists mentioning journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists mentioning journalists, who is mentioned the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>mention_count</th>\n",
       "      <th>mentioning_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>325050734</th>\n",
       "      <td>AllysonRaeWx</td>\n",
       "      <td>Banks, Allyson</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>F</td>\n",
       "      <td>6918</td>\n",
       "      <td>330.00</td>\n",
       "      <td>7.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28496589</th>\n",
       "      <td>TenaciousTopper</td>\n",
       "      <td>Shutt, Charles</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>M</td>\n",
       "      <td>15868</td>\n",
       "      <td>239.00</td>\n",
       "      <td>13.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63149389</th>\n",
       "      <td>hbwx</td>\n",
       "      <td>Bernstein, Howard</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>M</td>\n",
       "      <td>8337</td>\n",
       "      <td>235.00</td>\n",
       "      <td>10.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>407013776</th>\n",
       "      <td>burgessev</td>\n",
       "      <td>Everett, John B.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>31010</td>\n",
       "      <td>212.00</td>\n",
       "      <td>46.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16018516</th>\n",
       "      <td>jenhab</td>\n",
       "      <td>Haberkorn, Jennifer A.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>20028</td>\n",
       "      <td>200.00</td>\n",
       "      <td>31.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19186003</th>\n",
       "      <td>seungminkim</td>\n",
       "      <td>Kim, Seung Min</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>33980</td>\n",
       "      <td>143.00</td>\n",
       "      <td>41.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>127.00</td>\n",
       "      <td>51.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169586280</th>\n",
       "      <td>WaPoSean</td>\n",
       "      <td>Sullivan, Sean</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>22860</td>\n",
       "      <td>117.00</td>\n",
       "      <td>20.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>997684836</th>\n",
       "      <td>pkcapitol</td>\n",
       "      <td>Kane, Paul</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>31300</td>\n",
       "      <td>116.00</td>\n",
       "      <td>47.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>108617810</th>\n",
       "      <td>DanaBashCNN</td>\n",
       "      <td>Bash, Dana</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>281861</td>\n",
       "      <td>115.00</td>\n",
       "      <td>55.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82151660</th>\n",
       "      <td>kelsey_snell</td>\n",
       "      <td>Snell, Kelse</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>8108</td>\n",
       "      <td>109.00</td>\n",
       "      <td>22.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123327472</th>\n",
       "      <td>peterbakernyt</td>\n",
       "      <td>Baker, Peter</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>96956</td>\n",
       "      <td>107.00</td>\n",
       "      <td>43.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13524182</th>\n",
       "      <td>daveweigel</td>\n",
       "      <td>Weigel, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>332344</td>\n",
       "      <td>106.00</td>\n",
       "      <td>42.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46557945</th>\n",
       "      <td>StevenTDennis</td>\n",
       "      <td>Dennis, Steven T.</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>55762</td>\n",
       "      <td>105.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15931637</th>\n",
       "      <td>jonkarl</td>\n",
       "      <td>Karl, Jonathan</td>\n",
       "      <td>ABC News</td>\n",
       "      <td>M</td>\n",
       "      <td>183467</td>\n",
       "      <td>104.00</td>\n",
       "      <td>40.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33919343</th>\n",
       "      <td>AshleyRParker</td>\n",
       "      <td>Parker, Ashley</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>122382</td>\n",
       "      <td>100.00</td>\n",
       "      <td>31.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9126752</th>\n",
       "      <td>reporterjoe</td>\n",
       "      <td>Gould, Joseph M.</td>\n",
       "      <td>Sightline Media Group</td>\n",
       "      <td>M</td>\n",
       "      <td>4702</td>\n",
       "      <td>98.00</td>\n",
       "      <td>16.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39155029</th>\n",
       "      <td>mkraju</td>\n",
       "      <td>Raju, Manu K.</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>88366</td>\n",
       "      <td>95.00</td>\n",
       "      <td>43.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52392666</th>\n",
       "      <td>ZoeTillman</td>\n",
       "      <td>Tillman, Zoe</td>\n",
       "      <td>BuzzFeed</td>\n",
       "      <td>F</td>\n",
       "      <td>15246</td>\n",
       "      <td>87.00</td>\n",
       "      <td>14.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16930125</th>\n",
       "      <td>edatpost</td>\n",
       "      <td>O’Keefe, Edward</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>58670</td>\n",
       "      <td>84.00</td>\n",
       "      <td>41.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26632935</th>\n",
       "      <td>HopeSeck</td>\n",
       "      <td>Hodge Seck, Hope</td>\n",
       "      <td>Military.com</td>\n",
       "      <td>F</td>\n",
       "      <td>4584</td>\n",
       "      <td>83.00</td>\n",
       "      <td>3.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48802204</th>\n",
       "      <td>HardballChris</td>\n",
       "      <td>Matthews, Chris</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>718330</td>\n",
       "      <td>80.00</td>\n",
       "      <td>9.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19107878</th>\n",
       "      <td>GlennThrush</td>\n",
       "      <td>Thrush, Glenn H.</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>308181</td>\n",
       "      <td>78.00</td>\n",
       "      <td>37.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>217550862</th>\n",
       "      <td>BresPolitico</td>\n",
       "      <td>Bresnahan, John</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>40562</td>\n",
       "      <td>78.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24439201</th>\n",
       "      <td>jameshohmann</td>\n",
       "      <td>Hohmann, James P.</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>38708</td>\n",
       "      <td>78.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               screen_name                    name           organization  \\\n",
       "user_id                                                                     \n",
       "325050734     AllysonRaeWx          Banks, Allyson                WUSA–TV   \n",
       "28496589   TenaciousTopper          Shutt, Charles                WUSA–TV   \n",
       "63149389              hbwx       Bernstein, Howard                WUSA–TV   \n",
       "407013776        burgessev        Everett, John B.               Politico   \n",
       "16018516            jenhab  Haberkorn, Jennifer A.               Politico   \n",
       "19186003       seungminkim          Kim, Seung Min               Politico   \n",
       "14529929        jaketapper            Tapper, Jake                    CNN   \n",
       "169586280         WaPoSean          Sullivan, Sean        Washington Post   \n",
       "997684836        pkcapitol              Kane, Paul        Washington Post   \n",
       "108617810      DanaBashCNN              Bash, Dana                    CNN   \n",
       "82151660      kelsey_snell            Snell, Kelse        Washington Post   \n",
       "123327472    peterbakernyt            Baker, Peter         New York Times   \n",
       "13524182        daveweigel           Weigel, David        Washington Post   \n",
       "46557945     StevenTDennis       Dennis, Steven T.         Bloomberg News   \n",
       "15931637           jonkarl          Karl, Jonathan               ABC News   \n",
       "33919343     AshleyRParker          Parker, Ashley        Washington Post   \n",
       "9126752        reporterjoe        Gould, Joseph M.  Sightline Media Group   \n",
       "39155029            mkraju           Raju, Manu K.                    CNN   \n",
       "52392666        ZoeTillman            Tillman, Zoe               BuzzFeed   \n",
       "16930125          edatpost         O’Keefe, Edward        Washington Post   \n",
       "26632935          HopeSeck        Hodge Seck, Hope           Military.com   \n",
       "48802204     HardballChris         Matthews, Chris               NBC News   \n",
       "19107878       GlennThrush        Thrush, Glenn H.         New York Times   \n",
       "217550862     BresPolitico         Bresnahan, John               Politico   \n",
       "24439201      jameshohmann       Hohmann, James P.        Washington Post   \n",
       "\n",
       "          gender  followers_count        mention_count     mentioning_count  \n",
       "user_id                                                                      \n",
       "325050734      F             6918               330.00                 7.00  \n",
       "28496589       M            15868               239.00                13.00  \n",
       "63149389       M             8337               235.00                10.00  \n",
       "407013776      M            31010               212.00                46.00  \n",
       "16018516       F            20028               200.00                31.00  \n",
       "19186003       F            33980               143.00                41.00  \n",
       "14529929       M          1305680               127.00                51.00  \n",
       "169586280      M            22860               117.00                20.00  \n",
       "997684836      M            31300               116.00                47.00  \n",
       "108617810      F           281861               115.00                55.00  \n",
       "82151660       F             8108               109.00                22.00  \n",
       "123327472      M            96956               107.00                43.00  \n",
       "13524182       M           332344               106.00                42.00  \n",
       "46557945       M            55762               105.00                27.00  \n",
       "15931637       M           183467               104.00                40.00  \n",
       "33919343       F           122382               100.00                31.00  \n",
       "9126752        M             4702                98.00                16.00  \n",
       "39155029       M            88366                95.00                43.00  \n",
       "52392666       F            15246                87.00                14.00  \n",
       "16930125       M            58670                84.00                41.00  \n",
       "26632935       F             4584                83.00                 3.00  \n",
       "48802204       M           718330                80.00                 9.00  \n",
       "19107878       M           308181                78.00                37.00  \n",
       "217550862      M            40562                78.00                27.00  \n",
       "24439201       M            38708                78.00                27.00  "
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Summarize mentions of journalists by journalists, save the full table, and show the top 25\n",
    "journalists_mention_summary_df = journalist_mention_summary(journalists_mention_df)\n",
    "journalists_mention_summary_df.to_csv('output/journalists_mentioned_by_journalists.csv')\n",
    "journalists_mention_summary_df[journalist_mention_summary_fields].head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Same, but ordered by number of journalists mentioning"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>mention_count</th>\n",
       "      <th>mentioning_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>108617810</th>\n",
       "      <td>DanaBashCNN</td>\n",
       "      <td>Bash, Dana</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>281861</td>\n",
       "      <td>115.00</td>\n",
       "      <td>55.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>127.00</td>\n",
       "      <td>51.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>997684836</th>\n",
       "      <td>pkcapitol</td>\n",
       "      <td>Kane, Paul</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>31300</td>\n",
       "      <td>116.00</td>\n",
       "      <td>47.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>407013776</th>\n",
       "      <td>burgessev</td>\n",
       "      <td>Everett, John B.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>31010</td>\n",
       "      <td>212.00</td>\n",
       "      <td>46.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>112526560</th>\n",
       "      <td>kenvogel</td>\n",
       "      <td>Vogel, Kenneth P.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>53894</td>\n",
       "      <td>67.00</td>\n",
       "      <td>45.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18227519</th>\n",
       "      <td>morningmika</td>\n",
       "      <td>Brzezinski, Mika</td>\n",
       "      <td>MSNBC</td>\n",
       "      <td>F</td>\n",
       "      <td>653031</td>\n",
       "      <td>70.00</td>\n",
       "      <td>44.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123327472</th>\n",
       "      <td>peterbakernyt</td>\n",
       "      <td>Baker, Peter</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>96956</td>\n",
       "      <td>107.00</td>\n",
       "      <td>43.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39155029</th>\n",
       "      <td>mkraju</td>\n",
       "      <td>Raju, Manu K.</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>88366</td>\n",
       "      <td>95.00</td>\n",
       "      <td>43.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13524182</th>\n",
       "      <td>daveweigel</td>\n",
       "      <td>Weigel, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>332344</td>\n",
       "      <td>106.00</td>\n",
       "      <td>42.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19186003</th>\n",
       "      <td>seungminkim</td>\n",
       "      <td>Kim, Seung Min</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>33980</td>\n",
       "      <td>143.00</td>\n",
       "      <td>41.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16930125</th>\n",
       "      <td>edatpost</td>\n",
       "      <td>O’Keefe, Edward</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>58670</td>\n",
       "      <td>84.00</td>\n",
       "      <td>41.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15931637</th>\n",
       "      <td>jonkarl</td>\n",
       "      <td>Karl, Jonathan</td>\n",
       "      <td>ABC News</td>\n",
       "      <td>M</td>\n",
       "      <td>183467</td>\n",
       "      <td>104.00</td>\n",
       "      <td>40.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22771961</th>\n",
       "      <td>Acosta</td>\n",
       "      <td>Acosta, Jim</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>350650</td>\n",
       "      <td>61.00</td>\n",
       "      <td>38.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19107878</th>\n",
       "      <td>GlennThrush</td>\n",
       "      <td>Thrush, Glenn H.</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>308181</td>\n",
       "      <td>78.00</td>\n",
       "      <td>37.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18678924</th>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>Martin, Jonathan</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>197322</td>\n",
       "      <td>75.00</td>\n",
       "      <td>37.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61734492</th>\n",
       "      <td>Fahrenthold</td>\n",
       "      <td>Fahrenthold, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>451778</td>\n",
       "      <td>43.00</td>\n",
       "      <td>32.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16018516</th>\n",
       "      <td>jenhab</td>\n",
       "      <td>Haberkorn, Jennifer A.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>20028</td>\n",
       "      <td>200.00</td>\n",
       "      <td>31.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33919343</th>\n",
       "      <td>AshleyRParker</td>\n",
       "      <td>Parker, Ashley</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>122382</td>\n",
       "      <td>100.00</td>\n",
       "      <td>31.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50325797</th>\n",
       "      <td>chucktodd</td>\n",
       "      <td>Todd, Chuck</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>1781247</td>\n",
       "      <td>40.00</td>\n",
       "      <td>31.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>71294756</th>\n",
       "      <td>wolfblitzer</td>\n",
       "      <td>Blitzer, Wolf</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1281914</td>\n",
       "      <td>56.00</td>\n",
       "      <td>30.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28181835</th>\n",
       "      <td>jpaceDC</td>\n",
       "      <td>Pace, Julie</td>\n",
       "      <td>Associated Press</td>\n",
       "      <td>F</td>\n",
       "      <td>46017</td>\n",
       "      <td>52.00</td>\n",
       "      <td>30.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12354832</th>\n",
       "      <td>kasie</td>\n",
       "      <td>Hunt, Kasie</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>187357</td>\n",
       "      <td>67.00</td>\n",
       "      <td>29.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16031927</th>\n",
       "      <td>greta</td>\n",
       "      <td>Van Susteren, Greta</td>\n",
       "      <td>MSNBC</td>\n",
       "      <td>F</td>\n",
       "      <td>1186850</td>\n",
       "      <td>37.00</td>\n",
       "      <td>28.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46557945</th>\n",
       "      <td>StevenTDennis</td>\n",
       "      <td>Dennis, Steven T.</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>55762</td>\n",
       "      <td>105.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>217550862</th>\n",
       "      <td>BresPolitico</td>\n",
       "      <td>Bresnahan, John</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>40562</td>\n",
       "      <td>78.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             screen_name                    name      organization gender  \\\n",
       "user_id                                                                     \n",
       "108617810    DanaBashCNN              Bash, Dana               CNN      F   \n",
       "14529929      jaketapper            Tapper, Jake               CNN      M   \n",
       "997684836      pkcapitol              Kane, Paul   Washington Post      M   \n",
       "407013776      burgessev        Everett, John B.          Politico      M   \n",
       "112526560       kenvogel       Vogel, Kenneth P.          Politico      M   \n",
       "18227519     morningmika        Brzezinski, Mika             MSNBC      F   \n",
       "123327472  peterbakernyt            Baker, Peter    New York Times      M   \n",
       "39155029          mkraju           Raju, Manu K.               CNN      M   \n",
       "13524182      daveweigel           Weigel, David   Washington Post      M   \n",
       "19186003     seungminkim          Kim, Seung Min          Politico      F   \n",
       "16930125        edatpost         O’Keefe, Edward   Washington Post      M   \n",
       "15931637         jonkarl          Karl, Jonathan          ABC News      M   \n",
       "22771961          Acosta             Acosta, Jim               CNN      M   \n",
       "19107878     GlennThrush        Thrush, Glenn H.    New York Times      M   \n",
       "18678924        jmartNYT        Martin, Jonathan    New York Times      M   \n",
       "61734492     Fahrenthold      Fahrenthold, David   Washington Post      M   \n",
       "16018516          jenhab  Haberkorn, Jennifer A.          Politico      F   \n",
       "33919343   AshleyRParker          Parker, Ashley   Washington Post      F   \n",
       "50325797       chucktodd             Todd, Chuck          NBC News      M   \n",
       "71294756     wolfblitzer           Blitzer, Wolf               CNN      M   \n",
       "28181835         jpaceDC             Pace, Julie  Associated Press      F   \n",
       "12354832           kasie             Hunt, Kasie          NBC News      F   \n",
       "16031927           greta     Van Susteren, Greta             MSNBC      F   \n",
       "46557945   StevenTDennis       Dennis, Steven T.    Bloomberg News      M   \n",
       "217550862   BresPolitico         Bresnahan, John          Politico      M   \n",
       "\n",
       "           followers_count        mention_count     mentioning_count  \n",
       "user_id                                                               \n",
       "108617810           281861               115.00                55.00  \n",
       "14529929           1305680               127.00                51.00  \n",
       "997684836            31300               116.00                47.00  \n",
       "407013776            31010               212.00                46.00  \n",
       "112526560            53894                67.00                45.00  \n",
       "18227519            653031                70.00                44.00  \n",
       "123327472            96956               107.00                43.00  \n",
       "39155029             88366                95.00                43.00  \n",
       "13524182            332344               106.00                42.00  \n",
       "19186003             33980               143.00                41.00  \n",
       "16930125             58670                84.00                41.00  \n",
       "15931637            183467               104.00                40.00  \n",
       "22771961            350650                61.00                38.00  \n",
       "19107878            308181                78.00                37.00  \n",
       "18678924            197322                75.00                37.00  \n",
       "61734492            451778                43.00                32.00  \n",
       "16018516             20028               200.00                31.00  \n",
       "33919343            122382               100.00                31.00  \n",
       "50325797           1781247                40.00                31.00  \n",
       "71294756           1281914                56.00                30.00  \n",
       "28181835             46017                52.00                30.00  \n",
       "12354832            187357                67.00                29.00  \n",
       "16031927           1186850                37.00                28.00  \n",
       "46557945             55762               105.00                27.00  \n",
       "217550862            40562                78.00                27.00  "
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_mention_summary_df[journalist_mention_summary_fields].sort_values(['mentioning_count', 'mention_count'], ascending=False).head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists mentioning other journalists, how many are male / female?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "      <th>avg_mentions</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>index</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>8298</td>\n",
       "      <td>58.0%</td>\n",
       "      <td>6.39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>6000</td>\n",
       "      <td>42.0%</td>\n",
       "      <td>6.04</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       count percentage         avg_mentions\n",
       "index                                       \n",
       "M       8298      58.0%                 6.39\n",
       "F       6000      42.0%                 6.04"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalist_mention_gender_summary(journalists_mention_df)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### On average how many times are journalists mentioned by other journalists?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mention_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>2,292.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>6.24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>17.59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>5.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>330.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             mention_count\n",
       "count             2,292.00\n",
       "mean                  6.24\n",
       "std                  17.59\n",
       "min                   0.00\n",
       "25%                   0.00\n",
       "50%                   1.00\n",
       "75%                   5.00\n",
       "max                 330.00"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_mention_summary_df[['mention_count']].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Journalists mentioning female journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists mentioning female journalists who is mentioned the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>mention_count</th>\n",
       "      <th>mentioning_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>325050734</th>\n",
       "      <td>AllysonRaeWx</td>\n",
       "      <td>Banks, Allyson</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>F</td>\n",
       "      <td>6918</td>\n",
       "      <td>330.00</td>\n",
       "      <td>7.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16018516</th>\n",
       "      <td>jenhab</td>\n",
       "      <td>Haberkorn, Jennifer A.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>20028</td>\n",
       "      <td>200.00</td>\n",
       "      <td>31.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19186003</th>\n",
       "      <td>seungminkim</td>\n",
       "      <td>Kim, Seung Min</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>33980</td>\n",
       "      <td>143.00</td>\n",
       "      <td>41.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>108617810</th>\n",
       "      <td>DanaBashCNN</td>\n",
       "      <td>Bash, Dana</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>281861</td>\n",
       "      <td>115.00</td>\n",
       "      <td>55.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82151660</th>\n",
       "      <td>kelsey_snell</td>\n",
       "      <td>Snell, Kelse</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>8108</td>\n",
       "      <td>109.00</td>\n",
       "      <td>22.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33919343</th>\n",
       "      <td>AshleyRParker</td>\n",
       "      <td>Parker, Ashley</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>122382</td>\n",
       "      <td>100.00</td>\n",
       "      <td>31.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52392666</th>\n",
       "      <td>ZoeTillman</td>\n",
       "      <td>Tillman, Zoe</td>\n",
       "      <td>BuzzFeed</td>\n",
       "      <td>F</td>\n",
       "      <td>15246</td>\n",
       "      <td>87.00</td>\n",
       "      <td>14.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26632935</th>\n",
       "      <td>HopeSeck</td>\n",
       "      <td>Hodge Seck, Hope</td>\n",
       "      <td>Military.com</td>\n",
       "      <td>F</td>\n",
       "      <td>4584</td>\n",
       "      <td>83.00</td>\n",
       "      <td>3.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16441088</th>\n",
       "      <td>jestei</td>\n",
       "      <td>Steinhauer, Jennifer</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>F</td>\n",
       "      <td>13452</td>\n",
       "      <td>76.00</td>\n",
       "      <td>26.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18227519</th>\n",
       "      <td>morningmika</td>\n",
       "      <td>Brzezinski, Mika</td>\n",
       "      <td>MSNBC</td>\n",
       "      <td>F</td>\n",
       "      <td>653031</td>\n",
       "      <td>70.00</td>\n",
       "      <td>44.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12354832</th>\n",
       "      <td>kasie</td>\n",
       "      <td>Hunt, Kasie</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>187357</td>\n",
       "      <td>67.00</td>\n",
       "      <td>29.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139738464</th>\n",
       "      <td>mj_lee</td>\n",
       "      <td>Lee, MJ</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>31940</td>\n",
       "      <td>67.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>204599219</th>\n",
       "      <td>pw_cunningham</td>\n",
       "      <td>Cunningham, Paige</td>\n",
       "      <td>Washington Examiner</td>\n",
       "      <td>F</td>\n",
       "      <td>9255</td>\n",
       "      <td>67.00</td>\n",
       "      <td>18.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>118747545</th>\n",
       "      <td>eilperin</td>\n",
       "      <td>Eilperin, Juliet</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>20483</td>\n",
       "      <td>67.00</td>\n",
       "      <td>16.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>360080772</th>\n",
       "      <td>FoxReports</td>\n",
       "      <td>Fox, Lauren</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>7282</td>\n",
       "      <td>65.00</td>\n",
       "      <td>15.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>58869089</th>\n",
       "      <td>margarettalev</td>\n",
       "      <td>Talev, Margaret</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>F</td>\n",
       "      <td>19588</td>\n",
       "      <td>58.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>313545488</th>\n",
       "      <td>LauraLitvan</td>\n",
       "      <td>Litvan, Laura</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>F</td>\n",
       "      <td>4468</td>\n",
       "      <td>58.00</td>\n",
       "      <td>5.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19734832</th>\n",
       "      <td>sarahkliff</td>\n",
       "      <td>Kliff, Sarah L.</td>\n",
       "      <td>Vox Media</td>\n",
       "      <td>F</td>\n",
       "      <td>100090</td>\n",
       "      <td>57.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>381664207</th>\n",
       "      <td>caitlinnowens</td>\n",
       "      <td>Owens, Caitlin N.</td>\n",
       "      <td>Axios</td>\n",
       "      <td>F</td>\n",
       "      <td>5749</td>\n",
       "      <td>57.00</td>\n",
       "      <td>9.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167024520</th>\n",
       "      <td>rachaelmbade</td>\n",
       "      <td>Bade, Rachel M.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>30164</td>\n",
       "      <td>56.00</td>\n",
       "      <td>26.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247852986</th>\n",
       "      <td>rachanadixit</td>\n",
       "      <td>Pradhan, Rachana D.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>6178</td>\n",
       "      <td>55.00</td>\n",
       "      <td>14.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>237477771</th>\n",
       "      <td>juliehdavis</td>\n",
       "      <td>Davis, Julie</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>F</td>\n",
       "      <td>49821</td>\n",
       "      <td>55.00</td>\n",
       "      <td>10.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36607254</th>\n",
       "      <td>Oriana0214</td>\n",
       "      <td>Pawlyk, Oriana</td>\n",
       "      <td>Military.com</td>\n",
       "      <td>F</td>\n",
       "      <td>6397</td>\n",
       "      <td>55.00</td>\n",
       "      <td>4.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28181835</th>\n",
       "      <td>jpaceDC</td>\n",
       "      <td>Pace, Julie</td>\n",
       "      <td>Associated Press</td>\n",
       "      <td>F</td>\n",
       "      <td>46017</td>\n",
       "      <td>52.00</td>\n",
       "      <td>30.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48144950</th>\n",
       "      <td>JudyWoodruff</td>\n",
       "      <td>Woodruff, Judy</td>\n",
       "      <td>PBS NewsHour</td>\n",
       "      <td>F</td>\n",
       "      <td>64294</td>\n",
       "      <td>49.00</td>\n",
       "      <td>7.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             screen_name                    name         organization gender  \\\n",
       "user_id                                                                        \n",
       "325050734   AllysonRaeWx          Banks, Allyson              WUSA–TV      F   \n",
       "16018516          jenhab  Haberkorn, Jennifer A.             Politico      F   \n",
       "19186003     seungminkim          Kim, Seung Min             Politico      F   \n",
       "108617810    DanaBashCNN              Bash, Dana                  CNN      F   \n",
       "82151660    kelsey_snell            Snell, Kelse      Washington Post      F   \n",
       "33919343   AshleyRParker          Parker, Ashley      Washington Post      F   \n",
       "52392666      ZoeTillman            Tillman, Zoe             BuzzFeed      F   \n",
       "26632935        HopeSeck        Hodge Seck, Hope         Military.com      F   \n",
       "16441088          jestei    Steinhauer, Jennifer       New York Times      F   \n",
       "18227519     morningmika        Brzezinski, Mika                MSNBC      F   \n",
       "12354832           kasie             Hunt, Kasie             NBC News      F   \n",
       "139738464         mj_lee                 Lee, MJ                  CNN      F   \n",
       "204599219  pw_cunningham       Cunningham, Paige  Washington Examiner      F   \n",
       "118747545       eilperin        Eilperin, Juliet      Washington Post      F   \n",
       "360080772     FoxReports             Fox, Lauren                  CNN      F   \n",
       "58869089   margarettalev         Talev, Margaret       Bloomberg News      F   \n",
       "313545488    LauraLitvan           Litvan, Laura       Bloomberg News      F   \n",
       "19734832      sarahkliff         Kliff, Sarah L.            Vox Media      F   \n",
       "381664207  caitlinnowens       Owens, Caitlin N.                Axios      F   \n",
       "167024520   rachaelmbade         Bade, Rachel M.             Politico      F   \n",
       "247852986   rachanadixit     Pradhan, Rachana D.             Politico      F   \n",
       "237477771    juliehdavis            Davis, Julie       New York Times      F   \n",
       "36607254      Oriana0214          Pawlyk, Oriana         Military.com      F   \n",
       "28181835         jpaceDC             Pace, Julie     Associated Press      F   \n",
       "48144950    JudyWoodruff          Woodruff, Judy         PBS NewsHour      F   \n",
       "\n",
       "           followers_count        mention_count     mentioning_count  \n",
       "user_id                                                               \n",
       "325050734             6918               330.00                 7.00  \n",
       "16018516             20028               200.00                31.00  \n",
       "19186003             33980               143.00                41.00  \n",
       "108617810           281861               115.00                55.00  \n",
       "82151660              8108               109.00                22.00  \n",
       "33919343            122382               100.00                31.00  \n",
       "52392666             15246                87.00                14.00  \n",
       "26632935              4584                83.00                 3.00  \n",
       "16441088             13452                76.00                26.00  \n",
       "18227519            653031                70.00                44.00  \n",
       "12354832            187357                67.00                29.00  \n",
       "139738464            31940                67.00                27.00  \n",
       "204599219             9255                67.00                18.00  \n",
       "118747545            20483                67.00                16.00  \n",
       "360080772             7282                65.00                15.00  \n",
       "58869089             19588                58.00                27.00  \n",
       "313545488             4468                58.00                 5.00  \n",
       "19734832            100090                57.00                27.00  \n",
       "381664207             5749                57.00                 9.00  \n",
       "167024520            30164                56.00                26.00  \n",
       "247852986             6178                55.00                14.00  \n",
       "237477771            49821                55.00                10.00  \n",
       "36607254              6397                55.00                 4.00  \n",
       "28181835             46017                52.00                30.00  \n",
       "48144950             64294                49.00                 7.00  "
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "female_journalists_mention_summary_df = journalists_mention_summary_df[journalists_mention_summary_df.gender == 'F']\n",
    "female_journalists_mention_summary_df.to_csv('output/female_journalists_mentioned_by_journalists.csv')\n",
    "female_journalists_mention_summary_df[journalist_mention_summary_fields].head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### On average, how many times are female journalists mentioned by journalists?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mention_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>993.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>6.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>17.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>4.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>330.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             mention_count\n",
       "count               993.00\n",
       "mean                  6.04\n",
       "std                  17.95\n",
       "min                   0.00\n",
       "25%                   0.00\n",
       "50%                   1.00\n",
       "75%                   4.00\n",
       "max                 330.00"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "female_journalists_mention_summary_df[['mention_count']].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Journalists mentioning male journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists mentioning male journalists, who do they mention the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>mention_count</th>\n",
       "      <th>mentioning_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>28496589</th>\n",
       "      <td>TenaciousTopper</td>\n",
       "      <td>Shutt, Charles</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>M</td>\n",
       "      <td>15868</td>\n",
       "      <td>239.00</td>\n",
       "      <td>13.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63149389</th>\n",
       "      <td>hbwx</td>\n",
       "      <td>Bernstein, Howard</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>M</td>\n",
       "      <td>8337</td>\n",
       "      <td>235.00</td>\n",
       "      <td>10.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>407013776</th>\n",
       "      <td>burgessev</td>\n",
       "      <td>Everett, John B.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>31010</td>\n",
       "      <td>212.00</td>\n",
       "      <td>46.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>127.00</td>\n",
       "      <td>51.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169586280</th>\n",
       "      <td>WaPoSean</td>\n",
       "      <td>Sullivan, Sean</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>22860</td>\n",
       "      <td>117.00</td>\n",
       "      <td>20.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>997684836</th>\n",
       "      <td>pkcapitol</td>\n",
       "      <td>Kane, Paul</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>31300</td>\n",
       "      <td>116.00</td>\n",
       "      <td>47.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123327472</th>\n",
       "      <td>peterbakernyt</td>\n",
       "      <td>Baker, Peter</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>96956</td>\n",
       "      <td>107.00</td>\n",
       "      <td>43.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13524182</th>\n",
       "      <td>daveweigel</td>\n",
       "      <td>Weigel, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>332344</td>\n",
       "      <td>106.00</td>\n",
       "      <td>42.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46557945</th>\n",
       "      <td>StevenTDennis</td>\n",
       "      <td>Dennis, Steven T.</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>55762</td>\n",
       "      <td>105.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15931637</th>\n",
       "      <td>jonkarl</td>\n",
       "      <td>Karl, Jonathan</td>\n",
       "      <td>ABC News</td>\n",
       "      <td>M</td>\n",
       "      <td>183467</td>\n",
       "      <td>104.00</td>\n",
       "      <td>40.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9126752</th>\n",
       "      <td>reporterjoe</td>\n",
       "      <td>Gould, Joseph M.</td>\n",
       "      <td>Sightline Media Group</td>\n",
       "      <td>M</td>\n",
       "      <td>4702</td>\n",
       "      <td>98.00</td>\n",
       "      <td>16.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39155029</th>\n",
       "      <td>mkraju</td>\n",
       "      <td>Raju, Manu K.</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>88366</td>\n",
       "      <td>95.00</td>\n",
       "      <td>43.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16930125</th>\n",
       "      <td>edatpost</td>\n",
       "      <td>O’Keefe, Edward</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>58670</td>\n",
       "      <td>84.00</td>\n",
       "      <td>41.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48802204</th>\n",
       "      <td>HardballChris</td>\n",
       "      <td>Matthews, Chris</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>718330</td>\n",
       "      <td>80.00</td>\n",
       "      <td>9.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19107878</th>\n",
       "      <td>GlennThrush</td>\n",
       "      <td>Thrush, Glenn H.</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>308181</td>\n",
       "      <td>78.00</td>\n",
       "      <td>37.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>217550862</th>\n",
       "      <td>BresPolitico</td>\n",
       "      <td>Bresnahan, John</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>40562</td>\n",
       "      <td>78.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24439201</th>\n",
       "      <td>jameshohmann</td>\n",
       "      <td>Hohmann, James P.</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>38708</td>\n",
       "      <td>78.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18678924</th>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>Martin, Jonathan</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>197322</td>\n",
       "      <td>75.00</td>\n",
       "      <td>37.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22891564</th>\n",
       "      <td>chrisgeidner</td>\n",
       "      <td>Geidner, Chris</td>\n",
       "      <td>BuzzFeed</td>\n",
       "      <td>M</td>\n",
       "      <td>83316</td>\n",
       "      <td>73.00</td>\n",
       "      <td>15.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>112526560</th>\n",
       "      <td>kenvogel</td>\n",
       "      <td>Vogel, Kenneth P.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>53894</td>\n",
       "      <td>67.00</td>\n",
       "      <td>45.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18646108</th>\n",
       "      <td>BretBaier</td>\n",
       "      <td>Baier, Bret</td>\n",
       "      <td>Fox News</td>\n",
       "      <td>M</td>\n",
       "      <td>1095184</td>\n",
       "      <td>66.00</td>\n",
       "      <td>18.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22771961</th>\n",
       "      <td>Acosta</td>\n",
       "      <td>Acosta, Jim</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>350650</td>\n",
       "      <td>61.00</td>\n",
       "      <td>38.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16067683</th>\n",
       "      <td>pauldemko</td>\n",
       "      <td>Demko, Paul Jeffrey</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>8170</td>\n",
       "      <td>60.00</td>\n",
       "      <td>13.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59676104</th>\n",
       "      <td>danbalz</td>\n",
       "      <td>Balz, Daniel</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>90819</td>\n",
       "      <td>57.00</td>\n",
       "      <td>26.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>71294756</th>\n",
       "      <td>wolfblitzer</td>\n",
       "      <td>Blitzer, Wolf</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1281914</td>\n",
       "      <td>56.00</td>\n",
       "      <td>30.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               screen_name                 name           organization gender  \\\n",
       "user_id                                                                         \n",
       "28496589   TenaciousTopper       Shutt, Charles                WUSA–TV      M   \n",
       "63149389              hbwx    Bernstein, Howard                WUSA–TV      M   \n",
       "407013776        burgessev     Everett, John B.               Politico      M   \n",
       "14529929        jaketapper         Tapper, Jake                    CNN      M   \n",
       "169586280         WaPoSean       Sullivan, Sean        Washington Post      M   \n",
       "997684836        pkcapitol           Kane, Paul        Washington Post      M   \n",
       "123327472    peterbakernyt         Baker, Peter         New York Times      M   \n",
       "13524182        daveweigel        Weigel, David        Washington Post      M   \n",
       "46557945     StevenTDennis    Dennis, Steven T.         Bloomberg News      M   \n",
       "15931637           jonkarl       Karl, Jonathan               ABC News      M   \n",
       "9126752        reporterjoe     Gould, Joseph M.  Sightline Media Group      M   \n",
       "39155029            mkraju        Raju, Manu K.                    CNN      M   \n",
       "16930125          edatpost      O’Keefe, Edward        Washington Post      M   \n",
       "48802204     HardballChris      Matthews, Chris               NBC News      M   \n",
       "19107878       GlennThrush     Thrush, Glenn H.         New York Times      M   \n",
       "217550862     BresPolitico      Bresnahan, John               Politico      M   \n",
       "24439201      jameshohmann    Hohmann, James P.        Washington Post      M   \n",
       "18678924          jmartNYT     Martin, Jonathan         New York Times      M   \n",
       "22891564      chrisgeidner       Geidner, Chris               BuzzFeed      M   \n",
       "112526560         kenvogel    Vogel, Kenneth P.               Politico      M   \n",
       "18646108         BretBaier          Baier, Bret               Fox News      M   \n",
       "22771961            Acosta          Acosta, Jim                    CNN      M   \n",
       "16067683         pauldemko  Demko, Paul Jeffrey               Politico      M   \n",
       "59676104           danbalz         Balz, Daniel        Washington Post      M   \n",
       "71294756       wolfblitzer        Blitzer, Wolf                    CNN      M   \n",
       "\n",
       "           followers_count        mention_count     mentioning_count  \n",
       "user_id                                                               \n",
       "28496589             15868               239.00                13.00  \n",
       "63149389              8337               235.00                10.00  \n",
       "407013776            31010               212.00                46.00  \n",
       "14529929           1305680               127.00                51.00  \n",
       "169586280            22860               117.00                20.00  \n",
       "997684836            31300               116.00                47.00  \n",
       "123327472            96956               107.00                43.00  \n",
       "13524182            332344               106.00                42.00  \n",
       "46557945             55762               105.00                27.00  \n",
       "15931637            183467               104.00                40.00  \n",
       "9126752               4702                98.00                16.00  \n",
       "39155029             88366                95.00                43.00  \n",
       "16930125             58670                84.00                41.00  \n",
       "48802204            718330                80.00                 9.00  \n",
       "19107878            308181                78.00                37.00  \n",
       "217550862            40562                78.00                27.00  \n",
       "24439201             38708                78.00                27.00  \n",
       "18678924            197322                75.00                37.00  \n",
       "22891564             83316                73.00                15.00  \n",
       "112526560            53894                67.00                45.00  \n",
       "18646108           1095184                66.00                18.00  \n",
       "22771961            350650                61.00                38.00  \n",
       "16067683              8170                60.00                13.00  \n",
       "59676104             90819                57.00                26.00  \n",
       "71294756           1281914                56.00                30.00  "
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
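    "# Keep only the mentioned journalists who are male (gender here is the mentioned journalist's gender).\n",
    "male_journalists_mention_summary_df = journalists_mention_summary_df[journalists_mention_summary_df.gender == 'M']\n",
    "# Save the full table, then display the first 25 rows.\n",
    "male_journalists_mention_summary_df.to_csv('output/male_journalists_mentioned_by_journalists.csv')\n",
    "male_journalists_mention_summary_df[journalist_mention_summary_fields].head(25)"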
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### On average, how many times are male journalists mentioned by journalists?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>mention_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1,299.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>6.39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>17.31</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>5.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>239.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             mention_count\n",
       "count             1,299.00\n",
       "mean                  6.39\n",
       "std                  17.31\n",
       "min                   0.00\n",
       "25%                   0.00\n",
       "50%                   1.00\n",
       "75%                   5.00\n",
       "max                 239.00"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
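    "# Summary statistics (count, mean, quartiles, max) of how often each male journalist is mentioned by journalists.\n",
    "male_journalists_mention_summary_df[['mention_count']].describe()"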
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Female journalists mentioning other journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of female journalists mentioning other journalists, who do they mention the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>mention_count</th>\n",
       "      <th>mentioning_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>407013776</th>\n",
       "      <td>burgessev</td>\n",
       "      <td>Everett, John B.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>31010</td>\n",
       "      <td>164.00</td>\n",
       "      <td>20.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16018516</th>\n",
       "      <td>jenhab</td>\n",
       "      <td>Haberkorn, Jennifer A.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>20028</td>\n",
       "      <td>116.00</td>\n",
       "      <td>13.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46557945</th>\n",
       "      <td>StevenTDennis</td>\n",
       "      <td>Dennis, Steven T.</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>55762</td>\n",
       "      <td>79.00</td>\n",
       "      <td>10.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>169586280</th>\n",
       "      <td>WaPoSean</td>\n",
       "      <td>Sullivan, Sean</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>22860</td>\n",
       "      <td>71.00</td>\n",
       "      <td>11.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>48802204</th>\n",
       "      <td>HardballChris</td>\n",
       "      <td>Matthews, Chris</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>718330</td>\n",
       "      <td>70.00</td>\n",
       "      <td>3.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19186003</th>\n",
       "      <td>seungminkim</td>\n",
       "      <td>Kim, Seung Min</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>33980</td>\n",
       "      <td>64.00</td>\n",
       "      <td>16.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22891564</th>\n",
       "      <td>chrisgeidner</td>\n",
       "      <td>Geidner, Chris</td>\n",
       "      <td>BuzzFeed</td>\n",
       "      <td>M</td>\n",
       "      <td>83316</td>\n",
       "      <td>61.00</td>\n",
       "      <td>6.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>108617810</th>\n",
       "      <td>DanaBashCNN</td>\n",
       "      <td>Bash, Dana</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>281861</td>\n",
       "      <td>60.00</td>\n",
       "      <td>26.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16067683</th>\n",
       "      <td>pauldemko</td>\n",
       "      <td>Demko, Paul Jeffrey</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>8170</td>\n",
       "      <td>57.00</td>\n",
       "      <td>10.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>313545488</th>\n",
       "      <td>LauraLitvan</td>\n",
       "      <td>Litvan, Laura</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>F</td>\n",
       "      <td>4468</td>\n",
       "      <td>53.00</td>\n",
       "      <td>2.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52392666</th>\n",
       "      <td>ZoeTillman</td>\n",
       "      <td>Tillman, Zoe</td>\n",
       "      <td>BuzzFeed</td>\n",
       "      <td>F</td>\n",
       "      <td>15246</td>\n",
       "      <td>52.00</td>\n",
       "      <td>8.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33919343</th>\n",
       "      <td>AshleyRParker</td>\n",
       "      <td>Parker, Ashley</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>122382</td>\n",
       "      <td>49.00</td>\n",
       "      <td>11.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82151660</th>\n",
       "      <td>kelsey_snell</td>\n",
       "      <td>Snell, Kelse</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>8108</td>\n",
       "      <td>47.00</td>\n",
       "      <td>10.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>247852986</th>\n",
       "      <td>rachanadixit</td>\n",
       "      <td>Pradhan, Rachana D.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>6178</td>\n",
       "      <td>43.00</td>\n",
       "      <td>7.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9126752</th>\n",
       "      <td>reporterjoe</td>\n",
       "      <td>Gould, Joseph M.</td>\n",
       "      <td>Sightline Media Group</td>\n",
       "      <td>M</td>\n",
       "      <td>4702</td>\n",
       "      <td>43.00</td>\n",
       "      <td>7.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>40.00</td>\n",
       "      <td>21.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16930125</th>\n",
       "      <td>edatpost</td>\n",
       "      <td>O’Keefe, Edward</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>58670</td>\n",
       "      <td>40.00</td>\n",
       "      <td>18.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>217550862</th>\n",
       "      <td>BresPolitico</td>\n",
       "      <td>Bresnahan, John</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>40562</td>\n",
       "      <td>37.00</td>\n",
       "      <td>13.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16149614</th>\n",
       "      <td>jrovner</td>\n",
       "      <td>Rovner, Julie</td>\n",
       "      <td>Kaiser Health News</td>\n",
       "      <td>F</td>\n",
       "      <td>21844</td>\n",
       "      <td>35.00</td>\n",
       "      <td>14.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>997684836</th>\n",
       "      <td>pkcapitol</td>\n",
       "      <td>Kane, Paul</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>31300</td>\n",
       "      <td>35.00</td>\n",
       "      <td>13.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12354832</th>\n",
       "      <td>kasie</td>\n",
       "      <td>Hunt, Kasie</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>187357</td>\n",
       "      <td>35.00</td>\n",
       "      <td>12.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>158072303</th>\n",
       "      <td>ValerieInsinna</td>\n",
       "      <td>Insinna, Valerie</td>\n",
       "      <td>Defense News</td>\n",
       "      <td>F</td>\n",
       "      <td>4572</td>\n",
       "      <td>35.00</td>\n",
       "      <td>2.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15931637</th>\n",
       "      <td>jonkarl</td>\n",
       "      <td>Karl, Jonathan</td>\n",
       "      <td>ABC News</td>\n",
       "      <td>M</td>\n",
       "      <td>183467</td>\n",
       "      <td>33.00</td>\n",
       "      <td>18.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>342226913</th>\n",
       "      <td>GregStohr</td>\n",
       "      <td>Stohr, Greg</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>7245</td>\n",
       "      <td>32.00</td>\n",
       "      <td>2.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>297532865</th>\n",
       "      <td>kwelkernbc</td>\n",
       "      <td>Welker, Kristen</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>99234</td>\n",
       "      <td>31.00</td>\n",
       "      <td>9.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              screen_name                    name           organization  \\\n",
       "user_id                                                                    \n",
       "407013776       burgessev        Everett, John B.               Politico   \n",
       "16018516           jenhab  Haberkorn, Jennifer A.               Politico   \n",
       "46557945    StevenTDennis       Dennis, Steven T.         Bloomberg News   \n",
       "169586280        WaPoSean          Sullivan, Sean        Washington Post   \n",
       "48802204    HardballChris         Matthews, Chris               NBC News   \n",
       "19186003      seungminkim          Kim, Seung Min               Politico   \n",
       "22891564     chrisgeidner          Geidner, Chris               BuzzFeed   \n",
       "108617810     DanaBashCNN              Bash, Dana                    CNN   \n",
       "16067683        pauldemko     Demko, Paul Jeffrey               Politico   \n",
       "313545488     LauraLitvan           Litvan, Laura         Bloomberg News   \n",
       "52392666       ZoeTillman            Tillman, Zoe               BuzzFeed   \n",
       "33919343    AshleyRParker          Parker, Ashley        Washington Post   \n",
       "82151660     kelsey_snell            Snell, Kelse        Washington Post   \n",
       "247852986    rachanadixit     Pradhan, Rachana D.               Politico   \n",
       "9126752       reporterjoe        Gould, Joseph M.  Sightline Media Group   \n",
       "14529929       jaketapper            Tapper, Jake                    CNN   \n",
       "16930125         edatpost         O’Keefe, Edward        Washington Post   \n",
       "217550862    BresPolitico         Bresnahan, John               Politico   \n",
       "16149614          jrovner           Rovner, Julie     Kaiser Health News   \n",
       "997684836       pkcapitol              Kane, Paul        Washington Post   \n",
       "12354832            kasie             Hunt, Kasie               NBC News   \n",
       "158072303  ValerieInsinna        Insinna, Valerie           Defense News   \n",
       "15931637          jonkarl          Karl, Jonathan               ABC News   \n",
       "342226913       GregStohr             Stohr, Greg         Bloomberg News   \n",
       "297532865      kwelkernbc         Welker, Kristen               NBC News   \n",
       "\n",
       "          gender  followers_count        mention_count     mentioning_count  \n",
       "user_id                                                                      \n",
       "407013776      M            31010               164.00                20.00  \n",
       "16018516       F            20028               116.00                13.00  \n",
       "46557945       M            55762                79.00                10.00  \n",
       "169586280      M            22860                71.00                11.00  \n",
       "48802204       M           718330                70.00                 3.00  \n",
       "19186003       F            33980                64.00                16.00  \n",
       "22891564       M            83316                61.00                 6.00  \n",
       "108617810      F           281861                60.00                26.00  \n",
       "16067683       M             8170                57.00                10.00  \n",
       "313545488      F             4468                53.00                 2.00  \n",
       "52392666       F            15246                52.00                 8.00  \n",
       "33919343       F           122382                49.00                11.00  \n",
       "82151660       F             8108                47.00                10.00  \n",
       "247852986      F             6178                43.00                 7.00  \n",
       "9126752        M             4702                43.00                 7.00  \n",
       "14529929       M          1305680                40.00                21.00  \n",
       "16930125       M            58670                40.00                18.00  \n",
       "217550862      M            40562                37.00                13.00  \n",
       "16149614       F            21844                35.00                14.00  \n",
       "997684836      M            31300                35.00                13.00  \n",
       "12354832       F           187357                35.00                12.00  \n",
       "158072303      F             4572                35.00                 2.00  \n",
       "15931637       M           183467                33.00                18.00  \n",
       "342226913      M             7245                32.00                 2.00  \n",
       "297532865      F            99234                31.00                 9.00  "
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
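    "# Restrict to mentions made by female journalists (gender in journalists_mention_df is the tweeting journalist's gender),\n",
    "# then summarize who they mention.\n",
    "journalists_mentioned_by_female_summary_df = journalist_mention_summary(journalists_mention_df[journalists_mention_df.gender == 'F'])\n",
    "# Save the full table, then display the first 25 rows.\n",
    "journalists_mentioned_by_female_summary_df.to_csv('output/journalists_mentioned_by_female_journalists.csv')\n",
    "journalists_mentioned_by_female_summary_df[journalist_mention_summary_fields].head(25)"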
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of female journalists mentioning journalists, how many are male / female?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "      <th>avg_mentions</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>index</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>3162</td>\n",
       "      <td>54.8%</td>\n",
       "      <td>2.43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>2605</td>\n",
       "      <td>45.2%</td>\n",
       "      <td>2.62</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       count percentage         avg_mentions\n",
       "index                                       \n",
       "M       3162      54.8%                 2.43\n",
       "F       2605      45.2%                 2.62"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalist_mention_gender_summary(journalists_mention_df[journalists_mention_df.gender == 'F'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Male journalists mentioning other journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of male journalists mentioning other journalists, who do they mention the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>mention_count</th>\n",
       "      <th>mentioning_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>325050734</th>\n",
       "      <td>AllysonRaeWx</td>\n",
       "      <td>Banks, Allyson</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>F</td>\n",
       "      <td>6918</td>\n",
       "      <td>324.00</td>\n",
       "      <td>4.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28496589</th>\n",
       "      <td>TenaciousTopper</td>\n",
       "      <td>Shutt, Charles</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>M</td>\n",
       "      <td>15868</td>\n",
       "      <td>225.00</td>\n",
       "      <td>7.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>63149389</th>\n",
       "      <td>hbwx</td>\n",
       "      <td>Bernstein, Howard</td>\n",
       "      <td>WUSA–TV</td>\n",
       "      <td>M</td>\n",
       "      <td>8337</td>\n",
       "      <td>225.00</td>\n",
       "      <td>4.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>87.00</td>\n",
       "      <td>30.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13524182</th>\n",
       "      <td>daveweigel</td>\n",
       "      <td>Weigel, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>332344</td>\n",
       "      <td>84.00</td>\n",
       "      <td>30.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16018516</th>\n",
       "      <td>jenhab</td>\n",
       "      <td>Haberkorn, Jennifer A.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>20028</td>\n",
       "      <td>84.00</td>\n",
       "      <td>18.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>997684836</th>\n",
       "      <td>pkcapitol</td>\n",
       "      <td>Kane, Paul</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>31300</td>\n",
       "      <td>81.00</td>\n",
       "      <td>34.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19186003</th>\n",
       "      <td>seungminkim</td>\n",
       "      <td>Kim, Seung Min</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>33980</td>\n",
       "      <td>79.00</td>\n",
       "      <td>25.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>123327472</th>\n",
       "      <td>peterbakernyt</td>\n",
       "      <td>Baker, Peter</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>96956</td>\n",
       "      <td>78.00</td>\n",
       "      <td>29.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26632935</th>\n",
       "      <td>HopeSeck</td>\n",
       "      <td>Hodge Seck, Hope</td>\n",
       "      <td>Military.com</td>\n",
       "      <td>F</td>\n",
       "      <td>4584</td>\n",
       "      <td>76.00</td>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15931637</th>\n",
       "      <td>jonkarl</td>\n",
       "      <td>Karl, Jonathan</td>\n",
       "      <td>ABC News</td>\n",
       "      <td>M</td>\n",
       "      <td>183467</td>\n",
       "      <td>71.00</td>\n",
       "      <td>22.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18678924</th>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>Martin, Jonathan</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>197322</td>\n",
       "      <td>69.00</td>\n",
       "      <td>31.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39155029</th>\n",
       "      <td>mkraju</td>\n",
       "      <td>Raju, Manu K.</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>88366</td>\n",
       "      <td>67.00</td>\n",
       "      <td>27.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19107878</th>\n",
       "      <td>GlennThrush</td>\n",
       "      <td>Thrush, Glenn H.</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>308181</td>\n",
       "      <td>66.00</td>\n",
       "      <td>29.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16441088</th>\n",
       "      <td>jestei</td>\n",
       "      <td>Steinhauer, Jennifer</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>F</td>\n",
       "      <td>13452</td>\n",
       "      <td>64.00</td>\n",
       "      <td>17.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82151660</th>\n",
       "      <td>kelsey_snell</td>\n",
       "      <td>Snell, Kelse</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>8108</td>\n",
       "      <td>62.00</td>\n",
       "      <td>12.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24439201</th>\n",
       "      <td>jameshohmann</td>\n",
       "      <td>Hohmann, James P.</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>38708</td>\n",
       "      <td>59.00</td>\n",
       "      <td>17.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18646108</th>\n",
       "      <td>BretBaier</td>\n",
       "      <td>Baier, Bret</td>\n",
       "      <td>Fox News</td>\n",
       "      <td>M</td>\n",
       "      <td>1095184</td>\n",
       "      <td>59.00</td>\n",
       "      <td>14.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>108617810</th>\n",
       "      <td>DanaBashCNN</td>\n",
       "      <td>Bash, Dana</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>281861</td>\n",
       "      <td>55.00</td>\n",
       "      <td>29.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9126752</th>\n",
       "      <td>reporterjoe</td>\n",
       "      <td>Gould, Joseph M.</td>\n",
       "      <td>Sightline Media Group</td>\n",
       "      <td>M</td>\n",
       "      <td>4702</td>\n",
       "      <td>55.00</td>\n",
       "      <td>9.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>381664207</th>\n",
       "      <td>caitlinnowens</td>\n",
       "      <td>Owens, Caitlin N.</td>\n",
       "      <td>Axios</td>\n",
       "      <td>F</td>\n",
       "      <td>5749</td>\n",
       "      <td>55.00</td>\n",
       "      <td>7.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33919343</th>\n",
       "      <td>AshleyRParker</td>\n",
       "      <td>Parker, Ashley</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>122382</td>\n",
       "      <td>51.00</td>\n",
       "      <td>20.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>204599219</th>\n",
       "      <td>pw_cunningham</td>\n",
       "      <td>Cunningham, Paige</td>\n",
       "      <td>Washington Examiner</td>\n",
       "      <td>F</td>\n",
       "      <td>9255</td>\n",
       "      <td>51.00</td>\n",
       "      <td>9.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>112526560</th>\n",
       "      <td>kenvogel</td>\n",
       "      <td>Vogel, Kenneth P.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>53894</td>\n",
       "      <td>50.00</td>\n",
       "      <td>32.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36607254</th>\n",
       "      <td>Oriana0214</td>\n",
       "      <td>Pawlyk, Oriana</td>\n",
       "      <td>Military.com</td>\n",
       "      <td>F</td>\n",
       "      <td>6397</td>\n",
       "      <td>50.00</td>\n",
       "      <td>3.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               screen_name                    name           organization  \\\n",
       "user_id                                                                     \n",
       "325050734     AllysonRaeWx          Banks, Allyson                WUSA–TV   \n",
       "28496589   TenaciousTopper          Shutt, Charles                WUSA–TV   \n",
       "63149389              hbwx       Bernstein, Howard                WUSA–TV   \n",
       "14529929        jaketapper            Tapper, Jake                    CNN   \n",
       "13524182        daveweigel           Weigel, David        Washington Post   \n",
       "16018516            jenhab  Haberkorn, Jennifer A.               Politico   \n",
       "997684836        pkcapitol              Kane, Paul        Washington Post   \n",
       "19186003       seungminkim          Kim, Seung Min               Politico   \n",
       "123327472    peterbakernyt            Baker, Peter         New York Times   \n",
       "26632935          HopeSeck        Hodge Seck, Hope           Military.com   \n",
       "15931637           jonkarl          Karl, Jonathan               ABC News   \n",
       "18678924          jmartNYT        Martin, Jonathan         New York Times   \n",
       "39155029            mkraju           Raju, Manu K.                    CNN   \n",
       "19107878       GlennThrush        Thrush, Glenn H.         New York Times   \n",
       "16441088            jestei    Steinhauer, Jennifer         New York Times   \n",
       "82151660      kelsey_snell            Snell, Kelse        Washington Post   \n",
       "24439201      jameshohmann       Hohmann, James P.        Washington Post   \n",
       "18646108         BretBaier             Baier, Bret               Fox News   \n",
       "108617810      DanaBashCNN              Bash, Dana                    CNN   \n",
       "9126752        reporterjoe        Gould, Joseph M.  Sightline Media Group   \n",
       "381664207    caitlinnowens       Owens, Caitlin N.                  Axios   \n",
       "33919343     AshleyRParker          Parker, Ashley        Washington Post   \n",
       "204599219    pw_cunningham       Cunningham, Paige    Washington Examiner   \n",
       "112526560         kenvogel       Vogel, Kenneth P.               Politico   \n",
       "36607254        Oriana0214          Pawlyk, Oriana           Military.com   \n",
       "\n",
       "          gender  followers_count        mention_count     mentioning_count  \n",
       "user_id                                                                      \n",
       "325050734      F             6918               324.00                 4.00  \n",
       "28496589       M            15868               225.00                 7.00  \n",
       "63149389       M             8337               225.00                 4.00  \n",
       "14529929       M          1305680                87.00                30.00  \n",
       "13524182       M           332344                84.00                30.00  \n",
       "16018516       F            20028                84.00                18.00  \n",
       "997684836      M            31300                81.00                34.00  \n",
       "19186003       F            33980                79.00                25.00  \n",
       "123327472      M            96956                78.00                29.00  \n",
       "26632935       F             4584                76.00                 1.00  \n",
       "15931637       M           183467                71.00                22.00  \n",
       "18678924       M           197322                69.00                31.00  \n",
       "39155029       M            88366                67.00                27.00  \n",
       "19107878       M           308181                66.00                29.00  \n",
       "16441088       F            13452                64.00                17.00  \n",
       "82151660       F             8108                62.00                12.00  \n",
       "24439201       M            38708                59.00                17.00  \n",
       "18646108       M          1095184                59.00                14.00  \n",
       "108617810      F           281861                55.00                29.00  \n",
       "9126752        M             4702                55.00                 9.00  \n",
       "381664207      F             5749                55.00                 7.00  \n",
       "33919343       F           122382                51.00                20.00  \n",
       "204599219      F             9255                51.00                 9.00  \n",
       "112526560      M            53894                50.00                32.00  \n",
       "36607254       F             6397                50.00                 3.00  "
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_mentioned_by_male_summary_df = journalist_mention_summary(journalists_mention_df[journalists_mention_df.gender == 'M'])\n",
    "journalists_mentioned_by_male_summary_df.to_csv('output/journalists_mentioned_by_male_journalists.csv')\n",
    "journalists_mentioned_by_male_summary_df[journalist_mention_summary_fields].head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of male journalists mentioning other journalists, how many are male / female?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "      <th>avg_mentions</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>index</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>5136</td>\n",
       "      <td>60.2%</td>\n",
       "      <td>3.95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>3395</td>\n",
       "      <td>39.8%</td>\n",
       "      <td>3.42</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       count percentage         avg_mentions\n",
       "index                                       \n",
       "M       5136      60.2%                 3.95\n",
       "F       3395      39.8%                 3.42"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalist_mention_gender_summary(journalists_mention_df[journalists_mention_df.gender == 'M'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retweet data prep"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load retweets from tweets\n",
    "This includes both retweets and quote tweets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:root:Loading from tweets/642bf140607547cb9d4c6b1fc49772aa_001.json.gz\n",
      "DEBUG:root:Loaded 50000\n",
      "DEBUG:root:Loaded 100000\n",
      "DEBUG:root:Loaded 150000\n",
      "DEBUG:root:Loaded 200000\n",
      "DEBUG:root:Loaded 250000\n",
      "INFO:root:Loading from tweets/9f7ed17c16a1494c8690b4053609539d_001.json.gz\n",
      "DEBUG:root:Loaded 300000\n",
      "DEBUG:root:Loaded 350000\n",
      "DEBUG:root:Loaded 400000\n",
      "DEBUG:root:Loaded 450000\n",
      "DEBUG:root:Loaded 500000\n",
      "INFO:root:Loading from tweets/41feff28312c433ab004cd822212f4c2_001.json.gz\n",
      "DEBUG:root:Loaded 550000\n",
      "DEBUG:root:Loaded 600000\n",
      "DEBUG:root:Loaded 650000\n",
      "DEBUG:root:Loaded 700000\n",
      "DEBUG:root:Loaded 750000\n",
      "DEBUG:root:Loaded 800000\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "tweet_id               456956\n",
       "user_id                456956\n",
       "screen_name            456956\n",
       "retweet_user_id        456956\n",
       "retweet_screen_name    456956\n",
       "tweet_created_at       456956\n",
       "dtype: int64"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Simplify the tweet on load\n",
    "def retweet_transform(tweet):\n",
    "    if tweet_type(tweet) in ('retweet', 'quote'):\n",
    "        # The original tweet is nested under retweeted_status (retweets) or quoted_status (quotes)\n",
    "        retweet = tweet.get('retweeted_status') or tweet.get('quoted_status')\n",
    "        return {\n",
    "            'tweet_id': tweet['id_str'],\n",
    "            'user_id': tweet['user']['id_str'],\n",
    "            'screen_name': tweet['user']['screen_name'],\n",
    "            'retweet_user_id': retweet['user']['id_str'],\n",
    "            'retweet_screen_name': retweet['user']['screen_name'],\n",
    "            'tweet_created_at': date_parse(tweet['created_at'])\n",
    "        }\n",
    "    return None\n",
    "\n",
    "base_retweet_df = load_tweet_df(retweet_transform, ['tweet_id', 'user_id', 'screen_name', 'retweet_user_id',\n",
    "                                           'retweet_screen_name', 'tweet_created_at'],\n",
    "                           dedupe_columns=['tweet_id'])\n",
    "\n",
    "base_retweet_df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tweet_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>screen_name</th>\n",
       "      <th>retweet_user_id</th>\n",
       "      <th>retweet_screen_name</th>\n",
       "      <th>tweet_created_at</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>872631046088601600</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>93069110</td>\n",
       "      <td>maggieNYT</td>\n",
       "      <td>2017-06-08 01:47:08+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>872610483647516673</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>160951141</td>\n",
       "      <td>TomNamako</td>\n",
       "      <td>2017-06-08 00:25:26+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>872609618626826240</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>18678924</td>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>2017-06-08 00:22:00+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>872605974699311104</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>93069110</td>\n",
       "      <td>maggieNYT</td>\n",
       "      <td>2017-06-08 00:07:31+00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>872603191518646276</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>94784682</td>\n",
       "      <td>JonathanTurley</td>\n",
       "      <td>2017-06-07 23:56:27+00:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             tweet_id    user_id    screen_name retweet_user_id  \\\n",
       "0  872631046088601600  327862439  jonathanvswan        93069110   \n",
       "1  872610483647516673  327862439  jonathanvswan       160951141   \n",
       "2  872609618626826240  327862439  jonathanvswan        18678924   \n",
       "3  872605974699311104  327862439  jonathanvswan        93069110   \n",
       "4  872603191518646276  327862439  jonathanvswan        94784682   \n",
       "\n",
       "  retweet_screen_name          tweet_created_at  \n",
       "0           maggieNYT 2017-06-08 01:47:08+00:00  \n",
       "1           TomNamako 2017-06-08 00:25:26+00:00  \n",
       "2            jmartNYT 2017-06-08 00:22:00+00:00  \n",
       "3           maggieNYT 2017-06-08 00:07:31+00:00  \n",
       "4      JonathanTurley 2017-06-07 23:56:27+00:00  "
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "base_retweet_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add gender of retweeter"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tweet_id               456956\n",
       "user_id                456956\n",
       "screen_name            456956\n",
       "retweet_user_id        456956\n",
       "retweet_screen_name    456956\n",
       "tweet_created_at       456956\n",
       "gender                 456956\n",
       "dtype: int64"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retweet_df = base_retweet_df.join(user_summary_df['gender'], on='user_id')\n",
    "retweet_df.count()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### How many users have been retweeted by journalists?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "49154"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retweet_df['retweet_user_id'].nunique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Limit to retweeted journalists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tweet_id               117048\n",
       "user_id                117048\n",
       "screen_name            117048\n",
       "retweet_user_id        117048\n",
       "retweet_screen_name    117048\n",
       "tweet_created_at       117048\n",
       "gender                 117048\n",
       "retweet_gender         117048\n",
       "dtype: int64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_retweet_df = retweet_df.join(user_summary_df['gender'], how='inner', on='retweet_user_id', rsuffix='_retweet')\n",
    "journalists_retweet_df.rename(columns = {'gender_retweet': 'retweet_gender'}, inplace=True)\n",
    "journalists_retweet_df.count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>tweet_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>screen_name</th>\n",
       "      <th>retweet_user_id</th>\n",
       "      <th>retweet_screen_name</th>\n",
       "      <th>tweet_created_at</th>\n",
       "      <th>gender</th>\n",
       "      <th>retweet_gender</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>872609618626826240</td>\n",
       "      <td>327862439</td>\n",
       "      <td>jonathanvswan</td>\n",
       "      <td>18678924</td>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>2017-06-08 00:22:00+00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>435</th>\n",
       "      <td>871437820044464128</td>\n",
       "      <td>242169927</td>\n",
       "      <td>colinwilhelm</td>\n",
       "      <td>18678924</td>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>2017-06-04 18:45:41+00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1406</th>\n",
       "      <td>872620054889857024</td>\n",
       "      <td>163589845</td>\n",
       "      <td>PoliticoKevin</td>\n",
       "      <td>18678924</td>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>2017-06-08 01:03:28+00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1424</th>\n",
       "      <td>872240756597174272</td>\n",
       "      <td>163589845</td>\n",
       "      <td>PoliticoKevin</td>\n",
       "      <td>18678924</td>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>2017-06-06 23:56:16+00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1455</th>\n",
       "      <td>870749993279385601</td>\n",
       "      <td>163589845</td>\n",
       "      <td>PoliticoKevin</td>\n",
       "      <td>18678924</td>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>2017-06-02 21:12:30+00:00</td>\n",
       "      <td>M</td>\n",
       "      <td>M</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                tweet_id    user_id    screen_name retweet_user_id  \\\n",
       "2     872609618626826240  327862439  jonathanvswan        18678924   \n",
       "435   871437820044464128  242169927   colinwilhelm        18678924   \n",
       "1406  872620054889857024  163589845  PoliticoKevin        18678924   \n",
       "1424  872240756597174272  163589845  PoliticoKevin        18678924   \n",
       "1455  870749993279385601  163589845  PoliticoKevin        18678924   \n",
       "\n",
       "     retweet_screen_name          tweet_created_at gender retweet_gender  \n",
       "2               jmartNYT 2017-06-08 00:22:00+00:00      M              M  \n",
       "435             jmartNYT 2017-06-04 18:45:41+00:00      M              M  \n",
       "1406            jmartNYT 2017-06-08 01:03:28+00:00      M              M  \n",
       "1424            jmartNYT 2017-06-06 23:56:16+00:00      M              M  \n",
       "1455            jmartNYT 2017-06-02 21:12:30+00:00      M              M  "
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_retweet_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Functions for summarizing retweets by beltway journalists"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Gender of beltway journalists retweeted by beltway journalists\n",
    "def journalist_retweet_gender_summary(retweet_df):\n",
    "    gender_summary_df = pd.DataFrame({'count': retweet_df.retweet_gender.value_counts(),\n",
    "                  'percentage': retweet_df.retweet_gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
    "    gender_summary_df.reset_index(inplace=True)\n",
    "    # Average retweets received per journalist: retweets of each gender divided by\n",
    "    # that gender's journalist count in journalist_gender_summary_df\n",
    "    gender_summary_df['avg_retweets'] = gender_summary_df.apply(lambda row: row['count'] / journalist_gender_summary_df.loc[row['index']]['count'], axis=1)\n",
    "    gender_summary_df.set_index('index', inplace=True, drop=True)\n",
    "    return gender_summary_df\n",
    "\n",
    "\n",
    "def journalist_retweet_summary(retweet_df):\n",
    "    # Retweet count\n",
    "    retweet_count_df = pd.DataFrame(retweet_df.retweet_user_id.value_counts().rename('retweet_count'))\n",
    "\n",
    "    # Retweeting users. That is, the number of unique users retweeting each user.\n",
    "    retweet_user_id_per_user_df = retweet_df[['retweet_user_id', 'user_id']].drop_duplicates()\n",
    "    retweeting_user_count_df = pd.DataFrame(retweet_user_id_per_user_df.groupby('retweet_user_id').size(), columns=['retweeting_count'])\n",
    "    retweeting_user_count_df.index.name = 'user_id'\n",
    "\n",
    "    # Join with user summary\n",
    "    journalist_retweet_summary_df = user_summary_df.join([retweet_count_df, retweeting_user_count_df])\n",
    "    journalist_retweet_summary_df.fillna(0, inplace=True)\n",
    "    journalist_retweet_summary_df = journalist_retweet_summary_df.sort_values(['retweet_count', 'retweeting_count', 'followers_count'], ascending=False)\n",
    "    return journalist_retweet_summary_df\n",
    "\n",
    "# Gender of top journalists retweeted by beltway journalists\n",
    "def top_journalist_retweet_gender_summary(retweet_summary_df, retweeting_count_threshold=0, head=100):\n",
    "    top_retweet_summary_df = retweet_summary_df[retweet_summary_df.retweeting_count > retweeting_count_threshold].head(head)\n",
    "    return pd.DataFrame({'count': top_retweet_summary_df.gender.value_counts(), \n",
    "                  'percentage': top_retweet_summary_df.gender.value_counts(normalize=True).mul(100).round(1).astype(str) + '%'})\n",
    "\n",
    "# Fields for displaying journalist retweet summaries\n",
    "journalist_retweet_summary_fields = ['screen_name', 'name', 'organization', 'gender', 'followers_count', 'retweet_count', 'retweeting_count']\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Retweet analysis\n",
    "*Note: for each of the summaries below, the complete list is written to a CSV file in the output directory.*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retweets of all accounts (not just journalists)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists retweeting other accounts, how many of the retweets are from males / females?\n",
    "That is, retweets broken down by the gender of the retweeting journalist. Totals count both retweets and quotes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>retweet</th>\n",
       "      <th>quote</th>\n",
       "      <th>total</th>\n",
       "      <th>percentage</th>\n",
       "      <th>avg_retweets</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gender</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>134,606.00</td>\n",
       "      <td>38,998.00</td>\n",
       "      <td>173,604.00</td>\n",
       "      <td>38.0%</td>\n",
       "      <td>174.83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>210,660.00</td>\n",
       "      <td>72,692.00</td>\n",
       "      <td>283,352.00</td>\n",
       "      <td>62.0%</td>\n",
       "      <td>218.13</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                    retweet                quote                total  \\\n",
       "gender                                                                  \n",
       "F                134,606.00            38,998.00           173,604.00   \n",
       "M                210,660.00            72,692.00           283,352.00   \n",
       "\n",
       "       percentage         avg_retweets  \n",
       "gender                                  \n",
       "F           38.0%               174.83  \n",
       "M           62.0%               218.13  "
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "retweets_by_gender_df = user_summary_df[['gender', 'retweet', 'quote']].groupby('gender').sum()\n",
    "retweets_by_gender_df['total'] = retweets_by_gender_df.retweet + retweets_by_gender_df.quote\n",
    "retweets_by_gender_df['percentage'] = retweets_by_gender_df.total.div(retweets_by_gender_df.total.sum()).mul(100).round(1).astype(str) + '%'\n",
    "retweets_by_gender_df.reset_index(inplace=True)\n",
    "retweets_by_gender_df['avg_retweets'] = retweets_by_gender_df.apply(lambda row: row['total'] / journalist_gender_summary_df.loc[row['gender']]['count'], axis=1)\n",
    "retweets_by_gender_df.set_index('gender', inplace=True, drop=True)\n",
    "retweets_by_gender_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists retweeting other accounts, who retweets the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>tweet_count</th>\n",
       "      <th>retweet</th>\n",
       "      <th>quote</th>\n",
       "      <th>tweets_in_dataset</th>\n",
       "      <th>retweet_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2453025128</th>\n",
       "      <td>gloriaminott</td>\n",
       "      <td>Minott, Gloria</td>\n",
       "      <td>WPFW–FM</td>\n",
       "      <td>F</td>\n",
       "      <td>586</td>\n",
       "      <td>61473</td>\n",
       "      <td>21,524.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>21,547.00</td>\n",
       "      <td>21,524.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>304988603</th>\n",
       "      <td>NeilWMcCabe</td>\n",
       "      <td>McCabe, Neil</td>\n",
       "      <td>Breitbart News</td>\n",
       "      <td>M</td>\n",
       "      <td>18903</td>\n",
       "      <td>64673</td>\n",
       "      <td>7,528.00</td>\n",
       "      <td>625.00</td>\n",
       "      <td>9,370.00</td>\n",
       "      <td>8,153.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18825339</th>\n",
       "      <td>CahnEmily</td>\n",
       "      <td>Cahn, Emily</td>\n",
       "      <td>Mic</td>\n",
       "      <td>F</td>\n",
       "      <td>16980</td>\n",
       "      <td>100803</td>\n",
       "      <td>4,449.00</td>\n",
       "      <td>1,834.00</td>\n",
       "      <td>8,196.00</td>\n",
       "      <td>6,283.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>191964162</th>\n",
       "      <td>SamLitzinger</td>\n",
       "      <td>Litzinger, Sam</td>\n",
       "      <td>CBS News</td>\n",
       "      <td>M</td>\n",
       "      <td>2329</td>\n",
       "      <td>95236</td>\n",
       "      <td>6,017.00</td>\n",
       "      <td>225.00</td>\n",
       "      <td>7,537.00</td>\n",
       "      <td>6,242.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21612122</th>\n",
       "      <td>HotlineJosh</td>\n",
       "      <td>Kraushaar, Josh P.</td>\n",
       "      <td>National Journal</td>\n",
       "      <td>M</td>\n",
       "      <td>50438</td>\n",
       "      <td>156610</td>\n",
       "      <td>4,881.00</td>\n",
       "      <td>893.00</td>\n",
       "      <td>6,703.00</td>\n",
       "      <td>5,774.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259395895</th>\n",
       "      <td>JohnJHarwood</td>\n",
       "      <td>Harwood, John</td>\n",
       "      <td>CNBC</td>\n",
       "      <td>M</td>\n",
       "      <td>149040</td>\n",
       "      <td>78015</td>\n",
       "      <td>4,570.00</td>\n",
       "      <td>822.00</td>\n",
       "      <td>6,377.00</td>\n",
       "      <td>5,392.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16031927</th>\n",
       "      <td>greta</td>\n",
       "      <td>Van Susteren, Greta</td>\n",
       "      <td>MSNBC</td>\n",
       "      <td>F</td>\n",
       "      <td>1186850</td>\n",
       "      <td>116645</td>\n",
       "      <td>794.00</td>\n",
       "      <td>3,069.00</td>\n",
       "      <td>4,792.00</td>\n",
       "      <td>3,863.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21810329</th>\n",
       "      <td>sdonnan</td>\n",
       "      <td>Donnan, Shawn</td>\n",
       "      <td>Financial Times</td>\n",
       "      <td>M</td>\n",
       "      <td>12311</td>\n",
       "      <td>79125</td>\n",
       "      <td>3,332.00</td>\n",
       "      <td>449.00</td>\n",
       "      <td>4,537.00</td>\n",
       "      <td>3,781.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>47408060</th>\n",
       "      <td>JonathanLanday</td>\n",
       "      <td>Landay, Jonathan</td>\n",
       "      <td>McClatchy Newspapers</td>\n",
       "      <td>M</td>\n",
       "      <td>11213</td>\n",
       "      <td>81042</td>\n",
       "      <td>3,687.00</td>\n",
       "      <td>80.00</td>\n",
       "      <td>4,285.00</td>\n",
       "      <td>3,767.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13524182</th>\n",
       "      <td>daveweigel</td>\n",
       "      <td>Weigel, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>332344</td>\n",
       "      <td>169908</td>\n",
       "      <td>2,703.00</td>\n",
       "      <td>859.00</td>\n",
       "      <td>4,564.00</td>\n",
       "      <td>3,562.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21696279</th>\n",
       "      <td>brianbeutler</td>\n",
       "      <td>Beutler, Brian Alfred</td>\n",
       "      <td>New Republic</td>\n",
       "      <td>M</td>\n",
       "      <td>74435</td>\n",
       "      <td>99050</td>\n",
       "      <td>2,694.00</td>\n",
       "      <td>684.00</td>\n",
       "      <td>4,560.00</td>\n",
       "      <td>3,378.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104299137</th>\n",
       "      <td>DavidMDrucker</td>\n",
       "      <td>Drucker, David</td>\n",
       "      <td>Washington Examiner</td>\n",
       "      <td>M</td>\n",
       "      <td>35033</td>\n",
       "      <td>104613</td>\n",
       "      <td>1,377.00</td>\n",
       "      <td>1,955.00</td>\n",
       "      <td>4,907.00</td>\n",
       "      <td>3,332.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>593813785</th>\n",
       "      <td>DonnaYoungDC</td>\n",
       "      <td>Young, Donna</td>\n",
       "      <td>S&amp;P Global Market Intelligence</td>\n",
       "      <td>F</td>\n",
       "      <td>5894</td>\n",
       "      <td>49967</td>\n",
       "      <td>1,740.00</td>\n",
       "      <td>1,327.00</td>\n",
       "      <td>4,414.00</td>\n",
       "      <td>3,067.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>456994513</th>\n",
       "      <td>maria_e_recio</td>\n",
       "      <td>Recio, Maria</td>\n",
       "      <td>Austin American-Statesman</td>\n",
       "      <td>F</td>\n",
       "      <td>1072</td>\n",
       "      <td>40822</td>\n",
       "      <td>2,613.00</td>\n",
       "      <td>336.00</td>\n",
       "      <td>3,370.00</td>\n",
       "      <td>2,949.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19576571</th>\n",
       "      <td>JaredRizzi</td>\n",
       "      <td>Rizzi, Jared</td>\n",
       "      <td>Sirius XM Satellite Radio</td>\n",
       "      <td>M</td>\n",
       "      <td>13545</td>\n",
       "      <td>41620</td>\n",
       "      <td>2,112.00</td>\n",
       "      <td>828.00</td>\n",
       "      <td>5,567.00</td>\n",
       "      <td>2,940.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16459325</th>\n",
       "      <td>ryanbeckwith</td>\n",
       "      <td>Beckwith, Ryan Teague</td>\n",
       "      <td>Time Magazine</td>\n",
       "      <td>M</td>\n",
       "      <td>20947</td>\n",
       "      <td>92203</td>\n",
       "      <td>2,231.00</td>\n",
       "      <td>521.00</td>\n",
       "      <td>5,187.00</td>\n",
       "      <td>2,752.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>148143</td>\n",
       "      <td>2,435.00</td>\n",
       "      <td>287.00</td>\n",
       "      <td>5,078.00</td>\n",
       "      <td>2,722.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61734492</th>\n",
       "      <td>Fahrenthold</td>\n",
       "      <td>Fahrenthold, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>451778</td>\n",
       "      <td>27573</td>\n",
       "      <td>2,505.00</td>\n",
       "      <td>184.00</td>\n",
       "      <td>2,871.00</td>\n",
       "      <td>2,689.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19545932</th>\n",
       "      <td>kampeas</td>\n",
       "      <td>Kampeas, Ron</td>\n",
       "      <td>Jewish Telegraphic Agency</td>\n",
       "      <td>M</td>\n",
       "      <td>6977</td>\n",
       "      <td>53053</td>\n",
       "      <td>1,988.00</td>\n",
       "      <td>444.00</td>\n",
       "      <td>3,249.00</td>\n",
       "      <td>2,432.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42352386</th>\n",
       "      <td>rschles</td>\n",
       "      <td>Schlesinger, Robert</td>\n",
       "      <td>U.S. News &amp; World Report</td>\n",
       "      <td>M</td>\n",
       "      <td>4553</td>\n",
       "      <td>35375</td>\n",
       "      <td>1,644.00</td>\n",
       "      <td>617.00</td>\n",
       "      <td>2,459.00</td>\n",
       "      <td>2,261.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25702314</th>\n",
       "      <td>EricMGarcia</td>\n",
       "      <td>Garcia, Eric M.</td>\n",
       "      <td>CQ Roll Call</td>\n",
       "      <td>M</td>\n",
       "      <td>3094</td>\n",
       "      <td>44783</td>\n",
       "      <td>528.00</td>\n",
       "      <td>1,723.00</td>\n",
       "      <td>3,584.00</td>\n",
       "      <td>2,251.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18646108</th>\n",
       "      <td>BretBaier</td>\n",
       "      <td>Baier, Bret</td>\n",
       "      <td>Fox News</td>\n",
       "      <td>M</td>\n",
       "      <td>1095184</td>\n",
       "      <td>52271</td>\n",
       "      <td>1,623.00</td>\n",
       "      <td>615.00</td>\n",
       "      <td>2,379.00</td>\n",
       "      <td>2,238.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15486163</th>\n",
       "      <td>SimonMarksFSN</td>\n",
       "      <td>Marks, Simon</td>\n",
       "      <td>Feature Story News</td>\n",
       "      <td>M</td>\n",
       "      <td>7767</td>\n",
       "      <td>41541</td>\n",
       "      <td>1,296.00</td>\n",
       "      <td>934.00</td>\n",
       "      <td>3,432.00</td>\n",
       "      <td>2,230.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18678924</th>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>Martin, Jonathan</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>197322</td>\n",
       "      <td>106970</td>\n",
       "      <td>1,665.00</td>\n",
       "      <td>467.00</td>\n",
       "      <td>2,810.00</td>\n",
       "      <td>2,132.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15730608</th>\n",
       "      <td>edroso</td>\n",
       "      <td>Edroso, Roy</td>\n",
       "      <td>UCG</td>\n",
       "      <td>M</td>\n",
       "      <td>4696</td>\n",
       "      <td>38064</td>\n",
       "      <td>1,714.00</td>\n",
       "      <td>379.00</td>\n",
       "      <td>2,883.00</td>\n",
       "      <td>2,093.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               screen_name                   name  \\\n",
       "user_id                                             \n",
       "2453025128    gloriaminott         Minott, Gloria   \n",
       "304988603      NeilWMcCabe           McCabe, Neil   \n",
       "18825339         CahnEmily            Cahn, Emily   \n",
       "191964162     SamLitzinger         Litzinger, Sam   \n",
       "21612122       HotlineJosh     Kraushaar, Josh P.   \n",
       "259395895     JohnJHarwood          Harwood, John   \n",
       "16031927             greta    Van Susteren, Greta   \n",
       "21810329           sdonnan          Donnan, Shawn   \n",
       "47408060    JonathanLanday       Landay, Jonathan   \n",
       "13524182        daveweigel          Weigel, David   \n",
       "21696279      brianbeutler  Beutler, Brian Alfred   \n",
       "104299137    DavidMDrucker         Drucker, David   \n",
       "593813785     DonnaYoungDC           Young, Donna   \n",
       "456994513    maria_e_recio           Recio, Maria   \n",
       "19576571        JaredRizzi           Rizzi, Jared   \n",
       "16459325      ryanbeckwith  Beckwith, Ryan Teague   \n",
       "14529929        jaketapper           Tapper, Jake   \n",
       "61734492       Fahrenthold     Fahrenthold, David   \n",
       "19545932           kampeas           Kampeas, Ron   \n",
       "42352386           rschles    Schlesinger, Robert   \n",
       "25702314       EricMGarcia        Garcia, Eric M.   \n",
       "18646108         BretBaier            Baier, Bret   \n",
       "15486163     SimonMarksFSN           Marks, Simon   \n",
       "18678924          jmartNYT       Martin, Jonathan   \n",
       "15730608            edroso            Edroso, Roy   \n",
       "\n",
       "                              organization gender  followers_count  \\\n",
       "user_id                                                              \n",
       "2453025128                         WPFW–FM      F              586   \n",
       "304988603                   Breitbart News      M            18903   \n",
       "18825339                               Mic      F            16980   \n",
       "191964162                         CBS News      M             2329   \n",
       "21612122                  National Journal      M            50438   \n",
       "259395895                             CNBC      M           149040   \n",
       "16031927                             MSNBC      F          1186850   \n",
       "21810329                   Financial Times      M            12311   \n",
       "47408060              McClatchy Newspapers      M            11213   \n",
       "13524182                   Washington Post      M           332344   \n",
       "21696279                      New Republic      M            74435   \n",
       "104299137              Washington Examiner      M            35033   \n",
       "593813785   S&P Global Market Intelligence      F             5894   \n",
       "456994513        Austin American-Statesman      F             1072   \n",
       "19576571         Sirius XM Satellite Radio      M            13545   \n",
       "16459325                     Time Magazine      M            20947   \n",
       "14529929                               CNN      M          1305680   \n",
       "61734492                   Washington Post      M           451778   \n",
       "19545932         Jewish Telegraphic Agency      M             6977   \n",
       "42352386          U.S. News & World Report      M             4553   \n",
       "25702314                      CQ Roll Call      M             3094   \n",
       "18646108                          Fox News      M          1095184   \n",
       "15486163                Feature Story News      M             7767   \n",
       "18678924                    New York Times      M           197322   \n",
       "15730608                               UCG      M             4696   \n",
       "\n",
       "            tweet_count              retweet                quote  \\\n",
       "user_id                                                             \n",
       "2453025128        61473            21,524.00                 0.00   \n",
       "304988603         64673             7,528.00               625.00   \n",
       "18825339         100803             4,449.00             1,834.00   \n",
       "191964162         95236             6,017.00               225.00   \n",
       "21612122         156610             4,881.00               893.00   \n",
       "259395895         78015             4,570.00               822.00   \n",
       "16031927         116645               794.00             3,069.00   \n",
       "21810329          79125             3,332.00               449.00   \n",
       "47408060          81042             3,687.00                80.00   \n",
       "13524182         169908             2,703.00               859.00   \n",
       "21696279          99050             2,694.00               684.00   \n",
       "104299137        104613             1,377.00             1,955.00   \n",
       "593813785         49967             1,740.00             1,327.00   \n",
       "456994513         40822             2,613.00               336.00   \n",
       "19576571          41620             2,112.00               828.00   \n",
       "16459325          92203             2,231.00               521.00   \n",
       "14529929         148143             2,435.00               287.00   \n",
       "61734492          27573             2,505.00               184.00   \n",
       "19545932          53053             1,988.00               444.00   \n",
       "42352386          35375             1,644.00               617.00   \n",
       "25702314          44783               528.00             1,723.00   \n",
       "18646108          52271             1,623.00               615.00   \n",
       "15486163          41541             1,296.00               934.00   \n",
       "18678924         106970             1,665.00               467.00   \n",
       "15730608          38064             1,714.00               379.00   \n",
       "\n",
       "              tweets_in_dataset        retweet_count  \n",
       "user_id                                               \n",
       "2453025128            21,547.00            21,524.00  \n",
       "304988603              9,370.00             8,153.00  \n",
       "18825339               8,196.00             6,283.00  \n",
       "191964162              7,537.00             6,242.00  \n",
       "21612122               6,703.00             5,774.00  \n",
       "259395895              6,377.00             5,392.00  \n",
       "16031927               4,792.00             3,863.00  \n",
       "21810329               4,537.00             3,781.00  \n",
       "47408060               4,285.00             3,767.00  \n",
       "13524182               4,564.00             3,562.00  \n",
       "21696279               4,560.00             3,378.00  \n",
       "104299137              4,907.00             3,332.00  \n",
       "593813785              4,414.00             3,067.00  \n",
       "456994513              3,370.00             2,949.00  \n",
       "19576571               5,567.00             2,940.00  \n",
       "16459325               5,187.00             2,752.00  \n",
       "14529929               5,078.00             2,722.00  \n",
       "61734492               2,871.00             2,689.00  \n",
       "19545932               3,249.00             2,432.00  \n",
       "42352386               2,459.00             2,261.00  \n",
       "25702314               3,584.00             2,251.00  \n",
       "18646108               2,379.00             2,238.00  \n",
       "15486163               3,432.00             2,230.00  \n",
       "18678924               2,810.00             2,132.00  \n",
       "15730608               2,883.00             2,093.00  "
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
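    "# Select columns with a list and take an explicit copy so the new column below is not assigned on a view\n",
    "retweet_user_summary_df = user_summary_df.loc[:, ['screen_name', 'name', 'organization', 'gender', 'followers_count', 'tweet_count', 'retweet', 'quote', 'tweets_in_dataset']].copy()\n",
    "# Total retweets per journalist: plain retweets plus quotes\n",
    "retweet_user_summary_df['retweet_count'] = retweet_user_summary_df.retweet + retweet_user_summary_df.quote\n",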
    "retweet_user_summary_df.sort_values(['retweet_count'], ascending=False).head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists retweeting other accounts, who is retweeted the most?\n",
    "This is based on screen name, which could have changed during the collection period. However, for the users at the top of this list, a change seems unlikely."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>retweet_count</th>\n",
       "      <th>retweeting_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>realDonaldTrump</th>\n",
       "      <td>6650</td>\n",
       "      <td>807</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>thehill</th>\n",
       "      <td>5424</td>\n",
       "      <td>457</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>BraddJaffy</th>\n",
       "      <td>3564</td>\n",
       "      <td>554</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>maggieNYT</th>\n",
       "      <td>3024</td>\n",
       "      <td>530</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>business</th>\n",
       "      <td>3000</td>\n",
       "      <td>229</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>washingtonpost</th>\n",
       "      <td>2638</td>\n",
       "      <td>498</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>AP</th>\n",
       "      <td>2480</td>\n",
       "      <td>581</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>politico</th>\n",
       "      <td>2335</td>\n",
       "      <td>334</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>nytimes</th>\n",
       "      <td>2268</td>\n",
       "      <td>485</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>WSJ</th>\n",
       "      <td>1949</td>\n",
       "      <td>213</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>burgessev</th>\n",
       "      <td>1836</td>\n",
       "      <td>289</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>kylegriffin1</th>\n",
       "      <td>1803</td>\n",
       "      <td>429</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ZekeJMiller</th>\n",
       "      <td>1723</td>\n",
       "      <td>387</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>CNN</th>\n",
       "      <td>1602</td>\n",
       "      <td>366</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>GlennThrush</th>\n",
       "      <td>1577</td>\n",
       "      <td>451</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Reuters</th>\n",
       "      <td>1487</td>\n",
       "      <td>265</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>jaketapper</th>\n",
       "      <td>1459</td>\n",
       "      <td>397</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>TheEconomist</th>\n",
       "      <td>1458</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>StevenTDennis</th>\n",
       "      <td>1403</td>\n",
       "      <td>280</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>FoxNews</th>\n",
       "      <td>1400</td>\n",
       "      <td>258</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>seungminkim</th>\n",
       "      <td>1393</td>\n",
       "      <td>327</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mkraju</th>\n",
       "      <td>1359</td>\n",
       "      <td>341</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>PhilipRucker</th>\n",
       "      <td>1349</td>\n",
       "      <td>365</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>markknoller</th>\n",
       "      <td>1343</td>\n",
       "      <td>341</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>MEPFuller</th>\n",
       "      <td>1324</td>\n",
       "      <td>286</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 retweet_count  retweeting_count\n",
       "realDonaldTrump           6650               807\n",
       "thehill                   5424               457\n",
       "BraddJaffy                3564               554\n",
       "maggieNYT                 3024               530\n",
       "business                  3000               229\n",
       "washingtonpost            2638               498\n",
       "AP                        2480               581\n",
       "politico                  2335               334\n",
       "nytimes                   2268               485\n",
       "WSJ                       1949               213\n",
       "burgessev                 1836               289\n",
       "kylegriffin1              1803               429\n",
       "ZekeJMiller               1723               387\n",
       "CNN                       1602               366\n",
       "GlennThrush               1577               451\n",
       "Reuters                   1487               265\n",
       "jaketapper                1459               397\n",
       "TheEconomist              1458                86\n",
       "StevenTDennis             1403               280\n",
       "FoxNews                   1400               258\n",
       "seungminkim               1393               327\n",
       "mkraju                    1359               341\n",
       "PhilipRucker              1349               365\n",
       "markknoller               1343               341\n",
       "MEPFuller                 1324               286"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Retweet count\n",
    "retweet_count_screen_name_df = pd.DataFrame(retweet_df.retweet_screen_name.value_counts().rename('retweet_count'))\n",
    "\n",
    "# Count of retweeting users\n",
    "retweet_user_id_per_user_screen_name_df = retweet_df[['retweet_screen_name', 'user_id']].drop_duplicates()\n",
    "retweeting_count_screen_name_df = pd.DataFrame(retweet_user_id_per_user_screen_name_df.groupby('retweet_screen_name').size(), columns=['retweeting_count'])\n",
    "retweeting_count_screen_name_df.index.name = 'screen_name'\n",
    "\n",
    "all_retweeted_df = retweet_count_screen_name_df.join(retweeting_count_screen_name_df)\n",
    "all_retweeted_df.to_csv('output/all_retweeted_by_journalists.csv')\n",
    "all_retweeted_df.head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Journalists retweeting other journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists retweeting other journalists, who is retweeted the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>retweet_count</th>\n",
       "      <th>retweeting_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>407013776</th>\n",
       "      <td>burgessev</td>\n",
       "      <td>Everett, John B.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>31010</td>\n",
       "      <td>1,836.00</td>\n",
       "      <td>289.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21316253</th>\n",
       "      <td>ZekeJMiller</td>\n",
       "      <td>Miller, Zeke J.</td>\n",
       "      <td>Time Magazine</td>\n",
       "      <td>M</td>\n",
       "      <td>198517</td>\n",
       "      <td>1,723.00</td>\n",
       "      <td>387.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19107878</th>\n",
       "      <td>GlennThrush</td>\n",
       "      <td>Thrush, Glenn H.</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>308181</td>\n",
       "      <td>1,577.00</td>\n",
       "      <td>451.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>1,459.00</td>\n",
       "      <td>397.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46557945</th>\n",
       "      <td>StevenTDennis</td>\n",
       "      <td>Dennis, Steven T.</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>55762</td>\n",
       "      <td>1,403.00</td>\n",
       "      <td>280.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19186003</th>\n",
       "      <td>seungminkim</td>\n",
       "      <td>Kim, Seung Min</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>33980</td>\n",
       "      <td>1,393.00</td>\n",
       "      <td>327.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39155029</th>\n",
       "      <td>mkraju</td>\n",
       "      <td>Raju, Manu K.</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>88366</td>\n",
       "      <td>1,359.00</td>\n",
       "      <td>341.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31127446</th>\n",
       "      <td>markknoller</td>\n",
       "      <td>Knoller, Mark</td>\n",
       "      <td>CBS News</td>\n",
       "      <td>M</td>\n",
       "      <td>301474</td>\n",
       "      <td>1,343.00</td>\n",
       "      <td>341.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>398088661</th>\n",
       "      <td>MEPFuller</td>\n",
       "      <td>Fuller, Matt E.</td>\n",
       "      <td>Huffington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>77919</td>\n",
       "      <td>1,324.00</td>\n",
       "      <td>286.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13524182</th>\n",
       "      <td>daveweigel</td>\n",
       "      <td>Weigel, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>332344</td>\n",
       "      <td>1,221.00</td>\n",
       "      <td>306.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14007532</th>\n",
       "      <td>frankthorp</td>\n",
       "      <td>Thorp, Frank</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>39798</td>\n",
       "      <td>1,207.00</td>\n",
       "      <td>334.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19847765</th>\n",
       "      <td>sahilkapur</td>\n",
       "      <td>Kapur, Sahil</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>69086</td>\n",
       "      <td>1,186.00</td>\n",
       "      <td>296.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16187637</th>\n",
       "      <td>ChadPergram</td>\n",
       "      <td>Pergram, Chad</td>\n",
       "      <td>Fox News</td>\n",
       "      <td>M</td>\n",
       "      <td>59305</td>\n",
       "      <td>1,177.00</td>\n",
       "      <td>297.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104914594</th>\n",
       "      <td>Phil_Mattingly</td>\n",
       "      <td>Mattingly, Phil</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>40119</td>\n",
       "      <td>1,120.00</td>\n",
       "      <td>314.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16006592</th>\n",
       "      <td>BenjySarlin</td>\n",
       "      <td>Sarlin, Benjamin</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>78075</td>\n",
       "      <td>1,039.00</td>\n",
       "      <td>215.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259395895</th>\n",
       "      <td>JohnJHarwood</td>\n",
       "      <td>Harwood, John</td>\n",
       "      <td>CNBC</td>\n",
       "      <td>M</td>\n",
       "      <td>149040</td>\n",
       "      <td>1,011.00</td>\n",
       "      <td>277.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21252618</th>\n",
       "      <td>JakeSherman</td>\n",
       "      <td>Sherman, Jacob S.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>81762</td>\n",
       "      <td>943.00</td>\n",
       "      <td>281.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33653195</th>\n",
       "      <td>ericawerner</td>\n",
       "      <td>Werner, Erica</td>\n",
       "      <td>Associated Press</td>\n",
       "      <td>F</td>\n",
       "      <td>14049</td>\n",
       "      <td>939.00</td>\n",
       "      <td>281.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18678924</th>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>Martin, Jonathan</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>197322</td>\n",
       "      <td>916.00</td>\n",
       "      <td>247.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12354832</th>\n",
       "      <td>kasie</td>\n",
       "      <td>Hunt, Kasie</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>187357</td>\n",
       "      <td>909.00</td>\n",
       "      <td>388.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70511174</th>\n",
       "      <td>Hadas_Gold</td>\n",
       "      <td>Gold, Hadas</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>45221</td>\n",
       "      <td>849.00</td>\n",
       "      <td>306.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22771961</th>\n",
       "      <td>Acosta</td>\n",
       "      <td>Acosta, Jim</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>350650</td>\n",
       "      <td>829.00</td>\n",
       "      <td>315.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104299137</th>\n",
       "      <td>DavidMDrucker</td>\n",
       "      <td>Drucker, David</td>\n",
       "      <td>Washington Examiner</td>\n",
       "      <td>M</td>\n",
       "      <td>35033</td>\n",
       "      <td>770.00</td>\n",
       "      <td>193.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>593813785</th>\n",
       "      <td>DonnaYoungDC</td>\n",
       "      <td>Young, Donna</td>\n",
       "      <td>S&amp;P Global Market Intelligence</td>\n",
       "      <td>F</td>\n",
       "      <td>5894</td>\n",
       "      <td>708.00</td>\n",
       "      <td>13.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>118130765</th>\n",
       "      <td>dylanlscott</td>\n",
       "      <td>Scott, Dylan L.</td>\n",
       "      <td>Stat News</td>\n",
       "      <td>M</td>\n",
       "      <td>20122</td>\n",
       "      <td>705.00</td>\n",
       "      <td>155.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              screen_name               name                    organization  \\\n",
       "user_id                                                                        \n",
       "407013776       burgessev   Everett, John B.                        Politico   \n",
       "21316253      ZekeJMiller    Miller, Zeke J.                   Time Magazine   \n",
       "19107878      GlennThrush   Thrush, Glenn H.                  New York Times   \n",
       "14529929       jaketapper       Tapper, Jake                             CNN   \n",
       "46557945    StevenTDennis  Dennis, Steven T.                  Bloomberg News   \n",
       "19186003      seungminkim     Kim, Seung Min                        Politico   \n",
       "39155029           mkraju      Raju, Manu K.                             CNN   \n",
       "31127446      markknoller      Knoller, Mark                        CBS News   \n",
       "398088661       MEPFuller    Fuller, Matt E.                 Huffington Post   \n",
       "13524182       daveweigel      Weigel, David                 Washington Post   \n",
       "14007532       frankthorp       Thorp, Frank                        NBC News   \n",
       "19847765       sahilkapur       Kapur, Sahil                  Bloomberg News   \n",
       "16187637      ChadPergram      Pergram, Chad                        Fox News   \n",
       "104914594  Phil_Mattingly    Mattingly, Phil                             CNN   \n",
       "16006592      BenjySarlin   Sarlin, Benjamin                        NBC News   \n",
       "259395895    JohnJHarwood      Harwood, John                            CNBC   \n",
       "21252618      JakeSherman  Sherman, Jacob S.                        Politico   \n",
       "33653195      ericawerner      Werner, Erica                Associated Press   \n",
       "18678924         jmartNYT   Martin, Jonathan                  New York Times   \n",
       "12354832            kasie        Hunt, Kasie                        NBC News   \n",
       "70511174       Hadas_Gold        Gold, Hadas                        Politico   \n",
       "22771961           Acosta        Acosta, Jim                             CNN   \n",
       "104299137   DavidMDrucker     Drucker, David             Washington Examiner   \n",
       "593813785    DonnaYoungDC       Young, Donna  S&P Global Market Intelligence   \n",
       "118130765     dylanlscott    Scott, Dylan L.                       Stat News   \n",
       "\n",
       "          gender  followers_count        retweet_count     retweeting_count  \n",
       "user_id                                                                      \n",
       "407013776      M            31010             1,836.00               289.00  \n",
       "21316253       M           198517             1,723.00               387.00  \n",
       "19107878       M           308181             1,577.00               451.00  \n",
       "14529929       M          1305680             1,459.00               397.00  \n",
       "46557945       M            55762             1,403.00               280.00  \n",
       "19186003       F            33980             1,393.00               327.00  \n",
       "39155029       M            88366             1,359.00               341.00  \n",
       "31127446       M           301474             1,343.00               341.00  \n",
       "398088661      M            77919             1,324.00               286.00  \n",
       "13524182       M           332344             1,221.00               306.00  \n",
       "14007532       M            39798             1,207.00               334.00  \n",
       "19847765       M            69086             1,186.00               296.00  \n",
       "16187637       M            59305             1,177.00               297.00  \n",
       "104914594      M            40119             1,120.00               314.00  \n",
       "16006592       M            78075             1,039.00               215.00  \n",
       "259395895      M           149040             1,011.00               277.00  \n",
       "21252618       M            81762               943.00               281.00  \n",
       "33653195       F            14049               939.00               281.00  \n",
       "18678924       M           197322               916.00               247.00  \n",
       "12354832       F           187357               909.00               388.00  \n",
       "70511174       F            45221               849.00               306.00  \n",
       "22771961       M           350650               829.00               315.00  \n",
       "104299137      M            35033               770.00               193.00  \n",
       "593813785      F             5894               708.00                13.00  \n",
       "118130765      M            20122               705.00               155.00  "
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_retweet_summary_df = journalist_retweet_summary(journalists_retweet_df)\n",
    "journalists_retweet_summary_df.to_csv('output/journalists_retweeted_by_journalists.csv')\n",
    "journalists_retweet_summary_df[journalist_retweet_summary_fields].head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists retweeting other journalists, how many of the retweets are of males / females?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "      <th>avg_retweets</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>index</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>80634</td>\n",
       "      <td>68.9%</td>\n",
       "      <td>62.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>36414</td>\n",
       "      <td>31.1%</td>\n",
       "      <td>36.67</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       count percentage         avg_retweets\n",
       "index                                       \n",
       "M      80634      68.9%                62.07\n",
       "F      36414      31.1%                36.67"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalist_retweet_gender_summary(journalists_retweet_df)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### On average, how many times are journalists retweeted by other journalists?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>retweet_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>2,292.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>51.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>149.06</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>6.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>33.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1,836.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             retweet_count\n",
       "count             2,292.00\n",
       "mean                 51.07\n",
       "std                 149.06\n",
       "min                   0.00\n",
       "25%                   0.00\n",
       "50%                   6.00\n",
       "75%                  33.00\n",
       "max               1,836.00"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_retweet_summary_df[['retweet_count']].describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Journalists retweeting female journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists retweeting female journalists, who is retweeted the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>retweet_count</th>\n",
       "      <th>retweeting_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>19186003</th>\n",
       "      <td>seungminkim</td>\n",
       "      <td>Kim, Seung Min</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>33980</td>\n",
       "      <td>1,393.00</td>\n",
       "      <td>327.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33653195</th>\n",
       "      <td>ericawerner</td>\n",
       "      <td>Werner, Erica</td>\n",
       "      <td>Associated Press</td>\n",
       "      <td>F</td>\n",
       "      <td>14049</td>\n",
       "      <td>939.00</td>\n",
       "      <td>281.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12354832</th>\n",
       "      <td>kasie</td>\n",
       "      <td>Hunt, Kasie</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>187357</td>\n",
       "      <td>909.00</td>\n",
       "      <td>388.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70511174</th>\n",
       "      <td>Hadas_Gold</td>\n",
       "      <td>Gold, Hadas</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>45221</td>\n",
       "      <td>849.00</td>\n",
       "      <td>306.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>593813785</th>\n",
       "      <td>DonnaYoungDC</td>\n",
       "      <td>Young, Donna</td>\n",
       "      <td>S&amp;P Global Market Intelligence</td>\n",
       "      <td>F</td>\n",
       "      <td>5894</td>\n",
       "      <td>708.00</td>\n",
       "      <td>13.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167024520</th>\n",
       "      <td>rachaelmbade</td>\n",
       "      <td>Bade, Rachel M.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>30164</td>\n",
       "      <td>614.00</td>\n",
       "      <td>161.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33919343</th>\n",
       "      <td>AshleyRParker</td>\n",
       "      <td>Parker, Ashley</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>F</td>\n",
       "      <td>122382</td>\n",
       "      <td>539.00</td>\n",
       "      <td>268.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139738464</th>\n",
       "      <td>mj_lee</td>\n",
       "      <td>Lee, MJ</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>31940</td>\n",
       "      <td>518.00</td>\n",
       "      <td>189.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16018516</th>\n",
       "      <td>jenhab</td>\n",
       "      <td>Haberkorn, Jennifer A.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>20028</td>\n",
       "      <td>474.00</td>\n",
       "      <td>136.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18825339</th>\n",
       "      <td>CahnEmily</td>\n",
       "      <td>Cahn, Emily</td>\n",
       "      <td>Mic</td>\n",
       "      <td>F</td>\n",
       "      <td>16980</td>\n",
       "      <td>444.00</td>\n",
       "      <td>118.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>45399148</th>\n",
       "      <td>jeneps</td>\n",
       "      <td>Epstein, Jennifer</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>F</td>\n",
       "      <td>61242</td>\n",
       "      <td>443.00</td>\n",
       "      <td>189.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>705706292</th>\n",
       "      <td>rebeccaballhaus</td>\n",
       "      <td>Ballhaus, Rebecca</td>\n",
       "      <td>Wall Street Journal / Dow Jones</td>\n",
       "      <td>F</td>\n",
       "      <td>24638</td>\n",
       "      <td>409.00</td>\n",
       "      <td>154.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19734832</th>\n",
       "      <td>sarahkliff</td>\n",
       "      <td>Kliff, Sarah L.</td>\n",
       "      <td>Vox Media</td>\n",
       "      <td>F</td>\n",
       "      <td>100090</td>\n",
       "      <td>392.00</td>\n",
       "      <td>136.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>163995093</th>\n",
       "      <td>AlexNBCNews</td>\n",
       "      <td>Moe, Alexandra</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>21689</td>\n",
       "      <td>388.00</td>\n",
       "      <td>134.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>237477771</th>\n",
       "      <td>juliehdavis</td>\n",
       "      <td>Davis, Julie</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>F</td>\n",
       "      <td>49821</td>\n",
       "      <td>375.00</td>\n",
       "      <td>194.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16149614</th>\n",
       "      <td>jrovner</td>\n",
       "      <td>Rovner, Julie</td>\n",
       "      <td>Kaiser Health News</td>\n",
       "      <td>F</td>\n",
       "      <td>21844</td>\n",
       "      <td>351.00</td>\n",
       "      <td>137.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>116341480</th>\n",
       "      <td>RosieGray</td>\n",
       "      <td>Gray, Rosie</td>\n",
       "      <td>The Atlantic</td>\n",
       "      <td>F</td>\n",
       "      <td>96935</td>\n",
       "      <td>345.00</td>\n",
       "      <td>125.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28181835</th>\n",
       "      <td>jpaceDC</td>\n",
       "      <td>Pace, Julie</td>\n",
       "      <td>Associated Press</td>\n",
       "      <td>F</td>\n",
       "      <td>46017</td>\n",
       "      <td>328.00</td>\n",
       "      <td>132.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52392666</th>\n",
       "      <td>ZoeTillman</td>\n",
       "      <td>Tillman, Zoe</td>\n",
       "      <td>BuzzFeed</td>\n",
       "      <td>F</td>\n",
       "      <td>15246</td>\n",
       "      <td>312.00</td>\n",
       "      <td>70.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>906734342</th>\n",
       "      <td>KimberlyRobinsn</td>\n",
       "      <td>Robinson, Kimberly S.</td>\n",
       "      <td>Bloomberg BNA</td>\n",
       "      <td>F</td>\n",
       "      <td>7170</td>\n",
       "      <td>308.00</td>\n",
       "      <td>38.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>188857501</th>\n",
       "      <td>alexis_levinson</td>\n",
       "      <td>Levinson, Alexis R.</td>\n",
       "      <td>BuzzFeed</td>\n",
       "      <td>F</td>\n",
       "      <td>25375</td>\n",
       "      <td>288.00</td>\n",
       "      <td>111.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>56552341</th>\n",
       "      <td>LACaldwellDC</td>\n",
       "      <td>Caldwell, Leigh Ann</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>8464</td>\n",
       "      <td>282.00</td>\n",
       "      <td>98.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>151444950</th>\n",
       "      <td>DaviSusan</td>\n",
       "      <td>Davis, Susan</td>\n",
       "      <td>National Public Radio</td>\n",
       "      <td>F</td>\n",
       "      <td>27297</td>\n",
       "      <td>270.00</td>\n",
       "      <td>150.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>360080772</th>\n",
       "      <td>FoxReports</td>\n",
       "      <td>Fox, Lauren</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>7282</td>\n",
       "      <td>269.00</td>\n",
       "      <td>116.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>313545488</th>\n",
       "      <td>LauraLitvan</td>\n",
       "      <td>Litvan, Laura</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>F</td>\n",
       "      <td>4468</td>\n",
       "      <td>269.00</td>\n",
       "      <td>115.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               screen_name                    name  \\\n",
       "user_id                                              \n",
       "19186003       seungminkim          Kim, Seung Min   \n",
       "33653195       ericawerner           Werner, Erica   \n",
       "12354832             kasie             Hunt, Kasie   \n",
       "70511174        Hadas_Gold             Gold, Hadas   \n",
       "593813785     DonnaYoungDC            Young, Donna   \n",
       "167024520     rachaelmbade         Bade, Rachel M.   \n",
       "33919343     AshleyRParker          Parker, Ashley   \n",
       "139738464           mj_lee                 Lee, MJ   \n",
       "16018516            jenhab  Haberkorn, Jennifer A.   \n",
       "18825339         CahnEmily             Cahn, Emily   \n",
       "45399148            jeneps       Epstein, Jennifer   \n",
       "705706292  rebeccaballhaus       Ballhaus, Rebecca   \n",
       "19734832        sarahkliff         Kliff, Sarah L.   \n",
       "163995093      AlexNBCNews          Moe, Alexandra   \n",
       "237477771      juliehdavis            Davis, Julie   \n",
       "16149614           jrovner           Rovner, Julie   \n",
       "116341480        RosieGray             Gray, Rosie   \n",
       "28181835           jpaceDC             Pace, Julie   \n",
       "52392666        ZoeTillman            Tillman, Zoe   \n",
       "906734342  KimberlyRobinsn   Robinson, Kimberly S.   \n",
       "188857501  alexis_levinson     Levinson, Alexis R.   \n",
       "56552341      LACaldwellDC     Caldwell, Leigh Ann   \n",
       "151444950        DaviSusan            Davis, Susan   \n",
       "360080772       FoxReports             Fox, Lauren   \n",
       "313545488      LauraLitvan           Litvan, Laura   \n",
       "\n",
       "                              organization gender  followers_count  \\\n",
       "user_id                                                              \n",
       "19186003                          Politico      F            33980   \n",
       "33653195                  Associated Press      F            14049   \n",
       "12354832                          NBC News      F           187357   \n",
       "70511174                          Politico      F            45221   \n",
       "593813785   S&P Global Market Intelligence      F             5894   \n",
       "167024520                         Politico      F            30164   \n",
       "33919343                   Washington Post      F           122382   \n",
       "139738464                              CNN      F            31940   \n",
       "16018516                          Politico      F            20028   \n",
       "18825339                               Mic      F            16980   \n",
       "45399148                    Bloomberg News      F            61242   \n",
       "705706292  Wall Street Journal / Dow Jones      F            24638   \n",
       "19734832                         Vox Media      F           100090   \n",
       "163995093                         NBC News      F            21689   \n",
       "237477771                   New York Times      F            49821   \n",
       "16149614                Kaiser Health News      F            21844   \n",
       "116341480                     The Atlantic      F            96935   \n",
       "28181835                  Associated Press      F            46017   \n",
       "52392666                          BuzzFeed      F            15246   \n",
       "906734342                    Bloomberg BNA      F             7170   \n",
       "188857501                         BuzzFeed      F            25375   \n",
       "56552341                          NBC News      F             8464   \n",
       "151444950            National Public Radio      F            27297   \n",
       "360080772                              CNN      F             7282   \n",
       "313545488                   Bloomberg News      F             4468   \n",
       "\n",
       "                 retweet_count     retweeting_count  \n",
       "user_id                                              \n",
       "19186003              1,393.00               327.00  \n",
       "33653195                939.00               281.00  \n",
       "12354832                909.00               388.00  \n",
       "70511174                849.00               306.00  \n",
       "593813785               708.00                13.00  \n",
       "167024520               614.00               161.00  \n",
       "33919343                539.00               268.00  \n",
       "139738464               518.00               189.00  \n",
       "16018516                474.00               136.00  \n",
       "18825339                444.00               118.00  \n",
       "45399148                443.00               189.00  \n",
       "705706292               409.00               154.00  \n",
       "19734832                392.00               136.00  \n",
       "163995093               388.00               134.00  \n",
       "237477771               375.00               194.00  \n",
       "16149614                351.00               137.00  \n",
       "116341480               345.00               125.00  \n",
       "28181835                328.00               132.00  \n",
       "52392666                312.00                70.00  \n",
       "906734342               308.00                38.00  \n",
       "188857501               288.00               111.00  \n",
       "56552341                282.00                98.00  \n",
       "151444950               270.00               150.00  \n",
       "360080772               269.00               116.00  \n",
       "313545488               269.00               115.00  "
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filter the journalist retweet summary to female journalists, write the full subset to CSV,\n",
    "# and display the first 25 rows.\n",
    "female_journalists_retweet_summary_df = journalists_retweet_summary_df[journalists_retweet_summary_df.gender == 'F']\n",
    "female_journalists_retweet_summary_df.to_csv('output/female_journalists_retweeted_by_journalists.csv')\n",
    "female_journalists_retweet_summary_df[journalist_retweet_summary_fields].head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### On average, how many times are female journalists retweeted by other journalists?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>retweet_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>993.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>36.67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>97.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>5.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>25.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1,393.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             retweet_count\n",
       "count               993.00\n",
       "mean                 36.67\n",
       "std                  97.34\n",
       "min                   0.00\n",
       "25%                   0.00\n",
       "50%                   5.00\n",
       "75%                  25.00\n",
       "max               1,393.00"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
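    "# Summary statistics (count, mean, std, quartiles, max) of retweet_count across female journalists.\n",
    "female_journalists_retweet_summary_df[['retweet_count']].describe()"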
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Journalists retweeting male journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of journalists retweeting male journalists, who is retweeted the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>retweet_count</th>\n",
       "      <th>retweeting_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>407013776</th>\n",
       "      <td>burgessev</td>\n",
       "      <td>Everett, John B.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>31010</td>\n",
       "      <td>1,836.00</td>\n",
       "      <td>289.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21316253</th>\n",
       "      <td>ZekeJMiller</td>\n",
       "      <td>Miller, Zeke J.</td>\n",
       "      <td>Time Magazine</td>\n",
       "      <td>M</td>\n",
       "      <td>198517</td>\n",
       "      <td>1,723.00</td>\n",
       "      <td>387.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19107878</th>\n",
       "      <td>GlennThrush</td>\n",
       "      <td>Thrush, Glenn H.</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>308181</td>\n",
       "      <td>1,577.00</td>\n",
       "      <td>451.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>1,459.00</td>\n",
       "      <td>397.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46557945</th>\n",
       "      <td>StevenTDennis</td>\n",
       "      <td>Dennis, Steven T.</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>55762</td>\n",
       "      <td>1,403.00</td>\n",
       "      <td>280.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39155029</th>\n",
       "      <td>mkraju</td>\n",
       "      <td>Raju, Manu K.</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>88366</td>\n",
       "      <td>1,359.00</td>\n",
       "      <td>341.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31127446</th>\n",
       "      <td>markknoller</td>\n",
       "      <td>Knoller, Mark</td>\n",
       "      <td>CBS News</td>\n",
       "      <td>M</td>\n",
       "      <td>301474</td>\n",
       "      <td>1,343.00</td>\n",
       "      <td>341.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>398088661</th>\n",
       "      <td>MEPFuller</td>\n",
       "      <td>Fuller, Matt E.</td>\n",
       "      <td>Huffington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>77919</td>\n",
       "      <td>1,324.00</td>\n",
       "      <td>286.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13524182</th>\n",
       "      <td>daveweigel</td>\n",
       "      <td>Weigel, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>332344</td>\n",
       "      <td>1,221.00</td>\n",
       "      <td>306.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14007532</th>\n",
       "      <td>frankthorp</td>\n",
       "      <td>Thorp, Frank</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>39798</td>\n",
       "      <td>1,207.00</td>\n",
       "      <td>334.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19847765</th>\n",
       "      <td>sahilkapur</td>\n",
       "      <td>Kapur, Sahil</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>69086</td>\n",
       "      <td>1,186.00</td>\n",
       "      <td>296.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16187637</th>\n",
       "      <td>ChadPergram</td>\n",
       "      <td>Pergram, Chad</td>\n",
       "      <td>Fox News</td>\n",
       "      <td>M</td>\n",
       "      <td>59305</td>\n",
       "      <td>1,177.00</td>\n",
       "      <td>297.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104914594</th>\n",
       "      <td>Phil_Mattingly</td>\n",
       "      <td>Mattingly, Phil</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>40119</td>\n",
       "      <td>1,120.00</td>\n",
       "      <td>314.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16006592</th>\n",
       "      <td>BenjySarlin</td>\n",
       "      <td>Sarlin, Benjamin</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>78075</td>\n",
       "      <td>1,039.00</td>\n",
       "      <td>215.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259395895</th>\n",
       "      <td>JohnJHarwood</td>\n",
       "      <td>Harwood, John</td>\n",
       "      <td>CNBC</td>\n",
       "      <td>M</td>\n",
       "      <td>149040</td>\n",
       "      <td>1,011.00</td>\n",
       "      <td>277.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21252618</th>\n",
       "      <td>JakeSherman</td>\n",
       "      <td>Sherman, Jacob S.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>81762</td>\n",
       "      <td>943.00</td>\n",
       "      <td>281.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18678924</th>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>Martin, Jonathan</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>197322</td>\n",
       "      <td>916.00</td>\n",
       "      <td>247.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22771961</th>\n",
       "      <td>Acosta</td>\n",
       "      <td>Acosta, Jim</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>350650</td>\n",
       "      <td>829.00</td>\n",
       "      <td>315.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104299137</th>\n",
       "      <td>DavidMDrucker</td>\n",
       "      <td>Drucker, David</td>\n",
       "      <td>Washington Examiner</td>\n",
       "      <td>M</td>\n",
       "      <td>35033</td>\n",
       "      <td>770.00</td>\n",
       "      <td>193.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>118130765</th>\n",
       "      <td>dylanlscott</td>\n",
       "      <td>Scott, Dylan L.</td>\n",
       "      <td>Stat News</td>\n",
       "      <td>M</td>\n",
       "      <td>20122</td>\n",
       "      <td>705.00</td>\n",
       "      <td>155.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3817401</th>\n",
       "      <td>ericgeller</td>\n",
       "      <td>Geller, Eric</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>58173</td>\n",
       "      <td>704.00</td>\n",
       "      <td>225.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>217550862</th>\n",
       "      <td>BresPolitico</td>\n",
       "      <td>Bresnahan, John</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>40562</td>\n",
       "      <td>699.00</td>\n",
       "      <td>223.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22129280</th>\n",
       "      <td>jimsciutto</td>\n",
       "      <td>Sciutto, James</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>172012</td>\n",
       "      <td>688.00</td>\n",
       "      <td>242.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61734492</th>\n",
       "      <td>Fahrenthold</td>\n",
       "      <td>Fahrenthold, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>451778</td>\n",
       "      <td>654.00</td>\n",
       "      <td>284.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15463671</th>\n",
       "      <td>samstein</td>\n",
       "      <td>Stein, Sam</td>\n",
       "      <td>Huffington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>313211</td>\n",
       "      <td>642.00</td>\n",
       "      <td>229.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              screen_name                name         organization gender  \\\n",
       "user_id                                                                     \n",
       "407013776       burgessev    Everett, John B.             Politico      M   \n",
       "21316253      ZekeJMiller     Miller, Zeke J.        Time Magazine      M   \n",
       "19107878      GlennThrush    Thrush, Glenn H.       New York Times      M   \n",
       "14529929       jaketapper        Tapper, Jake                  CNN      M   \n",
       "46557945    StevenTDennis   Dennis, Steven T.       Bloomberg News      M   \n",
       "39155029           mkraju       Raju, Manu K.                  CNN      M   \n",
       "31127446      markknoller       Knoller, Mark             CBS News      M   \n",
       "398088661       MEPFuller     Fuller, Matt E.      Huffington Post      M   \n",
       "13524182       daveweigel       Weigel, David      Washington Post      M   \n",
       "14007532       frankthorp        Thorp, Frank             NBC News      M   \n",
       "19847765       sahilkapur        Kapur, Sahil       Bloomberg News      M   \n",
       "16187637      ChadPergram       Pergram, Chad             Fox News      M   \n",
       "104914594  Phil_Mattingly     Mattingly, Phil                  CNN      M   \n",
       "16006592      BenjySarlin    Sarlin, Benjamin             NBC News      M   \n",
       "259395895    JohnJHarwood       Harwood, John                 CNBC      M   \n",
       "21252618      JakeSherman   Sherman, Jacob S.             Politico      M   \n",
       "18678924         jmartNYT    Martin, Jonathan       New York Times      M   \n",
       "22771961           Acosta         Acosta, Jim                  CNN      M   \n",
       "104299137   DavidMDrucker      Drucker, David  Washington Examiner      M   \n",
       "118130765     dylanlscott     Scott, Dylan L.            Stat News      M   \n",
       "3817401        ericgeller        Geller, Eric             Politico      M   \n",
       "217550862    BresPolitico     Bresnahan, John             Politico      M   \n",
       "22129280       jimsciutto      Sciutto, James                  CNN      M   \n",
       "61734492      Fahrenthold  Fahrenthold, David      Washington Post      M   \n",
       "15463671         samstein          Stein, Sam      Huffington Post      M   \n",
       "\n",
       "           followers_count        retweet_count     retweeting_count  \n",
       "user_id                                                               \n",
       "407013776            31010             1,836.00               289.00  \n",
       "21316253            198517             1,723.00               387.00  \n",
       "19107878            308181             1,577.00               451.00  \n",
       "14529929           1305680             1,459.00               397.00  \n",
       "46557945             55762             1,403.00               280.00  \n",
       "39155029             88366             1,359.00               341.00  \n",
       "31127446            301474             1,343.00               341.00  \n",
       "398088661            77919             1,324.00               286.00  \n",
       "13524182            332344             1,221.00               306.00  \n",
       "14007532             39798             1,207.00               334.00  \n",
       "19847765             69086             1,186.00               296.00  \n",
       "16187637             59305             1,177.00               297.00  \n",
       "104914594            40119             1,120.00               314.00  \n",
       "16006592             78075             1,039.00               215.00  \n",
       "259395895           149040             1,011.00               277.00  \n",
       "21252618             81762               943.00               281.00  \n",
       "18678924            197322               916.00               247.00  \n",
       "22771961            350650               829.00               315.00  \n",
       "104299137            35033               770.00               193.00  \n",
       "118130765            20122               705.00               155.00  \n",
       "3817401              58173               704.00               225.00  \n",
       "217550862            40562               699.00               223.00  \n",
       "22129280            172012               688.00               242.00  \n",
       "61734492            451778               654.00               284.00  \n",
       "15463671            313211               642.00               229.00  "
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Filter the journalist retweet summary to male journalists, write the full subset to CSV,\n",
    "# and display the first 25 rows.\n",
    "male_journalists_retweet_summary_df = journalists_retweet_summary_df[journalists_retweet_summary_df.gender == 'M']\n",
    "male_journalists_retweet_summary_df.to_csv('output/male_journalists_retweeted_by_journalists.csv')\n",
    "male_journalists_retweet_summary_df[journalist_retweet_summary_fields].head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### On average, how many times are male journalists retweeted by other journalists?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>retweet_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1,299.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>62.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>178.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>8.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>39.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>1,836.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             retweet_count\n",
       "count             1,299.00\n",
       "mean                 62.07\n",
       "std                 178.04\n",
       "min                   0.00\n",
       "25%                   1.00\n",
       "50%                   8.00\n",
       "75%                  39.50\n",
       "max               1,836.00"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
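    "# Summary statistics (count, mean, std, quartiles, max) of retweet_count across male journalists.\n",
    "male_journalists_retweet_summary_df[['retweet_count']].describe()"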
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Female journalists retweeting other journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of female journalists retweeting other journalists, who is retweeted the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>retweet_count</th>\n",
       "      <th>retweeting_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>407013776</th>\n",
       "      <td>burgessev</td>\n",
       "      <td>Everett, John B.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>31010</td>\n",
       "      <td>748.00</td>\n",
       "      <td>122.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>593813785</th>\n",
       "      <td>DonnaYoungDC</td>\n",
       "      <td>Young, Donna</td>\n",
       "      <td>S&amp;P Global Market Intelligence</td>\n",
       "      <td>F</td>\n",
       "      <td>5894</td>\n",
       "      <td>704.00</td>\n",
       "      <td>9.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19186003</th>\n",
       "      <td>seungminkim</td>\n",
       "      <td>Kim, Seung Min</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>33980</td>\n",
       "      <td>572.00</td>\n",
       "      <td>142.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31127446</th>\n",
       "      <td>markknoller</td>\n",
       "      <td>Knoller, Mark</td>\n",
       "      <td>CBS News</td>\n",
       "      <td>M</td>\n",
       "      <td>301474</td>\n",
       "      <td>549.00</td>\n",
       "      <td>140.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21316253</th>\n",
       "      <td>ZekeJMiller</td>\n",
       "      <td>Miller, Zeke J.</td>\n",
       "      <td>Time Magazine</td>\n",
       "      <td>M</td>\n",
       "      <td>198517</td>\n",
       "      <td>516.00</td>\n",
       "      <td>149.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46557945</th>\n",
       "      <td>StevenTDennis</td>\n",
       "      <td>Dennis, Steven T.</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>55762</td>\n",
       "      <td>503.00</td>\n",
       "      <td>97.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14007532</th>\n",
       "      <td>frankthorp</td>\n",
       "      <td>Thorp, Frank</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>39798</td>\n",
       "      <td>470.00</td>\n",
       "      <td>140.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19107878</th>\n",
       "      <td>GlennThrush</td>\n",
       "      <td>Thrush, Glenn H.</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>308181</td>\n",
       "      <td>463.00</td>\n",
       "      <td>165.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33653195</th>\n",
       "      <td>ericawerner</td>\n",
       "      <td>Werner, Erica</td>\n",
       "      <td>Associated Press</td>\n",
       "      <td>F</td>\n",
       "      <td>14049</td>\n",
       "      <td>452.00</td>\n",
       "      <td>119.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>398088661</th>\n",
       "      <td>MEPFuller</td>\n",
       "      <td>Fuller, Matt E.</td>\n",
       "      <td>Huffington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>77919</td>\n",
       "      <td>447.00</td>\n",
       "      <td>116.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39155029</th>\n",
       "      <td>mkraju</td>\n",
       "      <td>Raju, Manu K.</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>88366</td>\n",
       "      <td>403.00</td>\n",
       "      <td>132.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>388.00</td>\n",
       "      <td>158.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104914594</th>\n",
       "      <td>Phil_Mattingly</td>\n",
       "      <td>Mattingly, Phil</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>40119</td>\n",
       "      <td>372.00</td>\n",
       "      <td>129.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>118130765</th>\n",
       "      <td>dylanlscott</td>\n",
       "      <td>Scott, Dylan L.</td>\n",
       "      <td>Stat News</td>\n",
       "      <td>M</td>\n",
       "      <td>20122</td>\n",
       "      <td>367.00</td>\n",
       "      <td>67.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16187637</th>\n",
       "      <td>ChadPergram</td>\n",
       "      <td>Pergram, Chad</td>\n",
       "      <td>Fox News</td>\n",
       "      <td>M</td>\n",
       "      <td>59305</td>\n",
       "      <td>365.00</td>\n",
       "      <td>122.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12354832</th>\n",
       "      <td>kasie</td>\n",
       "      <td>Hunt, Kasie</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>187357</td>\n",
       "      <td>344.00</td>\n",
       "      <td>164.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19847765</th>\n",
       "      <td>sahilkapur</td>\n",
       "      <td>Kapur, Sahil</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>69086</td>\n",
       "      <td>338.00</td>\n",
       "      <td>103.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>167024520</th>\n",
       "      <td>rachaelmbade</td>\n",
       "      <td>Bade, Rachel M.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>30164</td>\n",
       "      <td>303.00</td>\n",
       "      <td>59.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21252618</th>\n",
       "      <td>JakeSherman</td>\n",
       "      <td>Sherman, Jacob S.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>81762</td>\n",
       "      <td>302.00</td>\n",
       "      <td>106.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22891564</th>\n",
       "      <td>chrisgeidner</td>\n",
       "      <td>Geidner, Chris</td>\n",
       "      <td>BuzzFeed</td>\n",
       "      <td>M</td>\n",
       "      <td>83316</td>\n",
       "      <td>287.00</td>\n",
       "      <td>61.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70511174</th>\n",
       "      <td>Hadas_Gold</td>\n",
       "      <td>Gold, Hadas</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>45221</td>\n",
       "      <td>279.00</td>\n",
       "      <td>111.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22771961</th>\n",
       "      <td>Acosta</td>\n",
       "      <td>Acosta, Jim</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>350650</td>\n",
       "      <td>265.00</td>\n",
       "      <td>119.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>139738464</th>\n",
       "      <td>mj_lee</td>\n",
       "      <td>Lee, MJ</td>\n",
       "      <td>CNN</td>\n",
       "      <td>F</td>\n",
       "      <td>31940</td>\n",
       "      <td>259.00</td>\n",
       "      <td>79.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>217550862</th>\n",
       "      <td>BresPolitico</td>\n",
       "      <td>Bresnahan, John</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>40562</td>\n",
       "      <td>256.00</td>\n",
       "      <td>82.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>61734492</th>\n",
       "      <td>Fahrenthold</td>\n",
       "      <td>Fahrenthold, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>451778</td>\n",
       "      <td>253.00</td>\n",
       "      <td>115.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              screen_name                name                    organization  \\\n",
       "user_id                                                                         \n",
       "407013776       burgessev    Everett, John B.                        Politico   \n",
       "593813785    DonnaYoungDC        Young, Donna  S&P Global Market Intelligence   \n",
       "19186003      seungminkim      Kim, Seung Min                        Politico   \n",
       "31127446      markknoller       Knoller, Mark                        CBS News   \n",
       "21316253      ZekeJMiller     Miller, Zeke J.                   Time Magazine   \n",
       "46557945    StevenTDennis   Dennis, Steven T.                  Bloomberg News   \n",
       "14007532       frankthorp        Thorp, Frank                        NBC News   \n",
       "19107878      GlennThrush    Thrush, Glenn H.                  New York Times   \n",
       "33653195      ericawerner       Werner, Erica                Associated Press   \n",
       "398088661       MEPFuller     Fuller, Matt E.                 Huffington Post   \n",
       "39155029           mkraju       Raju, Manu K.                             CNN   \n",
       "14529929       jaketapper        Tapper, Jake                             CNN   \n",
       "104914594  Phil_Mattingly     Mattingly, Phil                             CNN   \n",
       "118130765     dylanlscott     Scott, Dylan L.                       Stat News   \n",
       "16187637      ChadPergram       Pergram, Chad                        Fox News   \n",
       "12354832            kasie         Hunt, Kasie                        NBC News   \n",
       "19847765       sahilkapur        Kapur, Sahil                  Bloomberg News   \n",
       "167024520    rachaelmbade     Bade, Rachel M.                        Politico   \n",
       "21252618      JakeSherman   Sherman, Jacob S.                        Politico   \n",
       "22891564     chrisgeidner      Geidner, Chris                        BuzzFeed   \n",
       "70511174       Hadas_Gold         Gold, Hadas                        Politico   \n",
       "22771961           Acosta         Acosta, Jim                             CNN   \n",
       "139738464          mj_lee             Lee, MJ                             CNN   \n",
       "217550862    BresPolitico     Bresnahan, John                        Politico   \n",
       "61734492      Fahrenthold  Fahrenthold, David                 Washington Post   \n",
       "\n",
       "          gender  followers_count        retweet_count     retweeting_count  \n",
       "user_id                                                                      \n",
       "407013776      M            31010               748.00               122.00  \n",
       "593813785      F             5894               704.00                 9.00  \n",
       "19186003       F            33980               572.00               142.00  \n",
       "31127446       M           301474               549.00               140.00  \n",
       "21316253       M           198517               516.00               149.00  \n",
       "46557945       M            55762               503.00                97.00  \n",
       "14007532       M            39798               470.00               140.00  \n",
       "19107878       M           308181               463.00               165.00  \n",
       "33653195       F            14049               452.00               119.00  \n",
       "398088661      M            77919               447.00               116.00  \n",
       "39155029       M            88366               403.00               132.00  \n",
       "14529929       M          1305680               388.00               158.00  \n",
       "104914594      M            40119               372.00               129.00  \n",
       "118130765      M            20122               367.00                67.00  \n",
       "16187637       M            59305               365.00               122.00  \n",
       "12354832       F           187357               344.00               164.00  \n",
       "19847765       M            69086               338.00               103.00  \n",
       "167024520      F            30164               303.00                59.00  \n",
       "21252618       M            81762               302.00               106.00  \n",
       "22891564       M            83316               287.00                61.00  \n",
       "70511174       F            45221               279.00               111.00  \n",
       "22771961       M           350650               265.00               119.00  \n",
       "139738464      F            31940               259.00                79.00  \n",
       "217550862      M            40562               256.00                82.00  \n",
       "61734492       M           451778               253.00               115.00  "
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_retweeted_by_female_summary_df = journalist_retweet_summary(journalists_retweet_df[journalists_retweet_df.gender == 'F'])\n",
    "journalists_retweeted_by_female_summary_df.to_csv('output/journalists_retweeted_by_female_journalists.csv')\n",
    "journalists_retweeted_by_female_summary_df[journalist_retweet_summary_fields].head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of female journalists retweeting other journalists, how many are male / female?\n",
    "`avg_retweets` is the average number of retweets from female journalists that each male / female journalist receives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "      <th>avg_retweets</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>index</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>25410</td>\n",
       "      <td>59.6%</td>\n",
       "      <td>19.56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>17228</td>\n",
       "      <td>40.4%</td>\n",
       "      <td>17.35</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       count percentage         avg_retweets\n",
       "index                                       \n",
       "M      25410      59.6%                19.56\n",
       "F      17228      40.4%                17.35"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalist_retweet_gender_summary(journalists_retweet_df[journalists_retweet_df.gender == 'F'])\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### On average, how many times do female journalists retweet male / female / all journalists?\n",
    "That is, the average number of retweets each female journalist makes of male, female, and all journalists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>F</th>\n",
       "      <th>M</th>\n",
       "      <th>all</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>993.00</td>\n",
       "      <td>993.00</td>\n",
       "      <td>993.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>17.35</td>\n",
       "      <td>25.59</td>\n",
       "      <td>42.94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>45.34</td>\n",
       "      <td>74.55</td>\n",
       "      <td>113.79</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>2.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>4.00</td>\n",
       "      <td>6.00</td>\n",
       "      <td>10.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>16.00</td>\n",
       "      <td>22.00</td>\n",
       "      <td>39.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>857.00</td>\n",
       "      <td>1,779.00</td>\n",
       "      <td>2,385.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         F                    M                  all\n",
       "count               993.00               993.00               993.00\n",
       "mean                 17.35                25.59                42.94\n",
       "std                  45.34                74.55               113.79\n",
       "min                   0.00                 0.00                 0.00\n",
       "25%                   0.00                 1.00                 2.00\n",
       "50%                   4.00                 6.00                10.00\n",
       "75%                  16.00                22.00                39.00\n",
       "max                 857.00             1,779.00             2,385.00"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
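    ],
   "source": [
    "# Retweets of other journalists made by female journalists.\n",
    "female_journalists_retweet_df = journalists_retweet_df[journalists_retweet_df.gender == 'F']\n",
    "# Count each female journalist's retweets by the gender of the journalist retweeted.\n",
    "female_retweet_counts_df = female_journalists_retweet_df.groupby(['user_id', 'retweet_gender']).size().unstack()\n",
    "# The left merge keeps female journalists with no retweets of other journalists; their counts are set to 0 below.\n",
    "female_journalists_retweet_by_gender_df = pd.merge(user_summary_df[user_summary_df.gender == 'F'], female_retweet_counts_df, how='left', left_index=True, right_index=True)[['F', 'M']]\n",
    "female_journalists_retweet_by_gender_df = female_journalists_retweet_by_gender_df.fillna(0)\n",
    "female_journalists_retweet_by_gender_df['all'] = female_journalists_retweet_by_gender_df.F + female_journalists_retweet_by_gender_df.M\n",
    "female_journalists_retweet_by_gender_df.describe()"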
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Male journalists retweeting other journalists"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of male journalists retweeting other journalists, who is retweeted the most?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>screen_name</th>\n",
       "      <th>name</th>\n",
       "      <th>organization</th>\n",
       "      <th>gender</th>\n",
       "      <th>followers_count</th>\n",
       "      <th>retweet_count</th>\n",
       "      <th>retweeting_count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>user_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>21316253</th>\n",
       "      <td>ZekeJMiller</td>\n",
       "      <td>Miller, Zeke J.</td>\n",
       "      <td>Time Magazine</td>\n",
       "      <td>M</td>\n",
       "      <td>198517</td>\n",
       "      <td>1,207.00</td>\n",
       "      <td>238.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19107878</th>\n",
       "      <td>GlennThrush</td>\n",
       "      <td>Thrush, Glenn H.</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>308181</td>\n",
       "      <td>1,114.00</td>\n",
       "      <td>286.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>407013776</th>\n",
       "      <td>burgessev</td>\n",
       "      <td>Everett, John B.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>31010</td>\n",
       "      <td>1,088.00</td>\n",
       "      <td>167.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14529929</th>\n",
       "      <td>jaketapper</td>\n",
       "      <td>Tapper, Jake</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>1305680</td>\n",
       "      <td>1,071.00</td>\n",
       "      <td>239.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13524182</th>\n",
       "      <td>daveweigel</td>\n",
       "      <td>Weigel, David</td>\n",
       "      <td>Washington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>332344</td>\n",
       "      <td>975.00</td>\n",
       "      <td>209.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39155029</th>\n",
       "      <td>mkraju</td>\n",
       "      <td>Raju, Manu K.</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>88366</td>\n",
       "      <td>956.00</td>\n",
       "      <td>209.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>46557945</th>\n",
       "      <td>StevenTDennis</td>\n",
       "      <td>Dennis, Steven T.</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>55762</td>\n",
       "      <td>900.00</td>\n",
       "      <td>183.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>398088661</th>\n",
       "      <td>MEPFuller</td>\n",
       "      <td>Fuller, Matt E.</td>\n",
       "      <td>Huffington Post</td>\n",
       "      <td>M</td>\n",
       "      <td>77919</td>\n",
       "      <td>877.00</td>\n",
       "      <td>170.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19847765</th>\n",
       "      <td>sahilkapur</td>\n",
       "      <td>Kapur, Sahil</td>\n",
       "      <td>Bloomberg News</td>\n",
       "      <td>M</td>\n",
       "      <td>69086</td>\n",
       "      <td>848.00</td>\n",
       "      <td>193.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16006592</th>\n",
       "      <td>BenjySarlin</td>\n",
       "      <td>Sarlin, Benjamin</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>78075</td>\n",
       "      <td>828.00</td>\n",
       "      <td>141.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19186003</th>\n",
       "      <td>seungminkim</td>\n",
       "      <td>Kim, Seung Min</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>33980</td>\n",
       "      <td>821.00</td>\n",
       "      <td>185.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16187637</th>\n",
       "      <td>ChadPergram</td>\n",
       "      <td>Pergram, Chad</td>\n",
       "      <td>Fox News</td>\n",
       "      <td>M</td>\n",
       "      <td>59305</td>\n",
       "      <td>812.00</td>\n",
       "      <td>175.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31127446</th>\n",
       "      <td>markknoller</td>\n",
       "      <td>Knoller, Mark</td>\n",
       "      <td>CBS News</td>\n",
       "      <td>M</td>\n",
       "      <td>301474</td>\n",
       "      <td>794.00</td>\n",
       "      <td>201.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>259395895</th>\n",
       "      <td>JohnJHarwood</td>\n",
       "      <td>Harwood, John</td>\n",
       "      <td>CNBC</td>\n",
       "      <td>M</td>\n",
       "      <td>149040</td>\n",
       "      <td>777.00</td>\n",
       "      <td>196.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104914594</th>\n",
       "      <td>Phil_Mattingly</td>\n",
       "      <td>Mattingly, Phil</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>40119</td>\n",
       "      <td>748.00</td>\n",
       "      <td>185.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14007532</th>\n",
       "      <td>frankthorp</td>\n",
       "      <td>Thorp, Frank</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>M</td>\n",
       "      <td>39798</td>\n",
       "      <td>737.00</td>\n",
       "      <td>194.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18678924</th>\n",
       "      <td>jmartNYT</td>\n",
       "      <td>Martin, Jonathan</td>\n",
       "      <td>New York Times</td>\n",
       "      <td>M</td>\n",
       "      <td>197322</td>\n",
       "      <td>726.00</td>\n",
       "      <td>167.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21252618</th>\n",
       "      <td>JakeSherman</td>\n",
       "      <td>Sherman, Jacob S.</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>81762</td>\n",
       "      <td>641.00</td>\n",
       "      <td>175.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>104299137</th>\n",
       "      <td>DavidMDrucker</td>\n",
       "      <td>Drucker, David</td>\n",
       "      <td>Washington Examiner</td>\n",
       "      <td>M</td>\n",
       "      <td>35033</td>\n",
       "      <td>583.00</td>\n",
       "      <td>127.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70511174</th>\n",
       "      <td>Hadas_Gold</td>\n",
       "      <td>Gold, Hadas</td>\n",
       "      <td>Politico</td>\n",
       "      <td>F</td>\n",
       "      <td>45221</td>\n",
       "      <td>570.00</td>\n",
       "      <td>195.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12354832</th>\n",
       "      <td>kasie</td>\n",
       "      <td>Hunt, Kasie</td>\n",
       "      <td>NBC News</td>\n",
       "      <td>F</td>\n",
       "      <td>187357</td>\n",
       "      <td>565.00</td>\n",
       "      <td>224.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22771961</th>\n",
       "      <td>Acosta</td>\n",
       "      <td>Acosta, Jim</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>350650</td>\n",
       "      <td>564.00</td>\n",
       "      <td>196.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19580890</th>\n",
       "      <td>LeeCamp</td>\n",
       "      <td>Camp, Lee</td>\n",
       "      <td>RTTV America</td>\n",
       "      <td>M</td>\n",
       "      <td>67601</td>\n",
       "      <td>560.00</td>\n",
       "      <td>6.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3817401</th>\n",
       "      <td>ericgeller</td>\n",
       "      <td>Geller, Eric</td>\n",
       "      <td>Politico</td>\n",
       "      <td>M</td>\n",
       "      <td>58173</td>\n",
       "      <td>524.00</td>\n",
       "      <td>149.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22129280</th>\n",
       "      <td>jimsciutto</td>\n",
       "      <td>Sciutto, James</td>\n",
       "      <td>CNN</td>\n",
       "      <td>M</td>\n",
       "      <td>172012</td>\n",
       "      <td>507.00</td>\n",
       "      <td>151.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              screen_name               name         organization gender  \\\n",
       "user_id                                                                    \n",
       "21316253      ZekeJMiller    Miller, Zeke J.        Time Magazine      M   \n",
       "19107878      GlennThrush   Thrush, Glenn H.       New York Times      M   \n",
       "407013776       burgessev   Everett, John B.             Politico      M   \n",
       "14529929       jaketapper       Tapper, Jake                  CNN      M   \n",
       "13524182       daveweigel      Weigel, David      Washington Post      M   \n",
       "39155029           mkraju      Raju, Manu K.                  CNN      M   \n",
       "46557945    StevenTDennis  Dennis, Steven T.       Bloomberg News      M   \n",
       "398088661       MEPFuller    Fuller, Matt E.      Huffington Post      M   \n",
       "19847765       sahilkapur       Kapur, Sahil       Bloomberg News      M   \n",
       "16006592      BenjySarlin   Sarlin, Benjamin             NBC News      M   \n",
       "19186003      seungminkim     Kim, Seung Min             Politico      F   \n",
       "16187637      ChadPergram      Pergram, Chad             Fox News      M   \n",
       "31127446      markknoller      Knoller, Mark             CBS News      M   \n",
       "259395895    JohnJHarwood      Harwood, John                 CNBC      M   \n",
       "104914594  Phil_Mattingly    Mattingly, Phil                  CNN      M   \n",
       "14007532       frankthorp       Thorp, Frank             NBC News      M   \n",
       "18678924         jmartNYT   Martin, Jonathan       New York Times      M   \n",
       "21252618      JakeSherman  Sherman, Jacob S.             Politico      M   \n",
       "104299137   DavidMDrucker     Drucker, David  Washington Examiner      M   \n",
       "70511174       Hadas_Gold        Gold, Hadas             Politico      F   \n",
       "12354832            kasie        Hunt, Kasie             NBC News      F   \n",
       "22771961           Acosta        Acosta, Jim                  CNN      M   \n",
       "19580890          LeeCamp          Camp, Lee         RTTV America      M   \n",
       "3817401        ericgeller       Geller, Eric             Politico      M   \n",
       "22129280       jimsciutto     Sciutto, James                  CNN      M   \n",
       "\n",
       "           followers_count        retweet_count     retweeting_count  \n",
       "user_id                                                               \n",
       "21316253            198517             1,207.00               238.00  \n",
       "19107878            308181             1,114.00               286.00  \n",
       "407013776            31010             1,088.00               167.00  \n",
       "14529929           1305680             1,071.00               239.00  \n",
       "13524182            332344               975.00               209.00  \n",
       "39155029             88366               956.00               209.00  \n",
       "46557945             55762               900.00               183.00  \n",
       "398088661            77919               877.00               170.00  \n",
       "19847765             69086               848.00               193.00  \n",
       "16006592             78075               828.00               141.00  \n",
       "19186003             33980               821.00               185.00  \n",
       "16187637             59305               812.00               175.00  \n",
       "31127446            301474               794.00               201.00  \n",
       "259395895           149040               777.00               196.00  \n",
       "104914594            40119               748.00               185.00  \n",
       "14007532             39798               737.00               194.00  \n",
       "18678924            197322               726.00               167.00  \n",
       "21252618             81762               641.00               175.00  \n",
       "104299137            35033               583.00               127.00  \n",
       "70511174             45221               570.00               195.00  \n",
       "12354832            187357               565.00               224.00  \n",
       "22771961            350650               564.00               196.00  \n",
       "19580890             67601               560.00                 6.00  \n",
       "3817401              58173               524.00               149.00  \n",
       "22129280            172012               507.00               151.00  "
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalists_retweeted_by_male_summary_df = journalist_retweet_summary(journalists_retweet_df[journalists_retweet_df.gender == 'M'])\n",
    "journalists_retweeted_by_male_summary_df.to_csv('output/journalists_retweeted_by_male_journalists.csv')\n",
    "journalists_retweeted_by_male_summary_df[journalist_retweet_summary_fields].head(25)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Of male  journalists retweeting other journalists, how many are male / female?\n",
    "`avg_retweets` is the average number of retweets from male journalists that each male / female journalist receives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>percentage</th>\n",
       "      <th>avg_retweets</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>index</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>M</th>\n",
       "      <td>55224</td>\n",
       "      <td>74.2%</td>\n",
       "      <td>42.51</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>F</th>\n",
       "      <td>19186</td>\n",
       "      <td>25.8%</td>\n",
       "      <td>19.32</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       count percentage         avg_retweets\n",
       "index                                       \n",
       "M      55224      74.2%                42.51\n",
       "F      19186      25.8%                19.32"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "journalist_retweet_gender_summary(journalists_retweet_df[journalists_retweet_df.gender == 'M'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### On average, how many times do male journalists retweet male / female / all journalists?\n",
    "That is, the number of retweets of other journalists made per male journalist, broken down by the gender of the journalist retweeted."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>F</th>\n",
       "      <th>M</th>\n",
       "      <th>all</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1,299.00</td>\n",
       "      <td>1,299.00</td>\n",
       "      <td>1,299.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>14.77</td>\n",
       "      <td>42.51</td>\n",
       "      <td>57.28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>33.50</td>\n",
       "      <td>106.87</td>\n",
       "      <td>136.92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "      <td>0.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>0.00</td>\n",
       "      <td>1.00</td>\n",
       "      <td>1.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>3.00</td>\n",
       "      <td>7.00</td>\n",
       "      <td>11.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>14.00</td>\n",
       "      <td>35.00</td>\n",
       "      <td>50.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>442.00</td>\n",
       "      <td>1,414.00</td>\n",
       "      <td>1,766.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         F                    M                  all\n",
       "count             1,299.00             1,299.00             1,299.00\n",
       "mean                 14.77                42.51                57.28\n",
       "std                  33.50               106.87               136.92\n",
       "min                   0.00                 0.00                 0.00\n",
       "25%                   0.00                 1.00                 1.00\n",
       "50%                   3.00                 7.00                11.00\n",
       "75%                  14.00                35.00                50.00\n",
       "max                 442.00             1,414.00             1,766.00"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
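    "# Retweets of other journalists made by male journalists\n",
    "male_journalists_retweet_df = journalists_retweet_df[journalists_retweet_df.gender == 'M']\n",
    "# Count retweets per male journalist by gender of the journalist retweeted; the left merge keeps male journalists with no retweets\n",
    "male_journalists_retweet_by_gender_df = pd.merge(user_summary_df[user_summary_df.gender == 'M'], male_journalists_retweet_df.groupby(['user_id', 'retweet_gender']).size().unstack(), how='left', left_index=True, right_index=True)[['F', 'M']]\n",
    "# Male journalists with no retweets of a given gender get a count of 0\n",
    "male_journalists_retweet_by_gender_df = male_journalists_retweet_by_gender_df.fillna(0)\n",
    "male_journalists_retweet_by_gender_df['all'] = male_journalists_retweet_by_gender_df.F + male_journalists_retweet_by_gender_df.M\n",
    "# Summary statistics of retweets per male journalist\n",
    "male_journalists_retweet_by_gender_df.describe()"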
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  },
  "toc": {
   "nav_menu": {
    "height": "512px",
    "width": "252px"
   },
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "toc_cell": true,
   "toc_position": {
    "height": "674px",
    "left": "0px",
    "right": "1254px",
    "top": "112px",
    "width": "282px"
   },
   "toc_section_display": "block",
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}