{ "metadata": { "name": "", "signature": "sha256:5b95a8302b15faeeb3cd06add8368865bcd635b7cc339daa35ec7ebd79bb6106" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Analysis of Pronoun Usage in Presidential Addresses\n", "\n", "This notebook looks at how presidents have used first-person singular vs. first-person plural pronouns in their speeches." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import pandas as pd\n", "import json\n", "import nltk" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 20 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load in Data\n", "\n", "The data used in this notebook comes from Vocativ's collection of presidential addresses, which can be found here: https://github.com/Vocativ-data/presidents_readability" ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Read the speeches file and keep only the list of speech records\n", "with open(\"../../vocativ_president_data/The original speeches.json\") as f:\n", "    objects = json.load(f)[\"objects\"]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "code", "collapsed": false, "input": [ "speeches_df = pd.DataFrame(objects)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "speeches_df[\"word_count\"] = speeches_df[\"Text\"].apply(lambda x: len(x.split()))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 4 }, { "cell_type": "code", "collapsed": false, "input": [ "speeches_df[\"tokens\"] = speeches_df[\"Text\"].apply(lambda x: nltk.word_tokenize(x))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find and Count All First-Person Singular 
Pronouns" ] }, { "cell_type": "code", "collapsed": false, "input": [ "speeches_df[\"i\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"i\"]), axis=1)\n", "speeches_df[\"me\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"me\"]), axis=1)\n", "speeches_df[\"my\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"my\"]), axis=1)\n", "speeches_df[\"mine\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"mine\"]), axis=1)\n", "speeches_df[\"myself\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"myself\"]), axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "code", "collapsed": false, "input": [ "speeches_df[\"first_person_singular\"] = speeches_df.apply(lambda x: x[\"i\"] + x[\"me\"] + x[\"my\"] +\\\n", " x[\"mine\"] + x[\"myself\"], axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Find and Count All First-Person Plural Pronouns" ] }, { "cell_type": "code", "collapsed": false, "input": [ "speeches_df[\"we\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"we\"]), axis=1)\n", "speeches_df[\"our\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"our\"]), axis=1)\n", "speeches_df[\"ours\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"ours\"]), axis=1)\n", "speeches_df[\"ourselves\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"ourselves\"]), axis=1)\n", "speeches_df[\"us\"] = speeches_df.apply(lambda x: len([ t for t in x[\"tokens\"] if t.lower() == \"us\"]), axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "speeches_df[\"first_person_plural\"] = 
speeches_df.apply(lambda x: x[\"we\"] + x[\"our\"] + x[\"ours\"] + x[\"ourselves\"] + x[\"us\"], axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "# Total first-person count is singular plus plural\n", "speeches_df[\"first_person\"] = speeches_df.apply(lambda x: x[\"first_person_singular\"] + x[\"first_person_plural\"], axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Segment Off Necessary Data Points" ] }, { "cell_type": "code", "collapsed": false, "input": [ "speech_analysis = speeches_df[[\"word_count\", \"tokens\", \"President\", \"first_person\", \n", " \"first_person_singular\", \"first_person_plural\"]]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We restrict the analysis to modern presidents (since 1929) because that is the period covered by our news conference analysis. The list below names each of those presidents exactly as they appear in the President column of the speeches DataFrame." ] }, { "cell_type": "code", "collapsed": false, "input": [ "news_conf_presidents = [\"Richard Nixon\", \"Gerald Ford\", \"George H. W. Bush\", \"Lyndon B. Johnson\", \"Jimmy Carter\", \n", " \"Bill Clinton\", \"Harry S. Truman\", \"Ronald Reagan\", \"Barack Obama\", \"John F. Kennedy\", \n", " \"Franklin D. Roosevelt\", \"Dwight D. Eisenhower\", \"Herbert Hoover\", \"George W. Bush\"]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 12 }, { "cell_type": "code", "collapsed": false, "input": [ "modern_presidents = speech_analysis[speech_analysis[\"President\"].isin(news_conf_presidents)]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "code", "collapsed": false, "input": [ "# Sum the numeric counts per president; the list-valued tokens column is left out\n", "presidents = pd.DataFrame(modern_presidents.groupby(\"President\").sum(numeric_only=True))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 },
{ "cell_type": "markdown", "metadata": {}, "source": [ "### Analyze Each President's Total Corpus of Speeches" ] }, { "cell_type": "code", "collapsed": false, "input": [ "presidents[\"pct_first\"] = presidents.apply(lambda x: round(100.0 * x[\"first_person\"] / x[\"word_count\"], 2), axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "presidents[\"pct_first_singular\"] = presidents.apply(lambda x: round(100.0 * x[\"first_person_singular\"] / x[\"word_count\"], 2), axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "code", "collapsed": false, "input": [ "presidents[\"pct_first_plural\"] = presidents.apply(lambda x: round(100.0 * x[\"first_person_plural\"] / x[\"word_count\"], 2), axis=1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "# DataFrame.sort was removed from pandas; sort_values is its replacement\n", "presidents.sort_values(\"pct_first_singular\", ascending=False)" ], "language": "python", "metadata": {},
"outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ " word_count first_person first_person_singular \\\n", "President \n", "Richard Nixon 67445 3627 1684 \n", "Gerald Ford 40301 2298 975 \n", "George H. W. Bush 89646 5032 2154 \n", "Lyndon B. Johnson 246786 13120 5058 \n", "Jimmy Carter 91936 4818 1821 \n", "Bill Clinton 145846 8311 2617 \n", "Harry S. Truman 31802 1418 566 \n", "Ronald Reagan 206217 9975 3296 \n", "Barack Obama 33672 1815 523 \n", "John F. Kennedy 160468 7242 2335 \n", "Franklin D. Roosevelt 130024 4739 1517 \n", "Dwight D. Eisenhower 17919 606 177 \n", "George W. Bush 45437 2222 404 \n", "Herbert Hoover 10718 392 89 \n", "\n",
" first_person_plural pct_first pct_first_singular \\\n", "President \n", "Richard Nixon 1943 5.38 2.50 \n", "Gerald Ford 1323 5.70 2.42 \n", "George H. W. Bush 2878 5.61 2.40 \n", "Lyndon B. Johnson 8062 5.32 2.05 \n", "Jimmy Carter 2997 5.24 1.98 \n", "Bill Clinton 5694 5.70 1.79 \n", "Harry S. Truman 852 4.46 1.78 \n", "Ronald Reagan 6679 4.84 1.60 \n", "Barack Obama 1292 5.39 1.55 \n", "John F. Kennedy 4907 4.51 1.46 \n", "Franklin D. Roosevelt 3222 3.64 1.17 \n", "Dwight D. Eisenhower 429 3.38 0.99 \n", "George W. Bush 1818 4.89 0.89 \n", "Herbert Hoover 303 3.66 0.83 \n", "\n",
" pct_first_plural \n", "President \n", "Richard Nixon 2.88 \n", "Gerald Ford 3.28 \n", "George H. W. Bush 3.21 \n", "Lyndon B. Johnson 3.27 \n", "Jimmy Carter 3.26 \n", "Bill Clinton 3.90 \n", "Harry S. Truman 2.68 \n", "Ronald Reagan 3.24 \n", "Barack Obama 3.84 \n", "John F. Kennedy 3.06 \n", "Franklin D. Roosevelt 2.48 \n", "Dwight D. Eisenhower 2.39 \n", "George W. Bush 4.00 \n", "Herbert Hoover 2.83 " ] } ], "prompt_number": 18 },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Compute the overall first-person singular rate across all modern presidents, as a percentage of their combined word count, so that it can be compared with Obama's 1.55 in the table above." 
] }, { "cell_type": "code", "collapsed": false, "input": [ "round(100.0 * presidents[\"first_person_singular\"].sum() / presidents[\"word_count\"].sum(), 2)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 19, "text": [ "1.76" ] } ], "prompt_number": 19 } ], "metadata": {} } ] }