{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import numpy as np\n", "import pandas as pd\n", "import random\n", "import matplotlib.pyplot as plt\n", "plt.style.use('ggplot')\n", "from collections import Counter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Converting state polling average to probability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The state polling average is the average of polls from multiple polling companies, each with its own margin of error. For example, if a candidate has a polling average of 50% with an assumed margin of error of +/- 4, we assume the true value lies between 46% and 54%. If their opponent has a polling average of 46% with the same assumed margin of error of +/- 4, we assume the opponent's true value lies between 42% and 50%." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![example7](https://raw.githubusercontent.com/ahoaglandnu/election/master/images/ex7.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**This means a candidate can lead in the polls but ultimately lose the election.**\n", "![example8](https://raw.githubusercontent.com/ahoaglandnu/election/master/images/ex8.png)\n", "**It also means the candidate who led in the polls can win by more than expected.**\n", "![example9](https://raw.githubusercontent.com/ahoaglandnu/election/master/images/ex9.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The polling average alone does not give us the probability of a candidate winning. If we compare the two averages directly, the candidate in the lead wins 100% of the time." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x wins: 100.0% of 10000 elections\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "wins = 0 # number of wins x over y\n", "number_of_elections = 10000\n", "for i in range(number_of_elections):\n", "    victory = x - y\n", "    if victory >= 1:\n", "        wins += 1\n", "wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "print('Candidate x wins:', wins_percentage,'of', number_of_elections,'elections')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead, we will add a randomly chosen error, anywhere within the margin of error, to each candidate's polling average." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "n = 4\n", "random.choice(range((n*-1),(n+1),1))" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x wins: 81.58% of 10000 elections\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "wins = 0 \n", "number_of_elections = 10000\n", "for i in range(number_of_elections):\n", "    moe_x = random.choice(range((n*-1),(n+1),1)) # random selection of margin of error for candidate x\n", "    moe_y = random.choice(range((n*-1),(n+1),1)) # different margin of error for candidate y\n", "    x1 = x + moe_x\n", "    y1 = y + moe_y\n", "    victory = x1 - y1\n", "    if victory >= 1: \n", "        wins += 1\n", "wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "print('Candidate x wins:', wins_percentage,'of', number_of_elections,'elections')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that a candidate with a 4 point lead wins only about 81% of the time. Keep in mind that a tie is not a victory.  \n", "What if we run it again? Will the percentage stay the same?" 
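, "\n", "\n", "Before rerunning, it helps to know what value the simulation should settle near. With integer errors there are only 9 x 9 = 81 equally likely pairs of errors, so the winning cases can be counted directly. The following is a minimal sketch, assuming the same independent, uniform integer errors as the loop above; exact_win_probability is a hypothetical helper and is not part of the original notebook.

```python
from itertools import product

def exact_win_probability(x, y, n):
    # Count the outcomes in which x beats y when each candidate gets an
    # independent integer error drawn uniformly from -n..n.
    # A tie is not counted as a win, matching the simulation loop.
    errors = range(-n, n + 1)
    outcomes = list(product(errors, errors))
    wins = sum(1 for ex, ey in outcomes if (x + ex) - (y + ey) >= 1)
    return wins, len(outcomes)

wins, total = exact_win_probability(50, 46, 4)
print(wins, 'of', total, 'outcomes =', round(100 * wins / total, 2), '%')
```

Under these assumptions the count is 66 of 81, about 81.5%, so repeated simulations should scatter around that figure."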
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x wins: 81.44% of 10000 elections\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "wins = 0 # number of wins x over y\n", "number_of_elections = 10000\n", "for i in range(number_of_elections):\n", "    moe_x = random.choice(range((n*-1),(n+1),1)) # random selection of margin of error for candidate x\n", "    moe_y = random.choice(range((n*-1),(n+1),1)) # different margin of error for candidate y\n", "    x1 = x + moe_x\n", "    y1 = y + moe_y\n", "    victory = x1 - y1\n", "    if victory >= 1: # cannot be a tie \n", "        wins += 1\n", "wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "print('Candidate x wins:', wins_percentage,'of', number_of_elections,'elections')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is still about 81%. Let's try doubling the number of simulated elections." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x wins: 81.41000000000001% of 20000 elections\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "wins = 0 \n", "number_of_elections = 20000 # previously we used 10,000 \n", "for i in range(number_of_elections):\n", "    moe_x = random.choice(range((n*-1),(n+1),1)) \n", "    moe_y = random.choice(range((n*-1),(n+1),1)) \n", "    x1 = x + moe_x\n", "    y1 = y + moe_y\n", "    victory = x1 - y1\n", "    if victory >= 1: \n", "        wins += 1\n", "wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "print('Candidate x wins:', wins_percentage,'of', number_of_elections,'elections')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x wins: 81.89% of 20000 elections\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "wins = 0 \n", "number_of_elections = 20000 # previously we 
used 10,000 \n", "for i in range(number_of_elections):\n", "    moe_x = random.choice(range((n*-1),(n+1),1)) \n", "    moe_y = random.choice(range((n*-1),(n+1),1)) \n", "    x1 = x + moe_x\n", "    y1 = y + moe_y\n", "    victory = x1 - y1\n", "    if victory >= 1: \n", "        wins += 1\n", "wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "print('Candidate x wins:', wins_percentage,'of', number_of_elections,'elections')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What about 100,000?" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x wins: 81.286% of 100000 elections\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "wins = 0 \n", "number_of_elections = 100000 # previously we used 20,000 \n", "for i in range(number_of_elections):\n", "    moe_x = random.choice(range((n*-1),(n+1),1)) \n", "    moe_y = random.choice(range((n*-1),(n+1),1)) \n", "    x1 = x + moe_x\n", "    y1 = y + moe_y\n", "    victory = x1 - y1\n", "    if victory >= 1: \n", "        wins += 1\n", "wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "print('Candidate x wins:', wins_percentage,'of', number_of_elections,'elections')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x wins: 81.541% of 100000 elections\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "wins = 0 \n", "number_of_elections = 100000 # previously we used 20,000 \n", "for i in range(number_of_elections):\n", "    moe_x = random.choice(range((n*-1),(n+1),1)) \n", "    moe_y = random.choice(range((n*-1),(n+1),1)) \n", "    x1 = x + moe_x\n", "    y1 = y + moe_y\n", "    victory = x1 - y1\n", "    if victory >= 1: \n", "        wins += 1\n", "wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "print('Candidate x wins:', wins_percentage,'of', number_of_elections,'elections')" ] }, { "cell_type": "markdown", "metadata": 
{}, "source": [ "What about 1,000,000?" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x wins: 81.4585% of 1000000 elections\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "wins = 0 \n", "number_of_elections = 1000000 # previously we used 100,000 \n", "for i in range(number_of_elections):\n", "    moe_x = random.choice(range((n*-1),(n+1),1)) \n", "    moe_y = random.choice(range((n*-1),(n+1),1)) \n", "    x1 = x + moe_x\n", "    y1 = y + moe_y\n", "    victory = x1 - y1\n", "    if victory >= 1: \n", "        wins += 1\n", "wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "print('Candidate x wins:', wins_percentage,'of', number_of_elections,'elections')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x wins: 81.5544% of 1000000 elections\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "wins = 0 \n", "number_of_elections = 1000000 # previously we used 100,000 \n", "for i in range(number_of_elections):\n", "    moe_x = random.choice(range((n*-1),(n+1),1)) \n", "    moe_y = random.choice(range((n*-1),(n+1),1)) \n", "    x1 = x + moe_x\n", "    y1 = y + moe_y\n", "    victory = x1 - y1\n", "    if victory >= 1: \n", "        wins += 1\n", "wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "print('Candidate x wins:', wins_percentage,'of', number_of_elections,'elections')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The probability is stabilizing, but do we need Monte Carlo simulations, or can we calculate the probability exactly? Let's look at the range again.\n", "![example7](https://raw.githubusercontent.com/ahoaglandnu/election/master/images/ex7.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Candidate x can end up winning 46, 47, 48, 49, 50, 51, 52, 53, or 54 percent of the vote. 
\n", "Candidate y can end up winning 42, 43, 44, 45, 46, 47, 48, 49, or 50 percent of the vote.  \n", "A margin of error of 4 means each candidate has 9 equally likely results (4 below the average, the average itself, and 4 above), so each result has a 1 out of 9 probability.  \n", "\n", "Candidate y can only win with 47, 48, 49, or 50 (a 4 out of 9 probability), **but** whether that is enough depends on candidate x.  \n", "\n", "For candidate y to win with a 47 (1 out of 9 probability), candidate x must get 46 (1 out of 9 probability).  \n", "For candidate y to win with a 48 (1 out of 9 probability), candidate x must get 47 or less (2 out of 9 probability).  \n", "For candidate y to win with a 49 (1 out of 9 probability), candidate x must get 48 or less (3 out of 9 probability).  \n", "For candidate y to win with a 50 (1 out of 9 probability), candidate x must get 49 or less (4 out of 9 probability).  \n", "\n", "Adding these cases, candidate y wins outright in (1 + 2 + 3 + 4) = 10 of the 81 equally likely combinations, or about 12% of the time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can work through this kind of combinatorics for every state, or we can run a quick Monte Carlo simulation that gets us pretty close." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def moe(x, y, n):\n", "    # add an independent random integer error between -n and +n to each average\n", "    moe_x = random.choice(range((n*-1),(n+1),1))\n", "    moe_y = random.choice(range((n*-1),(n+1),1))\n", "    x += moe_x\n", "    y += moe_y\n", "    return x, y\n", "\n", "def sim(x, y, n, number_of_elections=20000):\n", "    x = int(x)\n", "    y = int(y)\n", "    n = int(n)\n", "    wins = 0 \n", "    for i in range(number_of_elections):\n", "        x1, y1 = moe(x, y, n)\n", "        victory = x1 - y1\n", "        if victory >= 1:\n", "            wins += 1\n", "    wins_percentage = str((float(wins) / number_of_elections)*100)+'%'\n", "    return wins_percentage" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's stick with our previous example of candidate x averaging 50 and candidate y averaging 46 with a margin of error of 4." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'81.06%'" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" 
} ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "sim(x,y,n)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Margin of Error\n", "\n", "Let's take a quick look at how those probabilities change as we increase the margin of error." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Candidate x winning probability with a MoE of 4: 81.69%\n", "Candidate x winning probability with a MoE of 5: 76.885%\n", "Candidate x winning probability with a MoE of 6: 73.76%\n", "Candidate x winning probability with a MoE of 7: 70.38499999999999%\n", "Candidate x winning probability with a MoE of 8: 68.61%\n", "Candidate x winning probability with a MoE of 9: 66.97%\n", "Candidate x winning probability with a MoE of 10: 65.66%\n" ] } ], "source": [ "x = 50\n", "y = 46\n", "n = 4\n", "print('Candidate x winning probability with a MoE of 4:', sim(x,y,n))\n", "n = 5\n", "print('Candidate x winning probability with a MoE of 5:', sim(x,y,n))\n", "n = 6\n", "print('Candidate x winning probability with a MoE of 6:', sim(x,y,n))\n", "n = 7\n", "print('Candidate x winning probability with a MoE of 7:', sim(x,y,n))\n", "n = 8\n", "print('Candidate x winning probability with a MoE of 8:', sim(x,y,n))\n", "n = 9\n", "print('Candidate x winning probability with a MoE of 9:', sim(x,y,n))\n", "n = 10\n", "print('Candidate x winning probability with a MoE of 10:', sim(x,y,n))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are using **unweighted** polling averages. This means that every poll entered into the average counts equally. Polling aggregators **claim** they weight individual polls in order to produce correct probabilities.  \n", "\n", "A probability above 50% is commonly read as a prediction for that outcome. The strength of that prediction is how far above 50% the probability is. 
\n", "\n", "For example, a 99% probability is a more confident prediction than 65%.\n", "\n", "To clarify:  \n", "**A prediction is any probability above 50%.**  \n", "**The probability is our confidence in that prediction.**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Applying State Probabilities to the 2016 Election" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('backtest2016.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We need to decide what margin of error to use. We can take a guess or we can look at the polling data.  \n", "Let's create a spread column, the absolute gap between the two candidates' polling averages, and see what we have." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "df['spread'] = abs(df['trump_rcp'] - df['clinton_rcp'])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df.plot(x='state',y='spread',kind='bar',figsize=(16,4),\n", "        title='Polling Average Spread by State/District',legend=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see from the chart above that some states are \"safe states\" for a candidate. In a safe state, the two candidates' ranges do not overlap within the margin of error. In other words, one candidate's *lowest* possible result is above the other candidate's *highest* possible result." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![example12](https://raw.githubusercontent.com/ahoaglandnu/election/master/images/ex12.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's refer to the qualitative experts to find a cutoff for what constitutes a \"safe state.\" For this we will use the Cook Political Report." 
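, "\n", "\n", "For comparison with those ratings, the non-overlap definition of a safe state given above can be written as a one-line check. This is only a sketch, assuming both candidates share the same margin of error; is_safe_state and its argument names are hypothetical and are not part of the original notebook.

```python
def is_safe_state(leader_avg, trailer_avg, moe):
    # True when the leader's lowest possible result is still above
    # the trailer's highest possible result (the ranges do not overlap).
    return leader_avg - moe > trailer_avg + moe

# Illustrative numbers only, not taken from the polling data.
print(is_safe_state(50, 46, 4))   # ranges 46-54 and 42-50 overlap
print(is_safe_state(58, 40, 4))   # ranges 54-62 and 36-44 do not overlap
```

Equivalently, under this strict rule a state is safe when the spread exceeds twice the margin of error; a spread exactly equal to twice the margin can still end in a tie."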
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " State Cook\n", "35 Mississippi Solid Rep.\n", "40 Louisiana Solid Rep.\n", "41 Montana Solid Rep.\n", "42 Nebraska (CD 1)* Solid Rep.\n", "43 West Virginia Solid Rep.\n", "44 Tennessee Solid Rep.\n", "45 North Dakota Solid Rep.\n", "46 Kansas Solid Rep.\n", "47 Alabama Solid Rep.\n", "48 Arkansas Solid Rep.\n", "49 Nebraska Solid Rep.\n", "50 South Dakota Solid Rep.\n", "51 Idaho Solid Rep.\n", "52 Kentucky Solid Rep.\n", "53 Oklahoma Solid Rep.\n", "54 Wyoming Solid Rep.\n", "55 Nebraska (CD 3)* Solid Rep." ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qual = pd.read_csv('aggpreds2016.csv')\n", "qual = qual[['State','Cook']]\n", "rep_safe = list(qual[qual['Cook'].str.startswith('Solid R', na=False)]['State'].values)\n", "qual[qual['Cook'].str.startswith('Solid R', na=False)]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "" ], "text/plain": [ " State Cook\n", "0 D.C. Solid Dem.\n", "1 California Solid Dem.\n", "2 Maryland Solid Dem.\n", "3 Hawaii Solid Dem.\n", "4 Vermont Solid Dem.\n", "5 New York Solid Dem.\n", "6 Rhode Island Solid Dem.\n", "7 Illinois Solid Dem.\n", "8 Washington Solid Dem.\n", "9 New Jersey Solid Dem.\n", "10 Connecticut Solid Dem.\n", "11 Maine (CD 1)* Solid Dem.\n", "12 Delaware Solid Dem.\n", "13 Massachusetts Solid Dem.\n", "14 Oregon Solid Dem." ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dem_safe = list(qual[qual['Cook'].str.startswith('Solid D', na=False)]['State'].values)\n", "qual[qual['Cook'].str.startswith('Solid D', na=False)]" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "17 Connecticut\n", "22 Texas\n", "26 South Dakota\n", "27 Tennessee\n", "28 Alaska\n", "29 Kansas\n", "33 Washington\n", "34 Rhode Island\n", "35 Delaware\n", "36 Massachusetts\n", "37 New York\n", "38 California\n", "39 District Of Columbia\n", "40 Hawaii\n", "41 Maryland\n", "42 Vermont\n", "43 Louisiana\n", "44 Mississippi\n", "45 Alabama\n", "46 Arkansas\n", "47 Kentucky\n", "48 Idaho\n", "49 Nebraska\n", "50 North Dakota\n", "51 Oklahoma\n", "52 West Virginia\n", "53 Wyoming\n", "Name: state, dtype: object" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "spread_states = df[df['spread'] >= 12]['state'].values\n", "df[df['spread'] >= 12]['state']" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Texas\n", "Alaska\n", "District Of Columbia\n" ] } ], "source": [ "# print the large-spread states that appear in neither Cook \"Solid\" list\n", "for state in spread_states:\n", "    if state in rep_safe or state in dem_safe:\n", "        pass\n", "    else:\n", "        print(state)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Only Texas, Alaska, and District Of Columbia have a spread of 12 or more without appearing in either Cook list, and District Of Columbia is missing only because Cook labels it D.C. If we use 6 as our margin of error, every state with a spread of 12 or more lands in the \"safe state\" 
category, which is represented by a probability of 100% for the safe candidate and 0% for the opponent." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "trump_x = df['trump_rcp'].values\n", "clinton_y = df['clinton_rcp'].values\n", "\n", "clinton_x = df['clinton_rcp'].values\n", "trump_y = df['trump_rcp'].values" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# redefined to return a fraction (0 to 1) instead of a formatted string\n", "def sim(x, y, n, number_of_elections=20000):\n", "    x = int(x) # note: int() drops the decimal part of the polling average\n", "    y = int(y)\n", "    n = int(n)\n", "    wins = 0 \n", "    for i in range(number_of_elections):\n", "        x1, y1 = moe(x, y, n)\n", "        victory = x1 - y1\n", "        if victory >= 1:\n", "            wins += 1\n", "    wins_percentage = float(wins) / number_of_elections\n", "    return wins_percentage" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [], "source": [ "trump_rcp_prob = []\n", "for x, y in zip(trump_x, clinton_y):\n", "    nwp = sim(x,y,6)\n", "    trump_rcp_prob.append(100*nwp)\n", "\n", "clinton_rcp_prob = []\n", "for x, y in zip(clinton_x, trump_y):\n", "    nwp = sim(x,y,6)\n", "    clinton_rcp_prob.append(100*nwp)\n", "    \n", "df['trump_rcp_prob'] = trump_rcp_prob\n", "df['clinton_rcp_prob'] = clinton_rcp_prob" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " state ec clinton_rcp trump_rcp clinton_actual trump_actual \\\n", "0 Florida 29 46.4 46.6 47.80 49.02 \n", "1 Ohio 18 42.3 45.8 43.56 51.69 \n", "2 Michigan 16 45.4 42.0 47.27 47.50 \n", "3 Pennsylvania 20 46.2 44.3 47.46 48.18 \n", "4 New Hampshire 4 43.3 42.7 46.98 46.61 \n", "\n", " spread trump_rcp_prob clinton_rcp_prob \n", "0 0.2 45.405 46.535 \n", "1 3.5 67.455 27.085 \n", "2 3.4 26.580 67.155 \n", "3 1.9 32.550 60.570 \n", "4 0.6 38.760 54.105 " ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The Electoral College" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![example13](https://raw.githubusercontent.com/ahoaglandnu/election/master/images/ex13.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have probabilities assigned to the states and districts, we can bring in the Electoral College scoring system. 
\n", "For more detail on how it works, see the [Electoral College article on Wikipedia](https://en.wikipedia.org/wiki/Electoral_college)." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Maximum electoral college votes possible: 538\n", "Minimum electoral college votes to win: 270\n" ] } ], "source": [ "print('Maximum electoral college votes possible:', np.sum(df['ec']))\n", "print('Minimum electoral college votes to win:', int((np.sum(df['ec'])/2) + 1) )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see which of the two candidates is in the lead according to our probabilities." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Trump Most Likely Electoral College Results:: 230\n", "\n", "Clinton Most Likely Electoral College Results:: 272\n" ] } ], "source": [ "print('Trump Most Likely Electoral College Results::', np.sum(df['ec'][df['trump_rcp_prob'] >= 50]))\n", "print()\n", "print('Clinton Most Likely Electoral College Results::', np.sum(df['ec'][df['clinton_rcp_prob'] >= 50]))" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "538 - (272+230): 36\n" ] } ], "source": [ "print('538 - (272+230):',538 - (272+230) )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Which states are missing?" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "" ], "text/plain": [ " state ec\n", "0 Florida 29\n", "5 Maine CD2 1\n", "11 Nevada 6" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "unc_states = list(df[ (df['trump_rcp_prob'] <= 50) & (df['clinton_rcp_prob'] <= 50) ]['state'].values)\n", "df[ (df['trump_rcp_prob'] <= 50) & (df['clinton_rcp_prob'] <= 50) ][['state','ec']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is another reason why the 2016 Presidential Election was unique. We have two states and a district where neither major candidate had a 50% probability of winning. These are statistically \"tossup states\" since we can assign a probability but not a prediction." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reusing our Simulation to Create Election Winning Probabilities" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we can simulate the electoral college results, we need to address a quirk of our probability assignments. The probability that a state awards its electoral college votes to one of the two candidates should be 100%, yet in a number of states and districts the two probabilities do not total 100%. The shortfall comes largely from simulated ties, which we did not count as a victory for either candidate.\n", "\n", "Let us calculate this shortfall for each state and treat it as our uncertainty probability." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "df['unc_prob'] = 100 - (df['trump_rcp_prob'] + df['clinton_rcp_prob'])" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " state trump_rcp_prob clinton_rcp_prob unc_prob\n", "5 Maine CD2 45.725 45.765 8.510\n", "11 Nevada 45.750 46.185 8.065\n", "0 Florida 45.405 46.535 8.060\n", "7 North Carolina 53.770 38.810 7.420\n", "4 New Hampshire 38.760 54.105 7.135\n", "3 Pennsylvania 32.550 60.570 6.880\n", "2 Michigan 26.580 67.155 6.265\n", "14 Iowa 66.950 27.065 5.985\n", "1 Ohio 67.455 27.085 5.460\n", "10 Colorado 27.095 67.465 5.440\n", "12 New Mexico 16.675 77.945 5.380\n", "13 Arizona 73.075 21.635 5.290\n", "20 South Carolina 73.425 21.455 5.120\n", "9 Georgia 78.895 16.775 4.330\n", "8 Virginia 17.085 78.630 4.285\n", "6 Maine 16.980 78.870 4.150\n", "15 Wisconsin 12.480 83.590 3.930\n", "16 Oregon 6.175 91.270 2.555\n", "30 Nebraska CD2 94.120 3.395 2.485\n", "19 Maine CD1 3.415 94.335 2.250\n", "18 Minnesota 1.785 96.220 1.995\n", "24 Utah 96.400 1.715 1.885\n", "25 Montana 96.440 1.690 1.870\n", "21 Indiana 98.215 0.535 1.250\n", "23 Missouri 98.270 0.510 1.220\n", "32 New Jersey 0.535 98.295 1.170\n", "27 Tennessee 99.400 0.000 0.600\n", "31 Illinois 0.000 99.410 0.590\n", "22 Texas 99.485 0.000 0.515\n", "26 South Dakota 100.000 0.000 0.000\n", "42 Vermont 0.000 100.000 0.000\n", "52 West Virginia 100.000 0.000 0.000\n", "51 Oklahoma 100.000 0.000 0.000\n", "50 North Dakota 100.000 0.000 0.000\n", "49 Nebraska 100.000 0.000 0.000\n", "48 Idaho 100.000 0.000 0.000\n", "47 Kentucky 100.000 0.000 0.000\n", "46 Arkansas 100.000 0.000 0.000\n", "45 Alabama 100.000 0.000 0.000\n", "44 Mississippi 100.000 0.000 0.000\n", "43 Louisiana 100.000 0.000 0.000\n", "40 Hawaii 0.000 100.000 0.000\n", "41 Maryland 0.000 100.000 0.000\n", "28 Alaska 100.000 0.000 0.000\n", "39 District Of Columbia 0.000 100.000 0.000\n", "38 California 0.000 100.000 0.000\n", "37 New York 0.000 100.000 0.000\n", "36 Massachusetts 0.000 100.000 0.000\n", "35 Delaware 0.000 100.000 0.000\n", "34 Rhode Island 0.000 100.000 0.000\n", "33 Washington 0.000 100.000 0.000\n", "17 Connecticut 
0.000 100.000 0.000\n", "29 Kansas 100.000 0.000 0.000\n", "53 Wyoming 100.000 0.000 0.000" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['state','trump_rcp_prob','clinton_rcp_prob','unc_prob']].sort_values('unc_prob',ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulating One Election" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Won Indiana for 11 electoral college votes\n", "Won Texas for 38 electoral college votes\n", "Won Missouri for 10 electoral college votes\n", "Won Utah for 6 electoral college votes\n", "Won Montana for 3 electoral college votes\n", "Won South Dakota for 3 electoral college votes\n", "Won Tennessee for 11 electoral college votes\n", "Won Alaska for 3 electoral college votes\n", "Won Kansas for 6 electoral college votes\n", "Won Nebraska CD2 for 1 electoral college votes\n", "Won Louisiana for 8 electoral college votes\n", "Won Mississippi for 6 electoral college votes\n", "Won Alabama for 9 electoral college votes\n", "Won Arkansas for 6 electoral college votes\n", "Won Kentucky for 8 electoral college votes\n", "Won Idaho for 4 electoral college votes\n", "Won Nebraska for 4 electoral college votes\n", "Won North Dakota for 3 electoral college votes\n", "Won Oklahoma for 7 electoral college votes\n", "Won West Virginia for 5 electoral college votes\n", "Won Wyoming for 3 electoral college votes\n", "\n", "Final Results: 155 electoral college votes\n" ] } ], "source": [ "ec = list(df.ec.values)\n", "states = list(df.state.values)\n", "cand = list(df['trump_rcp_prob'].values)\n", "# one uniform draw on [0, 100) is compared against every state's win probability\n", "sim_election = np.random.uniform()*100\n", "ec_total = 0\n", "for x, y, z in zip(cand, states, ec):\n", "    if x > sim_election:\n", "        print('Won',y,'for',z,'electoral college votes')\n", "        ec_total += z\n", "print()\n", "print('Final Results:',ec_total,'electoral college votes')" ] }, { 
"cell_type": "markdown", "metadata": {}, "source": [ "### Making state-by-state outcomes just a bit more realistic\n", "\n", "The challenge with simulations is that any outcome that is statistically possible, however improbable, will likely show up in the simulation. Whether to leave such a possibility open ultimately becomes a matter of judgement. For this simulation, a candidate can only win a state in which their probability is above 5%. This is in line with our \"safe state\" assessments from the qualitative experts." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Won Florida for 29 electoral college votes\n", "Won Ohio for 18 electoral college votes\n", "Won Michigan for 16 electoral college votes\n", "Won Pennsylvania for 20 electoral college votes\n", "Won New Hampshire for 4 electoral college votes\n", "Won Maine CD2 for 1 electoral college votes\n", "Won North Carolina for 15 electoral college votes\n", "Won Georgia for 16 electoral college votes\n", "Won Colorado for 9 electoral college votes\n", "Won Nevada for 6 electoral college votes\n", "Won Arizona for 11 electoral college votes\n", "Won Iowa for 6 electoral college votes\n", "Won South Carolina for 9 electoral college votes\n", "Won Indiana for 11 electoral college votes\n", "Won Texas for 38 electoral college votes\n", "Won Missouri for 10 electoral college votes\n", "Won Utah for 6 electoral college votes\n", "Won Montana for 3 electoral college votes\n", "Won South Dakota for 3 electoral college votes\n", "Won Tennessee for 11 electoral college votes\n", "Won Alaska for 3 electoral college votes\n", "Won Kansas for 6 electoral college votes\n", "Won Nebraska CD2 for 1 electoral college votes\n", "Won Louisiana for 8 electoral college votes\n", "Won Mississippi for 6 electoral college votes\n", "Won Alabama for 9 electoral college votes\n", "Won Arkansas for 6 electoral college votes\n", "Won Kentucky 
for 8 electoral college votes\n", "Won Idaho for 4 electoral college votes\n", "Won Nebraska for 4 electoral college votes\n", "Won North Dakota for 3 electoral college votes\n", "Won Oklahoma for 7 electoral college votes\n", "Won West Virginia for 5 electoral college votes\n", "Won Wyoming for 3 electoral college votes\n", "\n", "Final Results: 315 electoral college votes\n" ] } ], "source": [ "ec = list(df.ec.values)\n", "states = list(df.state.values)\n", "cand = list(df['trump_rcp_prob'].values)\n", "# the draw now falls on [5, 100), so a state at or below 5% can never be won\n", "sim_election = np.random.uniform(low=0.05)*100\n", "ec_total = 0\n", "for x, y, z in zip(cand, states, ec):\n", "    if x > sim_election:\n", "        print('Won',y,'for',z,'electoral college votes')\n", "        ec_total += z\n", "print()\n", "print('Final Results:',ec_total,'electoral college votes')" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Won Michigan for 16 electoral college votes\n", "Won Pennsylvania for 20 electoral college votes\n", "Won New Hampshire for 4 electoral college votes\n", "Won Maine for 2 electoral college votes\n", "Won Virginia for 13 electoral college votes\n", "Won Colorado for 9 electoral college votes\n", "Won New Mexico for 5 electoral college votes\n", "Won Wisconsin for 10 electoral college votes\n", "Won Oregon for 7 electoral college votes\n", "Won Connecticut for 7 electoral college votes\n", "Won Minnesota for 10 electoral college votes\n", "Won Maine CD1 for 1 electoral college votes\n", "Won Illinois for 20 electoral college votes\n", "Won New Jersey for 14 electoral college votes\n", "Won Washington for 12 electoral college votes\n", "Won Rhode Island for 4 electoral college votes\n", "Won Delaware for 3 electoral college votes\n", "Won Massachusetts for 11 electoral college votes\n", "Won New York for 29 electoral college votes\n", "Won California for 55 electoral college votes\n", "Won District Of Columbia for 3 electoral college votes\n", "Won 
Hawaii for 4 electoral college votes\n", "Won Maryland for 10 electoral college votes\n", "Won Vermont for 3 electoral college votes\n", "\n", "Final Results: 272 electoral college votes\n" ] } ], "source": [ "ec = list(df.ec.values)\n", "states = list(df.state.values)\n", "cand = list(df['clinton_rcp_prob'].values)\n", "sim_election = np.random.uniform(low=0.05)*100\n", "ec_total = 0\n", "for x, y, z in zip(cand, states, ec):\n", "    if x > sim_election:\n", "        print('Won',y,'for',z,'electoral college votes')\n", "        ec_total += z\n", "print()\n", "print('Final Results:',ec_total,'electoral college votes')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Swing States" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will go back to the qualitative experts in order to determine swing states.\n", "![example14](https://raw.githubusercontent.com/ahoaglandnu/election/master/images/ex14.png)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>State</th><th>Cook</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>17</th><td>Michigan</td><td>Lean Dem.</td></tr>\n",
"<tr><th>19</th><td>Wisconsin</td><td>Lean Dem.</td></tr>\n",
"<tr><th>21</th><td>Pennsylvania</td><td>Lean Dem.</td></tr>\n",
"<tr><th>22</th><td>Colorado</td><td>Lean Dem.</td></tr>\n",
"<tr><th>23</th><td>New Hampshire</td><td>Lean Dem.</td></tr>\n",
"<tr><th>24</th><td>Nevada</td><td>Lean Dem.</td></tr>\n",
"<tr><th>27</th><td>Ohio</td><td>Lean Rep.</td></tr>\n",
"<tr><th>28</th><td>Iowa</td><td>Lean Rep.</td></tr>\n",
"<tr><th>30</th><td>Utah</td><td>Lean Rep.</td></tr>\n",
"<tr><th>33</th><td>Georgia</td><td>Lean Rep.</td></tr>\n",
"<tr><th>34</th><td>Arizona</td><td>Lean Rep.</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " State Cook\n", "17 Michigan Lean Dem.\n", "19 Wisconsin Lean Dem.\n", "21 Pennsylvania Lean Dem.\n", "22 Colorado Lean Dem.\n", "23 New Hampshire Lean Dem.\n", "24 Nevada Lean Dem.\n", "27 Ohio Lean Rep.\n", "28 Iowa Lean Rep.\n", "30 Utah Lean Rep.\n", "33 Georgia Lean Rep.\n", "34 Arizona Lean Rep." ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qual[qual['Cook'].str.startswith('Lea', na=False)]" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "<thead>\n", "<tr style=\"text-align: right;\"><th></th><th>State</th><th>Cook</th></tr>\n", "</thead>\n", "<tbody>\n",
"<tr><th>25</th><td>Florida</td><td>Tossup</td></tr>\n",
"<tr><th>26</th><td>North Carolina</td><td>Tossup</td></tr>\n",
"<tr><th>29</th><td>Maine (CD 2)*</td><td>Tossup</td></tr>\n",
"<tr><th>31</th><td>Nebraska (CD 2)*</td><td>Tossup</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>
" ], "text/plain": [ " State Cook\n", "25 Florida Tossup\n", "26 North Carolina Tossup\n", "29 Maine (CD 2)* Tossup\n", "31 Nebraska (CD 2)* Tossup" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "qual[qual['Cook'].str.startswith('Tos', na=False)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now have 13 swing states and two swing districts. The four toss ups will get their own simulation results independent of each other. For the remaining swing states, we can either group according to the qualitative experts or group according to the qualitative experts and geography. Combining the qualitative experts with state geography allows us to capture regional attitudes, interests, and opinions. For example, Utah and Arizona have different public opinion polling than Colorado and Nevada on the issue of legalized marijuana." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "swing_states = ['Michigan', 'Wisconsin', 'Pennsylvania', 'Colorado', 'New Hampshire',\n", " 'Nevada', 'Ohio', 'Iowa', 'Utah', 'Georgia', 'Arizona','Florida', 'North Carolina','Maine CD2','Nebraska CD2']" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [], "source": [ "lean_r_west = ['Utah','Arizona']\n", "lean_d_west = ['Colorado','Nevada']\n", "lean_r_lakes = ['Ohio', 'Iowa']\n", "lean_d_lakes = ['Michigan', 'Wisconsin', 'Pennsylvania',]\n", "tossups = ['Florida', 'North Carolina','Maine CD2','Nebraska CD2']" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Won Florida for 29 electoral college votes\n", "Won Maine for 2 electoral college votes\n", "Won Virginia for 13 electoral college votes\n", "Won Colorado for 9 electoral college votes\n", "Won Nevada for 6 electoral college votes\n", "Won New Mexico for 5 electoral college votes\n", "Won Wisconsin for 10 electoral college 
votes\n", "Won Oregon for 7 electoral college votes\n", "Won Connecticut for 7 electoral college votes\n", "Won Minnesota for 10 electoral college votes\n", "Won Maine CD1 for 1 electoral college votes\n", "Won Illinois for 20 electoral college votes\n", "Won New Jersey for 14 electoral college votes\n", "Won Washington for 12 electoral college votes\n", "Won Rhode Island for 4 electoral college votes\n", "Won Delaware for 3 electoral college votes\n", "Won Massachusetts for 11 electoral college votes\n", "Won New York for 29 electoral college votes\n", "Won California for 55 electoral college votes\n", "Won District Of Columbia for 3 electoral college votes\n", "Won Hawaii for 4 electoral college votes\n", "Won Maryland for 10 electoral college votes\n", "Won Vermont for 3 electoral college votes\n", "\n", "Final Results: 267 electoral college votes\n" ] } ], "source": [ "ec = list(df.ec.values)\n", "states = list(df.state.values)\n", "cand = list(df['clinton_rcp_prob'].values)\n", "sim_election = np.random.uniform(low=0.05)*100\n", "# one draw per regional group, shared by every state in that group; each tossup gets its own draw in the loop\n", "lean_r_west_sim = np.random.uniform()*100\n", "lean_d_west_sim = np.random.uniform()*100\n", "lean_r_lakes_sim = np.random.uniform()*100\n", "lean_d_lakes_sim = np.random.uniform()*100\n", "ec_total = 0\n", "for x, y, z in zip(cand, states, ec):\n", "    if y in swing_states:\n", "        if y in lean_r_west:\n", "            sim_election = lean_r_west_sim\n", "        if y in lean_d_west:\n", "            sim_election = lean_d_west_sim\n", "        if y in lean_r_lakes:\n", "            sim_election = lean_r_lakes_sim\n", "        if y in lean_d_lakes:\n", "            sim_election = lean_d_lakes_sim\n", "        if y in tossups:\n", "            sim_election = np.random.uniform()*100\n", "    if x > sim_election:\n", "        print('Won',y,'for',z,'electoral college votes')\n", "        ec_total += z\n", "print()\n", "print('Final Results:',ec_total,'electoral college votes')" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Won Florida for 29 electoral 
college votes\n", "Won Michigan for 16 electoral college votes\n", "Won Pennsylvania for 20 electoral college votes\n", "Won New Hampshire for 4 electoral college votes\n", "Won Arizona for 11 electoral college votes\n", "Won Wisconsin for 10 electoral college votes\n", "Won South Carolina for 9 electoral college votes\n", "Won Indiana for 11 electoral college votes\n", "Won Texas for 38 electoral college votes\n", "Won Missouri for 10 electoral college votes\n", "Won Utah for 6 electoral college votes\n", "Won Montana for 3 electoral college votes\n", "Won South Dakota for 3 electoral college votes\n", "Won Tennessee for 11 electoral college votes\n", "Won Alaska for 3 electoral college votes\n", "Won Kansas for 6 electoral college votes\n", "Won Nebraska CD2 for 1 electoral college votes\n", "Won Louisiana for 8 electoral college votes\n", "Won Mississippi for 6 electoral college votes\n", "Won Alabama for 9 electoral college votes\n", "Won Arkansas for 6 electoral college votes\n", "Won Kentucky for 8 electoral college votes\n", "Won Idaho for 4 electoral college votes\n", "Won Nebraska for 4 electoral college votes\n", "Won North Dakota for 3 electoral college votes\n", "Won Oklahoma for 7 electoral college votes\n", "Won West Virginia for 5 electoral college votes\n", "Won Wyoming for 3 electoral college votes\n", "\n", "Final Results: 254 electoral college votes\n" ] } ], "source": [ "ec = list(df.ec.values)\n", "states = list(df.state.values)\n", "cand = list(df['trump_rcp_prob'].values)\n", "ec_total = 0\n", "\n", "\n", "sim_election = np.random.uniform(low=0.05)*100\n", "lean_r_west_sim = np.random.uniform()*100\n", "lean_d_west_sim = np.random.uniform()*100\n", "lean_r_lakes_sim = np.random.uniform()*100\n", "lean_d_lakes_sim = np.random.uniform()*100\n", "for x, y, z in zip(cand, states, ec):\n", "    if y in swing_states:\n", "        if y in lean_r_west:\n", "            sim_election = lean_r_west_sim\n", "        if y in lean_d_west:\n", "            sim_election = lean_d_west_sim\n", "        if y in lean_r_lakes:\n", "            sim_election = lean_r_lakes_sim\n", "        if y in lean_d_lakes:\n", "            sim_election = lean_d_lakes_sim\n", "        if y in tossups:\n", "            sim_election = np.random.uniform()*100\n", "    if x > sim_election:\n", "        print('Won',y,'for',z,'electoral college votes')\n", "        ec_total += z\n", "print()\n", "print('Final Results:',ec_total,'electoral college votes')" ] }, {
"cell_type": "markdown", "metadata": {}, "source": [ "### Running 20,000 Simulations" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [], "source": [ "def electoral_college(ec, cand, states, sims=10):\n", "    # run `sims` simulated elections; return the win count, each run's electoral vote total, and each run's list of states won\n", "    cand_wins = 0\n", "    cand_ec_total = []\n", "    cand_states = []\n", "    for i in range(sims):\n", "        cand_ec = 0\n", "        cand_state = []\n", "        sim_election = np.random.uniform(low=0.05)*100\n", "        lean_r_west_sim = np.random.uniform()*100\n", "        lean_d_west_sim = np.random.uniform()*100\n", "        lean_r_lakes_sim = np.random.uniform()*100\n", "        lean_d_lakes_sim = np.random.uniform()*100\n", "        for x, y, z in zip(cand, states, ec):\n", "            # sim_election is overwritten for swing states and keeps that value for later states in the loop\n", "            if y in swing_states:\n", "                if y in lean_r_west:\n", "                    sim_election = lean_r_west_sim\n", "                if y in lean_d_west:\n", "                    sim_election = lean_d_west_sim\n", "                if y in lean_r_lakes:\n", "                    sim_election = lean_r_lakes_sim\n", "                if y in lean_d_lakes:\n", "                    sim_election = lean_d_lakes_sim\n", "                if y in tossups:\n", "                    sim_election = np.random.uniform()*100\n", "            if x > sim_election:\n", "                cand_ec += z\n", "                cand_state.append(y)\n", "        cand_ec_total.append(cand_ec)\n", "        cand_states.append(cand_state)\n", "        # 270 or more electoral votes wins the election\n", "        if cand_ec > 269:\n", "            cand_wins += 1\n", "    return cand_wins, cand_ec_total, cand_states" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Monte Carlo Simulation of Electoral College Results\n", "\n", "Clinton Win Prob: 65.945\n", "Electoral College Results: Clinton 279\n", "Sim Percent Outcome: 0.0198\n", "\n", "Trump Win Prob: 22.335\n", "Electoral College 
Results: Trump 235\n", "Sim Percent Outcome: 0.01955\n" ] } ], "source": [ "print(\"Monte Carlo Simulation of Electoral College Results\")\n", "print()\n", "sims = 20000\n", "ec = list(df.ec.values)\n", "states = list(df.state.values)\n", "cand_1 = list(df['clinton_rcp_prob'].values)\n", "cand_1_wins, cand_1_ec_totals, cand_1_states = electoral_college(ec, cand_1, states, sims=sims)\n", "print('Clinton Win Prob:', (cand_1_wins/sims)*100)\n", "for i,j in Counter(cand_1_ec_totals).most_common(n=1):\n", " print('Electoral College Results: Clinton',i)\n", " print('Sim Percent Outcome:',j/20000)\n", "print()\n", "cand_2 = list(df['trump_rcp_prob'].values) \n", "cand_2_wins, cand_2_ec_totals, cand_2_states = electoral_college(ec, cand_2, states, sims=sims)\n", "print('Trump Win Prob:', (cand_2_wins/sims)*100)\n", "for i,j in Counter(cand_2_ec_totals).most_common(n=1):\n", " print('Electoral College Results: Trump',i)\n", " print('Sim Percent Outcome:',j/20000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Top 5 Results for Each Candidate" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Electoral College Results: Clinton 279\n", "Sim Percent Outcome: 0.0198\n", "Electoral College Results: Clinton 303\n", "Sim Percent Outcome: 0.01715\n", "Electoral College Results: Clinton 288\n", "Sim Percent Outcome: 0.01685\n", "Electoral College Results: Clinton 294\n", "Sim Percent Outcome: 0.01485\n", "Electoral College Results: Clinton 287\n", "Sim Percent Outcome: 0.0143\n" ] } ], "source": [ "for i,j in Counter(cand_1_ec_totals).most_common(n=5):\n", " print('Electoral College Results: Clinton',i)\n", " print('Sim Percent Outcome:',j/20000)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Electoral College Results: Trump 235\n", "Sim Percent Outcome: 0.01955\n", "Electoral College 
Results: Trump 259\n", "Sim Percent Outcome: 0.01855\n", "Electoral College Results: Trump 250\n", "Sim Percent Outcome: 0.0175\n", "Electoral College Results: Trump 230\n", "Sim Percent Outcome: 0.0165\n", "Electoral College Results: Trump 219\n", "Sim Percent Outcome: 0.0163\n" ] } ], "source": [ "for i,j in Counter(cand_2_ec_totals).most_common(n=5):\n", " print('Electoral College Results: Trump',i)\n", " print('Sim Percent Outcome:',j/20000)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plt.figure(figsize=(18,8))\n", "plt.hist(cand_2_ec_totals,500)\n", "plt.hist(cand_1_ec_totals,500)\n", "plt.axvline(x=270, color='k', linestyle='dashed')\n", "plt.title('Simulation Results')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Actual 2016 Electoral College Results\n", "\n", "![final](https://raw.githubusercontent.com/ahoaglandnu/election/master/images/final.png)\n", "\n", "### Conclusion\n", "\n", "The simulation gives us more reasonable probabilities for each candidate but does not overcorrect to replicate the 2016 Electoral College results. The polling errors in Wisconsin, Pennsylvania, and Michigan would have to have been identified **prior** to running simulations when assigning probabilities to each state. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }