---
date: 1603748777
title: How to evaluate probabilities
---

::: {.epigraph}
Oh! let us never, never doubt\
What nobody is sure about!
:::

The 2016 presidential election was an ignominious moment for pollsters, who overwhelmingly favored Clinton to the last, even as exit poll data belied their forecasts. Even *FiveThirtyEight*, one of the [most](https://fivethirtyeight.com/features/trump-is-just-a-normal-polling-error-behind-clinton/) [generous](https://fivethirtyeight.com/features/election-update-dont-ignore-the-polls-clinton-leads-but-its-a-close-race/) toward Trump's chances, gave him only 5 to 2 odds against.

![](images/fae80910ea76a15a31143562483592e10792bb55.png)

Of course, probabilities are [not](https://xkcd.com/2370/) certainties. If I assure you that a die has only a 17% chance of coming up 6 (which, if it's fair, it does), you can't call me a liar if you roll it once and get a 6. A presidential election is a massive, complex die roll that, for political reasons, can't be repeated 10,000 times to generate a sampling distribution. In this light, it's impossible to say whether *FiveThirtyEight*'s probabilities were accurate or not---you would have to run the election many times in parallel universes, and see whether Trump was victorious in two out of every seven of them.

We can get some sense, though, by looking at the state-level predictions. This gives us 56 data points (50 states, Washington D.C., and the five independently awarded districts of [Maine and Nebraska](https://www.270towin.com/content/split-electoral-votes-maine-and-nebraska/)), which we can treat, roughly, as 56 separate races.

In general, if the pollsters are accurate, Clinton should win about 20% of the races which they gave her a 20% chance of winning. It may seem like Trump should sweep these races---20% is low, after all. But giving a candidate 20% rather than 0% captures something about your certainty---about how surprised you'd be if the candidate won.
For example, *FiveThirtyEight* gave Clinton 0% in Alabama, but 20% in Georgia---and, indeed, Georgia saw a much closer race.[^1] Our intuitions should bear this out: if Clinton had won Georgia, it would be an upset; if she had won Alabama, it would be a refactoring of American politics as we know it.

By binning together state-level predictions, we can gauge whether more or less certainty was warranted, and visualize the results with a *calibration plot* (@fig:538).

![Percent of state races won by probability projection (Maine and Nebraska are counted by individual district). Error bars show 95% confidence intervals. Source: [*FiveThirtyEight*](https://projects.fivethirtyeight.com/2016-election-forecast/)\ \ Clinton win\ Trump win\ Gold standard\ ](images/b6969e272a983223da2528fb2520b7fe50611157.svg){#fig:538}

A line above the gold standard indicates underconfidence; below it, overconfidence. The error bars show 95% confidence intervals, which are fairly large toward the middle because so few states were predicted to be swing states. Nonetheless, it is clear that Clinton was given overly favorable odds to win.

*FiveThirtyEight* computes its overall probabilities by running election simulations based on weighted aggregations of third-party polls. If they had placed just a little less credence in Clinton, and a little more in Trump, they might have given Trump much better odds.
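To make the binning concrete, here is a minimal sketch of how a calibration table could be computed. It is not the code behind @fig:538: the `forecasts` list is made-up illustrative data, the helper names are hypothetical, and the Wilson score interval is only one assumed way of drawing 95% error bars.

```python
import math

def wilson_interval(wins, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = wins / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

def calibration_table(forecasts, n_bins=5):
    """Bin (probability, won) pairs and report the win rate in each bin."""
    bins = [[0, 0] for _ in range(n_bins)]  # [wins, races] per bin
    for prob, won in forecasts:
        i = min(int(prob * n_bins), n_bins - 1)  # a forecast of 100% joins the top bin
        bins[i][0] += int(won)
        bins[i][1] += 1
    rows = []
    for i, (wins, n) in enumerate(bins):
        if n == 0:
            continue  # skip empty bins rather than divide by zero
        ci_lo, ci_hi = wilson_interval(wins, n)
        rows.append((i / n_bins, (i + 1) / n_bins, wins / n, ci_lo, ci_hi, n))
    return rows

# Made-up forecasts, NOT the real 2016 numbers: (Clinton probability, Clinton won?)
forecasts = [(0.05, False), (0.10, False), (0.45, True), (0.55, False),
             (0.835, False), (0.95, True), (0.99, True)]

rows = calibration_table(forecasts)
for lo_edge, hi_edge, rate, ci_lo, ci_hi, n in rows:
    print(f"{lo_edge:.0%}-{hi_edge:.0%}: won {rate:.0%} of {n} "
          f"(95% CI {ci_lo:.0%} to {ci_hi:.0%})")
```

With only a handful of races per bin, the intervals come out very wide, which is the same effect that makes the error bars balloon toward the middle of the plot.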
Here's the same data in tabular form: Clinton probability Clinton won States/districts --------------------- ------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0--19% 0% [AL]{.t title="Alabama"} [AR]{.t title="Arkansas"} [ID]{.t title="Idaho"} [IN]{.t title="Indiana"} [KS]{.t title="Kansas"} [KY]{.t title="Kentucky"} [LA]{.t title="Louisiana"} [MO]{.t title="Missouri"} [MS]{.t title="Mississippi"} [MT]{.t title="Montana"} [ND]{.t title="North Dakota"} [NE]{.t title="Nebraska"} [NE1]{.t title="Nebraska, first district"} [NE3]{.t title="Nebraska, third district"} [OK]{.t title="Oklahoma"} [SC]{.t title="South Carolina"} [SD]{.t title="South Dakota"} [TN]{.t title="Tennessee"} [TX]{.t title="Texas"} [UT]{.t title="Utah"} [WV]{.t title="West Virginia"} [WY]{.t title="Wyoming"} 20--39% 0% [AK]{.t title="Alaska"} [AZ]{.t title="Arizona"} [GA]{.t title="Georgia"} [IA]{.t title="Iowa"} [OH]{.t title="Ohio"} 40--59% 20% [FL]{.t title="Florida"} [ME2]{.t title="Maine, second district"} [NC]{.t title="North Carolina"} [NE2]{.t title="Nebraska, second district"} [NV]{.c title="Nevada"} 60--79% 50% [CO]{.c title="Colorado"} [MI]{.t title="Michigan"} [NH]{.c title="New Hampshire"} [PA]{.t title="Pennsylvania"} 80--100% 95% [CA]{.c title="California"} [CT]{.c title="Connecticut"} [DC]{.c title="District of Columbia"} [DE]{.c title="Delaware"} [HI]{.c title="Hawaii"} [IL]{.c title="Illinois"} [MA]{.c 
title="Massachusetts"} [MD]{.c title="Maryland"} [ME]{.c title="Maine"} [ME1]{.c title="Maine, first district"} [MN]{.c title="Minnesota"} [NJ]{.c title="New Jersey"} [NM]{.c title="New Mexico"} [NY]{.c title="New York"} [OR]{.c title="Oregon"} [RI]{.c title="Rhode Island"} [VA]{.c title="Virginia"} [VT]{.c title="Vermont"} [WA]{.c title="Washington"} [WI]{.t title="Wisconsin"} : Binned *FiveThirtyEight* predictions. Hover over an abbreviation to see the full state or district name. {\#tbl:538} States with probabilities below 20% or above 80% were all won by Trump and Clinton, respectively, except Wisconsin, which Clinton was given an 83.5% chance of winning, but which she ended up losing by a three-quarter-point margin. This may seem like a glaring blunder, but remember: these are probabilities. Clinton won 95% of these states, which is pretty close to the stated uncertainty. It is entirely possible that if the election were run many times, Clinton would indeed win Wisconsin 5 out of 6 times. The real trouble comes in the 0--59% range: for instance, in the five polities in which Clinton was given between a 40 and 59% chance, she won only one of the five---20%. In these areas, *FiveThirtyEight* was underconfident in Trump. As mentioned above, these probabilities come from polling data. An idealized version is shown in @fig:normal: given a population of 100,000 voters, in which 55% intend to vote for Clinton, you can take 10 polls of 25 people and get different results each time. ![Left: results of ten perfectly random polls of a population of 100,000 in which 55% are Clinton voters.[^2] Right: a possible extrapolation of poll results to a sampling distribution.](images/5c16af09d1f49c97524c29e0278e3f6d37b09a9e.svg){#fig:normal} If we kept drawing perfectly random samples from this population, we [would end up](https://en.wikipedia.org/wiki/Central_limit_theorem) in the limit with a normal distribution, whose mean was the population mean. 
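The convergence described above can be checked with a quick simulation. This is a sketch under stated assumptions rather than the code behind @fig:normal: the number of simulated polls and the seed are arbitrary, the names are hypothetical, and each respondent is drawn independently (as if from an endless population), which for 25 people out of 100,000 is nearly the same as sampling without replacement.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

P_CLINTON = 0.55   # population share, from the example above
POLL_SIZE = 25     # respondents per poll, from the example above
N_POLLS = 10_000   # assumption: how many polls to simulate

def poll_share(rng, p, size):
    """Share of Clinton voters in one simulated poll of `size` respondents."""
    return sum(rng.random() < p for _ in range(size)) / size

shares = [poll_share(rng, P_CLINTON, POLL_SIZE) for _ in range(N_POLLS)]

mean_share = statistics.fmean(shares)    # should sit near the population mean
std_share = statistics.pstdev(shares)    # spread of the poll results
theory_std = math.sqrt(P_CLINTON * (1 - P_CLINTON) / POLL_SIZE)  # binomial standard error

print(f"mean of poll results: {mean_share:.3f}")
print(f"spread of poll results: {std_share:.3f} (theory: {theory_std:.3f})")
```

The average of the poll results settles near the population share, while the spread is governed by the poll size: quadrupling `POLL_SIZE` roughly halves it.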
One way to predict the results of the election---itself a poll like any other---would be to randomly draw them from the area under this bell curve. This gives Clinton an 84% chance and Trump a 16% chance (actually a gross overestimate of Trump's chances, as the voting population would be much larger than 25).

Of course, no poll can achieve a perfectly random sample. It might sample one demographic more heavily than another, fail to distinguish voters from nonvoters, or simply be [unlucky](https://fivethirtyeight.com/features/heres-proof-some-pollsters-are-putting-a-thumb-on-the-scale/). For whatever reason, errors will creep in, and may shift the results of *every* poll one way or the other; per [Nate Silver](https://fivethirtyeight.com/features/why-fivethirtyeight-gave-trump-a-better-chance-than-almost-anyone-else/):

> \[P\]olling errors are correlated. No matter how many polls you have
> in a state, it's often the case that all or most of them miss in the
> same direction. Furthermore, if the polls miss in one direction in one
> state, they often also miss in the same direction in other states,
> especially if those states are similar demographically.

Since these are not independent probabilities, then, @fig:538 is technically a little unfair to *FiveThirtyEight*.[^3] The errors inherent in the methods of the polls they used compounded, and threw off their probability calculations.

So as not to pick on *FiveThirtyEight* too much, let's look at what the *New York Times* predicted, as a calibration plot:

![Percent of state races won by probability projection (Maine and Nebraska are counted by individual district). Error bars show 95% confidence intervals.
Source: [*The New York Times*](https://www.nytimes.com/interactive/2016/upshot/presidential-polls-forecast.html)\ \ Clinton win\ Trump win\ Gold standard\ ](images/b5abf744ecd7f42fc8fb94598b73c1fbc00175b6.svg){#fig:nyt} and as a table: Clinton probability Clinton won States/districts --------------------- ------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0--19% 0% [AK]{.t title="Alaska"} [AL]{.t title="Alabama"} [AR]{.t title="Arkansas"} [AZ]{.t title="Arizona"} [GA]{.t title="Georgia"} [ID]{.t title="Idaho"} [IN]{.t title="Indiana"} [KS]{.t title="Kansas"} [KY]{.t title="Kentucky"} [LA]{.t title="Louisiana"} [MO]{.t title="Missouri"} [MS]{.t title="Mississippi"} [MT]{.t title="Montana"} [ND]{.t title="North Dakota"} [NE]{.t title="Nebraska"} [NE1]{.t title="Nebraska, first district"} [NE2]{.t title="Nebraska, second district"} [NE3]{.t title="Nebraska, third district"} [OK]{.t title="Oklahoma"} [SC]{.t title="South Carolina"} [SD]{.t title="South Dakota"} [TN]{.t title="Tennessee"} [TX]{.t title="Texas"} [WV]{.t title="West Virginia"} [WY]{.t title="Wyoming"} 20--39% 0% [IA]{.t title="Iowa"} [ME2]{.t title="Maine, second district"} [NC]{.t title="North Carolina"} [UT]{.t title="Utah"} 40--59% 0% [OH]{.t title="Ohio"} 60--79% 67% [FL]{.t title="Florida"} [NH]{.c title="New Hampshire"} [NV]{.c 
title="Nevada"} 80--100% 87% [CA]{.c title="California"} [CO]{.c title="Colorado"} [CT]{.c title="Connecticut"} [DC]{.c title="District of Columbia"} [DE]{.c title="Delaware"} [HI]{.c title="Hawaii"} [IL]{.c title="Illinois"} [MA]{.c title="Massachusetts"} [MD]{.c title="Maryland"} [ME]{.c title="Maine"} [ME1]{.c title="Maine, first district"} [MI]{.t title="Michigan"} [MN]{.c title="Minnesota"} [NJ]{.c title="New Jersey"} [NM]{.c title="New Mexico"} [NY]{.c title="New York"} [OR]{.c title="Oregon"} [PA]{.t title="Pennsylvania"} [RI]{.c title="Rhode Island"} [VA]{.c title="Virginia"} [VT]{.c title="Vermont"} [WA]{.c title="Washington"} [WI]{.t title="Wisconsin"} : Binned *New York Times* predictions. Hover over an abbreviation to see the full state or district name. {\#tbl:nyt} We see the same pattern here as with *FiveThirtyEight*. The *Times* was right on the mark in assigning probabilities to Clinton above 60%. It erred in giving her odds too generous below that mark---Clinton lost each of the thirty polities in which the *Times* gave her up to a 60% chance of winning, where realistically, at these levels, she should have at least snagged a few. The problem was not a failure to predict the pivots of the historically blue[^4] Michigan and Wisconsin, but a gross overestimate of Clinton's chances in states like North Carolina and Utah. The main thing communicated by probabilities is information about certainty and uncertainty. If the weather app shows a 10% chance of rain and it rains anyway, you might be irritated, but whether the app was *wrong* depends on how often it rains given such a forecast. If it is more than one in ten times, that's when you might want to change apps---or start bringing an umbrella. 
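For the weather app, "more than one in ten" can be made a little more precise by asking how surprising the rainy days would be if the 10% forecasts were honest. The sketch below uses an exact binomial tail; the counts (40 forecast days, 9 of them rainy) are hypothetical, as is the function name, and treating days as independent is an assumption that real weather only roughly satisfies.

```python
from math import comb

def tail_probability(hits, n, p):
    """P(X >= hits) for X ~ Binomial(n, p): the chance of seeing at least
    `hits` rainy days out of `n` if each really had probability `p`."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(hits, n + 1))

# Hypothetical record: the app said "10% chance of rain" on 40 days,
# and it rained on 9 of them (4 would be expected).
n_days, rainy_days, forecast = 40, 9, 0.10

p_value = tail_probability(rainy_days, n_days, forecast)
print(f"Chance of {rainy_days}+ rainy days out of {n_days} "
      f"if the forecast were calibrated: {p_value:.3f}")
```

A small value here suggests the app is overconfident about dry days; a value near 1 would hint at the opposite problem, too many 10% forecasts followed by no rain at all.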
[^1]: To wit, excluding third parties: ![](images/7c7eb48455e5435befd0b1e7bd38ee434ae84b92.svg)

[^2]: ``` {.python}
    import numpy as np

    # Population to sample
    # (0=Trump, 1=Clinton)
    pop = np.random.binomial(
        n=1, p=0.55, size=100_000)

    # Draw 10 samples of 25
    polls = [
        np.random.choice(
            pop, size=25, replace=False
        ).mean()
        for _ in range(10)
    ]
    ```

[^3]: It also technically double-counts the probabilities in Maine and Nebraska, in which the statewide predictions are based on the district predictions.

[^4]: Well, since '92