This page was last updated on October 09, 2019.


Getting started

In this tutorial we will learn about the following:

  • The Binomial distribution
  • Conducting a Binomial hypothesis test
  • Learn the proper concluding statement for a Binomial test
  • Using a Confidence-interval approach to hypothesis testing

NOTE: In order to ensure we all get the same results when using random sampling routines, please run the following line of code every time you open an R/RStudio session (and disregard the warning it gives you):

## Warning in RNGkind("Mersenne-Twister", "Inversion", "Rounding"): non-
## uniform 'Rounding' sampler used

Required packages

  • the tigerstats package
library(tigerstats)

Required data

Import the bumpus.csv data set:

Let’s inspect the bumpus dataset, which contains data about a sparrow population caught in a wind storm in the 1880’s. Pay particular attention to the categorical variable sex, which is stored as type “factor”:

## 
## categorical variables:  
##       name   class levels   n missing
## 1      sex  factor      2 136       0
## 2      age  factor      3 136       0
## 3 survival logical      2 136       0
##                                    distribution
## 1 m (64%), f (36%)                             
## 2 a (43.4%),  (36%), y (20.6%)                 
## 3 TRUE (52.9%), FALSE (47.1%)                  
## 
## quantitative variables:  
##                        name   class     min        Q1  median        Q3
## 1              BumpusNumber integer   1.000  17.75000  34.500  51.25000
## 2          Total.length.mm. integer 152.000 157.00000 160.000 162.00000
## 3           alar.extent.mm. integer 230.000 242.00000 246.000 249.00000
## 4                 weight.g. numeric  22.600  24.57500  25.550  26.50000
## 5  length.beak.and.head.mm. numeric  29.800  31.10000  31.600  32.02500
## 6        length.humerus.in. numeric   0.659   0.71775   0.733   0.74825
## 7          length.femur.in. numeric   0.653   0.70175   0.713   0.73125
## 8    length.tibiotarsus.in. numeric   1.011   1.11200   1.133   1.16225
## 9           skull.width.in. numeric   0.551   0.59200   0.602   0.61100
## 10          keel.length.in. numeric   0.734   0.80900   0.841   0.86525
##        max        mean          sd   n missing
## 1   72.000  34.6176471 19.90349265 136       0
## 2  167.000 159.5441176  3.56083146 136       0
## 3  256.000 245.1911765  5.52102340 136       0
## 4   31.000  25.5250000  1.47521499 136       0
## 5   33.400  31.5742647  0.70237653 136       0
## 6    0.780   0.7319412  0.02307821 136       0
## 7    0.767   0.7130294  0.02411329 136       0
## 8    1.230   1.1335662  0.04074450 136       0
## 9    0.640   0.6024779  0.01499579 136       0
## 10   0.927   0.8399338  0.03964924 136       0

Note in the inspection above that of the 136 individual birds, 87 (64%) were male.


The binomial distribution

The binomial distribution applies to cases in which we conduct a set number of independent random trials, and for each trial there is an equal probability of a success.

Let’s say there are n trials and the probability of success is p. Of those n trials, anywhere between 0 and n of them could produce an outcome “success” and the rest will be failures. Let’s call the number of successes X, and thus the probability of X successes is:


Simulating a random trial

Here we will simulate rolling a fair, 6-sided die to illustrate what a binomial distribution is.

First let’s specify the number of random trials (rolls of the die) to do:

Now let’s create a vector object that includes all the possible outcomes associated with a single random trial; here, this is the numbers one through six:

Now let’s specify what a “success” is; here, rolling a “4” will be considered a success, which we know will have an associated probability of 1/6:

With these variables specified above, we now have the ingredients to conduct a simulation to show what a binomial distribution looks like.

First, let’s roll our die once. We use that handy command sample that you learned about in the Simulate random trials tutorial, remembering to first set the seed to ensure reproducible results:

## [1] 5

What number did you get? You should have gotten a 5.

Now let’s devise a way to keep track of whether the outcome was a success or not. First, let’s ask whether the outcome is equal to our definition of a success, i.e. whether it equals 4:

## [1] FALSE

This time we get the logical value “FALSE”.

A handy feature of R is that it considers a value of “TRUE” as equal to “1”, and a value of “FALSE” as equal to “0”.

Thus, the code below will return a “0” if the roll of the die did not produce a 4, and a “1” if it did!

## [1] 0

Here we got a zero.

Let’s change the arguments to the “sample” command so that it mimics rolling the single die number.of.trials times (defined above), or equivalently, mimics rolling n independent dice simultaneously. To do this, we need to specify the argument replace = TRUE, which tells it to put the number it sampled back into the list, for each random trial.

## [1] 5 2 4 6 6

What 5 numbers did you get? How many were a “success” (= 4)?

You can keep track of the number of successes all in one go by nesting the sample routine within the sum command as follows:

## [1] 0

Make sure you understand what the output is telling you: it is the number of trials (rolls of a die) out of 5 that yielded a “success”; or equivalently, the number of die out of the 5 die rolled simultaneously that yielded a “success”. Typically we represent this number as “X”, as in the formula for the binomial distribution shown above.


  1. Conduct your own random trial

Set the seed with number 12, and roll a fair, six-sided die 12 times. Consider a success to be rolling a number 2.

  • How many times should you expect to roll a 2?
  • In your random trial, how many times did you roll a 2?

Binomial distribution functions in R

It turns out there’s a handy function dbinom that will calculate the exact probability associated with any particular outcome for a random trial with a given sample space (set of outcomes) and probability of success. It uses the equation shown above.

?dbinom

So, for our example involving rolling a fair, 6-sided die n = 6 times (six random trials), and a probability (p) of observing a 4 of 1/6 (0.1667), we can calculate, for example, the probability of observing two successes (i.e. two number 4s) in our 6 rolls of the die (random trials):

## [1] 0.2009388

In order to get the probabilities associated with each possible outcome (i.e. 0 through 6 successes), we use the following code:

## [1] 3.348980e-01 4.018776e-01 2.009388e-01 5.358368e-02 8.037551e-03
## [6] 6.430041e-04 2.143347e-05

The dbinom function will accept a vector of values of x, for which the associated probabilities are calculated.

Now let’s use these exact probabilities to create a barplot showing an exact, discrete probability distribution, corresponding to the binomial distribution with a sample size (number of trials) of n = 6 and a probability of success p = 1/6:

Figure 1: Probability of obtaining X successes out of 6 random trials, with probability of success = 1/6.

Figure 1: Probability of obtaining X successes out of 6 random trials, with probability of success = 1/6.

The names.arg argument in the preceding barplot code provides the labels for each bar.

Now let’s increase the sample size to n = 15, and compare the distribution to that observed using n = 6:

Now plot the binomial probability distribution:

Figure 2: Probability of obtaining X successes out of 15 random trials, with probability of success = 1/6.

Figure 2: Probability of obtaining X successes out of 15 random trials, with probability of success = 1/6.


  1. Challenge: Binomial probabilities
  • Use the dbinom function to calculate the probability of rolling three “2”s when rolling a fair six-sided die 20 times.
  • Produce a graph of a discrete probability distribution for this scenario: p = 1/4, and n = 12.

Binomial hypothesis test

For this example we’ll use the bumpus dataset we imported above. It contains data about sparrows caught in a wind storm in the 1880’s.

Let’s assume that, as with many animal species, male sparrows tend to move around more than females. Moving around more makes males more susceptible to predation, and to events such as wind storms. Let’s also assume that in the general population of sparrows, males represent 50% of the population at any given time.

Based on the available data, did the wind storm catch males more than females?


Steps to hypothesis testing

  • State the null (H0) and alternative (HA) hypotheses
  • Set an \(\alpha\) level (usually 0.05)
  • Determine whether a one-tailed or two-tailed test is appropriate (usually the latter)
  • Identify the appropriate test statistic, and calculate the test statistic value using observed data

  • Simulate data and use them to construct a null distribution for the test statistic, under the assumption of a true null hypothesis
    OR: Instead of simulating data to construct a null distribution, use an appropriate statistical test, which provides a theoretical null distribution

  • Evaluate any assumptions associated the statistical test
  • Use the null distribution to determine the P-value associated with the observed test statistic
  • Draw the appropriate conclusion and communicate it clearly


  • Formulate hypotheses:

Let’s devise an appropriate null and alternative hypothesis for this question.

TIP: Recall that “probability” and “proportion” are related: the probability of an individual randomly drawn from the population belonging to the particular category of interest is equal to the proportion of individuals in the population that belong to that category.

H0: Males and females were caught by the storm in equal proportions (p = 0.5).
HA: Males and females were not caught by the storm in equal proportions (p \(\ne\) 0.5).

  • We’ll use an \(\alpha\) level of 0.05.

  • It is a two-tailed alternative hypothesis; even though our research hypothesis suggests that males would have been more susceptible to being caught by the storm than females, we must allow for the possibility that they were less susceptible than females.

  • The test statistic is the number of males caught (successes) (which easily converts to a proportion)

  • We’ll use a binomial test to test the null hypothesis, because this is the most powerful test when analyzing frequency data pertaining to a categorical variable (eg. sex) that has only 2 categories (male / female).

  • The binomial test assumes that the random trials (samples) were independent, and the probability of success was equal in each trial - we’ll assume so

  • Imagine what the sampling distribution for this test statistic would look like, given a TRUE null hypothesis: it would be centred on 0.5 times the total number individuals in the sample (here, 136), so 68, and would also be symmetric around this value (because our null value for p is 0.5). We then use this sampling distribution as the reference against which we compare our observed number of males caught.

In practice, however, we use the binomial.test function to do this comparison for us (see below).

  • Our observed test statistic, or number of males caught, is 87, as shown here:
## [1] 87

or using this code, which shows the counts of both the successes (“male”) and failures (“females”):

## 
##  f  m 
## 49 87

This represents our number of successes X, and corresponds to a \(\hat{p}\) = 87/136 = 0.6397. This observed proportion is certainly higher than the expected proportion of 0.5, but is it significantly higher?

  • Our observed number of “random trials” (our total sample size) is 136, which represents our value of n.

Based on the fact that males represent 50% of the general sparrow population, the correct null value to use for p is 0.5.

We now have all we need to calculate the P-value associated with our test statistic, using the binomial test.

The binom.test function in the tigerstats package (in turn taken from the mosaic package) is a modified version of the function by the same name in the base package:

?binom.test  # select the function associated with the "mosaic" package

You’ll see that there is an argument alternative, and that the first (default) value for this argument is “two.sided”. This default ensures that the P-value calculated from the binomial distribution is appropriately two-sided.

Now let’s conduct the test:

## 
##  Exact binomial test (Agresti-Coull CI)
## 
## data:  bumpus$sex  [with success = m]
## number of successes = 87, number of trials = 136, p-value =
## 0.001419
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.5561159 0.7156204
## sample estimates:
## probability of success 
##              0.6397059

Alternative syntax for the binom.test function:

binom.result <- binom.test(x = 87, n = 136, p = 0.5, alternative = "two.sided", ci.method = "Agresti-Coull")
binom.result

The output from the binomial test includes the number of successes, the number of trials, and the calculated P-value. It also includes the appropriate Agresti-Coull 95% confidence interval.

## [1] "statistic"   "parameter"   "p.value"     "conf.int"    "estimate"   
## [6] "null.value"  "alternative" "method"      "data.name"

NOTE: the confidence interval provided in the binom.test in the base package (as opposed to the one in the tigerstats package, wich we use here) is not the correct Agresti-Coull interval!

ALWAYS use the Agresti-Coull method to calculate a confidence interval for a proportion, as shown in the Estimating proportions tutorial.

Now we have all the ingredients for a proper concluding statement.

Ideal concluding statement

This is an example of an appropriate concluding statement for a binomial test:

Male sparrows were caught by the storm at a significantly higher proportion than females (n = 136; observed proportion of males = 0.64; Binomial test; P-value = 0.001; Agresti-Coull 95% confidence interval: 0.556 \(\leq\) p \(\leq\) 0.716.


Confidence interval approach to hypothesis testing

In the preceding example, we calculated the Agresti-Coull 95% confidence interval as: 0.556 \(\leq\) p \(\leq\) 0.716.

Given that the interval excludes (does not encompass) the null hypothesized proportion of 0.5, we can reject the null hypothesis.

In this case, the appropriate concluding statement would be:

Male sparrows were caught by the storm at a significantly higher proportion than 0.5 (n = 136; observed proportion of males = 0.64; Agresti-Coull 95% confidence interval: 0.556 \(\leq\) p \(\leq\) 0.716).


NOTE: The binomtestGC function available in the tigerstats can be used to conduct a binomial test, but the 95% confidence interval that it calculates is not the appropriate Agresti-Coull interval. We therefore do not use this function.


  1. Binomial hypothesis test practice

In the Hypothesis testing tutorial, Activity 1 asked you to conduct all the steps of a hypothesis test for Example 6.2 in the text book (concerning handedness in toads) using a simulation approach: you were to simulate data to construct a null distribution, with which a P-value could be calculated.

Using the present tutorial as a guide, repeat this activity, but this time use a binomial test to calculate an exact probability (as shown above) on Example 6.2 in the text book (concerning handedness in toads). Be sure to include all the steps of a hypothesis test.


List of functions

Getting started:

  • library

Data frame structure:

  • str
  • inspect

Tabulation:

  • xtabs

Simulation:

  • set.seed
  • sample
  • do
  • seq

Math:

  • sum

Graphs:

  • barplot

Binomial distribution:

  • binom.confint (from the binom package)
  • binom.test (from the binom package)