This page was last updated on September 18, 2019.


Getting started

In this tutorial you will learn how to:

  • simulate random trials

Required packages

  • the tigerstats package
library(tigerstats)

Required data

  • none

Simulate random trials

Rolling a 6-sided dice

If you roll a fair six-sided dice once, you have a 1 in 6 chance of rolling any one of the possible outcomes, i.e. numbers 1 through 6.

Let’s define our random trial as the rolling of a fair 6-sided dice, and rolling the number 2 as our event of interest. The probability of rolling a 2 is of course 1/6, or approximately 0.1667.

Let’s simulate this random trial in R. We’ll use the sample function you learned about in a previous tutorial.

Recall that whenever we implement some procedure in R that requires the use of a random draw or something similar, we first need to use the set.seed function to ensure we all get the same results (i.e. we can achieve “computational reproducibility”):
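As a minimal sketch of what set.seed does, the code below draws the same three random numbers twice, resetting the seed before each draw. The seed value 42 and the object names first.draws and second.draws are arbitrary choices for this illustration, not values used elsewhere in this tutorial.

```r
# Set the seed, then draw three random numbers between 1 and 6
set.seed(42)  # arbitrary seed, chosen only for this illustration
first.draws <- sample(1:6, size = 3, replace = TRUE)

# Reset the SAME seed and repeat the draw
set.seed(42)
second.draws <- sample(1:6, size = 3, replace = TRUE)

# Because the seed was identical, the two sets of draws match exactly
identical(first.draws, second.draws)  # TRUE
```

Anyone who runs this code with the same seed (and the same version of R) should get the same draws, which is what makes the analysis reproducible.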

Here we simulate one roll of a 6-sided dice.

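Recall: the code 1:6 produces a vector of the integers 1 through 6, and the replace = F argument (F is shorthand for FALSE) tells R to “sample without replacement”, as described in a previous tutorial. Because we draw only one number here, sampling with or without replacement gives the same result.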
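A single roll might be coded as in the sketch below. The seed value 12 and the object name one.roll are assumptions made for illustration; the seed used to produce the output shown on this page is not stated, so the number you get from this sketch may differ from the one shown.

```r
set.seed(12)  # hypothetical seed; the page's actual seed is not stated

# Draw one number at random from the integers 1 through 6
one.roll <- sample(1:6, size = 1, replace = FALSE)  # FALSE is the same as F
one.roll
```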

## [1] 6

You should see the number 6.

Recall:

The probability of an event is the proportion of times the event would occur if we repeated a random trial over and over again under the same conditions.

If we were to roll the dice many thousands of times, we should expect to roll the number 2 approximately a sixth of the time, i.e. with a proportion close to 1/6, or about 0.1667.

Let’s simulate this process, by rolling a dice 10000 times, and storing the outcome of each of these 10000 random trials in an object called many.rolls.

We’ll use the do function from the mosaic package (which is automatically loaded as part of the tigerstats package), which you first learned about in the preceding tutorial.

Recall that the do function simply tells R to repeat whatever is after the *. Here we ask it to repeat the sampling process 10000 times.

Let’s also look at the structure of the resulting object using the str function:
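The two steps just described could be sketched as follows. The seed value is a hypothetical choice, so the individual rolls will not necessarily match the output shown below, and the name of the resulting column may depend on the version of the mosaic package you have installed.

```r
library(mosaic)  # loaded automatically when you run library(tigerstats)

set.seed(12)  # hypothetical seed; the page's actual seed is not stated

# Repeat a single roll of the dice 10000 times and store the outcomes
many.rolls <- do(10000) * sample(1:6, size = 1, replace = FALSE)

# Inspect the structure of the resulting object
str(many.rolls)
```

Each repetition is a separate call to sample, so the 10000 rolls are independent of one another even though each call samples without replacement.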

## Classes 'do.data.frame' and 'data.frame':    10000 obs. of  1 variable:
##  $ sample: int  2 4 2 3 6 3 1 3 6 4 ...
##  - attr(*, "lazy")=Class 'formula'  language ~sample(1:6, size = 1, replace = FALSE)
##   .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
##  - attr(*, "culler")=function (object, ...)

We see that the repeated sample procedure, when assigned to an object, produced a data frame (we called it many.rolls) that includes one integer variable called sample.

Now let’s tabulate the results: how many times was each of the numbers 1 through 6 rolled, out of 10000 trials?
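One way to tabulate the rolls is with the xtabs function, as in the sketch below. So that the sketch runs on its own, it builds a stand-in for many.rolls using base R only, and this stand-in is an assumption for illustration: a single call to sample with replace = TRUE yields 10000 independent rolls. The seed value and the object name roll.counts are likewise hypothetical, so the counts will differ from those shown below.

```r
set.seed(12)  # hypothetical seed

# Stand-in for many.rolls: 10000 independent rolls in one call to sample.
# replace = TRUE is required here because we draw more than 6 numbers.
many.rolls <- data.frame(sample = sample(1:6, size = 10000, replace = TRUE))

# Count how many times each of the numbers 1 through 6 was rolled
roll.counts <- xtabs(~ sample, data = many.rolls)
roll.counts
```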

## sample
##    1    2    3    4    5    6 
## 1654 1723 1712 1682 1656 1573

We rolled the number two 1723 times out of 10000.

Now let’s calculate the relative frequencies by dividing these frequencies by the total number, i.e. 10000:
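The arithmetic can be checked directly with the frequencies reported above. In this sketch the counts are typed in by hand rather than taken from many.rolls, and the object names roll.counts and roll.props are assumptions for illustration.

```r
# Frequencies of each outcome, copied from the table above
roll.counts <- c("1" = 1654, "2" = 1723, "3" = 1712,
                 "4" = 1682, "5" = 1656, "6" = 1573)

# The counts add up to the total number of trials
sum(roll.counts)  # 10000

# Relative frequency = frequency / total number of trials
roll.props <- roll.counts / sum(roll.counts)
roll.props
```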

## sample
##      1      2      3      4      5      6 
## 0.1654 0.1723 0.1712 0.1682 0.1656 0.1573

The proportion of times we rolled the number two was 0.1723, which is close to the probability we expected (0.1667).

Let’s visualize this with a bar chart.


TIP: If you have already calculated and stored the relative frequencies of a categorical variable or an integer variable such as this one, then you should use the barplot function to create a bar chart, not the barchartGC function. The barplot function simply creates bars with heights defined by the values in the object.


Figure 1: Bar chart of the outcomes of 10000 rolls of a fair, 6-sided dice.

The las = 1 argument tells the barplot function to make sure the y-axis labels are oriented horizontally.
The ylim = c(0, 0.2) argument provides the lower and upper limits to the y-axis.
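A bar chart like Figure 1 could be drawn as in the sketch below, which uses the two arguments just described. The relative frequencies are typed in from the output above so that the sketch runs on its own; the object name roll.props and the axis labels are assumptions for illustration.

```r
# Relative frequencies of each outcome, copied from the output above
roll.props <- c("1" = 0.1654, "2" = 0.1723, "3" = 0.1712,
                "4" = 0.1682, "5" = 0.1656, "6" = 0.1573)

# One bar per outcome, with bar height equal to the relative frequency
barplot(roll.props,
        las = 1,             # horizontal y-axis labels
        ylim = c(0, 0.2),    # y-axis runs from 0 to 0.2
        xlab = "Outcome of roll",        # axis labels are assumed wording
        ylab = "Relative frequency")
```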


Flipping a coin

Now let’s simulate a random trial in which a fair coin is tossed, and in which a heads is considered a success.

We can use the handy rflip function from the mosaic package, which is loaded as part of the tigerstats package.

Check out the function’s help file:

?rflip

Note that the default value for the prob argument is 0.5, meaning we have a fair coin for which the probability of getting a heads is 0.5.

An unfair coin would have a prob value different from 0.5.

Let’s flip a coin once, to see what the output looks like, remembering to set.seed first:
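A single flip might be coded as in the sketch below. It wraps rflip in do so that the result is returned as a one-row data frame like the one shown below; whether the original code did exactly this is an assumption, as are the seed value and the object name one.flip.

```r
library(mosaic)  # loaded automatically when you run library(tigerstats)

set.seed(12)  # hypothetical seed; the page's actual seed is not stated

# Flip one fair coin (prob = 0.5 is the default) and keep the summary
one.flip <- do(1) * rflip(1)
one.flip
```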

##   n heads tails prop
## 1 1     0     1    0

In this instance we were unsuccessful in getting a heads.

Let’s re-define our random trial as being 10 flips of the coin, and let’s simulate repeating this random trial 10000 times, storing the number of heads (successes) per trial in a new object coin.results.

In each random trial, we could get anywhere from zero heads (very unlikely) to ten heads (equally unlikely), but most commonly we would expect to get heads 5 times out of the 10 flips, i.e. half the time.
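This simulation could be sketched as follows, combining do and rflip as described above. The seed value is a hypothetical choice, so the individual rows will not necessarily match the output shown below.

```r
library(mosaic)  # loaded automatically when you run library(tigerstats)

set.seed(12)  # hypothetical seed; the page's actual seed is not stated

# One random trial = 10 flips of a fair coin; repeat the trial 10000 times
coin.results <- do(10000) * rflip(10)

# Look at the first few trials
head(coin.results)
```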

##    n heads tails prop
## 1 10     4     6  0.4
## 2 10     6     4  0.6
## 3 10     7     3  0.7
## 4 10     2     8  0.2
## 5 10     3     7  0.3
## 6 10     3     7  0.3

Now let’s tabulate the results, calculating the relative frequencies:
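The tabulation works the same way as for the dice rolls. So that the sketch below runs on its own, it builds a stand-in for coin.results with base R functions (replicate and sample) rather than with rflip; this stand-in, the seed value, and the object name coin.props are assumptions for illustration, so the proportions will differ slightly from those shown below.

```r
set.seed(12)  # hypothetical seed

# Stand-in for coin.results: for each of 10000 trials, flip a fair coin
# 10 times and count how many flips came up heads ("H")
coin.results <- data.frame(
  heads = replicate(10000,
                    sum(sample(c("H", "T"), size = 10, replace = TRUE) == "H"))
)

# Frequency of each number of heads, divided by the number of trials
coin.props <- xtabs(~ heads, data = coin.results) / 10000
coin.props
```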

## heads
##      0      1      2      3      4      5      6      7      8      9 
## 0.0013 0.0099 0.0461 0.1199 0.2011 0.2523 0.2030 0.1155 0.0410 0.0094 
##     10 
## 0.0005

And visualize them with a bar chart:
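As with the dice example, the barplot function can be applied to the stored relative frequencies. In the sketch below the values are typed in from the output above so that it runs on its own; the object name coin.props, the y-axis limits, and the axis labels are assumptions for illustration.

```r
# Relative frequencies of 0 to 10 heads, copied from the output above
coin.props <- c("0" = 0.0013, "1" = 0.0099, "2" = 0.0461, "3" = 0.1199,
                "4" = 0.2011, "5" = 0.2523, "6" = 0.2030, "7" = 0.1155,
                "8" = 0.0410, "9" = 0.0094, "10" = 0.0005)

# One bar per possible number of heads
barplot(coin.props,
        las = 1,             # horizontal y-axis labels
        ylim = c(0, 0.3),    # assumed limits that fit the tallest bar
        xlab = "Number of heads in 10 flips",   # assumed wording
        ylab = "Relative frequency")
```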

Figure 2: Bar chart of the number of heads obtained in each of 10000 random trials, where each trial is 10 flips of a fair coin.


Tutorial activity

  1. Conduct your own simulation
  • For this activity, consider a single random trial as 20 wrestling matches
  • In each match, one combatant wears blue and one wears red
  • Simulate 10000 random trials, assuming no advantage to either red or blue shirts
  • Produce a bar chart of the relative frequencies of all possible outcomes, as we did above for the coin tossing example

List of functions

Getting started:

  • library

Data frame structure:

  • head
  • str

Tabulation:

  • xtabs

Simulation:

  • set.seed
  • sample
  • do
  • rflip (from the mosaic package in the tigerstats package)

Graphs:

  • barplot (different from the barchartGC function used in previous tutorials)