This page was last updated on September 18, 2019.
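In this tutorial you will learn how to simulate random trials in R (rolling a die and flipping a coin), tabulate the outcomes as relative frequencies, and display them in a bar chart.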
This tutorial uses the tigerstats package, which you load with:
library(tigerstats)
If you roll a fair six-sided die once, you have a 1 in 6 chance of rolling any one of the possible outcomes, i.e. the numbers 1 through 6.
Let’s define our random trial as the rolling of a fair six-sided die, and rolling the number 2 as our event of interest. The probability of rolling a 2 is 1/6, or approximately 0.16667.
Let’s simulate this random trial in R. We’ll use the sample function you learned about in a previous tutorial.
Recall that whenever we implement a procedure in R that involves a random draw or something similar, we first need to use the set.seed function to ensure we all get the same results (i.e. we can achieve “computational reproducibility”):
Here we simulate one roll of a six-sided die.
Recall: the code 1:6 produces a vector of the integers 1 through 6, and the replace = F argument tells R to “sample without replacement”, as described in a previous tutorial (with a single draw, this setting does not affect the result).
## [1] 6
You should see the number 6.
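The code for this step is not shown above. A minimal sketch of what it might look like follows. The seed value 123 is an assumption (the tutorial's own seed is not shown), so the number you get with it may not be 6; the names first.roll and second.roll are also just illustrative.

```r
# Assumed seed: the tutorial's own seed value is not shown, so 123 is a placeholder.
set.seed(123)

# One roll of a fair six-sided die: draw a single value from the integers 1 to 6.
first.roll <- sample(1:6, size = 1, replace = FALSE)
first.roll

# Resetting the same seed reproduces exactly the same "random" roll.
set.seed(123)
second.roll <- sample(1:6, size = 1, replace = FALSE)
identical(first.roll, second.roll)  # TRUE
```

The second half of the sketch shows why set.seed matters: with the same seed, everyone who runs the code gets the same draw.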
Recall: the probability of an event is the proportion of times the event would occur if we repeated a random trial over and over again under the same conditions.
If we were to roll the die many thousands of times, we should expect to roll the number 2 approximately one sixth of the time, i.e. with a proportion of 1/6, or about 0.16667.
Let’s simulate this process by rolling a die 10000 times and storing the outcome of each of these 10000 random trials in an object called many.rolls.
We’ll use the do function, which you first learned about in the preceding tutorial. It comes from the mosaic package, which is loaded automatically as part of the tigerstats package.
Recall that the do function simply tells R to repeat whatever comes after the *. Here we ask it to repeat the sampling process 10000 times.
Let’s also look at the structure of the resulting object using the str function:
## Classes 'do.data.frame' and 'data.frame': 10000 obs. of 1 variable:
## $ sample: int 2 4 2 3 6 3 1 3 6 4 ...
## - attr(*, "lazy")=Class 'formula' language ~sample(1:6, size = 1, replace = FALSE)
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## - attr(*, "culler")=function (object, ...)
We see that the repeated sample procedure, when assigned to an object, produced a data frame (we called it many.rolls) that includes one integer variable called sample.
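For readers who want to see the same idea without the mosaic package, here is a rough base-R sketch that repeats the single-roll call 10000 times with replicate. This is a swapped-in technique, not the tutorial's do(10000) * sample(...) call; the seed and the object name many.rolls.base are assumptions made for illustration.

```r
# Assumed seed (the tutorial's own seed is not shown).
set.seed(123)

# replicate() evaluates the expression 10000 times and collects the results in a vector.
rolls <- replicate(10000, sample(1:6, size = 1, replace = FALSE))

# Store the rolls in a data frame with one integer column named "sample",
# mirroring the structure of many.rolls described above.
many.rolls.base <- data.frame(sample = rolls)
str(many.rolls.base)
```

The result is a plain data frame rather than a 'do.data.frame', but it can be tabulated in the same way.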
Now let’s tabulate the results: how many times was each of the numbers 1 through 6 rolled, out of 10000 trials?
## sample
## 1 2 3 4 5 6
## 1654 1723 1712 1682 1656 1573
We rolled the number two 1723 times out of 10000.
Now let’s calculate the relative frequencies by dividing these frequencies by the total number of trials, i.e. 10000:
## sample
## 1 2 3 4 5 6
## 0.1654 0.1723 0.1712 0.1682 0.1656 0.1573
The proportion of times we rolled the number two was 0.1723, which is close to the probability we expected (0.1667).
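The tabulation code is not shown above. The following sketch shows one way the counts and relative frequencies might be produced with xtabs, following the pattern used later for the coin results. The seed, the simulated data, and the names many.rolls.base, rolls.table and rolls.table.rel are assumptions, so the counts will differ from those printed above.

```r
# Assumed seed and data: 10000 independent rolls, drawn here in a single call
# with replacement, which is equivalent to repeating a single roll 10000 times.
set.seed(123)
many.rolls.base <- data.frame(sample = sample(1:6, size = 10000, replace = TRUE))

# Frequency of each outcome.
rolls.table <- xtabs(~ sample, data = many.rolls.base)
rolls.table

# Relative frequencies: divide the counts by the number of trials.
rolls.table.rel <- rolls.table / 10000
rolls.table.rel

# How far is the observed proportion of 2s from the expected 1/6?
rolls.table.rel["2"] - 1/6
```

With 10000 trials, each relative frequency should land close to 1/6, typically within about one percentage point.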
Let’s visualize this with a bar chart.
TIP: If you have already calculated and stored the relative frequencies of a categorical variable or an integer variable such as this one, then you should use the barplot function to create a bar chart, not the barchartGC function. The barplot function simply creates bars with heights defined by the values in the object.
Figure 1: Bar chart of the relative frequencies of the outcomes of 10000 rolls of a fair six-sided die.
The las = 1 argument tells the barplot function to orient the y-axis labels horizontally.
The ylim = c(0, 0.2) argument sets the lower and upper limits of the y-axis.
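The plotting code is not shown above, so here is a sketch of a barplot call that uses the two arguments just described. The relative frequencies are typed in from the table printed earlier; the object name rel.freq, the axis labels, and the dashed reference line at 1/6 are additions for illustration.

```r
# Relative frequencies copied from the tabulated output above.
rel.freq <- c("1" = 0.1654, "2" = 0.1723, "3" = 0.1712,
              "4" = 0.1682, "5" = 0.1656, "6" = 0.1573)

# barplot() draws one bar per value; las = 1 keeps the y-axis labels horizontal
# and ylim fixes the y-axis range.
bar.mids <- barplot(rel.freq,
                    las = 1,
                    ylim = c(0, 0.2),
                    xlab = "Outcome of roll",
                    ylab = "Relative frequency")

# Illustrative addition: dashed line at the expected probability of 1/6.
abline(h = 1/6, lty = 2)
```

barplot returns the x-positions of the bar midpoints, which is why its result can be stored in bar.mids.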
Now let’s simulate a random trial in which a fair coin is tossed, and in which heads is considered a success.
We can use the handy rflip function from the mosaic package, which is loaded as part of the tigerstats package.
Check out the function’s help file:
?rflip
Note that the default value for the prob argument is 0.5, meaning we have a fair coin for which the probability of getting heads is 0.5.
An unfair coin would have a prob value different from 0.5.
Let’s flip a coin once to see what the output looks like, remembering to use set.seed first:
## n heads tails prop
## 1 1 0 1 0
In this instance we were unsuccessful in getting heads.
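To make the role of the prob argument concrete without relying on mosaic, the sketch below uses base R's rbinom in place of rflip. This is a swapped-in technique with different output (a bare 0 or 1 rather than the summary row shown above); the seed, the object names, and the value 0.7 for the unfair coin are assumptions.

```r
# Assumed seed (the tutorial's own seed is not shown).
set.seed(123)

# One toss of a fair coin: 1 = heads (success), 0 = tails.
# prob is the probability of heads on each toss.
fair.flip <- rbinom(n = 1, size = 1, prob = 0.5)
fair.flip

# One toss of an unfair coin that lands heads 70% of the time (illustrative value).
unfair.flip <- rbinom(n = 1, size = 1, prob = 0.7)
unfair.flip
```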
Let’s redefine our random trial as 10 flips of the coin, and let’s simulate repeating this random trial 10000 times, storing the number of heads (successes) per trial in a new object called coin.results.
In each random trial, we could get anywhere from zero heads (very unlikely) to ten heads (equally unlikely), but most commonly we would expect to get heads 5 times out of the 10 flips, i.e. half the time.
## n heads tails prop
## 1 10 4 6 0.4
## 2 10 6 4 0.6
## 3 10 7 3 0.7
## 4 10 2 8 0.2
## 5 10 3 7 0.3
## 6 10 3 7 0.3
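The simulation code that produced these rows is not shown. As a rough base-R stand-in, the sketch below draws the number of heads in each 10-flip trial with rbinom and builds a data frame with the same four columns. It is not the tutorial's do and rflip code; the seed and the names n.flips, n.trials and coin.results.base are assumptions, and the rows will differ from those printed above.

```r
# Assumed seed (the tutorial's own seed is not shown).
set.seed(123)

n.flips  <- 10     # flips per random trial
n.trials <- 10000  # number of times the trial is repeated

# Number of heads in each trial of 10 fair flips.
heads <- rbinom(n = n.trials, size = n.flips, prob = 0.5)

# Same columns as the output above: n, heads, tails, prop.
coin.results.base <- data.frame(n     = n.flips,
                                heads = heads,
                                tails = n.flips - heads,
                                prop  = heads / n.flips)
head(coin.results.base)
```

Because the column is also called heads, an xtabs(~ heads, data = ...) call works on this data frame unchanged.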
Now let’s tabulate the results, calculating the relative frequencies:
coin.results.table <- xtabs(~ heads, data = coin.results)
coin.results.table.rel <- coin.results.table / 10000
coin.results.table.rel
## heads
## 0 1 2 3 4 5 6 7 8 9
## 0.0013 0.0099 0.0461 0.1199 0.2011 0.2523 0.2030 0.1155 0.0410 0.0094
## 10
## 0.0005
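As a sanity check on these simulated proportions, they can be compared with what simple counting predicts for a fair coin: each of the 2^10 = 1024 possible sequences of 10 flips is equally likely, and choose(10, k) of them contain exactly k heads. The sketch below types in the relative frequencies from the table above; the object names are illustrative.

```r
heads <- 0:10

# Simulated relative frequencies copied from the table above.
simulated <- c(0.0013, 0.0099, 0.0461, 0.1199, 0.2011, 0.2523,
               0.2030, 0.1155, 0.0410, 0.0094, 0.0005)

# Expected proportion of trials with exactly k heads out of 10 fair flips.
expected <- choose(10, heads) / 2^10

# Side-by-side comparison, with the expected values rounded to 4 decimals.
comparison <- data.frame(heads, simulated, expected = round(expected, 4))
comparison

# Largest gap between simulated and expected proportions.
max(abs(simulated - expected))
```

The expected values confirm the pattern described earlier: zero and ten heads are equally rare (1 in 1024 each), and five heads is the most common outcome.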
And visualize them with a bar chart:
Figure 2: Bar chart of the relative frequencies of the number of heads in 10000 trials of 10 coin flips each.
Getting started: library
Data frame structure: head, str
Tabulation: xtabs
Simulation: set.seed, sample, do, rflip (do and rflip are from the mosaic package, which is loaded as part of the tigerstats package)
Graphs: barplot (different from the barchartGC function used in previous tutorials)