{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In this brief tutorial, we are going to see how to use the different probability distributions in R.\n", "\n", "There is one simple rule here: no matter what distribution we’re talking about, there will always be a *d* function, a *p* function, a *q* function and an *r* function, each doing the following:\n", "\n", "- *d*: **probability density (or mass)**. For a discrete distribution it gives the probability of obtaining a particular outcome; for a continuous distribution it gives the density at that value.\n", "- *p*: **cumulative distribution function**. You specify a quantile, and it returns the probability of obtaining an outcome smaller than or equal to that quantile.\n", "- *r*: **random number generation**. It generates *n* random outcomes from the distribution.\n", "- *q*: **quantile**. You specify a probability, and it returns the smallest value of the variable for which the probability of obtaining an outcome smaller than or equal to that value is at least the specified probability.\n", "\n", "We'll understand all of these better with several examples, which you will complete in class." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Binomial Distribution\n", "\n", "The binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.\n", "\n", "In R, this distribution has:\n", "\n", "- `dbinom` \n", "- `pbinom`\n", "- `rbinom`\n", "- `qbinom`" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
function (x, size, prob, log = FALSE) \n",
"NULL"
],
"text/latex": [
"\\begin{minted}{r}\n",
"function (x, size, prob, log = FALSE) \n",
"NULL\n",
"\\end{minted}"
],
"text/markdown": [
"```r\n",
"function (x, size, prob, log = FALSE) \n",
"NULL\n",
"```"
],
"text/plain": [
"function (x, size, prob, log = FALSE) \n",
"NULL"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"function (q, size, prob, lower.tail = TRUE, log.p = FALSE) \n",
"NULL"
],
"text/latex": [
"\\begin{minted}{r}\n",
"function (q, size, prob, lower.tail = TRUE, log.p = FALSE) \n",
"NULL\n",
"\\end{minted}"
],
"text/markdown": [
"```r\n",
"function (q, size, prob, lower.tail = TRUE, log.p = FALSE) \n",
"NULL\n",
"```"
],
"text/plain": [
"function (q, size, prob, lower.tail = TRUE, log.p = FALSE) \n",
"NULL"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"function (n, size, prob) \n",
"NULL"
],
"text/latex": [
"\\begin{minted}{r}\n",
"function (n, size, prob) \n",
"NULL\n",
"\\end{minted}"
],
"text/markdown": [
"```r\n",
"function (n, size, prob) \n",
"NULL\n",
"```"
],
"text/plain": [
"function (n, size, prob) \n",
"NULL"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"function (p, size, prob, lower.tail = TRUE, log.p = FALSE) \n",
"NULL"
],
"text/latex": [
"\\begin{minted}{r}\n",
"function (p, size, prob, lower.tail = TRUE, log.p = FALSE) \n",
"NULL\n",
"\\end{minted}"
],
"text/markdown": [
"```r\n",
"function (p, size, prob, lower.tail = TRUE, log.p = FALSE) \n",
"NULL\n",
"```"
],
"text/plain": [
"function (p, size, prob, lower.tail = TRUE, log.p = FALSE) \n",
"NULL"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"args(dbinom)\n",
"args(pbinom)\n",
"args(rbinom)\n",
"args(qbinom)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here, *size* and *prob* are the parameters of the binomial distribution (the number of trials and the probability of success on each trial). Let's look at the documentation to see what the remaining arguments mean."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"?dbinom"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, say that we toss a fair coin three times. There are three equally likely sequences with exactly two heads (HHT, HTH and THH), each with probability 0.5 * 0.5 * 0.5, so the probability of getting exactly two heads is:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"0.375"
],
"text/latex": [
"0.375"
],
"text/markdown": [
"0.375"
],
"text/plain": [
"[1] 0.375"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"(0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is the same as using `dbinom` as follows:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"0.375"
],
"text/latex": [
"0.375"
],
"text/markdown": [
"0.375"
],
"text/plain": [
"[1] 0.375"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# i.e. 3 * 0.5^3 = 0.375, as expected\n",
"dbinom(2, size=3, prob=0.5)"
]
},
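{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a side check, the same value can be reproduced from the binomial formula `choose(n, k) * p^k * (1 - p)^(n - k)`. The cell below is a minimal sketch that assumes a fair coin; the names `n`, `k`, `p` and `by.hand` are illustrative and are not used elsewhere in this tutorial."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Binomial formula by hand: choose(n, k) * p^k * (1 - p)^(n - k)\n",
"n <- 3    # number of tosses\n",
"k <- 2    # number of heads we are asking about\n",
"p <- 0.5  # probability of heads on one toss (fair coin assumed)\n",
"\n",
"# choose(3, 2) = 3 counts the sequences HHT, HTH and THH\n",
"by.hand <- choose(n, k) * p^k * (1 - p)^(n - k)\n",
"by.hand\n",
"\n",
"# TRUE if the hand computation agrees with dbinom\n",
"all.equal(by.hand, dbinom(k, size = n, prob = p))"
]
},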
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now say we want to calculate the probability of getting two heads or fewer. For this, in addition to the previous probability, we need to add the probabilities of getting no heads and of getting one head."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"0.875"
],
"text/latex": [
"0.875"
],
"text/markdown": [
"0.875"
],
"text/plain": [
"[1] 0.875"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"p.0 <- (0.5 * 0.5 * 0.5) # probability of no heads (TTT)\n",
"p.1 <- (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) # probability of one head (HTT, THT, TTH)\n",
"p.2 <- (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) + (0.5 * 0.5 * 0.5) # probability of two heads (HHT, HTH, THH)\n",
"p.0 + p.1 + p.2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This sum is exactly the cumulative probability, so we can get it directly with `pbinom`:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"0.875"
],
"text/latex": [
"0.875"
],
"text/markdown": [
"0.875"
],
"text/plain": [
"[1] 0.875"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"pbinom(2, size=3, prob=0.5)"
]
},
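{
"cell_type": "markdown",
"metadata": {},
"source": [
"The remaining functions follow the same logic. The cell below is a short sketch, not part of the original exercises: it recovers the cumulative probability by summing `dbinom`, gets the upper tail through the `lower.tail` argument, inverts the cumulative probability with `qbinom`, and simulates tosses with `rbinom`. The seed, the number of simulated draws and the names `cum.by.sum`, `upper`, `q.80` and `draws` are arbitrary choices made for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# P(X <= 2) as a sum of point probabilities; matches pbinom(2, size=3, prob=0.5)\n",
"cum.by.sum <- sum(dbinom(0:2, size = 3, prob = 0.5))\n",
"cum.by.sum\n",
"\n",
"# Upper tail P(X > 2), i.e. 1 - P(X <= 2)\n",
"upper <- pbinom(2, size = 3, prob = 0.5, lower.tail = FALSE)\n",
"upper\n",
"\n",
"# qbinom returns the smallest x with P(X <= x) >= 0.8.\n",
"# P(X <= 1) = 0.5 and P(X <= 2) = 0.875, so the answer is 2.\n",
"q.80 <- qbinom(0.8, size = 3, prob = 0.5)\n",
"q.80\n",
"\n",
"# rbinom: simulate 10000 experiments of three tosses each\n",
"set.seed(1)  # arbitrary seed, only to make the simulation reproducible\n",
"draws <- rbinom(10000, size = 3, prob = 0.5)\n",
"\n",
"# The share of experiments with exactly two heads should be close to 0.375\n",
"mean(draws == 2)"
]
},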
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In what follows, we are going to put all this into practice with a die instead of a coin, and later with a Gaussian distribution, but the logic behind these functions will always be the same."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"function (x, mean = 0, sd = 1, log = FALSE) \n",
"NULL"
],
"text/latex": [
"\\begin{minted}{r}\n",
"function (x, mean = 0, sd = 1, log = FALSE) \n",
"NULL\n",
"\\end{minted}"
],
"text/markdown": [
"```r\n",
"function (x, mean = 0, sd = 1, log = FALSE) \n",
"NULL\n",
"```"
],
"text/plain": [
"function (x, mean = 0, sd = 1, log = FALSE) \n",
"NULL"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) \n",
"NULL"
],
"text/latex": [
"\\begin{minted}{r}\n",
"function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) \n",
"NULL\n",
"\\end{minted}"
],
"text/markdown": [
"```r\n",
"function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) \n",
"NULL\n",
"```"
],
"text/plain": [
"function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) \n",
"NULL"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"function (n, mean = 0, sd = 1) \n",
"NULL"
],
"text/latex": [
"\\begin{minted}{r}\n",
"function (n, mean = 0, sd = 1) \n",
"NULL\n",
"\\end{minted}"
],
"text/markdown": [
"```r\n",
"function (n, mean = 0, sd = 1) \n",
"NULL\n",
"```"
],
"text/plain": [
"function (n, mean = 0, sd = 1) \n",
"NULL"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"function (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) \n",
"NULL"
],
"text/latex": [
"\\begin{minted}{r}\n",
"function (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) \n",
"NULL\n",
"\\end{minted}"
],
"text/markdown": [
"```r\n",
"function (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) \n",
"NULL\n",
"```"
],
"text/plain": [
"function (p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) \n",
"NULL"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"args(dnorm)\n",
"args(pnorm)\n",
"args(rnorm)\n",
"args(qnorm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"