{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Lecture 8: Randomness Part I\n",
"\n",
"## Last time\n",
"Slicing of numpy arrays, messing around with images of shape `(M, N, 3)`, histograms. \n",
"\n",
"## Today\n",
"* Review of histograms\n",
"* Randomness and scatter plots"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lambda function in Python\n",
"Recall that when reviewing eigenvalues in linear algebra, we avoided using `lambda` as a variable name. This is because `lambda` is a reserved keyword in Python: it defines a *function handle* or *anonymous function*, similar to `@` in Matlab (`y = @(x) x^2 + 1`)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y = lambda x: x**2 + 1 # fine for short formulas; for ordinary named functions, Python style prefers `def`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this can be applied to an ndarray as well (elementwise)\n",
"y(np.linspace(0, 1, 5))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Randomness\n",
"\n",
"Randomness is used a lot both in mathematics and the real world.\n",
"\n",
"Generally, a random number comes from a probability distribution. \n",
"\n",
"The distribution might be discrete, i.e., described by a finite set of pairs\n",
"\n",
"$$ \\big\\{ (x_1, p_1), \\dots, (x_n, p_n) \\big\\},$$\n",
"\n",
"where you get outcome $x_i$ with probability $p_i$, i.e., \n",
"\n",
"$$P(X = x_i) = p_i.$$\n",
"\n",
"\n",
"It is assumed that $\\sum_i p_i = 1$ (if not, you can normalize the $p_i$'s so that they sum to 1). The function that maps $x_i \\mapsto p_i$ is called the *probability mass function*.\n",
"\n",
"For continuous random numbers, one normally uses a *probability density function* (pdf). For example, the normal distribution $\\mathcal{N}(\\mu, \\sigma^2)$ has the density\n",
"\n",
"$$p(x; \\mu,\\sigma) = \\frac{1}{\\sqrt{2\\pi \\sigma^2}} e^{-\\frac{(x-\\mu )^2}{2\\sigma^2} },$$\n",
"\n",
"where $\\mu$ and $\\sigma$ are parameters (mean and standard deviation).\n",
"\n",
"The probability of a random number from this distribution being in the interval $[a,b]$ is then:\n",
"\n",
"$$P\\big(X\\in [a,b]\\big) = \\int_a^b p(x)\\,dx$$\n",
"\n",
"The most well-known distributions are the uniform distribution (where the pdf is constant on an interval) and the normal distribution."
]
},
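{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the two definitions above, under assumptions that are not part of the lecture: the outcome set `outcomes` and the weights `w` are made-up illustrative values, and the integral is approximated by a simple midpoint Riemann sum. It normalizes the weights into a probability mass function, samples from it with `np.random.choice`, and estimates $P(X \\in [-1, 1])$ for the standard normal pdf."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"# Discrete case: hypothetical outcomes and unnormalized weights (illustrative only)\n",
"outcomes = np.array([1, 2, 3])\n",
"w = np.array([2.0, 1.0, 1.0])\n",
"p = w / w.sum() # normalize so the probabilities sum to 1\n",
"samples = np.random.choice(outcomes, size=10000, p=p)\n",
"print(p, np.mean(samples == 1)) # empirical frequency of outcome 1 should be near p[0]\n",
"\n",
"# Continuous case: P(a <= X <= b) for N(0,1) via a midpoint Riemann sum\n",
"normal_pdf = lambda x: np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)\n",
"a, b, n = -1.0, 1.0, 1000\n",
"dx = (b - a) / n\n",
"mid = a + (np.arange(n) + 0.5) * dx # midpoints of the n subintervals\n",
"print(np.sum(normal_pdf(mid)) * dx) # should be close to 0.68"
]
},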
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Remark:\n",
"\n",
"A histogram, when normalized to have total area 1, is an estimate of the probability density function of a continuous variable."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"# let us graph the density function of the normal distribution.\n",
"from math import pi, sqrt, e\n",
"xs = np.linspace(-5,5,300)\n",
"pdf = lambda x: 1/sqrt(2*pi)*e**(-0.5*x**2) # pdf for N(0,1) standard normal dist\n",
"ys = pdf(xs)\n",
"plt.plot(xs, ys)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# numpy.random module\n",
"\n",
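"`numpy.random` is a \"pseudo\" random number generator: the numbers are produced by a deterministic algorithm, so they only look random, and fixing the seed reproduces the same sequence."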
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from numpy import random # random submodule in numpy, natively vectorized"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Random integers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# random.randint()\n",
"# simulate a die rolling sequence\n",
"N = 2000\n",
"X = np.zeros(N)\n",
"for i in range(N):\n",
" X[i] = random.randint(1, 7) # from 1 (inclusive) to 7 (exclusive)"
]
},
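{
"cell_type": "markdown",
"metadata": {},
"source": [
"The die-rolling loop above can also be written without a Python loop. A short sketch using the `size` argument of `random.randint`; the name `rolls` is introduced here for illustration only:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from numpy import random\n",
"\n",
"# draw all N rolls in one vectorized call; high=7 is exclusive, as in the loop version\n",
"N = 2000\n",
"rolls = random.randint(1, 7, size=N)\n",
"print(rolls[:10])\n",
"print(rolls.min(), rolls.max()) # always between 1 and 6"
]
},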
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# what is the mean of the die rolls? (for a fair die the expected value is 3.5)\n",
"np.mean(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Uniform distribution\n",
"\n",
"The simplest distribution is the uniform distribution, in which all numbers in a given interval are equally likely. The function `random.random()` produces a uniformly distributed random number in the half-open interval $[0,1)$.\n",
"Furthermore, we can turn a uniform random number $r$ from $[0,1)$ into a random number between $a$ and $b$ by computing $a + (b-a)r$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"random.seed(42)\n",
"# the seed will initialize the random number generator\n",
"# fixing the seed will fix the \"random\" number generated\n",
"for i in range(5):\n",
" r = random.random()\n",
" print(r)"
]
},
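{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see what fixing the seed means, here is a small sketch (the names `first` and `second` are illustrative): resetting the seed to the same value replays the same sequence."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from numpy import random\n",
"\n",
"random.seed(42)\n",
"first = [random.random() for _ in range(3)]\n",
"random.seed(42) # reset the generator to the same state\n",
"second = [random.random() for _ in range(3)]\n",
"print(first == second) # True: same seed, same sequence"
]
},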
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def rnum(a,b):\n",
" return a + (b-a)*random.random()\n",
"\n",
"for i in range(5):\n",
" print(rnum(-3,6))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"N = 300\n",
"x = np.random.uniform(0,1,N) # positional arguments: low, high, size\n",
"y = np.random.uniform(low=0,high=1,size=N) # same call with keyword arguments\n",
"plt.scatter(x,y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Adding scattered noise to a linear function"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X = np.linspace(0,1,100)\n",
"Y = 3 * X + 1\n",
"plt.plot(X,Y)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# let's add some noise\n",
"Z = 3 * X + 1 + np.random.normal(loc=0,scale=1, size= X.shape[0])\n",
"# np.random.normal(0,1, X.shape[0]) is the same call with positional arguments\n",
"# loc is the mean\n",
"# scale is the standard deviation\n",
"# size is the number of samples we draw from this distribution\n",
"# we'll see much more about randomness later\n",
"plt.scatter(X,Z) # we use a scatter plot\n",
"plt.plot(X,Y, color = \"red\", linewidth= 2.0)\n",
"plt.grid(True, linestyle = 'dashed')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 1:\n",
"Write a function `rand_linear` that takes the slope `m`, the intercept `b`, the strength `sigma` of the normal random noise (mean 0 and standard deviation `sigma`), and a numpy array `x`, and returns the values of the linear function $y = mx + b$ with the random noise added."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# code here"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Normal distribution\n",
"\n",
"Best way to view a probability distribution? Histogram."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"N = 50 # no. of samples\n",
"mu = 0.0\n",
"sigma = 1.0\n",
"X = np.random.normal(loc=mu, scale=sigma, size=N)\n",
"plt.hist(X, bins=10, edgecolor='k')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"N = 500000 # no of samples\n",
"mu = 0.0\n",
"sigma = 1.0\n",
"X = np.random.normal(loc=mu, scale=sigma, size=N)\n",
"plt.axis([-6, 6, 0, 0.45]) # fix our axes view\n",
"plt.hist(X, bins=20, density=True, edgecolor= 'k')\n",
"# density=True rescales the bar heights so that the total area is 1\n",
"# bin width = (max(X) - min(X))/(no. of bins)\n",
"plt.grid(True, linestyle = 'dashed')\n",
"plt.show()"
]
},
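{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `density=True` the bar heights estimate the pdf, so the histogram can be compared with the formula for $p(x; \\mu, \\sigma)$ given earlier. A sketch, assuming the same `mu = 0` and `sigma = 1` as above; the names `samples`, `grid`, and `density` are introduced here for illustration:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"\n",
"mu, sigma = 0.0, 1.0\n",
"samples = np.random.normal(loc=mu, scale=sigma, size=500000)\n",
"grid = np.linspace(-6, 6, 300)\n",
"# normal pdf evaluated on the grid\n",
"density = np.exp(-(grid - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)\n",
"plt.hist(samples, bins=20, density=True, edgecolor='k')\n",
"plt.plot(grid, density, color='red', linewidth=2.0) # the curve should trace the bar tops\n",
"plt.grid(True, linestyle='dashed')\n",
"plt.show()"
]
},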
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$\\sigma$ is the standard deviation, which measures how spread out the normal distribution is. For example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"N = 500000\n",
"mu = 0.0\n",
"sigma = 2.0 # higher standard deviation\n",
"X = np.random.normal(loc=mu, scale=sigma, size=N)\n",
"plt.axis([-6, 6, 0, 0.45])\n",
"plt.hist(X, bins=20, density=True, edgecolor ='k')\n",
"plt.grid(True, linestyle = 'dashed')\n",
"plt.show()\n",
"\n",
"# with the axes fixed, the histogram is wider and flatter than the sigma = 1 plot above"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise 2:\n",
"\n",
"* Change `sigma` and `bins` (the number of bins) while keeping the axes fixed with `plt.axis([-6, 6, 0, 0.45])` as in the plots above, and see what happens.\n",
"* When plotting the histogram, change the option `density=True` to `density=False` (the default), and see what happens."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Histogram of the uniform distribution."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"N = 50000\n",
"X = np.random.uniform(low=0, high=1, size=N)\n",
"plt.hist(X, 50)\n",
"\n",
"plt.grid(True, linestyle = 'dashed')\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can compute the mean and standard deviation of any data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": false
},
"outputs": [],
"source": [
"np.mean(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"np.std(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In general, if `X` is our dataset, then among all normal distributions, the one with `mu = np.mean(X)` and `sigma = np.std(X)` fits the dataset's \"empirical distribution\" best (in the maximum-likelihood sense).\n",
"\n",
"If a dataset's distribution is normal then **about 68 percent of the data values are within one standard deviation of the mean**:\n",
"$$\n",
"P(\\mu - \\sigma < X < \\mu+\\sigma) \\approx 68\\%\n",
"$$\n",
"\n",
"\n",
"Reference: [68–95–99.7 rule](https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule)"
]
}
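,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 68 percent statement can be checked empirically. A sketch, assuming freshly drawn standard normal samples (the names `data` and `inside` are illustrative); the printed fraction should be close to 0.68:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"\n",
"data = np.random.normal(loc=0.0, scale=1.0, size=100000)\n",
"m, s = np.mean(data), np.std(data)\n",
"inside = np.abs(data - m) < s # True where a value lies within one standard deviation of the mean\n",
"print(np.mean(inside)) # fraction of samples within one standard deviation"
]
}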
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.2"
},
"latex_envs": {
"LaTeX_envs_menu_present": true,
"autoclose": true,
"autocomplete": true,
"bibliofile": "biblio.bib",
"cite_by": "apalike",
"current_citInitial": 1,
"eqLabelWithNumbers": true,
"eqNumInitial": 1,
"hotkeys": {
"equation": "Ctrl-E",
"itemize": "Ctrl-I"
},
"labels_anchors": false,
"latex_user_defs": false,
"report_style_numbering": false,
"user_envs_cfg": false
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}