{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lecture 8: Randomness Part I\n",
    "\n",
    "## Last time\n",
    "Slicing of numpy arrays, messing around with images of shape `(M, N, 3)`, histograms. \n",
    "\n",
    "## Today\n",
    "* review on histogram\n",
    "* Randomness and scattered plot"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lambda function in Python\n",
    "Recall in reviewing eigenvalues in Linear algebra, we want to avoid using lambda? This is because lambda has a special use in Python. It can be used to define a *function handle* or *anonymous function*, similar to `@` used in Matlab (`y = @(x) x^2 + 1`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = lambda x: x**2 + 1 # avoid using lambda in ordinary programming in Python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# this can be applied to ndarray as well"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Randomness\n",
    "\n",
    "Randomness is used a lot both in mathematics and the real world.\n",
    "\n",
    "Generally, a random number comes from a probability distribution. \n",
    "\n",
    "The distribution might be discrete: i.e., \n",
    "it comes from a set \n",
    "\n",
    "$$ \\big\\{ (x_1, p_1), ..., (x_n, p_n) \\big\\},$$\n",
    "\n",
    "where you get outcome $x_i$ with probability $p_i$, i.e., \n",
    "\n",
    "$$P(X = x_i) = p_i.$$\n",
    "\n",
    "\n",
    "It is assumed that $\\sum_i p_i = 1$ (if not you can normalize the $p$'s so their sum is 1). The function that takes $x_i \\mapsto p_i$ is called the *probability mass function*.\n",
    "\n",
    "For continuous random numbers, one normally uses a *probability density function* (pdf). For example, the normal distribution comes from the following function: $\\mathcal{N}(\\mu, \\sigma^2) $\n",
    "\n",
    "$$p(x; \\mu,\\sigma) = \\frac{1}{\\sqrt{2\\pi \\sigma^2}} e^{-\\frac{(x-\\mu )^2}{2\\sigma^2} },$$\n",
    "\n",
    "where $\\mu$ and $\\sigma$ are parameters (mean and standard deviation).\n",
    "\n",
    "The probability of a random number from this distribution being in the interval $[a,b]$ is then:\n",
    "\n",
    "$$P\\big(X\\in [a,b]\\big) = \\int_a^b p(x)\\,dx$$\n",
    "\n",
    "The most well-known distributions are the uniform distribution (where pdf is a constant) and the normal distribution. "
   ]
  },
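  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal sketch of the discrete case, the cell below draws samples from a small probability mass function with `np.random.choice` and compares the empirical frequencies with the $p_i$. The outcomes and weights are made-up values for illustration only; the weights are normalized first so that they sum to 1."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# hypothetical outcomes x_i and unnormalized weights (illustration only)\n",
    "outcomes = np.array([1, 2, 3])\n",
    "weights = np.array([2.0, 1.0, 1.0])\n",
    "probs = weights / weights.sum()  # normalize so that the p_i sum to 1\n",
    "\n",
    "samples = np.random.choice(outcomes, size=10000, p=probs)\n",
    "# empirical frequency of each outcome, to compare with probs\n",
    "freqs = np.array([np.mean(samples == x) for x in outcomes])\n",
    "print(probs)\n",
    "print(freqs)"
   ]
  },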
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Remark:\n",
    "\n",
    " The histogram is an estimate of the (probability) density distribution of a (continuous) variable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "# let us graph the density function of the normal distribution.\n",
    "from math import pi, sqrt, e\n",
    "xs = np.linspace(-5,5,300)\n",
    "pdf = lambda x: 1/sqrt(2*pi)*e**(-0.5*x**2) # pdf for N(0,1) standard normal dist\n",
    "ys = pdf(xs)\n",
    "plt.plot(xs, ys)\n",
    "plt.show()"
   ]
  },
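  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The integral formula for $P\\big(X\\in [a,b]\\big)$ given earlier can be approximated numerically. The cell below is a rough sketch using a midpoint Riemann sum for the standard normal density; the helper name `std_normal_pdf`, the interval $[-1, 1]$, and the number of subintervals are arbitrary choices made for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def std_normal_pdf(x):\n",
    "    # density of N(0, 1)\n",
    "    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)\n",
    "\n",
    "a, b, n = -1.0, 1.0, 1000  # interval and number of subintervals (arbitrary)\n",
    "dx = (b - a) / n\n",
    "midpoints = a + (np.arange(n) + 0.5) * dx\n",
    "# midpoint rule for the integral of p(x) over [a, b]\n",
    "prob = np.sum(std_normal_pdf(midpoints)) * dx\n",
    "print(prob)  # approximately 0.68"
   ]
  },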
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# numpy.random module\n",
    "\n",
    "\"pseudo\" random number generator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numpy import random # random submodule in numpy, natively vectorized"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Random integers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# random.randint()\n",
    "# simulate a die rolling sequence\n",
    "N = 2000\n",
    "X = np.zeros(N)\n",
    "for i in range(N):\n",
    "    X[i] = random.randint(1, 7)   # from 1 (inclusive) to 7 (exclusive)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# what is the mean of the dice rolling?"
   ]
  },
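  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The loop in the die-rolling cell above can also be avoided, since `random.randint` accepts a `size` argument. The sketch below draws all the rolls in one call and tabulates the relative frequency of each face, which should each be near $1/6$ for a fair die. The variable names are illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "N = 2000\n",
    "rolls = np.random.randint(1, 7, size=N)  # N die rolls at once, values 1..6\n",
    "faces, counts = np.unique(rolls, return_counts=True)\n",
    "for face, count in zip(faces, counts):\n",
    "    print(face, count / N)  # relative frequency of each face"
   ]
  },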
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Uniform distribution\n",
    "\n",
    "The easiest distribution is the uniform distribution on $(0,1)$, in which all numbers in a given interval are equally likely. We can use the function `random.random()` that will produce a uniformly distributed random number in $(0,1)$.\n",
    "Furthermore, we can turn this uniform random number from $(0,1)$ into random numbers from $a$ to $b$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "random.seed(42)\n",
    "# the seed will initialize the random number generator\n",
    "# fixing the seed will fix the \"random\" number generated\n",
    "for i in range(5):\n",
    "    r = random.random()\n",
    "    print(r)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rnum(a,b):\n",
    "    return a + (b-a)*random.random()\n",
    "\n",
    "for i in range(5):\n",
    "    print(rnum(-3,6))"
   ]
  },
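  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`rnum` returns one number per call. As a sketch of the same rescaling applied to a whole array, the hypothetical helper `rnum_array` below (not part of numpy) draws `n` numbers at once. Reseeding before each call illustrates that a fixed seed reproduces the same numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def rnum_array(a, b, n):\n",
    "    # rescale n uniform numbers from [0, 1) to [a, b)\n",
    "    return a + (b - a) * np.random.random(n)\n",
    "\n",
    "np.random.seed(42)\n",
    "first = rnum_array(-3, 6, 5)\n",
    "np.random.seed(42)  # same seed, same sequence\n",
    "second = rnum_array(-3, 6, 5)\n",
    "print(first)\n",
    "print(np.array_equal(first, second))  # prints True"
   ]
  },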
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "N = 300\n",
    "x = np.random.uniform(0,1,N) # this syntax is okay as well\n",
    "y = np.random.uniform(low=0,high=1,size=N)\n",
    "plt.scatter(x,y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding scattered noise to a linear function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = np.linspace(0,1,100)\n",
    "Y = 3 * X + 1\n",
    "plt.plot(X,Y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# let's add some noise\n",
    "Z = 3 * X + 1 + np.random.normal(loc=0,scale=1, size= X.shape[0])\n",
    "# np.random.normal(0,1, X.shape[0]) same output \n",
    "# loc is mean\n",
    "# scale is standard dev\n",
    "# size is the number of samples we draw in this distribution\n",
    "# we'll see much more about randomness later\n",
    "plt.scatter(X,Z)  # we use a scatter plot\n",
    "plt.plot(X,Y, color = \"red\", linewidth= 2.0)\n",
    "plt.grid(True, linestyle = 'dashed')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 1:\n",
    "Write a function `rand_linear`, takes input of the slope `m` and `b`, the strength of the normal random noise (mean 0 and standard deviation `sigma`), and a numpy array `x`, returns the function values of the linear function $y = mx + b$ with a random noise."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Normal distribution\n",
    "\n",
    "Best way to view a probability distribution? Histogram."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "N = 50 # no. of samples\n",
    "mu = 0.0\n",
    "sigma = 1.0\n",
    "X = np.random.normal(loc=mu, scale=sigma, size=N)\n",
    "plt.hist(X, bins=10, edgecolor='k')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "N = 500000 # no of samples\n",
    "mu = 0.0\n",
    "sigma = 1.0\n",
    "X = np.random.normal(loc=mu, scale=sigma, size=N)\n",
    "plt.axis([-6, 6, 0, 0.45]) # fix our axes view\n",
    "plt.hist(X,  bins=20,  density=True, edgecolor= 'k')\n",
    "# plt.hist()\n",
    "# bin size = (total sample)/(no. of bins)\n",
    "plt.grid(True, linestyle = 'dashed')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$\\sigma$ is the standard deviation, which measures how spread out the normal distribution is. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "N = 500000\n",
    "mu = 0.0\n",
    "sigma = 2.0  # highers standard dev\n",
    "X = np.random.normal(loc=mu, scale=sigma, size=N)\n",
    "plt.axis([-6, 6, 0, 0.45])\n",
    "plt.hist(X, bins=20, density=True, edgecolor ='k')\n",
    "plt.grid(True, linestyle = 'dashed')\n",
    "plt.show()\n",
    "\n",
    "# looks the same but look at the numbers above and below"
   ]
  },
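  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how a normalized histogram estimates the density, one can overlay the pdf of $\\mathcal{N}(\\mu, \\sigma^2)$ on it. The cell below is a minimal sketch; the sample size and the number of bins are arbitrary choices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "mu, sigma, N = 0.0, 2.0, 100000\n",
    "X = np.random.normal(loc=mu, scale=sigma, size=N)\n",
    "xs = np.linspace(-6, 6, 300)\n",
    "# pdf of N(mu, sigma^2), from the formula earlier in this lecture\n",
    "ys = np.exp(-(xs - mu)**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)\n",
    "\n",
    "plt.hist(X, bins=40, density=True, edgecolor='k')  # density=True makes the total bar area 1\n",
    "plt.plot(xs, ys, color='red', linewidth=2.0)\n",
    "plt.grid(True, linestyle='dashed')\n",
    "plt.show()"
   ]
  },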
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercise 2:\n",
    "\n",
    "* Change the `sigma` and the `bins` (no. of bins), while fix the axis by using `plt.axis([-6, 6, 0, 0.45])` like the plots above, see what happens.\n",
    "* When plotting the histogram, toggle the option `density=True` to `density=False` (by default), see what happens."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Histogram of uniform distribution. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "N = 50000\n",
    "X = np.random.uniform(low=0, high=1, size=N)\n",
    "plt.hist(X, 50)\n",
    "\n",
    "plt.grid(True, linestyle = 'dashed')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can compute the mean and standard deviation of any data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "np.mean(X)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "np.std(X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In general, if `X` is our dataset, then the normal distribution with `mu = np.mean(X)`, and `sigma = np.std(X)` will fit the dataset's \"empircal distribution\" best.\n",
    "\n",
    "If a dataset's distribution is normal then **about 68 percent of the data values are within one standard deviation of the mean**:\n",
    "$$\n",
    "P(\\mu - \\sigma < X < \\mu+\\sigma) \\approx 68\\%\n",
    "$$\n",
    "\n",
    "<br><br>\n",
    "Reference: [68–95–99.7 rule](https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule)"
   ]
  }
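  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 68 percent rule can be checked empirically. The cell below is a sketch that draws normal samples (the values of `mu`, `sigma`, and `N` are arbitrary) and measures the fraction that falls within one standard deviation of the sample mean."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "mu, sigma, N = 1.0, 2.0, 100000  # arbitrary parameters for illustration\n",
    "data = np.random.normal(loc=mu, scale=sigma, size=N)\n",
    "m, s = np.mean(data), np.std(data)\n",
    "within = np.mean(np.abs(data - m) < s)  # fraction of samples within one std of the mean\n",
    "print(within)  # should be close to 0.68"
   ]
  }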
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  },
  "latex_envs": {
   "LaTeX_envs_menu_present": true,
   "autoclose": true,
   "autocomplete": true,
   "bibliofile": "biblio.bib",
   "cite_by": "apalike",
   "current_citInitial": 1,
   "eqLabelWithNumbers": true,
   "eqNumInitial": 1,
   "hotkeys": {
    "equation": "Ctrl-E",
    "itemize": "Ctrl-I"
   },
   "labels_anchors": false,
   "latex_user_defs": false,
   "report_style_numbering": false,
   "user_envs_cfg": false
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}