{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# CS 236756 - Technion - Intro to Machine Learning\n", "---\n", "\n", "#### Tal Daniel\n", "\n", "## Tutorial 01 - Probability Refresher and Maximum Likelihood Estimator (MLE)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Agenda\n", "---\n", "* [Probability Basics](#-Probability-Basics)\n", "* [Bayes Rule](#-Bayes-Rule)\n", "* [Expectation & Variance](#-Mean-&-Variance)\n", "* [Correlation](#-Correlation)\n", "* [Maximum Likelihood Estimator](#-Maximum-Likelihood-Estimation-(MLE))\n", "* [Recommended Videos](#-Recommended-Videos)\n", "* [Credits](#-Credits)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "# imports for the tutorial\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "%matplotlib notebook" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Probability Basics\n", "---\n", "We define the following:\n", "* **Experiment** - an experiment or trial is any procedure that can be infinitely repeated and has a well-defined set of possible outcomes, known as the sample space.\n", " * Example: toss a coin twice\n", "* **Sample Space ($\\Omega$)** - possible outcomes of an experiment\n", " * Example (coin toss): {HH, HT, TH, TT} (H = Heads, T = Tails)\n", "* **Event** - a subset of possible outcomes\n", " * Example (coin toss): A = {HH} , B= {HT, TH}" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* **Probability (of an event)** - a number assigned to an event\n", " * Example (coin toss): $Pr(A) = \\frac{1}{4}$\n", "* **Axioms**:\n", " 1. $0 \\leq Pr(A) \\leq 1$\n", " 2. $Pr(\\Omega) = 1 $\n", " 3. 
$Pr(A \\cup B) = Pr(A) + Pr(B) - Pr(A \\cap B)$ (if $A, B$ are mutually exclusive, $Pr(A \\cap B) = 0$)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "(image from tistats.com)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Summary\n", "\n", "\n", "| Term | Usually denoted by | Definition | Example |\n", "| --- | --- | --- | --- |\n", "| **Experiment** | |
any procedure that can be infinitely repeated and has a well-defined set of possible outcomes
|
toss a coin twice
|\n", "| **Sample** | $\\omega$ |
A single outcome of an experiment
|
A single outcome of the two tosses, for example: $HT$
|\n", "| **Sample Space** | $\\Omega$ |
The set of all possible outcomes
|
For two coin tosses: {HH, HT, TH, TT}
|\n", "| **Event** | $A$ |
a subset of possible outcomes
|
A = {HH} , B= {HT, TH}
An empty set $\\emptyset$
The entire set (any outcome): $\\Omega$
|\n", "| **Event Space** | $\\mathcal{F}$ |
The space of all possible events
|
$\\emptyset, \\{HH\\}, \\{HT\\}, \\{TH\\}, \\{TT\\}, \\{HT, TH\\}, \\dots, \\Omega$ (here, all $2^4 = 16$ subsets of $\\Omega$)
|\n", "| **Probability** | P, Pr |
A function $P:\\mathcal{F}\\rightarrow\\left[0,1\\right]$ which assigns a probability to each event
|
$P(HT) = \\frac{1}{4}$
$P(\\emptyset)=0$
$P(\\Omega)=1$
|\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "first toss result: H\n", "second toss result: T\n" ] } ], "source": [ "possible_outcomes = ['H', 'T']\n", "probabilities = [0.5, 0.5]\n", "# toss a coin twice\n", "first_toss = np.random.choice(possible_outcomes, p=probabilities)\n", "print(\"first toss result: \", first_toss)\n", "second_toss = np.random.choice(possible_outcomes, p=probabilities)\n", "print(\"second toss result: \", second_toss)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exercise 1 - Dice Probability\n", "---\n", "Find the probability of getting an even number **or** a number that is a multiple of 3 when rolling a fair six-sided die." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Solution 1\n", "---\n", "* $S = \\{1, 2, 3, 4, 5, 6\\}$\n", "* $ P(A=even) = \\frac{3}{6} = 0.5$\n", "* $ P(B=\\textit{multiple of 3}) = \\frac{2}{6} = \\frac{1}{3}$\n", "* $ P(A \\cap B) = \\frac{1}{6}$, since $ A \\cap B = \\{6\\}$\n", "* $\\rightarrow P(A \\cup B) = \\frac{3}{6} + \\frac{2}{6} - \\frac{1}{6} = \\frac{4}{6} = \\frac{2}{3} $\n", "\n", "Why is this different from the probability of getting a 6? Because 6 is only the intersection of the two events (an even number and a multiple of 3), whereas the question asks for their union." 
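, "\n", "\n", "As a cross-check, a short worked enumeration (assuming a fair die, so that every outcome has probability $\\frac{1}{6}$) gives the same result directly from the union: $$A \\cup B = \\{2, 4, 6\\} \\cup \\{3, 6\\} = \\{2, 3, 4, 6\\} \\quad \\Rightarrow \\quad P(A \\cup B) = \\frac{|A \\cup B|}{|S|} = \\frac{4}{6} = \\frac{2}{3}$$"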
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Joint Probability\n", "---\n", "* **Joint Probability - Pr(A,B)** - the probability that both event A and event B happen ($Pr(A,B) \\geq 0$)\n", "* **Marginal Distributions** -\n", " * $\\sum_i Pr(A_i, B_j) = Pr(B_j)$\n", " * $\\sum_j Pr(A_i, B_j) = Pr(A_i)$\n", "* **Law of Total Probability** - suppose the events $B_1, B_2, ..., B_k$ are mutually exclusive (no two of them can occur together) and form a partition of the sample space (i.e. exactly one of them must occur), then for any event $A$: $$Pr(A) = \\sum_{j=1}^k Pr(A|B_j)Pr(B_j)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* **Conditioning** - if A and B are events and $Pr(B) > 0$, the **conditional probability of A given B** is: $Pr(A|B)$\n", " * $Pr(A|B) = \\frac{Pr(A,B)}{Pr(B)}$\n", " * $\\rightarrow Pr(A,B) = Pr(A|B)Pr(B)$\n", " * **the chain rule** - in the general case: $$Pr(\\bigcap_{i=1}^n A_i) = \\prod_{i=1}^n Pr(A_i|A_{i-1}, ..., A_1)$$\n", " * $Pr(A,B,C) = Pr(A|B,C)Pr(B,C) = Pr(A|B,C)Pr(B|C)Pr(C)$\n", " * Is that the only option? No! $Pr(A,B,C) = Pr(C|A,B)Pr(A,B) = Pr(C|A,B)Pr(B|A)Pr(A)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* **Independence** - \n", " * Two events A and B are independent iff\n", " * $Pr(A,B) = Pr(A)Pr(B)$\n", " * equivalently, when $Pr(B) > 0$: $Pr(A|B) = Pr(A)$\n", " * For a set of events $\\{A_i\\}$, (mutual) independence requires, for every sub-collection of the events:\n", " * $Pr(\\bigcap_i A_i) = \\prod_i Pr(A_i)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* **Conditional Independence** - Events A and B are conditionally independent given C:\n", " * $Pr(A,B|C) = Pr(A|C)Pr(B|C)$\n", " * $Pr(A|B,C) = Pr(A|C)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Continuous Random Variables\n", "---\n", "Assume $X$ is a continuous random variable. 
We define the following:\n", "* **Cumulative Distribution Function (CDF)** - $F(x) = P(X \\leq x)$\n", " * The CDF is monotonically non-decreasing\n", " * $ P(a < X \\leq b) = F(b) - F(a)$\n", " * $\\lim_{x\\to\\infty} F(x) = 1$\n", " * $\\lim_{x\\to -\\infty} F(x) = 0$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* **Probability Density Function (PDF)** - $f(x) = \\frac{d}{dx}F(x)$\n", " * $P(a < X \\leq b) = \\int_a^b f(x)dx $\n", "* Everything we have seen for **discrete** variables holds for **continuous** variables when the sum is replaced with an integral\n", "* Note that, unlike a discrete probability, the PDF can be larger than one: $f(x) \\geq 0$, but it has no upper bound. Since the integral of the PDF over the whole real line must equal 1, the PDF can exceed one only over short intervals (recall that the CDF at $x$ is the area under the PDF up to $x$)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "def plot_normal_pdf_cdf(mu=0, sigma=1):\n", "    # evaluate the normal PDF on a fixed grid\n", "    x = np.linspace(-10, 10, 1000)\n", "    x_pdf = (1 / np.sqrt(2 * np.pi * sigma ** 2)) * np.exp(- (x - mu) ** 2 / (2 * sigma ** 2))\n", "    # approximate the CDF by a running sum of the PDF scaled by the grid spacing\n", "    x_cdf = np.cumsum(x_pdf) / (len(x) / (np.max(x) - np.min(x)))  # normalization\n", "    fig = plt.figure(figsize=(8,5))\n", "    ax = fig.add_subplot(1,1,1)\n", "    ax.plot(x, x_pdf, label='PDF')\n", "    ax.plot(x, x_cdf, label='CDF')\n", "    ax.grid()\n", "    ax.legend()\n", "    ax.set_xlabel('x')\n", "    ax.set_title('PDF and CDF of Normal Distribution')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_normal_pdf_cdf(mu=0, sigma=1)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "plot_normal_pdf_cdf(mu=0, sigma=0.3)\n", "# notice how the pdf can be larger than 1 " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Bayes Rule\n", "---\n", "Suppose that the events $B_1, ..., B_k$ are mutually exclusive and form a partiotion of the sample space (i.e. one of them must occur), then for any event $Pr(A)$, **Bayes Rule**:\n", "$$Pr(B_i|A) = \\frac{Pr(A,B_i)}{Pr(A)} = \\frac{Pr(A|B_i)Pr(B_i)}{Pr(A)} = \\frac{Pr(A|B_i)Pr(B_i)}{\\sum_{j=1}^k Pr(A|B_j)Pr(B_j)}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### REMEMBER THESE!\n", "* **Posterior Distribution** - $Pr(B_i|A)$\n", "* **Liklihood Distribution** - $Pr(A|B_i)$\n", "* **Prior Distribution** - $Pr(B_i)$\n", "* **Evidence** - $Pr(A)$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "\n", "* Image Source" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example \n", "---\n", "Given a dataset where each sample is a male or a female with their height, what is the probability that given a certain height, that person is a female, that is, calculate: $Pr(\\textit{Gender} = \\textit{Female} | \\textit{Height} = X cm)$?" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of rows in the dataset: 10000\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
GenderHeight
0Male187.571423
1Male174.706036
2Male188.239668
3Male182.196685
4Male177.499761
5Male170.822660
6Male174.714106
7Male173.605229
8Male170.228132
9Male161.179495
\n", "
" ], "text/plain": [ " Gender Height\n", "0 Male 187.571423\n", "1 Male 174.706036\n", "2 Male 188.239668\n", "3 Male 182.196685\n", "4 Male 177.499761\n", "5 Male 170.822660\n", "6 Male 174.714106\n", "7 Male 173.605229\n", "8 Male 170.228132\n", "9 Male 161.179495" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# load the data\n", "dataset = pd.read_csv('./datasets/heights_dataset.csv')\n", "# use only the heights\n", "dataset = dataset.drop('Weight', axis=1)\n", "# inch -> cm\n", "dataset['Height'] = dataset['Height'] * 2.54\n", "## print the number of rows in the data set\n", "number_of_rows = len(dataset)\n", "print('Number of rows in the dataset: {}'.format(number_of_rows))\n", "## show the first 10 rows\n", "dataset.head(10)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Histogram\n", "---\n", "* A histogram is an accurate representation of the distribution of numerical data. \n", "* It is an estimate of the probability distribution (PDF) of a continuous variable. \n", "* To construct a histogram, the first step is to \"bin\" (or \"bucket\") the range of values—that is, divide the entire range of values into a series of intervals—and then count how many values fall into each interval. \n", " * The bins are usually specified as consecutive, non-overlapping intervals of a variable. \n", " * The bins (intervals) must be adjacent, and are often (but are not required to be) of equal size." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/plain": [ "Text(0.5, 0, 'Height(cm)')" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# let's plot the histogram\n", "figure = plt.figure()\n", "ax = figure.add_subplot(1,1,1)\n", "male_ds = dataset[:5000].rename(index=str, columns={\"Height\": \"Male\"}).plot.hist(ax=ax)\n", "female_ds = dataset[5000:].rename(index=str, columns={\"Height\": \"Female\"}).plot.hist(ax=ax)\n", "ax.grid()\n", "ax.set_xlabel('Height(cm)')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* Assume that the height is a *discrete* variable (we use quantization on the dataset, only integers).\n", "* $Pr(Male) = Pr(Female) = 0.5$\n", "* $Pr(170cm|Female) \\approx \\frac{800}{5000} = 0.16$ (in the presentation $0.1$)\n", "* $Pr(170cm|Male) \\approx \\frac{1300}{5000} = 0.26$ (in the presentation $0.3$)\n", "* Using **Bayes rule**: $$ Pr(Female|170cm) = \\frac{Pr(170cm|Female)Pr(Female)}{Pr(170cm)} = \\frac{Pr(170cm|Female)Pr(Female)}{Pr(170cm|Male)Pr(Male) + Pr(170cm|Female)Pr(Female)} = \\frac{0.16 * 0.5}{0.16 * 0.5 + 0.26 * 0.5} = 0.38$$ ($0.25$ in the presentation)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Mean & Variance\n", "---\n", "### Mean (Expectation) - $\\mu$\n", "---\n", "The mean is the proability weighted average of all possible values.\n", "* Discrete Variables:\n", " * $\\mathbb{E}[X] = \\sum_{x \\in X} xp(x)$\n", " * $\\mathbb{E}[f(X)] = \\sum_{x \\in X} f(x)p(x)$\n", "* Continuous Variables:\n", " * $\\mathbb{E}[X] = \\int_x xp(x)$\n", " * $\\mathbb{E}[f(X)] = \\int_{x \\in X} f(x)p(x)$\n", "* Example: the mean of a fair six-sided dice: $$\\mathbb{E}[X] = 1*\\frac{1}{6} + 2*\\frac{1}{6} + 3*\\frac{1}{6} + 4*\\frac{1}{6} + 5*\\frac{1}{6} + 6*\\frac{1}{6} = 3.5$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* **The Law of Total Expectation (Smoothing Theorem)**: $$\\mathbb{E}[X] = 
\\mathbb{E}\\big[\\mathbb{E}[X|Y] \\big] $$\n", " * Proof (for discrete $X, Y$): $$\\mathbb{E}\\big[\\mathbb{E}[X|Y] \\big] = \\mathbb{E}\\big[ \\sum_x x \\cdot P(X=x|Y) \\big] = \\sum_y \\big[ \\sum_x x \\cdot P(X=x|Y=y) \\big] \\cdot P(Y=y)$$ $$ = \\sum_y \\big[ \\sum_x x \\cdot P(X=x|Y=y) \\cdot P(Y=y)\\big] = \\sum_y \\big[ \\sum_x x \\cdot P(X=x, Y=y)\\big] $$ $$ = \\sum_x x\\cdot \\big[ \\sum_y P(X=x, Y=y)\\big] = \\sum_x x\\cdot \\big[P(X=x)\\big] = \\mathbb{E}[X]$$\n", " * Example: Suppose that two factories supply light bulbs to the market. Factory X's bulbs work for an average of 5000 hours, whereas factory Y's bulbs work for an average of 4000 hours. It is known that factory X supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for? $$\\mathbb{E}[L] = \\mathbb{E}\\big[\\mathbb{E}[L|factory] \\big] = \\mathbb{E}[L|X] \\cdot P(X) + \\mathbb{E}[L|Y] \\cdot P(Y) = 5000 \\cdot 0.6 + 4000 \\cdot 0.4 = 4600$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Variance - $\\sigma^2$\n", "---\n", "The variance is a measure of the \"spread\" of the distribution (it can also be read as a measure of uncertainty).\n", "\n", "* $var[X] = \\mathbb{E}[(X - \\mu)^2] = \\sum (x-\\mu)^2 p(x) = \\sum x^2 p(x) + \\mu^2 \\sum p(x) -2 \\mu \\sum x p(x) = \\mathbb{E}[X^2] + \\mu^2 - 2\\mu^2 = \\mathbb{E}[X^2] - \\mu^2$\n", "* **The Standard Deviation** - $std[X] = \\sqrt{var[X]}$\n", "* Example: the variance of a fair six-sided die: $$var[X] = \\sum_{i=1}^6 \\frac{1}{6} (i-3.5)^2$$ $$ E[X^2] = 1^2 *\\frac{1}{6} + 2^2 *\\frac{1}{6} + 3^2 *\\frac{1}{6} + 4^2*\\frac{1}{6} + 5^2*\\frac{1}{6} + 6^2*\\frac{1}{6} = \\frac{91}{6} $$ $$ var[X] = E[X^2] -\\mu ^2 = \\frac{91}{6} - 3.5^2 \\approx 2.92 $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Example cont.\n", "---\n", "What are the mean and variance of the heights of males? Of females? Of both combined?" 
] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the mean of males' height is: 175.327 cm\n", "the variance of males' height is: 52.896 cm^2\n", "the std of males' height is: 7.273 cm\n" ] } ], "source": [ "# easy with pandas\n", "print(\"the mean of males' height is: {:.3f} cm\".format(dataset[:5000].Height.mean()))\n", "print(\"the variance of males' height is: {:.3f} cm^2\".format(dataset[:5000].Height.var()))\n", "print(\"the std of males' height is: {:.3f} cm\".format(dataset[:5000].Height.std()))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the mean of females' height is: 161.820 cm\n", "the variance of females' height is: 46.903 cm^2\n", "the std of females' height is: 6.849 cm\n" ] } ], "source": [ "print(\"the mean of females' height is: {:.3f} cm\".format(dataset[5000:].Height.mean()))\n", "print(\"the variance of females' height is: {:.3f} cm^2\".format(dataset[5000:].Height.var()))\n", "print(\"the std of females' height is: {:.3f} cm\".format(dataset[5000:].Height.std()))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the mean of total height is: 168.574 cm\n", "the variance of total height is: 95.506 cm^2\n", "the std of total' height is: 9.773 cm\n" ] } ], "source": [ "print(\"the mean of total height is: {:.3f} cm\".format(dataset.Height.mean()))\n", "print(\"the variance of total height is: {:.3f} cm^2\".format(dataset.Height.var()))\n", "print(\"the std of total' height is: {:.3f} cm\".format(dataset.Height.std()))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Correlation\n", "---\n", "Correlation is 
a measure of linear dependency between two variables.\n", "\n", "We quantify it with the **covariance** of two Random Variables (RVs), denoted $\\sigma_{xy}$:\n", "* $\\sigma_{xy} = Cov(X,Y) = \\mathbb{E}[(X - \\mu_x)(Y - \\mu_y)] = \\mathbb{E}[XY] - \\mu_x \\mu_y$\n", "* $X, Y$ are **uncorrelated** if $\\sigma_{xy} = 0 \\leftrightarrow \\mu_{xy} = \\mu_x \\mu_y$, where $\\mu_{xy} = \\mathbb{E}[XY]$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ " **REMEMBER**: \n", "\n", "Independence $\\rightarrow$ Uncorrelated **BUT** Uncorrelated $\\nrightarrow$ Independence" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Pearson's Correlation Coefficient (Pearson's r)\n", "---\n", "It is a measure of linear correlation between two variables X and Y, denoted $\\rho$ or $r_{xy}$:\n", "* $\\rho = r_{xy} = \\frac{\\sigma_{xy}}{\\sigma_x \\sigma_y}$\n", "* From the **Cauchy-Schwarz** inequality: $$-1 \\leq \\rho \\leq 1$$\n", " * $\\rho = 0$ - no linear correlation\n", " * $\\rho = 1$ - perfect positive linear correlation\n", " * $\\rho = -1$ - perfect negative linear correlation\n", " * Reminder: Cauchy-Schwarz inequality: $|\\langle x,y \\rangle|^2 \\leq \\langle x,x \\rangle \\cdot \\langle y,y \\rangle$, with equality iff $x,y$ are linearly dependent.\n", " * $\\mathbb{E}[XY] = \\langle X, Y \\rangle \\rightarrow |\\mathbb{E}[XY]|^2 \\leq \\mathbb{E}[X^2] \\mathbb{E}[Y^2] $\n", " * $$|Cov(X,Y)|^2 = |\\mathbb{E}[(X-\\mu_x) (Y-\\mu_y)]|^2 = |\\langle X-\\mu_x, Y-\\mu_y \\rangle|^2 \\leq \\langle X-\\mu_x, X-\\mu_x \\rangle \\cdot \\langle Y-\\mu_y, Y-\\mu_y \\rangle$$ $$ = \\mathbb{E}[(X - \\mu_x)^2] \\mathbb{E}[(Y - \\mu_y)^2] = Var(X) Var(Y) $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Illustration:\n", "\n", "By Kiatdd - Own work, CC BY-SA 3.0, Link" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Example \n", "---\n", "* 
Let $X$ be uniformly distributed on $\\{-1,0,1\\}$ (each value has probability $\\frac{1}{3}$), $Y=X^2$\n", "* $X,Y$ are clearly dependent **BUT**:\n", " * $\\mu_x = 0$, $\\mu_y = \\frac{2}{3}$\n", " * $\\mu_{xy} = \\mathbb{E}[X^3] = 0$\n", " * $Cov(X,Y) = \\mu_{xy} - \\mu_x \\mu_y = 0 - 0 * \\frac{2}{3} = 0$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Correlation DOES NOT Imply Causation\n", "Below are examples (from the presentation) of variables that are correlated but do not necessarily cause one another:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Outliers\n", "---\n", "* In statistics, an outlier is an observation point that is distant from other observations. \n", "* An outlier may be due to variability in the measurement or it may indicate experimental error. The latter are sometimes excluded from the data set. \n", "* An outlier can cause serious problems in statistical analyses." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Example\n", "---\n", "If one is calculating the average temperature of 10 objects in a room, and nine of them are between 20 and 25 degrees Celsius, but an oven is at 175 °C, the median of the data will be between 20 and 25 °C but the mean temperature will be between 35.5 and 40 °C. In this case, the median better reflects the temperature of a randomly sampled object (but not the temperature in the room) than the mean. Naively interpreting the mean as \"a typical sample\", equivalent to the median, is incorrect. As illustrated in this case, outliers may indicate data points that belong to a different population than the rest of the sample set." 
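, "\n", "\n", "The bounds quoted in this example follow from a short calculation. Writing $s$ for the sum of the nine ordinary temperatures (a symbol introduced here only for illustration), we have $9 \\cdot 20 \\leq s \\leq 9 \\cdot 25$, so the mean satisfies $$\\frac{180 + 175}{10} = 35.5 \\leq \\frac{s + 175}{10} \\leq \\frac{225 + 175}{10} = 40$$ The median of the 10 values is the average of the 5th and 6th sorted values. The oven is the largest value, so both of these belong to the nine ordinary objects and the median stays between 20 and 25 °C."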
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Correlation - Sensitive to Outliers\n", "All the examples (from the presentation) below share the same Pearson's r:\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Vectors of Random Variables\n", "---\n", "* Let $\\overline{X}$ be a d-dimensional **random** vector\n", " * $\\overline{X} = [x_1, x_2, ..., x_d]$\n", "* The d-dimensional **mean vector $\\overline{\\mu}$** is:\n", " * $\\overline{\\mu} = \\mathbb{E}[\\overline{X}] = [\\mathbb{E}[x_1], \\mathbb{E}[x_2], ..., \\mathbb{E}[x_d]] = [\\mu_1, \\mu_2, ..., \\mu_d]$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* The **covariance matrix $\\Sigma$** is defined as the (**square**, symmetric) matrix, where each component $\\sigma_{ij}$ is the covariance of $x_i, x_j$:\n", " * $\\sigma_{ij} = \\mathbb{E}[(x_i - \\mu_i)(x_j - \\mu_j)]$\n", " * $$\\Sigma = \n", " \\begin{pmatrix}\n", " \\sigma_1 ^2 & \\sigma_{1,2} & \\cdots & \\sigma_{1,d} \\\\\n", " \\sigma_{2,1} & \\sigma_2^2 & \\cdots & \\sigma_{2,d} \\\\\n", " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", " \\sigma_{d,1} & \\sigma_{d,2} & \\cdots & \\sigma_d^2 \n", " \\end{pmatrix}$$\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Multivariate Normal Distribution\n", "---\n", "* $ x \\sim N_d (\\mu, \\Sigma)$\n", "* $$f(x) = \\frac{1}{(2\\pi)^{\\frac{d}{2}} |\\Sigma|^{\\frac{1}{2}}} e^{- \\frac{1}{2}(x - \\mu)^{T} \\Sigma^{-1} (x - \\mu)}$$" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mu:\n", "[0. 0. 0. 0. 
0.]\n", "Sigma:\n", "[[0.04800934 0.0189633 0.13069076 0.03377338 0.08095299]\n", " [0.0189633 0.00760599 0.05172954 0.01336806 0.03204252]\n", " [0.13069076 0.05172954 0.35660823 0.0921296 0.22082975]\n", " [0.03377338 0.01336806 0.0921296 0.02390833 0.05706729]\n", " [0.08095299 0.03204252 0.22082975 0.05706729 0.13688724]]\n", "draw a sample from each variable:\n", "[-0.24298271 -0.09838847 -0.72393916 -0.176606 -0.41587365]\n" ] } ], "source": [ "num_samples = 1000\n", "num_variables = 5\n", "# random row vector, used here only to build a valid covariance matrix\n", "mu = np.random.random(size=(1, num_variables))\n", "# outer product (rank 1) plus a small diagonal term, so sigma is positive definite\n", "sigma = mu * mu.T + np.eye(num_variables) * 1e-4\n", "# generate multivariate distribution\n", "mult_var = np.random.multivariate_normal(np.zeros(num_variables), sigma)\n", "print(\"mu:\")\n", "print(np.zeros(num_variables))\n", "print(\"Sigma:\")\n", "print(sigma)\n", "print(\"draw a sample from each variable:\")\n", "print(mult_var)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mahalanobis Distance\n", "---\n", "* $d^2 = (x - \\mu)^{T} \\Sigma^{-1} (x - \\mu)$ (the squared Mahalanobis distance; $d$ is its square root)\n", "* Measures the distance from $x$ to $\\mu$ in terms of $\\Sigma$, that is, the distance between a point $p$ and a distribution $D$.\n", " * If $p$ is the mean of $D$, the distance is 0.\n", " * The distance grows as $p$ moves away from the mean along each principal component axis.\n", " * Note: if $\\Sigma = I$, it reduces to the **Euclidean** distance.\n", "* It rescales the difference $x - \\mu$ to account for the variances of the components and the correlations between them." 
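, "\n", "\n", "A small worked example may help here; the numbers are hypothetical and chosen only for illustration. Take $\\mu = (0, 0)^T$ and $\\Sigma = \\begin{pmatrix} 4 & 0 \\\\ 0 & 1 \\end{pmatrix}$, so that $\\Sigma^{-1} = \\begin{pmatrix} \\frac{1}{4} & 0 \\\\ 0 & 1 \\end{pmatrix}$. The points $p = (2, 0)^T$ and $q = (0, 2)^T$ are equally far from $\\mu$ in the Euclidean sense, yet the quadratic form differs: $$(p - \\mu)^{T} \\Sigma^{-1} (p - \\mu) = \\frac{2^2}{4} = 1, \\qquad (q - \\mu)^{T} \\Sigma^{-1} (q - \\mu) = \\frac{2^2}{1} = 4$$ The point lying along the high-variance axis is therefore considered closer to the distribution."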
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Example - Bivariate\n", "---\n", "* $d=2$\n", "* $\\Sigma = \n", " \\begin{pmatrix}\n", " \\sigma_1 ^2 & \\rho \\sigma_1 \\sigma_2 \\\\\n", " \\rho \\sigma_1 \\sigma_2 & \\sigma_2^2\n", " \\end{pmatrix}$\n", "* $\\rho$ is the Pearson coefficient" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* Reminder: 2x2 matrix inversion $$ \\begin{pmatrix}\n", " a & b \\\\\n", " c & d\n", " \\end{pmatrix}^{-1} = \\frac{1}{ad-bc} \\begin{pmatrix}\n", " d & -b \\\\\n", " -c & a\n", " \\end{pmatrix} $$\n", "
\n", "* $$\\Sigma^{-1} = \\frac{1}{\\sigma_1^2 \\sigma_2^2 (1 - \\rho^2)} \\begin{pmatrix} \\sigma_2 ^2 & - \\rho \\sigma_1 \\sigma_2 \\\\ - \\rho \\sigma_1 \\sigma_2 & \\sigma_1^2 \\end{pmatrix}$$\n", "
\n", "* $$f(x_1, x_2) = \\frac{1}{2\\pi \\sigma_1 \\sigma_2 \\sqrt{1 - \\rho^2}} e^{- \\frac{1}{2(1-\\rho^2)}(z_1^2 -2\\rho z_1 z_2 + z_2^2)}$$\n", " * $z_i = \\frac{x_i - \\mu_i}{\\sigma_i}$ (also called **standardization**)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "skip" } }, "outputs": [], "source": [ "from matplotlib import cm\n", "from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions\n", "from scipy.stats import multivariate_normal\n", "\n", "def plot_3d_normal_dist():\n", "    # Our 2-dimensional distribution will be over variables X and Y\n", "    N = 60\n", "    X = np.linspace(-3, 3, N)\n", "    Y = np.linspace(-3, 4, N)\n", "    X, Y = np.meshgrid(X, Y)\n", "\n", "    # Mean vector and covariance matrix\n", "    mu = np.array([0., 1.])\n", "    Sigma = np.array([[ 1. , -0.5], [-0.5, 1.5]])\n", "\n", "    # Pack X and Y into a single 3-dimensional array\n", "    pos = np.empty(X.shape + (2,))\n", "    pos[:, :, 0] = X\n", "    pos[:, :, 1] = Y\n", "\n", "    F = multivariate_normal(mu, Sigma)\n", "    Z = F.pdf(pos)\n", "\n", "    # Create a surface plot (the projected contour plot below is left commented out).\n", "    fig = plt.figure(figsize=(8,5))\n", "    # fig.gca(projection='3d') is no longer accepted by recent Matplotlib versions\n", "    ax = fig.add_subplot(1, 1, 1, projection='3d')\n", "    ax.plot_surface(X, Y, Z, rstride=3, cstride=3, linewidth=1, antialiased=True,\n", "                    cmap=cm.viridis)\n", "\n", "    # cset = ax.contourf(X, Y, Z, zdir='z', offset=-0.15, cmap=cm.viridis)\n", "\n", "    # Adjust the limits, ticks and view angle\n", "    ax.set_zlim(-0.15,0.2)\n", "    ax.set_zticks(np.linspace(0,0.2,5))\n", "    ax.view_init(27, -21)\n", "\n", "    plt.show()" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "application/javascript": [ "/* Put everything inside the global mpl namespace */\n", "window.mpl = {};\n", "\n", "\n", "mpl.get_websocket_type = function() {\n", " if (typeof(WebSocket) !== 'undefined') {\n", " return WebSocket;\n", " } else if (typeof(MozWebSocket) !== 'undefined') {\n", " return 
MozWebSocket;\n", " } else {\n", " alert('Your browser does not have WebSocket support. ' +\n", " 'Please try Chrome, Safari or Firefox ≥ 6. ' +\n", " 'Firefox 4 and 5 are also supported but you ' +\n", " 'have to enable WebSockets in about:config.');\n", " };\n", "}\n", "\n", "mpl.figure = function(figure_id, websocket, ondownload, parent_element) {\n", " this.id = figure_id;\n", "\n", " this.ws = websocket;\n", "\n", " this.supports_binary = (this.ws.binaryType != undefined);\n", "\n", " if (!this.supports_binary) {\n", " var warnings = document.getElementById(\"mpl-warnings\");\n", " if (warnings) {\n", " warnings.style.display = 'block';\n", " warnings.textContent = (\n", " \"This browser does not support binary websocket messages. \" +\n", " \"Performance may be slow.\");\n", " }\n", " }\n", "\n", " this.imageObj = new Image();\n", "\n", " this.context = undefined;\n", " this.message = undefined;\n", " this.canvas = undefined;\n", " this.rubberband_canvas = undefined;\n", " this.rubberband_context = undefined;\n", " this.format_dropdown = undefined;\n", "\n", " this.image_mode = 'full';\n", "\n", " this.root = $('
');\n", " this._root_extra_style(this.root)\n", " this.root.attr('style', 'display: inline-block');\n", "\n", " $(parent_element).append(this.root);\n", "\n", " this._init_header(this);\n", " this._init_canvas(this);\n", " this._init_toolbar(this);\n", "\n", " var fig = this;\n", "\n", " this.waiting = false;\n", "\n", " this.ws.onopen = function () {\n", " fig.send_message(\"supports_binary\", {value: fig.supports_binary});\n", " fig.send_message(\"send_image_mode\", {});\n", " if (mpl.ratio != 1) {\n", " fig.send_message(\"set_dpi_ratio\", {'dpi_ratio': mpl.ratio});\n", " }\n", " fig.send_message(\"refresh\", {});\n", " }\n", "\n", " this.imageObj.onload = function() {\n", " if (fig.image_mode == 'full') {\n", " // Full images could contain transparency (where diff images\n", " // almost always do), so we need to clear the canvas so that\n", " // there is no ghosting.\n", " fig.context.clearRect(0, 0, fig.canvas.width, fig.canvas.height);\n", " }\n", " fig.context.drawImage(fig.imageObj, 0, 0);\n", " };\n", "\n", " this.imageObj.onunload = function() {\n", " fig.ws.close();\n", " }\n", "\n", " this.ws.onmessage = this._make_on_message_function(this);\n", "\n", " this.ondownload = ondownload;\n", "}\n", "\n", "mpl.figure.prototype._init_header = function() {\n", " var titlebar = $(\n", " '
');\n", " var titletext = $(\n", " '
');\n", " titlebar.append(titletext)\n", " this.root.append(titlebar);\n", " this.header = titletext[0];\n", "}\n", "\n", "\n", "\n", "mpl.figure.prototype._canvas_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "\n", "mpl.figure.prototype._root_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "mpl.figure.prototype._init_canvas = function() {\n", " var fig = this;\n", "\n", " var canvas_div = $('
');\n", "\n", " canvas_div.attr('style', 'position: relative; clear: both; outline: 0');\n", "\n", " function canvas_keyboard_event(event) {\n", " return fig.key_event(event, event['data']);\n", " }\n", "\n", " canvas_div.keydown('key_press', canvas_keyboard_event);\n", " canvas_div.keyup('key_release', canvas_keyboard_event);\n", " this.canvas_div = canvas_div\n", " this._canvas_extra_style(canvas_div)\n", " this.root.append(canvas_div);\n", "\n", " var canvas = $('');\n", " canvas.addClass('mpl-canvas');\n", " canvas.attr('style', \"left: 0; top: 0; z-index: 0; outline: 0\")\n", "\n", " this.canvas = canvas[0];\n", " this.context = canvas[0].getContext(\"2d\");\n", "\n", " var backingStore = this.context.backingStorePixelRatio ||\n", "\tthis.context.webkitBackingStorePixelRatio ||\n", "\tthis.context.mozBackingStorePixelRatio ||\n", "\tthis.context.msBackingStorePixelRatio ||\n", "\tthis.context.oBackingStorePixelRatio ||\n", "\tthis.context.backingStorePixelRatio || 1;\n", "\n", " mpl.ratio = (window.devicePixelRatio || 1) / backingStore;\n", "\n", " var rubberband = $('');\n", " rubberband.attr('style', \"position: absolute; left: 0; top: 0; z-index: 1;\")\n", "\n", " var pass_mouse_events = true;\n", "\n", " canvas_div.resizable({\n", " start: function(event, ui) {\n", " pass_mouse_events = false;\n", " },\n", " resize: function(event, ui) {\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " stop: function(event, ui) {\n", " pass_mouse_events = true;\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " });\n", "\n", " function mouse_event_fn(event) {\n", " if (pass_mouse_events)\n", " return fig.mouse_event(event, event['data']);\n", " }\n", "\n", " rubberband.mousedown('button_press', mouse_event_fn);\n", " rubberband.mouseup('button_release', mouse_event_fn);\n", " // Throttle sequential mouse events to 1 every 20ms.\n", " rubberband.mousemove('motion_notify', mouse_event_fn);\n", "\n", " 
rubberband.mouseenter('figure_enter', mouse_event_fn);\n", " rubberband.mouseleave('figure_leave', mouse_event_fn);\n", "\n", " canvas_div.on(\"wheel\", function (event) {\n", " event = event.originalEvent;\n", " event['data'] = 'scroll'\n", " if (event.deltaY < 0) {\n", " event.step = 1;\n", " } else {\n", " event.step = -1;\n", " }\n", " mouse_event_fn(event);\n", " });\n", "\n", " canvas_div.append(canvas);\n", " canvas_div.append(rubberband);\n", "\n", " this.rubberband = rubberband;\n", " this.rubberband_canvas = rubberband[0];\n", " this.rubberband_context = rubberband[0].getContext(\"2d\");\n", " this.rubberband_context.strokeStyle = \"#000000\";\n", "\n", " this._resize_canvas = function(width, height) {\n", " // Keep the size of the canvas, canvas container, and rubber band\n", " // canvas in synch.\n", " canvas_div.css('width', width)\n", " canvas_div.css('height', height)\n", "\n", " canvas.attr('width', width * mpl.ratio);\n", " canvas.attr('height', height * mpl.ratio);\n", " canvas.attr('style', 'width: ' + width + 'px; height: ' + height + 'px;');\n", "\n", " rubberband.attr('width', width);\n", " rubberband.attr('height', height);\n", " }\n", "\n", " // Set the figure to an initial 600x600px, this will subsequently be updated\n", " // upon first draw.\n", " this._resize_canvas(600, 600);\n", "\n", " // Disable right mouse context menu.\n", " $(this.rubberband_canvas).bind(\"contextmenu\",function(e){\n", " return false;\n", " });\n", "\n", " function set_focus () {\n", " canvas.focus();\n", " canvas_div.focus();\n", " }\n", "\n", " window.setTimeout(set_focus, 100);\n", "}\n", "\n", "mpl.figure.prototype._init_toolbar = function() {\n", " var fig = this;\n", "\n", " var nav_element = $('
');\n", " nav_element.attr('style', 'width: 100%');\n", " this.root.append(nav_element);\n", "\n", " // Define a callback function for later on.\n", " function toolbar_event(event) {\n", " return fig.toolbar_button_onclick(event['data']);\n", " }\n", " function toolbar_mouse_event(event) {\n", " return fig.toolbar_button_onmouseover(event['data']);\n", " }\n", "\n", " for(var toolbar_ind in mpl.toolbar_items) {\n", " var name = mpl.toolbar_items[toolbar_ind][0];\n", " var tooltip = mpl.toolbar_items[toolbar_ind][1];\n", " var image = mpl.toolbar_items[toolbar_ind][2];\n", " var method_name = mpl.toolbar_items[toolbar_ind][3];\n", "\n", " if (!name) {\n", " // put a spacer in here.\n", " continue;\n", " }\n", " var button = $('');\n", " button.click(method_name, toolbar_event);\n", " button.mouseover(tooltip, toolbar_mouse_event);\n", " nav_element.append(button);\n", " }\n", "\n", " // Add the status bar.\n", " var status_bar = $('');\n", " nav_element.append(status_bar);\n", " this.message = status_bar[0];\n", "\n", " // Add the close button to the window.\n", " var buttongrp = $('
');\n", " var button = $('');\n", " button.click(function (evt) { fig.handle_close(fig, {}); } );\n", " button.mouseover('Stop Interaction', toolbar_mouse_event);\n", " buttongrp.append(button);\n", " var titlebar = this.root.find($('.ui-dialog-titlebar'));\n", " titlebar.prepend(buttongrp);\n", "}\n", "\n", "mpl.figure.prototype._root_extra_style = function(el){\n", " var fig = this\n", " el.on(\"remove\", function(){\n", "\tfig.close_ws(fig, {});\n", " });\n", "}\n", "\n", "mpl.figure.prototype._canvas_extra_style = function(el){\n", " // this is important to make the div 'focusable\n", " el.attr('tabindex', 0)\n", " // reach out to IPython and tell the keyboard manager to turn it's self\n", " // off when our div gets focus\n", "\n", " // location in version 3\n", " if (IPython.notebook.keyboard_manager) {\n", " IPython.notebook.keyboard_manager.register_events(el);\n", " }\n", " else {\n", " // location in version 2\n", " IPython.keyboard_manager.register_events(el);\n", " }\n", "\n", "}\n", "\n", "mpl.figure.prototype._key_event_extra = function(event, name) {\n", " var manager = IPython.notebook.keyboard_manager;\n", " if (!manager)\n", " manager = IPython.keyboard_manager;\n", "\n", " // Check for shift+enter\n", " if (event.shiftKey && event.which == 13) {\n", " this.canvas_div.blur();\n", " event.shiftKey = false;\n", " // Send a \"J\" for go to next cell\n", " event.which = 74;\n", " event.keyCode = 74;\n", " manager.command_mode();\n", " manager.handle_keydown(event);\n", " }\n", "}\n", "\n", "mpl.figure.prototype.handle_save = function(fig, msg) {\n", " fig.ondownload(fig, null);\n", "}\n", "\n", "\n", "mpl.find_output_cell = function(html_output) {\n", " // Return the cell and output element which can be found *uniquely* in the notebook.\n", " // Note - this is a bit hacky, but it is done because the \"notebook_saving.Notebook\"\n", " // IPython event is triggered only after the cells have been serialised, which for\n", " // our purposes (turning an 
active figure into a static one), is too late.\n", " var cells = IPython.notebook.get_cells();\n", " var ncells = cells.length;\n", " for (var i=0; i<ncells; i++) {\n", " var cell = cells[i];\n", " if (cell.cell_type === 'code'){\n", " for (var j=0; j<cell.output_area.outputs.length; j++) {\n", " var data = cell.output_area.outputs[j];\n", " if (data.data) {\n", " // IPython >= 3 moved mimebundle to data attribute of output\n", " data = data.data;\n", " }\n", " if (data['text/html'] == html_output) {\n", " return [cell, data, j];\n", " }\n", " }\n", " }\n", " }\n", "}\n", "\n", "// Register the function which deals with the matplotlib target/channel.\n", "// The kernel may be null if the page has been refreshed.\n", "if (IPython.notebook.kernel != null) {\n", " IPython.notebook.kernel.comm_manager.register_target('matplotlib', mpl.mpl_figure_comm);\n", "}\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# let's see how the MLE performs\n", "plot_normal_mle()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exercise 2.5 - MLE for m-Dimensional Gaussian\n", "---\n", "Given $\\{x_i\\}_{i=1}^n$ i.i.d samples of $X \\sim N(\\mu, \\Sigma)$, what is the MLE?"
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Solution 2.5\n", "---\n", "The results have the same form as in the one-dimensional case, with vectors and matrices in place of scalars; the derivation is somewhat more involved.\n", "$$ \\hat{\\overline{\\mu}}_{MLE} = \\frac{1}{n} \\sum_{i=1}^n \\overline{x_i} $$\n", "$$ \\hat{\\Sigma}_{MLE} = \\frac{1}{n} \\sum_{i=1}^n (\\overline{x_i} - \\hat{\\overline{\\mu}}_{MLE}) (\\overline{x_i} - \\hat{\\overline{\\mu}}_{MLE})^{T}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Vector & Matrix Derivatives\n", "---\n", "* $\\nabla_x Ax = A^{T}$\n", "* $\\nabla_x x^{T} A x = (A + A^{T}) x$ \n", "* $\\frac{\\partial}{\\partial A} \\ln |A| = A^{-T}$\n", "* $\\frac{\\partial}{\\partial A} Tr[AB] = B^{T}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "From the identities above, we will use the following:\n", "1. $\\nabla_{\\mu} {\\mu}^{T} \\Sigma^{-1} x_i = \\Sigma^{-1} x_i$ \n", "2. $\\nabla_{\\mu} {\\mu}^{T} \\Sigma^{-1} \\mu = (\\Sigma^{-1} + {\\Sigma}^{-T}) \\mu$\n", "3. $\\frac{\\partial}{\\partial \\Sigma^{-1}} \\ln |\\Sigma^{-1}| = \\Sigma^{T} = \\Sigma$\n", "4. $\\frac{\\partial}{\\partial \\Sigma^{-1}} Tr[\\Sigma^{-1} \\sum_{i=1}^n (\\overline{x_i} - \\overline{\\mu}) (\\overline{x_i} - \\overline{\\mu})^{T}] = \\sum_{i=1}^n (\\overline{x_i} - \\overline{\\mu}) (\\overline{x_i} - \\overline{\\mu})^{T}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Solve for the d-dimensional case\n", "* $p(x|\\mu, \\Sigma) = \\frac{1}{(2\\pi)^{\\frac{nd}{2}} |\\Sigma|^{\\frac{n}{2}}} e^{- \\frac{1}{2}\\sum_{i=1}^n (x_i - \\mu)^{T} \\Sigma^{-1} (x_i - \\mu)}$\n", "\n", "
\n", "\n", "* $\\ln p(x|\\mu, \\Sigma) = \\frac{n}{2} \\ln |\\Sigma^{-1}| -\\frac{1}{2} \\sum_{i=1}^n (\\overline{x_i} - \\overline{\\mu})^{T} \\Sigma^{-1} (\\overline{x_i} - \\overline{\\mu}) + \\text{const} $\n", "\n", "
\n", "\n", "* $\\nabla_{\\mu} \\sum_{i=1}^n (\\overline{x_i} - \\overline{\\mu})^{T} \\Sigma^{-1} (\\overline{x_i} - \\overline{\\mu}) = \\sum_{i=1}^{n} (-2\\Sigma^{-1} x_i + (\\Sigma^{-1} + {\\Sigma}^{-T}) \\mu) = 0 \\rightarrow \\hat{\\overline{\\mu}}_{MLE} = \\frac{1}{n} \\sum_{i=1}^n \\overline{x_i} $" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "* **The Trace Trick** - $\\sum_{i=1}^n (\\overline{x_i} - \\overline{\\mu})^{T} \\Sigma^{-1} (\\overline{x_i} - \\overline{\\mu}) = \\sum_{i=1}^n \\textit{Trace}\\big((\\overline{x_i} - \\overline{\\mu})^{T} \\Sigma^{-1} (\\overline{x_i} - \\overline{\\mu})\\big) = \\textit{Trace}\\big(\\Sigma^{-1} \\sum_{i=1}^n (\\overline{x_i} - \\overline{\\mu}) (\\overline{x_i} - \\overline{\\mu})^{T} \\big)$\n", "\n", "
\n", "\n", "* $\\frac{\\partial}{\\partial \\Sigma^{-1}}\\big( \\frac{n}{2} \\ln |\\Sigma^{-1}| -\\frac{1}{2} \\sum_{i=1}^n (\\overline{x_i} - \\overline{\\mu})^{T} \\Sigma^{-1} (\\overline{x_i} - \\overline{\\mu}) \\big) = \\frac{\\partial}{\\partial \\Sigma^{-1}}\\big( \\frac{n}{2} \\ln |\\Sigma^{-1}| -\\frac{1}{2} Tr\\big[\\Sigma^{-1} \\sum_{i=1}^n (\\overline{x_i} - \\overline{\\mu}) (\\overline{x_i} - \\overline{\\mu})^{T}\\big] \\big) = $ $$ \\frac{n}{2} \\Sigma - \\frac{1}{2} \\sum_{i=1}^n (\\overline{x_i} - \\overline{\\mu}) (\\overline{x_i} - \\overline{\\mu})^{T} = 0 \\rightarrow \\hat{\\Sigma}_{MLE} = \\frac{1}{n} \\sum_{i=1}^n (\\overline{x_i} - \\hat{\\overline{\\mu}}_{MLE}) (\\overline{x_i} - \\hat{\\overline{\\mu}}_{MLE})^{T} $$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Exercise 3 - MLE for Geometric Distribution\n", "---\n", "Given $\\{x_i\\}_{i=1}^n$ i.i.d samples of $X \\sim \\textit{Geom}(\\theta)$, what is the MLE?\n", "\n", "Assume:\n", "* $f(x;\\theta) = Pr(X=x) = \\theta(1-\\theta)^{x-1}$\n", "* $ 0 < \\theta < 1$\n", "* $\\mathbb{E}(X) = \\frac{1}{\\theta}$\n", "* $Var(X) = \\frac{1 - \\theta}{\\theta^2}$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Solution 3\n", "---\n", "* $L(x_1, x_2, ..., x_n; \\theta) = \\prod_{i=1}^n f(x_i;\\theta) = \\theta^n (1 -\\theta)^{\\sum_{i=1}^n (x_i - 1)}$\n", "* $l(\\theta) = \\ln L(x_1, x_2, ..., x_n; \\theta) = \\sum_{i=1}^n \\ln f(x_i;\\theta) = n\\ln(\\theta) +\\ln (1-\\theta) \\sum_{i=1}^n (x_i - 1) $\n", "* $\\theta_{MLE} = \\underset{0 < \\theta < 1}{\\mathrm{argmax}} l(\\theta)$\n", "* First derivative: $$\\frac{\\partial l(\\theta)}{\\partial \\theta} = \\frac{n}{\\theta} - \\frac{1}{1 - \\theta} \\sum_{i=1}^n (x_i - 1)$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* Second derivative: $$\\frac{\\partial^2
l(\\theta)}{\\partial \\theta^2} = -\\frac{n}{\\theta^2} - (\\frac{1}{1 - \\theta})^2 \\sum_{i=1}^n (x_i - 1)$$\n", "* $$ \\frac{n}{\\theta} - \\frac{1}{1 - \\theta} \\sum_{i=1}^n (x_i - 1) = 0 \\rightarrow n(1 - \\theta) = \\theta \\sum_{i=1}^n (x_i - 1) \\rightarrow \\theta_{MLE} = \\frac{n}{\\sum_{i=1}^n x_i}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "* Plug $\\theta_{MLE}$ into the second derivative and keep in mind that $0 < \\theta_{MLE} < 1$:\n", " * $\\sum_{i=1}^n (x_i - 1) = n(\\frac{1}{\\theta_{MLE}} - 1) = n \\frac{1 - \\theta_{MLE}}{\\theta_{MLE}}$\n", " * $$ -\\frac{n}{\\theta_{MLE}^2} - (\\frac{1}{1 - \\theta_{MLE}})^2 \\sum_{i=1}^n (x_i - 1) = -\\frac{n}{\\theta_{MLE}^2} - (\\frac{1}{1 - \\theta_{MLE}})^2 n \\frac{1 - \\theta_{MLE}}{\\theta_{MLE}} = -\\frac{n(1 - \\theta_{MLE}) + n\\theta_{MLE}}{\\theta_{MLE}^2(1-\\theta_{MLE})} = - \\frac{n}{\\theta_{MLE}^2(1-\\theta_{MLE})} < 0$$\n", " * Since the second derivative is negative for $0 < \\theta_{MLE} < 1$, the stationary point is a maximum." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Recommended Videos\n", "---\n", "#### Warning!\n", "* These videos do not replace the lectures and tutorials.\n", "* Please use these to get a better understanding of the material, and not as an alternative to the written material.\n", "\n", "#### Video By Subject\n", "* Basic Probability - Math Antics - Basic Probability\n", "* Probability for ML - Machine Learning 1/5: Probability\n", "* Maximum Likelihood Estimation (MLE)\n", " * Simple Version (6 min) - StatQuest\n", " * Complete Lecture (50 min) - Cornell CS4780" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "skip" } }, "source": [ "## Credits\n", "---\n", "* Icons from Icons8.com - https://icons8.com\n", "* Datasets from Kaggle - https://www.kaggle.com/" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py",
"mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 2 }