{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quantitative exploratory data analysis\n", "> A Summary of lecture \"Statistical Thinking in Python (Part 1)\", via datacamp\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Data_Science, Statistics]\n", "- image: images/petal-ecdf.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "sns.set()\n", "\n", "df = pd.read_csv('./dataset/iris.csv')\n", "renamed_columns = ['sepal length (cm)', 'sepal width (cm)', \n", " 'petal length (cm)', 'petal width (cm)', 'species']\n", "df.columns = renamed_columns\n", "versicolor_petal_length = df[df['species'] == 'Versicolor']['petal length (cm)']\n", "setosa_petal_length = df[df['species'] == 'Setosa']['petal length (cm)']\n", "virginica_petal_length = df[df['species'] == 'Virginica']['petal length (cm)']\n", "versicolor_petal_width = df[df['species'] == 'Versicolor']['petal width (cm)']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to summary statistics: The sample mean and median\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing means\n", "The mean of all measurements gives an indication of the typical magnitude of a measurement. It is computed using ```np.mean()```." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I. versicolor: 4.26 cm\n" ] } ], "source": [ "# compute the mean: mean_length_vers\n", "mean_length_vers = np.mean(versicolor_petal_length)\n", "\n", "# Print the result with some nice formatting\n", "print('I. 
versicolor:', mean_length_vers, 'cm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Percentiles, outliers, and box plots\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Compute percentiles\n", "In this exercise, you will compute the percentiles of petal length of Iris versicolor.\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3.3 4. 4.35 4.6 4.9775]\n" ] } ], "source": [ "# Specify array of percentiles: percentiles\n", "percentiles = np.array([2.5, 25, 50, 75, 97.5])\n", "\n", "# Compute percentiles: ptiles_vers\n", "ptiles_vers = np.percentile(versicolor_petal_length, percentiles)\n", "\n", "# Print the result\n", "print(ptiles_vers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing percentiles to ECDF\n", "To see how the percentiles relate to the ECDF, you will plot the percentiles of Iris versicolor petal lengths you calculated in the last exercise on the ECDF plot you generated in chapter 1. \n", "\n", " Note that to ensure the Y-axis of the ECDF plot remains between 0 and 1, you will need to rescale the percentiles array accordingly - in this case, dividing it by 100." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def ecdf(data):\n", " \"\"\"Compute ECDF for a one-dimensional array of measurements.\"\"\"\n", " # Number of data points: n\n", " n = len(data)\n", "\n", " # x-data for the ECDF: x\n", " x = np.sort(data)\n", "\n", " # y-data for the ECDF: y\n", " y = np.arange(1, n + 1) / n\n", "\n", " return x, y" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "x_vers, y_vers = ecdf(versicolor_petal_length)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot the ECDF\n", "_ = plt.plot(x_vers, y_vers, '.')\n", "_ = plt.xlabel('petal length (cm)')\n", "_ = plt.ylabel('ECDF')\n", "\n", "# Overlay percentiles as red diamonds\n", "_ = plt.plot(ptiles_vers, percentiles/100, marker='D', color='red', linestyle='none')\n", "plt.savefig('../images/petal-ecdf.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Box-and-whisker plot\n", "Making a box plot for the petal lengths is unnecessary because the iris data set is not too large and the bee swarm plot works fine. However, it is always good to get some practice. Make a box plot of the iris petal lengths. You have a pandas DataFrame, df, which contains the petal length data, in your namespace. Inspect the data frame df in the IPython shell using ```df.head()``` to make sure you know what the pertinent columns are.\n", "\n", "For your reference, the code used to produce the box plot in the video is provided below:\n", "```python\n", "_ = sns.boxplot(x='east_west', y='dem_share', data=df_all_states)\n", "_ = plt.xlabel('region')\n", "_ = plt.ylabel('percent of vote for Obama')\n", "```\n", "In the IPython Shell, you can use ```sns.boxplot?``` or ```help(sns.boxplot)``` for more details on how to make box plots using seaborn." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Create box plot with Seaborn's default settings\n", "_ = sns.boxplot(x='species', y='petal length (cm)', data=df)\n", "\n", "# Label the axes\n", "_ = plt.xlabel('species')\n", "_ = plt.ylabel('petal length (cm)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variance and standard deviation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ variance = \\dfrac{1}{n}\\sum^{n}_{i=1}(x_i - \\bar{x})^2 $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the variance\n", "It is important to have some understanding of what commonly used functions are doing under the hood. Though you may already know how to compute variances, this is a beginner course that does not assume so. In this exercise, we will explicitly compute the variance of the petal length of Iris versicolor using the equations discussed in the videos. We will then use ```np.var()``` to compute it and check that the two results agree." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.21640000000000004 0.21640000000000004\n" ] } ], "source": [ "# Array of differences to mean: differences\n", "differences = np.array(versicolor_petal_length - np.mean(versicolor_petal_length))\n", "\n", "# Square the differences: diff_sq\n", "diff_sq = differences ** 2\n", "\n", "# Compute the mean square differences: variance_explicit\n", "variance_explicit = np.mean(diff_sq)\n", "\n", "# Compute the variance of the original data using NumPy: variance_np\n", "variance_np = np.var(versicolor_petal_length)\n", "\n", "# Print the results\n", "print(variance_explicit, variance_np)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The standard deviation and the variance\n", "As mentioned in the video, the standard deviation is the square root of the variance. 
You will see this for yourself by computing the standard deviation using ```np.std()``` and comparing it to what you get by computing the variance with ```np.var()``` and then computing the square root." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.4651881339845204\n", "0.4651881339845204\n" ] } ], "source": [ "# Compute the variance: variance\n", "variance = np.var(versicolor_petal_length)\n", "\n", "# Print the square root of the variance\n", "print(np.sqrt(variance))\n", "\n", "# Print the standard deviation\n", "print(np.std(versicolor_petal_length))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Covariance and the Pearson correlation coefficient\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ covariance = \\dfrac{1}{n}\\sum^{n}_{i=1}(x_i - \\bar{x})(y_i - \\bar{y})$$\n", "$$ \\begin{align} \\rho &= \\text{Pearson correlation} = \\dfrac{\\text{covariance}}{(\\text{std of x})(\\text{std of y})} \\\\ &= \\dfrac{\\text{variability due to codependence}}{\\text{independent variability}} \\end{align}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scatter plots\n", "When you made bee swarm plots, box plots, and ECDF plots in previous exercises, you compared the petal lengths of different species of iris. But what if you want to compare two properties of a single species? This is exactly what we will do in this exercise. We will make a scatter plot of the petal length and width measurements of Anderson's Iris versicolor flowers. 
If the flower scales (that is, it preserves its proportion as it grows), we would expect the length and width to be correlated.\n", "\n", "For your reference, the code used to produce the scatter plot in the video is provided below:\n", "```python\n", "_ = plt.plot(total_votes/1000, dem_share, marker='.', linestyle='none')\n", "_ = plt.xlabel('total votes (thousands)')\n", "_ = plt.ylabel('percent of vote for Obama')\n", "```" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAEMCAYAAAAxoErWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3df1wUdf4H8NfuIiZKgLgisFImydWJQqLJpWaLv05UyCI1PO6BCqkpPlJRz86vv+g6sLMzM607UyvSsh8WD0jxR1xi5q/D4Ewg/HGeK6igkLie4O58/zA2OZ3dWWBnFvb1fDx8PFh2PjPvee+4r52ZZUYlCIIAIiKie1ArXQARETkvhgQREYliSBARkSiGBBERiWJIEBGRKIYEERGJYkgQEZEoN6ULaGlXr16H2Wz/n374+nZCVVWtAypqO9gj69gf29gj65Toj1qtgo9PR9Hn21xImM1Ck0KiYSxZxx5Zx/7Yxh5Z52z94eEmIiISxZAgIiJRDAkiIhIlS0ikp6dDr9cjJCQEpaWl95ymqqoKycnJGDt2LEaNGoVly5bh1q1bcpRHREQiZAmJqKgoZGZmIjAwUHSaDRs2oGfPnsjKykJWVhZOnDiB3NxcOcojIiIRsny7KSIiwuY0KpUK169fh9lsRl1dHerr6+Hn5ydDdUREjZUZalBy7ipCgnwQHOildDmKcppzEjNnzsSZM2cwaNAgy79+/fopXRYRuZgyQw1WbS3AZ9+cxqqtBSgz1ChdkqKc5u8kdu7ciZCQEGzZsgXXr19HUlISdu7ciVGjRtk1H1/fTk2uQav1bPJYV8EeWcf+2ObsPcorLIfJZIYgACaTGeerjIgM08m2fGfrj9OExAcffIA//elPUKvV8PT0hF6vx6FDh+wOiaqq2ib9MYpW64nLl6/ZPc6VsEfWsT+2tYYe6Xw9oNGoAZMZGo0aOl8P2WpWoj9qtcrqh2unCQmdTodvvvkGffr0QV1dHQ4ePIjhw4crXRYRuZjgQC+kTgrnOYmfyXJOIi0tDUOGDEFFRQUSExMRHR0NAEhKSkJRUREAYPHixTh27BjGjh2L2NhYPPjgg3juuefkKI+IqJHgQC9ERz7o8gEBACpBEJzrQiHNxMNNjsMeWcf+2MYeWeeMh5uc5ttNRETkfBgSREQkiiFBRESiGBJERCSKIUFERKIYEkREJIohQUREohgSREQkiiFBRESiGBJERCSKIUFERKIYEkREJIohQUREohgSREQkiiFBRESiGBJERCRKltuXpqenY9euXTAYDMjKykKvXr3ummbBggUoKSmxPC4pKcG6desQFRUlR4lERHQPsoREVFQUEhISEB8fLzpNRkaG5efi4mL8/ve/x+DBg+Uoj4hIcWWGGuQVlkPn6+FUt02VJSQiIiLsmv6TTz7B2LFj4e7u7qCKiIicR5mhBqu2FsB
kMkOjUSN1UrjTBIUsIWGPuro6ZGVlYfPmzU0ab+1erbZotZ5NHusq2CPr2B/b2KO75RWWw2QywywAMJlxvsqIyDCd0mUBcMKQ2LNnDwICAvDII480aXxVVS3MZsHucbxBu23skXXsj23s0b3pfD2g0aiBn/ckdL4esvVJrVZZ/XDtdCHx6aef4plnnlG6DCIi2QQHeiF1UjjOVxmd7pyEU30FtqKiAseOHcOYMWOULoWISFbBgV6Ii+rlVAEByBQSaWlpGDJkCCoqKpCYmIjo6GgAQFJSEoqKiizTff7553jqqafg7e0tR1lERGSDShAE+w/gOzGek3Ac9sg69sc29sg6Jfpj65yEUx1uIiIi58KQICIiUQwJIiISxZAgIiJRDAkiIhLFkCAiIlEMCSIiEsWQICIiUQwJIiISxZAgIiJRDAkiIhLFkCAiIlEMCSIiEsWQICIiUQwJIiISxZAgIiJRsoREeno69Ho9QkJCUFpaKjpdTk4Oxo4dizFjxmDs2LGorKyUozwiIhLhJsdCoqKikJCQgPj4eNFpioqK8Oabb2LLli3QarW4du0a3N3d5SiPiGRQZqhBXmE5dL4est3HucxQg5JzVxES5ON0945uSY5cT1lCIiIiwuY0mzdvxpQpU6DVagEAnp6eji6LiGRSZqjBqq0FMJnM0GjUSJ0U7vA37YZl3jKZ4SbTMpXg6PWUJSSkOHXqFHQ6HeLj42E0GjF8+HDMmDEDKpXKrvlYu1erLVotg8kW9sg69ufe8grLYTKZYRYAmMw4X2VEZJhOlmUKAmCSaZktwd5tyNHr6TQhYTKZUFJSgk2bNqGurg7Tpk1DQEAAYmNj7ZpPVVUtzGbB7uXzBu22sUfWsT/idL4e0GjUwM97EjpfD4f3SollNldTtqHmrqdarbL64dppQiIgIACjRo2Cu7s73N3dERUVhcLCQrtDgoicT3CgF1InheN8lVG2cxINy2zr5yQcvZ5OExJjxozBP/7xD8TExODWrVv47rvvMHLkSKXLIqIWEhzohcgwnayf5oMDvdpsONzJkespy1dg09LSMGTIEFRUVCAxMRHR0dEAgKSkJBQVFQEAoqOj4evri9GjRyM2NhbBwcF49tln5SiPiIhEqARBsP8AvhPjOQnHYY+sY39sY4+sU6I/ts5J8C+uiYhIFEOCiIhEMSSIiEgUQ4KIiEQxJIiISBRDgoiIRDEkiIhIFEOCiIhEMSSIiEgUQ4KIiEQxJIiISBRDgoiIREm6VHh1dTXeffddnDx5EkajsdFzmZmZDimMiIiUJykk5s2bh7q6Ovz2t79Fhw4dHF0TERE5CUkhUVBQgO+++w7u7u6OroeIiJyIpHMSISEhqKioaNaC0tPTodfrERISgtLS0ntOs3btWkRGRiImJgYxMTFYvnx5s5ZJRETNI7on8cknn1h+HjhwIKZNm4bx48ejS5cujaaTeve4qKgoJCQkID4+3up0sbGxWLhwoaR5EhGRY4mGxBdffNHosZ+fHw4cONDodyqVSnJIRERENKE8IhJTZqhBybmrCAnyaRX3cf7LtgL8aKjBw4FemDcxXPK45qynEj3KO27AsZJL6BfSFUPDAmVZpiOJhsT7778vZx0W2dnZyM/Ph1arxezZsxEeLn1jInIVZYYarNpagFsmM9w0aqROCnfqoPjLtgKcOHsVAHDi7FX8ZVuBpKBoznoq0aO84wa8t7MEAHDizO31be1BIenEdWxsLHbs2HHX78ePH4/PPvusxYqZOHEipk+fjnbt2uHAgQOYOXMmcnJy4OPjI3ke1u7VaotW69nksa6CPbJOrv7kFZbDZDJDEACTyYzzVUZEhulkWXZT/GioueuxlF41Zz2V6FHR6St3PY4b/iu75uFs/8ckhcS///3vu34nCALOnz/fosVotVrLz0888QT8/f3x448/YsCAAZLnUVVVC7NZaMKyeYN2W9gj6+Tsj87XAxqNGjCZodGoofP1cOrX5uFAL8u
eRMNjKfU2Zz2V6FHoQ51RUHq50WN7lqnE/zG1WmX1w7XVkFiwYAEAoL6+3vJzA4PBgODg4BYo8RcXL16En58fAODkyZMwGAzo0aNHiy6DqC0IDvRC6qTwVnNOYt7E8Cadk2jOeirRo4ZDSy5xTgIAgoKC7vkzADz22GMYNWqU5AWlpaUhNzcXlZWVSExMhLe3N7Kzs5GUlISUlBSEhoZi9erVOHHiBNRqNdq1a4eMjIxGexdE9IvgQC+nD4c7zZsY3qRPys1ZTyV6NDQssE2EQwOVIAg2j83s378fgwcPlqOeZuPhJsdhj6xjf2xjj6xrVYebDh48+MtEbm6NHt8pMjKyGeUREZEzEw2Jl19+2fKzSqXCxYsXAQDe3t6orq4GcPtvJ/bu3evgEomISCmiIbFv3z7Lzxs2bEB1dTXmzJmDDh064MaNG3jjjTfg7e0tS5FERKQMSddu2rx5M+bNm2e5AmyHDh0wd+5cbNq0yaHFERGRsiSFhIeHBwoLCxv9rqioiJcNJyJq4yT9MV1KSgqmTZsGvV6Pbt26oaKiAl9//TX+7//+z9H1ERGRgiRflqN3797YtWsXLl26hB49emDGjBkt/sd0RETkXCSFBAAEBwczFIiIXIxoSCxZsgQrV64EAKSmpkKlUt1zuoyMDMdURkREihMNCZ3ul6slPvDAA7IUQ0REzkU0JF544QXLz7NmzZKlGCIici6SvgI7a9YsbNmyBSdPnnR0PURE5EQknbh+8skncfToUWzZsgW1tbV47LHHMGDAAERERKBPnz6OrpGIiBQiKSTi4uIQFxcH4PZ9JD7++GOsW7cORqORexdERG2YpJA4deoUjhw5giNHjuDYsWPo0qULJkyYYNcd44iIqPWRFBLR0dEICgpCcnIyVq5cCQ8PD0fXRURETkDSiev09HQMHDgQ7777LsaPH48lS5bgyy+/RHl5ueQFpaenQ6/XIyQkBKWlpVanPX36NPr27Yv09HTJ8yciopYnaU8iJiYGMTExAIDKykq8//77WL58uV3nJKKiopCQkID4+Hir05lMJixduhTDhg2TNF8iInIcSSHxww8/4PDhwzh8+DCOHTuG9u3bY+jQoXadk4iIiJA03TvvvIOhQ4fCaDTCaDRKnj9RSygz1KDk3FWEBPm0qvtHyyXvuAHHSi6hX0hX2e7j3JzXpKn1cjv4haSQmDVrFgYMGAC9Xo9FixYhKCjIIcUUFxcjPz8f7733Ht566y2HLINITJmhBqu2FuCWyQw3jRqpk8Jd/g3iTnnHDXhvZwkA4MSZqwDg8KBozmvS1Hq5HTQmKSTuvEudo9TX12PJkiV49dVXodFomjwfazf0tkWr9WzyWFfRlnuUV1gOk8kMQQBMJjPOVxkRGaazPfAObbk/Raev3PU4bviv7J6PPT1qzmvS1HpbYjtoDmfbhiRfBdbRLl++jHPnziE5ORkA8NNPP0EQBNTW1louNChFVVUtzGbB7uVrtZ64fPma3eNcSVvvkc7XAxqNGjCZodGoofP1sGt923p/Qh/qjILSy40e27u+9vaoOa9JU+tt7nbQHEpsQ2q1yuqHa6cJiYCAABw6dMjyeO3atTAajVi4cKGCVZErCQ70QuqkcB6LFtFwqEbOcxLNeU2aWi+3g8ZkC4m0tDTk5uaisrISiYmJ8Pb2RnZ2NpKSkpCSkoLQ0FC5SiESFRzo5fJvCtYMDQuU7YR1g+a8Jk2tl9vBL1SCINh/bMaJ8XCT47BH1rE/trFH1rWqw01r1qyRtIA5c+bYXxUREbUKoiFRUVEhZx1EROSEREPi1VdflbMOIiJyQnaduK6trcXVq1cb/a579+4tWhARETkPSSFRVlaG+fPno7i4GCqVCoIgQKVSAQDvJ0FE1IZJugrs8uXL8fjjj+Pw4cPo1KkTjhw5ggkTJuDPf/6zo+sjIiIFSQqJ4uJizJ8/H/fffz8EQYCnpycWLFgg+RtQRETUOkkKifbt2+PWrVsAAB8
fH1y4cAFmsxnV1dUOLY6IiJQl6ZxEv3798NVXX2H8+PEYOXIkkpKS4O7ujoEDBzq6PiIiUpCkkLjzsNLcuXPx8MMP4/r163j66acdVhgRESlP0uGmjRs3/jJArUZMTAyef/55bNu2zWGFERGR8iSFxLp16+75+/Xr17doMURE5FysHm46ePAgAMBsNuO7777DndcCPH/+PDp27OjY6oiISFFWQ+Lll18GANy8eROLFy+2/F6lUqFLly744x//6NjqiIhIUVZDouG2pQsWLEBGRoYsBRERkfOQdE4iIyMD9fX1OHr0KHJycgAARqMRRqPRocUREZGyJH0FtqSkBDNmzIC7uzsuXryI0aNH48iRI/j888/x17/+1eb49PR07Nq1CwaDAVlZWejVq9dd03z66afYvHkz1Go1zGYz4uLikJCQYP8aERFRi5G0J7Fs2TKkpKRg586dcHO7nSv9+/fHsWPHJC0kKioKmZmZCAwUv43gyJEj8eWXX+KLL77A1q1bsWnTJhQXF0uaPzmvMkMNsg+eRZmhRulSHGr712VIfnUPtn9dZvfYvOMG/OWjAuQdN8gyrjljm/N6lhlqsH1vqd1jXWUbclaSrwIbExMDAJarv3p4eODmzZuSFhIREWFzmk6dfrl93n//+1/U19dblkWtU5mhBqu2FuCWyQw3jRqpk8Lb5H2Dt39dhq8OnQMAlFdeBwDEPRUsaWzecQPe21kCADhx5vZl+KXck7mp45oztjmvZ8NYk8kMjR1jXWUbcmaSQiIwMBD/+te/EBoaavldYWEhgoKCWrSYvXv3YvXq1Th37hzmzZuHkJAQu+dh7V6ttmi1nk0e6yrs6VFeYTlMJjMEATCZzDhfZURkmM6B1Snj+Kmqux7PfC5c0tii01fuehw3/FcOG9ecsc15PRvGmgUAdox1lW3oTs72PiQpJObMmYMXXngBEydORH19Pd5++21s27YNK1eubNFioqKiEBUVhQsXLuDFF1/EkCFD8NBDD9k1j6qqWpjNgu0J/wdv0G6bvT3S+XpAo1EDP3961Pl6tMkeh/X0texBNDyWup6hD3VGQenlRo+ljG3quOaMbc7r2dSxrrINNVDifUitVln9cK1ZtmzZMlsz6dGjBwYOHIhjx45ZLheempoq6TDSnbZs2YIxY8bA19fX6nSenp744YcfUF1djfBwaZ/IGty4UQfB/oxAx47tYTTW2T/Qhdjbo87334dHHvCB1rsDxg3q0WYPE/y6R2fU1Ztw/eYtDA71l3yoCQAe7HY/vDq545bJjN8OfEDyIaOmjmvO2Oa8ng1jHwz0xm8HBEke6yrbUAMl3odUKhU8PNzFnxeEprylNo1er8eGDRvu+e2mU6dOoWfPngCAK1euYNKkSViyZAkGDRpk1zK4J+E47JF17I9t7JF1zrgnIelwU11dHdavX4/s7GxcunQJXbt2xejRozFjxgy0b9/e5vi0tDTk5uaisrISiYmJ8Pb2RnZ2NpKSkpCSkoLQ0FB89NFHOHDgANzc3CAIAiZPnmx3QBARUcuStCexePFinDlzBtOnT0dgYCAMBgPeeecdBAUF4dVXX5WjTsm4J+E47JF17I9t7JF1rXZPYu/evdi9ezfuv/9+AEBwcDD69u2LESNGtEyVRETklCT9MV2XLl1w48aNRr+7efMmtFqtQ4oiIiLnIGlPIiYmBtOmTcPvfvc7+Pn5oaKiApmZmYiJibFcThwAIiMjHVYoERHJT9I5Cb1eb3tGKhX27t3bIkU1B89JOA57ZB37Yxt7ZF2rPSfRcMlwIiJyLZLOSRARkWtiSBARkSiGBBERiWJIEBGRKIYEERGJYkgQEZEohgQREYliSBARkSiGBBERiWJIEBGRKNlCIj09HXq9HiEhISgtLb3nNOvWrUN0dDTGjRuH8ePHY//+/XKVR0RE9yDp2k0tISoqCgkJCYiPjxedpk+fPpgyZQo6dOiA4uJiTJ48Gfn5+bjvvvvkKpOIiO4
g255EREQE/P39rU4zePBgdOjQAQAQEhICQRBQXV0tR3lEAIAyQw2yD55FmaHG7nHb95baPa45mlpra1tmc7S2ep2RbHsS9tqxYweCgoLQrVs3pUshF1FmqMGqrQW4ZTLDTaNG6qRwBAd6SR5nMpmhsWOcErW2tmU2R2ur11k5ZUgcPnwYa9aswbvvvmv3WGvXRbdFq/Vs8lhX0ZZ7lFdYDpPJDEEATCYzzlcZERmmkzzOLACwY5wStTrDMuXahpToUUtwtv9jThcSBQUFSE1NxVtvvYWHHnrI7vG86ZDjtPUe6Xw9oNGogZ/3CHS+HpLWt6njlKhV6WXKuQ0p0aPmcsabDkm6M11L0uv12LBhA3r16nXXc4WFhUhJScGaNWvQt2/fJs2fIeE4rtCjMkMNSs5dRUiQj12HJsoMNThfZYTO10O2QxpNrVXJZcq9DSnRo+Zw6ZBIS0tDbm4uKisr4ePjA29vb2RnZyMpKQkpKSkIDQ3FM888A4PBAD8/P8u4jIwMhISESF4OQ8Jx2CPr2B/b2CPrXDok5MKQcBz2yDr2xzb2yDpnDAn+xTUREYliSBARkSiGBBERiWJIEBGRKIYEERGJYkgQEZEohgQREYliSBARkSiGBBERiWJIEBGRKIYEERGJYkgQEZEohgQREYliSBARkSiGBBERiWJIEBGRKFlCIj09HXq9HiEhISgtLb3nNPn5+Rg/fjx69+6N9PR0OcoiIiIbZAmJqKgoZGZmIjAwUHSa7t27Iy0tDVOnTpWjJCIikkCWkIiIiIC/v7/VaR544AE8+uijcHNzk6OkRsoMNdi+txRlhhrZl00tr8xQg+yDZ5v0ejZnLFFbJP87spMpM9Rg1dYCmExmaDRqpE4KR3Cgl9JlURM1vJ63TGa42fl6NmcsUVvV5kLC2g297yWvsBwmkxlmAYDJjPNVRkSG6RxTXBug1XoqXYJVDa+nIAAmO1/P5oxt4Oz9cQbskXXO1p82FxJVVbUwmwXJ0+t8PaDRqIGf9yR0vh64fPmaAytsvbRaT6fvTXNez+ZuC62hP0pjj6xToj9qtcrqh+s2FxL2Cg70QuqkcJyvMkLn68HDC61cw+tZcu4qQoJ87Ho9mzOWqK1SCYIg/WN3E6WlpSE3NxeVlZXw8fGBt7c3srOzkZSUhJSUFISGhuLo0aOYO3cuamtrIQgCPD098corr2Dw4MF2LcvePYkG/IRjG3tkHftjG3tknTPuScgSEnJiSDgOe2Qd+2Mbe2SdM4YE/+KaiIhEMSSIiEgUQ4KIiEQxJIiISBRDgoiIRDEkiIhIFEOCiIhEMSSIiEgUQ4KIiEQxJIiISBRDgoiIRDEkiIhIFEOCiIhEMSSIiEgUQ4KIiEQxJIiISJQsIZGeng69Xo+QkBCUlpbecxqTyYTly5dj2LBhGD58OLZv3y5HaUREZIUs97iOiopCQkIC4uPjRafJysrCuXPnkJubi+rqasTGxiIyMhI6nU6OEluVvOMGHCu5hH4hXTE0LFC2ZRadvoLQhzrLtswyQ02rud90maEGeYXlTbpPemtaT3I9soRERESEzWlycnIQFxcHtVqNzp07Y9iwYdi5cyemTZsmQ4WtR95xA97bWQIAOHHmKgA4/E37zmUWlF6WZZllhhqs2lqAWyYz3DRqpE4Kd9o30IZaTSYzNHbW2prWk1yTLCEhRXl5OQICAiyP/f39UVFRYfd8rN2r1Rat1rPJY+VSdPrKXY/jhv+qzS0zr7AcJpMZggCYTGacrzIiMsw59yobajULAOystTWtZ0tpDf/PlORs/XGakGgpVVW1MJsFu8e1lhu0hz7U2fJpvuGxo+tWYpk6Xw9oNGrg50/nOl8Pp319mlNra1rPltBa/p8pRYn+qNUqqx+unSYk/P39ceHCBfTp0wfA3XsWdFvDYR45z0k0LEPOcxLBgV5InRTeKo7VN9R6vspo9zmJ1rSe5JqcJiRGjRqF7du3Y8SIEaiursaePXuQmZm
pdFlOaWhYoGwnj+9cZtzwX8n6KSc40KvVvGkGB3ohMkzXpP60pvUk1yPLV2DT0tIwZMgQVFRUIDExEdHR0QCApKQkFBUVAQBiYmKg0+kwYsQIPPfcc3jxxRfRvXt3OcojIiIRKkEQ7D+A78Ta+jkJJbFH1rE/trFH1jnjOQn+xTUREYliSBARkSiGBBERiXKabze1FLVapchYV8EeWcf+2MYeWSd3f2wtr82duCYiopbDw01ERCSKIUFERKIYEkREJIohQUREohgSREQkiiFBRESiGBJERCSKIUFERKIYEkREJKrNXZbDmpkzZ+L8+fNQq9Xw8PDAkiVL8MgjjzSaxmQyIS0tDfv374dKpUJycjLi4uIUqlh+Unq0du1afPjhh+jatSsA4LHHHsPSpUuVKFcxb775JtauXYusrCz06tWr0XM3btzAH/7wB5w4cQIajQYLFy7EU089pVClyrHWo0WLFuHbb7+Fj48PgNs3HZsxY4YSZcpOr9fD3d0d7du3BwDMnz8fgwcPbjSNM21DLhUS6enp8PS8fZPxPXv2YPHixfj8888bTZOVlYVz584hNzcX1dXViI2NRWRkJHS6tn1z+gZSegQAsbGxWLhwodzlOYUTJ07g+PHjorfX3bhxIzp27Ijdu3fj7NmziI+PR25uLjp27Chzpcqx1SMASE5OxuTJk2Wsynm88cYbdwXnnZxpG3Kpw00Nb34AUFtbC5Xq7gtb5eTkIC4uDmq1Gp07d8awYcOwc+dOOctUlJQeubK6ujqsWLECS5cuFe3NV199hYkTJwIAHnzwQfTu3RvffPONnGUqSkqPyDpn2oZcak8CAF5++WUcOHAAgiDg73//+13Pl5eXN/r04+/vj4qKCjlLVJytHgFAdnY28vPzodVqMXv2bISHh8tcpTLWrFmDcePGWb217oULFxAY+Ms9yF1tG5LSIwDYtGkTPvroI3Tv3h3z5s1Dz549ZapQefPnz4cgCOjXrx/mzp2L+++/v9HzzrQNudSeBAC88soryMvLw0svvYSMjAyly3FKtno0ceJE7N27F1lZWZg6dSpmzpyJq1evKlCpvAoKClBUVITnn39e6VKcltQevfTSS9i9ezeysrIwYsQITJs2DSaTSaYqlZWZmYkvv/wSn376KQRBwIoVK5QuySqXC4kGsbGxOHTo0F1vbv7+/rhw4YLlcXl5Obp16yZ3eU5BrEdarRbt2rUDADzxxBPw9/fHjz/+qESJsjpy5AhOnz6NqKgo6PV6VFRUYOrUqcjPz280XUBAAAwGg+WxK21DUnvk5+cHtfr2209sbCyMRqPL7G35+/sDANzd3fH888/jn//8513TONM25DIhcf36dZSXl1se79u3D15eXvD29m403ahRo7B9+3aYzWZcuXIFe/bswciRI+UuVxFSe3Tx4kXLzydPnoTBYECPHj1kq1MpycnJyM/Px759+7Bv3z5069YNGzduxKBBgxpNN2rUKHz00UcAgLNnz6KoqOiub6+0VVJ7dOc2tH//fqjVavj5+cldruyMRiOuXbsGABAEATk5OXd9exBwrm3IZc5J3LhxA3PmzMGNGzegVqvh5eWFDRs2QKVSISkpCSkpKQgNDUVMTAy+//57jBgxAgDw4osv2jy22lZI7dHq1Ys6S+0AAAU1SURBVKtx4sQJqNVqtGvXDhkZGdBqtUqXr6iYmBi888478PPzw9SpU7Fo0SIMHz4carUaK1asQKdOnZQuUXF39mjhwoWoqqqCSqVCp06dsH79eri5tf23o6qqKsyePRsmkwlmsxk9e/a0fH3cWbch3pmOiIhEuczhJiIish9DgoiIRDEkiIhIFEOCiIhEMSSIiEgUQ4KoBSxatAivv/76PZ/77LPPMGnSJJkrus1aXURSMCSI7kGv1+Pbb79Vugy7KBlG1HYxJIiISBRDgtosvV6Pt99+G6NHj0b//v3xhz/8ATdv3rQ8//XXXyMmJgYRERGYOHEiiouLAQC
pqam4cOECpk+fjvDwcPztb38DAKSkpOCJJ55Av379EB8f3+TrVZ06dQqJiYkYMGAARo4ciZycHMtzixYtwvLly5GcnIzw8HDExcXh3Llzlufz8/MxcuRI9OvXD8uWLcPkyZOxfft2nDp1CkuXLsXx48cRHh6OiIgIy5iffvpJdH5EtjAkqE3LysrCxo0bsXv3bpw5cwZvvfUWgNs3xVm8eDFWrFiBQ4cOYcKECZg5cybq6uqwatUqBAQEYMOGDSgoKEBSUhIAYMiQIdi1axcOHjyIRx99FPPnz7e7HqPRiClTpmDMmDH49ttvsXr1aixfvrxR4GRnZ2PWrFk4cuQIgoKCLOcUrly5gpSUFMybNw+HDh1Cjx49UFBQAADo2bMnli9fjrCwMBQUFODo0aM250ckBUOC2rT4+Hj4+/vD29sbM2bMQHZ2NgDg448/xoQJE9C3b19oNBo8/fTTaNeuHY4fPy46r2effRadOnWCu7s7Zs+ejeLiYsvF2qTKy8tDYGAgnnnmGbi5ueHXv/41Ro4ciV27dlmmGT58OPr06QM3NzeMGzcOJ0+eBAB88803ePjhhzFixAi4ubkhISEBXbp0sblMsfkRSdH2r6hFLq3hsszA7csvX7p0CcDtm7rs2LEDH3zwgeX5+vp6y/P/y2Qy4fXXX8fOnTtx5coVy2Wur1692uhufrYYDAYUFhY2OhxkMpkwbtw4y+M73/jvu+8+GI1GAMClS5caXS5apVJJuny02PyIpGBIUJt256XPL1y4gK5duwK4HR7Tp0/HjBkzJM0nKysLe/fuxaZNm6DT6XDt2jX0798f9l4f09/fH/3798emTZvsGgfcvo/HnZfYFgSh0T0YeKtQcgQebqI27cMPP0RFRQWqq6stJ7EBIC4uDtu2bcP3338PQRBgNBqRl5eH2tpaALc/ff/nP/+xzOf69etwd3eHj48Pbty4gdWrVzepnqFDh+Ls2bPYsWMH6uvrUV9fj8LCQpw6dcrm2CeffBIlJSXYs2cPbt26hczMTFRWVlqe9/X1xcWLF1FXV9ek2ojuhSFBbdqYMWMwZcoUDBs2DN27d7fsOYSGhmLlypVYsWIF+vfvjxEjRuCzzz6zjEtOTsb69esRERGBjRs3IjY2FgEBARg8eDCio6MRFhbWpHo6deqEjRs3IicnB4MHD8agQYPw2muvSXpj79y5M9asWYNVq1bh8ccfR1lZGXr37m25S+DAgQMRHByMQYMG4fHHH29SfUT/i/eToDZLr9cjLS0Nv/nNb5QuxSHMZjOGDBmC1157DQMHDlS6HGqjuCdB1Irs378fP/30E+rq6rBhwwYAaPJeDZEUPHFN1IocP34c8+fPR11dHYKDg7Fu3Trcd999SpdFbRgPNxERkSgebiIiIlEMCSIiEsWQICIiUQwJIiISxZAgIiJRDAkiIhL1/2Qbv3QWHoR0AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Make a scatter plot\n", "_ = plt.plot(versicolor_petal_length, versicolor_petal_width, marker='.', linestyle='none')\n", "\n", "# Label the axes\n", "_ = plt.xlabel('petal length')\n", "_ = plt.ylabel('petal width')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the covariance\n", "The covariance may be computed using the NumPy function ```np.cov()```. For example, given two sets of data ```x``` and ```y```, ```np.cov(x, y)``` returns a 2D array where entries ```[0,1]``` and ```[1,0]``` are the covariances. Entry ```[0,0]``` is the variance of the data in x, and entry ```[1,1]``` is the variance of the data in y. This 2D output array is called the covariance matrix, since it organizes the variances and covariances. Note that ```np.cov()``` normalizes by n - 1 by default, so its entries differ slightly from the 1/n formulas above and from ```np.var()```." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.22081633 0.07310204]\n", " [0.07310204 0.03910612]]\n", "0.07310204081632653\n" ] } ], "source": [ "# Compute the covariance matrix: covariance_matrix\n", "covariance_matrix = np.cov(versicolor_petal_length, versicolor_petal_width)\n", "\n", "# Print covariance matrix\n", "print(covariance_matrix)\n", "\n", "# Extract covariance of length and width of petals: petal_cov\n", "petal_cov = covariance_matrix[0, 1]\n", "\n", "# Print the length/width covariance\n", "print(petal_cov)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the Pearson correlation coefficient\n", "In this exercise, you will write a function, ```pearson_r(x, y)```, that takes in two arrays and returns the Pearson correlation coefficient. You will then use this function to compute it for the petal lengths and widths of I. versicolor." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.7866680885228169\n" ] } ], "source": [ "def pearson_r(x, y):\n", "    \"\"\"Compute Pearson correlation coefficient between two arrays\n", "    \n", "    Args:\n", "        x: array of measurements\n", "        y: array of measurements, same length as x\n", "    \n", "    Returns:\n", "        r: float\n", "    \"\"\"\n", "    # Compute correlation matrix: corr_mat\n", "    corr_mat = np.corrcoef(x, y)\n", "    \n", "    # Return entry [0, 1]\n", "    return corr_mat[0, 1]\n", "\n", "# Compute Pearson correlation coefficient for I. versicolor: r\n", "r = pearson_r(versicolor_petal_length, versicolor_petal_width)\n", "\n", "# Print the result\n", "print(r)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }