{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Quantitative exploratory data analysis\n", "> A summary of the lecture \"Statistical Thinking in Python (Part 1)\", via DataCamp\n", "\n", "- toc: true \n", "- badges: true\n", "- comments: true\n", "- author: Chanseok Kang\n", "- categories: [Python, Datacamp, Data_Science, Statistics]\n", "- image: images/petal-ecdf.png" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "\n", "sns.set()\n", "\n", "df = pd.read_csv('./dataset/iris.csv')\n", "renamed_columns = ['sepal length (cm)', 'sepal width (cm)', \n", " 'petal length (cm)', 'petal width (cm)', 'species']\n", "df.columns = renamed_columns\n", "versicolor_petal_length = df[df['species'] == 'Versicolor']['petal length (cm)']\n", "setosa_petal_length = df[df['species'] == 'Setosa']['petal length (cm)']\n", "virginica_petal_length = df[df['species'] == 'Virginica']['petal length (cm)']\n", "versicolor_petal_width = df[df['species'] == 'Versicolor']['petal width (cm)']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction to summary statistics: The sample mean and median\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing means\n", "The mean of all measurements gives an indication of the typical magnitude of a measurement. It is computed using ```np.mean()```." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I. versicolor: 4.26 cm\n" ] } ], "source": [ "# Compute the mean: mean_length_vers\n", "mean_length_vers = np.mean(versicolor_petal_length)\n", "\n", "# Print the result with some nice formatting\n", "print('I. 
versicolor:', mean_length_vers, 'cm')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Percentiles, outliers, and box plots\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing percentiles\n", "In this exercise, you will compute percentiles of the petal lengths of Iris versicolor.\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[3.3 4. 4.35 4.6 4.9775]\n" ] } ], "source": [ "# Specify array of percentiles: percentiles\n", "percentiles = np.array([2.5, 25, 50, 75, 97.5])\n", "\n", "# Compute percentiles: ptiles_vers\n", "ptiles_vers = np.percentile(versicolor_petal_length, percentiles)\n", "\n", "# Print the result\n", "print(ptiles_vers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comparing percentiles to ECDF\n", "To see how the percentiles relate to the ECDF, you will plot the percentiles of Iris versicolor petal lengths you calculated in the last exercise on the ECDF plot you generated in chapter 1. \n", "\n", "Note that to keep the y-axis of the ECDF plot between 0 and 1, you will need to rescale the percentiles array accordingly (in this case, by dividing it by 100)." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def ecdf(data):\n", " \"\"\"Compute ECDF for a one-dimensional array of measurements.\"\"\"\n", " # Number of data points: n\n", " n = len(data)\n", "\n", " # x-data for the ECDF: x\n", " x = np.sort(data)\n", "\n", " # y-data for the ECDF: y\n", " y = np.arange(1, n + 1) / n\n", "\n", " return x, y" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "x_vers, y_vers = ecdf(versicolor_petal_length)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYQAAAEJCAYAAACUk1DVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3de1jUZdoH8O9wkEMSKs6AQlami1aQZLuCcdFqBchBPL6b+YatxRqWXLHXellq2lIaHgqFiiurV9fEVqmV1ArxcBkp2IZ1hZlKiGaiwgQ2Ag46zDzvHzoTEzAwML85fj9/7W+eOdx7O3Hze26e55EJIQSIiMjludk6ACIisg8sCEREBIAFgYiIbmJBICIiACwIRER0EwsCEREBYEEgIqKbPGwdQF9cvtwCnc78ZRQBAf3R0NAsQUTOgzkyjfnpHnNkmi3y4+Ymw8CBt3Q57tAFQacTvSoI+teSacyRacxP95gj0+wtP5wyIiIiACwIRER0EwsCEREBsEJBaG5uRlJSEs6fP99h7MSJE5g2bRri4uKwZMkStLW1SR0OERF1QdKC8N1332HWrFk4e/Zsp+MLFy7EsmXLsGfPHgghsH37dinDISIiEyQtCNu3b8fy5cuhUCg6jNXW1qK1tRVjxowBAEybNg3FxcVShkNE5PCqa1X4tPwsqmtVFn9vSf/sdMWKFV2O1dfXQy6XG67lcjnq6uqkDIeIyKFV16qw5sNv0abVwcPdDQtnRWBEsL/F3t9m6xB0Oh1kMpnhWghhdN0TAQH9e/35crlfr1/rKpgj05if7jFHppmbn4OVF6HV6iAEoNXqcL7hKqLGhFgsHpsVhKCgICiVSsP1L7/80unUkikNDc29Wtghl/tBqWwy+3WuhDkyjfnpHnNkzPNQKfwy0tGUmw9NdEyv8hMS4At3dzdAq4O7uxtCAnzNeg83N5nJX6RtVhCCg4Ph5eWFo0ePYuzYsfjkk08QExNjq3CIiCTjeagU/rNnQqZWw3/2TPywfhMODh+DkABfs6Z8RgT7Y+GsCJw6dxmhwwZadLoIsME6hLS0NBw7dgwAsHbtWrz22muIj4/H1atXkZqaau1wiIgk1b4YAIBMrcZd81Px3TvbsebDb81uDo8I9kdi1B0WLwYAIBNC2NdmGmbglJF0mCPTmJ/uMUcdi0F7rR5eeHXqUoz43xQkRt1hlXi6mzLiSmUiIon4ZaR3WgwAwLvtGjL25CJ02EArR9U1FgQiol7qbk1AU24+hI9Pp2NtXt5Qrn1Tkqmf3nLo7a+JiGylJ2sCNNExUBUUdpg2Ej4+8Pj0UyjufcDaYZvEOwQiol44de4y2
tqtCTh17nKnz9MXBf2dgvDxgaqgEJgwwZrh9ggLAhFRL4QOGwgPdze4yQB3dzeTvQB9UdCG3AZVQSE00fb5J/acMiIi6oURwf6Y9chIHD1Vj7Ghim57AZroGDR+c9xK0fUOCwIRUS9U16rw4b4f0abVoepnFULk/e2qQdwbnDIiIuqFnvYQHAkLAhFRL5jTQ3AUnDIiIpdWXavq1d5AUu8rZAssCETksvp6vsCIYH+nKAR6nDIiIpfljH2AvmBBICKX5Yx9gL7glBERuSxz1xI4OxYEInJZzriWoC84ZURELos9BGMsCETksthDMMYpIyJyCr1ZT+CMawn6ggWBiBxeX9YTONtagr7glBEROTz2AiyDBYGIHB57AZbBKSMicnhcT2AZLAhE5PC4nsAyOGVERA6PPQTLYEEgIofHHoJlcMqIiBweewiWwYJARA6PPQTL4JQRETk89hAsgwWBiBweewiWwSkjInJ47CFYBgsCETk89hAsQ9Ipo127diEhIQGxsbEoKCjoMH78+HFMnz4dkydPxrx583DlyhUpwyEiJ8UegmVIVhDq6uqQk5ODrVu3oqioCNu2bUN1dbXRc1asWIGMjAzs3LkTd955J95//32pwiEiJ8YegmVIVhDKysoQGRmJAQMGwNfXF3FxcSguLjZ6jk6nQ0tLCwBArVbD29tbqnCIyInpewij7xiIWY+M5HRRL0nWQ6ivr4dcLjdcKxQKVFZWGj3nhRdewNy5c7Fy5Ur4+Phg+/btUoVDRE6MPQTLkKwg6HQ6yGQyw7UQwui6tbUVS5YswaZNmxAeHo6NGzdi0aJF2LBhQ48/IyCgf6/jk8v9ev1aV8Ecmcb8dM9aOTpYeRHadj2E8w1XETUmxCqf3Rf29h2SrCAEBQWhoqLCcK1UKqFQKAzXVVVV8PLyQnh4OADgL3/5C9avX2/WZzQ0NEOnE2bHJpf7QalsMvt1roQ5Mo356Z41cxQS4At3dzdAq4O7uxtCAnzt/t/HFt8hNzeZyV+kJeshjB8/HuXl5WhsbIRarUZJSQliYmIM47fffjsuXbqEmpoaAMD+/fsRFhYmVThE5ACqa1X4tPwsqmtVZr1Ofzby1JjhZh2fScYku0MIDAxEZmYmUlNTodFoMGPGDISHhyMtLQ0ZGRkICwvDa6+9hueffx5CCAQEBGDlypVShUNEdq4v5yIDPBvZEmRCCPPnXOwEp4ykwxyZxvx0z9wcfVp+Fv8prYEQgJsMmBozHIlRd0gWn6251JQREZE5uJbA9lgQiMgujAj2R8bgRmz6v3nIGNzI6R8bYEEgIrtQX/Q5IhfPw8DLdYhcPA/1RZ/bOiSXw4JARDbneagUoQvmwLvtGgDAu+0aQhfMgeehUhtH5lpYEIjIpjwPlcJ/9kx4XGs1etzjWiv8Z89kUbAiFgQisim/jHTI1OpOx2RqNfwy0q0cketiQSAim2rKzYfw8el0TPj4oCk338oRuS4WBCKyKU10DFQFhWjzMt7tuM3LG6qCQmiiY7p4JVkaCwIR2ZwmOgan8v6FVg8vAECrhxdO5f2LxcDKWBCIyOKqa1Uo3F9l1p5EiimTcPrtzWiWD8HptzdDMWWShBFSZ3imMhFZlH5PIu3NnUfN2ZNIMWUS1FMmQdH9U0kCvEMgIovSn2+s4/nGDocFgYgsinsSOS5OGRGRRenPNz5W04iw4YO4J5EDYUEgIovSn2+s1erwfU0Dzzd2IJwyIiKLYg/BcbEgEJFFsYfguDhlREQWxR6C42JBICKLYg/BcXHKiIgsij0Ex8WCQEQWxR6C4+KUERFZFHsIjosFgYgsij0Ex8UpIyKyKPYQHBcLAhFZFHsIjotTRkRkUewhOC4WBCKyKPYQHBenjIjIothDcFwsCERkUewhOC5OGRGRRbGH4LhYEIjIothDcFycMiIii2IPwXFJWhB27dqFhIQExMbGoqCgoMN4TU0NnnjiCUyePBlPPfUUVCqVlOEQkRWwh+C4J
CsIdXV1yMnJwdatW1FUVIRt27ahurraMC6EQHp6OtLS0rBz506MHj0aGzZskCocIrISfQ/hvpFyzHpkJKeLHIhkPYSysjJERkZiwIABAIC4uDgUFxfjueeeAwAcP34cvr6+iImJAQA888wzuHLlilThEJGVsIfguCQrCPX19ZDL5YZrhUKByspKw/W5c+cwePBgLF68GCdOnMDw4cPx0ksvmfUZAQH9ex2fXO7X69e6CubINOancwcrL0J7s4cArQ7nG64iakyIrcOyS/b2HZKsIOh0OshkMsO1EMLouq2tDf/973+xZcsWhIWFYd26dcjOzkZ2dnaPP6OhoRk6nTA7NrncD0plk9mvcyXMkWnMT9dCAnzh7u4GaHVwd3dDSIAvc9UJW3yH3NxkJn+RlqwgBAUFoaKiwnCtVCqhUCgM13K5HLfffjvCwsIAAElJScjIyJAqHCIyU3WtCqfOXUbosIFmTfmMCPbHwlkRON9wFSEBvpwuciCSNZXHjx+P8vJyNDY2Qq1Wo6SkxNAvAICIiAg0Njbi5MmTAIADBw7gnnvukSocIjJDda0Kaz78Fv8prcGaD79Fda15fwE4ItgfMx/+A4uBg5HsDiEwMBCZmZlITU2FRqPBjBkzEB4ejrS0NGRkZCAsLAxvvfUWli5dCrVajaCgIKxevVqqcIjIDPq1BKLdWgL+cHd+MiFEl5PwU6dOxY4dOwAApaWlRr/h2wP2EKTDHJnm7PnR3yFob/YBFs6KMLsgOHuO+soeewgmp4za14qcnBzLRUVEdk2/lmD0HQO5lsCFmJwy+v1fCRGRa9CvJWjT6lD1s4prCVxEj5vK7YsDETm3znoI5PxM3iFcuXIFe/fuhRACTU1NKCkpMRqPjY2VNDgisg39fkT6HgL3I3INJgvC0KFDsXnzZgDAkCFD8MEHHxjGZDIZCwKRk9L3EI6eqsfYUAWni1yEyYLQvgAQketgD8E1dbsOoaWlBbt370ZVVRW8vb0RGhqK+Ph49OvXzxrxEZENcB2CazLZVP7pp5+QmJiIkpISeHl5AQA++ugjxMfHo7a21ioBEpH18UwD12TyDiE3NxeZmZlISUkxerywsBBr167l2gQiJ6Xfj6g3exmR4zJ5h1BVVdWhGADAzJkzcebMGcmCIiIi6zN5h+Du7t7lGNclEDkv/dYVbVodPHq5dQU5HpN3CPyhT+TcPA+VYtD998DzUKnR41yY5ppM3iFcunQJr776aqdjdXV1kgRERNbheagU/rNnQqZWw3/2TKgKCqGJvrGBJRemuSaTBWH27Nldjj3++OMWD4aILK+zg27aFwMAHYoCm8quyeT21525fv263axB4PbX0mGOTHOU/HTWCxh95jujYtCe8PExulPoC0fJka043PbX169fx6JFi7B3717DYwsWLMCLL76ItrY2y0VJRJLorBfgl5HeaTEAbtwp+GWkWzlKshcmC0Jubi6am5tx//33Gx7LysqCSqVCXl6e5MERUd90tsCsKTcfwsen0+cLHx805eZbOUqyFyYLwsGDB/H6668jICDA8FhgYCBWr16Nffv2SR4cEfVNZwfdaKJjoCoo7FAULDldRI7JZEHw9PSEt7d3h8f79+9vN30EIuqafpO6H85exof7fkR1rQoAOhQFFgMCuikIbm5uaG5u7vB4c3MzewhEDsDUegJ9UdCG3MZiQAC6KQhJSUlYunQprl69anjs6tWrWLp0Kc9CIHIA3W1Sp4mOQeM3x1kMCEA36xDmzJmD5cuX48EHH8TIkSOh0+lw+vRpJCcn49lnn7VWjEQur7O1BD3B9QRkjh6tQ6itrcXx48fh5uaG8PBwKBQKa8TWLa5DkA5zZJo18+Oo+wrxO2SaPa5DMHmHcOHCBQwdOhTBwcEIDg42GistLUVMDG8ziaTGw2rIWkz2ENpPCy1YsMBojGchEFkHD6shazF5h9B+Nunnn3/ucoyIpMMD78laTBaE9ttf/34rbG6NTWQdP
PCerMXklBHvAohsj2cTkLWYvEPQ6XRQqVQQQkCr1Rr+NwBotVqrBEjk6ng2AVmLyYJQVVWFyMhIQxEYN26cYYxTRkTWwR4CWYvJgnDy5ElrxUFEXWAPgazFZA+hr3bt2oWEhATExsaioKCgy+cdPHgQEydOlDIUIofFHgJZi8k7hL6oq6tDTk4O/vOf/6Bfv3547LHHMG7cOIwYMcLoeb/88gtWrVolVRhEDo89BLIWye4QysrKEBkZiQEDBsDX1xdxcXEoLi7u8LylS5fiueeekyoMIofX2ZkGRFKQ7A6hvr4ecrnccK1QKFBZWWn0nM2bN+Puu+/GfffdJ1UYRA6PPQSyFskKgk6nM/pLJCGE0XVVVRVKSkqwadMmXLp0qVefYWqTpu7I5X69fq2rYI5Ms1Z+DlZehLZdD+F8w1VEjQmxymf3Fb9DptlbfiQrCEFBQaioqDBcK5VKo11Si4uLoVQqMX36dGg0GtTX1+Pxxx/H1q1be/wZ3O1UOsyRadbMT0iAL9zd3YCbPYSQAF+H+Lfhd8g0e9zttEfbX/dGXV0dZs2ahY8++gg+Pj547LHH8MorryA8PLzDc8+fP4/U1FQcOHDArM9gQZAOc2Rab/PT23MNevs6W+J3yDR7LAiS3SEEBgYiMzMTqamp0Gg0mDFjBsLDw5GWloaMjAyEhYVJ9dFEdqkv5xqMCPZ3mEJAjkuyggAAycnJSE5ONnrs3Xff7fC8kJAQs+8OiBwNzzUgeyfpwjQi+g3PNSB7J+kdAhH9hnsSkb1jQSCyEq4nIHvHKSMiK+GeRGTvWBCIrIQ9BLJ3nDIishL2EMjesSAQWQl7CGTvOGVEZCXsIZC9Y0EgshL2EMjeccqIyEzVtSocrLyIkABfs6Z8RgT7Y+GsCIfbk4hcBwsCkRn0+xHpTy8zZz8igHsSkX3jlBGRGfR9AB37AOSEWBCIzMA+ADkzThkRmUG/luBYTSPChg/i9A85FRYEIjPo1xJotTp8X9PAtQTkVDhlRGQG9hDImbEgEJmBPQRyZpwyIjIDewjkzFgQiMzAHgI5M04ZEZmBPQRyZiwIRGZgD4GcGQsCkRlGBPsjY3AjNm18BhmDGzldRE6FBYHIDPVFnyNy8TwMbLyEyMXzUF/0ua1DIrIYFgSiHvI8VIrQBXPg3XYNAODddg2hC+bA81CpjSMjsgwWBKIe8DxUCv/ZM+FxrdXocY9rrfCfPZNFgZwCCwJRD/hlpEOmVnc6JlOr4ZeRbuWIiCyPBYGoB5py8yF8fDodEz4+aMrNt3JERJbHgkDUA5roGKgKCtHm5W30eJuXN1QFhdBEx9goMiLLYUEg6iFNdAxO5f0LrR5eAIBWDy+cyvsXiwE5DRYEclnVtSp8Wn4W1bWqHr9GMWUSTr+9GS2KoTj99mYopkySLkAiK+NeRuSS9Gcjt2l18DDzbGTFlEm4Je1/oFA2SRwlkXVJeoewa9cuJCQkIDY2FgUFBR3G9+3bh5SUFEyePBnz58+HStXz39SI+kK/J5HgnkREBpIVhLq6OuTk5GDr1q0oKirCtm3bUF1dbRhvbm7Gyy+/jA0bNmDnzp0IDQ1FXl6eVOEQGeGeREQdSVYQysrKEBkZiQEDBsDX1xdxcXEoLi42jGs0GixfvhyBgYEAgNDQUFy8eFGqcIiM6M81GH3HQMx6ZCT3JCKChAWhvr4ecrnccK1QKFBXV2e4HjhwIB599FEAQGtrKzZs2IBHHnlEqnCIjOjPNfjh7GV8uO9HsxrLRM5KsqayTqeDTCYzXAshjK71mpqa8Oyzz2LUqFGYOnWqWZ8RENC/1/HJ5X69fq2rcOYcHay8CG27HsL5hquIGhNi1ns4c34shTkyzd7yI1lBCAoKQkVFheFaqVRCoVAYPae+vh5PPfUUIiMjsXjxYrM/o6GhGTqdMPt1crkflPwLEZOcPUchAb5wd3cDtDq4u7shJMDXrP+/zp4fS2COTLNFftzcZCZ/kZZsymj8+
PEoLy9HY2Mj1Go1SkpKEBPz2wIerVaLZ555BpMmTcKSJUs6vXsgkgp7CEQdSXaHEBgYiMzMTKSmpkKj0WDGjBkIDw9HWloaMjIycOnSJfzwww/QarXYs2cPAODee+/FihUrpAqJyEDfQ2jT6lD1s4pnIxNB4oVpycnJSE5ONnrs3XffBQCEhYXh5MmTUn48UZc6W4fAgkCujltXkEviOgSijrh1BTm06loVTp27jNBhA836DX9EsD8Wzoro1WuJnBULAjmsvuxHBNwoCiwERL/hlBE5LO5HRGRZLAjksNgHILIsThmRXehNL4B9ACLLYkEgm+tLL4B9ACLL4ZQR2Rx7AUT2gQWBbI69ACL7wCkjshiuCSBybCwIZBFcE0Dk+DhlRBbBPgCR42NBIItgH4DI8XHKiDqorlXhYOVFhAT4ck0AkQthQSAj+l6A9uZJYlwTQOQ6OGVERvS9AB17AUQuhwWBjLAXQOS6OGVERvRnDR+raUTY8EGcAiJyISwIZER/1rBWq8P3NQ08a5jIhXDKiIywh0DkulgQyAh7CESui1NGdq63+wP1ln49wfmGq2atQyAix8eCYMf6uj9Qb40I9kfUmBAolU2SfxYR2Q9OGdkx7g9ERNbEgmDHOJ9PRNbEKSM7pl8TcPRUPcaGKjifT0SS4h2CHauuVeHY+x/j70tm4Nj7H6O6VmXrkIjIibEg2DHV7j1Y/PErUFxRYvHHr0C1e4+tQyIiJ8aCYKc8D5Xi0RUL4N12DQDg3XYNj65YAM9DpTaOjIiclcsVhOpaFQr3V9n19IvnoVL4z54J91a10ePurWr4z57JokBEknCppnJf9vq3Jr+MdMjU6k7HZGo1/DLS0fjNcStHRUTOzqXuEBxln56m3HwIH59Ox4SPD5py860cERG5AkkLwq5du5CQkIDY2FgUFBR0GD9x4gSmTZuGuLg4LFmyBG1tbVKG4zB/16+JjoGqoLBDURA+PlAVFEITHWOjyIjImUlWEOrq6pCTk4OtW7eiqKgI27ZtQ3V1tdFzFi5ciGXLlmHPnj0QQmD79u1ShQPgt316/nfSaLudLtL7fVFgMSAiqUlWEMrKyhAZGYkBAwbA19cXcXFxKC4uNozX1taitbUVY8aMAQBMmzbNaFwqI4L9MfPhP9h1MdDTFwVtyG0sBkQkOcmayvX19ZDL5YZrhUKBysrKLsflcjnq6urM+oyAgP69jk8u9+v1a61qaiIw9RwG2OCjHSZHNsL8dI85Ms3e8iNZQdDpdJDJZIZrIYTRdXfjPdHQ0AydTpgdm1zux508u8Ecmcb8dI85Ms0W+XFzk5n8RVqyKaOgoCAolUrDtVKphEKh6HL8l19+MRonIiLrkqwgjB8/HuXl5WhsbIRarUZJSQliYn6bAw8ODoaXlxeOHj0KAPjkk0+MxomIyLokKwiBgYHIzMxEamoqpkyZgqSkJISHhyMtLQ3Hjh0DAKxduxavvfYa4uPjcfXqVaSmpkoVDhERdUMmhDB/Et5OXL7c0qseQkBAfzQ0NEsQkfNgjkxjfrrHHJlmi/y4uckwcOAtXY47dEEgIiLLcamtK4iIqGssCEREBIAFgYiIbmJBICIiACwIRER0EwsCEREBYEEgIqKbWBCIiAgACwIREd3k1AVh/fr1SEhIQGJiIjZu3Nhh3NpHeNqj7nL05ptvYsKECUhJSUFKSkqnR6G6glWrVuGFF17o8PiFCxcwe/ZsxMfHIz09HS0tLTaIzva6ys+OHTsQHR1t+P7k5OTYIDrbeuKJJ5CYmGjIwXfffWc0XlZWhuTkZMTGxto+P8JJffXVV+Kxxx4TGo1GqNVqMWHCBHH69Gmj5yQmJopvv/1WCCHEiy++KAoKCmwRqs30JEfz5s0T33zzjY0itA9lZWVi3LhxYtGiRR3G/va3v4ndu3cLIYR48803xerVq60dns2Zyk9WVpbYtWuXDaKyDzqdTkRHRwuNRtPpuFqtFg899JA4d+6c0Gg0Y
u7cueLgwYNWjvI3TnuH8Kc//QmbN2+Gh4cHGhoaoNVq4evraxi31RGe9qS7HAHA999/j3feeQfJycnIysrCtWvXbBStbfz666/IycnBM88802FMo9Hg66+/RlxcHADX/A6Zyg8AHDt2DDt27EBycjL+8Y9/QKVSWTlC26qpqQEAzJ07F5MnT8aWLVuMxisrK3H77bfjtttug4eHB5KTk236HXLaggAAnp6eyM3NRWJiIqKiohAYGGgYs8QRns7AVI5aWlowevRoLFy4EDt27MCVK1fw9ttv2zBa61u2bBkyMzNx6623dhi7fPky+vfvDw+PGwcPuuJ3yFR+gBs5mT9/Pnbu3IkhQ4YgKyvLyhHa1pUrVxAVFYW33noLmzZtwr///W8cPnzYMN7ZUcO2/A45dUEAgIyMDJSXl+PixYvYvn274XFLHOHpLLrK0S233IJ3330Xd911Fzw8PDB37lx88cUXNozUugoLCzFkyBBERUV1Ot7Zd8aVvkPd5QcA3nrrLYwdOxYymQxPP/00vvzySytGaHsRERFYvXo1/Pz8MGjQIMyYMcPovyF7+znktAXh9OnTOHHiBADAx8cHsbGxOHXqlGGcR3h2n6MLFy7go48+MlwLIQy/DbuCzz77DIcPH0ZKSgpyc3Nx4MABrFy50jA+aNAgNDU1QavVAuh4TKyz6y4/TU1N2LRpk+FaCAF3d3cbRGo7FRUVKC8vN1z//r+h7o4atjanLQjnz5/H0qVLcf36dVy/fh379+/H2LFjDeM8wrP7HHl7e2PNmjX4+eefIYRAQUEBHn30URtGbF0bN27E7t278cknnyAjIwMTJ07E4sWLDeOenp544IEH8NlnnwEAioqKXOo71F1+fH198d577xn+qmbLli0u9f0BbhTF1atX49q1a2hubsaOHTuMcnDffffhzJkz+Omnn6DVarF7926bfoectiA89NBD+POf/4wpU6Zg+vTpiIiIQGJiIo/wbKe7HA0aNAhZWVlIT09HfHw8hBD461//auuwbW7JkiXYv38/AGD58uXYvn07EhISUFFRgeeff97G0dmePj/u7u5Yt24dXn75ZUyaNAnHjx/HwoULbR2eVU2YMAEPPfSQ4b8x/X9nKSkpqKurg5eXF7Kzs7FgwQIkJCRg+PDhiI+Pt1m8PDGNiIgAOPEdAhERmYcFgYiIALAgEBHRTSwIREQEgAWBiIhuYkEgp1VYWNij3VknTpxo+FPknjxuCXPnzkVjY6PZn3PixAm8+OKLFokhOzsbX331lUXei5wDCwI5raNHj6K1tdXWYXSq/X42PaXT6bBkyRKLrXV49tln8eqrr9ptjsj6XGcfAnJYX331FdauXYuhQ4eipqYG3t7eyM7Oxl133YXr169j7dq1+Prrr6HVanH33Xdj6dKlKC8vx4EDB3D48GF4e3sjLi4Oy5YtQ0NDA5RKJYKDg7Fu3ToEBAT0KIYDBw4gPz8fGo0G3t7eWLRoESIiIpCXl4fa2loolUrU1tYiMDAQa9asgUKhQGVlJV5++WVoNBoMGzYMFy5cwAsvvICioiIAwJw5c7BhwwYAwLZt27B8+XI0NjYiJSUFmZmZHWL4/PPPERISYtiA8MyZM1i2bBkaGxvh5uaG9PR0JCQkYOLEiUhKSsKRI0egUqnw9NNP45tvvsHx48fh4eGB/Px8BAYGws/PDxEREdi2bRvmzJljoX8tcmi22nebqKeOHDkiRo0aJb7++mshhBBbt24VU6dOFUIIkbxoIUIAAAROSURBVJeXJ7Kzs4VOpxNCCPH666+L5cuXCyGEWLRokXjvvfeEEEJs2rRJvPPOO0KIG3vUP/300+L9998XQggxYcIEUVlZ2eFz9Y+fOXNGJCUlicbGRiGEEFVVVeLBBx8ULS0tIjc3Vzz88MOiqalJCHHj/Ij169cLjUYjYmJiDHvbl5eXi9DQUHHkyBEhhBB/+MMfRENDg+FzsrKyhBBC1NfXi3vvvVdcuHChQ
zwLFiwQH3/8seF6ypQpYsuWLUIIIS5cuGCIY8KECWLlypVCCCE+/fRTMWrUKHHixAkhhBDz588X+fn5hvfYs2ePmD17do/+Hcj58Q6BHMKoUaPwwAMPAACmT5+OrKwsXL58GQcPHkRTUxPKysoA3DijoLPf+ufMmYOKigps3LgRZ8+exY8//oj77ruvR599+PBh1NfX48knnzQ8JpPJcO7cOQA3zpXo378/AODuu++GSqVCVVUVgBvbgwBAZGQkRo4c2eVnJCUlAbixXfTgwYPR0NCAIUOGGD2npqbGsL3Kr7/+ipMnT2LmzJkAgCFDhmDfvn2G58bGxgIAbrvtNgwePBijRo0CAAwbNszoTIKQkBCcOXOmR3kg58eCQA6hs10y3d3dodPpsHjxYsMP3paWlk4P8VmzZg0qKysxffp0jBs3Dm1tbRA93LVFp9MhKioK69atMzx28eJFKBQK7N27F97e3obHZTKZYVfP37+/qZ0+2++AqX+P32v/uP757bdKrqmpwdChQwEA/fr1Mzzu6elp8nPd3NhKpBv4TSCHcPLkSZw8eRLAjfn2iIgI3HrrrYiOjkZBQQGuX78OnU6Hl156CW+88QaAGz+A9edkHzp0CHPmzMGUKVMQEBCAsrIyw7bV3YmKisLhw4dx+vRpAMAXX3yByZMnm2zG3nXXXejXrx9KS0sB3DgZq6qqyvADvH1sPXXnnXca7kr69++Pe+65x9CPuHjxImbNmoWmpiaz3vP8+fMYPny4Wa8h58U7BHIIgwcPxrp161BbW4tBgwZh9erVAID58+dj1apVmDp1KrRaLUaPHm047D0mJgbZ2dkAbvxFzerVq7F+/Xp4enri/vvvN/xw7c6IESOQlZWFv//974b97PPz83HLLbd0+RoPDw/k5eVh+fLleOONN3DHHXdg8ODBhruJ+Ph4PPHEE8jLy+txDuLi4rB3715Mnz4dAPD666/jn//8Jz744APIZDKsWLHC6PStnvjyyy9tursm2RlbNjCIeuLIkSMiMTHR1mGYLTs7WyiVSiHEjabvH//4R6FSqXr9fm1tbWLy5Mni0qVLFomvqalJJCYmitbWVou8Hzk+3iEQSSQ4OBhPPvkkPDw8IITAq6++2uXZwz3h7u6OV155BW+88QZWrVrV5/jy8vKwePFieHl59fm9yDnwPAQiIgLApjIREd3EgkBERABYEIiI6CYWBCIiAsCCQEREN7EgEBERAOD/AYhYNK4qdondAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plot the ECDF\n", "_ = plt.plot(x_vers, y_vers, '.')\n", "_ = plt.xlabel('petal length (cm)')\n", "_ = plt.ylabel('ECDF')\n", "\n", "# Overlay percentiles as red diamonds\n", "_ = plt.plot(ptiles_vers, percentiles/100, marker='D', color='red', linestyle='none')\n", "plt.savefig('../images/petal-ecdf.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Box-and-whisker plot\n", "Making a box plot for the petal lengths is unnecessary because the iris data set is not too large and the bee swarm plot works fine. However, it is always good to get some practice. Make a box plot of the iris petal lengths. You have a pandas DataFrame, df, which contains the petal length data, in your namespace. Inspect the data frame df in the IPython shell using ```df.head()``` to make sure you know what the pertinent columns are.\n", "\n", "For your reference, the code used to produce the box plot in the video is provided below:\n", "```python\n", "_ = sns.boxplot(x='east_west', y='dem_share', data=df_all_states)\n", "_ = plt.xlabel('region')\n", "_ = plt.ylabel('percent of vote for Obama')\n", "```\n", "In the IPython Shell, you can use ```sns.boxplot?``` or ```help(sns.boxplot)``` for more details on how to make box plots using seaborn." 
] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX4AAAEMCAYAAADDMN02AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3deUCU1f4G8GcGGDQxENx1wBIFNVNKRZNE0cq8IuIeaeUS/bTMLBPTssQyQcsMSbPbNe2S3CxFCS27oriE29XUygUEAZcBBRnZh5k5vz+8zpUUeoVZGN7n85ezvefrvPhwPO95z1EIIQSIiEg2lLYugIiIrIvBT0QkMwx+IiKZYfATEckMg5+ISGYY/EREMsPgJyKSGUdbFyDV9eslMBp5ywERkRRKpQLNmjW562t2E/xGo2DwExGZgVWC/+LFi3j55ZdNj4uKilBcXIzDhw9bo3kiIrqNVYK/ffv22Lp1q+nxBx98AIPBYI2miYjoT6x+cVen0yExMRGjR4+2dtNERAQbjPEnJyejVatW6Nat2z19zsPDxUIVERHJi9WD//vvv69Vbz8/v5gXd4mIJFIqFdV2mK061JObm4sjR44gODjYms0SEdFtrNrj37JlCwIDA9GsWTNrNktEdurAgb3Yvz/FIsfWagsBAK6ubmY/dkBAIPr3H2D245qLVXv8W7Zs4UVdIqoXtFottFqtrcuwCYW97MDFMX4iMqeoqMUAgIiId2xciWXUmzF+IiKyPQY/EZHMMPiJiGSGwU9EJDMMfiIimWHwExHJDIOfiEhmGPxERDLD4CcikhkGPxGRzDD4iYhkhsFPRCQzDH4iIplh8BMRyQyDn4hIZhj8REQyw+AnIpIZBj8Rkcww+ImIZIbBT0QkMwx+IiKZYfATEckMg5+ISGYY/EREMuNorYYqKiqwZMkSpKamwtnZGT179sTixYut1TwREf2X1YJ/2bJlcHZ2xk8//QSFQoFr165Zq2kiIrqNVYK/pKQECQkJSElJgUKhAAA0b97cGk0TEdGfWGWMPycnB25ubli1ahVGjRqFSZMm4ejRo9ZomoiI/sQqPX69Xo+cnBx07doVEREROHHiBP7v//4PP//8M1xcXCQdw8ND2vuIiKRwcnIAALRo0dTGlVifVYK/bdu2cHR0xPDhwwEAPXr0QLNmzZCZmYnu3btLOkZ+fjGMRmHJMolIRiorDQCAq1eLbFyJZSiVimo7zFYZ6nF3d4e/vz8OHDgAAMjMzER+fj68vLys0TwREd1GIYSwSjc6JycH8+fPR2FhIRwdHfHaa68hMDBQ8ufZ4yeqv775ZgNycrJsXcY9yc6+Wa+np/10QNVqL4SFPSfpvTX1+K02nVOtVuPrr7+2VnNEZEU5OVk4l3EWDq4qW5cimdHh5lDP+fxMG1cijUGrM9uxrBb8RNSwObiq4Dqgra3LaLC0ey+b7VhcsoGISGYY/EREMsPgJyKSGQY/EZHMMPiJiGSGwU9EJDMMfiIimWHwExHJDG/gIqI602oLoS+sMOtNRlSVvrACWsdCsxyLPX4iIplhj5+I6szV1Q3X9Ne5ZIMFafdehqurm1mOxR4/EZHMSOrxV1ZWIjMzEzdu3MD999+PBx54AE5OTpaujYiILKDG4N+zZw/i4+ORmpoKR0dHNGnSBCUlJdDr9ejbty8mTJiAQYMGWatWIiIyg2qDf8KECXB1dcXw4cOxaNEitGrVyvRaXl4eDh8+jPj4eHz++eeIj4+3SrFERFR31Qb/okWL4OPjc9fXWrZsieHDh2P48OE4d+6cxYojIiLzq/bibnWh/2edO3c2WzFERGR5ki7u6vV6/PDDDzh9+jRKS0urvLZ48WKLFEZERJYhKfjffP
NNnDt3DgMGDICHh4elayIiIguSFPz79u3Dnj174OJy9x3biYjIfki6gcvb2xtardbStRARkRVI6vEvW7YMb7/9Nvr374/mzZtXeW3kyJEWKYyIiCxDUvBv3rwZR48ehVarRaNGjUzPKxQKBj8RkZ2RFPwbNmxAQkICOnbsaOl6iMzuwIG92L8/xezH1WpvLpFrroWzbhcQEIj+/QeY/biWZNDq7GpZZmO5AQCgbORg40qkMWh1gJnm1kgK/ubNm6NNmzZ1aigoKAgqlQrOzs4AgDlz5uDxxx+v0zGJbOnWdS9LBL+9Uau9bF3CPcvOzgIAeHrYSe0e5vueFUII8Vdv+uabb7B//36Eh4ffMZ1TrVZLaigoKAhr1qyp9Q1f+fnFMBr/slQiq4mKunkPS0TEOzauhGqjoZ8/pVIBD4+7z8SU1OOPjIwEACQnJ1d5XqFQ4PTp03Usj4iIrElS8J85c8Ysjc2ZMwdCCDz66KN4/fXXcf/990v+bHW/uYhsxcnp5thwixZNbVwJ1Yacz5+k4M/NzUWjRo3g6upqek6r1aK8vLzKqp01iYuLQ5s2baDT6fDBBx8gMjISy5cvl1woh3qovqmsvHlx8OrVIhtXQrXR0M9fTUM9km7gmjFjBjQaTZXnNBoNXnnlFclF3Lo4rFKpEBYWhmPHjkn+LBERmY+k4L9w4cIdq3X6+PggIyNDUiOlpaUoKrr5W1UIge3bt6NLly73WCoREZmDpKEed3d3ZGVlwcvrf1OJsrKy4OYmbRpbfn4+Zs6cCYPBAKPRiI4dO+Ldd9+tXcVERFQnkoJ/9OjRmDlzJmbPng21Wo3s7GysXLkSY8eOldSIWq1GQkJCnQolIiLzkBT84eHhcHR0RFRUFDQaDdq0aYMxY8Zg8uTJlq6PiIjMTFLwK5VKTJs2DdOmTbN0PUREZGHVXtyVOnffXHP8iYjIOmrcbN3FxQUhISHo3bt3lfn6eXl5OHLkCBISElBaWoq4uDirFEtERHVXbfBv3LgRu3fvRnx8PBYsWAClUokmTZqgpKQEANCvXz9MnDgRgYGBViuWiIjqrsYx/kGDBmHQoEGorKxEVlYWbty4AVdXV3h5ecHRUdLlASIiqmckpbeTkxO8vb0tXQsREVkBu+1UL3zzzQbk5GTZuox7cms991vL+9oLtdoLYWHP2boMsiEGP9ULOTlZuJB+Bq1d7OdH8j4YAQDlmnQbVyKdplhv6xKoHrCff2XU4LV2ccTkh91tXUaDtu5kga1LoHqAwU9E9Zal9ksGLDtUV9/3TJYU/Dk5Ofjkk09w+vRplJaWVnltz549lqiLiMiibt9fRG4kBf+cOXOgVqsRERGBxo0bW7omIiIAQP/+A+p1z9leSQr+tLQ0bNy4EUqlpOX7iYioHpOU5L1798Yff/xh6VqIiMgKqu3xr1y50vTndu3aYerUqXjyySfRvHnzKu+bNWuW5aoj2dBqC3G9WM9ZJxamKdajmbbQ1mWQjVUb/H/eYzcoKAh6vf6O54mIyL5UG/wffvihNesgmXN1dYNz2TXO47ewdScL0MhV2pap1HBJGuPv06fPXZ/v16+fWYshIiLLkxT8lZWVd33OaDSavSAiIrKsGqdzhoWFQaFQQKfT4dlnn63ymkajgZ+fn0WLIyIi86sx+MeOHQshBE6dOoUxY8aYnlcoFPDw8EDfvn0tXiAREZlXjcEfGhoKAOjRowc6duxolYKIiMiyJN25e/z4cRw/fvyO51UqFVq3bo2ePXtCpVKZvTgiIjI/ScG/detWHD9+HM2bN0fr1q2h0Whw7do1PPTQQ7h06RIA4LPPPkP37t3/8lirVq1CTEwMEhMT0blz57pVT0RE90xS8Ht7e+OJJ57Ac8/9b9eef/7zn8jIyMDGjRuxevVqvP/++/jXv/5V43F+//13/Prrr2jbtm3dqiYiolqTNJ3zhx9+wMSJE6s898wzzyAxMREKhQLTpk1DenrNuxDpdDpERkbi3X
ffhUKhqH3FRERUJ5J6/B4eHkhOTsaQIUNMz+3Zswfu7jfvsqyoqICjY82HWrlyJUaMGAG1Wl2Hcqkh09jZWj3Fupv3sbio7GfVWk2xHh1sXQTZnKTgf/vttzFr1ix06tQJbdq0wZUrV5CWlmZayO3EiROYNGlStZ8/fvw4Tp06hTlz5tS6UA8Pl1p/luo/H59OcHJysHUZ9+RqRgYAoI36QRtXIl1TAA8++CBatGhq61LIhhRCCCHljdevX0dKSgry8vLQsmVLBAYGolmzZpIaWbt2LTZs2GCa+aPRaODh4YEPP/wQAQEBko6Rn18Mo1FSqURWcWvLvoiId2xcCdGdlEpFtR1mycFvTkFBQVizZs09zeph8FN9w+Cn+qym4Oeeu0REMmOTPXeTk5PrfAwiIqod7rlLRCQz3HOXiEhmJPX4uecuEVHDISn4y8rKuOcuEVEDISn4uf8uEVHDISn4AeD8+fP48ccfkZ+fj4ULFyIjIwM6nQ6+vr6WrI+IiMxM0sXdHTt24Nlnn0Vubi4SEhIAACUlJVi6dKlFiyMiIvOT1OP/9NNPsW7dOnTp0gU7duwAAPj6+uLMmTMWLY6IiMxPUo+/oKDANKRza0llhULB5ZWJiOyQpODv1q0btm7dWuW5pKQkPPzwwxYpioiILEfSUM+CBQswdepUfPfddygtLcXUqVORmZmJf/zjH5auj4iIzExS8Hfs2BE7duzA7t27MXDgQLRp0wYDBw5EkyZNLF0fERGZmeTpnI0bN8awYcMsWQsREVlBtcEfFhYm6eJtXFycWQsiIiLLqjb4x44da806iIjISqoN/tDQUGvWQUREVsIF9omIZMYme+7WBvfcpdo6cGAv9u9PMftxs7OzAACenl5mP3ZAQCD69x9g9uOSfNR5z10iupOrq6utSyCqFfb4iYgaoFr1+FeuXCnp4NyBi4jIvlQb/Nxpi4ioYeJQDxFRA2S2i7vFxcW4fv16lefUanXtKyMiIquTFPzp6emYM2cOzpw5A4VCASGEaTmH06dPW7RAovqqsPA61qyJwfTpr8LV1c3W5RBJJukGrkWLFsHf3x+HDx+Gi4sLjhw5gvHjx9/T1oszZszAiBEjMHLkSISFhfEXBtm9xMQtSEs7i23bNtu6FKJ7ImmMv3fv3vjll1/g5OSEXr164ejRoygtLcXw4cORnJwsqaGioiI0bdoUAPDvf/8bsbGx2LJli+RCOcZP9Ulh4XVERLyGyspKODmpEB39CXv9VK/UNMYvqcfv7OwMvV4PAGjWrBkuX74Mo9GIwsJCyUXcCn3g5rUCbttI9iwxcYupI2I0GtnrJ7siaYz/0UcfxY4dOzBq1Cg89dRTePHFF6FSqdC3b997amzBggU4cOAAhBD4+9//fk+fre43F5EtHDx4AAbDzc6QwaDHwYMH8PrrvKeF7MM9T+c0Go1ITExESUkJQkND0bhx43tuNCEhAUlJSfjiiy8kf4ZDPVSffP31P7B37x4YDHo4ODhiwICBmDRpiq3LIjKp81DPl19+edvBlAgJCUFYWBji4+NrVdDIkSNx6NChO6aGEtmL4OBQKJU3hyuVSiVGjBhl44qIpJMU/LGxsXd9fvXq1ZIaKSkpwZUrV0yPk5OT4erqCjc3Xgwj++Tm1gwBAYFQKBQICBjAC7tkV2oc409NTQVwc3jn4MGDuH1U6OLFi5I3Wy8rK8OsWbNQVlYGpVIJV1dXrFmzhhd4ya4FB4fi0qWL7O2T3alxjD8oKAgAcOXKFbRp0+Z/H1Io0Lx5c4SHh2Pw4MGWrxIc4yciuhc1jfFLurg7d+5cREdHm72we8HgJyKSrs7BDwCVlZU4ceIE8vLyMGzYMJSWlgIA7rvvPvNVWgMGPxGRdHVepO3s2bOYPn06VCoVcnNzMWzYMBw5cgRbtmzBJ598YtZiiYjIsiTN6nnvvffw6quv4scff4Sj483fFb1798Z//vMfixZHRETmJyn409PTERISAgCmmT
j33XcfKioqLFcZERFZhKTgb9euHX777bcqz508eRKenp4WKYqIiCxH0hj/rFmz8NJLL2HChAmorKzE559/jvj4eCxevNjS9RERkZlJntXz+++/Y9OmTbh8+TJat26NcePG4aGHHrJ0fSac1UNEJJ1ZpnPaGoOfiEi6Ok/n1Ol0WL16NZKSkpCXl4eWLVti2LBhmD59Opydnc1aLBERWZak4H/vvfeQmZmJBQsWoF27drh06RLWrl2L3NxcfPjhh5aukYiIzEhS8O/atQs///wz7r//fgCAt7c3evTogSeffNKixRERkflJms7ZvHlzlJWVVXmuoqICLVq0sEhRRERkOZJ6/CEhIZg2bRomTZqEVq1aQaPRIC4uDiEhIaalmwGgX79+FiuUiIjMQ9KsnlvLM9d4IIUCu3btMktRd8NZPURE0nE6JxGRzNR5z10iImo4GPxERDLD4CcikhkGPxGRzDD4iYhkhsFPRCQzDH4iIplh8BMRyYykJRvq6vr165g7dy6ys7OhUqng5eWFyMhIuLu7W6N5IiK6jVV6/AqFAtOmTcNPP/2ExMREqNVqLF++3BpNExHRn1gl+N3c3ODv72963LNnT1y+fNkaTRMR0Z9YfYzfaDRi48aNkhZ+k4Ps7At4+eWpyMnJsnUpRCQTVl+kbdGiRcjNzcWqVaugVPLa8owZM5CTkwNPT0/ExsbauhwikgGrXNy9JSoqCllZWVizZs09h35DXJ0zO/sCcnJy/vvnbBw79hvUai8bV0VEDUG9WJ1zxYoV+O233xAbGwuVSmWtZuu1tWur9vA//3yVjSohIjmxSo8/LS0Na9asQYcOHTBhwgQAQPv27WU/tHH58qUaHxMRWYJVgr9Tp044e/asNZqyK23btqsS9m3btrNhNUQkF7y6akPh4S9XefzSS6/YqBIikhMGvw15enYw9fLbtm3HC7tEZBUMfhsLD38ZjRs3Zm+fiKyGm60TETVANU3ntOo8fnt14MBe7N+fYpFja7WFAABXVzezHzsgIBD9+w8w+3GJyL5xqMfGtFottFqtrcsgIhnhUI+NRUUtBgBERLxj40qIqCGpF3fuEhFR/dCgevzffLPB7la5zM6+Wa+np/1M5VSrvRAW9pytyyCiGsjm4m5OThbOpqXDoZH5L5RaitHgAABIz7lm40qkMZQX2roEIqqjBhX8AODQyA33eQ22dRkNVmnWLluXQER11KCCX6sthKG8kOFkQYbyQmi1DerHhkh2eHGXiEhmGlTXzdXVDVdv6DnUY0GlWbsscrMZEVlPgwp+AHY31GPUlwMAlI6NbFyJNDcv7ja3dRlEVAcNKvjtcXVL03ROtb2EaXO7/J6J6H8a1Dx+e8Q7d4nIEnjnLhERmTD4iYhkhsFPRCQzDH4iIpnhxV0JLLkRS1bWBVRUVODBBzvC0dG8k6y4EQuRfPHibj2mUCgghBHXrl21dSlEJBPs8dtQYeF1RES8hsrKSjg5qRAd/QnviiUis2CPv55KTNwCg8EIADAYDNi2bbONKyIiObBK8EdFRSEoKAg+Pj44d+6cNZq0C6mpB2A0GgAARqMBqakHbFwREcmBVYJ/8ODBiIuLQ7t27azRnN145JFeNT4mIrIEq6zV06sXA00KhUJh6xKISAbsZpG26i5S2LPjx49WeXzs2BHMm/emjaohIrmwm+BviLN6+vbtj717d8NgMMDBwQF9+/bH1atFti6LiBoAzuqpp4KDQ6FU3jwFSqUDRowYZeOKiEgOGPw25ObWDAEBgVAoFAgIGMA5/ERkFVYJ/vfffx8DBgyARqPB5MmT8be//c0azdqF4OBQdOrkw94+EVkN79wlImqAOMZPREQmDH4iIplh8BMRyYzdzONXKnlXKxGRVDVlpt1c3CUiIvPgUA8Rkcww+ImIZIbBT0QkMwx+IiKZYfATEckMg5+ISGYY/EREMsPgJyKSGQY/EZHMMPjrYMeOHRg5ciRCQkIwdOhQvPHGGzW+/8
aNG/jiiy+sVJ18TZ06FfHx8VWeE0IgKCgIR44cqfPxT5069ZfnuiaHDh3CqFHcf0GKms7lqlWr8NVXX9XquAsWLMDRo0f/8n0rV67E9u3ba9VGvSaoVnJzc4W/v7+4fPmyEEIIo9Eo/vjjjxo/k5OTI/r06WON8mQtKSlJjB07tspzqamp4oknnpB8jMrKSnOXZXLw4EERGhp6z5/T6/UWqKZ+q+25lON3dS/sZpG2+ubatWtwdHSEm9vN7RIVCgW6dOkCADhx4gSWL1+OkpISAMCrr76KgQMHIjIyEkVFRQgJCUHjxo0RHx+PrKwsLFy4EAUFBXB0dMTs2bMxYMAAlJWVISIiAunp6XB0dMQDDzyAlStX4urVq3j99ddRUlKCiooKBAYGYu7cuTb7HuqjIUOGYNGiRUhPT4e3tzcAYPPmzRg1ahR0Oh1WrFiBI0eOoLKyEp07d8Z7772HJk2aYN68eWjSpAkuXLiA69evIy4u7q7n4NChQ4iKisLmzZsBALt370ZMTAz0ej2USiWWLl0KX19f7N27Fx9//DEMBgPc3d0RGRkJLy+vO+pNSEjAl19+CQDw9PREZGQkPDw8sHnzZiQlJcHd3R3nz5/HBx98YPoZk4uazmVMTAxKS0sRERFx1+/KyckJb731FsrKyuDr64vs7GxMnz4dgwYNwqRJkzBlyhQMGjQI8+bNg0qlwoULF6DRaNCzZ09ERUVBoVBg3rx5eOihhzBx4kTTz86+ffugVCqhVqsRGxuLs2fPYtGiRSgrK0NFRQXGjRuHF154wbZf3F+x9W8ee2UwGMT06dNFnz59xMyZM8W6detEQUGB0Gq1IiQkROTm5gohbv7P4PHHHxdarfauPf4xY8aIb7/9VgghRFpamujTp4/Iz88XO3fuFM8//7zpfYWFhUIIIcrLy0VxcbEQQgidTicmTZokUlJSrPA3ti+LFy8WUVFRQgghioqKhJ+fn9BoNCI2NlbExsaa3hcdHS0+/vhjIYQQERERIjQ0VJSUlAghRLXn4PYee0ZGhnjsscdEZmamEEKIiooKUVRUJK5duyb8/f1FWlqaEEKIb7/9VowZM+aOz589e1b079/f9POyYsUKMWvWLCGEEN9//73o2bOnyMrKMvv3Y0+qO5effvqpWLp0qRDi7t9VaGioSEhIEEIIcfLkSeHr6yuSk5OFEEJMnDjR9OeIiAgxYcIEUV5eLioqKsSwYcPE/v37Ta99/fXXQgghYmJixMsvvywqKiqEEELk5+ebarr1XHFxsXj66adFenq6Rb+TuuIYfy0plUp89tln+Prrr+Hv74+UlBSMGDECKSkpuHjxIl588UWEhITgxRdfhEKhQFZW1h3HKC4uxunTpzF69GgAgLe3N7p06YJff/0Vvr6+yMjIwKJFi7Bjxw6oVCoAgMFgQHR0NEaMGIFRo0YhLS0NZ86cserf3R6MGTMG27Ztg16vx44dO/Doo4+iVatWSE5OxrZt2xASEoKQkBAkJycjOzvb9LmhQ4fivvvuA4Bqz8HtfvnlFwwYMAAdOnQAAKhUKri4uODEiRPw9fU19VJHjx6N06dPo7i4uMrnDx06hMDAQLRs2RIAMGHCBKSmpppef+SRR+Dp6WnW78beVHcu/+z276q4uBjnzp1DcHAwAKB79+7w8fGpto0hQ4bA2dkZKpUKXbt2rfIzccvu3bvx/PPPm34O3N3dAQDl5eWYP38+goOD8cwzzyAvL6/e/5vkUE8dde7cGZ07d8azzz6LYcOGQQgBHx8fxMXF3fHeixcvSjqmQqGAWq3G9u3bcfDgQezduxcrVqxAYmIi1q1bhxs3bmDTpk1wdnbGO++8g4qKCnP/teyer68vWrRogX379uH77783/ddbCIF3330X/fr1u+vnboU+gGrPwe1ENauaCyGgUPz1HhJ/9b4mTZr85TEauurO5Z/d/l3d+l6lnAMAcHZ2Nv3ZwcEBBoPhjvdUd64//vhjtGjRAkuXLo
WjoyOmTJlS7/9NssdfS7m5uTh+/LjpsUajQUFBAby9vZGVlYWDBw+aXjt58iSEEHBxcUF5eTn0ej0AwMXFBV26dMGWLVsAAOfPn8eZM2fQo0cPaDQaODg4YMiQIXjrrbdQUFCAwsJCFBUVoUWLFnB2dkZubi527dpl3b+4HRk9ejRiYmJw4cIFBAUFAQCCgoLw1Vdfoby8HMDNnuH58+fv+vnqzsHtAgICsHfvXly4cAEAoNPpUFxcDD8/P5w+fdp07C1btqBr165wcam6+XW/fv2QkpKCq1evAgC+/fZbPPbYY2b7DhqKu53LmjRt2hTe3t744YcfAAC///47zp07V6cagoKCsH79euh0OgBAQUEBAKCoqAitW7eGo6Mjzp07J2m2kK2xx19Ler0eMTExuHTpEho1agSj0YjXXnsNXbt2xWeffYZly5ZhyZIlqKyshFqtxpo1a+Dm5obg4GAEBwfD1dUV8fHxWL58ORYuXIivvvoKjo6OiI6Ohru7O1JSUvDRRx8BAIxGI8LDw9GqVStMmjQJs2bNwsiRI9G6detqe64EBAcHIzo6GuPHjzf99zw8PByrVq3CmDFjTD3CV155BR07drzj82fPnr3rObgV8gDQoUMHLF68GLNnz4bBYICDgwOWLl0KHx8fREdHY86cOdDr9XB3d8eyZcvuaKNTp0544403MGXKFAA3/5cRGRlpgW/Dvt3tXP6VqKgozJ8/H+vWrUO3bt3g6+uLpk2b1rqG8PBwfPTRRxg5ciScnJzg5eWFTz/9FNOnT8fcuXOxbds2eHp6onfv3rVuw1q4AxcRNUilpaVo3LgxFAoF0tPTMWnSJPz4449wdXW1dWk2xx4/ETVIx44dQ3R0tGlsfvHixQz9/2KPn4hIZnhxl4hIZhj8REQyw+AnIpIZBj+RBfj5+SEnJ8fWZRDdFS/uEhHJDHv8REQyw+An2Vi7di0ef/xx+Pn54amnnkJqaipiYmLw6quv4rXXXoOfnx9CQ0OrLLCVm5uLmTNnom/fvggKCsKGDRtMrxkMBqxZswZDhgyBn58fRo0ahStXrgAAfHx8TAvz6XQ6REVFYeDAgXjsscewcOFC05IRBQUFeOmll9CrVy/06dMHYWFhMBqNVvxWSI4Y/CQLGRkZiIuLw3fffYfjx4/jyy+/RLt27QAAu3btwtChQ3H48GEMHz4cM2bMQGVlJYxGI6ZPnw4fHx/s3UD81i4AAAMRSURBVLsX69evx/r167Fv3z4AwLp165CUlIS1a9fi2LFjWLJkCRo1anRH28uWLUNmZiYSEhKwc+dO5OXlITY21nSMVq1aITU1FQcOHMDrr78ueWExotpi8JMsODg4QKfT4fz586isrET79u1NS/h269YNQ4cOhZOTEyZPngydTocTJ07g1KlTKCgowCuvvAKVSgW1Wo1x48aZtuLbtGkTZs2ahQcffBAKhQK+vr5o1qxZlXaFENi0aRPmz58PNzc3uLi44KWXXkJSUhIAwNHREVevXsXly5fh5OSEXr16MfjJ4rhkA8mCl5cX5s+fj5iYGKSnpyMgIADz5s0DALRu3dr0PqVSiVatWiEvLw8AkJeXh169epleNxgMpscajeYv18ovKChAWVlZlT12hRCm4ZypU6di1apVpkXaxo8fj/DwcDP8jYmqx+An2bi1MmpxcTEWLlyI5cuXw9PTExqNxvQeo9GI3NxctGzZEg4ODmjfvj127tx51+O1bt0a2dnZ6Ny5c7VtNmvWDI0aNUJSUtJdNw9xcXHBvHnzMG/ePKSlpeG5555D9+7dueoqWRSHekgWMjIykJqaCp1OB5VKBWdnZzg4OAC4uVb7zp07odfrsX79eqhUKvTo0QMPP/wwXFxcsHbtWpSXl8NgMODcuXM4efIkAGDs2LFYuXIlLly4ACEEzpw5g+vXr1dpV6lUYuzYsViyZAny8/MB3LxgfOs6we7du5GVlWXar8HBwQFKJf9ZkmXxJ4xkQafT4a
OPPoK/vz8CAgJQUFCA2bNnAwAGDx6M7du3o3fv3ti6dStiYmLg5OQEBwcHrF69GmfOnMHgwYPRt29fvP3226btEydPnoynn34aU6ZMwSOPPIIFCxbcdeelN998E15eXhg3bhweeeQRvPDCC8jMzAQAZGVlYfLkyfDz88P48ePxzDPPwN/f33pfDMkSb+AiWYuJiUFWVhaWL19u61KIrIY9fiIimWHwExHJDId6iIhkhj1+IiKZYfATEckMg5+ISGYY/EREMsPgJyKSGQY/EZHM/D+g7JurRVk9TAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Create box plot with Seaborn's default settings\n", "_ = sns.boxplot(x='species', y='petal length (cm)', data=df)\n", "\n", "# Label the axes\n", "_ = plt.xlabel('species')\n", "_ = plt.ylabel('petal length (cm)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Variance and standard deviation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ variance = \\dfrac{1}{n}\\sum^{n}_{i=1}(x_i - \\bar{x})^2 $$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the variance\n", "It is important to have some understanding of what commonly-used functions are doing under the hood. Though you may already know how to compute variances, this is a beginner course that does not assume so. In this exercise, we will explicitly compute the variance of the petal length of Iris versicolor using the equations discussed in the videos. We will then use ```np.var()``` to compute it." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.21640000000000004 0.21640000000000004\n" ] } ], "source": [ "# Array of differences to mean: differences\n", "differences = np.array(versicolor_petal_length - np.mean(versicolor_petal_length))\n", "\n", "# Square the differences: diff_sq\n", "diff_sq = differences ** 2\n", "\n", "# Compute the mean square differences: variance_explicit\n", "variance_explicit = np.mean(diff_sq)\n", "\n", "# Compute the variance using NumPy: variance_np\n", "# (variance is unchanged by subtracting the mean, so this matches the variance of the raw petal lengths)\n", "variance_np = np.var(differences)\n", "\n", "# Print the results\n", "print(variance_explicit, variance_np)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The standard deviation and the variance\n", "As mentioned in the video, the standard deviation is the square root of the variance. 
You will see this for yourself by computing the standard deviation using ```np.std()``` and comparing it to what you get by computing the variance with ```np.var()``` and then computing the square root." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.4651881339845204\n", "0.4651881339845204\n" ] } ], "source": [ "# Compute the variance: variance\n", "variance = np.var(versicolor_petal_length)\n", "\n", "# Print the square root of the variance\n", "print(np.sqrt(variance))\n", "\n", "# Print the standard deviation\n", "print(np.std(versicolor_petal_length))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Covariance and the Pearson correlation coefficient\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "$$ covariance = \\dfrac{1}{n}\\sum^{n}_{i=1}(x_i - \\bar{x})(y_i - \\bar{y})$$\n", "$$ \\begin{align} \\rho &= \\text{Pearson correlation} = \\dfrac{\\text{covariance}}{(\\text{std of x})(\\text{std of y})} \\\\ &= \\dfrac{\\text{variability due to codependence}}{\\text{independent variability}} \\end{align}$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scatter plots\n", "When you made bee swarm plots, box plots, and ECDF plots in previous exercises, you compared the petal lengths of different species of iris. But what if you want to compare two properties of a single species? This is exactly what we will do in this exercise. We will make a scatter plot of the petal length and width measurements of Anderson's Iris versicolor flowers. 
If the flower scales (that is, it preserves its proportion as it grows), we would expect the length and width to be correlated.\n", "\n", "For your reference, the code used to produce the scatter plot in the video is provided below:\n", "```python\n", "_ = plt.plot(total_votes/1000, dem_share, marker='.', linestyle='none')\n", "_ = plt.xlabel('total votes (thousands)')\n", "_ = plt.ylabel('percent of vote for Obama')\n", "```" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAEMCAYAAAAxoErWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3df1wUdf4H8NfuIiZKgLgisFImydWJQqLJpWaLv05UyCI1PO6BCqkpPlJRz86vv+g6sLMzM607UyvSsh8WD0jxR1xi5q/D4Ewg/HGeK6igkLie4O58/zA2OZ3dWWBnFvb1fDx8PFh2PjPvee+4r52ZZUYlCIIAIiKie1ArXQARETkvhgQREYliSBARkSiGBBERiWJIEBGRKIYEERGJYkgQEZEoN6ULaGlXr16H2Wz/n374+nZCVVWtAypqO9gj69gf29gj65Toj1qtgo9PR9Hn21xImM1Ck0KiYSxZxx5Zx/7Yxh5Z52z94eEmIiISxZAgIiJRDAkiIhIlS0ikp6dDr9cjJCQEpaWl95ymqqoKycnJGDt2LEaNGoVly5bh1q1bcpRHREQiZAmJqKgoZGZmIjAwUHSaDRs2oGfPnsjKykJWVhZOnDiB3NxcOcojIiIRsny7KSIiwuY0KpUK169fh9lsRl1dHerr6+Hn5ydDdUREjZUZalBy7ipCgnwQHOildDmKcppzEjNnzsSZM2cwaNAgy79+/fopXRYRuZgyQw1WbS3AZ9+cxqqtBSgz1ChdkqKc5u8kdu7ciZCQEGzZsgXXr19HUlISdu7ciVGjRtk1H1/fTk2uQav1bPJYV8EeWcf+2ObsPcorLIfJZIYgACaTGeerjIgM08m2fGfrj9OExAcffIA//elPUKvV8PT0hF6vx6FDh+wOiaqq2ib9MYpW64nLl6/ZPc6VsEfWsT+2tYYe6Xw9oNGoAZMZGo0aOl8P2WpWoj9qtcrqh2unCQmdTodvvvkGffr0QV1dHQ4ePIjhw4crXRYRuZjgQC+kTgrnOYmfyXJOIi0tDUOGDEFFRQUSExMRHR0NAEhKSkJRUREAYPHixTh27BjGjh2L2NhYPPjgg3juuefkKI+IqJHgQC9ERz7o8gEBACpBEJzrQiHNxMNNjsMeWcf+2MYeWeeMh5uc5ttNRETkfBgSREQkiiFBRESiGBJERCSKIUFERKIYEkREJIohQUREohgSREQkiiFBRESiGBJERCSKIUFERKIYEkREJIohQUREohgSREQkiiFBRESiGBJERCRKltuXpqenY9euXTAYDMjKykKvXr3ummbBggUoKSmxPC4pKcG6desQFRUlR4lERHQPsoREVFQUEhISEB8fLzpNRkaG5efi4mL8/ve/x+DBg+Uoj4hIcWWGGuQVlkPn6+FUt02VJSQiIiLsmv6TTz7B2LFj4e7u7qCKiIicR5mhBqu2FsB
kMkOjUSN1UrjTBIUsIWGPuro6ZGVlYfPmzU0ab+1erbZotZ5NHusq2CPr2B/b2KO75RWWw2QywywAMJlxvsqIyDCd0mUBcMKQ2LNnDwICAvDII480aXxVVS3MZsHucbxBu23skXXsj23s0b3pfD2g0aiBn/ckdL4esvVJrVZZ/XDtdCHx6aef4plnnlG6DCIi2QQHeiF1UjjOVxmd7pyEU30FtqKiAseOHcOYMWOULoWISFbBgV6Ii+rlVAEByBQSaWlpGDJkCCoqKpCYmIjo6GgAQFJSEoqKiizTff7553jqqafg7e0tR1lERGSDShAE+w/gOzGek3Ac9sg69sc29sg6Jfpj65yEUx1uIiIi58KQICIiUQwJIiISxZAgIiJRDAkiIhLFkCAiIlEMCSIiEsWQICIiUQwJIiISxZAgIiJRDAkiIhLFkCAiIlEMCSIiEsWQICIiUQwJIiISxZAgIiJRsoREeno69Ho9QkJCUFpaKjpdTk4Oxo4dizFjxmDs2LGorKyUozwiIhLhJsdCoqKikJCQgPj4eNFpioqK8Oabb2LLli3QarW4du0a3N3d5SiPiGRQZqhBXmE5dL4est3HucxQg5JzVxES5ON0945uSY5cT1lCIiIiwuY0mzdvxpQpU6DVagEAnp6eji6LiGRSZqjBqq0FMJnM0GjUSJ0U7vA37YZl3jKZ4SbTMpXg6PWUJSSkOHXqFHQ6HeLj42E0GjF8+HDMmDEDKpXKrvlYu1erLVotg8kW9sg69ufe8grLYTKZYRYAmMw4X2VEZJhOlmUKAmCSaZktwd5tyNHr6TQhYTKZUFJSgk2bNqGurg7Tpk1DQEAAYmNj7ZpPVVUtzGbB7uXzBu22sUfWsT/idL4e0GjUwM97EjpfD4f3SollNldTtqHmrqdarbL64dppQiIgIACjRo2Cu7s73N3dERUVhcLCQrtDgoicT3CgF1InheN8lVG2cxINy2zr5yQcvZ5OExJjxozBP/7xD8TExODWrVv47rvvMHLkSKXLIqIWEhzohcgwnayf5oMDvdpsONzJkespy1dg09LSMGTIEFRUVCAxMRHR0dEAgKSkJBQVFQEAoqOj4evri9GjRyM2NhbBwcF49tln5SiPiIhEqARBsP8AvhPjOQnHYY+sY39sY4+sU6I/ts5J8C+uiYhIFEOCiIhEMSSIiEgUQ4KIiEQxJIiISBRDgoiIRDEkiIhIFEOCiIhEMSSIiEgUQ4KIiEQxJIiISBRDgoiIREm6VHh1dTXeffddnDx5EkajsdFzmZmZDimMiIiUJykk5s2bh7q6Ovz2t79Fhw4dHF0TERE5CUkhUVBQgO+++w7u7u6OroeIiJyIpHMSISEhqKioaNaC0tPTodfrERISgtLS0ntOs3btWkRGRiImJgYxMTFYvnx5s5ZJRETNI7on8cknn1h+HjhwIKZNm4bx48ejS5cujaaTeve4qKgoJCQkID4+3up0sbGxWLhwoaR5EhGRY4mGxBdffNHosZ+fHw4cONDodyqVSnJIRERENKE8IhJTZqhBybmrCAnyaRX3cf7LtgL8aKjBw4FemDcxXPK45qynEj3KO27AsZJL6BfSFUPDAmVZpiOJhsT7778vZx0W2dnZyM/Ph1arxezZsxEeLn1jInIVZYYarNpagFsmM9w0aqROCnfqoPjLtgKcOHsVAHDi7FX8ZVuBpKBoznoq0aO84wa8t7MEAHDizO31be1BIenEdWxsLHbs2HHX78ePH4/PPvusxYqZOHEipk+fjnbt2uHAgQOYOXMmcnJy4OPjI3ke1u7VaotW69nksa6CPbJOrv7kFZbDZDJDEACTyYzzVUZEhulkWXZT/GioueuxlF41Zz2V6FHR6St3PY4b/iu75uFs/8ckhcS///3vu34nCALOnz/fosVotVrLz0888QT8/f3x448/YsCAAZLnUVVVC7NZaMKyeYN2W9gj6+Tsj87XAxqNGjCZodGoofP1cOrX5uFAL8u
eRMNjKfU2Zz2V6FHoQ51RUHq50WN7lqnE/zG1WmX1w7XVkFiwYAEAoL6+3vJzA4PBgODg4BYo8RcXL16En58fAODkyZMwGAzo0aNHiy6DqC0IDvRC6qTwVnNOYt7E8Cadk2jOeirRo4ZDSy5xTgIAgoKC7vkzADz22GMYNWqU5AWlpaUhNzcXlZWVSExMhLe3N7Kzs5GUlISUlBSEhoZi9erVOHHiBNRqNdq1a4eMjIxGexdE9IvgQC+nD4c7zZsY3qRPys1ZTyV6NDQssE2EQwOVIAg2j83s378fgwcPlqOeZuPhJsdhj6xjf2xjj6xrVYebDh48+MtEbm6NHt8pMjKyGeUREZEzEw2Jl19+2fKzSqXCxYsXAQDe3t6orq4GcPtvJ/bu3evgEomISCmiIbFv3z7Lzxs2bEB1dTXmzJmDDh064MaNG3jjjTfg7e0tS5FERKQMSddu2rx5M+bNm2e5AmyHDh0wd+5cbNq0yaHFERGRsiSFhIeHBwoLCxv9rqioiJcNJyJq4yT9MV1KSgqmTZsGvV6Pbt26oaKiAl9//TX+7//+z9H1ERGRgiRflqN3797YtWsXLl26hB49emDGjBkt/sd0RETkXCSFBAAEBwczFIiIXIxoSCxZsgQrV64EAKSmpkKlUt1zuoyMDMdURkREihMNCZ3ul6slPvDAA7IUQ0REzkU0JF544QXLz7NmzZKlGCIici6SvgI7a9YsbNmyBSdPnnR0PURE5EQknbh+8skncfToUWzZsgW1tbV47LHHMGDAAERERKBPnz6OrpGIiBQiKSTi4uIQFxcH4PZ9JD7++GOsW7cORqORexdERG2YpJA4deoUjhw5giNHjuDYsWPo0qULJkyYYNcd44iIqPWRFBLR0dEICgpCcnIyVq5cCQ8PD0fXRURETkDSiev09HQMHDgQ7777LsaPH48lS5bgyy+/RHl5ueQFpaenQ6/XIyQkBKWlpVanPX36NPr27Yv09HTJ8yciopYnaU8iJiYGMTExAIDKykq8//77WL58uV3nJKKiopCQkID4+Hir05lMJixduhTDhg2TNF8iInIcSSHxww8/4PDhwzh8+DCOHTuG9u3bY+jQoXadk4iIiJA03TvvvIOhQ4fCaDTCaDRKnj9RSygz1KDk3FWEBPm0qvtHyyXvuAHHSi6hX0hX2e7j3JzXpKn1cjv4haSQmDVrFgYMGAC9Xo9FixYhKCjIIcUUFxcjPz8f7733Ht566y2HLINITJmhBqu2FuCWyQw3jRqpk8Jd/g3iTnnHDXhvZwkA4MSZqwDg8KBozmvS1Hq5HTQmKSTuvEudo9TX12PJkiV49dVXodFomjwfazf0tkWr9WzyWFfRlnuUV1gOk8kMQQBMJjPOVxkRGaazPfAObbk/Raev3PU4bviv7J6PPT1qzmvS1HpbYjtoDmfbhiRfBdbRLl++jHPnziE5ORkA8NNPP0EQBNTW1louNChFVVUtzGbB7uVrtZ64fPma3eNcSVvvkc7XAxqNGjCZodGoofP1sGt923p/Qh/qjILSy40e27u+9vaoOa9JU+tt7nbQHEpsQ2q1yuqHa6cJiYCAABw6dMjyeO3atTAajVi4cKGCVZErCQ70QuqkcB6LFtFwqEbOcxLNeU2aWi+3g8ZkC4m0tDTk5uaisrISiYmJ8Pb2RnZ2NpKSkpCSkoLQ0FC5SiESFRzo5fJvCtYMDQuU7YR1g+a8Jk2tl9vBL1SCINh/bMaJ8XCT47BH1rE/trFH1rWqw01r1qyRtIA5c+bYXxUREbUKoiFRUVEhZx1EROSEREPi1VdflbMOIiJyQnaduK6trcXVq1cb/a579+4tWhARETkPSSFRVlaG+fPno7i4GCqVCoIgQKVSAQDvJ0FE1IZJugrs8uXL8fjjj+Pw4cPo1KkTjhw5ggkTJuDPf/6zo+sjIiIFSQqJ4uJizJ8/H/fffz8EQYCnpycWLFgg+RtQRETUOkkKifbt2+PWrVsAAB8
fH1y4cAFmsxnV1dUOLY6IiJQl6ZxEv3798NVXX2H8+PEYOXIkkpKS4O7ujoEDBzq6PiIiUpCkkLjzsNLcuXPx8MMP4/r163j66acdVhgRESlP0uGmjRs3/jJArUZMTAyef/55bNu2zWGFERGR8iSFxLp16+75+/Xr17doMURE5FysHm46ePAgAMBsNuO7777DndcCPH/+PDp27OjY6oiISFFWQ+Lll18GANy8eROLFy+2/F6lUqFLly744x//6NjqiIhIUVZDouG2pQsWLEBGRoYsBRERkfOQdE4iIyMD9fX1OHr0KHJycgAARqMRRqPRocUREZGyJH0FtqSkBDNmzIC7uzsuXryI0aNH48iRI/j888/x17/+1eb49PR07Nq1CwaDAVlZWejVq9dd03z66afYvHkz1Go1zGYz4uLikJCQYP8aERFRi5G0J7Fs2TKkpKRg586dcHO7nSv9+/fHsWPHJC0kKioKmZmZCAwUv43gyJEj8eWXX+KLL77A1q1bsWnTJhQXF0uaPzmvMkMNsg+eRZmhRulSHGr712VIfnUPtn9dZvfYvOMG/OWjAuQdN8gyrjljm/N6lhlqsH1vqd1jXWUbclaSrwIbExMDAJarv3p4eODmzZuSFhIREWFzmk6dfrl93n//+1/U19dblkWtU5mhBqu2FuCWyQw3jRqpk8Lb5H2Dt39dhq8OnQMAlFdeBwDEPRUsaWzecQPe21kCADhx5vZl+KXck7mp45oztjmvZ8NYk8kMjR1jXWUbcmaSQiIwMBD/+te/EBoaavldYWEhgoKCWrSYvXv3YvXq1Th37hzmzZuHkJAQu+dh7V6ttmi1nk0e6yrs6VFeYTlMJjMEATCZzDhfZURkmM6B1Snj+Kmqux7PfC5c0tii01fuehw3/FcOG9ecsc15PRvGmgUAdox1lW3oTs72PiQpJObMmYMXXngBEydORH19Pd5++21s27YNK1eubNFioqKiEBUVhQsXLuDFF1/EkCFD8NBDD9k1j6qqWpjNgu0J/wdv0G6bvT3S+XpAo1EDP3961Pl6tMkeh/X0texBNDyWup6hD3VGQenlRo+ljG3quOaMbc7r2dSxrrINNVDifUitVln9cK1ZtmzZMlsz6dGjBwYOHIhjx45ZLheempoq6TDSnbZs2YIxY8bA19fX6nSenp744YcfUF1djfBwaZ/IGty4UQfB/oxAx47tYTTW2T/Qhdjbo87334dHHvCB1rsDxg3q0WYPE/y6R2fU1Ztw/eYtDA71l3yoCQAe7HY/vDq545bJjN8OfEDyIaOmjmvO2Oa8ng1jHwz0xm8HBEke6yrbUAMl3odUKhU8PNzFnxeEprylNo1er8eGDRvu+e2mU6dOoWfPngCAK1euYNKkSViyZAkGDRpk1zK4J+E47JF17I9t7JF1zrgnIelwU11dHdavX4/s7GxcunQJXbt2xejRozFjxgy0b9/e5vi0tDTk5uaisrISiYmJ8Pb2RnZ2NpKSkpCSkoLQ0FB89NFHOHDgANzc3CAIAiZPnmx3QBARUcuStCexePFinDlzBtOnT0dgYCAMBgPeeecdBAUF4dVXX5WjTsm4J+E47JF17I9t7JF1rXZPYu/evdi9ezfuv/9+AEBwcDD69u2LESNGtEyVRETklCT9MV2XLl1w48aNRr+7efMmtFqtQ4oiIiLnIGlPIiYmBtOmTcPvfvc7+Pn5oaKiApmZmYiJibFcThwAIiMjHVYoERHJT9I5Cb1eb3tGKhX27t3bIkU1B89JOA57ZB37Yxt7ZF2rPSfRcMlwIiJyLZLOSRARkWtiSBARkSiGBBERiWJIEBGRKIYEERGJYkgQEZEohgQREYliSBARkSiGBBERiWJIEBGRKNlCIj09HXq9HiEhISgtLb3nNOvWrUN0dDTGjRuH8ePHY//+/XKVR0RE9yDp2k0tISoqCgkJCYiPjxedpk+fPpgyZQo6dOiA4uJiTJ48Gfn5+bjvvvvkKpOIiO4
g255EREQE/P39rU4zePBgdOjQAQAQEhICQRBQXV0tR3lEAIAyQw2yD55FmaHG7nHb95baPa45mlpra1tmc7S2ep2RbHsS9tqxYweCgoLQrVs3pUshF1FmqMGqrQW4ZTLDTaNG6qRwBAd6SR5nMpmhsWOcErW2tmU2R2ur11k5ZUgcPnwYa9aswbvvvmv3WGvXRbdFq/Vs8lhX0ZZ7lFdYDpPJDEEATCYzzlcZERmmkzzOLACwY5wStTrDMuXahpToUUtwtv9jThcSBQUFSE1NxVtvvYWHHnrI7vG86ZDjtPUe6Xw9oNGogZ/3CHS+HpLWt6njlKhV6WXKuQ0p0aPmcsabDkm6M11L0uv12LBhA3r16nXXc4WFhUhJScGaNWvQt2/fJs2fIeE4rtCjMkMNSs5dRUiQj12HJsoMNThfZYTO10O2QxpNrVXJZcq9DSnRo+Zw6ZBIS0tDbm4uKisr4ePjA29vb2RnZyMpKQkpKSkIDQ3FM888A4PBAD8/P8u4jIwMhISESF4OQ8Jx2CPr2B/b2CPrXDok5MKQcBz2yDr2xzb2yDpnDAn+xTUREYliSBARkSiGBBERiWJIEBGRKIYEERGJYkgQEZEohgQREYliSBARkSiGBBERiWJIEBGRKIYEERGJYkgQEZEohgQREYliSBARkSiGBBERiWJIEBGRKFlCIj09HXq9HiEhISgtLb3nNPn5+Rg/fjx69+6N9PR0OcoiIiIbZAmJqKgoZGZmIjAwUHSa7t27Iy0tDVOnTpWjJCIikkCWkIiIiIC/v7/VaR544AE8+uijcHNzk6OkRsoMNdi+txRlhhrZl00tr8xQg+yDZ5v0ejZnLFFbJP87spMpM9Rg1dYCmExmaDRqpE4KR3Cgl9JlURM1vJ63TGa42fl6NmcsUVvV5kLC2g297yWvsBwmkxlmAYDJjPNVRkSG6RxTXBug1XoqXYJVDa+nIAAmO1/P5oxt4Oz9cQbskXXO1p82FxJVVbUwmwXJ0+t8PaDRqIGf9yR0vh64fPmaAytsvbRaT6fvTXNez+ZuC62hP0pjj6xToj9qtcrqh+s2FxL2Cg70QuqkcJyvMkLn68HDC61cw+tZcu4qQoJ87Ho9mzOWqK1SCYIg/WN3E6WlpSE3NxeVlZXw8fGBt7c3srOzkZSUhJSUFISGhuLo0aOYO3cuamtrIQgCPD098corr2Dw4MF2LcvePYkG/IRjG3tkHftjG3tknTPuScgSEnJiSDgOe2Qd+2Mbe2SdM4YE/+KaiIhEMSSIiEgUQ4KIiEQxJIiISBRDgoiIRDEkiIhIFEOCiIhEMSSIiEgUQ4KIiEQxJIiISBRDgoiIRDEkiIhIFEOCiIhEMSSIiEgUQ4KIiEQxJIiISJQsIZGeng69Xo+QkBCUlpbecxqTyYTly5dj2LBhGD58OLZv3y5HaUREZIUs97iOiopCQkIC4uPjRafJysrCuXPnkJubi+rqasTGxiIyMhI6nU6OEluVvOMGHCu5hH4hXTE0LFC2ZRadvoLQhzrLtswyQ02rud90maEGeYXlTbpPemtaT3I9soRERESEzWlycnIQFxcHtVqNzp07Y9iwYdi5cyemTZsmQ4WtR95xA97bWQIAOHHmKgA4/E37zmUWlF6WZZllhhqs2lqAWyYz3DRqpE4Kd9o30IZaTSYzNHbW2prWk1yTLCEhRXl5OQICAiyP/f39UVFRYfd8rN2r1Rat1rPJY+VSdPrKXY/jhv+qzS0zr7AcJpMZggCYTGacrzIiMsw59yobajULAOystTWtZ0tpDf/PlORs/XGakGgpVVW1MJsFu8e1lhu0hz7U2fJpvuGxo+tWYpk6Xw9oNGrg50/nOl8Pp319mlNra1rPltBa/p8pRYn+qNUqqx+unSYk/P39ceHCBfTp0wfA3XsWdFvDYR45z0k0LEPOcxLBgV5InRTeKo7VN9R6vspo9zmJ1rSe5JqcJiRGjRqF7du3Y8SIEaiursaePXuQmZm
pdFlOaWhYoGwnj+9cZtzwX8n6KSc40KvVvGkGB3ohMkzXpP60pvUk1yPLV2DT0tIwZMgQVFRUIDExEdHR0QCApKQkFBUVAQBiYmKg0+kwYsQIPPfcc3jxxRfRvXt3OcojIiIRKkEQ7D+A78Ta+jkJJbFH1rE/trFH1jnjOQn+xTUREYliSBARkSiGBBERiXKabze1FLVapchYV8EeWcf+2MYeWSd3f2wtr82duCYiopbDw01ERCSKIUFERKIYEkREJIohQUREohgSREQkiiFBRESiGBJERCSKIUFERKIYEkREJKrNXZbDmpkzZ+L8+fNQq9Xw8PDAkiVL8MgjjzSaxmQyIS0tDfv374dKpUJycjLi4uIUqlh+Unq0du1afPjhh+jatSsA4LHHHsPSpUuVKFcxb775JtauXYusrCz06tWr0XM3btzAH/7wB5w4cQIajQYLFy7EU089pVClyrHWo0WLFuHbb7+Fj48PgNs3HZsxY4YSZcpOr9fD3d0d7du3BwDMnz8fgwcPbjSNM21DLhUS6enp8PS8fZPxPXv2YPHixfj8888bTZOVlYVz584hNzcX1dXViI2NRWRkJHS6tn1z+gZSegQAsbGxWLhwodzlOYUTJ07g+PHjorfX3bhxIzp27Ijdu3fj7NmziI+PR25uLjp27Chzpcqx1SMASE5OxuTJk2Wsynm88cYbdwXnnZxpG3Kpw00Nb34AUFtbC5Xq7gtb5eTkIC4uDmq1Gp07d8awYcOwc+dOOctUlJQeubK6ujqsWLECS5cuFe3NV199hYkTJwIAHnzwQfTu3RvffPONnGUqSkqPyDpn2oZcak8CAF5++WUcOHAAgiDg73//+13Pl5eXN/r04+/vj4qKCjlLVJytHgFAdnY28vPzodVqMXv2bISHh8tcpTLWrFmDcePGWb217oULFxAY+Ms9yF1tG5LSIwDYtGkTPvroI3Tv3h3z5s1Dz549ZapQefPnz4cgCOjXrx/mzp2L+++/v9HzzrQNudSeBAC88soryMvLw0svvYSMjAyly3FKtno0ceJE7N27F1lZWZg6dSpmzpyJq1evKlCpvAoKClBUVITnn39e6VKcltQevfTSS9i9ezeysrIwYsQITJs2DSaTSaYqlZWZmYkvv/wSn376KQRBwIoVK5QuySqXC4kGsbGxOHTo0F1vbv7+/rhw4YLlcXl5Obp16yZ3eU5BrEdarRbt2rUDADzxxBPw9/fHjz/+qESJsjpy5AhOnz6NqKgo6PV6VFRUYOrUqcjPz280XUBAAAwGg+WxK21DUnvk5+cHtfr2209sbCyMRqPL7G35+/sDANzd3fH888/jn//8513TONM25DIhcf36dZSXl1se79u3D15eXvD29m403ahRo7B9+3aYzWZcuXIFe/bswciRI+UuVxFSe3Tx4kXLzydPnoTBYECPHj1kq1MpycnJyM/Px759+7Bv3z5069YNGzduxKBBgxpNN2rUKHz00UcAgLNnz6KoqOiub6+0VVJ7dOc2tH//fqjVavj5+cldruyMRiOuXbsGABAEATk5OXd9exBwrm3IZc5J3LhxA3PmzMGNGzegVqvh5eWFDRs2QKVSISkpCSkpKQgNDUVMTAy+//57jBgxAgDw4osv2jy22lZI7dHq1Ys6S+0AAAU1SURBVKtx4sQJqNVqtGvXDhkZGdBqtUqXr6iYmBi888478PPzw9SpU7Fo0SIMHz4carUaK1asQKdOnZQuUXF39mjhwoWoqqqCSqVCp06dsH79eri5tf23o6qqKsyePRsmkwlmsxk9e/a0fH3cWbch3pmOiIhEuczhJiIish9DgoiIRDEkiIhIFEOCiIhEMSSIiEgUQ4KoBSxatAivv/76PZ/77LPPMGnSJJkrus1aXURSMCSI7kGv1+Pbb79Vugy7KBlG1HYxJIiISBRDgtosvV6Pt99+G6NHj0b//v3xhz/8ATdv3rQ8//XXXyMmJgYRERGYOHEiiouLAQC
pqam4cOECpk+fjvDwcPztb38DAKSkpOCJJ55Av379EB8f3+TrVZ06dQqJiYkYMGAARo4ciZycHMtzixYtwvLly5GcnIzw8HDExcXh3Llzlufz8/MxcuRI9OvXD8uWLcPkyZOxfft2nDp1CkuXLsXx48cRHh6OiIgIy5iffvpJdH5EtjAkqE3LysrCxo0bsXv3bpw5cwZvvfUWgNs3xVm8eDFWrFiBQ4cOYcKECZg5cybq6uqwatUqBAQEYMOGDSgoKEBSUhIAYMiQIdi1axcOHjyIRx99FPPnz7e7HqPRiClTpmDMmDH49ttvsXr1aixfvrxR4GRnZ2PWrFk4cuQIgoKCLOcUrly5gpSUFMybNw+HDh1Cjx49UFBQAADo2bMnli9fjrCwMBQUFODo0aM250ckBUOC2rT4+Hj4+/vD29sbM2bMQHZ2NgDg448/xoQJE9C3b19oNBo8/fTTaNeuHY4fPy46r2effRadOnWCu7s7Zs+ejeLiYsvF2qTKy8tDYGAgnnnmGbi5ueHXv/41Ro4ciV27dlmmGT58OPr06QM3NzeMGzcOJ0+eBAB88803ePjhhzFixAi4ubkhISEBXbp0sblMsfkRSdH2r6hFLq3hsszA7csvX7p0CcDtm7rs2LEDH3zwgeX5+vp6y/P/y2Qy4fXXX8fOnTtx5coVy2Wur1692uhufrYYDAYUFhY2OhxkMpkwbtw4y+M73/jvu+8+GI1GAMClS5caXS5apVJJuny02PyIpGBIUJt256XPL1y4gK5duwK4HR7Tp0/HjBkzJM0nKysLe/fuxaZNm6DT6XDt2jX0798f9l4f09/fH/3798emTZvsGgfcvo/HnZfYFgSh0T0YeKtQcgQebqI27cMPP0RFRQWqq6stJ7EBIC4uDtu2bcP3338PQRBgNBqRl5eH2tpaALc/ff/nP/+xzOf69etwd3eHj48Pbty4gdWrVzepnqFDh+Ls2bPYsWMH6uvrUV9fj8LCQpw6dcrm2CeffBIlJSXYs2cPbt26hczMTFRWVlqe9/X1xcWLF1FXV9ek2ojuhSFBbdqYMWMwZcoUDBs2DN27d7fsOYSGhmLlypVYsWIF+vfvjxEjRuCzzz6zjEtOTsb69esRERGBjRs3IjY2FgEBARg8eDCio6MRFhbWpHo6deqEjRs3IicnB4MHD8agQYPw2muvSXpj79y5M9asWYNVq1bh8ccfR1lZGXr37m25S+DAgQMRHByMQYMG4fHHH29SfUT/i/eToDZLr9cjLS0Nv/nNb5QuxSHMZjOGDBmC1157DQMHDlS6HGqjuCdB1Irs378fP/30E+rq6rBhwwYAaPJeDZEUPHFN1IocP34c8+fPR11dHYKDg7Fu3Trcd999SpdFbRgPNxERkSgebiIiIlEMCSIiEsWQICIiUQwJIiISxZAgIiJRDAkiIhL1/2Qbv3QWHoR0AAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Make a scatter plot\n", "_ = plt.plot(versicolor_petal_length, versicolor_petal_width, marker='.', linestyle='none')\n", "\n", "# Label the axes\n", "_ = plt.xlabel('petal length')\n", "_ = plt.ylabel('petal width')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the covariance\n", "The covariance may be computed using the Numpy function ```np.cov()```. For example, we have two sets of data ```x``` and ```y```, ```np.cov(x, y)``` returns a 2D array where entries ```[0,1]``` and ```[1,0]``` are the covariances. Entry ```[0,0]``` is the variance of the data in x, and entry ```[1,1]``` is the variance of the data in y. This 2D output array is called the covariance matrix, since it organizes the self- and covariance." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.22081633 0.07310204]\n", " [0.07310204 0.03910612]]\n", "0.07310204081632653\n" ] } ], "source": [ "# Compute the covariance matrix: covariance_matrix\n", "covariance_matrix = np.cov(versicolor_petal_length, versicolor_petal_width)\n", "\n", "# Print covariance matrix\n", "print(covariance_matrix)\n", "\n", "# Extract covariance of length and width of petals: petal_cov\n", "petal_cov = covariance_matrix[0, 1]\n", "\n", "# Print the length/width covariance\n", "print(petal_cov)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Computing the Pearson correlation coefficient\n", "In this exercise, you will write a function, ```pearson_r(x, y)``` that takes in two arrays and returns the Pearson correlation coefficient. You will then use this function to compute it for the petal lengths and widths of I. versicolor." 
] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.7866680885228169\n" ] } ], "source": [ "def pearson_r(x, y):\n", "    \"\"\"Compute the Pearson correlation coefficient between two arrays.\n", "    \n", "    Args:\n", "        x: array of measurements\n", "        y: array of measurements, same length as x\n", "    \n", "    Returns:\n", "        r: float, the Pearson correlation coefficient\n", "    \"\"\"\n", "    # Compute correlation matrix: corr_mat\n", "    corr_mat = np.corrcoef(x, y)\n", "    \n", "    # Return entry [0, 1]\n", "    return corr_mat[0, 1]\n", "\n", "# Compute Pearson correlation coefficient for I. versicolor: r\n", "r = pearson_r(versicolor_petal_length, versicolor_petal_width)\n", "\n", "# Print the result\n", "print(r)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }