{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Correlation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A correlation is the degree of association between two variables. One of my favorite books on this topic is: \n", "\n", "\n", "\n", "and they illustrate it by looking at\n", "## Ladies' expenditures on clothes and makeup\n", "\n", "\n", "\n", "\n", "So let's go ahead and create that data in Pandas and show the table:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
      clothes  makeup
Ms A     7000    3000
Ms B     8000    5000
Ms C    25000   12000
Ms D     5000    2000
Ms E    12000    7000
Ms F    30000   15000
Ms G    10000    5000
Ms H    15000    6000
Ms I    20000    8000
Ms J    18000   10000
\n", "

\n", "\n" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "# TBD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When the data points are close to a straight line going up, we say that there is a positive correlation between the two variables. It's sort of like saying:\n", "\n", " Ann: Is there any relationship between how much a person spends on makeup and how much they spend on clothes?\n", " Ms. Graph: Yes. The more a person spends on makeup, the more she spends on clothes.\n", "\n", "Let's look at a few more examples:\n", "\n", "## Weight and calories in 1- to 3-year-old children\n", "This is real data on the weight of, and the calories consumed by, children between 1 and 3 years of age." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    calories  weight
0        360     7.7
1        400     7.8
2        500     8.6
3        370     8.5
4        525     8.6
5        800     9.0
6        900    10.1
7       1200    11.5
8       1000    11.0
9       1400    10.2
10      1600    11.9
11       850    10.4
12       575     9.3
13       425     9.1
14       950     8.5
15       800    11.0
\n", "
" ], "text/plain": [ " calories weight\n", "0 360 7.7\n", "1 400 7.8\n", "2 500 8.6\n", "3 370 8.5\n", "4 525 8.6\n", "5 800 9.0\n", "6 900 10.1\n", "7 1200 11.5\n", "8 1000 11.0\n", "9 1400 10.2\n", "10 1600 11.9\n", "11 850 10.4\n", "12 575 9.3\n", "13 425 9.1\n", "14 950 8.5\n", "15 800 11.0" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "weight = [7.7, 7.8, 8.6, 8.5, 8.6, 9, 10.1, 11.5, 11, 10.2, 11.9, 10.4, 9.3, 9.1, 8.5, 11]\n", "calories = [360, 400, 500, 370, 525, 800, 900, 1200, 1000, 1400, 1600, 850, 575, 425, 950, 800]\n", "kids = DataFrame({'weight': weight, 'calories': calories})\n", "kids" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "

### Task 2: Draw the scatter plot with appropriate labels

\n", "We are interested in seeing the correlation between weight and calories consumed.\n", "\n" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "

#### What do you think? Is there a correlation?

\n", "\n", "\n", "\n", "\n", "\n", "\n", "## The stronger the correlation, the closer to a straight line\n", "The more closely the data points fall along a straight line, the stronger the correlation. Let's look at students' midterm and final grades.\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    final  midterm
0      72       65
1      68       70
2      75       72
3      76       75
4      79       78
5      73       78
6      79       80
7      85       83
8      89       87
9      87       87
10     92       90
11     93       90
12     94       92
13     91       93
14     97       95
15     98      100
\n", "
" ], "text/plain": [ " final midterm\n", "0 72 65\n", "1 68 70\n", "2 75 72\n", "3 76 75\n", "4 79 78\n", "5 73 78\n", "6 79 80\n", "7 85 83\n", "8 89 87\n", "9 87 87\n", "10 92 90\n", "11 93 90\n", "12 94 92\n", "13 91 93\n", "14 97 95\n", "15 98 100" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midterm = [65, 70, 72, 75, 78, 78, 80, 83, 87, 87, 90, 90, 92, 93, 95, 100]\n", "final = [72, 68, 75, 76, 79, 73, 79, 85, 89, 87, 92, 93, 94, 91, 97, 98]\n", "grades = DataFrame({'midterm': midterm, 'final': final})\n", "grades" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

### Task 3: Can you create that scatter plot?

\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Your code here" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Medicine Hat Tigers\n", "\n", "\n", "\n", "Let's load in some data about players from the Medicine Hat Tigers.\n", "After that we'll create a few new columns:\n", "\n", "* heightcm is the height of the player in centimeters (note that the Height column is the player's height in inches)\n", "* hometownLength is the number of characters in the string representing the hometown. For example, the hometown of Jordan Hickmott (row 11) is Mission BC, which has 10 characters (counting the space), as indicated in hometownLength." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
    Name              Pos.  L/R  Height  Weight  Birthday  Hometown          heightcm  hometownLength
0   Brennan Bosch     C     R        68     173  02/14/88  Martensville, SK    172.72              16
1   Scott Wasden      C     R        73     188  01/04/88  Westbank, BC        185.42              12
2   Colton Grant      LW    L        69     177  03/20/89  Standard, AB        175.26              12
3   Darren Helm       LW    L        72     182  01/21/87  St. Andrews, MB     182.88              15
4   Derek Dorsett     RW    L        71     178  12/20/86  Kindersley, SK      180.34              14
5   Daine Todd        C     R        70     173  01/10/87  Red Deer, AB        177.80              12
6   Tyler Swystun     RW    R        71     185  01/15/88  Cochrane, AB        180.34              12
7   Matt Lowry        C     R        72     186  03/02/88  Neepawa, MB         182.88              11
8   Kevin Undershute  LW    L        72     178  04/12/87  Medicine Hat, AB    182.88              16
9   Jerrid Sauer      RW    R        71     196  09/12/87  Medicine Hat, AB    180.34              16
10  Tyler Ennis       C     L        69     160  10/06/89  Edmonton, AB        175.26              12
11  Jordan Hickmott   C     R        72     183  04/11/90  Mission BC          182.88              10
12  Jakub Rumpel      RW    R        68     166  01/27/87  Hrnciarovce, SLO    172.72              16
13  Bretton Cameron   C     R        71     168  01/26/89  Didsbury, AB        180.34              12
14  Chris Stevens     LW    L        70     197  08/20/86  Dawson Creek, BC    177.80              16
15  Gord Baldwin      D     L        77     205  03/01/87  Winnipeg, MB        195.58              12
16  David Schlemko    D     L        73     195  05/07/87  Edmonton, AB        185.42              12
17  Trever Glass      D     L        72     190  01/22/88  Cochrane, AB        182.88              12
18  Kris Russell      D     L        70     177  05/02/87  Caroline, AB        177.80              12
19  Michael Sauer     D     R        73     205  08/07/87  Sartell, MN         185.42              11
20  Mark Isherwood    D     R        72     183  01/31/89  Abbotsford, BC      182.88              14
21  Shayne Brown      D     L        73     198  02/20/89  Stony Plain, AB     185.42              15
22  Jordan Bendfeld   D     R        75     230  02/09/88  Ledue, AB           190.50               9
23  Ryan Holfeld      G     L        71     166  06/29/89  LeRoy, SK           180.34               9
24  Matt Keetley      G     R        74     189  04/27/86  Medicine Hat, AB    187.96              16
\n", "
" ], "text/plain": [ " Name Pos. L/R Height Weight Birthday Hometown \\\n", "0 Brennan Bosch C R 68 173 02/14/88 Martensville, SK \n", "1 Scott Wasden C R 73 188 01/04/88 Westbank, BC \n", "2 Colton Grant LW L 69 177 03/20/89 Standard, AB \n", "3 Darren Helm LW L 72 182 01/21/87 St. Andrews, MB \n", "4 Derek Dorsett RW L 71 178 12/20/86 Kindersley, SK \n", "5 Daine Todd C R 70 173 01/10/87 Red Deer, AB \n", "6 Tyler Swystun RW R 71 185 01/15/88 Cochrane, AB \n", "7 Matt Lowry C R 72 186 03/02/88 Neepawa, MB \n", "8 Kevin Undershute LW L 72 178 04/12/87 Medicine Hat, AB \n", "9 Jerrid Sauer RW R 71 196 09/12/87 Medicine Hat, AB \n", "10 Tyler Ennis C L 69 160 10/06/89 Edmonton, AB \n", "11 Jordan Hickmott C R 72 183 04/11/90 Mission BC \n", "12 Jakub Rumpel RW R 68 166 01/27/87 Hrnciarovce, SLO \n", "13 Bretton Cameron C R 71 168 01/26/89 Didsbury, AB \n", "14 Chris Stevens LW L 70 197 08/20/86 Dawson Creek, BC \n", "15 Gord Baldwin D L 77 205 03/01/87 Winnipeg, MB \n", "16 David Schlemko D L 73 195 05/07/87 Edmonton, AB \n", "17 Trever Glass D L 72 190 01/22/88 Cochrane, AB \n", "18 Kris Russell D L 70 177 05/02/87 Caroline, AB \n", "19 Michael Sauer D R 73 205 08/07/87 Sartell, MN \n", "20 Mark Isherwood D R 72 183 01/31/89 Abbotsford, BC \n", "21 Shayne Brown D L 73 198 02/20/89 Stony Plain, AB \n", "22 Jordan Bendfeld D R 75 230 02/09/88 Ledue, AB \n", "23 Ryan Holfeld G L 71 166 06/29/89 LeRoy, SK \n", "24 Matt Keetley G R 74 189 04/27/86 Medicine Hat, AB \n", "\n", " heightcm hometownLength \n", "0 172.72 16 \n", "1 185.42 12 \n", "2 175.26 12 \n", "3 182.88 15 \n", "4 180.34 14 \n", "5 177.80 12 \n", "6 180.34 12 \n", "7 182.88 11 \n", "8 182.88 16 \n", "9 180.34 16 \n", "10 175.26 12 \n", "11 182.88 10 \n", "12 172.72 16 \n", "13 180.34 12 \n", "14 177.80 16 \n", "15 195.58 12 \n", "16 185.42 12 \n", "17 182.88 12 \n", "18 177.80 12 \n", "19 185.42 11 \n", "20 182.88 14 \n", "21 185.42 15 \n", "22 190.50 9 \n", "23 180.34 9 \n", "24 187.96 16 " ] }, 
"execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "medicineHat = pd.read_csv('https://raw.githubusercontent.com/zacharski/machine-learning/master/data/medicineHatTigers.csv')\n", "medicineHat['heightcm'] = medicineHat['Height'] * 2.54\n", "medicineHat['hometownLength'] = medicineHat['Hometown'].str.len()\n", "medicineHat\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's explore this data a bit. (Some of these are screwy examples but they illustrate a point)\n", "\n", "
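To make the two derived columns concrete, here is a minimal plain-Python sketch of the same arithmetic, assuming only two rows copied from the table above. The helper names `height_cm` and `hometown_length` are made up for this illustration; the notebook itself does the work with pandas column operations.

```python
# Plain-Python sketch of the two derived columns (no pandas needed).
# height_cm and hometown_length are illustrative helper names, not part of the notebook.
INCHES_TO_CM = 2.54


def height_cm(height_in):
    '''Convert a height in inches to centimeters.'''
    return height_in * INCHES_TO_CM


def hometown_length(hometown):
    '''Count every character in the hometown string, including spaces and commas.'''
    return len(hometown)


# Two rows copied from the player table.
players = [
    {'Name': 'Jordan Hickmott', 'Height': 72, 'Hometown': 'Mission BC'},
    {'Name': 'Matt Keetley', 'Height': 74, 'Hometown': 'Medicine Hat, AB'},
]

for player in players:
    # Round to two decimals so the values line up with the table display.
    player['heightcm'] = round(height_cm(player['Height']), 2)
    player['hometownLength'] = hometown_length(player['Hometown'])
    print(player['Name'], player['heightcm'], player['hometownLength'])
```

Note that the length counts spaces and punctuation, which is why Mission BC comes out as 10 even though it has only 9 letters.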

### Task 4: Can you create a scatter plot showing the correlation between height in inches and height in cm?

\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

### Task 5: A scatter plot showing the correlation between a person's height in inches and the number of characters in the hometown

\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

### Task 6: A scatter plot showing the correlation between a person's height in cm and their weight.

\n", "\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "

### Task 7: Which of these has the best correlation? Which shows no correlation?

\n", "Do you think correlation means causation? For example, do you think that as a person gets taller, that somehow causes the person's hair to be shorter?\n", "\n", "\n", "\n", "\n", "\n", "\n", "## The Effects of Alcohol on the Dexterity Score\n", "#### A strong negative correlation" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYgAAAEWCAYAAAB8LwAVAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAIABJREFUeJzt3XmYXFW57/HvLwPQiBACMZeEIYAYZDgQDIiAioJGUSHX\nKxxRICoH1AeZrgQT9SqOoBFFBQdkioBoxBjicIkYBmcwEDBAiGgYOyEJQ5hOCxne88daRSrl7u7q\n7lTtdNfv8zz9dO1xvbt27f3uvdYeFBGYmZnVGlR2AGZmtnFygjAzs0JOEGZmVsgJwszMCjlBmJlZ\nIScIMzMr5ATRB5LOkXRVk8r6qKRlkp6TtI2kgyXdn7snNiOGnpDUJukXkp6W9NNeTB+SXtnHGB6U\ndHgvp+1z+QOBpNdLWlR2HD0haUxef0M2glh2zNvo4LJj6Q0niC7kFVv5Wyupo6r7/U2MYyjwdeCt\nEbFFRDwBfB64MHfP6sO8e70T7cZ7gJHANhFxdAPmbw1Qmxgj4vcRMbbMmMoi6VBJj/ZlHhHxcN5G\n1+R53izpvzZMhI3nBNGFvGK3iIgtgIeBd1X1u7qJoYwENgPuqeq3U033xmYn4O8RsbrsQMzKsDGc\nwfRZRPivjj/gQeDwmn7nADOAHwLPknbY46uGjwJ+BqwAHgBO62L+mwJfIyWiZcD3gDbgVcDzQADP\nATcC/wTWAh2536bAVsClwFKgHfgiMLhq/icBC3Oc9wL7AVfWzOdsUiK6CngCWAn8FRjZScyvBm7O\n490DHJn7fw54EViV53tiwbQHAH/O0y4FLgQ2qRoewCvz5zbgfOAh4GngD0BbHnZkLntljuXVNevs\nLOBvebqfAJvVfCf/AJ4EZgOjisoviH04cDmwBHgKmNWDeX4EuD/HexGgPOyVwC05zseBn+T+Y/J0\nQ6rmczPwX/nzB4A/At/I81wMHJT7PwIsByZVTXsF6bd1Q/4t3ALslIf9Lpf1fF5v/wkcCjza3Tqv\nmvdFwK/yvG8Fdu3iN9/rdVczn8GkbefxvPynVH9ndLFtAN8FflY1r68Ac4GXkbaLtfm7eI60PQ8C\nppC2wSdI2//wmnV1Imk7/l31+gO+BKwB/pXnd2H+vs6vWZ7ZwJll7/Miwgmi7i+q8wTxL+CI/CM9\nF/hLHjYIuB34DLAJsEv+8U7oZP7fyD+M4cDLgV8A59b88IZ0Fg/wc+D7+Yf9CuA24MN52NF5w9gf\nEGlntFMn8/lwLnvzvEyvAbYsiHcoaUf4ybx8bybtFMZWfTdXdfF9vgY4MG84Y0jJ64yq4dUJ4iLS\nDmR0jukgUlKsJM+35HjOzjFtUrVst5E27OG5jI/kYW8m7VD2y/P6NvC7ovILYv8VaYe1dS73jT2Y\n5y+BYcCOpAOHt+Vh1wCfyr+bzYBDulj3N7N+glgNfDB/N18k7ZwuyjG8Na+XLfL4V+TuN+Th3wT+\n0NlyU5Ug6ljnV5B2mgfk9Xo18ONOvsNer7uCeX0EuA/YIY97E+sniK62jc2Bv+fv8fV5/W1fu+xV\nZZ0O/AXYPn9/3weuqVlXP8xltdWuv+p1l7sPIB1oDMrd2wL/TSc
HZU3f75UdQH/5o/ME8duq7j2A\njvz5tcDDNeNPBS4vmLfyxrJrVb/XAQ/E+j+8wgRBqoJ6gXxUnfsdC9yUP88BTq9nuYAPAX8C/qOb\n7+P1wGOVH3budw1wTtV302mCKJjfGcDPq7qDlMgGkY7k9imY5v8BM6q6B5ES4aFVy3Zc1fCvAt/L\nny8Fvlo1bAvSGc+Y6vILytyOdFS5dcGweuZ5SNXwGcCU/PmHwMXknVPVOEXr/mbWTxD3Vw3bO48/\nsqrfE8C++fMVVO20c4xrgB2Klpv1E0R36/wK4JKqYUcA93Wyvnu97grmdSNVyYOUFCtH7V1uG1Xb\n6pOkM9Rji5a9qt9C4LCa38Mq1h3oBLBLZ+uPmgRRNc+35M8fA35d73bT6L/+X0dWvseqPv83sFmu\ne9wJGCVpZdXwwcDvC+YxgnQkc7ukSj/l8euxE+kobGnV9INIVQyQjqz+Wee8rszj/1jSMFJ106ci\nYlXNeKOARyJibVW/h0hH+d2S9CpSw/t40rIPIZ1x1dqWdERdFP+oXCYAEbFW0iM1MdSun1FV095R\nNe1zkp7I0z7YReg7AE9GxFOdxNPdPGvj2SJ/Phv4AnCbpKdI1Q6XdRFHtWVVnzty2bX9tqjqrvwu\nKjE+mWN/hK7Vs847W76iefV23RXGVRNTRXfbBhFxq6TFpLOLGZ2UUT2/n0uq/g7WkBJRRXffY63p\nwHGkar/jSGd1GwU3UjfOI6QzgGFVfy+PiCMKxn2ctBHvWTXuVpEax+st6wVg26rpt4yIPauG79rJ\ntLFeR8SqiPhcROxBqsp5J3BCwXRLgB0kVf+GdiQdBdbju6Rqgd0iYktStYUKxnucVI1XFP8S0gYL\ngNIeYIc6Y6id9mXANnVM+wgwPCfPDTVPIuKxiDgpIkaRqvm+k68mej6PsnnV6P+ru/l1Y4eqGLcg\nVcssqWO6vq7z2nn1dt3VWkrVMuWYKrrbNpB0Cqm6aAkpUVest21Uze/tNdv1ZhHR3s10XQ27CjhK\n0j6kNp5eX5W4oTlBNM5twLOSPpHvCRgsaS9J+9eOmI/IfgB8Q9IrACSNljShnoIiYinwG+B8SVtK\nGiRpV0lvzKNcApwl6TVKXimpsnEuI7WPkMt9k6S983Xbz5BOn6uPlipuJR3VnS1pqKRDgXcBP64n\nZlI7yzPAc5J2Bz7aybKtBS4Dvi5pVP4eXydpU9LR3jskHZYvBf44aWfwpzrKvwb4oKR987y+DNwa\nEQ92NVH+rv8/aQe+dV72N/RlngCSjpa0fe58irQjWRsRK0g7zePysn+IzpN9vY6QdIikTUhnLX+J\niMpR73q/hxp9XefV+rLuiuZ1mqTtJW1NakQGut828pnsF0lH7sfnZds3T74M2EbSVlVlfQ/4UmX7\nkTRC0lE9iPXfvt+IeJR0MciVpAbzjh7Mr6GcIBok0nXP7wT2JV3B9DhpR71VJ5N8gtRI9xdJzwC/\nBXpy/fkJpIbDe0k7mGtJ9aNExE9JV1D8iNSoOIt01AipYf3TklZKOot0dHotaee9kHSVy5UFy/ci\naefw9rxs3wFOiIj76oz3LOB9OZ4fkBp9uxp3AWkjepJ0pcmgiFhE2rC/nWN4F+lS5Be7Kzwifkuq\nB/8Z6Qh0V+C9dcZ+PClx3ke6SuiMDTDP/YFbJT1Huljh9IhYnIedBEwmtSXsSe92otV+BHyW9F2+\nhvQdVpwDTM+/h2OqJ9oA67x6Xr1edwV+QGpnu4tUxTezZnjhtpGrgq8CvhIRd0XE/aQz2SslbZqX\n6xpgcf4+RpGqf2YDv5H0LKnB+rU9iPWbwHskPSXpW1X9p5Paj/5tWytT5RI7M2sBkq4gNbx+uuxY\nbJ18FnoV6erCjWan7DMIM7MS5Sq200lXgG00yQGcIMzMSiPp1aQbBbcDLig5nH/TsComSZeR6uCX\nR8Reud9wUl3zGNJlf8dULhe
UNJV0B+Ia0h3HcxoSmJmZ1aWRZxBXAG+r6TcFmBsRu5FuZ58CIGkP\nUmPennma76ifPv3QzGygaNiNchHxO0ljanofRbo7EVKr/c2kq3eOIt3d+QLwgKR/sO5ZPZ3adttt\nY8yY2iLMzKwrt99+++MRMaK78Zp9J/XIfF0ypLskK3cfjiZdLlbxKJ3ckSvpZOBkgB133JF58+Y1\nKFQzs4FJ0kPdj1ViI3Vure9xA0hEXBwR4yNi/IgR3SZAMzPrpWYniGWStgPI/5fn/u2sf6v89vTu\nlnszM9tAmp0gZgOT8udJwHVV/d8raVNJOwO7kR5VYWZmJWlYG4Ska0gN0tvm1/Z9FjgPmCHpRNIT\nF48BiIh7JM0g3Qq/GjglP6rCzMxK0sirmI7tZNBhnYz/JdLzgszMbCPQku+DmDW/nWlzFrFkZQej\nhrUxecJYJo6r6zUG/bJcM7PeaLkEMWt+O1NnLqBjVarBal/ZwdSZCwAaurMuq1wzs95quWcxTZuz\n6KWddEXHqjVMm7NoQJZrZtZbLZcglqwsfhdHZ/37e7lmZr3Vcgli1LC2HvXv7+WamfVWyyWIyRPG\n0jZ0/ecAtg0dzOQJPXl5W/8p18yst1qukbrSINzsq4nKKtfMrLf69StHx48fH35Yn5lZz0i6PSLG\ndzdey1UxmZlZfZwgzMyskBOEmZkVcoIwM7NCThBmZlbICcLMzAq13H0QrcpPkjWznnKCaAF+kqyZ\n9YarmFqAnyRrZr3hBNEC/CRZM+sNJ4gW4CfJmllvOEG0AD9J1sx6w43ULcBPkjWz3nCCaBETx412\nQjCzHnEVk5mZFXKCMDOzQk4QZmZWyAnCzMwKOUGYmVkhJwgzMyvkBGFmZoWcIMzMrJAThJmZFXKC\nMDOzQk4QZmZWyAnCzMwKOUGYmVkhJwgzMytUSoKQdKakeyTdLekaSZtJGi7pBkn35/9blxGbmZkl\nTU8QkkYDpwHjI2IvYDDwXmAKMDcidgPm5m4zMytJWVVMQ4A2SUOAzYElwFHA9Dx8OjCxpNjMzIwS\nEkREtANfAx4GlgJPR8RvgJERsTSP9hgwstmxmZnZOmVUMW1NOlvYGRgFvEzScdXjREQA0cn0J0ua\nJ2neihUrGh6vmVmrKqOK6XDggYhYERGrgJnAQcAySdsB5P/LiyaOiIsjYnxEjB8xYkTTgjYzazVl\nJIiHgQMlbS5JwGHAQmA2MCmPMwm4roTYzMwsG9LsAiPiVknXAncAq4H5wMXAFsAMSScCDwHHNDs2\nMzNbp+kJAiAiPgt8tqb3C6SzCTMz2wj4TmozMyvkBGFmZoWcIMzMrJAThJmZFXKCMDOzQk4QZmZW\nyAnCzMwKOUGYmVkhJwgzMyvkBGFmZoWcIMzMrJAThJmZFXKCMDOzQk4QZmZWyAnCzMwKOUGYmVkh\nJwgzMyvkBGFmZoWcIMzMrJAThJmZFao7QUjavJGBmJnZxqXbBCHpIEn3Avfl7n0kfafhkZmZWanq\nOYP4BjABeAIgIu4C3tDIoMzMrHx1VTFFxCM1vdY0IBYzM9uIDKljnEckHQSEpKHA6cDCxoZlZmZl\nq+cM4iPAKcBooB3YN3ebmdkA1uUZhKTBwPER8f4mxWO2Qcya3860OYtYsrKDUcPamDxhLBPHjS47\nLLN+pcsziIhYA7yvSbGYbRCz5rczdeYC2ld2EED7yg6mzlzArPntZYdm1q/UU8X0B0kXSnq9pP0q\nfw2PzKyXps1ZRMeq9a+j6Fi1hmlzFpUUkVn/VE8j9b75/+er+gXw5g0fjlnfLVnZ0aP+Zlas2wQR\nEW9qRiA2MJXRFjBqWBvtBclg1LC2hpZrNtDUcyf1VpK+Lmle/jtf0lbNCM76t7LaAiZPGEvb0MHr\n9WsbOpjJE8Y2tFyzgaaeNojLgGeBY/LfM8DljQzKBoay2gImjhvNue/em9HD2hAwelgb5757b
1/F\nZNZD9bRB7BoR/6eq+3OS7mxUQDZwlNkWMHHcaCcEsz6q5wyiQ9IhlQ5JBwNu7bNudVbn77YAs/6h\nngTxUeAiSQ9KehC4kHR3tVmX3BZg1r/VcxXTncA+krbM3c80PCobECpVPL6j2ax/6jZBSPoy8NWI\nWJm7twY+HhGf7m2hkoYBlwB7ke6p+BCwCPgJMAZ4EDgmIp7qbRm2cXBbgFn/VU8V09sryQEg77SP\n6GO53wSuj4jdgX1IT4edAsyNiN2AubnbzMxKUk+CGCxp00qHpDZg0y7G71K+h+INwKUAEfFiTkBH\nAdPzaNOBib0tw8zM+q6ey1yvBuZKqtz78EHW7ch7Y2dgBXC5pH2A20nvmBgZEUvzOI8BI4smlnQy\ncDLAjjvu2IcwzMysK92eQUTEV4AvAq8Gdge+EBFf7UOZQ4D9gO9GxDjgeWqqkyIiSG0TRfFcHBHj\nI2L8iBEj+hCGmZl1pd5Xjl4PnAv8CXi8j2U+CjwaEbfm7mtJCWOZpO0A8v/lfSzHzMz6oNMEIemX\nkvbKn7cD7iZdbXSlpDN6W2BEPEZ6jWnlYvjDgHuB2cCk3G8ScF1vyzAzs77rqg1i54i4O3/+IHBD\nRJwg6eXAH4EL+lDuqcDVkjYBFuf5DwJmSDoReIj03CczMytJVwliVdXnw4AfAETEs5LW9qXQfPPd\n+IJBh/VlvmZmtuF0lSAekXQqqc1gP+B6eOky16FNiM3MzErUVSP1icCewAeA/6y6We5A/LhvM7MB\nr9MziIhYTsFD+SLiJuCmRgZlZmblq+syVzMzaz1OEGZmVqied1Jv04xAzMxs41LPGcRfJP1U0hGS\n1PCIzMxso1BPgngVcDFwPHC/pC9LelVjwzIzs7LV87C+iIgbIuJY4CTSYzBuk3SLpNc1PEIzMytF\nPW+U2wY4jnQGsYz0mIzZwL7AT0mP7zYzswGmnvdB/Bm4EpgYEY9W9Z8n6XuNCcvMzMpWTxvEpyPi\nC9XJQdLR8NK7IszMbACqJ0EUvRt66oYOxMzMNi6dVjFJejtwBDBa0reqBm0JrG50YGZmVq6u2iCW\nAPOAI0nvja54FjizkUGZ9Wez5rczbc4ilqzsYNSwNiZPGMvEcaPLDsusx7p6WN9dwF2Sro4InzGY\n1WHW/HamzlxAx6o1ALSv7GDqzAUAThLW73T1ytEZ+eN8SX+r/WtSfGb9yrQ5i15KDhUdq9Ywbc6i\nkiIy672uqphOz//f2YxAzAaCJSs7etTfbGPWVRXTUkmDgSsi4k1NjMms3xo1rI32gmQwalhbCdGY\n9U2Xl7lGxBpgraStmhSPWb82ecJY2oYOXq9f29DBTJ4wtqSIzHqvnjupnwMWSLoBeL7SMyJOa1hU\nZv1UpSHaVzHZQFBPgpiZ/8ysDhPHjS4lIfjyWtvQuk0QETFdUhuwY0T4UgyzjZAvr7VGqOeNcu8C\n7gSuz937Sprd6MDMrH5lXl47a347B593IztP+RUHn3cjs+a3N7xMa456nsV0DnAAsBIgIu4Edmlg\nTGbWQ2VdXls5c2lf2UGw7szFSWJgqCdBrIqIp2v6rW1EMGbWO51dRtvoy2t9Y+DAVk+CuEfS+4DB\nknaT9G3gTw2Oy8x6oKzLa31j4MBWT4I4FdgTeAH4EfA06+6yNrONwMRxozn33XszelgbAkYPa+Pc\nd+/d8Abqss5crDnqucz1HRHxKeBTlR75hUE/bVhUZtZjZVxeO3nC2PWungLfGDiQ1HMGUfRyIL8w\nyMxKO3Ox5vALg8ysT8q6MdAazy8MMjOzQvW8MGhuRDxaPUzSWOCpRgdnZmblqacNYq6kYyodkj4O\n/LxxIZmZ2cagnquYDgUuzlcujQQWku6sNjOzAazbM4iIWEp6DtPrgDHA9Ih4rsFxmZlZyep5WN9v\ngdcCewHvAC6Q9LW+FixpsKT5kn6Zu4dLukHS/fn/1n0tw
8zMeq+eNogLI+KEiFgZEQuAg0h3U/fV\n6aTqqoopwNyI2A2Ym7vNzKwk9VQxzZK0k6TDc6+hwAV9KVTS9qSzkUuqeh8FTM+fpwMT+1KGmZn1\nTT1VTCcB1wLfz722B2b1sdwLgLNZ/6mwI3N7B8BjpAbxonhOljRP0rwVK1b0MQwzM+tMPVVMpwAH\nA88ARMT9wCt6W6CkdwLLI+L2zsaJiACik2EXR8T4iBg/YsSI3oZhZmbdqOcy1xci4kVJAEgaQic7\n7zodDBwp6QhgM2BLSVcByyRtFxFLJW0HLO9DGWZm1kf1nEHcIumTQJukt5Ce4vqL3hYYEVMjYvuI\nGAO8F7gxIo4DZgOT8miTgOt6W4aZmfVdPQliCrACWAB8GPg18OkGxHIe8BZJ9wOH524zMytJt1VM\nEbFW0ixgVkRs0FbhiLgZuDl/fgI4bEPO38zMeq/TMwgl50h6HFgELJK0QtJnmheemZmVpasqpjNJ\nDcr7R8TwiBhOuqP6YEl+3LeZ2QDXVYI4Hjg2Ih6o9IiIxcBxwAmNDszMzMrVVYIYGhGP1/bM7RBD\nGxeSmZltDLpKEC/2cpiZmQ0AXV3FtI+kZwr6i3SDm5mZDWBdvXJ0cDMDMTOzjUs9N8qZmVkLcoIw\nM7NCThBmZlbICcLMzAo5QZiZWSEnCDMzK+QEYWZmhZwgzMyskBOEmZkVcoIwM7NCThBmZlbICcLM\nzAo5QZiZWSEnCDMzK+QEYWZmhZwgzMyskBOEmZkVcoIwM7NCThBmZlbICcLMzAo5QZiZWSEnCDMz\nK+QEYWZmhZwgzMyskBOEmZkVcoIwM7NCThBmZlbICcLMzAo5QZiZWSEnCDMzK9T0BCFpB0k3SbpX\n0j2STs/9h0u6QdL9+f/WzY7NzMzWKeMMYjXw8YjYAzgQOEXSHsAUYG5E7AbMzd1mZlaSpieIiFga\nEXfkz88CC4HRwFHA9DzadGBis2MzM7N1Sm2DkDQGGAfcCoyMiKV50GPAyE6mOVnSPEnzVqxY0ZQ4\nzcxa0ZCyCpa0BfAz4IyIeEbSS8MiIiRF0XQRcTFwMcD48eMLxzGz1jBrfjvT5ixiycoORg1rY/KE\nsUwcN7rssAaMUhKEpKGk5HB1RMzMvZdJ2i4ilkraDlheRmxm1j/Mmt/O1JkL6Fi1BoD2lR1MnbkA\nwEliAynjKiYBlwILI+LrVYNmA5Py50nAdc2Ozcz6j2lzFr2UHCo6Vq1h2pxFJUU08JRxBnEwcDyw\nQNKdud8ngfOAGZJOBB4CjikhNjPrJ5as7OhR/w2pVaq2mp4gIuIPgDoZfFgzYzGz/mvUsDbaC5LB\nqGFtDS23laq2fCe1mfVLkyeMpW3o4PX6tQ0dzOQJYxtabitVbZV2FZOZWV9UjtabXdVTZtVWszlB\nmFm/NXHc6KZX65RVtVUGVzGZmfVAWVVbZfAZhJlZD5RVtVUGJwgzsx4qo2qrDK5iMjOzQk4QZmZW\nyAnCzMwKOUGYmVkhJwgzMyvkBGFmZoWcIMzMrJAThJmZFXKCMDOzQk4QZmZWyI/aMDPrR5r5Njsn\nCDOzfqLZb7NzFZOZWT/R7LfZOUGYmfUTzX6bnROEmVk/0dlb6xr1NjsnCDOzfqLZb7NzI7WZWT/R\n7LfZOUGYmfUjzXybnauYzMyskBOEmZkVcoIwM7NCThBmZlbICcLMzAopIsqOodckrQAe6sMstgUe\n30Dh9BettsyttrzgZW4VfVnmnSJiRHcj9esE0VeS5kXE+LLjaKZWW+ZWW17wMreKZiyzq5jMzKyQ\nE4SZmRVq9QRxcdkBlKDVlrnVlhe8zK2i4cvc0m0QZmbWuVY/gzAzs044QZiZWaGWTBCS3iZpkaR/\nSJpSdjyNJmkHSTdJulfSPZJOLzumZpE0WNJ8Sb8sO5ZmkDRM0rWS7pO0UNLryo6pkSSdmX/Td0u6\nRtJmZce0oUm6TNJyS
XdX9Rsu6QZJ9+f/Wzei7JZLEJIGAxcBbwf2AI6VtEe5UTXcauDjEbEHcCBw\nSgssc8XpwMKyg2iibwLXR8TuwD4M4GWXNBo4DRgfEXsBg4H3lhtVQ1wBvK2m3xRgbkTsBszN3Rtc\nyyUI4ADgHxGxOCJeBH4MHFVyTA0VEUsj4o78+VnSTqM5D5QvkaTtgXcAl5QdSzNI2gp4A3ApQES8\nGBEry42q4YYAbZKGAJsDS0qOZ4OLiN8BT9b0PgqYnj9PByY2ouxWTBCjgUequh+lBXaWFZLGAOOA\nW8uNpCkuAM4G1pYdSJPsDKwALs/VapdIelnZQTVKRLQDXwMeBpYCT0fEb8qNqmlGRsTS/PkxYGQj\nCmnFBNGyJG0B/Aw4IyKeKTueRpL0TmB5RNxedixNNATYD/huRIwDnqdBVQ8bg1zvfhQpMY4CXibp\nuHKjar5I9yo05H6FVkwQ7cAOVd3b534DmqShpORwdUTMLDueJjgYOFLSg6RqxDdLuqrckBruUeDR\niKicHV5LShgD1eHAAxGxIiJWATOBg0qOqVmWSdoOIP9f3ohCWjFB/BXYTdLOkjYhNWrNLjmmhpIk\nUr30woj4etnxNENETI2I7SNiDGkd3xgRA/roMiIeAx6RNDb3Ogy4t8SQGu1h4EBJm+ff+GEM4Eb5\nGrOBSfnzJOC6RhQypBEz3ZhFxGpJHwPmkK56uCwi7ik5rEY7GDgeWCDpztzvkxHx6xJjssY4Fbg6\nH/wsBj5YcjwNExG3SroWuIN0pd58BuAjNyRdAxwKbCvpUeCzwHnADEknkl55cExDyvajNszMrEgr\nVjGZmVkdnCDMzKyQE4SZmRVygjAzs0JOEGZmVsgJwvoVSSHp/KrusySds4HmfYWk92yIeXVTztH5\nSas31fQfI6kjPyZjoaTbJH2gi/mMl/StbsoaU/0UULOeaLn7IKzfewF4t6RzI+LxsoOpkDQkIlbX\nOfqJwEkR8YeCYf/Mj8lA0i7ATEmKiMsLypsHzOtT4GZd8BmE9TerSTdDnVk7oPYMQNJz+f+hkm6R\ndJ2kxZLOk/T+fIS+QNKuVbM5XNI8SX/Pz3OqvFNimqS/SvqbpA9Xzff3kmZTcMeypGPz/O+W9JXc\n7zPAIcClkqZ1taARsRj4v6RHWiPpHElXSvojcGUu/5dVwy6TdHNextMK4tkln53sL2nPvPx35mXa\nratYrDX5DML6o4uAv0n6ag+m2Qd4NemxyYuBSyLiAKWXJ50KnJHHG0N6JPyuwE2SXgmcQHpS6P6S\nNgX+KKny1ND9gL0i4oHqwiSNAr4CvAZ4CviNpIkR8XlJbwbOymcA3bkD2L2qew/gkIjokHRozbi7\nA28CXg4skvTdqnjGkp5J9YGIuEvSt4FvRkTlruvBdcRiLcZnENbv5CfR/pB8ZF2nv+b3YrwA/BOo\n7OAXkJJCxYyIWBsR95MSye7AW4ET8mNKbgW2ASpH3LfVJodsf+Dm/CC51cDVpHc19JRqumdHREcn\n4/4qIl7IVW/LWfcI6BGkZ/W8PyLuyv3+DHxS0ieAnbqYp7UwJwjrry4g1eVXv+9gNfk3LWkQsEnV\nsBeqPq+t6l7L+mfStc+eCdJO+tSI2Df/7Vz13oHn+7QU3RvH+g+g66q86mVcw7rlepr0YLtDKgMj\n4kfAkUAf/a7bAAABBElEQVQH8Ot8VmO2HicI65ci4klgBilJVDxIqtKBtPMb2otZHy1pUG6X2AVY\nRHqw40fzI9OR9Ko6XsRzG/BGSdsqveb2WOCWngSSX+70NeDbPVuEf/Mi8L9JZ0Hvy/PeBVgcEd8i\nnV38Rx/LsAHIbRDWn50PfKyq+wfAdZLuAq6nd0f3D5N27lsCH4mIf0m6hFQNdUd+rPQKunnFY0Qs\nlTQFuIl0BvKriKjnkcy7SpoPbAY8C3wrIq7oxXLUxvN8bnS/ITfe7wEcL2kV6Y1kX+5
rGTbw+Gmu\nZmZWyFVMZmZWyAnCzMwKOUGYmVkhJwgzMyvkBGFmZoWcIMzMrJAThJmZFfofUpgTaesaImwAAAAA\nSUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "drinks = list(range(11))\n", "dexterity = [96, 96, 90, 63, 65, 50, 47, 46, 18, 17, 9]\n", "drinking = DataFrame({'drinks': drinks, 'dexterity': dexterity})\n", "\n", "plt.scatter(drinking['drinks'], drinking['dexterity'])\n", "\n", "\n", "\n", "# Add axis labels\n", "plt.xlabel(\"Number of Drinks\")\n", "plt.ylabel(\"Dexterity Score\")\n", "\n", "\n", "# Add title\n", "plt.title(\"The effects of alcohol consumption on dexterity\")\n", "\n", "\n", "# Show plot\n", "plt.show()\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Numeric Representation of the Strength of the Correlation\n", "\n", "\n", "So far we've looked at correlations visually. The more a scatter plot looks like a straight line, the stronger the correlation. If the line goes up from left to right, it is a positive correlation. If it goes down, it is a negative correlation. We can also express the strength of a correlation as a number.\n", "\n", "## Pearson Correlation Coefficient\n", "\n", "The Pearson Correlation Coefficient ranges from -1 to 1.\n", "1 is a perfect positive correlation and -1 is a perfect negative correlation. The formula is as follows:\n", "\n", "$$r=\\frac{\\sum_{i=1}^n(x_i - \\bar{x})(y_i-\\bar{y})}{\\sqrt{\\sum_{i=1}^n(x_i - \\bar{x})^2} \\sqrt{\\sum_{i=1}^n(y_i - \\bar{y})^2}}$$\n", "\n", "Let me say immediately that you don't need to know that formula for this class. Should you dive deeper into Data Science, you will need to interpret and apply formulas like this, but the good news is that these formulas always look a lot harder than they actually are. 
(Take it from me: I had zero real math classes in high school and none in college. I have a Bachelor of Fine Arts degree.)\n", "\n", "Anyway, let's ignore the formula and just remember that a number near 1 means a strong positive correlation, 0 means no linear correlation, and a number near -1 means a strong negative correlation.\n", "\n", "### Japanese ladies' expenditures on makeup and clothes\n", "Let's go back to our first example.\n", "First, here is the data:" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
      clothes  makeup
Ms A     7000    3000
Ms B     8000    5000
Ms C    25000   12000
Ms D     5000    2000
Ms E    12000    7000
Ms F    30000   15000
Ms G    10000    5000
Ms H    15000    6000
Ms I    20000    8000
Ms J    18000   10000
\n", "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
         clothes   makeup
clothes  1.00000  0.96802
makeup   0.96802  1.00000
\n", "
" ], "text/plain": [ " clothes makeup\n", "clothes 1.00000 0.96802\n", "makeup 0.96802 1.00000" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "monthly.corr()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So we have a nice table with 1s on the diagonal. The diagonal states the obvious: clothes expenditures are perfectly correlated with clothes expenditures and makeup expenditures are perfectly correlated with makeup expenditures. \n", "\n", "The other cells are the interesting ones and they show that the correlation between clothes and makeup is very strong at 0.96802.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's look at the correlation between the number of drinks and dexterity: " ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
           dexterity     drinks
dexterity   1.000000  -0.979237
drinks     -0.979237   1.000000
\n", "
" ], "text/plain": [ " dexterity drinks\n", "dexterity 1.000000 -0.979237\n", "drinks -0.979237 1.000000" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinking.corr()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So a coefficient of about -0.979 indicates a very strong negative correlation." ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "
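For the curious, the Pearson coefficient can also be worked out term by term without pandas. The sketch below is only an illustration: `pearson_r` is a name made up here, and the lists are retyped from the makeup and drinking examples earlier in the notebook. It sums the products of the deviations from the means, then divides by the square roots of the summed squared deviations.

```python
from math import sqrt


def pearson_r(xs, ys):
    '''Pearson correlation coefficient, computed term by term.

    Illustrative helper only. It divides by zero if either list is constant.
    '''
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError('need two equal-length sequences with at least 2 values')
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of the products of the paired deviations from the means.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator: square roots of the summed squared deviations.
    x_spread = sqrt(sum((x - x_bar) ** 2 for x in xs))
    y_spread = sqrt(sum((y - y_bar) ** 2 for y in ys))
    return numerator / (x_spread * y_spread)


# Data retyped from the earlier examples in this notebook.
drinks = list(range(11))
dexterity = [96, 96, 90, 63, 65, 50, 47, 46, 18, 17, 9]
clothes = [7000, 8000, 25000, 5000, 12000, 30000, 10000, 15000, 20000, 18000]
makeup = [3000, 5000, 12000, 2000, 7000, 15000, 5000, 6000, 8000, 10000]

r_drinking = pearson_r(drinks, dexterity)
r_monthly = pearson_r(clothes, makeup)
print(round(r_drinking, 6), round(r_monthly, 5))
```

The two printed values should agree with the off-diagonal cells of the correlation tables above, roughly -0.979 for drinks and dexterity and roughly 0.968 for clothes and makeup.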

## Task 8: School Music Program: a case study

\n", "\n", "We are interested in seeing whether a relationship exists between music performance\n", "grades and students' individual Iowa Test of Basic Skills (ITBS) scores. Let's load in the data" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
         Band  Math  Language  Science
Student
1           3   220       215      220
2           4   240       210      225
3           3   210       250      235
4           2   215       230      210
5           4   260       240      220
6           3   230       270      250
7           4   240       240      220
8           4   259       220      240
9           3   245       230      250
10          4   280       270      230
11          5   300       260      220
12          2   230       250      225
13          3   250       245      235
14          4   275       235      210
15          3   200       260      220
16          2   200       255      250
17          4   290       250      220
18          3   250       230      240
19          4   270       245      250
20          4   260       270      230
21          3   260       270      260
22          4   270       250      230
23          5   310       260      260
24          2   170       250      220
\n", "
" ], "text/plain": [ " Band Math Language Science\n", "Student \n", "1 3 220 215 220\n", "2 4 240 210 225\n", "3 3 210 250 235\n", "4 2 215 230 210\n", "5 4 260 240 220\n", "6 3 230 270 250\n", "7 4 240 240 220\n", "8 4 259 220 240\n", "9 3 245 230 250\n", "10 4 280 270 230\n", "11 5 300 260 220\n", "12 2 230 250 225\n", "13 3 250 245 235\n", "14 4 275 235 210\n", "15 3 200 260 220\n", "16 2 200 255 250\n", "17 4 290 250 220\n", "18 3 250 230 240\n", "19 4 270 245 250\n", "20 4 260 270 230\n", "21 3 260 270 260\n", "22 4 270 250 230\n", "23 5 310 260 260\n", "24 2 170 250 220" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "musicFile = \"https://raw.githubusercontent.com/zacharski/machine-learning/master/data/musicITBS.csv\"\n", "music = pd.read_csv(musicFile)\n", "music.set_index('Student', inplace=True)\n", "\n", "music" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to know which test score (math, language, or science) the music performance grade is most highly correlated with. Write code that computes the correlation table. Also, for the most highly correlated pair, please draw a scatter plot. For example, if science is most highly correlated with band, draw that scatter plot.\n" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "# Your code here\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using a scatter plot, graph the strongest association.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

### Task 9: What do you think?

\n", "\n", "1. Is knowing a person's music performance grade a good predictor for that person's score in math? language? science?\n", "\n", "2. Does this study offer support for an argument not to eliminate the music program at our high school because it improves math scores?\n", "\n", "3. Does this study offer support for an argument that eliminating the music program will have no effect on language scores?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 1 }