{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Way number eight of looking at the correlation coefficient\n", "\n", "This is a notebook to accompany the blog post [\"Way number eight of looking at the correlation coefficient\"](http://composition.al/blog/2019/01/31/way-number-eight-of-looking-at-the-correlation-coefficient/). Read the post for additional context!" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from datascience import *\n", "from datetime import *\n", "import matplotlib\n", "%matplotlib inline\n", "import matplotlib.pyplot as plots\n", "from mpl_toolkits.mplot3d import Axes3D\n", "import pandas as pd\n", "import math" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Recap from last time\n", "\n", "As [before](http://composition.al/blog/2018/08/31/understanding-the-regression-line-with-standard-units/), we're using the [datascience](http://data8.org/datascience/) package, and everything else we're using is pretty standard.\n", "\n", "And, as before, here's the data we'll be working with, [converted to standard units](https://www.inferentialthinking.com/chapters/14/2/Variability#standard-units) and plotted:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Date Height (standard units) Weight (standard units)
07/28/2017 -1.26135 -1.3158
08/07/2017 -1.08691 -1.13054
08/25/2017 -0.912464 -0.808628
09/25/2017 -0.228116 -0.399485
11/28/2017 0.107349 0.254728
01/26/2018 0.617255 0.728253
04/27/2018 1.12716 1.2537
07/30/2018 1.63707 1.41777
" ], "text/plain": [ "Date | Height (standard units) | Weight (standard units)\n", "07/28/2017 | -1.26135 | -1.3158\n", "08/07/2017 | -1.08691 | -1.13054\n", "08/25/2017 | -0.912464 | -0.808628\n", "09/25/2017 | -0.228116 | -0.399485\n", "11/28/2017 | 0.107349 | 0.254728\n", "01/26/2018 | 0.617255 | 0.728253\n", "04/27/2018 | 1.12716 | 1.2537\n", "07/30/2018 | 1.63707 | 1.41777" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "heightweight = Table().with_columns([\n", " 'Date', ['07/28/2017', '08/07/2017', '08/25/2017', '09/25/2017', '11/28/2017', '01/26/2018', '04/27/2018', '07/30/2018'],\n", " 'Height (cm)', [ 53.3, 54.6, 55.9, 61, 63.5, 67.3, 71.1, 74.9],\n", " 'Weight (kg)', [ 4.204, 4.65, 5.425, 6.41, 7.985, 9.125, 10.39, 10.785],\n", " ])\n", "def standard_units(nums):\n", " return (nums - np.mean(nums)) / np.std(nums)\n", "\n", "heightweight_standard = Table().with_columns(\n", " 'Date', heightweight.column('Date'),\n", " 'Height (standard units)', standard_units(heightweight.column('Height (cm)')),\n", " 'Weight (standard units)', standard_units(heightweight.column('Weight (kg)')))\n", "heightweight_standard" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "heightweight_standard.scatter(\n", " 'Height (standard units)',\n", " 'Weight (standard units)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing the data in \"person space\"\n", "\n", "So far, this is all a recap of [last time](http://composition.al/blog/2018/08/31/understanding-the-regression-line-with-standard-units/). Now, let's try turning our data sideways.\n", "\n", "The hacky way I have of doing this is to convert the data first to a numpy [ndarray](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.html), then to a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), and then [transposing](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.T.html#pandas.DataFrame.T) the DataFrame. This is kind of silly, but I don't know a better way to transpose a [structured ndarray](https://docs.scipy.org/doc/numpy/user/basics.rec.html). If you do, let me know." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0 1 2 3 4 5 6 7
Date 07/28/2017 08/07/2017 08/25/2017 09/25/2017 11/28/2017 01/26/2018 04/27/2018 07/30/2018
Height (standard units) -1.26135 -1.08691 -0.912464 -0.228116 0.107349 0.617255 1.12716 1.63707
Weight (standard units) -1.3158 -1.13054 -0.808628 -0.399485 0.254728 0.728253 1.2537 1.41777
\n", "
" ], "text/plain": [ " 0 1 2 3 \\\n", "Date 07/28/2017 08/07/2017 08/25/2017 09/25/2017 \n", "Height (standard units) -1.26135 -1.08691 -0.912464 -0.228116 \n", "Weight (standard units) -1.3158 -1.13054 -0.808628 -0.399485 \n", "\n", " 4 5 6 7 \n", "Date 11/28/2017 01/26/2018 04/27/2018 07/30/2018 \n", "Height (standard units) 0.107349 0.617255 1.12716 1.63707 \n", "Weight (standard units) 0.254728 0.728253 1.2537 1.41777 " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# First convert to a plain old numpy ndarray.\n", "heightweight_standard_np = heightweight_standard.to_array()\n", "\n", "# Now convert *that* to a pandas DataFrame.\n", "df = pd.DataFrame(heightweight_standard_np)\n", "\n", "# Get the transpose of the DataFrame.\n", "df = df.T\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "pandas defaults to using `RangeIndex (0, 1, 2, …, n)` for the column labels, but we want the dates from the first row to be the column headers rather than being an actual row. That's [an easy change to make](https://stackoverflow.com/questions/26147180/convert-row-to-column-header-for-pandas-dataframe), though." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Date 07/28/2017 08/07/2017 08/25/2017 09/25/2017 11/28/2017 01/26/2018 04/27/2018 07/30/2018
Height (standard units) -1.26135 -1.08691 -0.912464 -0.228116 0.107349 0.617255 1.12716 1.63707
Weight (standard units) -1.3158 -1.13054 -0.808628 -0.399485 0.254728 0.728253 1.2537 1.41777
\n", "
" ], "text/plain": [ "Date 07/28/2017 08/07/2017 08/25/2017 09/25/2017 \\\n", "Height (standard units) -1.26135 -1.08691 -0.912464 -0.228116 \n", "Weight (standard units) -1.3158 -1.13054 -0.808628 -0.399485 \n", "\n", "Date 11/28/2017 01/26/2018 04/27/2018 07/30/2018 \n", "Height (standard units) 0.107349 0.617255 1.12716 1.63707 \n", "Weight (standard units) 0.254728 0.728253 1.2537 1.41777 " ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns = df.iloc[0]\n", "df = df.drop(\"Date\")\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While we're at it, we'll convert the values in our DataFrame to numeric values, so that we can visualize them in a moment." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Date 07/28/2017 08/07/2017 08/25/2017 09/25/2017 11/28/2017 01/26/2018 04/27/2018 07/30/2018
Height (standard units) -1.261347 -1.086906 -0.912464 -0.228116 0.107349 0.617255 1.127161 1.637068
Weight (standard units) -1.315798 -1.130542 -0.808628 -0.399485 0.254728 0.728253 1.253700 1.417773
\n", "
" ], "text/plain": [ "Date 07/28/2017 08/07/2017 08/25/2017 09/25/2017 \\\n", "Height (standard units) -1.261347 -1.086906 -0.912464 -0.228116 \n", "Weight (standard units) -1.315798 -1.130542 -0.808628 -0.399485 \n", "\n", "Date 11/28/2017 01/26/2018 04/27/2018 07/30/2018 \n", "Height (standard units) 0.107349 0.617255 1.127161 1.637068 \n", "Weight (standard units) 0.254728 0.728253 1.253700 1.417773 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.apply(pd.to_numeric)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Eight dimensions are too many to try to visualize, but we can pare it down to three. We'll pick three -- the first (07/28/2017), the last (07/30/2018), and one in the middle (01/26/2018) -- and drop the rest." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Date 07/28/2017 01/26/2018 07/30/2018
Height (standard units) -1.261347 0.617255 1.637068
Weight (standard units) -1.315798 0.728253 1.417773
\n", "
" ], "text/plain": [ "Date 07/28/2017 01/26/2018 07/30/2018\n", "Height (standard units) -1.261347 0.617255 1.637068\n", "Weight (standard units) -1.315798 0.728253 1.417773" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_3dim = df.drop(df.columns[[1, 2, 3, 4, 6]],axis=1)\n", "df_3dim" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can visualize the data with a three-dimensional scatter plot." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/javascript": [ "/* Put everything inside the global mpl namespace */\n", "window.mpl = {};\n", "\n", "\n", "mpl.get_websocket_type = function() {\n", " if (typeof(WebSocket) !== 'undefined') {\n", " return WebSocket;\n", " } else if (typeof(MozWebSocket) !== 'undefined') {\n", " return MozWebSocket;\n", " } else {\n", " alert('Your browser does not have WebSocket support.' +\n", " 'Please try Chrome, Safari or Firefox ≥ 6. ' +\n", " 'Firefox 4 and 5 are also supported but you ' +\n", " 'have to enable WebSockets in about:config.');\n", " };\n", "}\n", "\n", "mpl.figure = function(figure_id, websocket, ondownload, parent_element) {\n", " this.id = figure_id;\n", "\n", " this.ws = websocket;\n", "\n", " this.supports_binary = (this.ws.binaryType != undefined);\n", "\n", " if (!this.supports_binary) {\n", " var warnings = document.getElementById(\"mpl-warnings\");\n", " if (warnings) {\n", " warnings.style.display = 'block';\n", " warnings.textContent = (\n", " \"This browser does not support binary websocket messages. \" +\n", " \"Performance may be slow.\");\n", " }\n", " }\n", "\n", " this.imageObj = new Image();\n", "\n", " this.context = undefined;\n", " this.message = undefined;\n", " this.canvas = undefined;\n", " this.rubberband_canvas = undefined;\n", " this.rubberband_context = undefined;\n", " this.format_dropdown = undefined;\n", "\n", " this.image_mode = 'full';\n", "\n", " this.root = $('
');\n", " this._root_extra_style(this.root)\n", " this.root.attr('style', 'display: inline-block');\n", "\n", " $(parent_element).append(this.root);\n", "\n", " this._init_header(this);\n", " this._init_canvas(this);\n", " this._init_toolbar(this);\n", "\n", " var fig = this;\n", "\n", " this.waiting = false;\n", "\n", " this.ws.onopen = function () {\n", " fig.send_message(\"supports_binary\", {value: fig.supports_binary});\n", " fig.send_message(\"send_image_mode\", {});\n", " if (mpl.ratio != 1) {\n", " fig.send_message(\"set_dpi_ratio\", {'dpi_ratio': mpl.ratio});\n", " }\n", " fig.send_message(\"refresh\", {});\n", " }\n", "\n", " this.imageObj.onload = function() {\n", " if (fig.image_mode == 'full') {\n", " // Full images could contain transparency (where diff images\n", " // almost always do), so we need to clear the canvas so that\n", " // there is no ghosting.\n", " fig.context.clearRect(0, 0, fig.canvas.width, fig.canvas.height);\n", " }\n", " fig.context.drawImage(fig.imageObj, 0, 0);\n", " };\n", "\n", " this.imageObj.onunload = function() {\n", " fig.ws.close();\n", " }\n", "\n", " this.ws.onmessage = this._make_on_message_function(this);\n", "\n", " this.ondownload = ondownload;\n", "}\n", "\n", "mpl.figure.prototype._init_header = function() {\n", " var titlebar = $(\n", " '
');\n", " var titletext = $(\n", " '
');\n", " titlebar.append(titletext)\n", " this.root.append(titlebar);\n", " this.header = titletext[0];\n", "}\n", "\n", "\n", "\n", "mpl.figure.prototype._canvas_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "\n", "mpl.figure.prototype._root_extra_style = function(canvas_div) {\n", "\n", "}\n", "\n", "mpl.figure.prototype._init_canvas = function() {\n", " var fig = this;\n", "\n", " var canvas_div = $('
');\n", "\n", " canvas_div.attr('style', 'position: relative; clear: both; outline: 0');\n", "\n", " function canvas_keyboard_event(event) {\n", " return fig.key_event(event, event['data']);\n", " }\n", "\n", " canvas_div.keydown('key_press', canvas_keyboard_event);\n", " canvas_div.keyup('key_release', canvas_keyboard_event);\n", " this.canvas_div = canvas_div\n", " this._canvas_extra_style(canvas_div)\n", " this.root.append(canvas_div);\n", "\n", " var canvas = $('');\n", " canvas.addClass('mpl-canvas');\n", " canvas.attr('style', \"left: 0; top: 0; z-index: 0; outline: 0\")\n", "\n", " this.canvas = canvas[0];\n", " this.context = canvas[0].getContext(\"2d\");\n", "\n", " var backingStore = this.context.backingStorePixelRatio ||\n", "\tthis.context.webkitBackingStorePixelRatio ||\n", "\tthis.context.mozBackingStorePixelRatio ||\n", "\tthis.context.msBackingStorePixelRatio ||\n", "\tthis.context.oBackingStorePixelRatio ||\n", "\tthis.context.backingStorePixelRatio || 1;\n", "\n", " mpl.ratio = (window.devicePixelRatio || 1) / backingStore;\n", "\n", " var rubberband = $('');\n", " rubberband.attr('style', \"position: absolute; left: 0; top: 0; z-index: 1;\")\n", "\n", " var pass_mouse_events = true;\n", "\n", " canvas_div.resizable({\n", " start: function(event, ui) {\n", " pass_mouse_events = false;\n", " },\n", " resize: function(event, ui) {\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " stop: function(event, ui) {\n", " pass_mouse_events = true;\n", " fig.request_resize(ui.size.width, ui.size.height);\n", " },\n", " });\n", "\n", " function mouse_event_fn(event) {\n", " if (pass_mouse_events)\n", " return fig.mouse_event(event, event['data']);\n", " }\n", "\n", " rubberband.mousedown('button_press', mouse_event_fn);\n", " rubberband.mouseup('button_release', mouse_event_fn);\n", " // Throttle sequential mouse events to 1 every 20ms.\n", " rubberband.mousemove('motion_notify', mouse_event_fn);\n", "\n", " 
rubberband.mouseenter('figure_enter', mouse_event_fn);\n", " rubberband.mouseleave('figure_leave', mouse_event_fn);\n", "\n", " canvas_div.on(\"wheel\", function (event) {\n", " event = event.originalEvent;\n", " event['data'] = 'scroll'\n", " if (event.deltaY < 0) {\n", " event.step = 1;\n", " } else {\n", " event.step = -1;\n", " }\n", " mouse_event_fn(event);\n", " });\n", "\n", " canvas_div.append(canvas);\n", " canvas_div.append(rubberband);\n", "\n", " this.rubberband = rubberband;\n", " this.rubberband_canvas = rubberband[0];\n", " this.rubberband_context = rubberband[0].getContext(\"2d\");\n", " this.rubberband_context.strokeStyle = \"#000000\";\n", "\n", " this._resize_canvas = function(width, height) {\n", " // Keep the size of the canvas, canvas container, and rubber band\n", " // canvas in synch.\n", " canvas_div.css('width', width)\n", " canvas_div.css('height', height)\n", "\n", " canvas.attr('width', width * mpl.ratio);\n", " canvas.attr('height', height * mpl.ratio);\n", " canvas.attr('style', 'width: ' + width + 'px; height: ' + height + 'px;');\n", "\n", " rubberband.attr('width', width);\n", " rubberband.attr('height', height);\n", " }\n", "\n", " // Set the figure to an initial 600x600px, this will subsequently be updated\n", " // upon first draw.\n", " this._resize_canvas(600, 600);\n", "\n", " // Disable right mouse context menu.\n", " $(this.rubberband_canvas).bind(\"contextmenu\",function(e){\n", " return false;\n", " });\n", "\n", " function set_focus () {\n", " canvas.focus();\n", " canvas_div.focus();\n", " }\n", "\n", " window.setTimeout(set_focus, 100);\n", "}\n", "\n", "mpl.figure.prototype._init_toolbar = function() {\n", " var fig = this;\n", "\n", " var nav_element = $('
')\n", " nav_element.attr('style', 'width: 100%');\n", " this.root.append(nav_element);\n", "\n", " // Define a callback function for later on.\n", " function toolbar_event(event) {\n", " return fig.toolbar_button_onclick(event['data']);\n", " }\n", " function toolbar_mouse_event(event) {\n", " return fig.toolbar_button_onmouseover(event['data']);\n", " }\n", "\n", " for(var toolbar_ind in mpl.toolbar_items) {\n", " var name = mpl.toolbar_items[toolbar_ind][0];\n", " var tooltip = mpl.toolbar_items[toolbar_ind][1];\n", " var image = mpl.toolbar_items[toolbar_ind][2];\n", " var method_name = mpl.toolbar_items[toolbar_ind][3];\n", "\n", " if (!name) {\n", " // put a spacer in here.\n", " continue;\n", " }\n", " var button = $('