{ "cells": [ { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import seaborn as sns\n", "import matplotlib.pyplot as plt\n", "from scipy import spatial\n", "from sklearn import decomposition, cluster\n", "sns.set(rc={'figure.figsize' : (10,10)})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. In the chapter, we mentioned the use of correlation-based distance and Euclidean distance as dissimilarity measures for hierarchical clustering. It turns out that these two measures are almost equivalent: if each observation has been centered to have mean zero and standard deviation one, and if we let `rij` denote the correlation between the `ith` and `jth` observations, then the quantity `1 − rij` is proportional to the squared Euclidean distance between the `ith` and `jth` observations. On the USArrests data, show that this proportionality holds." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|   | Murder | Assault | UrbanPop | Rape |
|---|---|---|---|---|
| 0 | 1.242564 | 0.782839 | -0.520907 | -0.003416 |
| 1 | 0.507862 | 1.106823 | -1.211764 | 2.484203 |
| 2 | 0.071633 | 1.478803 | 0.998980 | 1.042878 |
| 3 | 0.232349 | 0.230868 | -1.073593 | -0.184917 |
| 4 | 0.278268 | 1.262814 | 1.758923 | 2.067820 |
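The proportionality claimed in the exercise can be checked numerically. The following is a minimal sketch, not the notebook's own solution: it assumes only the five rows displayed above as sample input (not the full USArrests data), standardizes each observation (row) across its four features using the population standard deviation, and compares `1 - r_ij` with the squared Euclidean distance. The names `X`, `Z`, and `ratio` are illustrative. Under these assumptions each standardized row has squared norm `p = 4`, so the squared distance should equal `2p(1 - r_ij)`.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Sample input: the five rows shown in the table above (assumption: used
# only for illustration; the exercise itself uses all 50 states).
X = np.array([
    [1.242564, 0.782839, -0.520907, -0.003416],
    [0.507862, 1.106823, -1.211764, 2.484203],
    [0.071633, 1.478803, 0.998980, 1.042878],
    [0.232349, 0.230868, -1.073593, -0.184917],
    [0.278268, 1.262814, 1.758923, 2.067820],
])

# Standardize each observation (row) to mean 0 and population sd 1.
Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

# Pairwise squared Euclidean distances and correlation distances (1 - r_ij).
sq_euclid = pdist(Z, metric="sqeuclidean")
one_minus_r = pdist(Z, metric="correlation")

# If the two measures are proportional, this ratio is constant across pairs.
ratio = sq_euclid / one_minus_r
print(ratio)
print(np.allclose(ratio, 2 * X.shape[1]))
```

With the sample standard deviation (`ddof=1`) instead, the same argument gives a constant of `2(p - 1)`; the proportionality itself does not depend on that choice.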
|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.961933 | 0.441803 | -0.975005 | 1.417504 | 0.818815 | 0.316294 | -0.024967 | -0.063966 | 0.031497 | -0.350311 | ... | -0.509591 | -0.216726 | -0.055506 | -0.484449 | -0.521581 | 1.949135 | 1.324335 | 0.468147 | 1.061100 | 1.655970 |
| 1 | -0.292526 | -1.139267 | 0.195837 | -1.281121 | -0.251439 | 2.511997 | -0.922206 | 0.059543 | -1.409645 | -0.656712 | ... | 1.700708 | 0.007290 | 0.099062 | 0.563853 | -0.257275 | -0.581781 | -0.169887 | -0.542304 | 0.312939 | -1.284377 |
| 2 | 0.258788 | -0.972845 | 0.588486 | -0.800258 | -1.820398 | -2.058924 | -0.064764 | 1.592124 | -0.173117 | -0.121087 | ... | -0.615472 | 0.009999 | 0.945810 | -0.318521 | -0.117889 | 0.621366 | -0.070764 | 0.401682 | -0.016227 | -0.526553 |
| 3 | -1.152132 | -2.213168 | -0.861525 | 0.630925 | 0.951772 | -1.165724 | -0.391559 | 1.063619 | -0.350009 | -1.489058 | ... | -0.284277 | 0.198946 | -0.091833 | 0.349628 | -0.298910 | 1.513696 | 0.671185 | 0.010855 | -1.043689 | 1.625275 |
| 4 | 0.195783 | 0.593306 | 0.282992 | 0.247147 | 1.978668 | -0.871018 | -0.989715 | -1.032253 | -1.109654 | -0.385142 | ... | -0.692998 | -0.845707 | -0.177497 | -0.166491 | 1.483155 | -1.687946 | -0.141430 | 0.200778 | -0.675942 | 2.220611 |
5 rows × 40 columns
\n", "