{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lab exercise 1/1. Unsupervised learning / clustering.\n", "\n", "\n", "---\n", "\n", "\n", "\n", "1, Get data\n", "* A, Download the data from the article *\"Hurricane-induced selection on the morphology of an island lizard\"*.\n", " * https://www.nature.com/articles/s41586-018-0352-3\n", "\n", "2, PCA \n", "* A, Perform PCA on meaningful lizard body measurement data. \n", "* B, Plot and interpret the first 3 components using the descriptive labels ('Origin', 'Sex', 'Hurricane').\n", "\n", "3, T-SNE\n", "* A, Perform T-SNE on meaningful lizard body measurement data. \n", "* B, Plot and interpret the emerging clusters using the descriptive labels ('Origin', 'Sex', 'Hurricane'). \n", "* C, Repeat T-SNE 3 times using random seeds (0,1,2) and compare the results visually. \n", "* D, Try T-SNE using 3 components too. Do new clusters emerge that explain other descriptive labels?\n", "\n", "4, K-means\n", "* A, Perform K-means clustering on meaningful lizard body measurement data with 2 clusters. \n", "* B, Interpret the clusters using the descriptive labels ('Origin', 'Sex', 'Hurricane'). \n", "* C, Repeat A and B in the 2D T-SNE embedding space.\n", "* D, Perform K-means clustering on the original data with 3 and 4 clusters. Visually assess the meaning of the clusters in the 2D space of the 1st and 3rd PCA components. What is the relationship between the clusters and the descriptive labels? \n", "\n", "5, Hierarchical clustering\n", "* A, Perform hierarchical clustering on meaningful lizard body measurement data. 
Show the results on a dendrogram.\n", "* B, Interpret the dendrogram using the descriptive labels ('Origin', 'Sex', 'Hurricane').\n", "\n", "\n", "---\n", "*Dezso Ribli*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "# Example solution file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load modules" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "%pylab inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from sklearn.decomposition import PCA\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.cluster import KMeans\n", "import seaborn as sns\n", "# figsize and mpl come from the %pylab namespace loaded above\n", "figsize(8, 8)\n", "mpl.rcParams['font.size'] = 16" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1, Get data\n", "* A, Download the data from the article *\"Hurricane-induced selection on the morphology of an island lizard\"*.\n", " * https://www.nature.com/articles/s41586-018-0352-3\n", "\n", "The data is available here: https://datadryad.org/resource/doi:10.5061/dryad.2t41r64" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "%%bash\n", "# wget https://datadryad.org/resource/doi:10.5061/dryad.2t41r64" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2, PCA \n", "* A, Perform PCA on meaningful lizard body measurement data. 
\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load data and find \"meaningful lizard body measurements\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "data = pd.read_csv('hurricane.csv') # the downloaded file, renamed to hurricane.csv" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check data" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

|   | ID | Hurricane | Origin | Sex | SVL | Femur | Tibia | Metatarsal | LongestToe | Humerus | ... | FingerArea2 | FingerArea3 | ToeArea1 | ToeArea2 | ToeArea3 | MeanFingerArea | MeanToeArea | SumFingers | SumToes | MaxFingerForce |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 537 | After | Pine Cay | Male | 48.69 | 10.39 | 11.87 | 7.52 | 7.43 | 8.66 | ... | 1.338 | 1.339 | 2.529 | 2.402 | 2.369 | 1.332667 | 2.433333 | 2.663 | 4.791 | 0.116 |
| 1 | 539 | After | Pine Cay | Female | 40.31 | 8.66 | 9.79 | 6.18 | 6.20 | 8.01 | ... | 0.950 | 0.972 | 1.498 | 1.525 | 1.530 | 0.961333 | 1.517667 | 2.595 | 3.678 | 0.048 |
| 2 | 540 | After | Pine Cay | Male | 58.30 | 12.87 | 14.76 | 9.45 | 9.58 | 11.72 | ... | 2.702 | 2.685 | 4.157 | 4.140 | 3.996 | 2.631333 | 4.097667 | 7.347 | 4.682 | 0.424 |
| 3 | 541 | After | Pine Cay | Female | 43.15 | 8.55 | 10.29 | 6.60 | 6.26 | 7.43 | ... | 1.175 | 1.186 | 1.898 | 1.871 | 1.867 | 1.177667 | 1.878667 | 2.786 | 5.378 | 0.171 |
| 4 | 542 | After | Pine Cay | Female | 45.51 | 10.26 | 11.02 | 6.89 | 7.02 | 7.71 | ... | 1.357 | 1.420 | 2.627 | 2.435 | 2.529 | 1.384333 | 2.530333 | 3.575 | 6.646 | 0.014 |

5 rows × 26 columns

|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | 163 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ID | 537 | 539 | 540 | 541 | 542 | 543 | 544 | 545 | 546 | 547 | ... | WC61 | WC62 | WC63 | WC64 | WC65 | WC66 | WC69 | WC70 | WC71 | WC72 |
| Hurricane | After | After | After | After | After | After | After | After | After | After | ... | Before | Before | Before | Before | Before | Before | Before | Before | Before | Before |
| Origin | Pine Cay | Pine Cay | Pine Cay | Pine Cay | Pine Cay | Pine Cay | Pine Cay | Pine Cay | Pine Cay | Pine Cay | ... | Water Cay | Water Cay | Water Cay | Water Cay | Water Cay | Water Cay | Water Cay | Water Cay | Water Cay | Water Cay |
| Sex | Male | Female | Male | Female | Female | Female | Male | Male | Female | Male | ... | Male | Male | Male | Female | Female | Female | Female | Female | Female | Female |
| SVL | 48.69 | 40.31 | 58.3 | 43.15 | 45.51 | 46.97 | 52.88 | 57.01 | 43.17 | 54.2 | ... | 55.89 | 55.5 | 55.76 | 41.92 | 41.06 | 43.04 | 42.38 | 45.74 | 40.95 | 40.62 |
| Femur | 10.39 | 8.66 | 12.87 | 8.55 | 10.26 | 10.02 | 12.74 | 11.87 | 9.99 | 11.32 | ... | 12.35 | 13.12 | 12.32 | 9.77 | 9.18 | 9.23 | 9.21 | 9.79 | 9.04 | 8.64 |
| Tibia | 11.87 | 9.79 | 14.76 | 10.29 | 11.02 | 10.78 | 12.43 | 12.91 | 11.13 | 13.07 | ... | 13.73 | 14.26 | 14.32 | 10.09 | 10.13 | 9.96 | 9.8 | 10.08 | 10.08 | 9.77 |
| Metatarsal | 7.52 | 6.18 | 9.45 | 6.6 | 6.89 | 6.85 | 7.9 | 8.24 | 6.88 | 7.77 | ... | 8.47 | 8.83 | 8.97 | 6.2 | 6.57 | 6.29 | 6.68 | 6.61 | 6.26 | 6.14 |
| LongestToe | 7.43 | 6.2 | 9.58 | 6.26 | 7.02 | 7.18 | 8.23 | 8.02 | 6.7 | 7.7 | ... | 8.67 | 8.35 | 8.37 | 5.99 | 6.23 | 5.72 | 6.29 | 6.54 | 5.52 | 6.61 |
| Humerus | 8.66 | 8.01 | 11.72 | 7.43 | 7.71 | 8.45 | 9.88 | 10.31 | 7.78 | 10.19 | ... | 10.33 | 10.89 | 9.94 | 7.78 | 7.42 | 7.12 | 7.38 | 8.05 | 7.27 | 6.91 |
| Radius | 7.99 | 6.51 | 9.54 | 6.6 | 7.25 | 7.15 | 8.4 | 8.79 | 7.11 | 8.7 | ... | 9.02 | 9.38 | 9.11 | 6.69 | 6.3 | 6.73 | 6.62 | 7.23 | 6.66 | 6.38 |
| Metacarpal | 2.22 | 2.38 | 3.54 | 2.79 | 2.52 | 2.39 | 3.15 | 3.18 | 2.82 | 3.05 | ... | 3.28 | 3.5 | 2.87 | 2.6 | 2.26 | 2.5 | 2.17 | 2.4 | 2.24 | 2.52 |
| LongestFinger | 3.19 | 3.55 | 5.09 | 3.55 | 3.37 | 3.26 | 4.3 | 4.2 | 3.36 | 4.12 | ... | 4.47 | 4.36 | 4.3 | 3.2 | 2.6 | 3.07 | 3.28 | 3.41 | 3 | 2.94 |
| FingerCount | 10 | 10 | 14 | 11 | 11 | 12 | 11 | 12 | 12 | 12 | ... | 12 | 12 | 12 | 9 | 10 | 10 | 10 | 11 | 10 | 11 |
| ToeCount | 12 | 13 | 15 | 12 | 13 | 14 | 14 | 12 | 13 | 16 | ... | 14 | 14 | 15 | 11 | 13 | 12 | 12 | 13 | 12 | 13 |
| FingerArea1 | 1.321 | 0.962 | 2.507 | 1.172 | 1.376 | 1.428 | 1.873 | 2.558 | 1.114 | 2.284 | ... | 1.835 | 1.796 | 1.807 | 0.919 | 0.797 | 1.015 | 0.912 | 1.136 | 0.782 | 0.827 |
| FingerArea2 | 1.338 | 0.95 | 2.702 | 1.175 | 1.357 | 1.41 | 1.85 | 2.544 | 1.08 | 2.344 | ... | 1.914 | 1.794 | 1.825 | 0.913 | 0.77 | 1.008 | 0.92 | 1.119 | 0.793 | 0.842 |
| FingerArea3 | 1.339 | 0.972 | 2.685 | 1.186 | 1.42 | 1.44 | 1.85 | 2.574 | 1.04 | 2.27 | ... | 1.901 | 1.766 | 1.764 | 0.931 | 0.782 | 1.031 | 0.894 | 1.146 | 0.777 | 0.845 |
| ToeArea1 | 2.529 | 1.498 | 4.157 | 1.898 | 2.627 | 2.061 | 2.984 | 4.016 | 1.794 | 3.916 | ... | 2.897 | 3.269 | 2.793 | 1.383 | 1.372 | 1.17 | 1.542 | 1.719 | 1.227 | 1.122 |
| ToeArea2 | 2.402 | 1.525 | 4.14 | 1.871 | 2.435 | 2.018 | 2.983 | 3.952 | 1.716 | 3.913 | ... | 2.916 | 3.325 | 2.746 | 1.331 | 1.345 | 1.127 | 1.526 | 1.716 | 1.234 | 1.203 |
| ToeArea3 | 2.369 | 1.53 | 3.996 | 1.867 | 2.529 | 2.029 | 2.958 | 3.968 | 1.805 | 3.88 | ... | 2.869 | 3.258 | 2.748 | 1.365 | 1.342 | 1.148 | 1.527 | 1.703 | 1.252 | 1.116 |
| MeanFingerArea | 1.33267 | 0.961333 | 2.63133 | 1.17767 | 1.38433 | 1.426 | 1.85767 | 2.55867 | 1.078 | 2.29933 | ... | 1.88333 | 1.78533 | 1.79867 | 0.921 | 0.783 | 1.018 | 0.908667 | 1.13367 | 0.784 | 0.838 |
| MeanToeArea | 2.43333 | 1.51767 | 4.09767 | 1.87867 | 2.53033 | 2.036 | 2.975 | 3.97867 | 1.77167 | 3.903 | ... | 2.894 | 3.284 | 2.76233 | 1.35967 | 1.353 | 1.14833 | 1.53167 | 1.71267 | 1.23767 | 1.147 |
| SumFingers | 2.663 | 2.595 | 7.347 | 2.786 | 3.575 | 3.829 | 5.453 | 7.812 | 2.68 | 6.907 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| SumToes | 4.791 | 3.678 | 4.682 | 5.378 | 6.646 | 5.771 | 8.427 | 9.427 | 4.243 | 10.913 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| MaxFingerForce | 0.116 | 0.048 | 0.424 | 0.171 | 0.014 | 0.267 | 0.356 | 0.191 | 0.191 | 0.151 | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |

26 rows × 164 columns
\n", "