{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import numpy.random as random\n", "from IPython.display import Image\n", "# seaborn is under active development and throws some scary looking warnings\n", "import warnings \n", "# this will allow us to use the code in peace :) \n", "warnings.filterwarnings(\"ignore\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "#### Lecture 17:\n", "\n", "- Learn how to use the **seaborn** package to produce beautiful plots\n", "\n", "- Learn about kernel density estimates\n", "\n", "- Learn appropriate ways of representing different types of data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### seaborn\n", "\n", "In this lecture, we will learn about **seaborn**. **seaborn** is a package with many tools for data visualization. It allows you to make pretty plots. Almost anything in **seaborn** can be done using **matplotlib**, but with **seaborn**'s built-in functions you can reduce a lot of **matplotlib** code down to a single line. **seaborn** isn't just a pretty face. Its real power is in statistical data analysis. It has a lot of functions built in for visualizing the distribution of your data, for example. \n", "\n", "Let's take a look at some of the plots we can make with this package. 
We can import it using:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import seaborn as sns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Unusual distributions, kernel density estimates and jointplots\n", "\n", "In some cases, our data don't follow a simple (e.g., normal) distribution. For example, the data could be bimodal or skewed (remember the two-humped histogram of the elevation data from around the world).\n", "\n", "Let's create some synthetic bimodal data by drawing from two separate distributions and combining the draws. We draw twice from **random.normal( )** for two normal distributions ($x_1,x_2$) and twice from **random.lognormal( )** for two lognormal distributions ($y_1,y_2$), then append each pair to make one bimodal x data set and one bimodal y data set.\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "xdata1=random.normal(20,25,5000) # first x draw: mean 20, standard deviation 25, 5000 samples\n", "xdata2=random.normal(100,25,5000) # second x draw: mean 100, standard deviation 25, 5000 samples\n", "ydata1=random.lognormal(2,0.1,8000) # first y draw: underlying normal has mean 2, sigma 0.1; 8000 samples\n", "ydata2=random.lognormal(3,0.1,2000) # second y draw: underlying normal has mean 3, sigma 0.1; 2000 samples\n", "xdata=np.append(xdata1,xdata2) # combine the two x data sets\n", "ydata=np.append(ydata1,ydata2) # combine the two y data sets" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When we plot our xdata as a histogram, we can see that we have a broadly bimodal distribution. For fun, let's also plot the mean of the distribution as a red line."
] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAADe5JREFUeJzt3W+M5dVdx/H3R1rQqCl/diBkd+MSuw+KD6R0AyQkBqEqfxoXk5LQGN2QTdYHkNRUYxefVBMfLCZKQ1SSVQiL0VKsNmzopoq0pDERyqxFCiWEFVcY2bDb8Ecb0hrq1wf3jJ3uzjD37tw7d/bc9yu5ub/f+Z25c87evZ85c37n95tUFZKkfv3ItBsgSZosg16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUufdNuwEAmzZtqm3btk27GdLoDh/+wfZHPjK9dmgmHT58+NtVNbdavQ0R9Nu2bWN+fn7azZBGl/xg2//DWmdJ/mOYek7dSFLnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS5zbElbGSJmfb3i8tW350303r3BJNiyN6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUue8YEobihf3SOPniF6SOmfQS1LnDHpJ6pxz9FInVjq/ITmil6TOGfSS1DmDXpI6Z9BLUuc8GSvNKC9Omx2O6CWpc0MHfZKzknwjyaNt/5IkTyV5Kcnnk5zdys9p+0fa8W2TabokaRijjOg/CbywZP8u4O6q2g68Cexu5buBN6vqg8DdrZ4kaUqGCvokW4CbgL9o+wGuBb7QqhwAbm7bO9s+7fh1rb4kaQqGHdF/Fvgd4H/b/gXAW1X1bttfADa37c3AqwDt+NutviRpClYN+iQfA45X1eGlxctUrSGOLX3dPUnmk8yfOHFiqMZKkkY3zIj+auCXkxwFHmIwZfNZ4Nwki8sztwCvte0FYCtAO/4B4I2TX7Sq9lfVjqraMTc3t6ZOSJJWtuo6+qq6E7gTIMk1wG9X1a8m+Rvg4wzCfxfwSPuSg23/n9vxr1TVKSN6aRSu+ZZO31rW0X8a+FSSIwzm4O9r5fcBF7TyTwF719ZESdJajHRlbFU9ATzRtl8GrlimzneBW8bQNknSGHgLBE2F906X1o+3QJCkzjmil84w/jakUTmil6TOGfSS1DmnbiT9EK9Z6I8jeknqnEEvSZ0z6CWpc87Ra6JcCihNnyN6SeqcI3qd0VwhIq3OoNdIDFbpzOPUjSR1zqCXpM45daMu9TDF5IoljYtBL2ko7/WD50z6ATqLnLqRpM4Z9JLUOYNekjpn0EtS5zwZK02Zq2s0aY7oJalzBr0kdc6gl6TOGfSS1DlPxmosPKEobVyO6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pzr6CWtWQ9/urFnBr2W5QVQUj+cupGkzq0a9El+NMnXk/xrkueT/H4rvyTJU0leSvL5JGe38nPa/pF2fNtkuyBJei/DjOi/B1xbVT8LXAZcn+Qq4C7g7qraDrwJ7G71dwNvVtUHgbtbPUnSlKwa9DXwnbb7/vYo4FrgC638AHBz297Z9mnHr0uSsbVYkjSSoebok5yV5BngOPAY8G/AW1X1bquyAGxu25uBVwHa8beBC5Z5zT1J5pPMnzhxYm29kCStaKigr6rvV9VlwBbgCuBDy1Vrz8uN3uuUgqr9VbWjqnbMzc0N215J0ohGWnVTVW8BTwBXAecmWVyeuQV4rW0vAFsB2vEPAG+Mo7GSpNENs+pmLsm5bfvHgI8CLwBfBT7equ0CHmnbB9s+7fhXquqUEb0kaX0Mc8HUxcCBJGcx+MH
wcFU9muRbwENJ/gD4BnBfq38f8JdJjjAYyd86gXZLkoa0atBX1bPAh5cpf5nBfP3J5d8FbhlL6yRJa+aVsZLUOYNekjrnTc00U7zLomaRI3pJ6pxBL0mdc+pG0sQ4VbYxOKKXpM4Z9JLUOYNekjpn0EtS5zwZK42ZJyC10Tiil6TOGfSS1DmDXpI65xy9tE5WmruXJs2gl/AEqvrm1I0kdc6gl6TOGfSS1Dnn6KXTtG3vlzh60r60ERn0M8xgWp3/RuqBUzeS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM6tGvRJtib5apIXkjyf5JOt/PwkjyV5qT2f18qT5J4kR5I8m+TySXdCkrSyYUb07wK/VVUfAq4Cbk9yKbAXeLyqtgOPt32AG4Dt7bEHuHfsrZYkDW3VoK+qY1X1L237v4EXgM3ATuBAq3YAuLlt7wQerIEngXOTXDz2lkuShjLSHH2SbcCHgaeAi6rqGAx+GAAXtmqbgVeXfNlCKzv5tfYkmU8yf+LEidFbLkkaytBBn+QngL8FfrOq/uu9qi5TVqcUVO2vqh1VtWNubm7YZkiSRjRU0Cd5P4OQ/6uq+rtW/PrilEx7Pt7KF4CtS758C/DaeJorSRrV+1arkCTAfcALVfXHSw4dBHYB+9rzI0vK70jyEHAl8PbiFI8kAWzb+6Vly4/uu2mdWzIbVg164Grg14BvJnmmlf0ug4B/OMlu4BXglnbsEHAjcAR4B7htrC2WJI1k1aCvqn9i+Xl3gOuWqV/A7WtslyRpTLwyVpI6Z9BLUucMeknq3DAnY3WGW2mFg6TZ4Ihekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1Ln/FOCkjaMlf7s5dF9N61zS/riiF6SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOS+YkrTheSHV2jiil6TOGfSS1DmDXpI6Z9BLUudWDfok9yc5nuS5JWXnJ3ksyUvt+bxWniT3JDmS5Nkkl0+y8ZKk1Q2z6uYB4E+AB5eU7QUer6p9Sfa2/U8DNwDb2+NK4N72rHWw0soEqVeuxhnOqiP6qvoa8MZJxTuBA237AHDzkvIHa+BJ4NwkF4+rsZKk0Z3uHP1FVXUMoD1f2Mo3A68uqbfQyiRJUzLuk7FZpqyWrZjsSTKfZP7EiRNjboYkadHpBv3ri1My7fl4K18Ati6ptwV4bbkXqKr9VbWjqnbMzc2dZjMkSas53VsgHAR2Afva8yNLyu9I8hCDk7BvL07xaHw86SppFKsGfZLPAdcAm5IsAJ9hEPAPJ9kNvALc0qofAm4EjgDvALdNoM2SpBGsGvRV9YkVDl23TN0Cbl9royRJ4+OVsZLUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzp3unxLUOvBPBkoaB0f0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdc9WNpO6stGLt6L6b1rklG4MjeknqnEEvSZ1z6mYdeQGUpGlwRC9JnXNEL2lmzOpJWkf0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdc9WNpJnX+2ocg34CvDBK0kbi1I0kdc4RvSStoJcpnYmM6JNcn+TFJEeS7J3E95AkDWfsI/okZwF/CvwCsAA8neRgVX1r3N9r3Hr56S1JS01i6uYK4EhVvQyQ5CFgJ7Dhg34lnlyVtNQ4M2E9BpKTCPrNwKtL9heAKyfwfYDR/8EdnUuaNZMI+ixTVqdUSvYAe9rud5K8OIG2nCJ3TeylNwHfntirbxyz0M+h+/hD/9nv+thEGjNBvpcbwBoz6aeGqTSJoF8
Ati7Z3wK8dnKlqtoP7J/A95+KJPNVtWPa7Zi0WejnLPQRZqOfs9DHYUxi1c3TwPYklyQ5G7gVODiB7yNJGsLYR/RV9W6SO4C/B84C7q+q58f9fSRJw5nIBVNVdQg4NInX3sC6mYZaxSz0cxb6CLPRz1no46pSdcp5UklSR7zXjSR1zqBfoyS/l+Q/kzzTHjcuOXZnuw3Ei0l+aZrtXKueb2uR5GiSb7b3b76VnZ/ksSQvtefzpt3OUSS5P8nxJM8tKVu2Txm4p723zya5fHotH80K/ZyJz+QoDPrxuLuqLmuPQwBJLmWw4uhngOuBP2u3hzjjLLmtxQ3ApcAnWv968vPt/VtcircXeLyqtgOPt/0zyQMM/t8ttVKfbgC2t8ce4N51auM4PMCp/YTOP5OjMugnZyfwUFV9r6r+HTjC4PYQZ6L/v61FVf0PsHhbi57tBA607QPAzVNsy8iq6mvAGycVr9SnncCDNfAkcG6Si9enpWuzQj9X0tNnciQG/Xjc0X7lvX/Jr/jL3Qpi8/o3bSx66styCviHJIfbFdsAF1XVMYD2fOHUWjc+K/Wpx/e398/kSAz6IST5xyTPLfPYyeDX3J8GLgOOAX+0+GXLvNSZusSpp74s5+qqupzBFMbtSX5u2g1aZ729v7PwmRyJf3hkCFX10WHqJflz4NG2O9StIM4QPfXlFFX1Wns+nuSLDH6dfz3JxVV1rE1jHJ9qI8djpT519f5W1euL2x1/JkfiiH6NTprL/BVg8ez/QeDWJOckuYTBia6vr3f7xqTb21ok+fEkP7m4Dfwig/fwILCrVdsFPDKdFo7VSn06CPx6W31zFfD24hTPmWhGPpMjcUS/dn+Y5DIGvwIeBX4DoKqeT/Iwg/vwvwvcXlXfn1or16Dz21pcBHwxCQw+D39dVV9O8jTwcJLdwCvALVNs48iSfA64BtiUZAH4DLCP5ft0CLiRwcnJd4Db1r3Bp2mFfl7T+2dyVF4ZK0mdc+pGkjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1Ln/A+wvmHq6xWKqAAAAAElFTkSuQmCC\n", "text/plain": [ "