{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 8 exercises\n", "\n", "**[Data set download](https://s3.amazonaws.com/bebi103.caltech.edu/data/penguins_subset.csv)**\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Colab setup ------------------\n", "import os, sys, subprocess\n", "if \"google.colab\" in sys.modules:\n", " cmd = \"pip install --upgrade iqplot bebi103 watermark\"\n", " process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)\n", " stdout, stderr = process.communicate()\n", " data_path = \"https://s3.amazonaws.com/bebi103.caltech.edu/data/\"\n", "else:\n", " data_path = \"../data/\"\n", "# ------------------------------\n", "\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "\n", "## Exercise 8.1\n", "\n", "In this exercise, we will again work with a subset of the Palmer penguin data set. I will load it and view it now." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr><th></th><th colspan=\"4\" halign=\"left\">Gentoo</th><th colspan=\"4\" halign=\"left\">Adelie</th><th colspan=\"4\" halign=\"left\">Chinstrap</th></tr>\n",
 "    <tr><th></th><th>bill_depth_mm</th><th>bill_length_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>bill_depth_mm</th><th>bill_length_mm</th><th>flipper_length_mm</th><th>body_mass_g</th><th>bill_depth_mm</th><th>bill_length_mm</th><th>flipper_length_mm</th><th>body_mass_g</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>0</th><td>16.3</td><td>48.4</td><td>220.0</td><td>5400.0</td><td>18.5</td><td>36.8</td><td>193.0</td><td>3500.0</td><td>18.3</td><td>47.6</td><td>195.0</td><td>3850.0</td></tr>\n",
 "    <tr><th>1</th><td>15.8</td><td>46.3</td><td>215.0</td><td>5050.0</td><td>16.9</td><td>37.0</td><td>185.0</td><td>3000.0</td><td>16.7</td><td>42.5</td><td>187.0</td><td>3350.0</td></tr>\n",
 "    <tr><th>2</th><td>14.2</td><td>47.5</td><td>209.0</td><td>4600.0</td><td>19.5</td><td>42.0</td><td>200.0</td><td>4050.0</td><td>16.6</td><td>40.9</td><td>187.0</td><td>3200.0</td></tr>\n",
 "    <tr><th>3</th><td>15.7</td><td>48.7</td><td>208.0</td><td>5350.0</td><td>18.3</td><td>42.7</td><td>196.0</td><td>4075.0</td><td>20.0</td><td>52.8</td><td>205.0</td><td>4550.0</td></tr>\n",
 "    <tr><th>4</th><td>14.1</td><td>48.7</td><td>210.0</td><td>4450.0</td><td>18.0</td><td>35.7</td><td>202.0</td><td>3550.0</td><td>18.7</td><td>45.4</td><td>188.0</td><td>3525.0</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " Gentoo Adelie \\\n", " bill_depth_mm bill_length_mm flipper_length_mm body_mass_g bill_depth_mm \n", "0 16.3 48.4 220.0 5400.0 18.5 \n", "1 15.8 46.3 215.0 5050.0 16.9 \n", "2 14.2 47.5 209.0 4600.0 19.5 \n", "3 15.7 48.7 208.0 5350.0 18.3 \n", "4 14.1 48.7 210.0 4450.0 18.0 \n", "\n", " Chinstrap \\\n", " bill_length_mm flipper_length_mm body_mass_g bill_depth_mm bill_length_mm \n", "0 36.8 193.0 3500.0 18.3 47.6 \n", "1 37.0 185.0 3000.0 16.7 42.5 \n", "2 42.0 200.0 4050.0 16.6 40.9 \n", "3 42.7 196.0 4075.0 20.0 52.8 \n", "4 35.7 202.0 3550.0 18.7 45.4 \n", "\n", " \n", " flipper_length_mm body_mass_g \n", "0 195.0 3850.0 \n", "1 187.0 3350.0 \n", "2 187.0 3200.0 \n", "3 205.0 4550.0 \n", "4 188.0 3525.0 " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(os.path.join(data_path, \"penguins_subset.csv\"), header=[0, 1])\n", "\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Explain in words what each of the following code cells does as we work toward tidying this data frame." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df.columns.names = ['species', None]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "df = df.stack(level='species')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "df = df.reset_index(level='species')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "df = df.reset_index(drop=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 8.2\n", "\n", "What is the difference between merging and concatenating data frames?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise 8.3\n", "\n", "Write down any questions or points of confusion that you have." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Computing environment" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPython 3.8.5\n", "IPython 7.18.1\n", "\n", "pandas 1.1.3\n", "jupyterlab 2.2.6\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -p pandas,jupyterlab" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 4 }