{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fb84acb7-7340-4933-99ba-c5fb422cb07f",
   "metadata": {},
   "source": [
    "# Homework 1.2: Palmer penguins and split-apply-combine (30 pts)\n",
    "\n",
    "[Data set download](https://s3.amazonaws.com/bebi103.caltech.edu/data/penguins_subset.csv)\n",
    "\n",
    "<hr />"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cce2733-1b20-4b82-a1c7-e224b8c1efc5",
   "metadata": {},
   "source": [
    "The [Palmer penguins data set](https://towardsdatascience.com/penguins-dataset-overview-iris-alternative-9453bb8c8d95) is a nice data set with which to practice various data science skills. For this exercise, we will use a subset of it, which you can download here: [https://s3.amazonaws.com/bebi103.caltech.edu/data/penguins_subset.csv](https://s3.amazonaws.com/bebi103.caltech.edu/data/penguins_subset.csv). The data set consists of measurements of three different species of penguins acquired at the [Palmer Station in Antarctica](https://en.wikipedia.org/wiki/Palmer_Station). The measurements were made between 2007 and 2009 by [Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php).\n",
    "\n",
    "**a)** Load the data set into a Pandas `DataFrame` called `df`. You will need to use the `header=[0,1]` kwarg of `pd.read_csv()` to load the data set properly.\n",
    "\n",
    "**b)** Take a look at `df`. Is it tidy? Why or why not?\n",
    "\n",
    "**c)** Perform the following operations on the `DataFrame` you loaded in part (a) to generate a new `DataFrame`. You do not need to worry about what these operations do (that is a topic for next week); just run them to answer this question:\n",
    "\n",
    "```python\n",
    "df_tidy = df.stack(\n",
    "    level=0\n",
    ").sort_index(\n",
    "    level=1\n",
    ").reset_index(\n",
    "    level=1\n",
    ").rename(\n",
    "    columns={\"level_1\": \"species\"}\n",
    ")\n",
    "```\n",
    "\n",
    "Is the resulting data frame `df_tidy` tidy? Why or why not?\n",
    "\n",
    "**d)** Using the data frame you created in part (c), `df_tidy`, slice out all of the bill lengths for *Gentoo* penguins.\n",
    "\n",
    "**e)** Make a new data frame containing the mean measured bill depth, bill length, body mass in kg, and flipper length for each species. You can use millimeters for all length measurements.\n",
    "\n",
    "**f)** Make a scatter plot of bill length versus flipper length with the glyphs colored by species."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fdd314dd-fc26-4d89-b105-c5e18bf3f0aa",
   "metadata": {},
   "source": [
    "<br />"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}