{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 1.1: Practice with Pandas and Palmer's Penguins\n",
"\n",
"[Data set download](https://s3.amazonaws.com/bebi103.caltech.edu/data/penguins_subset.csv)\n",
"\n",
"
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The [Palmer penguins data set](https://towardsdatascience.com/penguins-dataset-overview-iris-alternative-9453bb8c8d95) is a nice data set with which to practice various data science skills. For this exercise, we will use as subset of it, which you can download here: [https://s3.amazonaws.com/bebi103.caltech.edu/data/penguins_subset.csv](https://s3.amazonaws.com/bebi103.caltech.edu/data/penguins_subset.csv). The data set consists of measurements of three different species of penguins acquired at the [Palmer Station in Antarctica](https://en.wikipedia.org/wiki/Palmer_Station). The measurements were made between 2007 and 2009 by [Kristen Gorman](https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php).\n",
"\n",
"**a)** Load the data set into a Pandas `DataFrame` called `df`. You will need to use the `header=[0,1]` kwarg of `pd.read_csv()` to load the data set in properly."
]
},
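{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of what the `header=[0,1]` kwarg does, the snippet below reads a small made-up CSV with two header rows. The group labels `A` and `B`, the variable names `x` and `y`, and the name `df_toy` are hypothetical and are not taken from the penguins file.\n",
"\n",
"```python\n",
"import io\n",
"\n",
"import pandas as pd\n",
"\n",
"# Hypothetical CSV text: the first row is a group label, the second a variable name\n",
"csv_text = '''A,A,B,B\n",
"x,y,x,y\n",
"1,2,3,4\n",
"5,6,7,8\n",
"'''\n",
"\n",
"# header=[0, 1] tells Pandas to build the column labels from both header rows\n",
"df_toy = pd.read_csv(io.StringIO(csv_text), header=[0, 1])\n",
"\n",
"# The columns are a MultiIndex of (group, variable) pairs\n",
"print(df_toy.columns.tolist())\n",
"```\n",
"\n",
"A single column is then addressed by a pair of labels, for example `df_toy[('A', 'x')]`."
]
},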
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**b)** Take a look at `df`. Is it tidy? Why or why not?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**c)** Perform the following operations to make a new `DataFrame` from the original one you loaded in exercise 1 to generate a new `DataFrame`. You do not need to worry about what these operations do (you can learn about tidying data frames [here](http://bebi103.caltech.edu.s3-website-us-east-1.amazonaws.com/2020a/lessons/08/index.html)), just do them to answer this question: Is the resulting data frame `df_tidy` tidy? Why or why not?\n",
"\n",
"```python\n",
"df_tidy = df.stack(\n",
"    level=0\n",
").sort_index(\n",
"    level=1\n",
").reset_index(\n",
"    level=1\n",
").rename(\n",
"    columns={\"level_1\": \"species\"}\n",
")\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**d)** Using `df_tidy`, slice out all of the bill lengths for *Gentoo* penguins as a Numpy array. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**e)** Make a new data frame containing the mean measured bill depth, bill length, body mass in kg, and flipper length for each species. You can use millimeters for all length measurements."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**f)** Save `df_tidy` as a file named `penguins_subset_tidy.csv`."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 4
}