{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Data Manipulation with Pandas\n",
    "\n",
    "- [Download the lecture notes](https://philchodrow.github.io/PIC16A/content/pd/pd_1.ipynb). \n",
    "\n",
    "The `pandas` package for Python offers a set of powerful tools for working with tabular data. The name `pandas` is originally derived from the common term \"panel data.\" While different programmers pronounce the package name in different ways, I prefer to pronounce it like the plural of \"panda [bears]\". I then imagine a small army of adorable bears performing my computations for me. \n",
    "\n",
    "<figure class=\"image\" style=\"width:50%\">\n",
    "  <img src=\"https://i.imgflip.com/yjdc4.jpg\" alt=\"A baby panda lying on its head, with its feet in the air.\">\n",
    "  <figcaption><i>This panda is upside down because it is sorting data in reverse order.</i></figcaption>\n",
    "</figure>\n",
    "\n",
    "The first step when working with `pandas` is always to `import` it: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CSV Data, Revisited\n",
    "\n",
    "A few weeks ago, we discussed some tools for reading CSV data from files using the `csv` module. `pandas` is usually a better choice for reading (and working with) CSV data. To read a CSV file  from data, we use the function `pd.read_csv()`. First, the following code block will place a copy of our data into the current working directory. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "import urllib\n",
    "\n",
    "url = \"https://philchodrow.github.io/PIC16A/datasets/palmer_penguins.csv\"\n",
    "filedata = urllib.request.urlopen(url)\n",
    "to_write = filedata.read()\n",
    "\n",
    "with open(\"palmer_penguins.csv\", \"wb\") as f:\n",
    "    f.write(to_write)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next, we can read in the the file as a DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "pandas.core.frame.DataFrame"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins = pd.read_csv(\"palmer_penguins.csv\")\n",
    "type(penguins)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's inspect our new DataFrame:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>studyName</th>\n",
       "      <th>Sample Number</th>\n",
       "      <th>Species</th>\n",
       "      <th>Region</th>\n",
       "      <th>Island</th>\n",
       "      <th>Stage</th>\n",
       "      <th>Individual ID</th>\n",
       "      <th>Clutch Completion</th>\n",
       "      <th>Date Egg</th>\n",
       "      <th>Culmen Length (mm)</th>\n",
       "      <th>Culmen Depth (mm)</th>\n",
       "      <th>Flipper Length (mm)</th>\n",
       "      <th>Body Mass (g)</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Delta 15 N (o/oo)</th>\n",
       "      <th>Delta 13 C (o/oo)</th>\n",
       "      <th>Comments</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>PAL0708</td>\n",
       "      <td>1</td>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N1A1</td>\n",
       "      <td>Yes</td>\n",
       "      <td>11/11/07</td>\n",
       "      <td>39.1</td>\n",
       "      <td>18.7</td>\n",
       "      <td>181.0</td>\n",
       "      <td>3750.0</td>\n",
       "      <td>MALE</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Not enough blood for isotopes.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>PAL0708</td>\n",
       "      <td>2</td>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N1A2</td>\n",
       "      <td>Yes</td>\n",
       "      <td>11/11/07</td>\n",
       "      <td>39.5</td>\n",
       "      <td>17.4</td>\n",
       "      <td>186.0</td>\n",
       "      <td>3800.0</td>\n",
       "      <td>FEMALE</td>\n",
       "      <td>8.94956</td>\n",
       "      <td>-24.69454</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>PAL0708</td>\n",
       "      <td>3</td>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N2A1</td>\n",
       "      <td>Yes</td>\n",
       "      <td>11/16/07</td>\n",
       "      <td>40.3</td>\n",
       "      <td>18.0</td>\n",
       "      <td>195.0</td>\n",
       "      <td>3250.0</td>\n",
       "      <td>FEMALE</td>\n",
       "      <td>8.36821</td>\n",
       "      <td>-25.33302</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>PAL0708</td>\n",
       "      <td>4</td>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N2A2</td>\n",
       "      <td>Yes</td>\n",
       "      <td>11/16/07</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Adult not sampled.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>PAL0708</td>\n",
       "      <td>5</td>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N3A1</td>\n",
       "      <td>Yes</td>\n",
       "      <td>11/16/07</td>\n",
       "      <td>36.7</td>\n",
       "      <td>19.3</td>\n",
       "      <td>193.0</td>\n",
       "      <td>3450.0</td>\n",
       "      <td>FEMALE</td>\n",
       "      <td>8.76651</td>\n",
       "      <td>-25.32426</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>339</th>\n",
       "      <td>PAL0910</td>\n",
       "      <td>120</td>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N38A2</td>\n",
       "      <td>No</td>\n",
       "      <td>12/1/09</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>340</th>\n",
       "      <td>PAL0910</td>\n",
       "      <td>121</td>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N39A1</td>\n",
       "      <td>Yes</td>\n",
       "      <td>11/22/09</td>\n",
       "      <td>46.8</td>\n",
       "      <td>14.3</td>\n",
       "      <td>215.0</td>\n",
       "      <td>4850.0</td>\n",
       "      <td>FEMALE</td>\n",
       "      <td>8.41151</td>\n",
       "      <td>-26.13832</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>341</th>\n",
       "      <td>PAL0910</td>\n",
       "      <td>122</td>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N39A2</td>\n",
       "      <td>Yes</td>\n",
       "      <td>11/22/09</td>\n",
       "      <td>50.4</td>\n",
       "      <td>15.7</td>\n",
       "      <td>222.0</td>\n",
       "      <td>5750.0</td>\n",
       "      <td>MALE</td>\n",
       "      <td>8.30166</td>\n",
       "      <td>-26.04117</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>342</th>\n",
       "      <td>PAL0910</td>\n",
       "      <td>123</td>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N43A1</td>\n",
       "      <td>Yes</td>\n",
       "      <td>11/22/09</td>\n",
       "      <td>45.2</td>\n",
       "      <td>14.8</td>\n",
       "      <td>212.0</td>\n",
       "      <td>5200.0</td>\n",
       "      <td>FEMALE</td>\n",
       "      <td>8.24246</td>\n",
       "      <td>-26.11969</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>343</th>\n",
       "      <td>PAL0910</td>\n",
       "      <td>124</td>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "      <td>Adult, 1 Egg Stage</td>\n",
       "      <td>N43A2</td>\n",
       "      <td>Yes</td>\n",
       "      <td>11/22/09</td>\n",
       "      <td>49.9</td>\n",
       "      <td>16.1</td>\n",
       "      <td>213.0</td>\n",
       "      <td>5400.0</td>\n",
       "      <td>MALE</td>\n",
       "      <td>8.36390</td>\n",
       "      <td>-26.15531</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>344 rows × 17 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    studyName  Sample Number                              Species  Region  \\\n",
       "0     PAL0708              1  Adelie Penguin (Pygoscelis adeliae)  Anvers   \n",
       "1     PAL0708              2  Adelie Penguin (Pygoscelis adeliae)  Anvers   \n",
       "2     PAL0708              3  Adelie Penguin (Pygoscelis adeliae)  Anvers   \n",
       "3     PAL0708              4  Adelie Penguin (Pygoscelis adeliae)  Anvers   \n",
       "4     PAL0708              5  Adelie Penguin (Pygoscelis adeliae)  Anvers   \n",
       "..        ...            ...                                  ...     ...   \n",
       "339   PAL0910            120    Gentoo penguin (Pygoscelis papua)  Anvers   \n",
       "340   PAL0910            121    Gentoo penguin (Pygoscelis papua)  Anvers   \n",
       "341   PAL0910            122    Gentoo penguin (Pygoscelis papua)  Anvers   \n",
       "342   PAL0910            123    Gentoo penguin (Pygoscelis papua)  Anvers   \n",
       "343   PAL0910            124    Gentoo penguin (Pygoscelis papua)  Anvers   \n",
       "\n",
       "        Island               Stage Individual ID Clutch Completion  Date Egg  \\\n",
       "0    Torgersen  Adult, 1 Egg Stage          N1A1               Yes  11/11/07   \n",
       "1    Torgersen  Adult, 1 Egg Stage          N1A2               Yes  11/11/07   \n",
       "2    Torgersen  Adult, 1 Egg Stage          N2A1               Yes  11/16/07   \n",
       "3    Torgersen  Adult, 1 Egg Stage          N2A2               Yes  11/16/07   \n",
       "4    Torgersen  Adult, 1 Egg Stage          N3A1               Yes  11/16/07   \n",
       "..         ...                 ...           ...               ...       ...   \n",
       "339     Biscoe  Adult, 1 Egg Stage         N38A2                No   12/1/09   \n",
       "340     Biscoe  Adult, 1 Egg Stage         N39A1               Yes  11/22/09   \n",
       "341     Biscoe  Adult, 1 Egg Stage         N39A2               Yes  11/22/09   \n",
       "342     Biscoe  Adult, 1 Egg Stage         N43A1               Yes  11/22/09   \n",
       "343     Biscoe  Adult, 1 Egg Stage         N43A2               Yes  11/22/09   \n",
       "\n",
       "     Culmen Length (mm)  Culmen Depth (mm)  Flipper Length (mm)  \\\n",
       "0                  39.1               18.7                181.0   \n",
       "1                  39.5               17.4                186.0   \n",
       "2                  40.3               18.0                195.0   \n",
       "3                   NaN                NaN                  NaN   \n",
       "4                  36.7               19.3                193.0   \n",
       "..                  ...                ...                  ...   \n",
       "339                 NaN                NaN                  NaN   \n",
       "340                46.8               14.3                215.0   \n",
       "341                50.4               15.7                222.0   \n",
       "342                45.2               14.8                212.0   \n",
       "343                49.9               16.1                213.0   \n",
       "\n",
       "     Body Mass (g)     Sex  Delta 15 N (o/oo)  Delta 13 C (o/oo)  \\\n",
       "0           3750.0    MALE                NaN                NaN   \n",
       "1           3800.0  FEMALE            8.94956          -24.69454   \n",
       "2           3250.0  FEMALE            8.36821          -25.33302   \n",
       "3              NaN     NaN                NaN                NaN   \n",
       "4           3450.0  FEMALE            8.76651          -25.32426   \n",
       "..             ...     ...                ...                ...   \n",
       "339            NaN     NaN                NaN                NaN   \n",
       "340         4850.0  FEMALE            8.41151          -26.13832   \n",
       "341         5750.0    MALE            8.30166          -26.04117   \n",
       "342         5200.0  FEMALE            8.24246          -26.11969   \n",
       "343         5400.0    MALE            8.36390          -26.15531   \n",
       "\n",
       "                           Comments  \n",
       "0    Not enough blood for isotopes.  \n",
       "1                               NaN  \n",
       "2                               NaN  \n",
       "3                Adult not sampled.  \n",
       "4                               NaN  \n",
       "..                              ...  \n",
       "339                             NaN  \n",
       "340                             NaN  \n",
       "341                             NaN  \n",
       "342                             NaN  \n",
       "343                             NaN  \n",
       "\n",
       "[344 rows x 17 columns]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(344, 17)"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.shape # (rows, columns)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "studyName               object\n",
       "Sample Number            int64\n",
       "Species                 object\n",
       "Region                  object\n",
       "Island                  object\n",
       "Stage                   object\n",
       "Individual ID           object\n",
       "Clutch Completion       object\n",
       "Date Egg                object\n",
       "Culmen Length (mm)     float64\n",
       "Culmen Depth (mm)      float64\n",
       "Flipper Length (mm)    float64\n",
       "Body Mass (g)          float64\n",
       "Sex                     object\n",
       "Delta 15 N (o/oo)      float64\n",
       "Delta 13 C (o/oo)      float64\n",
       "Comments                object\n",
       "dtype: object"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.dtypes # data type of each column. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A data type of `object` means that `pandas` isn't sure what kind of data are in the corresponding columns. This is very common when the columns contain strings. [It is possible](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html) to give string columns a dedicated data type, but we won't focus on that here. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A pleasant way to get a quick overview of the numerical columns in your data set is the `describe` method. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Sample Number</th>\n",
       "      <th>Culmen Length (mm)</th>\n",
       "      <th>Culmen Depth (mm)</th>\n",
       "      <th>Flipper Length (mm)</th>\n",
       "      <th>Body Mass (g)</th>\n",
       "      <th>Delta 15 N (o/oo)</th>\n",
       "      <th>Delta 13 C (o/oo)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>344.000000</td>\n",
       "      <td>342.000000</td>\n",
       "      <td>342.000000</td>\n",
       "      <td>342.000000</td>\n",
       "      <td>342.000000</td>\n",
       "      <td>330.000000</td>\n",
       "      <td>331.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>63.151163</td>\n",
       "      <td>43.921930</td>\n",
       "      <td>17.151170</td>\n",
       "      <td>200.915205</td>\n",
       "      <td>4201.754386</td>\n",
       "      <td>8.733382</td>\n",
       "      <td>-25.686292</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>40.430199</td>\n",
       "      <td>5.459584</td>\n",
       "      <td>1.974793</td>\n",
       "      <td>14.061714</td>\n",
       "      <td>801.954536</td>\n",
       "      <td>0.551770</td>\n",
       "      <td>0.793961</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>32.100000</td>\n",
       "      <td>13.100000</td>\n",
       "      <td>172.000000</td>\n",
       "      <td>2700.000000</td>\n",
       "      <td>7.632200</td>\n",
       "      <td>-27.018540</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>29.000000</td>\n",
       "      <td>39.225000</td>\n",
       "      <td>15.600000</td>\n",
       "      <td>190.000000</td>\n",
       "      <td>3550.000000</td>\n",
       "      <td>8.299890</td>\n",
       "      <td>-26.320305</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>58.000000</td>\n",
       "      <td>44.450000</td>\n",
       "      <td>17.300000</td>\n",
       "      <td>197.000000</td>\n",
       "      <td>4050.000000</td>\n",
       "      <td>8.652405</td>\n",
       "      <td>-25.833520</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>95.250000</td>\n",
       "      <td>48.500000</td>\n",
       "      <td>18.700000</td>\n",
       "      <td>213.000000</td>\n",
       "      <td>4750.000000</td>\n",
       "      <td>9.172123</td>\n",
       "      <td>-25.062050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>152.000000</td>\n",
       "      <td>59.600000</td>\n",
       "      <td>21.500000</td>\n",
       "      <td>231.000000</td>\n",
       "      <td>6300.000000</td>\n",
       "      <td>10.025440</td>\n",
       "      <td>-23.787670</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Sample Number  Culmen Length (mm)  Culmen Depth (mm)  \\\n",
       "count     344.000000          342.000000         342.000000   \n",
       "mean       63.151163           43.921930          17.151170   \n",
       "std        40.430199            5.459584           1.974793   \n",
       "min         1.000000           32.100000          13.100000   \n",
       "25%        29.000000           39.225000          15.600000   \n",
       "50%        58.000000           44.450000          17.300000   \n",
       "75%        95.250000           48.500000          18.700000   \n",
       "max       152.000000           59.600000          21.500000   \n",
       "\n",
       "       Flipper Length (mm)  Body Mass (g)  Delta 15 N (o/oo)  \\\n",
       "count           342.000000     342.000000         330.000000   \n",
       "mean            200.915205    4201.754386           8.733382   \n",
       "std              14.061714     801.954536           0.551770   \n",
       "min             172.000000    2700.000000           7.632200   \n",
       "25%             190.000000    3550.000000           8.299890   \n",
       "50%             197.000000    4050.000000           8.652405   \n",
       "75%             213.000000    4750.000000           9.172123   \n",
       "max             231.000000    6300.000000          10.025440   \n",
       "\n",
       "       Delta 13 C (o/oo)  \n",
       "count         331.000000  \n",
       "mean          -25.686292  \n",
       "std             0.793961  \n",
       "min           -27.018540  \n",
       "25%           -26.320305  \n",
       "50%           -25.833520  \n",
       "75%           -25.062050  \n",
       "max           -23.787670  "
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Parts of a Data Frame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When working with data frames, it's important to get comfortable with their different parts. These are: \n",
    "\n",
    "1. The index. The index is used to refer to **rows**. In many cases, you can think of the index as a unique numerical label for a row. \n",
    "2. The column names. These tell you what kinds of data appear in each row. It is important to be able to comfortably grab columns from the data frame for use in computations. \n",
    "3. The data itself. You can think of the data as a set of different arrays, one for each column name. Each array has the same length. Many of the methods of these arrays will be familiar from `np.array`s. \n",
    "\n",
    "<figure class=\"image\" style=\"width:100%\">\n",
    "  <img src=\"https://miro.medium.com/max/3452/1*6p6nF4_5XpHgcrYRrLYVAw.png\" alt=\"A data frame on mountaineering. The labels at the top of each column, such as Range and Coordinates, are highlighted and called column names. The numbers zero through nine appear vertically, giving the number of each row. A few numbers and words appearing inside the columns are highlighted, and labeled as data.\">\n",
    "  <figcaption><i>The parts of a data frame.</i></figcaption>\n",
    "</figure>\n",
    "\n",
    "Let's now begin to look at how to obtain different parts of a data frame.\n",
    "\n",
    "## Selecting Columns\n",
    "\n",
    "The easiest way to select a column of a data frame is to pass the name of the column to the DataFrame with `[]` brackets. In this way, you can think of a data frame as being similar to a dictionary whose keys are the column names. \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      Anvers\n",
       "1      Anvers\n",
       "2      Anvers\n",
       "3      Anvers\n",
       "4      Anvers\n",
       "        ...  \n",
       "339    Anvers\n",
       "340    Anvers\n",
       "341    Anvers\n",
       "342    Anvers\n",
       "343    Anvers\n",
       "Name: Region, Length: 344, dtype: object"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins['Region']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result is no longer a data frame, but rather a `pd.Series` object, which is similar to a `np.array`. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To select multiple columns, pass a list of column names: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Species</th>\n",
       "      <th>Region</th>\n",
       "      <th>Island</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Adelie Penguin (Pygoscelis adeliae)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Torgersen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>339</th>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>340</th>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>341</th>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>342</th>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>343</th>\n",
       "      <td>Gentoo penguin (Pygoscelis papua)</td>\n",
       "      <td>Anvers</td>\n",
       "      <td>Biscoe</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>344 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 Species  Region     Island\n",
       "0    Adelie Penguin (Pygoscelis adeliae)  Anvers  Torgersen\n",
       "1    Adelie Penguin (Pygoscelis adeliae)  Anvers  Torgersen\n",
       "2    Adelie Penguin (Pygoscelis adeliae)  Anvers  Torgersen\n",
       "3    Adelie Penguin (Pygoscelis adeliae)  Anvers  Torgersen\n",
       "4    Adelie Penguin (Pygoscelis adeliae)  Anvers  Torgersen\n",
       "..                                   ...     ...        ...\n",
       "339    Gentoo penguin (Pygoscelis papua)  Anvers     Biscoe\n",
       "340    Gentoo penguin (Pygoscelis papua)  Anvers     Biscoe\n",
       "341    Gentoo penguin (Pygoscelis papua)  Anvers     Biscoe\n",
       "342    Gentoo penguin (Pygoscelis papua)  Anvers     Biscoe\n",
       "343    Gentoo penguin (Pygoscelis papua)  Anvers     Biscoe\n",
       "\n",
       "[344 rows x 3 columns]"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "penguins[['Species', 'Region', 'Island']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This time, the result is a data frame containing the specified columns. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# L1\n",
    "# intro, parts of a df, select\n",
    "# indexing and filter\n",
    "\n",
    "\n",
    "# L2\n",
    "# mutate\n",
    "# group_by, summarise\n",
    "\n",
    "# L3\n",
    "# code patterns with visualization "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}