{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Joining Data\n",
    "\n",
    "> It’s rare that a data analysis involves only a single table of data. Typically you have many tables of data, and you must combine them to answer the questions that you’re interested in.\n",
    ">\n",
    "> \\- Garrett Grolemund, Master Instructor, RStudio"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Applied Review"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Functions vs. Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "* There are two types of operations in Python: **functions** and **methods**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* **Functions** are standalone operations from a module -- `print()` is a function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Hello\n"
     ]
    }
   ],
   "source": [
    "print(\"Hello\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "* **Methods** are operations that are encapsulated within Python objects -- `DataFrame.head()` is a method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>day</th>\n",
       "      <th>dep_time</th>\n",
       "      <th>sched_dep_time</th>\n",
       "      <th>dep_delay</th>\n",
       "      <th>arr_time</th>\n",
       "      <th>sched_arr_time</th>\n",
       "      <th>arr_delay</th>\n",
       "      <th>carrier</th>\n",
       "      <th>flight</th>\n",
       "      <th>tailnum</th>\n",
       "      <th>origin</th>\n",
       "      <th>dest</th>\n",
       "      <th>air_time</th>\n",
       "      <th>distance</th>\n",
       "      <th>hour</th>\n",
       "      <th>minute</th>\n",
       "      <th>time_hour</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>517.0</td>\n",
       "      <td>515</td>\n",
       "      <td>2.0</td>\n",
       "      <td>830.0</td>\n",
       "      <td>819</td>\n",
       "      <td>11.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1545</td>\n",
       "      <td>N14228</td>\n",
       "      <td>EWR</td>\n",
       "      <td>IAH</td>\n",
       "      <td>227.0</td>\n",
       "      <td>1400</td>\n",
       "      <td>5</td>\n",
       "      <td>15</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>533.0</td>\n",
       "      <td>529</td>\n",
       "      <td>4.0</td>\n",
       "      <td>850.0</td>\n",
       "      <td>830</td>\n",
       "      <td>20.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1714</td>\n",
       "      <td>N24211</td>\n",
       "      <td>LGA</td>\n",
       "      <td>IAH</td>\n",
       "      <td>227.0</td>\n",
       "      <td>1416</td>\n",
       "      <td>5</td>\n",
       "      <td>29</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>542.0</td>\n",
       "      <td>540</td>\n",
       "      <td>2.0</td>\n",
       "      <td>923.0</td>\n",
       "      <td>850</td>\n",
       "      <td>33.0</td>\n",
       "      <td>AA</td>\n",
       "      <td>1141</td>\n",
       "      <td>N619AA</td>\n",
       "      <td>JFK</td>\n",
       "      <td>MIA</td>\n",
       "      <td>160.0</td>\n",
       "      <td>1089</td>\n",
       "      <td>5</td>\n",
       "      <td>40</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>544.0</td>\n",
       "      <td>545</td>\n",
       "      <td>-1.0</td>\n",
       "      <td>1004.0</td>\n",
       "      <td>1022</td>\n",
       "      <td>-18.0</td>\n",
       "      <td>B6</td>\n",
       "      <td>725</td>\n",
       "      <td>N804JB</td>\n",
       "      <td>JFK</td>\n",
       "      <td>BQN</td>\n",
       "      <td>183.0</td>\n",
       "      <td>1576</td>\n",
       "      <td>5</td>\n",
       "      <td>45</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>554.0</td>\n",
       "      <td>600</td>\n",
       "      <td>-6.0</td>\n",
       "      <td>812.0</td>\n",
       "      <td>837</td>\n",
       "      <td>-25.0</td>\n",
       "      <td>DL</td>\n",
       "      <td>461</td>\n",
       "      <td>N668DN</td>\n",
       "      <td>LGA</td>\n",
       "      <td>ATL</td>\n",
       "      <td>116.0</td>\n",
       "      <td>762</td>\n",
       "      <td>6</td>\n",
       "      <td>0</td>\n",
       "      <td>2013-01-01 06:00:00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year  month  day  dep_time  sched_dep_time  dep_delay  arr_time  \\\n",
       "0  2013      1    1     517.0             515        2.0     830.0   \n",
       "1  2013      1    1     533.0             529        4.0     850.0   \n",
       "2  2013      1    1     542.0             540        2.0     923.0   \n",
       "3  2013      1    1     544.0             545       -1.0    1004.0   \n",
       "4  2013      1    1     554.0             600       -6.0     812.0   \n",
       "\n",
       "   sched_arr_time  arr_delay carrier  flight tailnum origin dest  air_time  \\\n",
       "0             819       11.0      UA    1545  N14228    EWR  IAH     227.0   \n",
       "1             830       20.0      UA    1714  N24211    LGA  IAH     227.0   \n",
       "2             850       33.0      AA    1141  N619AA    JFK  MIA     160.0   \n",
       "3            1022      -18.0      B6     725  N804JB    JFK  BQN     183.0   \n",
       "4             837      -25.0      DL     461  N668DN    LGA  ATL     116.0   \n",
       "\n",
       "   distance  hour  minute            time_hour  \n",
       "0      1400     5      15  2013-01-01 05:00:00  \n",
       "1      1416     5      29  2013-01-01 05:00:00  \n",
       "2      1089     5      40  2013-01-01 05:00:00  \n",
       "3      1576     5      45  2013-01-01 05:00:00  \n",
       "4       762     6       0  2013-01-01 06:00:00  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "flights_df = pd.read_csv('../data/flights.csv')\n",
    "flights_df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### DataFrame Structure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "* Each DataFrame variable is a **Series** and can be accessed with bracket subsetting notation: `DataFrame['SeriesName']`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "* The DataFrame has an **Index** that is visible on the far left side"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## General Model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Combining Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "* We frequently want to use more than one table at once, so we need to combine them in some way"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "* Because tables are two-dimensional, we can combine them **vertically** and **horizontally**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Combining data **vertically** is known as **appending**/**unioning**/**concatenating**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Combiding data **horizontally** is known as **joining**/**merging**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Appending Data Vertically"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "tags": []
   },
   "source": [
    "* When we combine data **vertically**, we are stacking tables on top of one another:\n",
    "\n",
    "<center>\n",
    "<img src=\"images/combine-vertically.png\" alt=\"combine-vertically.png\" width=\"70%\" height=\"70%\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<div class=\"admonition note alert alert-info\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Note</p></b>\n",
    "    <p>This is particularly useful when all columns are the same between the two tables.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Joining Data Horizontally"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "* When we combine data **horizontally**, we are attaching the tables at their sides:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "![combine-horizontally.png](images/combine-horizontally.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "* Note that the rows do not need to be in the same order to join/merge two tables:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "![combine-horizontally-unordered.png](images/combine-horizontally-unordered.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "* The joining occurs by matching on a **key column**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "![combine-horizontally-key.png](images/combine-horizontally-key.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Combining DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Appending DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "* When we combine DataFrames vertically, we want to stack two DataFrames on top of one another"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Let's start by creating two DataFrames with the same variables:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "cell_style": "split",
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   x  y\n",
       "0  1  a\n",
       "1  2  b"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_1 = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})\n",
    "df_1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "cell_style": "split",
    "scrolled": true,
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   x  y\n",
       "0  3  c\n",
       "1  4  d"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df_2 = pd.DataFrame({'x': [3, 4], 'y': ['c', 'd']})\n",
    "df_2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We can stack `df_1` and `df_2` on top of one another using the `concat()` function from `pandas` with a list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   x  y\n",
       "0  1  a\n",
       "1  2  b\n",
       "0  3  c\n",
       "1  4  d"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df_1, df_2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<div class=\"admonition tip alert alert-warning\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Question</p></b>\n",
    "    <p>Does anything about this result seem weird?</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The **Index** is repeating..."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We can add the `ignore_index = True` to make the Index reset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   x  y\n",
       "0  1  a\n",
       "1  2  b\n",
       "2  3  c\n",
       "3  4  d"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df_1, df_2], ignore_index = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We can also use the `DataFrame.reset_index()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>x</th>\n",
       "      <th>y</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   x  y\n",
       "0  1  a\n",
       "1  2  b\n",
       "2  3  c\n",
       "3  4  d"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.concat([df_1, df_2]).reset_index(drop = True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "<div class=\"admonition warning alert alert-danger\">\n",
    " <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Caution!</p></b>\n",
    " <p>Using pd.concat() to vertically combine dataframes should only be used when we know that the DataFrames' schemas are consistent.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Joining DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* Joining DataFrames may be one of the most important skills to learn in Python"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "* As a reminder, joining DataFrames is the horizontal combining of two DataFrames on some **key column**:\n",
    "\n",
    "![combine-horizontally-key.png](images/combine-horizontally-key.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "* We have `flights_df`, but we need another DataFrame to join to `flights_df` that has a common **key column**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* As an example, assume we want to know which airline carried each flight in `flights_df`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "airlines_df = pd.read_csv('../data/airlines.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "* Now let's examine the columns/variables in our two DataFrames using the `DataFrame.columns` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['year', 'month', 'day', 'dep_time', 'sched_dep_time', 'dep_delay',\n",
       "       'arr_time', 'sched_arr_time', 'arr_delay', 'carrier', 'flight',\n",
       "       'tailnum', 'origin', 'dest', 'air_time', 'distance', 'hour', 'minute',\n",
       "       'time_hour'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "flights_df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['carrier', 'name'], dtype='object')"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "airlines_df.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<div class=\"admonition tip alert alert-warning\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Question</p></b>\n",
    "    <p>Which column should be our key column?</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    },
    "tags": []
   },
   "source": [
    "The `carrier` column is our key because it's in both DataFrames.\n",
    "\n",
    "<div class=\"admonition note alert alert-info\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Tip!</p></b>\n",
    "    <p>There is an intersection() method that makes it easy to find common columns between two DataFrames.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['carrier'], dtype='object')"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "flights_df.columns.intersection(airlines_df.columns)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We can join/merge the DataFrames together using the `merge()` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>day</th>\n",
       "      <th>dep_time</th>\n",
       "      <th>sched_dep_time</th>\n",
       "      <th>dep_delay</th>\n",
       "      <th>arr_time</th>\n",
       "      <th>sched_arr_time</th>\n",
       "      <th>arr_delay</th>\n",
       "      <th>carrier</th>\n",
       "      <th>flight</th>\n",
       "      <th>tailnum</th>\n",
       "      <th>origin</th>\n",
       "      <th>dest</th>\n",
       "      <th>air_time</th>\n",
       "      <th>distance</th>\n",
       "      <th>hour</th>\n",
       "      <th>minute</th>\n",
       "      <th>time_hour</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>517.0</td>\n",
       "      <td>515</td>\n",
       "      <td>2.0</td>\n",
       "      <td>830.0</td>\n",
       "      <td>819</td>\n",
       "      <td>11.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1545</td>\n",
       "      <td>N14228</td>\n",
       "      <td>EWR</td>\n",
       "      <td>IAH</td>\n",
       "      <td>227.0</td>\n",
       "      <td>1400</td>\n",
       "      <td>5</td>\n",
       "      <td>15</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "      <td>United Air Lines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>533.0</td>\n",
       "      <td>529</td>\n",
       "      <td>4.0</td>\n",
       "      <td>850.0</td>\n",
       "      <td>830</td>\n",
       "      <td>20.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1714</td>\n",
       "      <td>N24211</td>\n",
       "      <td>LGA</td>\n",
       "      <td>IAH</td>\n",
       "      <td>227.0</td>\n",
       "      <td>1416</td>\n",
       "      <td>5</td>\n",
       "      <td>29</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "      <td>United Air Lines Inc.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year  month  day  dep_time  sched_dep_time  dep_delay  arr_time  \\\n",
       "0  2013      1    1     517.0             515        2.0     830.0   \n",
       "1  2013      1    1     533.0             529        4.0     850.0   \n",
       "\n",
       "   sched_arr_time  arr_delay carrier  flight tailnum origin dest  air_time  \\\n",
       "0             819       11.0      UA    1545  N14228    EWR  IAH     227.0   \n",
       "1             830       20.0      UA    1714  N24211    LGA  IAH     227.0   \n",
       "\n",
       "   distance  hour  minute            time_hour                   name  \n",
       "0      1400     5      15  2013-01-01 05:00:00  United Air Lines Inc.  \n",
       "1      1416     5      29  2013-01-01 05:00:00  United Air Lines Inc.  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(flights_df, airlines_df, on = 'carrier').head(2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "* This joined `flights_df` and `airlines_df` together to attach the `name` from `airlines_df` to each flight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Your Turn\n",
    "\n",
    "1\\. Import `planes.csv` into a DataFrame named `planes_df`\n",
    "\n",
    "2\\. Identify the common column names between the two DataFrames.\n",
    "\n",
    "3\\. Fill in the blanks below to join the `flights_df` to `planes_df`:\n",
    "\n",
    "```python\n",
    "pd._____(flights_df, planes_df, on = '_____')\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Join Types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Inner Joins"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "All of our joins have been **inner joins**:\n",
    "\n",
    "<center>\n",
    "<img src=\"images/inner-join.png\" alt=\"inner-join.png\" width=\"900\" height=\"900\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<div class=\"admonition note alert alert-info\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Note</p></b>\n",
    "    <p><b>Inner joins</b> only keep rows where the key is in <i>both tables</i>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Left Joins"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Sometimes we only want to include data that is **in the left table** regardless of whether it's in the right table:\n",
    "\n",
    "<center>\n",
    "<img src=\"images/left-outer-join.png\" alt=\"left-outer-join.png\" width=\"90%\" height=\"90%\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<div class=\"admonition note alert alert-info\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Note</p></b>\n",
    "    <p>Left outer joins, or simply <b>left joins</b>, keep rows where the key is in the <i>left table</i>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Right Joins"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Sometimes we only want to include data that is **in the right table** regardless of whether it's in the left table:\n",
    "\n",
    "<center>\n",
    "<img src=\"images/right-outer-join.png\" alt=\"right-outer-join.png\" width=\"90%\" height=\"90%\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<div class=\"admonition note alert alert-info\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Note</p></b>\n",
    "    <p>Right outer joins, or simply <b>right joins</b>, keep rows where the key is in the <i>right table</i>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Outer Joins"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "Sometimes we want to include **all rows** in either the left table or the right table:\n",
    "\n",
    "<center>\n",
    "<img src=\"images/full-outer-join.png\" alt=\"full-outer-join.png\" width=\"90%\" height=\"90%\">\n",
    "</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "cell_style": "center",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<div class=\"admonition note alert alert-info\">\n",
    "    <b><p class=\"first admonition-title\" style=\"font-weight: bold\">Note</p></b>\n",
    "    <p>Full outer joins keep <i>all rows in both tables</i>.</p>\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Applying Different Join Types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "We can apply these different join types using the `how` parameter of the `merge()` function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>day</th>\n",
       "      <th>dep_time</th>\n",
       "      <th>sched_dep_time</th>\n",
       "      <th>dep_delay</th>\n",
       "      <th>arr_time</th>\n",
       "      <th>sched_arr_time</th>\n",
       "      <th>arr_delay</th>\n",
       "      <th>carrier</th>\n",
       "      <th>flight</th>\n",
       "      <th>tailnum</th>\n",
       "      <th>origin</th>\n",
       "      <th>dest</th>\n",
       "      <th>air_time</th>\n",
       "      <th>distance</th>\n",
       "      <th>hour</th>\n",
       "      <th>minute</th>\n",
       "      <th>time_hour</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>517.0</td>\n",
       "      <td>515</td>\n",
       "      <td>2.0</td>\n",
       "      <td>830.0</td>\n",
       "      <td>819</td>\n",
       "      <td>11.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1545</td>\n",
       "      <td>N14228</td>\n",
       "      <td>EWR</td>\n",
       "      <td>IAH</td>\n",
       "      <td>227.0</td>\n",
       "      <td>1400</td>\n",
       "      <td>5</td>\n",
       "      <td>15</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "      <td>United Air Lines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>533.0</td>\n",
       "      <td>529</td>\n",
       "      <td>4.0</td>\n",
       "      <td>850.0</td>\n",
       "      <td>830</td>\n",
       "      <td>20.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1714</td>\n",
       "      <td>N24211</td>\n",
       "      <td>LGA</td>\n",
       "      <td>IAH</td>\n",
       "      <td>227.0</td>\n",
       "      <td>1416</td>\n",
       "      <td>5</td>\n",
       "      <td>29</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "      <td>United Air Lines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>554.0</td>\n",
       "      <td>558</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>740.0</td>\n",
       "      <td>728</td>\n",
       "      <td>12.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1696</td>\n",
       "      <td>N39463</td>\n",
       "      <td>EWR</td>\n",
       "      <td>ORD</td>\n",
       "      <td>150.0</td>\n",
       "      <td>719</td>\n",
       "      <td>5</td>\n",
       "      <td>58</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "      <td>United Air Lines Inc.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year  month  day  dep_time  sched_dep_time  dep_delay  arr_time  \\\n",
       "0  2013      1    1     517.0             515        2.0     830.0   \n",
       "1  2013      1    1     533.0             529        4.0     850.0   \n",
       "2  2013      1    1     554.0             558       -4.0     740.0   \n",
       "\n",
       "   sched_arr_time  arr_delay carrier  flight tailnum origin dest  air_time  \\\n",
       "0             819       11.0      UA    1545  N14228    EWR  IAH     227.0   \n",
       "1             830       20.0      UA    1714  N24211    LGA  IAH     227.0   \n",
       "2             728       12.0      UA    1696  N39463    EWR  ORD     150.0   \n",
       "\n",
       "   distance  hour  minute            time_hour                   name  \n",
       "0      1400     5      15  2013-01-01 05:00:00  United Air Lines Inc.  \n",
       "1      1416     5      29  2013-01-01 05:00:00  United Air Lines Inc.  \n",
       "2       719     5      58  2013-01-01 05:00:00  United Air Lines Inc.  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(flights_df, airlines_df, on = 'carrier', how = 'inner').head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "While `how = 'inner'` is the default, we can also use `'left'`, `'right'`, and `'outer'`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>year</th>\n",
       "      <th>month</th>\n",
       "      <th>day</th>\n",
       "      <th>dep_time</th>\n",
       "      <th>sched_dep_time</th>\n",
       "      <th>dep_delay</th>\n",
       "      <th>arr_time</th>\n",
       "      <th>sched_arr_time</th>\n",
       "      <th>arr_delay</th>\n",
       "      <th>carrier</th>\n",
       "      <th>flight</th>\n",
       "      <th>tailnum</th>\n",
       "      <th>origin</th>\n",
       "      <th>dest</th>\n",
       "      <th>air_time</th>\n",
       "      <th>distance</th>\n",
       "      <th>hour</th>\n",
       "      <th>minute</th>\n",
       "      <th>time_hour</th>\n",
       "      <th>name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>517.0</td>\n",
       "      <td>515</td>\n",
       "      <td>2.0</td>\n",
       "      <td>830.0</td>\n",
       "      <td>819</td>\n",
       "      <td>11.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1545</td>\n",
       "      <td>N14228</td>\n",
       "      <td>EWR</td>\n",
       "      <td>IAH</td>\n",
       "      <td>227.0</td>\n",
       "      <td>1400</td>\n",
       "      <td>5</td>\n",
       "      <td>15</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "      <td>United Air Lines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>533.0</td>\n",
       "      <td>529</td>\n",
       "      <td>4.0</td>\n",
       "      <td>850.0</td>\n",
       "      <td>830</td>\n",
       "      <td>20.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1714</td>\n",
       "      <td>N24211</td>\n",
       "      <td>LGA</td>\n",
       "      <td>IAH</td>\n",
       "      <td>227.0</td>\n",
       "      <td>1416</td>\n",
       "      <td>5</td>\n",
       "      <td>29</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "      <td>United Air Lines Inc.</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2013</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>554.0</td>\n",
       "      <td>558</td>\n",
       "      <td>-4.0</td>\n",
       "      <td>740.0</td>\n",
       "      <td>728</td>\n",
       "      <td>12.0</td>\n",
       "      <td>UA</td>\n",
       "      <td>1696</td>\n",
       "      <td>N39463</td>\n",
       "      <td>EWR</td>\n",
       "      <td>ORD</td>\n",
       "      <td>150.0</td>\n",
       "      <td>719</td>\n",
       "      <td>5</td>\n",
       "      <td>58</td>\n",
       "      <td>2013-01-01 05:00:00</td>\n",
       "      <td>United Air Lines Inc.</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   year  month  day  dep_time  sched_dep_time  dep_delay  arr_time  \\\n",
       "0  2013      1    1     517.0             515        2.0     830.0   \n",
       "1  2013      1    1     533.0             529        4.0     850.0   \n",
       "2  2013      1    1     554.0             558       -4.0     740.0   \n",
       "\n",
       "   sched_arr_time  arr_delay carrier  flight tailnum origin dest  air_time  \\\n",
       "0             819       11.0      UA    1545  N14228    EWR  IAH     227.0   \n",
       "1             830       20.0      UA    1714  N24211    LGA  IAH     227.0   \n",
       "2             728       12.0      UA    1696  N39463    EWR  ORD     150.0   \n",
       "\n",
       "   distance  hour  minute            time_hour                   name  \n",
       "0      1400     5      15  2013-01-01 05:00:00  United Air Lines Inc.  \n",
       "1      1416     5      29  2013-01-01 05:00:00  United Air Lines Inc.  \n",
       "2       719     5      58  2013-01-01 05:00:00  United Air Lines Inc.  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.merge(flights_df, airlines_df, on = 'carrier', how = 'outer').head(3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Your Turn\n",
    "\n",
    "1. What type of join includes the rows where the key is in the left table regardless of whether the key is in the right table?\n",
    "\n",
    "2. Join the `flights_df` to `planes_df` and keep all rows from both tables."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# Questions\n",
    "\n",
    "Are there questions before we move on?"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Slideshow",
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  },
  "rise": {
   "autolaunch": true,
   "transition": "none"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}