{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data School: 21 more pandas tricks ([video](https://www.youtube.com/watch?v=tWFQqaRtSQA&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=36))\n", "\n", "**See also:** \"My top 25 pandas tricks\" [video](https://www.youtube.com/watch?v=RlIiVeig3hc&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=35) and [notebook](https://nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/top_25_pandas_tricks.ipynb)\n", "\n", "- Watch the [complete pandas video series](https://www.dataschool.io/easier-data-analysis-with-pandas/)\n", "- Connect on [Twitter](https://twitter.com/justmarkham), [Facebook](https://www.facebook.com/DataScienceSchool/), and [LinkedIn](https://www.linkedin.com/in/justmarkham/)\n", "- Subscribe on [YouTube](https://www.youtube.com/dataschool?sub_confirmation=1)\n", "- Join the [email newsletter](https://www.dataschool.io/subscribe/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of contents\n", "\n", "1. Check for equality\n", "2. Check for equality (alternative)\n", "3. Use NumPy without importing NumPy\n", "4. Calculate memory usage\n", "5. Count the number of words in a column\n", "6. Convert one set of values to another\n", "7. Convert continuous data into categorical data (alternative)\n", "8. Create a cross-tabulation\n", "9. Create a datetime column from multiple columns\n", "10. Resample a datetime column\n", "11. Read and write from compressed files\n", "12. Fill missing values using interpolation\n", "13. Check for duplicate merge keys\n", "14. Transpose a wide DataFrame\n", "15. Create an example DataFrame (alternative)\n", "16. Identify rows that are missing from a DataFrame\n", "17. Use query to avoid intermediate variables\n", "18. Reshape a DataFrame from wide format to long format\n", "19. Reverse row order (alternative)\n", "20. Reverse column order (alternative)\n", "21. Split a string into multiple columns (alternative)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "drinks = pd.read_csv('http://bit.ly/drinksbycountry')\n", "stocks = pd.read_csv('http://bit.ly/smallstocks', parse_dates=['Date'])\n", "titanic = pd.read_csv('http://bit.ly/kaggletrain')\n", "ufo = pd.read_csv('http://bit.ly/uforeports', parse_dates=['Time'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Check for equality" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create an example DataFrame:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b\n", "0 1.0 1.0\n", "1 2.0 2.0\n", "2 NaN NaN" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'a':[1, 2, np.nan], 'b':[1, 2, np.nan]})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Do you ever have two DataFrame columns that look similar, and you want to know if they are actually identical?\n", "\n", "This is not a reliable method for checking:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 True\n", "1 True\n", "2 False\n", "dtype: bool" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.a == df.b" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You would think that would return 3 `True` values, but it actually returns `False` any time there is a missing value:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.nan == np.nan" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead, you can check for equality using the `equals()` method:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.a.equals(df.b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, this is how you would check if two DataFrames are identical:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_new = df.copy()\n", "df_new.equals(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We made a copy of \"df\" and then used the DataFrame `equals()` method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Check for equality (alternative)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create another example DataFrame:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " c d e\n", "0 1 1.0 1.000000\n", "1 2 2.0 2.000000\n", "2 3 3.0 3.000005" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'c':[1, 2, 3], 'd':[1.0, 2.0, 3.0], 'e':[1.0, 2.0, 3.000005]})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's important to note that the `equals()` method (shown in the first trick) requires identical data types in order to return `True`:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.c.equals(df.d)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This returned `False` because \"c\" is integer and \"d\" is float.\n", "\n", "For more flexibility in how the equality checking is done, use the `assert_series_equal()` function:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "pd.testing.assert_series_equal(df.c, df.d, check_names=False, check_dtype=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The assertion passed (thus no error was raised) because we specified that data type can be ignored.\n", "\n", "As well, you can check whether values are approximately equal, rather than identical:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "pd.testing.assert_series_equal(df.d, df.e, check_names=False, check_exact=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The assertion passed even though \"d\" and \"e\" have slightly different values.\n", "\n", "For checking DataFrames, there's a similar function called `assert_frame_equal()`:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "df_new = df.copy()\n", "pd.testing.assert_frame_equal(df, df_new)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Use NumPy without importing NumPy" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although pandas is mostly a superset of NumPy's functionality, there are occasions on which you still have to import NumPy. One example is if you want to create a DataFrame of random values:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3\n", "0 0.548814 0.715189 0.602763 0.544883\n", "1 0.423655 0.645894 0.437587 0.891773" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.seed(0)\n", "pd.DataFrame(np.random.rand(2, 4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, it turns out that you can actually access all of NumPy's functionality from within pandas, simply by typing `pd.np.` before the NumPy function name:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/zn/8p5wr_855bjbd9s6fvn6wd7w0000gn/T/ipykernel_39811/2633834213.py:2: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead.\n", " pd.np.random.seed(0)\n", "/var/folders/zn/8p5wr_855bjbd9s6fvn6wd7w0000gn/T/ipykernel_39811/2633834213.py:3: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead.\n", " pd.DataFrame(pd.np.random.rand(2, 4))\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3\n", "0 0.548814 0.715189 0.602763 0.544883\n", "1 0.423655 0.645894 0.437587 0.891773" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NOTE: pd.np is deprecated in pandas 1.0\n", "pd.np.random.seed(0)\n", "pd.DataFrame(pd.np.random.rand(2, 4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To be clear, this would have worked even if we had not explicitly imported NumPy at the start of the notebook.\n", "\n", "This could also be used to set a value as missing:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/zn/8p5wr_855bjbd9s6fvn6wd7w0000gn/T/ipykernel_39811/4135636971.py:1: FutureWarning: The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead.\n", " df.loc[0, 'e'] = pd.np.nan\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " c d e\n", "0 1 1.0 NaN\n", "1 2 2.0 2.000000\n", "2 3 3.0 3.000005" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NOTE: pd.np is deprecated in pandas 1.0\n", "df.loc[0, 'e'] = pd.np.nan\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That being said, I would still recommend following the convention of `import numpy as np` rather than using `pd.np` since that convention is so widespread." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Calculate memory usage" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's a DataFrame of UFO sightings:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State \\\n", "0 Ithaca NaN TRIANGLE NY \n", "1 Willingboro NaN OTHER NJ \n", "2 Holyoke NaN OVAL CO \n", "3 Abilene NaN DISK KS \n", "4 New York Worlds Fair NaN LIGHT NY \n", "\n", " Time \n", "0 1930-06-01 22:00:00 \n", "1 1930-06-30 20:00:00 \n", "2 1931-02-15 14:00:00 \n", "3 1931-06-01 13:00:00 \n", "4 1933-04-18 19:00:00 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can calculate the memory used by the entire DataFrame:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 18241 entries, 0 to 18240\n", "Data columns (total 5 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 City 18216 non-null object \n", " 1 Colors Reported 2882 non-null object \n", " 2 Shape Reported 15597 non-null object \n", " 3 State 18241 non-null object \n", " 4 Time 18241 non-null datetime64[ns]\n", "dtypes: datetime64[ns](1), object(4)\n", "memory usage: 4.0 MB\n" ] } ], "source": [ "ufo.info(memory_usage='deep')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also calculate memory used by each column (in bytes):" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index 128\n", "City 1205618\n", "Colors Reported 671313\n", "Shape Reported 1065230\n", "State 1076219\n", "Time 145928\n", "dtype: int64" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo.memory_usage(deep=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This information might help you to decide how to optimize your DataFrame storage." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Count the number of words in a column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's count the values in this column from the \"ufo\" DataFrame:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RED 780\n", "GREEN 531\n", "ORANGE 528\n", "BLUE 450\n", "YELLOW 169\n", "RED GREEN 89\n", "RED BLUE 78\n", "RED ORANGE 44\n", "GREEN BLUE 34\n", "RED GREEN BLUE 33\n", "ORANGE YELLOW 26\n", "RED YELLOW 25\n", "ORANGE GREEN 23\n", "YELLOW GREEN 17\n", "ORANGE BLUE 10\n", "RED YELLOW GREEN 9\n", "YELLOW BLUE 6\n", "YELLOW GREEN BLUE 5\n", "ORANGE GREEN BLUE 5\n", "RED YELLOW GREEN BLUE 4\n", "RED ORANGE YELLOW 4\n", "RED YELLOW BLUE 3\n", "RED ORANGE GREEN 3\n", "RED ORANGE BLUE 3\n", "RED ORANGE YELLOW BLUE 1\n", "ORANGE YELLOW GREEN 1\n", "ORANGE YELLOW BLUE 1\n", "Name: Colors Reported, dtype: int64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo['Colors Reported'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that many of the entries have mulitple colors. 
What if all we cared about was the number of colors, and not the colors themselves?\n", "\n", "We can count the colors by using a string method to count the number of spaces, and then add 1:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0 2458\n", "2.0 352\n", "3.0 67\n", "4.0 5\n", "Name: Colors Reported, dtype: int64" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "(ufo['Colors Reported'].str.count(' ') + 1).value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Convert one set of values to another" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using the Titanic dataset as an example, I'm going to highlight three different ways that you can convert one set of values to another.\n", "\n", "Let's start with the \"Sex\" column:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 male\n", "1 female\n", "2 female\n", "3 female\n", "4 male\n", "Name: Sex, dtype: object" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.Sex.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two different values in this column. If you need to convert these values to 0 and 1, you can use the `map()` method and pass it a dictionary specifying how you want to map the values:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 1\n", "2 1\n", "3 1\n", "4 0\n", "Name: Sex_num, dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic['Sex_num'] = titanic.Sex.map({'male':0, 'female':1})\n", "titanic.Sex_num.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we specified, \"male\" has become 0 and \"female\" has become 1.\n", "\n", "Next, let's look at the \"Embarked\" column:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 S\n", "1 C\n", "2 S\n", "3 S\n", "4 S\n", "5 Q\n", "6 S\n", "7 S\n", "8 S\n", "9 C\n", "Name: Embarked, dtype: object" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.Embarked.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are three different values in this column: S, C, and Q. If you need to convert them to 0, 1, and 2, you could use the `map()` method, but the `factorize()` method is even easier:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 0\n", "1 1\n", "2 0\n", "3 0\n", "4 0\n", "5 2\n", "6 0\n", "7 0\n", "8 0\n", "9 1\n", "Name: Embarked_num, dtype: int64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic['Embarked_num'] = titanic.Embarked.factorize()[0]\n", "titanic.Embarked_num.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`factorize()` returns a tuple in which the first element contains the new values, which is why I had to use `[0]` to extract the values.\n", "\n", "You can see that \"S\" has become 0, \"C\" has become 1, and \"Q\" has become 2. 
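" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want to sanity-check the encoding that `factorize()` chose, a quick cross-tabulation of the original and encoded columns works. This is a minimal sketch that relies on the \"Embarked_num\" column created above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# each original value should line up with exactly one encoded value\n", "pd.crosstab(titanic.Embarked, titanic.Embarked_num)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "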
It chose that mapping based on the order in which the values appear in the Series, and if you need to reference the mapping, it's stored in the second value in the tuple:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['S', 'C', 'Q'], dtype='object')" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.Embarked.factorize()[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's look at the \"SibSp\" column:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 1\n", "2 0\n", "3 1\n", "4 0\n", "5 0\n", "6 0\n", "7 3\n", "8 0\n", "9 1\n", "Name: SibSp, dtype: int64" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.SibSp.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say that you needed to keep the zeros as-is and convert all other values to one. You can express this as a condition, `SibSp > 0`, which will return a boolean Series that you can convert to integers using the `astype()` method:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 1\n", "2 0\n", "3 1\n", "4 0\n", "5 0\n", "6 0\n", "7 1\n", "8 0\n", "9 1\n", "Name: SibSp_binary, dtype: int64" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic['SibSp_binary'] = (titanic.SibSp > 0).astype('int')\n", "titanic.SibSp_binary.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the only value greater than 1 has been converted to a 1." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Convert continuous data into categorical data (alternative)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the main tricks video, I used the `cut()` function to convert the \"Age\" column from continuous to categorical data:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 young adult\n", "1 adult\n", "2 adult\n", "3 adult\n", "4 adult\n", "5 NaN\n", "6 adult\n", "7 child\n", "8 adult\n", "9 child\n", "Name: Age, dtype: category\n", "Categories (3, object): ['child' < 'young adult' < 'adult']" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.cut(titanic.Age, bins=[0, 18, 25, 99], labels=['child', 'young adult', 'adult']).head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using `cut()`, we had to choose the edges of each bin. 
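" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As an aside, `cut()` can also hand back the edges it used if you pass `retbins=True`. Here's a minimal sketch (the variable names are just illustrative):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# retbins=True makes cut() return a (categories, bin_edges) tuple\n", "categories, bin_edges = pd.cut(titanic.Age, bins=[0, 18, 25, 99],\n", "                               labels=['child', 'young adult', 'adult'], retbins=True)\n", "bin_edges" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "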
But if you want pandas to choose the bin edges for you, you can use the `qcut()` function instead:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 (0.419, 23.0]\n", "1 (34.0, 80.0]\n", "2 (23.0, 34.0]\n", "3 (34.0, 80.0]\n", "4 (34.0, 80.0]\n", "5 NaN\n", "6 (34.0, 80.0]\n", "7 (0.419, 23.0]\n", "8 (23.0, 34.0]\n", "9 (0.419, 23.0]\n", "Name: Age, dtype: category\n", "Categories (3, interval[float64, right]): [(0.419, 23.0] < (23.0, 34.0] < (34.0, 80.0]]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.qcut(titanic.Age, q=3).head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We told `qcut()` to create 3 bins, and it chose bin edges that would result in bins of approximately equal size:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.419, 23.0] 246\n", "(34.0, 80.0] 236\n", "(23.0, 34.0] 232\n", "Name: Age, dtype: int64" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.qcut(titanic.Age, q=3).value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, the three bins are ages 0 to 23, 23 to 34, and 34 to 80, and they all contain roughly the same number of observations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Create a cross-tabulation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes you just want to count the number of observations in each category. If you're interested in a single column, you would use the `value_counts()` method:" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "male 577\n", "female 314\n", "Name: Sex, dtype: int64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.Sex.value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But if you want to count the number of observations that appear in each combination of categories, you would use the `crosstab()` function:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Pclass 1 2 3\n", "Sex \n", "female 94 76 144\n", "male 122 108 347" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(titanic.Sex, titanic.Pclass)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just like a pivot table, you can include row and column totals by setting `margins=True`:" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Pclass 1 2 3 All\n", "Sex \n", "female 94 76 144 314\n", "male 122 108 347 577\n", "All 216 184 491 891" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.crosstab(titanic.Sex, titanic.Pclass, margins=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In fact, you can actually create this same table using the `pivot_table()` method with `'count'` as the aggregation function:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "Pclass 1 2 3 All\n", "Sex \n", "female 94 76 144 314\n", "male 122 108 347 577\n", "All 216 184 491 891" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.pivot_table(index='Sex', columns='Pclass', values='Survived',\n", " aggfunc='count', margins=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Create a datetime column from multiple columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create an example DataFrame:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " month day year holiday\n", "0 12 25 2019 christmas\n", "1 11 28 2019 thanksgiving" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame([[12, 25, 2019, 'christmas'], [11, 28, 2019, 'thanksgiving']],\n", " columns=['month', 'day', 'year', 'holiday'])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can create a new datetime column simply by passing the relevant columns to `pd.to_datetime()`:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " month day year holiday date\n", "0 12 25 2019 christmas 2019-12-25\n", "1 11 28 2019 thanksgiving 2019-11-28" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['date'] = pd.to_datetime(df[['month', 'day', 'year']])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The new date column has a datetime data type:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "month int64\n", "day int64\n", "year int64\n", "holiday object\n", "date datetime64[ns]\n", "dtype: object" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Keep in mind that you must include month, day, and year columns at a minimum, but you can also include hour, minute, and second." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10. Resample a datetime column" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the stocks dataset:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Date Close Volume Symbol\n", "0 2016-10-03 31.50 14070500 CSCO\n", "1 2016-10-03 112.52 21701800 AAPL\n", "2 2016-10-03 57.42 19189500 MSFT\n", "3 2016-10-04 113.00 29736800 AAPL\n", "4 2016-10-04 57.24 20085900 MSFT\n", "5 2016-10-04 31.35 18460400 CSCO\n", "6 2016-10-05 57.64 16726400 MSFT\n", "7 2016-10-05 31.59 11808600 CSCO\n", "8 2016-10-05 113.05 21453100 AAPL" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if you wanted to calculate the mean closing price by day across all stocks? Use the `resample()` method:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Date\n", "2016-10-03 67.146667\n", "2016-10-04 67.196667\n", "2016-10-05 67.426667\n", "Freq: D, Name: Close, dtype: float64" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks.resample('D', on='Date').Close.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can think of resampling as a `groupby()` for datetime data, and in fact the structure of the command looks very similar to a `groupby()`. \"D\" specifies that the resampling frequency should be daily, and the \"on\" parameter specifics the column on which we're resampling.\n", "\n", "If the datetime column is the index, you can skip the `on` parameter. For example, let's give the ufo DataFrame a DatetimeIndex:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State\n", "Time \n", "1930-06-01 22:00:00 Ithaca NaN TRIANGLE NY\n", "1930-06-30 20:00:00 Willingboro NaN OTHER NJ\n", "1931-02-15 14:00:00 Holyoke NaN OVAL CO\n", "1931-06-01 13:00:00 Abilene NaN DISK KS\n", "1933-04-18 19:00:00 New York Worlds Fair NaN LIGHT NY" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo = ufo.set_index('Time')\n", "ufo.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use `resample()` and it will automatically resample based on the index:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Time\n", "1996-12-31 851\n", "1997-12-31 1237\n", "1998-12-31 1743\n", "1999-12-31 2774\n", "2000-12-31 2635\n", "Freq: A-DEC, Name: State, dtype: int64" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo.resample('Y').State.count().tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's the count of the number of UFO sightings by year.\n", "\n", "We can calculate the count by month by changing the resampling frequency from \"Y\" to \"M\":" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Time\n", "2000-08-31 250\n", "2000-09-30 257\n", "2000-10-31 278\n", "2000-11-30 200\n", "2000-12-31 192\n", "Freq: M, Name: State, dtype: int64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo.resample('M').State.count().tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The string that you pass to `resample()` is known as the offset alias, and pandas supports [many offset aliases](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases) other than just \"D\", \"M\", and \"Y\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11. 
Read and write from compressed files" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you want to save a DataFrame to a CSV file, you use the `to_csv()` method:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "ufo.to_csv('ufo.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, you can actually compress the CSV file as well:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "ufo.to_csv('ufo.csv.zip')\n", "ufo.to_csv('ufo.csv.gz')\n", "ufo.to_csv('ufo.csv.bz2')\n", "ufo.to_csv('ufo.csv.xz')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By using one of these file extensions, pandas infers the type of compression you want it to use.\n", "\n", "You can use a shell command to see all of the files we've created:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "-rw-r--r-- 1 kevin staff 748033 Apr 27 14:27 ufo.csv\r\n", "-rw-r--r-- 1 kevin staff 129189 Apr 27 14:27 ufo.csv.bz2\r\n", "-rw-r--r-- 1 kevin staff 198030 Apr 27 14:27 ufo.csv.gz\r\n", "-rw-r--r-- 1 kevin staff 149320 Apr 27 14:27 ufo.csv.xz\r\n", "-rw-r--r-- 1 kevin staff 200317 Apr 27 14:27 ufo.csv.zip\r\n" ] } ], "source": [ "!ls -l ufo.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that all of the compressed files are significantly smaller than the uncompressed CSV file.\n", "\n", "Finally, you can actually read directly from a compressed file using `read_csv()`:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " City Colors Reported Shape Reported State\n", "Time \n", "1930-06-01 22:00:00 Ithaca NaN TRIANGLE NY\n", "1930-06-30 20:00:00 Willingboro NaN OTHER NJ\n", "1931-02-15 14:00:00 Holyoke NaN OVAL CO\n", "1931-06-01 13:00:00 Abilene NaN DISK KS\n", "1933-04-18 19:00:00 New York Worlds Fair NaN LIGHT NY" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo_new = pd.read_csv('ufo.csv.gz', index_col='Time', parse_dates=['Time'])\n", "ufo_new.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can confirm that the new ufo DataFrame is equivalent to the original ufo DataFrame:" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ufo_new.equals(ufo)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 12. Fill missing values using interpolation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create an example time series DataFrame with some missing values:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b\n", "2019-01-01 100.0 9.0\n", "2019-02-01 120.0 9.0\n", "2019-03-01 130.0 NaN\n", "2019-04-01 NaN 7.5\n", "2019-05-01 140.0 6.5" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'a':[100, 120, 130, np.nan, 140], 'b':[9, 9, np.nan, 7.5, 6.5]})\n", "df.index = pd.to_datetime(['2019-01', '2019-02', '2019-03', '2019-04', '2019-05'])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If appropriate, you can fill in the missing values using interpolation:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " a b\n", "2019-01-01 100.0 9.00\n", "2019-02-01 120.0 9.00\n", "2019-03-01 130.0 8.25\n", "2019-04-01 135.0 7.50\n", "2019-05-01 140.0 6.50" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.interpolate()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This uses linear interpolation by default, though other methods are supported." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 13. Check for duplicate merge keys" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create two example DataFrames:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " color num\n", "0 green 1\n", "1 yellow 2\n", "2 red 3" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "left = pd.DataFrame({'color': ['green', 'yellow', 'red'], 'num':[1, 2, 3]})\n", "left" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " color size\n", "0 green S\n", "1 yellow M\n", "2 pink L\n", "3 green XL" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "right = pd.DataFrame({'color': ['green', 'yellow', 'pink', 'green'], 'size':['S', 'M', 'L', 'XL']})\n", "right" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to merge these DataFrames.\n", "\n", "What if we wanted to confirm that the merge keys (\"color\" in this case) are unique in the left dataset? We would use \"one-to-many\" validation:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " color num size\n", "0 green 1 S\n", "1 green 1 XL\n", "2 yellow 2 M" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, how='inner', validate='one_to_many')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It did the merge, and validated that the values of \"color\" in the left dataset are unique.\n", "\n", "What if we wanted to confirm that the merge keys are unique in the right dataset? We would use \"many-to-one\" validation:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "# pd.merge(left, right, how='inner', validate='many_to_one')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This resulted in an error, because the values of \"color\" in the right dataset are not unique." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 14. Transpose a wide DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create an example DataFrame with 200 rows and 25 columns:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "df = pd.DataFrame(np.random.rand(200, 25), columns=list('ABCDEFGHIJKLMNOPQRSTUVWXY'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you wanted to get a sense of the data by examining the head, you wouldn't see all of the columns due to the default display options:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " A B C D E F G \\\n", "0 0.963663 0.383442 0.791725 0.528895 0.568045 0.925597 0.071036 \n", "1 0.568434 0.018790 0.617635 0.612096 0.616934 0.943748 0.681820 \n", "2 0.466311 0.244426 0.158970 0.110375 0.656330 0.138183 0.196582 \n", "3 0.692472 0.566601 0.265389 0.523248 0.093941 0.575946 0.929296 \n", "4 0.223082 0.952749 0.447125 0.846409 0.699479 0.297437 0.813798 \n", "\n", " H I J ... P Q R S \\\n", "0 0.087129 0.020218 0.832620 ... 0.780529 0.118274 0.639921 0.143353 \n", "1 0.359508 0.437032 0.697631 ... 0.315428 0.363711 0.570197 0.438602 \n", "2 0.368725 0.820993 0.097101 ... 0.604846 0.739264 0.039188 0.282807 \n", "3 0.318569 0.667410 0.131798 ... 0.828940 0.004695 0.677817 0.270008 \n", "4 0.396506 0.881103 0.581273 ... 0.643990 0.423855 0.606393 0.019193 \n", "\n", " T U V W X Y \n", "0 0.944669 0.521848 0.414662 0.264556 0.774234 0.456150 \n", "1 0.988374 0.102045 0.208877 0.161310 0.653108 0.253292 \n", "2 0.120197 0.296140 0.118728 0.317983 0.414263 0.064147 \n", "3 0.735194 0.962189 0.248753 0.576157 0.592042 0.572252 \n", "4 0.301575 0.660174 0.290078 0.618015 0.428769 0.135474 \n", "\n", "[5 rows x 25 columns]" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The easiest solution is just to transpose the head:" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 0 1 2 3 4\n", "A 0.963663 0.568434 0.466311 0.692472 0.223082\n", "B 0.383442 0.018790 0.244426 0.566601 0.952749\n", "C 0.791725 0.617635 0.158970 0.265389 0.447125\n", "D 0.528895 0.612096 0.110375 0.523248 0.846409\n", "E 0.568045 0.616934 0.656330 0.093941 0.699479\n", "F 0.925597 0.943748 0.138183 0.575946 0.297437\n", "G 0.071036 0.681820 0.196582 0.929296 0.813798\n", "H 0.087129 0.359508 0.368725 0.318569 0.396506\n", "I 0.020218 0.437032 0.820993 0.667410 0.881103\n", "J 0.832620 0.697631 0.097101 0.131798 0.581273\n", "K 0.778157 0.060225 0.837945 0.716327 0.881735\n", "L 0.870012 0.666767 0.096098 0.289406 0.692532\n", "M 0.978618 0.670638 0.976459 0.183191 0.725254\n", "N 0.799159 0.210383 0.468651 0.586513 0.501324\n", "O 0.461479 0.128926 0.976761 0.020108 0.956084\n", "P 0.780529 0.315428 0.604846 0.828940 0.643990\n", "Q 0.118274 0.363711 0.739264 0.004695 0.423855\n", "R 0.639921 0.570197 0.039188 0.677817 0.606393\n", "S 0.143353 0.438602 0.282807 0.270008 0.019193\n", "T 0.944669 0.988374 0.120197 0.735194 0.301575\n", "U 0.521848 0.102045 0.296140 0.962189 0.660174\n", "V 0.414662 0.208877 0.118728 0.248753 0.290078\n", "W 0.264556 0.161310 0.317983 0.576157 0.618015\n", "X 0.774234 0.653108 0.414263 0.592042 0.428769\n", "Y 0.456150 0.253292 0.064147 0.572252 0.135474" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the columns have become the rows (and vice versa), we can now easily browse through the DataFrame's head.\n", "\n", "Transposing is also helpful when using the `describe()` method on a wide DataFrame:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " count mean std min 25% 50% 75% max\n", "A 200.0 0.492993 0.297821 0.000367 0.232334 0.473208 0.752013 0.995733\n", "B 200.0 0.511242 0.296146 0.004655 0.244792 0.502961 0.778799 0.995713\n", "C 200.0 0.493275 0.265398 0.003866 0.271378 0.507585 0.699023 0.993405\n", "D 200.0 0.510597 0.279209 0.005206 0.273734 0.539189 0.736733 0.999949\n", "E 200.0 0.488265 0.286736 0.003710 0.264644 0.469948 0.729416 0.992820\n", "F 200.0 0.486180 0.300410 0.011355 0.248337 0.455500 0.751475 0.999931\n", "G 200.0 0.487767 0.285290 0.018173 0.219824 0.496847 0.743175 0.980979\n", "H 200.0 0.469514 0.288366 0.002703 0.217834 0.439801 0.712790 0.985155\n", "I 200.0 0.475042 0.284267 0.001962 0.229787 0.455727 0.710170 0.996100\n", "J 200.0 0.486357 0.292939 0.003860 0.242252 0.451637 0.734957 0.983426\n", "K 200.0 0.526301 0.297170 0.005495 0.255519 0.559120 0.798100 0.994496\n", "L 200.0 0.508508 0.284417 0.005939 0.275634 0.503145 0.741509 0.999278\n", "M 200.0 0.531511 0.283377 0.000664 0.312775 0.548991 0.765784 0.997962\n", "N 200.0 0.492445 0.283717 0.000546 0.253779 0.477410 0.725116 0.990345\n", "O 200.0 0.496102 0.289496 0.009060 0.259039 0.468176 0.725585 0.995830\n", "P 200.0 0.515763 0.302013 0.004475 0.247396 0.504664 0.806546 0.997354\n", "Q 200.0 0.499350 0.296964 0.000903 0.249951 0.524557 0.747713 0.999964\n", "R 200.0 0.506541 0.283706 0.008187 0.258631 0.480919 0.786813 0.997994\n", "S 200.0 0.509491 0.292244 0.006238 0.276656 0.500628 0.745958 0.998355\n", "T 200.0 0.489958 0.276607 0.005052 0.273893 0.469294 0.733976 0.998023\n", "U 200.0 0.502008 0.295426 0.005510 0.252521 0.513378 0.749146 0.997858\n", "V 200.0 0.533049 0.298239 0.002084 0.274829 0.553971 0.806714 0.997046\n", "W 200.0 0.461300 0.294328 0.000074 0.187038 0.429369 0.709391 0.975735\n", "X 200.0 0.470272 0.291905 0.000072 0.219438 0.453254 0.717279 0.995813\n", "Y 200.0 0.468069 0.284750 0.001383 0.244620 0.436273 0.706354 0.998199" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe().T" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 15. Create an example DataFrame (alternative)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are the methods that I taught in the main tricks video for creating example DataFrames:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
col onecol two
0100300
1200400
\n", "
" ], "text/plain": [ " col one col two\n", "0 100 300\n", "1 200 400" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame({'col one':[100, 200], 'col two':[300, 400]})" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
abcdefgh
00.2785100.2880270.8463050.7912840.5786360.2885890.3188780.592218
10.7398670.3840980.5095620.8880330.6497910.5355500.0712220.176015
20.2009920.6231480.1081130.0289950.3603510.7188590.6932490.792670
30.6962480.6132860.4861620.2084980.5685480.6366250.1237430.565147
\n", "
" ], "text/plain": [ " a b c d e f g \\\n", "0 0.278510 0.288027 0.846305 0.791284 0.578636 0.288589 0.318878 \n", "1 0.739867 0.384098 0.509562 0.888033 0.649791 0.535550 0.071222 \n", "2 0.200992 0.623148 0.108113 0.028995 0.360351 0.718859 0.693249 \n", "3 0.696248 0.613286 0.486162 0.208498 0.568548 0.636625 0.123743 \n", "\n", " h \n", "0 0.592218 \n", "1 0.176015 \n", "2 0.792670 \n", "3 0.565147 " ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(np.random.rand(4, 8), columns=list('abcdefgh'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you want an even simpler method, you can use `makeDataFrame()` to create a 30x4 DataFrame filled with random values:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/kevin/miniconda3/envs/pd14/lib/python3.9/site-packages/pandas/util/__init__.py:15: FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.\n", " import pandas.util.testing\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
EmWXi3UyK3-0.250656-0.040365-0.8292030.296782
jgi4rj56bK0.919812-1.527617-0.670679-0.270356
a6Po2BAIfh2.445794-0.057607-0.3153850.736752
Ys4pDMq9Yd1.205698-2.4900900.4428201.157280
XfLAIHy1QI0.920549-0.026303-1.052564-0.658804
\n", "
" ], "text/plain": [ " A B C D\n", "EmWXi3UyK3 -0.250656 -0.040365 -0.829203 0.296782\n", "jgi4rj56bK 0.919812 -1.527617 -0.670679 -0.270356\n", "a6Po2BAIfh 2.445794 -0.057607 -0.315385 0.736752\n", "Ys4pDMq9Yd 1.205698 -2.490090 0.442820 1.157280\n", "XfLAIHy1QI 0.920549 -0.026303 -1.052564 -0.658804" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NOTE: pd.util.testing is deprecated in pandas 1.0\n", "pd.util.testing.makeDataFrame().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`makeMissingDataframe()` is similar, except that some of the values are missing:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
8JAZ6EeRof0.0003410.5370350.656464-0.941815
GaUe9v7uYV-0.062205-0.2868350.065851-0.702835
No3fmW5dDH-0.3173491.634406-0.873631-1.378837
W5LsjKWdWfNaN0.571227-1.0577542.925250
8ijrQMVxS0-0.211620-0.789954-0.8972370.571752
\n", "
" ], "text/plain": [ " A B C D\n", "8JAZ6EeRof 0.000341 0.537035 0.656464 -0.941815\n", "GaUe9v7uYV -0.062205 -0.286835 0.065851 -0.702835\n", "No3fmW5dDH -0.317349 1.634406 -0.873631 -1.378837\n", "W5LsjKWdWf NaN 0.571227 -1.057754 2.925250\n", "8ijrQMVxS0 -0.211620 -0.789954 -0.897237 0.571752" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NOTE: pd.util.testing is deprecated in pandas 1.0\n", "pd.util.testing.makeMissingDataframe().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`makeTimeDataFrame()` is similar, except it creates a DatetimeIndex:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2000-01-030.470549-0.9811021.3583820.064058
2000-01-04-0.2358040.1651830.060856-1.063780
2000-01-050.678789-0.0166462.005891-0.079188
2000-01-061.1536180.1409330.081658-0.527496
2000-01-07-0.554733-0.725172-0.0792090.460215
\n", "
" ], "text/plain": [ " A B C D\n", "2000-01-03 0.470549 -0.981102 1.358382 0.064058\n", "2000-01-04 -0.235804 0.165183 0.060856 -1.063780\n", "2000-01-05 0.678789 -0.016646 2.005891 -0.079188\n", "2000-01-06 1.153618 0.140933 0.081658 -0.527496\n", "2000-01-07 -0.554733 -0.725172 -0.079209 0.460215" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NOTE: pd.util.testing is deprecated in pandas 1.0\n", "pd.util.testing.makeTimeDataFrame().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, `makeMixedDataFrame()` creates this exact 5x4 DataFrame:" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
00.00.0foo12009-01-01
11.01.0foo22009-01-02
22.00.0foo32009-01-05
33.01.0foo42009-01-06
44.00.0foo52009-01-07
\n", "
" ], "text/plain": [ " A B C D\n", "0 0.0 0.0 foo1 2009-01-01\n", "1 1.0 1.0 foo2 2009-01-02\n", "2 2.0 0.0 foo3 2009-01-05\n", "3 3.0 1.0 foo4 2009-01-06\n", "4 4.0 0.0 foo5 2009-01-07" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NOTE: pd.util.testing is deprecated in pandas 1.0\n", "pd.util.testing.makeMixedDataFrame()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It has 2 float columns, 1 object column, and 1 datetime column.\n", "\n", "There are many other similar functions that you can use:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['makeBoolIndex',\n", " 'makeCategoricalIndex',\n", " 'makeCustomDataframe',\n", " 'makeCustomIndex',\n", " 'makeDataFrame',\n", " 'makeDateIndex',\n", " 'makeFloatIndex',\n", " 'makeFloatSeries',\n", " 'makeIntIndex',\n", " 'makeIntervalIndex',\n", " 'makeMissingDataframe',\n", " 'makeMixedDataFrame',\n", " 'makeMultiIndex',\n", " 'makeNumericIndex',\n", " 'makeObjectSeries',\n", " 'makePeriodFrame',\n", " 'makePeriodIndex',\n", " 'makePeriodSeries',\n", " 'makeRangeIndex',\n", " 'makeStringIndex',\n", " 'makeStringSeries',\n", " 'makeTimeDataFrame',\n", " 'makeTimeSeries',\n", " 'makeTimedeltaIndex',\n", " 'makeUIntIndex',\n", " 'makeUnicodeIndex']" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NOTE: pd.util.testing is deprecated in pandas 1.0\n", "[x for x in dir(pd.util.testing) if x.startswith('make')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, keep in mind that most of these have no arguments and no docstring, and none of them are listed in the pandas documentation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 16. Identify rows that are missing from a DataFrame" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a small example DataFrame:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
00.00.0foo12009-01-01
11.01.0foo22009-01-02
22.00.0foo32009-01-05
33.01.0foo42009-01-06
44.00.0foo52009-01-07
\n", "
" ], "text/plain": [ " A B C D\n", "0 0.0 0.0 foo1 2009-01-01\n", "1 1.0 1.0 foo2 2009-01-02\n", "2 2.0 0.0 foo3 2009-01-05\n", "3 3.0 1.0 foo4 2009-01-06\n", "4 4.0 0.0 foo5 2009-01-07" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# NOTE: pd.util.testing is deprecated in pandas 1.0\n", "df1 = pd.util.testing.makeMixedDataFrame()\n", "df1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then let's create a copy of that DataFrame in which rows 2 and 3 are missing:" ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
00.00.0foo12009-01-01
11.01.0foo22009-01-02
44.00.0foo52009-01-07
\n", "
" ], "text/plain": [ " A B C D\n", "0 0.0 0.0 foo1 2009-01-01\n", "1 1.0 1.0 foo2 2009-01-02\n", "4 4.0 0.0 foo5 2009-01-07" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = df1.drop([2, 3], axis='rows')\n", "df2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we needed to identify which rows are missing from the second DataFrame? The easiest way to do this would be to merge the two DataFrames using a left join and set `indicator=True`:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD_merge
00.00.0foo12009-01-01both
11.01.0foo22009-01-02both
22.00.0foo32009-01-05left_only
33.01.0foo42009-01-06left_only
44.00.0foo52009-01-07both
\n", "
" ], "text/plain": [ " A B C D _merge\n", "0 0.0 0.0 foo1 2009-01-01 both\n", "1 1.0 1.0 foo2 2009-01-02 both\n", "2 2.0 0.0 foo3 2009-01-05 left_only\n", "3 3.0 1.0 foo4 2009-01-06 left_only\n", "4 4.0 0.0 foo5 2009-01-07 both" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3 = pd.merge(df1, df2, how='left', indicator=True)\n", "df3" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This adds a column to the DataFrame which shows the source of each row.\n", "\n", "In order to locate the rows that were missing from \"df2\", we simply filter \"df3\" to show the rows that were only present in the left DataFrame:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD_merge
22.00.0foo32009-01-05left_only
33.01.0foo42009-01-06left_only
\n", "
" ], "text/plain": [ " A B C D _merge\n", "2 2.0 0.0 foo3 2009-01-05 left_only\n", "3 3.0 1.0 foo4 2009-01-06 left_only" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df3[df3._merge == 'left_only']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can see that rows 2 and 3 were the missing rows." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 17. Use query to avoid intermediate variables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take another look at the stocks DataFrame:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateCloseVolumeSymbol
02016-10-0331.5014070500CSCO
12016-10-03112.5221701800AAPL
22016-10-0357.4219189500MSFT
32016-10-04113.0029736800AAPL
42016-10-0457.2420085900MSFT
52016-10-0431.3518460400CSCO
62016-10-0557.6416726400MSFT
72016-10-0531.5911808600CSCO
82016-10-05113.0521453100AAPL
\n", "
" ], "text/plain": [ " Date Close Volume Symbol\n", "0 2016-10-03 31.50 14070500 CSCO\n", "1 2016-10-03 112.52 21701800 AAPL\n", "2 2016-10-03 57.42 19189500 MSFT\n", "3 2016-10-04 113.00 29736800 AAPL\n", "4 2016-10-04 57.24 20085900 MSFT\n", "5 2016-10-04 31.35 18460400 CSCO\n", "6 2016-10-05 57.64 16726400 MSFT\n", "7 2016-10-05 31.59 11808600 CSCO\n", "8 2016-10-05 113.05 21453100 AAPL" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you wanted to filter the DataFrame to only show rows in which the Symbol is \"AAPL\", this is the usual approach:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateCloseVolumeSymbol
12016-10-03112.5221701800AAPL
32016-10-04113.0029736800AAPL
82016-10-05113.0521453100AAPL
\n", "
" ], "text/plain": [ " Date Close Volume Symbol\n", "1 2016-10-03 112.52 21701800 AAPL\n", "3 2016-10-04 113.00 29736800 AAPL\n", "8 2016-10-05 113.05 21453100 AAPL" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks[stocks.Symbol == 'AAPL']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, this can also be done using the `query()` method:" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateCloseVolumeSymbol
12016-10-03112.5221701800AAPL
32016-10-04113.0029736800AAPL
82016-10-05113.0521453100AAPL
\n", "
" ], "text/plain": [ " Date Close Volume Symbol\n", "1 2016-10-03 112.52 21701800 AAPL\n", "3 2016-10-04 113.00 29736800 AAPL\n", "8 2016-10-05 113.05 21453100 AAPL" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks.query(\"Symbol == 'AAPL'\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are three things worth noting about the `query()` method:\n", "\n", "1. You don't have to repeat the name of the DataFrame within the query string.\n", "2. The entire condition is expressed as a string, thus you lose any syntax highlighting.\n", "3. Since there is a string within the condition, you have to use single quotes with the inner string and double quotes with the outer string.\n", "\n", "Let's look at another example that shows the real usefulness of `query()`. First let's group by \"Symbol\" and then take the mean of all numeric columns:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CloseVolume
Symbol
AAPL112.8566672.429723e+07
CSCO31.4800001.477983e+07
MSFT57.4333331.866727e+07
\n", "
" ], "text/plain": [ " Close Volume\n", "Symbol \n", "AAPL 112.856667 2.429723e+07\n", "CSCO 31.480000 1.477983e+07\n", "MSFT 57.433333 1.866727e+07" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks.groupby('Symbol').mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if I wanted to filter this DataFrame to only show rows in which \"Close\" is less than 100? The usual approach would be to create a temporary DataFrame and then filter that:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CloseVolume
Symbol
CSCO31.4800001.477983e+07
MSFT57.4333331.866727e+07
\n", "
" ], "text/plain": [ " Close Volume\n", "Symbol \n", "CSCO 31.480000 1.477983e+07\n", "MSFT 57.433333 1.866727e+07" ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "temp = stocks.groupby('Symbol').mean()\n", "temp[temp.Close < 100]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But `query()` works even better in this situation, since you can avoid creating an intermediate object:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CloseVolume
Symbol
CSCO31.4800001.477983e+07
MSFT57.4333331.866727e+07
\n", "
" ], "text/plain": [ " Close Volume\n", "Symbol \n", "CSCO 31.480000 1.477983e+07\n", "MSFT 57.433333 1.866727e+07" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks.groupby('Symbol').mean().query('Close < 100')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In fact, `query()` is a great solution to our previous trick, because it would have allowed us to filter the merged DataFrame without creating the \"df3\" object:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD_merge
22.00.0foo32009-01-05left_only
33.01.0foo42009-01-06left_only
\n", "
" ], "text/plain": [ " A B C D _merge\n", "2 2.0 0.0 foo3 2009-01-05 left_only\n", "3 3.0 1.0 foo4 2009-01-06 left_only" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df1, df2, how='left', indicator=True).query(\"_merge == 'left_only'\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 18. Reshape a DataFrame from wide format to long format" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create another example DataFrame:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
zipfactorywarehouseretail
012345100200300
134567400500600
267890700800900
\n", "
" ], "text/plain": [ " zip factory warehouse retail\n", "0 12345 100 200 300\n", "1 34567 400 500 600\n", "2 67890 700 800 900" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "distances = pd.DataFrame([['12345', 100, 200, 300], ['34567', 400, 500, 600], ['67890', 700, 800, 900]],\n", " columns=['zip', 'factory', 'warehouse', 'retail'])\n", "distances" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's pretend that a manufacturing company has three locations: a factory, a warehouse, and a retail store. They've created the DataFrame above, which shows the distance between every US zip code and that particular location.\n", "\n", "Let's create one more DataFrame:" ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user_idziplocation_type
0112345factory
1234567warehouse
\n", "
" ], "text/plain": [ " user_id zip location_type\n", "0 1 12345 factory\n", "1 2 34567 warehouse" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "users = pd.DataFrame([[1, '12345', 'factory'], [2, '34567', 'warehouse']],\n", " columns=['user_id', 'zip', 'location_type'])\n", "users" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a DataFrame of users. It shows the user's zip code and the location they would like to visit. We want to add a fourth column to \"users\", which shows the distance between that user and the location they want to visit. This information is available in the \"distances\" DataFrame, but how do we get it into the \"users\" DataFrame?\n", "\n", "We actually need to merge the DataFrames, but the problem is that the \"distances\" DataFrame doesn't have the right columns for merging. The solution is to reshape it using the `melt()` method:" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ziplocation_typedistance
012345factory100
134567factory400
267890factory700
312345warehouse200
434567warehouse500
567890warehouse800
612345retail300
734567retail600
867890retail900
\n", "
" ], "text/plain": [ " zip location_type distance\n", "0 12345 factory 100\n", "1 34567 factory 400\n", "2 67890 factory 700\n", "3 12345 warehouse 200\n", "4 34567 warehouse 500\n", "5 67890 warehouse 800\n", "6 12345 retail 300\n", "7 34567 retail 600\n", "8 67890 retail 900" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "distances_long = distances.melt(id_vars='zip', var_name='location_type', value_name='distance')\n", "distances_long" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We've reshaped the \"distances\" DataFrame from \"wide format\", meaning lots of columns, to \"long format\", meaning lots of rows. It contains the same data as before, but it's now structured such that it can easily be merged with the \"users\" DataFrame:" ] }, { "cell_type": "code", "execution_count": 78, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
user_idziplocation_typedistance
0112345factory100
1234567warehouse500
\n", "
" ], "text/plain": [ " user_id zip location_type distance\n", "0 1 12345 factory 100\n", "1 2 34567 warehouse 500" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(users, distances_long)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you're ever confused about \"wide\" versus \"long\" data, the easiest way to recognize a \"wide format\" DataFrame is that it doesn't tell you what you're looking at. For example, it doesn't tell me what these numbers represent, and it doesn't tell me what these column names represent. In contrast, the \"long format\" DataFrame tells you that the numbers represent distance and these names represent location types." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 19. Reverse row order (alternative)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You might remember the drinks DataFrame from the main video:" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
0Afghanistan0000.0Asia
1Albania89132544.9Europe
2Algeria250140.7Africa
3Andorra24513831212.4Europe
4Angola21757455.9Africa
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "0 Afghanistan 0 0 0 \n", "1 Albania 89 132 54 \n", "2 Algeria 25 0 14 \n", "3 Andorra 245 138 312 \n", "4 Angola 217 57 45 \n", "\n", " total_litres_of_pure_alcohol continent \n", "0 0.0 Asia \n", "1 4.9 Europe \n", "2 0.7 Africa \n", "3 12.4 Europe \n", "4 5.9 Africa " ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the method that I taught in the main video for reversing row order, because it will always work:" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
192Zimbabwe641844.7Africa
191Zambia321942.5Africa
190Yemen6000.1Asia
189Vietnam111212.0Asia
188Venezuela33310037.7South America
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "192 Zimbabwe 64 18 4 \n", "191 Zambia 32 19 4 \n", "190 Yemen 6 0 0 \n", "189 Vietnam 111 2 1 \n", "188 Venezuela 333 100 3 \n", "\n", " total_litres_of_pure_alcohol continent \n", "192 4.7 Africa \n", "191 2.5 Africa \n", "190 0.1 Asia \n", "189 2.0 Asia \n", "188 7.7 South America " ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks.loc[::-1].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can use Python's built-in `reversed()` function to reverse the index, and then use that to reindex the DataFrame:" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrybeer_servingsspirit_servingswine_servingstotal_litres_of_pure_alcoholcontinent
192Zimbabwe641844.7Africa
191Zambia321942.5Africa
190Yemen6000.1Asia
189Vietnam111212.0Asia
188Venezuela33310037.7South America
\n", "
" ], "text/plain": [ " country beer_servings spirit_servings wine_servings \\\n", "192 Zimbabwe 64 18 4 \n", "191 Zambia 32 19 4 \n", "190 Yemen 6 0 0 \n", "189 Vietnam 111 2 1 \n", "188 Venezuela 333 100 3 \n", "\n", " total_litres_of_pure_alcohol continent \n", "192 4.7 Africa \n", "191 2.5 Africa \n", "190 0.1 Asia \n", "189 2.0 Asia \n", "188 7.7 South America " ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks.reindex(reversed(drinks.index)).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you decide to use this alternative method, be aware that it will fail if the DataFrame has duplicate values in the index. To demonstrate this, let's give the stocks DataFrame a non-unique index:" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CloseVolumeSymbol
Date
2016-10-0331.5014070500CSCO
2016-10-03112.5221701800AAPL
2016-10-0357.4219189500MSFT
2016-10-04113.0029736800AAPL
2016-10-0457.2420085900MSFT
2016-10-0431.3518460400CSCO
2016-10-0557.6416726400MSFT
2016-10-0531.5911808600CSCO
2016-10-05113.0521453100AAPL
\n", "
" ], "text/plain": [ " Close Volume Symbol\n", "Date \n", "2016-10-03 31.50 14070500 CSCO\n", "2016-10-03 112.52 21701800 AAPL\n", "2016-10-03 57.42 19189500 MSFT\n", "2016-10-04 113.00 29736800 AAPL\n", "2016-10-04 57.24 20085900 MSFT\n", "2016-10-04 31.35 18460400 CSCO\n", "2016-10-05 57.64 16726400 MSFT\n", "2016-10-05 31.59 11808600 CSCO\n", "2016-10-05 113.05 21453100 AAPL" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks = stocks.set_index('Date')\n", "stocks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the index above is not unique, this will result in an error:" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "# stocks.reindex(reversed(stocks.index))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 20. Reverse column order (alternative)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the method that I taught in the main video for reversing column order, because it will always work:" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
continenttotal_litres_of_pure_alcoholwine_servingsspirit_servingsbeer_servingscountry
0Asia0.0000Afghanistan
1Europe4.95413289Albania
2Africa0.714025Algeria
3Europe12.4312138245Andorra
4Africa5.94557217Angola
\n", "
" ], "text/plain": [ " continent total_litres_of_pure_alcohol wine_servings spirit_servings \\\n", "0 Asia 0.0 0 0 \n", "1 Europe 4.9 54 132 \n", "2 Africa 0.7 14 0 \n", "3 Europe 12.4 312 138 \n", "4 Africa 5.9 45 57 \n", "\n", " beer_servings country \n", "0 0 Afghanistan \n", "1 89 Albania \n", "2 25 Algeria \n", "3 245 Andorra \n", "4 217 Angola " ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks.loc[:, ::-1].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can use Python's built-in `reversed()` function to reverse the columns attribute, and then pass that as a filter to the DataFrame:" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
continenttotal_litres_of_pure_alcoholwine_servingsspirit_servingsbeer_servingscountry
0Asia0.0000Afghanistan
1Europe4.95413289Albania
2Africa0.714025Algeria
3Europe12.4312138245Andorra
4Africa5.94557217Angola
\n", "
" ], "text/plain": [ " continent total_litres_of_pure_alcohol wine_servings spirit_servings \\\n", "0 Asia 0.0 0 0 \n", "1 Europe 4.9 54 132 \n", "2 Africa 0.7 14 0 \n", "3 Europe 12.4 312 138 \n", "4 Africa 5.9 45 57 \n", "\n", " beer_servings country \n", "0 0 Afghanistan \n", "1 89 Albania \n", "2 25 Algeria \n", "3 245 Andorra \n", "4 217 Angola " ] }, "execution_count": 85, "metadata": {}, "output_type": "execute_result" } ], "source": [ "drinks[reversed(drinks.columns)].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you decide to use this alternative method, be aware that it will fail if the DataFrame has duplicate column names. To demonstrate this, let's rename two of the columns in the stocks DataFrame:" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CloseXYZXYZ
Date
2016-10-0331.5014070500CSCO
2016-10-03112.5221701800AAPL
2016-10-0357.4219189500MSFT
2016-10-04113.0029736800AAPL
2016-10-0457.2420085900MSFT
2016-10-0431.3518460400CSCO
2016-10-0557.6416726400MSFT
2016-10-0531.5911808600CSCO
2016-10-05113.0521453100AAPL
\n", "
" ], "text/plain": [ " Close XYZ XYZ\n", "Date \n", "2016-10-03 31.50 14070500 CSCO\n", "2016-10-03 112.52 21701800 AAPL\n", "2016-10-03 57.42 19189500 MSFT\n", "2016-10-04 113.00 29736800 AAPL\n", "2016-10-04 57.24 20085900 MSFT\n", "2016-10-04 31.35 18460400 CSCO\n", "2016-10-05 57.64 16726400 MSFT\n", "2016-10-05 31.59 11808600 CSCO\n", "2016-10-05 113.05 21453100 AAPL" ] }, "execution_count": 86, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks = stocks.rename({'Symbol':'XYZ', 'Volume':'XYZ'}, axis='columns')\n", "stocks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since the column names are not unique, you will get multiple copies of those columns:" ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
XYZXYZXYZXYZClose
Date
2016-10-0314070500CSCO14070500CSCO31.50
2016-10-0321701800AAPL21701800AAPL112.52
2016-10-0319189500MSFT19189500MSFT57.42
2016-10-0429736800AAPL29736800AAPL113.00
2016-10-0420085900MSFT20085900MSFT57.24
2016-10-0418460400CSCO18460400CSCO31.35
2016-10-0516726400MSFT16726400MSFT57.64
2016-10-0511808600CSCO11808600CSCO31.59
2016-10-0521453100AAPL21453100AAPL113.05
\n", "
" ], "text/plain": [ " XYZ XYZ XYZ XYZ Close\n", "Date \n", "2016-10-03 14070500 CSCO 14070500 CSCO 31.50\n", "2016-10-03 21701800 AAPL 21701800 AAPL 112.52\n", "2016-10-03 19189500 MSFT 19189500 MSFT 57.42\n", "2016-10-04 29736800 AAPL 29736800 AAPL 113.00\n", "2016-10-04 20085900 MSFT 20085900 MSFT 57.24\n", "2016-10-04 18460400 CSCO 18460400 CSCO 31.35\n", "2016-10-05 16726400 MSFT 16726400 MSFT 57.64\n", "2016-10-05 11808600 CSCO 11808600 CSCO 31.59\n", "2016-10-05 21453100 AAPL 21453100 AAPL 113.05" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "stocks[reversed(stocks.columns)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 21. Split a string into multiple columns (alternative)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's an example DataFrame:" ] }, { "cell_type": "code", "execution_count": 88, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namelocation
0John Arthur DoeLos Angeles, CA
1Jane Ann SmithWashington, DC
\n", "
" ], "text/plain": [ " name location\n", "0 John Arthur Doe Los Angeles, CA\n", "1 Jane Ann Smith Washington, DC" ] }, "execution_count": 88, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'name':['John Arthur Doe', 'Jane Ann Smith'], 'location':['Los Angeles, CA', 'Washington, DC']})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the method that I taught in the main video for splitting the \"name\" string into multiple columns:" ] }, { "cell_type": "code", "execution_count": 89, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namelocationfirstmiddlelast
0John Arthur DoeLos Angeles, CAJohnArthurDoe
1Jane Ann SmithWashington, DCJaneAnnSmith
\n", "
" ], "text/plain": [ " name location first middle last\n", "0 John Arthur Doe Los Angeles, CA John Arthur Doe\n", "1 Jane Ann Smith Washington, DC Jane Ann Smith" ] }, "execution_count": 89, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['first', 'middle', 'last']] = df.name.str.split(' ', expand=True)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is an alternative method that also works:" ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namelocationfirstmiddlelast
0John Arthur DoeLos Angeles, CAJohnArthurDoe
1Jane Ann SmithWashington, DCJaneAnnSmith
\n", "
" ], "text/plain": [ " name location first middle last\n", "0 John Arthur Doe Los Angeles, CA John Arthur Doe\n", "1 Jane Ann Smith Washington, DC Jane Ann Smith" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['first'], df['middle'], df['last'] = zip(*df.name.str.split(' '))\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's how the alternative method works. First, `str.split()` splits on a space character and returns a Series of two lists:" ] }, { "cell_type": "code", "execution_count": 91, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 [John, Arthur, Doe]\n", "1 [Jane, Ann, Smith]\n", "Name: name, dtype: object" ] }, "execution_count": 91, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.name.str.split(' ')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then, you unpack the Series using the asterisk, and zip the lists back together using the `zip()` function:" ] }, { "cell_type": "code", "execution_count": 92, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('John', 'Jane'), ('Arthur', 'Ann'), ('Doe', 'Smith')]" ] }, "execution_count": 92, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(zip(*df.name.str.split(' ')))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first, middle, and last names are now paired together as tuples. These tuples become three new DataFrame columns through multiple assignment:" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namelocationfirstmiddlelast
0John Arthur DoeLos Angeles, CAJohnArthurDoe
1Jane Ann SmithWashington, DCJaneAnnSmith
\n", "
" ], "text/plain": [ " name location first middle last\n", "0 John Arthur Doe Los Angeles, CA John Arthur Doe\n", "1 Jane Ann Smith Washington, DC Jane Ann Smith" ] }, "execution_count": 93, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['first'], df['middle'], df['last'] = zip(*df.name.str.split(' '))\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Want more tricks? Watch [My top 25 pandas tricks](https://www.youtube.com/watch?v=RlIiVeig3hc&list=PL5-da3qGB5ICCsgW1MxlZ0Hq8LL5U3u9y&index=35) or [Read the notebook](https://nbviewer.jupyter.org/github/justmarkham/pandas-videos/blob/master/top_25_pandas_tricks.ipynb)\n", "\n", "© 2022 [Data School](https://www.dataschool.io). All rights reserved." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 2 }