{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas Missing Values, Datetime, Aggregation, and Merging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lecture Notes and in-class exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ First, run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import unittest\n", "tc = unittest.TestCase()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 👇 Tasks\n", "\n", "- ✔️ Import the following Python packages.\n", " 1. `pandas`: Use alias `pd`.\n", " 2. `numpy`: Use alias `np`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "import pandas as pd\n", "import numpy as np\n", "### END SOLUTION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "tc.assertTrue('pd' in globals(), 'Check whether you have correctly imported Pandas with an alias.')\n", "tc.assertTrue('np' in globals(), 'Check whether you have correctly imported NumPy with an alias.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 👉 Working with Missing Values\n", "\n", "A dataset commonly contains one or more missing values. Pandas provides a flexible set of methods for working with missing values. Missing values can be represented in multiple ways. 
Here are a few common notations that denote a missing value.\n", "\n", "- `NaN`: NaN stands for \"not a number\".\n", "- `None`: Python's built-in type (equivalent to `null` in other programming languages).\n", "- `NA`: A native scalar to denote a missing value in `pandas`.\n", "- `NaT`: NaT stands for \"not a time\". " ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "nan\n" ] } ], "source": [ "# NaN\n", "nan_example = np.nan\n", "print(nan_example)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "None\n" ] } ], "source": [ "# None\n", "none_example = None\n", "print(none_example)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<NA>\n" ] } ], "source": [ "# NA\n", "na_example = pd.NA\n", "print(na_example)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "NaT\n" ] } ], "source": [ "# NaT\n", "nat_example = pd.NaT\n", "print(nat_example)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 📌 Load data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run the code cell below to create a new `DataFrame` named `df_people`." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name major1 major2 city \\\n", "0 Dwight Boon Accountancy Marketing Fairfax \n", "1 Jaclyn Kay Marketing NaN Ploiesti \n", "2 Wade Kitchens Supply Chain Management Marketing Naperville \n", "3 Rajeev Frye Statistics NaN Chicago \n", "4 Rowland Waldo Nondegree NaN Grayslake \n", "\n", " fav_restaurant fav_movie has_iphone \n", "0 Chick-fil-a Ford vs Ferrari False \n", "1 NaN NaN False \n", "2 Noodles and Company The Mummy True \n", "3 Soho House The Shawshank Redemption False \n", "4 Chipotle Inception True " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_people = pd.read_csv(\"https://raw.githubusercontent.com/UI-Deloitte-business-analytics-center/datasets/main/people-sample.csv\")\n", "\n", "# Used to keep a clean copy\n", "df_people_backup = df_people.copy()\n", "\n", "# head() displays the first 5 rows of a DataFrame\n", "df_people.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The table below describes each column in `df_people`." 
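, "\n", "\n",
"A quick way to see how many missing values each column contains is `isna().sum()`. A minimal sketch on a tiny made-up frame (not the real dataset):\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"df = pd.DataFrame({'major2': ['Marketing', np.nan, np.nan],\n",
"                   'fav_movie': ['Inception', np.nan, 'Rounders']})\n",
"\n",
"# isna() marks each missing cell as True; sum() then counts them per column\n",
"missing_per_column = df.isna().sum()\n",
"print(missing_per_column)\n",
"```"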
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "| Column Name | Description |\n", "|-------------------------|-----------------------------------------------------------|\n", "| name | Full name |\n", "| major1 | Major |\n", "| major2 | Second major OR minor (blank if no second major or minor) |\n", "| city | City the person is from |\n", "| fav_restaurant | Favorite restaurant (blank if no restaurant was given) |\n", "| fav_movie | Favorite movie (blank if no movie was given) |\n", "| has_iphone | Whether the person uses an iPhone |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Challenge 1: People from Grayslake" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 👇 Tasks\n", "\n", "- ✔️ Using `df_people`, filter rows where the person is from `\"Grayslake\"`.\n", " - Check whether the `city` column is equal to `\"Grayslake\"`.\n", " - Store the result to a new variable named `df_grayslake`.\n", "- ✔️ `df_people` should remain unaltered after your code." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name major1 major2 city fav_restaurant \\\n", "4 Rowland Waldo Nondegree NaN Grayslake Chipotle \n", "19 Maud Dickenson Accountancy NaN Grayslake Big Bowl \n", "\n", " fav_movie has_iphone \n", "4 Inception True \n", "19 The Trial of Chicago 7 True " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_grayslake = df_people[df_people[\"city\"] == \"Grayslake\"]\n", "### END SOLUTION\n", "\n", "df_grayslake" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-03", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# df_people should remain unaltered\n", "pd.testing.assert_frame_equal(df_people, df_people_backup)\n", "\n", "pd.testing.assert_frame_equal(df_grayslake.sort_values(df_grayslake.columns.tolist()).reset_index(drop=True),\n", " df_people_backup.query(f'{\"cItY\".lower()} == \"{\"GraYsLakE\".capitalize()}\"')\n", " .sort_values(df_people_backup.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Challenge 2: Anyone with a non-missing `major2`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 👇 Tasks\n", "\n", "- ✔️ Using `df_people`, filter rows where the person has a second major or a minor.\n", " - You're looking for rows where `major2` is not `NaN`.\n", "- ✔️ `NaN` is a special value to denote a missing value. 
You must use `my_series.isna()` or `my_series.notna()` to check whether a row contains a missing value or not.\n", "- ✔️ Store the result to a new variable named `df_major2`.\n", "- ✔️ `df_people` should remain unaltered after your code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🚀 Hints\n", "\n", "- `my_series.notna()` can be used to check whether a row contains a missing value or not.\n", "\n", "![notna](https://github.com/bdi475/images/blob/main/pandas/notna-series.png?raw=true)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name major1 major2 \\\n", "0 Dwight Boon Accountancy Marketing \n", "2 Wade Kitchens Supply Chain Management Marketing \n", "9 Eldred Bullard Econometrics & Quant Econ Computer Science \n", "10 Conrad Sargent Advertising Marketing \n", "11 Berenice Lewis Computer Science & Econom Business \n", "12 Monica Morris Statistics & Computer Sci Business \n", "14 Sara Janson Computer Science & Econom Statistics \n", "15 Inderjeet Joyner Accountancy Finance \n", "17 Tate Parent Accountancy Finance \n", "20 Flynn Woodrow Accountancy Supply Chain Management \n", "29 Bailey Farnham Computer Science Business \n", "\n", " city fav_restaurant fav_movie has_iphone \n", "0 Fairfax Chick-fil-a Ford vs Ferrari False \n", "2 Naperville Noodles and Company The Mummy True \n", "9 Evanston Cravings TRON: Legacy False \n", "10 Evanston Dominos NaN True \n", "11 Urbana Olive Garden NaN True \n", "12 Oak Park Chipotle Rounders True \n", "14 Chicago Texas Roadhouse Stepbrothers True \n", "15 Highland Park Sushi King Wolf of Wall Street True \n", "17 Urbana Pizzaria Antica NaN True \n", "20 Wuxi Haidilao Interstellar True \n", "29 Rockford Chipotle NaN True " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_major2 = df_people[df_people[\"major2\"].notna()]\n", "### END SOLUTION\n", "\n", "df_major2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." 
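, "\n", "\n",
"If the check fails, one common cause is comparing with `==` instead of using `notna()`. `NaN` compares unequal to everything, including itself, so an `==` comparison silently matches nothing. A standalone sketch:\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"s = pd.Series(['Marketing', np.nan])\n",
"\n",
"# == never matches NaN, so this counts zero matches\n",
"print((s == np.nan).sum())  # 0\n",
"\n",
"# notna() is the reliable way to find non-missing values\n",
"print(s.notna().sum())      # 1\n",
"```"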
] }, { "cell_type": "code", "execution_count": 12, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-04", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# df_people should remain unaltered\n", "pd.testing.assert_frame_equal(df_people, df_people_backup)\n", "\n", "# NaN never equals itself, so the query 'major2 == major2' keeps only non-missing rows\n", "pd.testing.assert_frame_equal(df_major2.sort_values(df_major2.columns.tolist()).reset_index(drop=True),\n", " df_people_backup.query(f\"major2 == major2\")\n", " .sort_values(df_people_backup.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Challenge 3: Anyone without a favorite movie\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Using `df_people`, filter rows where the person's `fav_movie` is `NaN`.\n", "- ✔️ `NaN` is a special value to denote a missing value. You must use `my_series.isna()` or `my_series.notna()` to compare a `Series` with `NaN`.\n", "- ✔️ Store the result to a new variable named `df_no_fav_movie`.\n", "- ✔️ `df_people` should remain unaltered after your code.\n", "\n", "#### 🚀 Hints\n", "\n", "- `my_series.isna()` can be used to check whether a row contains a missing value." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " name major1 major2 city \\\n", "1 Jaclyn Kay Marketing NaN Ploiesti \n", "10 Conrad Sargent Advertising Marketing Evanston \n", "11 Berenice Lewis Computer Science & Econom Business Urbana \n", "16 Kimberlee Gupta Computer Science NaN Naperville \n", "17 Tate Parent Accountancy Finance Urbana \n", "21 Erick Hollins Psychology NaN Orland Park \n", "22 Marsha Brett Accountancy NaN Lake Forest \n", "23 Chanda Neal Nondegree NaN Buffalo Grove \n", "24 Lei Tyler Mathematics NaN Vernon Hills \n", "26 Xinyi Cookson Chemical Engineering NaN Northbrook \n", "27 Layton Overton Information Sciences NaN Champaign \n", "28 Reshmi Backus Nondegree NaN Urbana \n", "29 Bailey Farnham Computer Science Business Rockford \n", "30 Prabhakar Ryley Econometrics & Quant Econ NaN Dubai \n", "31 Monday Clifford Economics NaN Fremont \n", "33 Miranda Steele Statistics NaN Champaign \n", "34 Viraj Tatham Information Sciences NaN Champaign \n", "36 Yuwen Xiang Statistics NaN Northbrook \n", "37 Eleanore Powers Accountancy NaN Vernon Hills \n", "38 Sachin Weston Computer Science & Econom NaN Naperville \n", "\n", " fav_restaurant fav_movie has_iphone \n", "1 NaN NaN False \n", "10 Dominos NaN True \n", "11 Olive Garden NaN True \n", "16 NaN NaN True \n", "17 Pizzaria Antica NaN True \n", "21 NaN NaN True \n", "22 NaN NaN True \n", "23 McDonalds NaN True \n", "24 NaN NaN True \n", "26 Burger King NaN False \n", "27 Noodles and Company NaN True \n", "28 Haidilao NaN True \n", "29 Chipotle NaN True \n", "30 California Pizza Kitchen NaN True \n", "31 Olive Garden NaN True \n", "33 NaN NaN True \n", "34 NaN NaN True \n", "36 NaN NaN True \n", "37 Chick-fil-a NaN False \n", "38 McDonalds NaN False " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_no_fav_movie = df_people[df_people[\"fav_movie\"].isna()]\n", "### END SOLUTION\n", "\n", "df_no_fav_movie" ] }, { "cell_type": "markdown", "metadata": {}, 
"source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-05", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# df_people should remain unaltered\n", "pd.testing.assert_frame_equal(df_people, df_people_backup)\n", "\n", "# NaN != NaN, so the query 'fav_movie != fav_movie' keeps only missing rows\n", "pd.testing.assert_frame_equal(df_no_fav_movie.sort_values(df_no_fav_movie.columns.tolist()).reset_index(drop=True),\n", " df_people_backup.query(f\"fav_movie != fav_movie\")\n", " .sort_values(df_people_backup.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## 🗓️ Working with Datetime Values\n", "\n", "You will often see date-looking strings in your data. A few examples (all denoting March 15, 2021) are:\n", "\n", "- `20210315`\n", "- `Mar 15, 2021`\n", "- `2021-03-15`\n", "- `2021/3/15`\n", "\n", "In this part, we'll discuss how we can *parse* and utilize datetime values." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 📌 Load employees data\n", "\n", "▶️ Run the code cell below to create a new `DataFrame` named `df_emp`." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " emp_id name dept office_phone start_date salary\n", "0 30 Talal Finance (217)123-4500 2017-05-01 202000\n", "1 40 Josh Purchase NaN 2018-02-01 185000\n", "2 10 Anika Finance NaN 2020-08-01 240000\n", "3 20 Aishani Purchase (217)987-6600 2019-12-01 160500" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_emp = pd.DataFrame({\n", " 'emp_id': [30, 40, 10, 20],\n", " 'name': ['Talal', 'Josh', 'Anika', 'Aishani'],\n", " 'dept': ['Finance', 'Purchase', 'Finance', 'Purchase'],\n", " 'office_phone': ['(217)123-4500', np.nan, np.nan, '(217)987-6600'],\n", " 'start_date': ['2017-05-01', '2018-02-01', '2020-08-01', '2019-12-01'],\n", " 'salary': [202000, 185000, 240000, 160500]\n", "})\n", "\n", "# Used for intermediate checks\n", "df_emp_backup = df_emp.copy()\n", "\n", "df_emp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Question**: What is the data type of the `start_date` column?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run `str(df_emp['start_date'].dtype)` below to see the data type of the `start_date` column." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'object'" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "str(df_emp['start_date'].dtype)\n", "### END SOLUTION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While `object` can refer to many different types, you can safely assume that all `object` data types you see in this course refer to strings." 
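, "\n", "\n",
"You can verify this by checking the Python type of the stored values. A small sketch with a hypothetical series (not `df_emp`):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"s = pd.Series(['2017-05-01', '2018-02-01'])\n",
"\n",
"# The column dtype is the generic 'object'...\n",
"print(str(s.dtype))  # object\n",
"\n",
"# ...but each stored value is a plain Python str\n",
"print(all(isinstance(v, str) for v in s))  # True\n",
"```"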
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Challenge 4: Parse a string column as datetime\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Parse `start_date` to a `datetime` data type.\n", "- ✔️ Store the result to a new column named `start_date_parsed`.\n", "\n", "#### 🚀 Hints\n", "\n", "The code below converts `date_str` column to a `datetime`-typed column and stores the converted result to a new column named `date_parsed`.\n", "\n", "```python\n", "my_dataframe['date_parsed'] = pd.to_datetime(my_dataframe['date_str'])\n", "```" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " emp_id name dept office_phone start_date salary \\\n", "0 30 Talal Finance (217)123-4500 2017-05-01 202000 \n", "1 40 Josh Purchase NaN 2018-02-01 185000 \n", "2 10 Anika Finance NaN 2020-08-01 240000 \n", "3 20 Aishani Purchase (217)987-6600 2019-12-01 160500 \n", "\n", " start_date_parsed \n", "0 2017-05-01 \n", "1 2018-02-01 \n", "2 2020-08-01 \n", "3 2019-12-01 " ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_emp['start_date_parsed'] = pd.to_datetime(df_emp['start_date'])\n", "### END SOLUTION\n", "\n", "df_emp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "tc.assertEqual(set(df_emp.columns), set(df_emp_backup.columns.tolist() + ['start_date_parsed']))\n", "pd.testing.assert_series_equal(df_emp['start_date_parsed'].reset_index(drop=True),\n", " pd.to_datetime(df_emp_backup['_'.join(['sTarT', 'DaTe']).lower()])\n", " .reset_index(drop=True),\n", " check_names=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "▶️ Run `str(df_emp['start_date_parsed'].dtype)` below to see the data type of the `start_date_parsed` column."
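, "\n", "\n",
"A related note: by default, `pd.to_datetime` raises an error on a string it cannot parse. Passing `errors='coerce'` turns such strings into `NaT` instead, which then behaves like any other missing value. A small sketch (not part of the exercise):\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"raw = pd.Series(['2021-03-15', 'not a date'])\n",
"parsed = pd.to_datetime(raw, errors='coerce')\n",
"\n",
"# The unparseable entry becomes NaT, and isna() treats it as missing\n",
"print(parsed.isna().tolist())  # [False, True]\n",
"```"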
] }, { "cell_type": "code", "execution_count": 19, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-06", "locked": true, "points": "1", "solution": false } }, "outputs": [ { "data": { "text/plain": [ "'datetime64[ns]'" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "str(df_emp['start_date_parsed'].dtype)\n", "### END SOLUTION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Challenge 5: Drop `start_date` column *in-place*\n", "\n", "We no longer need the `start_date` column. We'll work with the new `start_date_parsed` column from this point on.\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Drop the `start_date` column from `df_emp` *in-place*.\n", "\n", "#### 🚀 Hints\n", "\n", "The code below drops `col1` from `my_dataframe` *in-place* without creating a new variable.\n", "\n", "```python\n", "my_dataframe.drop(columns=['col1'], inplace=True)\n", "```" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " emp_id name dept office_phone salary start_date_parsed\n", "0 30 Talal Finance (217)123-4500 202000 2017-05-01\n", "1 40 Josh Purchase NaN 185000 2018-02-01\n", "2 10 Anika Finance NaN 240000 2020-08-01\n", "3 20 Aishani Purchase (217)987-6600 160500 2019-12-01" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_emp.drop(columns=['start_date'], inplace=True)\n", "### END SOLUTION\n", "\n", "df_emp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-07", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "df_check = df_emp_backup.copy()\n", "df_check['_'.join(['sTarT', 'DaTe', 'pArSeD']).lower()] = pd.to_datetime(df_check['start_date'])\n", "df_check = df_check.drop(columns=['start_date'])\n", "\n", "# Check result\n", "tc.assertEqual(set(df_emp.columns), set(['start_date_parsed', 'salary', 'office_phone', 'dept', 'name', 'emp_id']))\n", "pd.testing.assert_frame_equal(df_emp.sort_values(df_emp.columns.tolist()).reset_index(drop=True),\n", " df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Challenge 6: Rename `start_date_parsed` to `start_date`\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Rename `start_date_parsed` to `start_date` in `df_emp` *in-place*.\n", "\n", "#### 🚀 Hints\n", "\n", "The code below renames the `name_before` column to `name_after` in `my_dataframe` *in-place* without creating a new variable.\n", "\n", "```python\n", 
"my_dataframe.rename(columns={'name_before': 'name_after'}, inplace=True)\n", "```" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " emp_id name dept office_phone salary start_date\n", "0 30 Talal Finance (217)123-4500 202000 2017-05-01\n", "1 40 Josh Purchase NaN 185000 2018-02-01\n", "2 10 Anika Finance NaN 240000 2020-08-01\n", "3 20 Aishani Purchase (217)987-6600 160500 2019-12-01" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_emp.rename(columns={'start_date_parsed': 'start_date'}, inplace=True)\n", "### END SOLUTION\n", "\n", "df_emp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-08", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "df_check = df_emp_backup.copy()\n", "df_check['_'.join(['sTarT', 'DaTe', 'pArSeD']).lower()] = pd.to_datetime(df_check['start_date'])\n", "df_check = df_check.drop(columns=['start_date']).rename(columns={'start_date_parsed': 'start_date'})\n", "\n", "# Check result\n", "pd.testing.assert_frame_equal(df_emp.sort_values(df_emp.columns.tolist()).reset_index(drop=True),\n", " df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Challenge 7: Extract year from a datetime column\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Create a new column named `start_year` in `df_emp` that contains the starting years in integers (e.g., `2017`, `2018`).\n", "- ✔️ Extract values from `df_emp['start_date']`.\n", "\n", "#### 🚀 Hints\n", "\n", "The code extracts the year of a datetime column `my_date` and stores it to a new column named `my_year`.\n", "\n", "```python\n", 
"my_dataframe['my_year'] = my_dataframe['my_date'].dt.year\n", "```" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emp_idnamedeptoffice_phonesalarystart_datestart_year
030TalalFinance(217)123-45002020002017-05-012017
140JoshPurchaseNaN1850002018-02-012018
210AnikaFinanceNaN2400002020-08-012020
320AishaniPurchase(217)987-66001605002019-12-012019
\n", "
" ], "text/plain": [ " emp_id name dept office_phone salary start_date start_year\n", "0 30 Talal Finance (217)123-4500 202000 2017-05-01 2017\n", "1 40 Josh Purchase NaN 185000 2018-02-01 2018\n", "2 10 Anika Finance NaN 240000 2020-08-01 2020\n", "3 20 Aishani Purchase (217)987-6600 160500 2019-12-01 2019" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_emp['start_year'] = df_emp['start_date'].dt.year\n", "### END SOLUTION\n", "\n", "df_emp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-09", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "df_check = df_emp_backup.copy()\n", "df_check['_'.join(['sTarT', 'DaTe', 'pArSeD']).lower()] = pd.to_datetime(df_check['start_date'])\n", "df_check = df_check.drop(columns=['start_date']).rename(columns={'start_date_parsed': 'start_date'})\n", "df_check['_'.join(['sTarT', 'yEaR']).lower()] = df_check['_'.join(['sTarT', 'dAtE']).lower()].dt.year\n", "\n", "df_emp_check = df_emp.sort_values(df_emp.columns.tolist()).reset_index(drop=True)\n", "df_check = df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True)\n", "cols_to_check = ['emp_id', 'start_year']\n", "\n", "# Check result\n", "pd.testing.assert_frame_equal(df_emp_check[cols_to_check],\n", " df_check[cols_to_check])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Challenge 8: Extract month, day of month from a datetime column\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Create new columns named `start_month` and `start_day` in `df_emp` that contain 
the starting months and days of month in integers.\n", "- ✔️ Extract values from `df_emp['start_date']`.\n", "\n", "#### 🚀 Hints\n", "\n", "The code extracts the months and days of a datetime column `my_date` and stores it to two new columns.\n", "\n", "```python\n", "my_dataframe['my_month'] = my_dataframe['my_date'].dt.month\n", "my_dataframe['my_day'] = my_dataframe['my_date'].dt.day\n", "```" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emp_idnamedeptoffice_phonesalarystart_datestart_yearstart_monthstart_day
030TalalFinance(217)123-45002020002017-05-01201751
140JoshPurchaseNaN1850002018-02-01201821
210AnikaFinanceNaN2400002020-08-01202081
320AishaniPurchase(217)987-66001605002019-12-012019121
\n", "
" ], "text/plain": [ " emp_id name dept office_phone salary start_date start_year \\\n", "0 30 Talal Finance (217)123-4500 202000 2017-05-01 2017 \n", "1 40 Josh Purchase NaN 185000 2018-02-01 2018 \n", "2 10 Anika Finance NaN 240000 2020-08-01 2020 \n", "3 20 Aishani Purchase (217)987-6600 160500 2019-12-01 2019 \n", "\n", " start_month start_day \n", "0 5 1 \n", "1 2 1 \n", "2 8 1 \n", "3 12 1 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_emp['start_month'] = df_emp['start_date'].dt.month\n", "df_emp['start_day'] = df_emp['start_date'].dt.day\n", "### END SOLUTION\n", "\n", "df_emp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-10", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "df_check = df_emp_backup.copy()\n", "df_check['_'.join(['sTarT', 'DaTe', 'pArSeD']).lower()] = pd.to_datetime(df_check['start_date'])\n", "df_check = df_check.drop(columns=['start_date']).rename(columns={'start_date_parsed': 'start_date'})\n", "df_check['_'.join(['sTarT', 'mOnTh']).lower()] = df_check['_'.join(['sTarT', 'dAtE']).lower()].dt.month\n", "df_check['_'.join(['sTarT', 'dAy']).lower()] = df_check['_'.join(['sTarT', 'dAtE']).lower()].dt.day\n", "\n", "df_emp_check = df_emp.sort_values(df_emp.columns.tolist()).reset_index(drop=True)\n", "df_check = df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True)\n", "cols_to_check = ['emp_id', 'start_month', 'start_day']\n", "\n", "# Check result\n", "pd.testing.assert_frame_equal(df_emp_check[cols_to_check],\n", " df_check[cols_to_check])" 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Challenge 9: Extract quarter, weekday from a datetime column\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Create new columns named `start_quarter` and `start_weekday` in `df_emp` that contain the starting quarters and weekdays in integers.\n", "- ✔️ Extract values from `df_emp['start_date']`.\n", "\n", "#### 🚀 Hints\n", "\n", "The code extracts the quarters and weekdays of a datetime column `my_date` and stores it to two new columns.\n", "\n", "```python\n", "my_dataframe['my_quarter'] = my_dataframe['my_date'].dt.quarter\n", "my_dataframe['my_weekday'] = my_dataframe['my_date'].dt.weekday\n", "```" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emp_idnamedeptoffice_phonesalarystart_datestart_yearstart_monthstart_daystart_quarterstart_weekday
030TalalFinance(217)123-45002020002017-05-0120175120
140JoshPurchaseNaN1850002018-02-0120182113
210AnikaFinanceNaN2400002020-08-0120208135
320AishaniPurchase(217)987-66001605002019-12-01201912146
\n", "
" ], "text/plain": [ " emp_id name dept office_phone salary start_date start_year \\\n", "0 30 Talal Finance (217)123-4500 202000 2017-05-01 2017 \n", "1 40 Josh Purchase NaN 185000 2018-02-01 2018 \n", "2 10 Anika Finance NaN 240000 2020-08-01 2020 \n", "3 20 Aishani Purchase (217)987-6600 160500 2019-12-01 2019 \n", "\n", " start_month start_day start_quarter start_weekday \n", "0 5 1 2 0 \n", "1 2 1 1 3 \n", "2 8 1 3 5 \n", "3 12 1 4 6 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_emp['start_quarter'] = df_emp['start_date'].dt.quarter\n", "df_emp['start_weekday'] = df_emp['start_date'].dt.weekday\n", "### END SOLUTION\n", "\n", "df_emp" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." 
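, "\n", "ℹ️ Note: `.dt.weekday` numbers days Monday=0 through Sunday=6, which is why Talal's 2017-05-01 start date (a Monday) has `start_weekday` 0. A quick sanity check, assuming `pd` is already imported:\n", "\n", "```python\n", "print(pd.Timestamp('2017-05-01').weekday())  # 0 -> Monday\n", "```"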
] }, { "cell_type": "code", "execution_count": 29, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-11", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "df_check = df_emp_backup.copy()\n", "df_check['_'.join(['sTarT', 'DaTe', 'pArSeD']).lower()] = pd.to_datetime(df_check['start_date'])\n", "df_check = df_check.drop(columns=['start_date']).rename(columns={'start_date_parsed': 'start_date'})\n", "df_check['_'.join(['sTarT', 'qUarTer']).lower()] = df_check['_'.join(['sTarT', 'dAtE']).lower()].dt.quarter\n", "df_check['_'.join(['sTarT', 'wEeKDaY']).lower()] = df_check['_'.join(['sTarT', 'dAtE']).lower()].dt.weekday\n", "\n", "df_emp_check = df_emp.sort_values(df_emp.columns.tolist()).reset_index(drop=True)\n", "df_check = df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True)\n", "cols_to_check = ['emp_id', 'start_quarter', 'start_weekday']\n", "\n", "# Check result\n", "pd.testing.assert_frame_equal(df_emp_check[cols_to_check],\n", " df_check[cols_to_check])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.3" } }, "nbformat": 4, "nbformat_minor": 4 }