{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas Merge and String Methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lecture Notes and in-class exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ First, run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "import unittest\n", "tc = unittest.TestCase()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 👇 Tasks\n", "\n", "- ✔️ Import the following Python packages.\n", " 1. `pandas`: Use alias `pd`.\n", " 2. `numpy`: Use alias `np`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "import pandas as pd\n", "import numpy as np\n", "### END SOLUTION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check your work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import sys\n", "tc.assertTrue('pd' in globals(), 'Check whether you have correctly import Pandas with an alias.')\n", "tc.assertTrue('np' in globals(), 'Check whether you have correctly import NumPy with an alias.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 📌 Load employees and work laptops data\n", "\n", "For the first part, we're going to work with a small DataFrame to see how we merge two DataFrames together.\n", "\n", "▶️ Run the code cell below to create `df_employees` and `df_laptops`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "df_employees = pd.DataFrame({\n", " 'emp_id': [1, 2, 3],\n", " 'name': ['Jasper', 'Gary', 'Sally'],\n", " 'laptop_id': ['A', 'B', np.nan]\n", "})\n", "\n", "df_laptops = pd.DataFrame({\n", " 'laptop_id': ['A', 'B', 'C', 'D'],\n", " 'model': ['Red Touchbook', 'BlueGo', 'Eco Green', 'Hackbook Pro']\n", "})\n", "\n", "# Used for 🧭 Check Your Work sections\n", "df_employees_check = df_employees.copy()\n", "df_laptops_check = df_laptops.copy()\n", "df_join_check = df_employees_check.merge(df_laptops, on='laptop_id', how='outer')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run the code cell below to display `df_employees`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emp_idnamelaptop_id
01JasperA
12GaryB
23SallyNaN
\n", "
" ], "text/plain": [ " emp_id name laptop_id\n", "0 1 Jasper A\n", "1 2 Gary B\n", "2 3 Sally NaN" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_employees" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run the code cell below to display `df_laptops`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
laptop_idmodel
0ARed Touchbook
1BBlueGo
2CEco Green
3DHackbook Pro
\n", "
" ], "text/plain": [ " laptop_id model\n", "0 A Red Touchbook\n", "1 B BlueGo\n", "2 C Eco Green\n", "3 D Hackbook Pro" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_laptops" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 1: Inner merge\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Find employees who have been assigned a work laptop.\n", "- ✔️ In other words, merge `df_employees` and `df_laptop` using an inner merge.\n", "- ✔️ Store the merged result to a new variable named `df_inner`.\n", "\n", "#### 🚀 Sample Code\n", "\n", "```python\n", "df_inner = pd.merge(\n", " left=...,\n", " right=...,\n", " on='...',\n", " how='...'\n", ")\n", "```\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | emp_id | name | laptop_id | model |\n", "|---:|---------:|:-------|:------------|:--------------|\n", "| 0 | 1 | Jasper | A | Red Touchbook |\n", "| 1 | 2 | Gary | B | BlueGo |" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emp_idnamelaptop_idmodel
01JasperARed Touchbook
12GaryBBlueGo
\n", "
" ], "text/plain": [ " emp_id name laptop_id model\n", "0 1 Jasper A Red Touchbook\n", "1 2 Gary B BlueGo" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "### BEGIN SOLUTION\n", "df_inner = pd.merge(\n", " left=df_employees,\n", " right=df_laptops,\n", " on='laptop_id',\n", " how='inner'\n", ")\n", "### END SOLUTION\n", "\n", "display(df_inner)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_jc = df_join_check\n", "df_sol = df_jc[df_jc['emp_id'].notna() & df_jc['laptop_id'].notna()].reset_index(drop=True)\n", "\n", "pd.testing.assert_frame_equal(\n", " df_inner.reset_index(drop=True),\n", " df_sol.reset_index(drop=True),\n", " check_dtype=False\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 2: Left merge\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ List all employees and their assigned work laptops - if they are assigned one.\n", "- ✔️ If an employee has not been assigned a work laptop, leave `'laptop_id'` and `'model'` as np.NaN (or any other null-like value).\n", "- ✔️ In other words, merge `df_employees` and `df_laptop` using a left merge.\n", "- ✔️ Store the merged result to a new variable named `df_left`.\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | emp_id | name | laptop_id | model |\n", "|---:|---------:|:-------|:------------|:--------------|\n", "| 0 | 1 | Jasper | A | Red Touchbook |\n", "| 1 | 2 | Gary | B | BlueGo |\n", "| 2 | 3 | Sally | NaN | NaN |" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emp_idnamelaptop_idmodel
01JasperARed Touchbook
12GaryBBlueGo
23SallyNaNNaN
\n", "
" ], "text/plain": [ " emp_id name laptop_id model\n", "0 1 Jasper A Red Touchbook\n", "1 2 Gary B BlueGo\n", "2 3 Sally NaN NaN" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "### BEGIN SOLUTION\n", "df_left = pd.merge(\n", " left=df_employees,\n", " right=df_laptops,\n", " on='laptop_id',\n", " how='left'\n", ")\n", "### END SOLUTION\n", "\n", "display(df_left)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_jc = df_join_check\n", "df_sol = df_jc[df_jc['emp_id'].notna()].reset_index(drop=True)\n", "\n", "pd.testing.assert_frame_equal(\n", " df_left.reset_index(drop=True),\n", " df_sol.reset_index(drop=True),\n", " check_dtype=False\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 3: Right merge\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ List all laptops and their associated owners - if they are assigned one.\n", "- ✔️ If a laptop has not been assigned to an employee, leave `'emp_id'` and `'name'` as np.NaN (or any other null-like value).\n", "- ✔️ In other words, merge `df_employees` and `df_laptop` using a right merge.\n", "- ✔️ Store the merged result to a new variable named `df_right`.\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | emp_id | name | laptop_id | model |\n", "|---:|---------:|:-------|:------------|:--------------|\n", "| 0 | 1 | Jasper | A | Red Touchbook |\n", "| 1 | 2 | Gary | B | BlueGo |\n", "| 2 | NaN | NaN | C | Eco Green |\n", "| 3 | NaN | NaN | D | Hackbook Pro |" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emp_idnamelaptop_idmodel
01.0JasperARed Touchbook
12.0GaryBBlueGo
2NaNNaNCEco Green
3NaNNaNDHackbook Pro
\n", "
" ], "text/plain": [ " emp_id name laptop_id model\n", "0 1.0 Jasper A Red Touchbook\n", "1 2.0 Gary B BlueGo\n", "2 NaN NaN C Eco Green\n", "3 NaN NaN D Hackbook Pro" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "### BEGIN SOLUTION\n", "df_right = pd.merge(\n", " left=df_employees,\n", " right=df_laptops,\n", " on='laptop_id',\n", " how='right'\n", ")\n", "### END SOLUTION\n", "\n", "display(df_right)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "1", "solution": false }, "tags": [] }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_jc = df_join_check\n", "df_sol = df_jc[df_jc['laptop_id'].notna()].reset_index(drop=True)\n", "\n", "pd.testing.assert_frame_equal(\n", " df_right.reset_index(drop=True),\n", " df_sol.reset_index(drop=True),\n", " check_dtype=False\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 4: Full outer merge\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ List all employees and all work laptops - regardless of whether they are associated with one another.\n", "- ✔️ If an employee has not been assigned a work laptop, leave `'laptop_id'` and `'model'` as np.NaN (or any other null-like value).\n", "- ✔️ If a laptop has not been assigned to an employee, leave `'emp_id'` and `'name'` as np.NaN (or any other null-like value).\n", "- ✔️ In other words, merge `df_employees` and `df_laptop` using an outer merge.\n", "- ✔️ Store the merged result to a new variable named `df_outer`.\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | emp_id | name | laptop_id | model |\n", "|---:|---------:|:-------|:------------|:--------------|\n", "| 0 | 1 | Jasper | A | Red Touchbook |\n", "| 1 | 2 | Gary | B | BlueGo |\n", "| 2 | 3 | Sally | NaN | NaN |\n", "| 3 | NaN | NaN | C | Eco Green |\n", "| 4 | NaN | NaN | D | Hackbook Pro |" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
emp_idnamelaptop_idmodel
01.0JasperARed Touchbook
12.0GaryBBlueGo
23.0SallyNaNNaN
3NaNNaNCEco Green
4NaNNaNDHackbook Pro
\n", "
" ], "text/plain": [ " emp_id name laptop_id model\n", "0 1.0 Jasper A Red Touchbook\n", "1 2.0 Gary B BlueGo\n", "2 3.0 Sally NaN NaN\n", "3 NaN NaN C Eco Green\n", "4 NaN NaN D Hackbook Pro" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "### BEGIN SOLUTION\n", "df_outer = pd.merge(\n", " left=df_employees,\n", " right=df_laptops,\n", " on='laptop_id',\n", " how='outer'\n", ")\n", "### END SOLUTION\n", "\n", "display(df_outer)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_jc = df_join_check\n", "df_sol = df_jc[df_jc['emp_id'].notna()].reset_index(drop=True)\n", "\n", "pd.testing.assert_frame_equal(\n", " df_left.reset_index(drop=True),\n", " df_sol.reset_index(drop=True),\n", " check_dtype=False\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## Pandas string methods\n", "\n", "### 📌 Load textual data\n", "\n", "▶️ Run the code cell below to create `df_libraries`." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "df_libraries = pd.DataFrame({\n", " 'name': ['ACES (Funk)', 'Grainger', 'Law', 'Main'],\n", " 'amenities': [\n", " 'Rooms,Scanner,Printer',\n", " 'Rooms,Scanner,Printer,Cafe',\n", " 'Cafe',\n", " 'Rooms,Scanner,Printer,Cafe'\n", " ],\n", "})\n", "\n", "# Used for 🧭 Check Your Work sections\n", "df_libraries_check = df_libraries.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run the code cell below to display `df_libraries`." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameamenities
0ACES (Funk)Rooms,Scanner,Printer
1GraingerRooms,Scanner,Printer,Cafe
2LawCafe
3MainRooms,Scanner,Printer,Cafe
\n", "
" ], "text/plain": [ " name amenities\n", "0 ACES (Funk) Rooms,Scanner,Printer\n", "1 Grainger Rooms,Scanner,Printer,Cafe\n", "2 Law Cafe\n", "3 Main Rooms,Scanner,Printer,Cafe" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 5: Length of library names\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Find the number of characters (i.e., string length) of each library.\n", "- ✔️ Store the result to a new column named `'name_length'` in `df_libraries`.\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | name | amenities | name_length |\n", "|---:|:------------|:---------------------------|--------------:|\n", "| 0 | ACES (Funk) | Rooms,Scanner,Printer | 11 |\n", "| 1 | Grainger | Rooms,Scanner,Printer,Cafe | 8 |\n", "| 2 | Law | Cafe | 3 |\n", "| 3 | Main | Rooms,Scanner,Printer,Cafe | 4 |" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameamenitiesname_length
0ACES (Funk)Rooms,Scanner,Printer11
1GraingerRooms,Scanner,Printer,Cafe8
2LawCafe3
3MainRooms,Scanner,Printer,Cafe4
\n", "
" ], "text/plain": [ " name amenities name_length\n", "0 ACES (Funk) Rooms,Scanner,Printer 11\n", "1 Grainger Rooms,Scanner,Printer,Cafe 8\n", "2 Law Cafe 3\n", "3 Main Rooms,Scanner,Printer,Cafe 4" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "### BEGIN SOLUTION\n", "df_libraries['name_length'] = df_libraries['name'].str.len()\n", "### END SOLUTION\n", "\n", "display(df_libraries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "1", "solution": false }, "tags": [] }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_lc = df_libraries_check\n", "df_lc['name_length'] = df_lc['name'].str.len()\n", "\n", "pd.testing.assert_frame_equal(\n", " df_libraries.reset_index(drop=True),\n", " df_lc.reset_index(drop=True),\n", " check_dtype=False\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 6: Uppercase library names\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Convert the library names to uppercase.\n", "- ✔️ Directly update the `'name'` column in `df_libraries`.\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | name | amenities | name_length |\n", "|---:|:------------|:---------------------------|--------------:|\n", "| 0 | ACES (FUNK) | Rooms,Scanner,Printer | 11 |\n", "| 1 | GRAINGER | Rooms,Scanner,Printer,Cafe | 8 |\n", "| 2 | LAW | Cafe | 3 |\n", "| 3 | MAIN | Rooms,Scanner,Printer,Cafe | 4 |" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameamenitiesname_length
0ACES (FUNK)Rooms,Scanner,Printer11
1GRAINGERRooms,Scanner,Printer,Cafe8
2LAWCafe3
3MAINRooms,Scanner,Printer,Cafe4
\n", "
" ], "text/plain": [ " name amenities name_length\n", "0 ACES (FUNK) Rooms,Scanner,Printer 11\n", "1 GRAINGER Rooms,Scanner,Printer,Cafe 8\n", "2 LAW Cafe 3\n", "3 MAIN Rooms,Scanner,Printer,Cafe 4" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "### BEGIN SOLUTION\n", "df_libraries['name'] = df_libraries['name'].str.upper()\n", "### END SOLUTION\n", "\n", "display(df_libraries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "1", "solution": false }, "tags": [] }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_lc = df_libraries_check\n", "df_lc['name_length'] = df_lc['name'].str.len()\n", "df_lc['name'] = df_lc['name'].str.upper()\n", "\n", "pd.testing.assert_frame_equal(\n", " df_libraries.reset_index(drop=True),\n", " df_lc.reset_index(drop=True),\n", " check_dtype=False\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 7: Split amenities into lists\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Split the items in the `'amenities'` column using the comma (`,`) as a delimiter.\n", "- ✔️ Store the splitted result to a new column named `'amenities_list'` in `df_libraries`.\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | name | amenities | name_length | amenities_list |\n", "|---:|:------------|:---------------------------|--------------:|:----------------------------------------|\n", "| 0 | ACES (FUNK) | Rooms,Scanner,Printer | 11 | ['Rooms', 'Scanner', 'Printer'] |\n", "| 1 | GRAINGER | Rooms,Scanner,Printer,Cafe | 8 | ['Rooms', 'Scanner', 'Printer', 'Cafe'] |\n", "| 2 | LAW | Cafe | 3 | ['Cafe'] |\n", "| 3 | MAIN | Rooms,Scanner,Printer,Cafe | 4 | ['Rooms', 'Scanner', 'Printer', 'Cafe'] |" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nameamenitiesname_lengthamenities_list
0ACES (FUNK)Rooms,Scanner,Printer11[Rooms, Scanner, Printer]
1GRAINGERRooms,Scanner,Printer,Cafe8[Rooms, Scanner, Printer, Cafe]
2LAWCafe3[Cafe]
3MAINRooms,Scanner,Printer,Cafe4[Rooms, Scanner, Printer, Cafe]
\n", "
" ], "text/plain": [ " name amenities name_length \\\n", "0 ACES (FUNK) Rooms,Scanner,Printer 11 \n", "1 GRAINGER Rooms,Scanner,Printer,Cafe 8 \n", "2 LAW Cafe 3 \n", "3 MAIN Rooms,Scanner,Printer,Cafe 4 \n", "\n", " amenities_list \n", "0 [Rooms, Scanner, Printer] \n", "1 [Rooms, Scanner, Printer, Cafe] \n", "2 [Cafe] \n", "3 [Rooms, Scanner, Printer, Cafe] " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "### BEGIN SOLUTION\n", "df_libraries['amenities_list'] = df_libraries['amenities'].str.split(',')\n", "### END SOLUTION\n", "\n", "display(df_libraries)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "1", "solution": false }, "tags": [] }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_lc = df_libraries_check\n", "df_lc['name_length'] = df_lc['name'].str.len()\n", "df_lc['name'] = df_lc['name'].str.upper()\n", "df_lc['amenities_list'] = df_lc['amenities'].str.split(',')\n", "\n", "pd.testing.assert_frame_equal(\n", " df_libraries.reset_index(drop=True),\n", " df_lc.reset_index(drop=True),\n", " check_dtype=False\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }