{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Pandas Update Cells, More Exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Lecture Notes and in-class exercises"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"▶️ First, run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import unittest\n",
"tc = unittest.TestCase()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 👇 Tasks\n",
"\n",
"- ✔️ Import the following Python packages.\n",
" 1. `pandas`: Use alias `pd`.\n",
" 2. `numpy`: Use alias `np`."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"### BEGIN SOLUTION\n",
"import pandas as pd\n",
"import numpy as np\n",
"### END SOLUTION"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🧭 Check your work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"tc.assertTrue('pd' in globals(), 'Check whether you have correctly import Pandas with an alias.')\n",
"tc.assertTrue('np' in globals(), 'Check whether you have correctly import NumPy with an alias.')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 📌 Load companies data\n",
"\n",
"For the first part, we're going to work with a small DataFrame to see how we can read and/or update individual cells in a DataFrame.\n",
"\n",
"▶️ Run the code cell below to create `df_companies`."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" company_name | \n",
" ticker | \n",
" headquarters | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Amazon | \n",
" AMZN | \n",
" Seattle | \n",
"
\n",
" \n",
" 1 | \n",
" Nvidia | \n",
" AMD | \n",
" Santa Clara | \n",
"
\n",
" \n",
" 2 | \n",
" Google | \n",
" NaN | \n",
" Champaign | \n",
"
\n",
" \n",
" 3 | \n",
" Microsoft | \n",
" NaN | \n",
" Redmond | \n",
"
\n",
" \n",
" 4 | \n",
" Adobe | \n",
" ADBE | \n",
" San Jose | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" company_name ticker headquarters\n",
"0 Amazon AMZN Seattle\n",
"1 Nvidia AMD Santa Clara\n",
"2 Google NaN Champaign\n",
"3 Microsoft NaN Redmond\n",
"4 Adobe ADBE San Jose"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_companies = pd.DataFrame({\n",
" 'company_name': ['Amazon', 'Nvidia', 'Google', 'Microsoft', 'Adobe'],\n",
" 'ticker': ['AMZN', 'AMD', np.nan, np.nan, 'ADBE'],\n",
" 'headquarters': ['Seattle', 'Santa Clara', 'Champaign', 'Redmond', 'San Jose']\n",
"})\n",
"\n",
"df_companies"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 1: Update Nvidia's ticker\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Nvidia's ticker is incorrectly listed as `'AMD'`.\n",
"- ✔️ Replace `'AMD'` with `'NVDA'`.\n",
"- ✔️ Directly update `df_companies`.\n",
"\n",
"#### 🚀 Sample Code\n",
"\n",
"```python\n",
"my_dataframe.loc[my_dataframe['column1'] == 'John Doe Co', 'column2'] = 'new value'\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" company_name | \n",
" ticker | \n",
" headquarters | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Amazon | \n",
" AMZN | \n",
" Seattle | \n",
"
\n",
" \n",
" 1 | \n",
" Nvidia | \n",
" NVDA | \n",
" Santa Clara | \n",
"
\n",
" \n",
" 2 | \n",
" Google | \n",
" NaN | \n",
" Champaign | \n",
"
\n",
" \n",
" 3 | \n",
" Microsoft | \n",
" NaN | \n",
" Redmond | \n",
"
\n",
" \n",
" 4 | \n",
" Adobe | \n",
" ADBE | \n",
" San Jose | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" company_name ticker headquarters\n",
"0 Amazon AMZN Seattle\n",
"1 Nvidia NVDA Santa Clara\n",
"2 Google NaN Champaign\n",
"3 Microsoft NaN Redmond\n",
"4 Adobe ADBE San Jose"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_companies.loc[df_companies['company_name'] == 'Nvidia', 'ticker'] = 'NVDA'\n",
"### END SOLUTION\n",
"\n",
"df_companies"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-01",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"tc.assertEqual(\n",
" df_companies.loc[df_companies['company_name'] == 'Nvidia', 'ticker'].iloc[0],\n",
" 'NVDA'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 2: Update Google and Microsoft's tickers\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Both Google and Microsoft's tickers are missing.\n",
"- ✔️ Set Google's ticker to `'GOOG'`.\n",
"- ✔️ Set Microsoft's ticker to `'MSFT'`.\n",
"- ✔️ Directly update `df_companies`.\n",
"\n",
"#### 🚀 Sample Code\n",
"\n",
"```python\n",
"my_dataframe.loc[my_dataframe['column1'] == 'John Doe Co', 'column2'] = 'new value'\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" company_name | \n",
" ticker | \n",
" headquarters | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Amazon | \n",
" AMZN | \n",
" Seattle | \n",
"
\n",
" \n",
" 1 | \n",
" Nvidia | \n",
" NVDA | \n",
" Santa Clara | \n",
"
\n",
" \n",
" 2 | \n",
" Google | \n",
" GOOG | \n",
" Champaign | \n",
"
\n",
" \n",
" 3 | \n",
" Microsoft | \n",
" MSFT | \n",
" Redmond | \n",
"
\n",
" \n",
" 4 | \n",
" Adobe | \n",
" ADBE | \n",
" San Jose | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" company_name ticker headquarters\n",
"0 Amazon AMZN Seattle\n",
"1 Nvidia NVDA Santa Clara\n",
"2 Google GOOG Champaign\n",
"3 Microsoft MSFT Redmond\n",
"4 Adobe ADBE San Jose"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_companies.loc[df_companies['company_name'] == 'Google', 'ticker'] = 'GOOG'\n",
"df_companies.loc[df_companies['company_name'] == 'Microsoft', 'ticker'] = 'MSFT'\n",
"### END SOLUTION\n",
"\n",
"df_companies"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-02",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"tc.assertEqual(\n",
" df_companies.loc[df_companies['company_name'] == 'Google', 'ticker'].iloc[0],\n",
" 'GOOG'\n",
")\n",
"\n",
"tc.assertEqual(\n",
" df_companies.loc[df_companies['company_name'] == 'Microsoft', 'ticker'].iloc[0],\n",
" 'MSFT'\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 3: Update Google's headquarters location\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Google's headquarters is listed as Champaign.\n",
"- ✔️ Set Google's headquarters to `'Mountain View'`.\n",
"- ✔️ Directly update `df_companies`."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" company_name | \n",
" ticker | \n",
" headquarters | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Amazon | \n",
" AMZN | \n",
" Seattle | \n",
"
\n",
" \n",
" 1 | \n",
" Nvidia | \n",
" NVDA | \n",
" Santa Clara | \n",
"
\n",
" \n",
" 2 | \n",
" Google | \n",
" GOOG | \n",
" Mountain View | \n",
"
\n",
" \n",
" 3 | \n",
" Microsoft | \n",
" MSFT | \n",
" Redmond | \n",
"
\n",
" \n",
" 4 | \n",
" Adobe | \n",
" ADBE | \n",
" San Jose | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" company_name ticker headquarters\n",
"0 Amazon AMZN Seattle\n",
"1 Nvidia NVDA Santa Clara\n",
"2 Google GOOG Mountain View\n",
"3 Microsoft MSFT Redmond\n",
"4 Adobe ADBE San Jose"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_companies.loc[df_companies['company_name'] == 'Google', 'headquarters'] = 'Mountain View'\n",
"### END SOLUTION\n",
"\n",
"df_companies"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-03",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_companies_CHECK = pd.DataFrame({'company_name': ['Amazon', 'Nvidia', 'Google', 'Microsoft', 'Adobe'],\n",
" 'ticker': ['AMZN', 'NVDA', 'GOOG', 'MSFT', 'ADBE'], 'headquarters': ['Seattle', 'Santa Clara',\n",
" 'Mountain View', 'Redmond', 'San Jose']})\n",
"\n",
"pd.testing.assert_frame_equal(\n",
" df_companies.reset_index(drop=True),\n",
" df_companies_CHECK.reset_index(drop=True)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 4: Retrieve Amazon's headquarters location\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Retrieve Amazon's headquarters location.\n",
"- ✔️ Store the result to a new variable named `amazon_headquarters`.\n",
"\n",
"#### 🚀 Sample Code\n",
"\n",
"```python\n",
"retrieved_value = my_dataframe.loc[my_dataframe['column1'] == 'Amazon', 'column2'].iloc[0]\n",
"```\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"```\n",
"Amazon's headquarters is in Seattle.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Amazon's headquarters is in Seattle.\n"
]
}
],
"source": [
"### BEGIN SOLUTION\n",
"amazon_headquarters = df_companies.loc[df_companies['company_name'] == 'Amazon', 'headquarters'].iloc[0]\n",
"### END SOLUTION\n",
"\n",
"print(f\"Amazon's headquarters is in {amazon_headquarters}.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-04",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"tc.assertEqual(amazon_headquarters, 'Seattle')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 5: Retrieve Adobe's ticker\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Retrieve Adobe's ticker.\n",
"- ✔️ Store the result to a new variable named `adobe_ticker`.\n",
"\n",
"#### 🚀 Sample Code\n",
"\n",
"```python\n",
"retrieved_value = my_dataframe.loc[my_dataframe['column1'] == 'Adobe', 'column2'].iloc[0]\n",
"```\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"```\n",
"Adobe's ticker is ADBE.\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"nbgrader": {
"grade": false,
"grade_id": "exercise-05",
"locked": false,
"points": "1",
"solution": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Adobe's ticker is ADBE.\n"
]
}
],
"source": [
"### BEGIN SOLUTION\n",
"adobe_ticker = df_companies.loc[df_companies['company_name'] == 'Adobe', 'ticker'].iloc[0]\n",
"### END SOLUTION\n",
"\n",
"print(f\"Adobe's ticker is {adobe_ticker}.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-05",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"tc.assertEqual(adobe_ticker, 'ADBE')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 📌 Load data\n",
"\n",
"Now, you'll work with the Starbucks Customer Rewards Program dataset. ☕ The dataset was originally released as a part of a capstone project of a Udacity course.\n",
"\n",
"For the purpose of this course, the original dataset in JSON format has been transformed to CSV files for convenience. You can download the original JSON files [here](https://www.kaggle.com/blacktile/starbucks-app-customer-reward-program-data?select=portfolio.json).\n",
"\n",
"▶️ Run the code cell below to read the transcript data."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_id | \n",
" event | \n",
" time | \n",
" offer_id | \n",
" amount | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 78afa995795e4d85b5d9ceeca43f5fef | \n",
" offer received | \n",
" 0 | \n",
" 9b98b8c7a33c4b65b9aebfe6a799e6d9 | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" a03223e636434f42ac4c3df47e8bac43 | \n",
" offer received | \n",
" 0 | \n",
" 0b1e1539f2cc45b7b9fa7c272da2e1d7 | \n",
" NaN | \n",
"
\n",
" \n",
" 2 | \n",
" e2127556f4f64592b11af22de27a7932 | \n",
" offer received | \n",
" 0 | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 8ec6ce2a7e7949b1bf142def7d0e0586 | \n",
" offer received | \n",
" 0 | \n",
" fafdcd668e3743c1bb461111dcafc2a4 | \n",
" NaN | \n",
"
\n",
" \n",
" 4 | \n",
" 68617ca6246f4fbc85e91a2a49552598 | \n",
" offer received | \n",
" 0 | \n",
" 4d5c57ea9a6940dd891ad53e9dbe8da0 | \n",
" NaN | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 306529 | \n",
" b3a1272bc9904337b331bf348c3e8c17 | \n",
" transaction | \n",
" 714 | \n",
" NaN | \n",
" 1.59 | \n",
"
\n",
" \n",
" 306530 | \n",
" 68213b08d99a4ae1b0dcb72aebd9aa35 | \n",
" transaction | \n",
" 714 | \n",
" NaN | \n",
" 9.53 | \n",
"
\n",
" \n",
" 306531 | \n",
" a00058cf10334a308c68e7631c529907 | \n",
" transaction | \n",
" 714 | \n",
" NaN | \n",
" 3.61 | \n",
"
\n",
" \n",
" 306532 | \n",
" 76ddbd6576844afe811f1a3c0fbb5bec | \n",
" transaction | \n",
" 714 | \n",
" NaN | \n",
" 3.53 | \n",
"
\n",
" \n",
" 306533 | \n",
" c02b10e8752c4d8e9b73f918558531f7 | \n",
" transaction | \n",
" 714 | \n",
" NaN | \n",
" 4.05 | \n",
"
\n",
" \n",
"
\n",
"
306534 rows × 5 columns
\n",
"
"
],
"text/plain": [
" member_id event time \\\n",
"0 78afa995795e4d85b5d9ceeca43f5fef offer received 0 \n",
"1 a03223e636434f42ac4c3df47e8bac43 offer received 0 \n",
"2 e2127556f4f64592b11af22de27a7932 offer received 0 \n",
"3 8ec6ce2a7e7949b1bf142def7d0e0586 offer received 0 \n",
"4 68617ca6246f4fbc85e91a2a49552598 offer received 0 \n",
"... ... ... ... \n",
"306529 b3a1272bc9904337b331bf348c3e8c17 transaction 714 \n",
"306530 68213b08d99a4ae1b0dcb72aebd9aa35 transaction 714 \n",
"306531 a00058cf10334a308c68e7631c529907 transaction 714 \n",
"306532 76ddbd6576844afe811f1a3c0fbb5bec transaction 714 \n",
"306533 c02b10e8752c4d8e9b73f918558531f7 transaction 714 \n",
"\n",
" offer_id amount \n",
"0 9b98b8c7a33c4b65b9aebfe6a799e6d9 NaN \n",
"1 0b1e1539f2cc45b7b9fa7c272da2e1d7 NaN \n",
"2 2906b810c7d4411798c6938adc9daaa5 NaN \n",
"3 fafdcd668e3743c1bb461111dcafc2a4 NaN \n",
"4 4d5c57ea9a6940dd891ad53e9dbe8da0 NaN \n",
"... ... ... \n",
"306529 NaN 1.59 \n",
"306530 NaN 9.53 \n",
"306531 NaN 3.61 \n",
"306532 NaN 3.53 \n",
"306533 NaN 4.05 \n",
"\n",
"[306534 rows x 5 columns]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_transcript = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/transcript.v2.csv.gz')\n",
"df_transcript_backup = df_transcript.copy()\n",
"display(df_transcript)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"▶️ Run the code cell below to read the customer profiles data."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_id | \n",
" gender | \n",
" age | \n",
" became_member_on | \n",
" income | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 68be06ca386d4c31939f3a4f0e3dd783 | \n",
" NaN | \n",
" NaN | \n",
" 2017-02-12 | \n",
" NaN | \n",
"
\n",
" \n",
" 1 | \n",
" 0610b486422d4921ae7d2bf64640c50b | \n",
" F | \n",
" 55.0 | \n",
" 2017-07-15 | \n",
" 112000.0 | \n",
"
\n",
" \n",
" 2 | \n",
" 38fe809add3b4fcf9315a9694bb96ff5 | \n",
" NaN | \n",
" NaN | \n",
" 2018-07-12 | \n",
" NaN | \n",
"
\n",
" \n",
" 3 | \n",
" 78afa995795e4d85b5d9ceeca43f5fef | \n",
" F | \n",
" 75.0 | \n",
" 2017-05-09 | \n",
" 100000.0 | \n",
"
\n",
" \n",
" 4 | \n",
" a03223e636434f42ac4c3df47e8bac43 | \n",
" NaN | \n",
" NaN | \n",
" 2017-08-04 | \n",
" NaN | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 16995 | \n",
" 6d5f3a774f3d4714ab0c092238f3a1d7 | \n",
" F | \n",
" 45.0 | \n",
" 2018-06-04 | \n",
" 54000.0 | \n",
"
\n",
" \n",
" 16996 | \n",
" 2cb4f97358b841b9a9773a7aa05a9d77 | \n",
" M | \n",
" 61.0 | \n",
" 2018-07-13 | \n",
" 72000.0 | \n",
"
\n",
" \n",
" 16997 | \n",
" 01d26f638c274aa0b965d24cefe3183f | \n",
" M | \n",
" 49.0 | \n",
" 2017-01-26 | \n",
" 73000.0 | \n",
"
\n",
" \n",
" 16998 | \n",
" 9dc1421481194dcd9400aec7c9ae6366 | \n",
" F | \n",
" 83.0 | \n",
" 2016-03-07 | \n",
" 50000.0 | \n",
"
\n",
" \n",
" 16999 | \n",
" e4052622e5ba45a8b96b59aba68cf068 | \n",
" F | \n",
" 62.0 | \n",
" 2017-07-22 | \n",
" 82000.0 | \n",
"
\n",
" \n",
"
\n",
"
17000 rows × 5 columns
\n",
"
"
],
"text/plain": [
" member_id gender age became_member_on \\\n",
"0 68be06ca386d4c31939f3a4f0e3dd783 NaN NaN 2017-02-12 \n",
"1 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 \n",
"2 38fe809add3b4fcf9315a9694bb96ff5 NaN NaN 2018-07-12 \n",
"3 78afa995795e4d85b5d9ceeca43f5fef F 75.0 2017-05-09 \n",
"4 a03223e636434f42ac4c3df47e8bac43 NaN NaN 2017-08-04 \n",
"... ... ... ... ... \n",
"16995 6d5f3a774f3d4714ab0c092238f3a1d7 F 45.0 2018-06-04 \n",
"16996 2cb4f97358b841b9a9773a7aa05a9d77 M 61.0 2018-07-13 \n",
"16997 01d26f638c274aa0b965d24cefe3183f M 49.0 2017-01-26 \n",
"16998 9dc1421481194dcd9400aec7c9ae6366 F 83.0 2016-03-07 \n",
"16999 e4052622e5ba45a8b96b59aba68cf068 F 62.0 2017-07-22 \n",
"\n",
" income \n",
"0 NaN \n",
"1 112000.0 \n",
"2 NaN \n",
"3 100000.0 \n",
"4 NaN \n",
"... ... \n",
"16995 54000.0 \n",
"16996 72000.0 \n",
"16997 73000.0 \n",
"16998 50000.0 \n",
"16999 82000.0 \n",
"\n",
"[17000 rows x 5 columns]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_profiles = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/profile.csv')\n",
"df_profiles_backup = df_profiles.copy()\n",
"display(df_profiles)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"▶️ Run the code cell below to read the offers data."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" reward | \n",
" difficulty | \n",
" duration | \n",
" offer_type | \n",
" offer_id | \n",
" email | \n",
" mobile | \n",
" social | \n",
" web | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 10 | \n",
" 10 | \n",
" 7 | \n",
" bogo | \n",
" ae264e3637204a6fb9bb56bc8210ddfd | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 10 | \n",
" 10 | \n",
" 5 | \n",
" bogo | \n",
" 4d5c57ea9a6940dd891ad53e9dbe8da0 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 0 | \n",
" 0 | \n",
" 4 | \n",
" informational | \n",
" 3f207df678b143eea3cee63160fa8bed | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" 5 | \n",
" 5 | \n",
" 7 | \n",
" bogo | \n",
" 9b98b8c7a33c4b65b9aebfe6a799e6d9 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" 20 | \n",
" 10 | \n",
" discount | \n",
" 0b1e1539f2cc45b7b9fa7c272da2e1d7 | \n",
" 1 | \n",
" 0 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 5 | \n",
" 3 | \n",
" 7 | \n",
" 7 | \n",
" discount | \n",
" 2298d6c36e964ae4a3e7e9706d1fb8c2 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" 6 | \n",
" 2 | \n",
" 10 | \n",
" 10 | \n",
" discount | \n",
" fafdcd668e3743c1bb461111dcafc2a4 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" 7 | \n",
" 0 | \n",
" 0 | \n",
" 3 | \n",
" informational | \n",
" 5a8bc65990b245e5a138643cd4eb9837 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 8 | \n",
" 5 | \n",
" 5 | \n",
" 5 | \n",
" bogo | \n",
" f19421c1d4aa40978ebb69ca19b0e20d | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
" 1 | \n",
"
\n",
" \n",
" 9 | \n",
" 2 | \n",
" 10 | \n",
" 7 | \n",
" discount | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" reward difficulty duration offer_type \\\n",
"0 10 10 7 bogo \n",
"1 10 10 5 bogo \n",
"2 0 0 4 informational \n",
"3 5 5 7 bogo \n",
"4 5 20 10 discount \n",
"5 3 7 7 discount \n",
"6 2 10 10 discount \n",
"7 0 0 3 informational \n",
"8 5 5 5 bogo \n",
"9 2 10 7 discount \n",
"\n",
" offer_id email mobile social web \n",
"0 ae264e3637204a6fb9bb56bc8210ddfd 1 1 1 0 \n",
"1 4d5c57ea9a6940dd891ad53e9dbe8da0 1 1 1 1 \n",
"2 3f207df678b143eea3cee63160fa8bed 1 1 0 1 \n",
"3 9b98b8c7a33c4b65b9aebfe6a799e6d9 1 1 0 1 \n",
"4 0b1e1539f2cc45b7b9fa7c272da2e1d7 1 0 0 1 \n",
"5 2298d6c36e964ae4a3e7e9706d1fb8c2 1 1 1 1 \n",
"6 fafdcd668e3743c1bb461111dcafc2a4 1 1 1 1 \n",
"7 5a8bc65990b245e5a138643cd4eb9837 1 1 1 0 \n",
"8 f19421c1d4aa40978ebb69ca19b0e20d 1 1 1 1 \n",
"9 2906b810c7d4411798c6938adc9daaa5 1 1 0 1 "
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_offers = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/portfolio.csv')\n",
"df_offers = df_offers.copy()\n",
"display(df_offers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 6: Find all `event` types in `df_transcript`\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Find the unique values in the `event` column as a Series.\n",
"- ✔️ A fully-working code is provided below.\n",
"\n",
"#### 🚀 Code\n",
"\n",
"```python\n",
"event_types = df_transcript['event'].unique()\n",
"\n",
"print(event_types)\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['offer received' 'offer viewed' 'transaction' 'offer completed']\n"
]
}
],
"source": [
"### BEGIN SOLUTION\n",
"event_types = df_transcript['event'].unique()\n",
"\n",
"print(event_types)\n",
"### END SOLUTION"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-06",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"tc.assertEqual(set(event_types), set(df_transcript_backup['event'].unique()))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 7: Find all completed offers\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ In `df_transcript` find all rows of completed offers (where the `event` column's value is `'offer completed'`).\n",
"- ✔️ Store the filtered result to a new DataFrame named `df_completed`.\n",
"- ✔️ `df_transcript` should remain unaltered."
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_id | \n",
" event | \n",
" time | \n",
" offer_id | \n",
" amount | \n",
"
\n",
" \n",
" \n",
" \n",
" 12658 | \n",
" 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | \n",
" offer completed | \n",
" 0 | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
" NaN | \n",
"
\n",
" \n",
" 12672 | \n",
" fe97aa22dd3e48c8b143116a8403dd52 | \n",
" offer completed | \n",
" 0 | \n",
" fafdcd668e3743c1bb461111dcafc2a4 | \n",
" NaN | \n",
"
\n",
" \n",
" 12679 | \n",
" 629fc02d56414d91bca360decdfa9288 | \n",
" offer completed | \n",
" 0 | \n",
" 9b98b8c7a33c4b65b9aebfe6a799e6d9 | \n",
" NaN | \n",
"
\n",
" \n",
" 12692 | \n",
" 676506bad68e4161b9bbaffeb039626b | \n",
" offer completed | \n",
" 0 | \n",
" ae264e3637204a6fb9bb56bc8210ddfd | \n",
" NaN | \n",
"
\n",
" \n",
" 12697 | \n",
" 8f7dd3b2afe14c078eb4f6e6fe4ba97d | \n",
" offer completed | \n",
" 0 | \n",
" 4d5c57ea9a6940dd891ad53e9dbe8da0 | \n",
" NaN | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 306475 | \n",
" 0c027f5f34dd4b9eba0a25785c611273 | \n",
" offer completed | \n",
" 714 | \n",
" 2298d6c36e964ae4a3e7e9706d1fb8c2 | \n",
" NaN | \n",
"
\n",
" \n",
" 306497 | \n",
" a6f84f4e976f44508c358cc9aba6d2b3 | \n",
" offer completed | \n",
" 714 | \n",
" 2298d6c36e964ae4a3e7e9706d1fb8c2 | \n",
" NaN | \n",
"
\n",
" \n",
" 306506 | \n",
" b895c57e8cd047a8872ce02aa54759d6 | \n",
" offer completed | \n",
" 714 | \n",
" fafdcd668e3743c1bb461111dcafc2a4 | \n",
" NaN | \n",
"
\n",
" \n",
" 306509 | \n",
" 8431c16f8e1d440880db371a68f82dd0 | \n",
" offer completed | \n",
" 714 | \n",
" fafdcd668e3743c1bb461111dcafc2a4 | \n",
" NaN | \n",
"
\n",
" \n",
" 306527 | \n",
" 24f56b5e1849462093931b164eb803b5 | \n",
" offer completed | \n",
" 714 | \n",
" fafdcd668e3743c1bb461111dcafc2a4 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
33579 rows × 5 columns
\n",
"
"
],
"text/plain": [
" member_id event time \\\n",
"12658 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f offer completed 0 \n",
"12672 fe97aa22dd3e48c8b143116a8403dd52 offer completed 0 \n",
"12679 629fc02d56414d91bca360decdfa9288 offer completed 0 \n",
"12692 676506bad68e4161b9bbaffeb039626b offer completed 0 \n",
"12697 8f7dd3b2afe14c078eb4f6e6fe4ba97d offer completed 0 \n",
"... ... ... ... \n",
"306475 0c027f5f34dd4b9eba0a25785c611273 offer completed 714 \n",
"306497 a6f84f4e976f44508c358cc9aba6d2b3 offer completed 714 \n",
"306506 b895c57e8cd047a8872ce02aa54759d6 offer completed 714 \n",
"306509 8431c16f8e1d440880db371a68f82dd0 offer completed 714 \n",
"306527 24f56b5e1849462093931b164eb803b5 offer completed 714 \n",
"\n",
" offer_id amount \n",
"12658 2906b810c7d4411798c6938adc9daaa5 NaN \n",
"12672 fafdcd668e3743c1bb461111dcafc2a4 NaN \n",
"12679 9b98b8c7a33c4b65b9aebfe6a799e6d9 NaN \n",
"12692 ae264e3637204a6fb9bb56bc8210ddfd NaN \n",
"12697 4d5c57ea9a6940dd891ad53e9dbe8da0 NaN \n",
"... ... ... \n",
"306475 2298d6c36e964ae4a3e7e9706d1fb8c2 NaN \n",
"306497 2298d6c36e964ae4a3e7e9706d1fb8c2 NaN \n",
"306506 fafdcd668e3743c1bb461111dcafc2a4 NaN \n",
"306509 fafdcd668e3743c1bb461111dcafc2a4 NaN \n",
"306527 fafdcd668e3743c1bb461111dcafc2a4 NaN \n",
"\n",
"[33579 rows x 5 columns]"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_completed = df_transcript[df_transcript['event'] == 'offer completed']\n",
"### END SOLUTION\n",
"\n",
"df_completed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-07",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_check = df_transcript.query(\"event == 'offer completed'\")\n",
"\n",
"pd.testing.assert_frame_equal(df_transcript, df_transcript_backup)\n",
"pd.testing.assert_frame_equal(df_completed.sort_values(df_completed.columns.tolist()).reset_index(drop=True),\n",
" df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 8: Drop unused columns\n",
"\n",
"▶️ Before proceeding, run the code cell below."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE BELOW\n",
"df_completed = df_completed.copy()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 👇 Tasks\n",
"\n",
"- ✔️ Drop `event`, `time`, and `amount` columns from `df_completed` **in-place**.\n",
"- ✔️ You should directly drop the columns from `df_completed` using the `inplace=True` option.\n",
"\n",
"#### 🚀 Hints\n",
"\n",
"Use the following code as a reference.\n",
"\n",
"```python\n",
"my_dataframe.drop(columns=[\"my_column1\", \"my_column2\"], inplace=True)\n",
"```\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"| | member_id | offer_id |\n",
"|------:|:---------------------------------|:---------------------------------|\n",
"| 12658 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 2906b810c7d4411798c6938adc9daaa5 |\n",
"| 12672 | fe97aa22dd3e48c8b143116a8403dd52 | fafdcd668e3743c1bb461111dcafc2a4 |\n",
"| 12679 | 629fc02d56414d91bca360decdfa9288 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 |"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_id | \n",
" offer_id | \n",
"
\n",
" \n",
" \n",
" \n",
" 12658 | \n",
" 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
"
\n",
" \n",
" 12672 | \n",
" fe97aa22dd3e48c8b143116a8403dd52 | \n",
" fafdcd668e3743c1bb461111dcafc2a4 | \n",
"
\n",
" \n",
" 12679 | \n",
" 629fc02d56414d91bca360decdfa9288 | \n",
" 9b98b8c7a33c4b65b9aebfe6a799e6d9 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" member_id offer_id\n",
"12658 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2906b810c7d4411798c6938adc9daaa5\n",
"12672 fe97aa22dd3e48c8b143116a8403dd52 fafdcd668e3743c1bb461111dcafc2a4\n",
"12679 629fc02d56414d91bca360decdfa9288 9b98b8c7a33c4b65b9aebfe6a799e6d9"
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_completed.drop(columns=[\"event\", \"time\", \"amount\"], inplace=True)\n",
"### END SOLUTION\n",
"\n",
"df_completed.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdrK-mBsem7r"
},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix any incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-08",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_check = df_transcript.query(\"event == 'offer completed'\")[['member_id', 'offer_id']]\n",
"\n",
"pd.testing.assert_frame_equal(df_completed.sort_values(df_completed.columns.tolist()).reset_index(drop=True),\n",
" df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 9: Remove rows with missing values\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Update `df_profiles` so that it only contains rows where `gender` is NOT missing.\n",
"\n",
"#### 🚀 Hints\n",
"\n",
"```python\n",
"df = df[df['some_column'].notna()]\n",
"```\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"| | member_id | gender | age | became_member_on | income |\n",
"|---:|---:|---:|---:|---:|---:|\n",
"| 1 | 0610b486422d4921ae7d2bf64640c50b | F | 55.0 | 2017-07-15 | 112000.0 |\n",
"| 3 | 78afa995795e4d85b5d9ceeca43f5fef | F | 75.0 | 2017-05-09 | 100000.0 |\n",
"| 5 | e2127556f4f64592b11af22de27a7932 | M | 68.0 | 2018-04-26 | 70000.0 |\n",
"| 8 | 389bc3fa690240e798340f5a15918d5c | M | 65.0 | 2018-02-09 | 53000.0 |\n",
"| 12 | 2eeac8d8feae4a8cad5a6af0499a211d | M | 58.0 | 2017-11-11 | 51000.0 |"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_id | \n",
" gender | \n",
" age | \n",
" became_member_on | \n",
" income | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 0610b486422d4921ae7d2bf64640c50b | \n",
" F | \n",
" 55.0 | \n",
" 2017-07-15 | \n",
" 112000.0 | \n",
"
\n",
" \n",
" 3 | \n",
" 78afa995795e4d85b5d9ceeca43f5fef | \n",
" F | \n",
" 75.0 | \n",
" 2017-05-09 | \n",
" 100000.0 | \n",
"
\n",
" \n",
" 5 | \n",
" e2127556f4f64592b11af22de27a7932 | \n",
" M | \n",
" 68.0 | \n",
" 2018-04-26 | \n",
" 70000.0 | \n",
"
\n",
" \n",
" 8 | \n",
" 389bc3fa690240e798340f5a15918d5c | \n",
" M | \n",
" 65.0 | \n",
" 2018-02-09 | \n",
" 53000.0 | \n",
"
\n",
" \n",
" 12 | \n",
" 2eeac8d8feae4a8cad5a6af0499a211d | \n",
" M | \n",
" 58.0 | \n",
" 2017-11-11 | \n",
" 51000.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" member_id gender age became_member_on income\n",
"1 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 112000.0\n",
"3 78afa995795e4d85b5d9ceeca43f5fef F 75.0 2017-05-09 100000.0\n",
"5 e2127556f4f64592b11af22de27a7932 M 68.0 2018-04-26 70000.0\n",
"8 389bc3fa690240e798340f5a15918d5c M 65.0 2018-02-09 53000.0\n",
"12 2eeac8d8feae4a8cad5a6af0499a211d M 58.0 2017-11-11 51000.0"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_profiles = df_profiles[df_profiles['gender'].notna()]\n",
"### END SOLUTION\n",
"\n",
"df_profiles.head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdrK-mBsem7r"
},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix any incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-09",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_check = df_profiles_backup.query('gender == gender')\n",
"\n",
"pd.testing.assert_frame_equal(df_profiles.sort_values(df_profiles.columns.to_list()).reset_index(drop=True),\n",
" df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 10: Merge profiles into `df_completed`\n",
"\n",
"▶️ Run the code below to:\n",
"1. Convert `age` and `income` columns in `df_profiles` to integer types (from float types).\n",
"2. Print out the first three rows of `df_transactions` and `df_profiles`."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_id | \n",
" offer_id | \n",
"
\n",
" \n",
" \n",
" \n",
" 12658 | \n",
" 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
"
\n",
" \n",
" 12672 | \n",
" fe97aa22dd3e48c8b143116a8403dd52 | \n",
" fafdcd668e3743c1bb461111dcafc2a4 | \n",
"
\n",
" \n",
" 12679 | \n",
" 629fc02d56414d91bca360decdfa9288 | \n",
" 9b98b8c7a33c4b65b9aebfe6a799e6d9 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" member_id offer_id\n",
"12658 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2906b810c7d4411798c6938adc9daaa5\n",
"12672 fe97aa22dd3e48c8b143116a8403dd52 fafdcd668e3743c1bb461111dcafc2a4\n",
"12679 629fc02d56414d91bca360decdfa9288 9b98b8c7a33c4b65b9aebfe6a799e6d9"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_id | \n",
" gender | \n",
" age | \n",
" became_member_on | \n",
" income | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 0610b486422d4921ae7d2bf64640c50b | \n",
" F | \n",
" 55 | \n",
" 2017-07-15 | \n",
" 112000 | \n",
"
\n",
" \n",
" 3 | \n",
" 78afa995795e4d85b5d9ceeca43f5fef | \n",
" F | \n",
" 75 | \n",
" 2017-05-09 | \n",
" 100000 | \n",
"
\n",
" \n",
" 5 | \n",
" e2127556f4f64592b11af22de27a7932 | \n",
" M | \n",
" 68 | \n",
" 2018-04-26 | \n",
" 70000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" member_id gender age became_member_on income\n",
"1 0610b486422d4921ae7d2bf64640c50b F 55 2017-07-15 112000\n",
"3 78afa995795e4d85b5d9ceeca43f5fef F 75 2017-05-09 100000\n",
"5 e2127556f4f64592b11af22de27a7932 M 68 2018-04-26 70000"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"# Convert age and income to int64 types\n",
"df_profiles = df_profiles.copy()\n",
"df_profiles['age'] = df_profiles['age'].astype(np.int64)\n",
"df_profiles['income'] = df_profiles['income'].astype(np.int64)\n",
"\n",
"display(df_completed.head(3))\n",
"display(df_profiles.head(3))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 👇 Tasks\n",
"\n",
"- ✔️ Merge `df_completed` and `df_profiles` using an **inner** join type.\n",
" - By using an inner join, we are intentionally removing transactions made by customers with missing gender information.\n",
"- ✔️ Place `df_completed` on the \"left\" side.\n",
"- ✔️ Store the merged DataFrame to `df_merged`.\n",
"\n",
"#### 🚀 Hints\n",
"\n",
"```python\n",
"my_merged_dataframe = pd.merge(\n",
" left=left_dataframe,\n",
" right=right_dataframe,\n",
" on='shared_key_column',\n",
" how='inner'\n",
")\n",
"```\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"| | member_id | offer_id | gender | age | became_member_on | income |\n",
"|---:|:---------------------------------|:---------------------------------|:---------|------:|:-------------------|---------:|\n",
"| 0 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 2906b810c7d4411798c6938adc9daaa5 | M | 42 | 2016-01-17 | 96000 |\n",
"| 1 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 2298d6c36e964ae4a3e7e9706d1fb8c2 | M | 42 | 2016-01-17 | 96000 |\n",
"| 2 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 0b1e1539f2cc45b7b9fa7c272da2e1d7 | M | 42 | 2016-01-17 | 96000 |"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_id | \n",
" offer_id | \n",
" gender | \n",
" age | \n",
" became_member_on | \n",
" income | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
" M | \n",
" 42 | \n",
" 2016-01-17 | \n",
" 96000 | \n",
"
\n",
" \n",
" 1 | \n",
" 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | \n",
" 2298d6c36e964ae4a3e7e9706d1fb8c2 | \n",
" M | \n",
" 42 | \n",
" 2016-01-17 | \n",
" 96000 | \n",
"
\n",
" \n",
" 2 | \n",
" 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | \n",
" 0b1e1539f2cc45b7b9fa7c272da2e1d7 | \n",
" M | \n",
" 42 | \n",
" 2016-01-17 | \n",
" 96000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" member_id offer_id gender \\\n",
"0 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2906b810c7d4411798c6938adc9daaa5 M \n",
"1 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2298d6c36e964ae4a3e7e9706d1fb8c2 M \n",
"2 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 0b1e1539f2cc45b7b9fa7c272da2e1d7 M \n",
"\n",
" age became_member_on income \n",
"0 42 2016-01-17 96000 \n",
"1 42 2016-01-17 96000 \n",
"2 42 2016-01-17 96000 "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_merged = pd.merge(\n",
" left=df_completed,\n",
" right=df_profiles,\n",
" on='member_id',\n",
" how='inner'\n",
")\n",
"### END SOLUTION\n",
"\n",
"df_merged.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdrK-mBsem7r"
},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix any incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-10",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_check = df_transcript_backup.query('event == \"offer completed\"').merge(\n",
" df_profiles_backup.query('gender == gender'),\n",
" on='member_id',\n",
" how='inner'\n",
").drop(columns=['event', 'time', 'amount'])\n",
"df_check['age'] = df_check['age'].astype(np.int64)\n",
"df_check['income'] = df_check['income'].astype(np.int64)\n",
"df_merged_backup1 = df_check.copy()\n",
"\n",
"pd.testing.assert_frame_equal(df_merged.sort_values(df_merged.columns.to_list()).reset_index(drop=True),\n",
" df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 11: Merge offers into `df_merged`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 👇 Tasks\n",
"\n",
"- ✔️ Merge `df_offers` and `df_merged` using an **inner** join type.\n",
"- ✔️ Place `df_merged` on the \"left\" side.\n",
"- ✔️ Store the merged DataFrame to `df_completed_offers`.\n",
"\n",
"#### 🚀 Hints\n",
"\n",
"```python\n",
"my_merged_dataframe = pd.merge(\n",
" left=left_dataframe,\n",
" right=right_dataframe,\n",
" on='shared_key_column',\n",
" how='inner'\n",
")\n",
"```\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"| | member_id | offer_id | gender | age | became_member_on | income | reward | difficulty | duration | offer_type | email | mobile | social | web |\n",
"|---:|:---------------------------------|:---------------------------------|:---------|------:|:-------------------|---------:|---------:|-------------:|-----------:|:-------------|--------:|---------:|---------:|------:|\n",
"| 0 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 2906b810c7d4411798c6938adc9daaa5 | M | 42 | 2016-01-17 | 96000 | 2 | 10 | 7 | discount | 1 | 1 | 0 | 1 |\n",
"| 1 | 629fc02d56414d91bca360decdfa9288 | 2906b810c7d4411798c6938adc9daaa5 | M | 52 | 2018-06-05 | 72000 | 2 | 10 | 7 | discount | 1 | 1 | 0 | 1 |\n",
"| 2 | 73ffefd41e9a4ca3ab26b2b3697c6eb7 | 2906b810c7d4411798c6938adc9daaa5 | F | 67 | 2015-09-29 | 67000 | 2 | 10 | 7 | discount | 1 | 1 | 0 | 1 |"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_id | \n",
" offer_id | \n",
" gender | \n",
" age | \n",
" became_member_on | \n",
" income | \n",
" reward | \n",
" difficulty | \n",
" duration | \n",
" offer_type | \n",
" email | \n",
" mobile | \n",
" social | \n",
" web | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
" M | \n",
" 42 | \n",
" 2016-01-17 | \n",
" 96000 | \n",
" 2 | \n",
" 10 | \n",
" 7 | \n",
" discount | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 1 | \n",
" 629fc02d56414d91bca360decdfa9288 | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
" M | \n",
" 52 | \n",
" 2018-06-05 | \n",
" 72000 | \n",
" 2 | \n",
" 10 | \n",
" 7 | \n",
" discount | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 73ffefd41e9a4ca3ab26b2b3697c6eb7 | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
" F | \n",
" 67 | \n",
" 2015-09-29 | \n",
" 67000 | \n",
" 2 | \n",
" 10 | \n",
" 7 | \n",
" discount | \n",
" 1 | \n",
" 1 | \n",
" 0 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" member_id offer_id gender \\\n",
"0 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2906b810c7d4411798c6938adc9daaa5 M \n",
"1 629fc02d56414d91bca360decdfa9288 2906b810c7d4411798c6938adc9daaa5 M \n",
"2 73ffefd41e9a4ca3ab26b2b3697c6eb7 2906b810c7d4411798c6938adc9daaa5 F \n",
"\n",
" age became_member_on income reward difficulty duration offer_type \\\n",
"0 42 2016-01-17 96000 2 10 7 discount \n",
"1 52 2018-06-05 72000 2 10 7 discount \n",
"2 67 2015-09-29 67000 2 10 7 discount \n",
"\n",
" email mobile social web \n",
"0 1 1 0 1 \n",
"1 1 1 0 1 \n",
"2 1 1 0 1 "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_completed_offers = pd.merge(\n",
" left=df_merged,\n",
" right=df_offers,\n",
" on='offer_id',\n",
" how='inner'\n",
")\n",
"### END SOLUTION\n",
"\n",
"df_completed_offers.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdrK-mBsem7r"
},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix any incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-11",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_check = df_merged_backup1.merge(\n",
" df_offers,\n",
" on='offer_id',\n",
" how='inner'\n",
")\n",
"df_merged_backup2 = df_check.copy()\n",
"\n",
"pd.testing.assert_frame_equal(df_completed_offers.sort_values(df_completed_offers.columns.to_list()).reset_index(drop=True),\n",
" df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 📌 Challenge Day\n",
"\n",
"- For the rest of this lecture, your challenge is to get through these challenges by yourself! 🏀\n",
"- At the end of the class, we'll go over them together.\n",
"- If we're short on time, I will post the solutions to these challenges"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 12: Find how many times each offer has been redeemed\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Using `df_completed_offers`, find the number of times each offer has been redeemed.\n",
"- ✔️ Store the summary DataFrame to `df_count_by_offer_id`.\n",
"- ✔️ `df_count_by_offer_id` should have two non-index columns.\n",
" - `df_count_by_offer_id.columns.to_list()` should print out `['offer_id', 'count']`.\n",
"- ✔️ Sort `df_count_by_offer_id` by `count` in descending order.\n",
"- ✔️ `df_completed_offers` should remain unaltered.\n",
"\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"Your index column may contain different values.\n",
"\n",
"| | offer_id | count |\n",
"|---:|:---------------------------------|--------:|\n",
"| 7 | fafdcd668e3743c1bb461111dcafc2a4 | 5003 |\n",
"| 1 | 2298d6c36e964ae4a3e7e9706d1fb8c2 | 4886 |\n",
"| 4 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 | 4188 |\n",
"| 6 | f19421c1d4aa40978ebb69ca19b0e20d | 4103 |\n",
"| 2 | 2906b810c7d4411798c6938adc9daaa5 | 3911 |\n",
"| 5 | ae264e3637204a6fb9bb56bc8210ddfd | 3657 |\n",
"| 0 | 0b1e1539f2cc45b7b9fa7c272da2e1d7 | 3386 |\n",
"| 3 | 4d5c57ea9a6940dd891ad53e9dbe8da0 | 3310 |"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" offer_id | \n",
" count | \n",
"
\n",
" \n",
" \n",
" \n",
" 7 | \n",
" fafdcd668e3743c1bb461111dcafc2a4 | \n",
" 5003 | \n",
"
\n",
" \n",
" 1 | \n",
" 2298d6c36e964ae4a3e7e9706d1fb8c2 | \n",
" 4886 | \n",
"
\n",
" \n",
" 4 | \n",
" 9b98b8c7a33c4b65b9aebfe6a799e6d9 | \n",
" 4188 | \n",
"
\n",
" \n",
" 6 | \n",
" f19421c1d4aa40978ebb69ca19b0e20d | \n",
" 4103 | \n",
"
\n",
" \n",
" 2 | \n",
" 2906b810c7d4411798c6938adc9daaa5 | \n",
" 3911 | \n",
"
\n",
" \n",
" 5 | \n",
" ae264e3637204a6fb9bb56bc8210ddfd | \n",
" 3657 | \n",
"
\n",
" \n",
" 0 | \n",
" 0b1e1539f2cc45b7b9fa7c272da2e1d7 | \n",
" 3386 | \n",
"
\n",
" \n",
" 3 | \n",
" 4d5c57ea9a6940dd891ad53e9dbe8da0 | \n",
" 3310 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" offer_id count\n",
"7 fafdcd668e3743c1bb461111dcafc2a4 5003\n",
"1 2298d6c36e964ae4a3e7e9706d1fb8c2 4886\n",
"4 9b98b8c7a33c4b65b9aebfe6a799e6d9 4188\n",
"6 f19421c1d4aa40978ebb69ca19b0e20d 4103\n",
"2 2906b810c7d4411798c6938adc9daaa5 3911\n",
"5 ae264e3637204a6fb9bb56bc8210ddfd 3657\n",
"0 0b1e1539f2cc45b7b9fa7c272da2e1d7 3386\n",
"3 4d5c57ea9a6940dd891ad53e9dbe8da0 3310"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_count_by_offer_id = df_completed_offers.groupby('offer_id', as_index=False).agg({\n",
" 'member_id': 'count'\n",
"}).rename(columns={\n",
" 'member_id': 'count'\n",
"}).sort_values('count', ascending=False)\n",
"### END SOLUTION\n",
"\n",
"df_count_by_offer_id"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdrK-mBsem7r"
},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix any incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-12",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_check = pd.DataFrame({'offer_id': {7: 'fafdcd668e3743c1bb461111dcafc2a4', 1: '2298d6c36e964ae4a3e7e9706d1fb8c2', \n",
"4: '9b98b8c7a33c4b65b9aebfe6a799e6d9', 6: 'f19421c1d4aa40978ebb69ca19b0e20d', 2: '2906b810c7d4411798c6938adc9daaa5',\n",
"5: 'ae264e3637204a6fb9bb56bc8210ddfd', 0: '0b1e1539f2cc45b7b9fa7c272da2e1d7', 3: '4d5c57ea9a6940dd891ad53e9dbe8da0'},\n",
"'count': {7: 5003, 1: 4886, 4: 4188, 6: 4103, 2: 3911, 5: 3657, 0: 3386, 3: 3310}})\n",
"\n",
"pd.testing.assert_frame_equal(df_count_by_offer_id.reset_index(drop=True),\n",
" df_check.reset_index(drop=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 13: Find how many times each type of offer has been received\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Using `df_completed_offers`, find the number of times each offer **type** has been redeemed.\n",
"- ✔️ Store the summary DataFrame to `df_count_by_offer_type`.\n",
"- ✔️ `df_count_by_offer_type` should have two non-index columns.\n",
" - `df_count_by_offer_type.columns.to_list()` should print out `['offer_type', 'count']`.\n",
"- ✔️ Sort `df_count_by_offer_type` by `count` in descending order.\n",
"- ✔️ `df_completed_offers` should remain unaltered.\n",
"\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"Your index column may contain different values.\n",
"\n",
"| | offer_type | count |\n",
"|---:|:-------------|--------:|\n",
"| 1 | discount | 17186 |\n",
"| 0 | bogo | 15258 |"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" offer_type | \n",
" count | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" discount | \n",
" 17186 | \n",
"
\n",
" \n",
" 0 | \n",
" bogo | \n",
" 15258 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" offer_type count\n",
"1 discount 17186\n",
"0 bogo 15258"
]
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_count_by_offer_type = df_completed_offers.groupby('offer_type', as_index=False).agg({\n",
" 'member_id': 'count'\n",
"}).rename(columns={\n",
" 'member_id': 'count'\n",
"}).sort_values('count', ascending=False)\n",
"### END SOLUTION\n",
"\n",
"df_count_by_offer_type"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdrK-mBsem7r"
},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix any incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-13",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_check = pd.DataFrame({'offer_type': {1: 'discount', 0: 'bogo'}, 'count': {1: 17186, 0: 15258}})\n",
"\n",
"pd.testing.assert_frame_equal(df_count_by_offer_type.reset_index(drop=True),\n",
" df_check.reset_index(drop=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 14: Find redeem count by gender\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Using `df_completed_offers`, find the number of times each gender has redeemed an offer.\n",
"- ✔️ Store the summary DataFrame to `df_count_by_gender`.\n",
"- ✔️ `df_count_by_offer_type` should have two non-index columns.\n",
" - `df_count_by_gender.columns.to_list()` should print out `['gender', 'count']`.\n",
"- ✔️ Sort `df_count_by_gender` by `count` in descending order.\n",
"- ✔️ `df_completed_offers` should remain unaltered.\n",
"\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"Your index column may contain different values.\n",
"\n",
"| | gender | count |\n",
"|---:|:---------|--------:|\n",
"| 1 | M | 16466 |\n",
"| 0 | F | 15477 |\n",
"| 2 | O | 501 |"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" gender | \n",
" count | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" M | \n",
" 16466 | \n",
"
\n",
" \n",
" 0 | \n",
" F | \n",
" 15477 | \n",
"
\n",
" \n",
" 2 | \n",
" O | \n",
" 501 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" gender count\n",
"1 M 16466\n",
"0 F 15477\n",
"2 O 501"
]
},
"execution_count": 36,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_count_by_gender = df_completed_offers.groupby('gender', as_index=False).agg({\n",
" 'member_id': 'count'\n",
"}).rename(columns={\n",
" 'member_id': 'count'\n",
"}).sort_values('count', ascending=False)\n",
"### END SOLUTION\n",
"\n",
"df_count_by_gender"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdrK-mBsem7r"
},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix any incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "exercise-14",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_check = pd.DataFrame({'gender': {1: 'M', 0: 'F', 2: 'O'}, 'count': {1: 16466, 0: 15477, 2: 501}})\n",
"\n",
"pd.testing.assert_frame_equal(df_count_by_gender.reset_index(drop=True),\n",
" df_check.reset_index(drop=True))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"---\n",
"\n",
"### 🎯 Exercise 15: Find redeem count by member sign-up year\n",
"\n",
"#### 👇 Tasks\n",
"\n",
"- ✔️ Using `df_completed_offers`, first find the member sign-up year using `became_member_on` column.\n",
"- ✔️ Then, find the number of completed offers by the sign-up year.\n",
"- ✔️ Store the summary DataFrame to `df_count_by_member_signup_year`.\n",
"- ✔️ `df_count_by_member_signup_year` should have two non-index columns.\n",
" - `df_count_by_member_signup_year.columns.to_list()` should print out `['member_since', 'count']`.\n",
"- ✔️ Sort `df_count_by_member_signup_year` by `year` in ascending order.\n",
"- ✔️ **You can modify `df_completed_offers` if you need to (e.g., convert a column to datetime format, extract year, etc).**\n",
"\n",
"\n",
"#### 🔑 Expected Output\n",
"\n",
"Your index column may contain different values.\n",
"\n",
"| | member_since | count |\n",
"|---:|---------------:|--------:|\n",
"| 0 | 2013 | 561 |\n",
"| 1 | 2014 | 1321 |\n",
"| 2 | 2015 | 4319 |\n",
"| 3 | 2016 | 8928 |\n",
"| 4 | 2017 | 12119 |\n",
"| 5 | 2018 | 5196 |"
]
},
{
"cell_type": "code",
"execution_count": 38,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" member_since | \n",
" count | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 2013 | \n",
" 561 | \n",
"
\n",
" \n",
" 1 | \n",
" 2014 | \n",
" 1321 | \n",
"
\n",
" \n",
" 2 | \n",
" 2015 | \n",
" 4319 | \n",
"
\n",
" \n",
" 3 | \n",
" 2016 | \n",
" 8928 | \n",
"
\n",
" \n",
" 4 | \n",
" 2017 | \n",
" 12119 | \n",
"
\n",
" \n",
" 5 | \n",
" 2018 | \n",
" 5196 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" member_since count\n",
"0 2013 561\n",
"1 2014 1321\n",
"2 2015 4319\n",
"3 2016 8928\n",
"4 2017 12119\n",
"5 2018 5196"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"### BEGIN SOLUTION\n",
"df_completed_offers['became_member_on'] = pd.to_datetime(df_completed_offers['became_member_on'])\n",
"df_completed_offers['member_since'] = df_completed_offers['became_member_on'].dt.year\n",
"df_count_by_member_signup_year = df_completed_offers.groupby('member_since', as_index=False).agg({\n",
" 'member_id': 'count'\n",
"}).rename(columns={\n",
" 'member_id': 'count'\n",
"})\n",
"### END SOLUTION\n",
"\n",
"df_count_by_member_signup_year"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EdrK-mBsem7r"
},
"source": [
"#### 🧭 Check Your Work\n",
"\n",
"- Once you're done, run the code cell below to test correctness.\n",
"- ✔️ If the code cell runs without an error, you're good to move on.\n",
"- ❌ If the code cell throws an error, go back and fix any incorrect parts."
]
},
{
"cell_type": "code",
"execution_count": 39,
"metadata": {
"nbgrader": {
"grade": true,
"grade_id": "challenge-15",
"locked": true,
"points": "1",
"solution": false
}
},
"outputs": [],
"source": [
"# DO NOT CHANGE THE CODE IN THIS CELL\n",
"df_check = pd.DataFrame({'member_since': {0: 2013, 1: 2014, 2: 2015, 3: 2016, 4: 2017, 5: 2018},\n",
" 'count': {0: 561, 1: 1321, 2: 4319, 3: 8928, 4: 12119, 5: 5196}})\n",
"\n",
"pd.testing.assert_frame_equal(df_count_by_member_signup_year.reset_index(drop=True),\n",
" df_check.reset_index(drop=True))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 4
}