{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pandas Update Cells, More Exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lecture Notes and in-class exercises" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ First, run the code cell below to import `unittest`, a module used for **🧭 Check Your Work** sections and the autograder." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [] }, "outputs": [], "source": [ "import unittest\n", "tc = unittest.TestCase()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 👇 Tasks\n", "\n", "- ✔️ Import the following Python packages.\n", " 1. `pandas`: Use alias `pd`.\n", " 2. `numpy`: Use alias `np`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "### BEGIN SOLUTION\n", "import pandas as pd\n", "import numpy as np\n", "### END SOLUTION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check your work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import sys\n", "tc.assertTrue('pd' in globals(), 'Check whether you have correctly import Pandas with an alias.')\n", "tc.assertTrue('np' in globals(), 'Check whether you have correctly import NumPy with an alias.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 📌 Load companies data\n", "\n", "For the first part, we're going to work with a small DataFrame to see how we can read and/or update individual cells in a DataFrame.\n", "\n", "▶️ Run the code cell below to create `df_companies`." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_nametickerheadquarters
0AmazonAMZNSeattle
1NvidiaAMDSanta Clara
2GoogleNaNChampaign
3MicrosoftNaNRedmond
4AdobeADBESan Jose
\n", "
" ], "text/plain": [ " company_name ticker headquarters\n", "0 Amazon AMZN Seattle\n", "1 Nvidia AMD Santa Clara\n", "2 Google NaN Champaign\n", "3 Microsoft NaN Redmond\n", "4 Adobe ADBE San Jose" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_companies = pd.DataFrame({\n", " 'company_name': ['Amazon', 'Nvidia', 'Google', 'Microsoft', 'Adobe'],\n", " 'ticker': ['AMZN', 'AMD', np.nan, np.nan, 'ADBE'],\n", " 'headquarters': ['Seattle', 'Santa Clara', 'Champaign', 'Redmond', 'San Jose']\n", "})\n", "\n", "df_companies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 1: Update Nvidia's ticker\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Nvidia's ticker is incorrectly listed as `'AMD'`.\n", "- ✔️ Replace `'AMD'` with `'NVDA'`.\n", "- ✔️ Directly update `df_companies`.\n", "\n", "#### 🚀 Sample Code\n", "\n", "```python\n", "my_dataframe.loc[my_dataframe['column1'] == 'John Doe Co', 'column2'] = 'new value'\n", "```" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_nametickerheadquarters
0AmazonAMZNSeattle
1NvidiaNVDASanta Clara
2GoogleNaNChampaign
3MicrosoftNaNRedmond
4AdobeADBESan Jose
\n", "
" ], "text/plain": [ " company_name ticker headquarters\n", "0 Amazon AMZN Seattle\n", "1 Nvidia NVDA Santa Clara\n", "2 Google NaN Champaign\n", "3 Microsoft NaN Redmond\n", "4 Adobe ADBE San Jose" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_companies.loc[df_companies['company_name'] == 'Nvidia', 'ticker'] = 'NVDA'\n", "### END SOLUTION\n", "\n", "df_companies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "tc.assertEqual(\n", " df_companies.loc[df_companies['company_name'] == 'Nvidia', 'ticker'].iloc[0],\n", " 'NVDA'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 2: Update Google and Microsoft's tickers\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Both Google and Microsoft's tickers are missing.\n", "- ✔️ Set Google's ticker to `'GOOG'`.\n", "- ✔️ Set Microsoft's ticker to `'MSFT'`.\n", "- ✔️ Directly update `df_companies`.\n", "\n", "#### 🚀 Sample Code\n", "\n", "```python\n", "my_dataframe.loc[my_dataframe['column1'] == 'John Doe Co', 'column2'] = 'new value'\n", "```" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_nametickerheadquarters
0AmazonAMZNSeattle
1NvidiaNVDASanta Clara
2GoogleGOOGChampaign
3MicrosoftMSFTRedmond
4AdobeADBESan Jose
\n", "
" ], "text/plain": [ " company_name ticker headquarters\n", "0 Amazon AMZN Seattle\n", "1 Nvidia NVDA Santa Clara\n", "2 Google GOOG Champaign\n", "3 Microsoft MSFT Redmond\n", "4 Adobe ADBE San Jose" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_companies.loc[df_companies['company_name'] == 'Google', 'ticker'] = 'GOOG'\n", "df_companies.loc[df_companies['company_name'] == 'Microsoft', 'ticker'] = 'MSFT'\n", "### END SOLUTION\n", "\n", "df_companies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-02", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "tc.assertEqual(\n", " df_companies.loc[df_companies['company_name'] == 'Google', 'ticker'].iloc[0],\n", " 'GOOG'\n", ")\n", "\n", "tc.assertEqual(\n", " df_companies.loc[df_companies['company_name'] == 'Microsoft', 'ticker'].iloc[0],\n", " 'MSFT'\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 3: Update Google's headquarters location\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Google's headquarters is listed as Champaign.\n", "- ✔️ Set Google's headquarters to `'Mountain View'`.\n", "- ✔️ Directly update `df_companies`." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
company_nametickerheadquarters
0AmazonAMZNSeattle
1NvidiaNVDASanta Clara
2GoogleGOOGMountain View
3MicrosoftMSFTRedmond
4AdobeADBESan Jose
\n", "
" ], "text/plain": [ " company_name ticker headquarters\n", "0 Amazon AMZN Seattle\n", "1 Nvidia NVDA Santa Clara\n", "2 Google GOOG Mountain View\n", "3 Microsoft MSFT Redmond\n", "4 Adobe ADBE San Jose" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_companies.loc[df_companies['company_name'] == 'Google', 'headquarters'] = 'Mountain View'\n", "### END SOLUTION\n", "\n", "df_companies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-03", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_companies_CHECK = pd.DataFrame({'company_name': ['Amazon', 'Nvidia', 'Google', 'Microsoft', 'Adobe'],\n", " 'ticker': ['AMZN', 'NVDA', 'GOOG', 'MSFT', 'ADBE'], 'headquarters': ['Seattle', 'Santa Clara',\n", " 'Mountain View', 'Redmond', 'San Jose']})\n", "\n", "pd.testing.assert_frame_equal(\n", " df_companies.reset_index(drop=True),\n", " df_companies_CHECK.reset_index(drop=True)\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 4: Retrieve Amazon's headquarters location\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Retrieve Amazon's headquarters location.\n", "- ✔️ Store the result to a new variable named `amazon_headquarters`.\n", "\n", "#### 🚀 Sample Code\n", "\n", "```python\n", "retrieved_value = my_dataframe.loc[my_dataframe['column1'] == 'Amazon', 'column2'].iloc[0]\n", "```\n", "\n", "#### 🔑 Expected Output\n", "\n", "```\n", "Amazon's headquarters is in Seattle.\n", "```" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Amazon's headquarters is in Seattle.\n" ] } ], "source": [ "### BEGIN SOLUTION\n", "amazon_headquarters = df_companies.loc[df_companies['company_name'] == 'Amazon', 'headquarters'].iloc[0]\n", "### END SOLUTION\n", "\n", "print(f\"Amazon's headquarters is in {amazon_headquarters}.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-04", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "tc.assertEqual(amazon_headquarters, 'Seattle')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 5: Retrieve Adobe's ticker\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Retrieve Adobe's ticker.\n", "- ✔️ Store the result to a new variable named `adobe_ticker`.\n", "\n", "#### 🚀 Sample Code\n", "\n", "```python\n", "retrieved_value = my_dataframe.loc[my_dataframe['column1'] == 'Adobe', 'column2'].iloc[0]\n", "```\n", "\n", "#### 🔑 Expected Output\n", "\n", "```\n", "Adobe's ticker is ADBE.\n", "```" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "nbgrader": { "grade": false, "grade_id": "exercise-05", "locked": false, "points": "1", "solution": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Adobe's ticker is ADBE.\n" ] } ], "source": [ "### BEGIN SOLUTION\n", "adobe_ticker = df_companies.loc[df_companies['company_name'] == 'Adobe', 'ticker'].iloc[0]\n", "### END SOLUTION\n", "\n", "print(f\"Adobe's ticker is {adobe_ticker}.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-05", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "tc.assertEqual(adobe_ticker, 'ADBE')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 📌 Load data\n", "\n", "Now, you'll work with the Starbucks Customer Rewards Program dataset. ☕ The dataset was originally released as a part of a capstone project of a Udacity course.\n", "\n", "For the purpose of this course, the original dataset in JSON format has been transformed to CSV files for convenience. You can download the original JSON files [here](https://www.kaggle.com/blacktile/starbucks-app-customer-reward-program-data?select=portfolio.json).\n", "\n", "▶️ Run the code cell below to read the transcript data." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_ideventtimeoffer_idamount
078afa995795e4d85b5d9ceeca43f5fefoffer received09b98b8c7a33c4b65b9aebfe6a799e6d9NaN
1a03223e636434f42ac4c3df47e8bac43offer received00b1e1539f2cc45b7b9fa7c272da2e1d7NaN
2e2127556f4f64592b11af22de27a7932offer received02906b810c7d4411798c6938adc9daaa5NaN
38ec6ce2a7e7949b1bf142def7d0e0586offer received0fafdcd668e3743c1bb461111dcafc2a4NaN
468617ca6246f4fbc85e91a2a49552598offer received04d5c57ea9a6940dd891ad53e9dbe8da0NaN
..................
306529b3a1272bc9904337b331bf348c3e8c17transaction714NaN1.59
30653068213b08d99a4ae1b0dcb72aebd9aa35transaction714NaN9.53
306531a00058cf10334a308c68e7631c529907transaction714NaN3.61
30653276ddbd6576844afe811f1a3c0fbb5bectransaction714NaN3.53
306533c02b10e8752c4d8e9b73f918558531f7transaction714NaN4.05
\n", "

306534 rows × 5 columns

\n", "
" ], "text/plain": [ " member_id event time \\\n", "0 78afa995795e4d85b5d9ceeca43f5fef offer received 0 \n", "1 a03223e636434f42ac4c3df47e8bac43 offer received 0 \n", "2 e2127556f4f64592b11af22de27a7932 offer received 0 \n", "3 8ec6ce2a7e7949b1bf142def7d0e0586 offer received 0 \n", "4 68617ca6246f4fbc85e91a2a49552598 offer received 0 \n", "... ... ... ... \n", "306529 b3a1272bc9904337b331bf348c3e8c17 transaction 714 \n", "306530 68213b08d99a4ae1b0dcb72aebd9aa35 transaction 714 \n", "306531 a00058cf10334a308c68e7631c529907 transaction 714 \n", "306532 76ddbd6576844afe811f1a3c0fbb5bec transaction 714 \n", "306533 c02b10e8752c4d8e9b73f918558531f7 transaction 714 \n", "\n", " offer_id amount \n", "0 9b98b8c7a33c4b65b9aebfe6a799e6d9 NaN \n", "1 0b1e1539f2cc45b7b9fa7c272da2e1d7 NaN \n", "2 2906b810c7d4411798c6938adc9daaa5 NaN \n", "3 fafdcd668e3743c1bb461111dcafc2a4 NaN \n", "4 4d5c57ea9a6940dd891ad53e9dbe8da0 NaN \n", "... ... ... \n", "306529 NaN 1.59 \n", "306530 NaN 9.53 \n", "306531 NaN 3.61 \n", "306532 NaN 3.53 \n", "306533 NaN 4.05 \n", "\n", "[306534 rows x 5 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_transcript = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/transcript.v2.csv.gz')\n", "df_transcript_backup = df_transcript.copy()\n", "display(df_transcript)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run the code cell below to read the customer profiles data." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_idgenderagebecame_member_onincome
068be06ca386d4c31939f3a4f0e3dd783NaNNaN2017-02-12NaN
10610b486422d4921ae7d2bf64640c50bF55.02017-07-15112000.0
238fe809add3b4fcf9315a9694bb96ff5NaNNaN2018-07-12NaN
378afa995795e4d85b5d9ceeca43f5fefF75.02017-05-09100000.0
4a03223e636434f42ac4c3df47e8bac43NaNNaN2017-08-04NaN
..................
169956d5f3a774f3d4714ab0c092238f3a1d7F45.02018-06-0454000.0
169962cb4f97358b841b9a9773a7aa05a9d77M61.02018-07-1372000.0
1699701d26f638c274aa0b965d24cefe3183fM49.02017-01-2673000.0
169989dc1421481194dcd9400aec7c9ae6366F83.02016-03-0750000.0
16999e4052622e5ba45a8b96b59aba68cf068F62.02017-07-2282000.0
\n", "

17000 rows × 5 columns

\n", "
" ], "text/plain": [ " member_id gender age became_member_on \\\n", "0 68be06ca386d4c31939f3a4f0e3dd783 NaN NaN 2017-02-12 \n", "1 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 \n", "2 38fe809add3b4fcf9315a9694bb96ff5 NaN NaN 2018-07-12 \n", "3 78afa995795e4d85b5d9ceeca43f5fef F 75.0 2017-05-09 \n", "4 a03223e636434f42ac4c3df47e8bac43 NaN NaN 2017-08-04 \n", "... ... ... ... ... \n", "16995 6d5f3a774f3d4714ab0c092238f3a1d7 F 45.0 2018-06-04 \n", "16996 2cb4f97358b841b9a9773a7aa05a9d77 M 61.0 2018-07-13 \n", "16997 01d26f638c274aa0b965d24cefe3183f M 49.0 2017-01-26 \n", "16998 9dc1421481194dcd9400aec7c9ae6366 F 83.0 2016-03-07 \n", "16999 e4052622e5ba45a8b96b59aba68cf068 F 62.0 2017-07-22 \n", "\n", " income \n", "0 NaN \n", "1 112000.0 \n", "2 NaN \n", "3 100000.0 \n", "4 NaN \n", "... ... \n", "16995 54000.0 \n", "16996 72000.0 \n", "16997 73000.0 \n", "16998 50000.0 \n", "16999 82000.0 \n", "\n", "[17000 rows x 5 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_profiles = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/profile.csv')\n", "df_profiles_backup = df_profiles.copy()\n", "display(df_profiles)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run the code cell below to read the offers data." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
rewarddifficultydurationoffer_typeoffer_idemailmobilesocialweb
010107bogoae264e3637204a6fb9bb56bc8210ddfd1110
110105bogo4d5c57ea9a6940dd891ad53e9dbe8da01111
2004informational3f207df678b143eea3cee63160fa8bed1101
3557bogo9b98b8c7a33c4b65b9aebfe6a799e6d91101
452010discount0b1e1539f2cc45b7b9fa7c272da2e1d71001
5377discount2298d6c36e964ae4a3e7e9706d1fb8c21111
621010discountfafdcd668e3743c1bb461111dcafc2a41111
7003informational5a8bc65990b245e5a138643cd4eb98371110
8555bogof19421c1d4aa40978ebb69ca19b0e20d1111
92107discount2906b810c7d4411798c6938adc9daaa51101
\n", "
" ], "text/plain": [ " reward difficulty duration offer_type \\\n", "0 10 10 7 bogo \n", "1 10 10 5 bogo \n", "2 0 0 4 informational \n", "3 5 5 7 bogo \n", "4 5 20 10 discount \n", "5 3 7 7 discount \n", "6 2 10 10 discount \n", "7 0 0 3 informational \n", "8 5 5 5 bogo \n", "9 2 10 7 discount \n", "\n", " offer_id email mobile social web \n", "0 ae264e3637204a6fb9bb56bc8210ddfd 1 1 1 0 \n", "1 4d5c57ea9a6940dd891ad53e9dbe8da0 1 1 1 1 \n", "2 3f207df678b143eea3cee63160fa8bed 1 1 0 1 \n", "3 9b98b8c7a33c4b65b9aebfe6a799e6d9 1 1 0 1 \n", "4 0b1e1539f2cc45b7b9fa7c272da2e1d7 1 0 0 1 \n", "5 2298d6c36e964ae4a3e7e9706d1fb8c2 1 1 1 1 \n", "6 fafdcd668e3743c1bb461111dcafc2a4 1 1 1 1 \n", "7 5a8bc65990b245e5a138643cd4eb9837 1 1 1 0 \n", "8 f19421c1d4aa40978ebb69ca19b0e20d 1 1 1 1 \n", "9 2906b810c7d4411798c6938adc9daaa5 1 1 0 1 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_offers = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/portfolio.csv')\n", "df_offers = df_offers.copy()\n", "display(df_offers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 6: Find all `event` types in `df_transcript`\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Find the unique values in the `event` column as a Series.\n", "- ✔️ A fully-working code is provided below.\n", "\n", "#### 🚀 Code\n", "\n", "```python\n", "event_types = df_transcript['event'].unique()\n", "\n", "print(event_types)\n", "```" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['offer received' 'offer viewed' 'transaction' 'offer completed']\n" ] } ], "source": [ "### BEGIN SOLUTION\n", "event_types = df_transcript['event'].unique()\n", "\n", "print(event_types)\n", "### END SOLUTION" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-06", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "tc.assertEqual(set(event_types), set(df_transcript_backup['event'].unique()))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 7: Find all completed offers\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ In `df_transcript` find all rows of completed offers (where the `event` column's value is `'offer completed'`).\n", "- ✔️ Store the filtered result to a new DataFrame named `df_completed`.\n", "- ✔️ `df_transcript` should remain unaltered." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_ideventtimeoffer_idamount
126589fa9ae8f57894cc9a3b8a9bbe0fc1b2foffer completed02906b810c7d4411798c6938adc9daaa5NaN
12672fe97aa22dd3e48c8b143116a8403dd52offer completed0fafdcd668e3743c1bb461111dcafc2a4NaN
12679629fc02d56414d91bca360decdfa9288offer completed09b98b8c7a33c4b65b9aebfe6a799e6d9NaN
12692676506bad68e4161b9bbaffeb039626boffer completed0ae264e3637204a6fb9bb56bc8210ddfdNaN
126978f7dd3b2afe14c078eb4f6e6fe4ba97doffer completed04d5c57ea9a6940dd891ad53e9dbe8da0NaN
..................
3064750c027f5f34dd4b9eba0a25785c611273offer completed7142298d6c36e964ae4a3e7e9706d1fb8c2NaN
306497a6f84f4e976f44508c358cc9aba6d2b3offer completed7142298d6c36e964ae4a3e7e9706d1fb8c2NaN
306506b895c57e8cd047a8872ce02aa54759d6offer completed714fafdcd668e3743c1bb461111dcafc2a4NaN
3065098431c16f8e1d440880db371a68f82dd0offer completed714fafdcd668e3743c1bb461111dcafc2a4NaN
30652724f56b5e1849462093931b164eb803b5offer completed714fafdcd668e3743c1bb461111dcafc2a4NaN
\n", "

33579 rows × 5 columns

\n", "
" ], "text/plain": [ " member_id event time \\\n", "12658 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f offer completed 0 \n", "12672 fe97aa22dd3e48c8b143116a8403dd52 offer completed 0 \n", "12679 629fc02d56414d91bca360decdfa9288 offer completed 0 \n", "12692 676506bad68e4161b9bbaffeb039626b offer completed 0 \n", "12697 8f7dd3b2afe14c078eb4f6e6fe4ba97d offer completed 0 \n", "... ... ... ... \n", "306475 0c027f5f34dd4b9eba0a25785c611273 offer completed 714 \n", "306497 a6f84f4e976f44508c358cc9aba6d2b3 offer completed 714 \n", "306506 b895c57e8cd047a8872ce02aa54759d6 offer completed 714 \n", "306509 8431c16f8e1d440880db371a68f82dd0 offer completed 714 \n", "306527 24f56b5e1849462093931b164eb803b5 offer completed 714 \n", "\n", " offer_id amount \n", "12658 2906b810c7d4411798c6938adc9daaa5 NaN \n", "12672 fafdcd668e3743c1bb461111dcafc2a4 NaN \n", "12679 9b98b8c7a33c4b65b9aebfe6a799e6d9 NaN \n", "12692 ae264e3637204a6fb9bb56bc8210ddfd NaN \n", "12697 4d5c57ea9a6940dd891ad53e9dbe8da0 NaN \n", "... ... ... \n", "306475 2298d6c36e964ae4a3e7e9706d1fb8c2 NaN \n", "306497 2298d6c36e964ae4a3e7e9706d1fb8c2 NaN \n", "306506 fafdcd668e3743c1bb461111dcafc2a4 NaN \n", "306509 fafdcd668e3743c1bb461111dcafc2a4 NaN \n", "306527 fafdcd668e3743c1bb461111dcafc2a4 NaN \n", "\n", "[33579 rows x 5 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_completed = df_transcript[df_transcript['event'] == 'offer completed']\n", "### END SOLUTION\n", "\n", "df_completed" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix incorrect parts." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-07", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_check = df_transcript.query(\"event == 'offer completed'\")\n", "\n", "pd.testing.assert_frame_equal(df_transcript, df_transcript_backup)\n", "pd.testing.assert_frame_equal(df_completed.sort_values(df_completed.columns.tolist()).reset_index(drop=True),\n", " df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 8: Drop unused columns\n", "\n", "▶️ Before proceeding, run the code cell below." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE BELOW\n", "df_completed = df_completed.copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 👇 Tasks\n", "\n", "- ✔️ Drop `event`, `time`, and `amount` columns from `df_completed` **in-place**.\n", "- ✔️ You should directly drop the columns from `df_completed` using the `inplace=True` option.\n", "\n", "#### 🚀 Hints\n", "\n", "Use the following code as a reference.\n", "\n", "```python\n", "my_dataframe.drop(columns=[\"my_column1\", \"my_column2\"], inplace=True)\n", "```\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | member_id | offer_id |\n", "|------:|:---------------------------------|:---------------------------------|\n", "| 12658 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 2906b810c7d4411798c6938adc9daaa5 |\n", "| 12672 | fe97aa22dd3e48c8b143116a8403dd52 | fafdcd668e3743c1bb461111dcafc2a4 |\n", "| 12679 | 629fc02d56414d91bca360decdfa9288 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 |" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_idoffer_id
126589fa9ae8f57894cc9a3b8a9bbe0fc1b2f2906b810c7d4411798c6938adc9daaa5
12672fe97aa22dd3e48c8b143116a8403dd52fafdcd668e3743c1bb461111dcafc2a4
12679629fc02d56414d91bca360decdfa92889b98b8c7a33c4b65b9aebfe6a799e6d9
\n", "
" ], "text/plain": [ " member_id offer_id\n", "12658 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2906b810c7d4411798c6938adc9daaa5\n", "12672 fe97aa22dd3e48c8b143116a8403dd52 fafdcd668e3743c1bb461111dcafc2a4\n", "12679 629fc02d56414d91bca360decdfa9288 9b98b8c7a33c4b65b9aebfe6a799e6d9" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_completed.drop(columns=[\"event\", \"time\", \"amount\"], inplace=True)\n", "### END SOLUTION\n", "\n", "df_completed.head(3)" ] }, { "cell_type": "markdown", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-08", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_check = df_transcript.query(\"event == 'offer completed'\")[['member_id', 'offer_id']]\n", "\n", "pd.testing.assert_frame_equal(df_completed.sort_values(df_completed.columns.tolist()).reset_index(drop=True),\n", " df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 9: Remove rows with missing values\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Update `df_profiles` so that it only contains rows where `gender` is NOT missing.\n", "\n", "#### 🚀 Hints\n", "\n", "```python\n", "df = df[df['some_column'].notna()]\n", "```\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | member_id | gender | age | became_member_on | income |\n", "|---:|---:|---:|---:|---:|---:|\n", "| 1 | 0610b486422d4921ae7d2bf64640c50b | F | 55.0 | 2017-07-15 | 112000.0 |\n", "| 3 | 78afa995795e4d85b5d9ceeca43f5fef | F | 75.0 | 2017-05-09 | 100000.0 |\n", "| 5 | e2127556f4f64592b11af22de27a7932 | M | 68.0 | 2018-04-26 | 70000.0 |\n", "| 8 | 389bc3fa690240e798340f5a15918d5c | M | 65.0 | 2018-02-09 | 53000.0 |\n", "| 12 | 2eeac8d8feae4a8cad5a6af0499a211d | M | 58.0 | 2017-11-11 | 51000.0 |" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_idgenderagebecame_member_onincome
10610b486422d4921ae7d2bf64640c50bF55.02017-07-15112000.0
378afa995795e4d85b5d9ceeca43f5fefF75.02017-05-09100000.0
5e2127556f4f64592b11af22de27a7932M68.02018-04-2670000.0
8389bc3fa690240e798340f5a15918d5cM65.02018-02-0953000.0
122eeac8d8feae4a8cad5a6af0499a211dM58.02017-11-1151000.0
\n", "
" ], "text/plain": [ " member_id gender age became_member_on income\n", "1 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 112000.0\n", "3 78afa995795e4d85b5d9ceeca43f5fef F 75.0 2017-05-09 100000.0\n", "5 e2127556f4f64592b11af22de27a7932 M 68.0 2018-04-26 70000.0\n", "8 389bc3fa690240e798340f5a15918d5c M 65.0 2018-02-09 53000.0\n", "12 2eeac8d8feae4a8cad5a6af0499a211d M 58.0 2017-11-11 51000.0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_profiles = df_profiles[df_profiles['gender'].notna()]\n", "### END SOLUTION\n", "\n", "df_profiles.head(5)" ] }, { "cell_type": "markdown", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-09", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_check = df_profiles_backup.query('gender == gender')\n", "\n", "pd.testing.assert_frame_equal(df_profiles.sort_values(df_profiles.columns.to_list()).reset_index(drop=True),\n", " df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 10: Merge profiles into `df_completed`\n", "\n", "▶️ Run the code below to:\n", "1. Convert `age` and `income` columns in `df_profiles` to integer types (from float types).\n", "2. Print out the first three rows of `df_transactions` and `df_profiles`." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_idoffer_id
126589fa9ae8f57894cc9a3b8a9bbe0fc1b2f2906b810c7d4411798c6938adc9daaa5
12672fe97aa22dd3e48c8b143116a8403dd52fafdcd668e3743c1bb461111dcafc2a4
12679629fc02d56414d91bca360decdfa92889b98b8c7a33c4b65b9aebfe6a799e6d9
\n", "
" ], "text/plain": [ " member_id offer_id\n", "12658 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2906b810c7d4411798c6938adc9daaa5\n", "12672 fe97aa22dd3e48c8b143116a8403dd52 fafdcd668e3743c1bb461111dcafc2a4\n", "12679 629fc02d56414d91bca360decdfa9288 9b98b8c7a33c4b65b9aebfe6a799e6d9" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_idgenderagebecame_member_onincome
10610b486422d4921ae7d2bf64640c50bF552017-07-15112000
378afa995795e4d85b5d9ceeca43f5fefF752017-05-09100000
5e2127556f4f64592b11af22de27a7932M682018-04-2670000
\n", "
" ], "text/plain": [ " member_id gender age became_member_on income\n", "1 0610b486422d4921ae7d2bf64640c50b F 55 2017-07-15 112000\n", "3 78afa995795e4d85b5d9ceeca43f5fef F 75 2017-05-09 100000\n", "5 e2127556f4f64592b11af22de27a7932 M 68 2018-04-26 70000" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Convert age and income to int64 types\n", "df_profiles = df_profiles.copy()\n", "df_profiles['age'] = df_profiles['age'].astype(np.int64)\n", "df_profiles['income'] = df_profiles['income'].astype(np.int64)\n", "\n", "display(df_completed.head(3))\n", "display(df_profiles.head(3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 👇 Tasks\n", "\n", "- ✔️ Merge `df_completed` and `df_profiles` using an **inner** join type.\n", " - By using an inner join, we are intentionally removing transactions made by customers with missing gender information.\n", "- ✔️ Place `df_completed` on the \"left\" side.\n", "- ✔️ Store the merged DataFrame to `df_merged`.\n", "\n", "#### 🚀 Hints\n", "\n", "```python\n", "my_merged_dataframe = pd.merge(\n", " left=left_dataframe,\n", " right=right_dataframe,\n", " on='shared_key_column',\n", " how='inner'\n", ")\n", "```\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | member_id | offer_id | gender | age | became_member_on | income |\n", "|---:|:---------------------------------|:---------------------------------|:---------|------:|:-------------------|---------:|\n", "| 0 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 2906b810c7d4411798c6938adc9daaa5 | M | 42 | 2016-01-17 | 96000 |\n", "| 1 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 2298d6c36e964ae4a3e7e9706d1fb8c2 | M | 42 | 2016-01-17 | 96000 |\n", "| 2 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 0b1e1539f2cc45b7b9fa7c272da2e1d7 | M | 42 | 2016-01-17 | 96000 |" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_idoffer_idgenderagebecame_member_onincome
09fa9ae8f57894cc9a3b8a9bbe0fc1b2f2906b810c7d4411798c6938adc9daaa5M422016-01-1796000
19fa9ae8f57894cc9a3b8a9bbe0fc1b2f2298d6c36e964ae4a3e7e9706d1fb8c2M422016-01-1796000
29fa9ae8f57894cc9a3b8a9bbe0fc1b2f0b1e1539f2cc45b7b9fa7c272da2e1d7M422016-01-1796000
\n", "
" ], "text/plain": [ " member_id offer_id gender \\\n", "0 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2906b810c7d4411798c6938adc9daaa5 M \n", "1 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2298d6c36e964ae4a3e7e9706d1fb8c2 M \n", "2 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 0b1e1539f2cc45b7b9fa7c272da2e1d7 M \n", "\n", " age became_member_on income \n", "0 42 2016-01-17 96000 \n", "1 42 2016-01-17 96000 \n", "2 42 2016-01-17 96000 " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_merged = pd.merge(\n", " left=df_completed,\n", " right=df_profiles,\n", " on='member_id',\n", " how='inner'\n", ")\n", "### END SOLUTION\n", "\n", "df_merged.head(3)" ] }, { "cell_type": "markdown", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-10", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_check = df_transcript_backup.query('event == \"offer completed\"').merge(\n", " df_profiles_backup.query('gender == gender'),\n", " on='member_id',\n", " how='inner'\n", ").drop(columns=['event', 'time', 'amount'])\n", "df_check['age'] = df_check['age'].astype(np.int64)\n", "df_check['income'] = df_check['income'].astype(np.int64)\n", "df_merged_backup1 = df_check.copy()\n", "\n", "pd.testing.assert_frame_equal(df_merged.sort_values(df_merged.columns.to_list()).reset_index(drop=True),\n", " df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 11: Merge offers into `df_merged`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 👇 Tasks\n", "\n", "- ✔️ Merge `df_offers` and `df_merged` using an **inner** join type.\n", "- ✔️ Place `df_merged` on the \"left\" side.\n", "- ✔️ Store the merged DataFrame to `df_completed_offers`.\n", "\n", "#### 🚀 Hints\n", "\n", "```python\n", "my_merged_dataframe = pd.merge(\n", " left=left_dataframe,\n", " right=right_dataframe,\n", " on='shared_key_column',\n", " how='inner'\n", ")\n", "```\n", "\n", "#### 🔑 Expected Output\n", "\n", "| | member_id | offer_id | gender | age | became_member_on | income | reward | difficulty | duration | offer_type | email | mobile | social | web |\n", "|---:|:---------------------------------|:---------------------------------|:---------|------:|:-------------------|---------:|---------:|-------------:|-----------:|:-------------|--------:|---------:|---------:|------:|\n", "| 0 | 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f | 2906b810c7d4411798c6938adc9daaa5 | M | 42 | 2016-01-17 | 96000 | 2 | 10 | 7 | discount | 1 | 1 | 0 | 1 |\n", "| 1 | 629fc02d56414d91bca360decdfa9288 | 2906b810c7d4411798c6938adc9daaa5 | M | 52 | 2018-06-05 | 72000 | 2 | 10 | 7 | discount | 1 | 1 | 0 | 1 |\n", "| 2 | 73ffefd41e9a4ca3ab26b2b3697c6eb7 | 2906b810c7d4411798c6938adc9daaa5 | F | 67 | 2015-09-29 | 67000 | 2 | 10 | 7 | discount | 1 | 1 | 0 | 1 |" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_idoffer_idgenderagebecame_member_onincomerewarddifficultydurationoffer_typeemailmobilesocialweb
09fa9ae8f57894cc9a3b8a9bbe0fc1b2f2906b810c7d4411798c6938adc9daaa5M422016-01-17960002107discount1101
1629fc02d56414d91bca360decdfa92882906b810c7d4411798c6938adc9daaa5M522018-06-05720002107discount1101
273ffefd41e9a4ca3ab26b2b3697c6eb72906b810c7d4411798c6938adc9daaa5F672015-09-29670002107discount1101
\n", "
" ], "text/plain": [ " member_id offer_id gender \\\n", "0 9fa9ae8f57894cc9a3b8a9bbe0fc1b2f 2906b810c7d4411798c6938adc9daaa5 M \n", "1 629fc02d56414d91bca360decdfa9288 2906b810c7d4411798c6938adc9daaa5 M \n", "2 73ffefd41e9a4ca3ab26b2b3697c6eb7 2906b810c7d4411798c6938adc9daaa5 F \n", "\n", " age became_member_on income reward difficulty duration offer_type \\\n", "0 42 2016-01-17 96000 2 10 7 discount \n", "1 52 2018-06-05 72000 2 10 7 discount \n", "2 67 2015-09-29 67000 2 10 7 discount \n", "\n", " email mobile social web \n", "0 1 1 0 1 \n", "1 1 1 0 1 \n", "2 1 1 0 1 " ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_completed_offers = pd.merge(\n", " left=df_merged,\n", " right=df_offers,\n", " on='offer_id',\n", " how='inner'\n", ")\n", "### END SOLUTION\n", "\n", "df_completed_offers.head(3)" ] }, { "cell_type": "markdown", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-11", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_check = df_merged_backup1.merge(\n", " df_offers,\n", " on='offer_id',\n", " how='inner'\n", ")\n", "df_merged_backup2 = df_check.copy()\n", "\n", "pd.testing.assert_frame_equal(df_completed_offers.sort_values(df_completed_offers.columns.to_list()).reset_index(drop=True),\n", " df_check.sort_values(df_check.columns.tolist()).reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 📌 Challenge Day\n", "\n", "- For the rest of this lecture, your challenge is to get through these challenges by yourself! 🏀\n", "- At the end of the class, we'll go over them together.\n", "- If we're short on time, I will post the solutions to these challenges" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 12: Find how many times each offer has been redeemed\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Using `df_completed_offers`, find the number of times each offer has been redeemed.\n", "- ✔️ Store the summary DataFrame to `df_count_by_offer_id`.\n", "- ✔️ `df_count_by_offer_id` should have two non-index columns.\n", " - `df_count_by_offer_id.columns.to_list()` should print out `['offer_id', 'count']`.\n", "- ✔️ Sort `df_count_by_offer_id` by `count` in descending order.\n", "- ✔️ `df_completed_offers` should remain unaltered.\n", "\n", "\n", "#### 🔑 Expected Output\n", "\n", "Your index column may contain different values.\n", "\n", "| | offer_id | count |\n", "|---:|:---------------------------------|--------:|\n", "| 7 | fafdcd668e3743c1bb461111dcafc2a4 | 5003 |\n", "| 1 | 2298d6c36e964ae4a3e7e9706d1fb8c2 | 4886 |\n", "| 4 | 9b98b8c7a33c4b65b9aebfe6a799e6d9 | 4188 |\n", "| 6 | f19421c1d4aa40978ebb69ca19b0e20d | 4103 |\n", "| 2 | 2906b810c7d4411798c6938adc9daaa5 | 3911 |\n", "| 5 | ae264e3637204a6fb9bb56bc8210ddfd | 3657 |\n", "| 0 | 0b1e1539f2cc45b7b9fa7c272da2e1d7 | 3386 |\n", "| 3 | 4d5c57ea9a6940dd891ad53e9dbe8da0 | 3310 |" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
offer_idcount
7fafdcd668e3743c1bb461111dcafc2a45003
12298d6c36e964ae4a3e7e9706d1fb8c24886
49b98b8c7a33c4b65b9aebfe6a799e6d94188
6f19421c1d4aa40978ebb69ca19b0e20d4103
22906b810c7d4411798c6938adc9daaa53911
5ae264e3637204a6fb9bb56bc8210ddfd3657
00b1e1539f2cc45b7b9fa7c272da2e1d73386
34d5c57ea9a6940dd891ad53e9dbe8da03310
\n", "
" ], "text/plain": [ " offer_id count\n", "7 fafdcd668e3743c1bb461111dcafc2a4 5003\n", "1 2298d6c36e964ae4a3e7e9706d1fb8c2 4886\n", "4 9b98b8c7a33c4b65b9aebfe6a799e6d9 4188\n", "6 f19421c1d4aa40978ebb69ca19b0e20d 4103\n", "2 2906b810c7d4411798c6938adc9daaa5 3911\n", "5 ae264e3637204a6fb9bb56bc8210ddfd 3657\n", "0 0b1e1539f2cc45b7b9fa7c272da2e1d7 3386\n", "3 4d5c57ea9a6940dd891ad53e9dbe8da0 3310" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_count_by_offer_id = df_completed_offers.groupby('offer_id', as_index=False).agg({\n", " 'member_id': 'count'\n", "}).rename(columns={\n", " 'member_id': 'count'\n", "}).sort_values('count', ascending=False)\n", "### END SOLUTION\n", "\n", "df_count_by_offer_id" ] }, { "cell_type": "markdown", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-12", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_check = pd.DataFrame({'offer_id': {7: 'fafdcd668e3743c1bb461111dcafc2a4', 1: '2298d6c36e964ae4a3e7e9706d1fb8c2', \n", "4: '9b98b8c7a33c4b65b9aebfe6a799e6d9', 6: 'f19421c1d4aa40978ebb69ca19b0e20d', 2: '2906b810c7d4411798c6938adc9daaa5',\n", "5: 'ae264e3637204a6fb9bb56bc8210ddfd', 0: '0b1e1539f2cc45b7b9fa7c272da2e1d7', 3: '4d5c57ea9a6940dd891ad53e9dbe8da0'},\n", "'count': {7: 5003, 1: 4886, 4: 4188, 6: 4103, 2: 3911, 5: 3657, 0: 3386, 3: 3310}})\n", "\n", "pd.testing.assert_frame_equal(df_count_by_offer_id.reset_index(drop=True),\n", " df_check.reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 13: Find how many times each type of offer has been received\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Using `df_completed_offers`, find the number of times each offer **type** has been redeemed.\n", "- ✔️ Store the summary DataFrame to `df_count_by_offer_type`.\n", "- ✔️ `df_count_by_offer_type` should have two non-index columns.\n", " - `df_count_by_offer_type.columns.to_list()` should print out `['offer_type', 'count']`.\n", "- ✔️ Sort `df_count_by_offer_type` by `count` in descending order.\n", "- ✔️ `df_completed_offers` should remain unaltered.\n", "\n", "\n", "#### 🔑 Expected Output\n", "\n", "Your index column may contain different values.\n", "\n", "| | offer_type | count |\n", "|---:|:-------------|--------:|\n", "| 1 | discount | 17186 |\n", "| 0 | bogo | 15258 |" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
offer_typecount
1discount17186
0bogo15258
\n", "
" ], "text/plain": [ " offer_type count\n", "1 discount 17186\n", "0 bogo 15258" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_count_by_offer_type = df_completed_offers.groupby('offer_type', as_index=False).agg({\n", " 'member_id': 'count'\n", "}).rename(columns={\n", " 'member_id': 'count'\n", "}).sort_values('count', ascending=False)\n", "### END SOLUTION\n", "\n", "df_count_by_offer_type" ] }, { "cell_type": "markdown", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-13", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_check = pd.DataFrame({'offer_type': {1: 'discount', 0: 'bogo'}, 'count': {1: 17186, 0: 15258}})\n", "\n", "pd.testing.assert_frame_equal(df_count_by_offer_type.reset_index(drop=True),\n", " df_check.reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 14: Find redeem count by gender\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Using `df_completed_offers`, find the number of times each gender has redeemed an offer.\n", "- ✔️ Store the summary DataFrame to `df_count_by_gender`.\n", "- ✔️ `df_count_by_offer_type` should have two non-index columns.\n", " - `df_count_by_gender.columns.to_list()` should print out `['gender', 'count']`.\n", "- ✔️ Sort `df_count_by_gender` by `count` in descending order.\n", "- ✔️ `df_completed_offers` should remain unaltered.\n", "\n", "\n", "#### 🔑 Expected Output\n", "\n", "Your index column may contain different values.\n", "\n", "| | gender | count |\n", "|---:|:---------|--------:|\n", "| 1 | M | 16466 |\n", "| 0 | F | 15477 |\n", "| 2 | O | 501 |" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
gendercount
1M16466
0F15477
2O501
\n", "
" ], "text/plain": [ " gender count\n", "1 M 16466\n", "0 F 15477\n", "2 O 501" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_count_by_gender = df_completed_offers.groupby('gender', as_index=False).agg({\n", " 'member_id': 'count'\n", "}).rename(columns={\n", " 'member_id': 'count'\n", "}).sort_values('count', ascending=False)\n", "### END SOLUTION\n", "\n", "df_count_by_gender" ] }, { "cell_type": "markdown", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-14", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_check = pd.DataFrame({'gender': {1: 'M', 0: 'F', 2: 'O'}, 'count': {1: 16466, 0: 15477, 2: 501}})\n", "\n", "pd.testing.assert_frame_equal(df_count_by_gender.reset_index(drop=True),\n", " df_check.reset_index(drop=True))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 15: Find redeem count by member sign-up year\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Using `df_completed_offers`, first find the member sign-up year using `became_member_on` column.\n", "- ✔️ Then, find the number of completed offers by the sign-up year.\n", "- ✔️ Store the summary DataFrame to `df_count_by_member_signup_year`.\n", "- ✔️ `df_count_by_member_signup_year` should have two non-index columns.\n", " - `df_count_by_member_signup_year.columns.to_list()` should print out `['member_since', 'count']`.\n", "- ✔️ Sort `df_count_by_member_signup_year` by `year` in ascending order.\n", "- ✔️ **You can modify `df_completed_offers` if you need to (e.g., convert a column to datetime format, extract year, etc).**\n", "\n", "\n", "#### 🔑 Expected Output\n", "\n", "Your index column may contain different values.\n", "\n", "| | member_since | count |\n", "|---:|---------------:|--------:|\n", "| 0 | 2013 | 561 |\n", "| 1 | 2014 | 1321 |\n", "| 2 | 2015 | 4319 |\n", "| 3 | 2016 | 8928 |\n", "| 4 | 2017 | 12119 |\n", "| 5 | 2018 | 5196 |" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_sincecount
02013561
120141321
220154319
320168928
4201712119
520185196
\n", "
" ], "text/plain": [ " member_since count\n", "0 2013 561\n", "1 2014 1321\n", "2 2015 4319\n", "3 2016 8928\n", "4 2017 12119\n", "5 2018 5196" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "### BEGIN SOLUTION\n", "df_completed_offers['became_member_on'] = pd.to_datetime(df_completed_offers['became_member_on'])\n", "df_completed_offers['member_since'] = df_completed_offers['became_member_on'].dt.year\n", "df_count_by_member_signup_year = df_completed_offers.groupby('member_since', as_index=False).agg({\n", " 'member_id': 'count'\n", "}).rename(columns={\n", " 'member_id': 'count'\n", "})\n", "### END SOLUTION\n", "\n", "df_count_by_member_signup_year" ] }, { "cell_type": "markdown", "metadata": { "id": "EdrK-mBsem7r" }, "source": [ "#### 🧭 Check Your Work\n", "\n", "- Once you're done, run the code cell below to test correctness.\n", "- ✔️ If the code cell runs without an error, you're good to move on.\n", "- ❌ If the code cell throws an error, go back and fix any incorrect parts." ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "nbgrader": { "grade": true, "grade_id": "challenge-15", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "df_check = pd.DataFrame({'member_since': {0: 2013, 1: 2014, 2: 2015, 3: 2016, 4: 2017, 5: 2018},\n", " 'count': {0: 561, 1: 1321, 2: 4319, 3: 8928, 4: 12119, 5: 5196}})\n", "\n", "pd.testing.assert_frame_equal(df_count_by_member_signup_year.reset_index(drop=True),\n", " df_check.reset_index(drop=True))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }