{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SQL Joins" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "## ✨ Joining Tables" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Today, we will review basic SQL joins." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ First, run the code cell below to import modules used for **🧭 Check Your Work** sections and the autograder." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import unittest\n", "import base64\n", "tc = unittest.TestCase()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Pre-exercise: Import Packages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 👇 Tasks\n", "\n", "- ✔️ Import the following Python packages.\n", " 1. `pandas`: Use alias `pd`.\n", " 2. `numpy`: Use alias `np`.\n", " 3. `sqlite3`: No alias" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# YOUR CODE BEGINS\n", "import pandas as pd\n", "import numpy as np\n", "import sqlite3\n", "# YOUR CODE ENDS" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check your work" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "nbgrader": { "grade": true, "grade_id": "pre-exercise", "locked": true, "points": "1", "solution": false } }, "outputs": [], "source": [ "import sys\n", "tc.assertTrue('pd' in globals(), 'Check whether you have correctly imported Pandas with an alias.')\n", "tc.assertTrue('np' in globals(), 'Check whether you have correctly imported NumPy with an alias.')\n", "tc.assertTrue('sqlite3' in globals(), 'Check whether you have correctly imported the sqlite3 package.')" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "locked": true, "solution": false } }, "source": [ "---\n", "\n", "### 📌 Transcripts\n", "\n", "Throughout this exercise, you'll work with the Starbucks Customer Rewards Program dataset. ☕ The dataset was originally released as a part of a capstone project of a Udacity course.\n", "\n", "For the purpose of this course, the original dataset in JSON format has been transformed to CSV files for convenience. You can download the original JSON files [here](https://www.kaggle.com/blacktile/starbucks-app-customer-reward-program-data?select=portfolio.json)." ] }, { "cell_type": "markdown", "metadata": { "id": "LI33A8-jem7m", "nbgrader": { "grade": false, "locked": true, "solution": false } }, "source": [ "The table below describes the columns in the `transactions` table.\n", "\n", "| Field | Description |\n", "|---|---|\n", "| member_id | Member ID |\n", "| event | Type of event ('offer received', 'offer completed', 'transaction') |\n", "| time | Time as continuous value |\n", "| offer_id | Offer ID |\n", "| amount | Amount of purchase or NaN if non-purchase |" ] }, { "cell_type": "markdown", "metadata": { "nbgrader": { "grade": false, "locked": true, "solution": false } }, "source": [ "---\n", "\n", "### 📌 Customer profiles\n", "\n", "We want to go further with the purchase data analysis by looking at the customers. Unfortunately, the information we currently have about the customers in each purchase is limited. We only have their unique IDs (e.g., `02c083884c7d45b39cc68e1314fec56c`, `9fa9ae8f57894cc9a3b8a9bbe0fc1b2f`).\n", "\n", "In the next few steps, you'll read the customer profiles data and link them to `df_transactions` using `member_id`." ] }, { "cell_type": "markdown", "metadata": { "id": "LI33A8-jem7m", "nbgrader": { "grade": false, "locked": true, "solution": false } }, "source": [ "The table below describes the columns in the `profiles` table.\n", "\n", "| Field | Description |\n", "|---|---|\n", "| member_id | Unique identifier for each Starbucks app member |\n", "| gender | Gender (NaN if unknown) |\n", "| age | Age (NaN if unknown) |\n", "| became_member_on | Sign-up date |\n", "| income | Annual income (NaN if unknown) |" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 📌 Populate database tables from a CSV file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run the code below to populate the `profiles` and `transcripts` tables." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "df_transactions = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/transcript.v2.csv.gz')\n", "df_profiles = pd.read_csv('https://github.com/bdi475/datasets/raw/main/starbucks-rewards/profile.csv')\n", "\n", "conn = sqlite3.connect('starbucks.db')\n", "c = conn.cursor()\n", "\n", "tables = list(pd.read_sql_query('SELECT * FROM sqlite_master WHERE type=\"table\";', con=conn)['tbl_name'])\n", "tables_to_populate = ['transactions', 'profiles']\n", "\n", "for t in tables_to_populate:\n", " if t in tables:\n", " c.execute(f'DELETE FROM {t}')\n", " conn.commit()\n", "\n", "df_transactions.to_sql(name='transactions', index=False, con=conn, if_exists='append')\n", "df_profiles.to_sql(name='profiles', index=False, con=conn, if_exists='append')\n", "\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check your work" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "conn_checker = sqlite3.connect('starbucks.db')\n", "tables_to_check = ['transactions', 'profiles']\n", "\n", "# Check if table exists\n", "for t in tables_to_check:\n", " user_tables = list(pd.read_sql_query('SELECT * FROM sqlite_master WHERE type=\"table\";', con=conn_checker)['tbl_name'])\n", " tc.assertTrue(t in user_tables, f'{t} does not exist in your starbucks.db file!')\n", "\n", "conn_checker.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "### 📌 Read Sqlite Database File" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run the code below to select the first 5 rows from the `transactions` table" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_ideventtimeoffer_idamount
078afa995795e4d85b5d9ceeca43f5fefoffer received09b98b8c7a33c4b65b9aebfe6a799e6d9None
1a03223e636434f42ac4c3df47e8bac43offer received00b1e1539f2cc45b7b9fa7c272da2e1d7None
2e2127556f4f64592b11af22de27a7932offer received02906b810c7d4411798c6938adc9daaa5None
38ec6ce2a7e7949b1bf142def7d0e0586offer received0fafdcd668e3743c1bb461111dcafc2a4None
468617ca6246f4fbc85e91a2a49552598offer received04d5c57ea9a6940dd891ad53e9dbe8da0None
\n", "
" ], "text/plain": [ " member_id event time \\\n", "0 78afa995795e4d85b5d9ceeca43f5fef offer received 0 \n", "1 a03223e636434f42ac4c3df47e8bac43 offer received 0 \n", "2 e2127556f4f64592b11af22de27a7932 offer received 0 \n", "3 8ec6ce2a7e7949b1bf142def7d0e0586 offer received 0 \n", "4 68617ca6246f4fbc85e91a2a49552598 offer received 0 \n", "\n", " offer_id amount \n", "0 9b98b8c7a33c4b65b9aebfe6a799e6d9 None \n", "1 0b1e1539f2cc45b7b9fa7c272da2e1d7 None \n", "2 2906b810c7d4411798c6938adc9daaa5 None \n", "3 fafdcd668e3743c1bb461111dcafc2a4 None \n", "4 4d5c57ea9a6940dd891ad53e9dbe8da0 None " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "conn = sqlite3.connect('starbucks.db')\n", "display(pd.read_sql_query('SELECT * FROM transactions LIMIT 5;', con=conn))\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "▶️ Run the code below to select the first 5 rows from the `profiles` table" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_idgenderagebecame_member_onincome
068be06ca386d4c31939f3a4f0e3dd783NoneNaN2017-02-12NaN
10610b486422d4921ae7d2bf64640c50bF55.02017-07-15112000.0
238fe809add3b4fcf9315a9694bb96ff5NoneNaN2018-07-12NaN
378afa995795e4d85b5d9ceeca43f5fefF75.02017-05-09100000.0
4a03223e636434f42ac4c3df47e8bac43NoneNaN2017-08-04NaN
\n", "
" ], "text/plain": [ " member_id gender age became_member_on income\n", "0 68be06ca386d4c31939f3a4f0e3dd783 None NaN 2017-02-12 NaN\n", "1 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 112000.0\n", "2 38fe809add3b4fcf9315a9694bb96ff5 None NaN 2018-07-12 NaN\n", "3 78afa995795e4d85b5d9ceeca43f5fef F 75.0 2017-05-09 100000.0\n", "4 a03223e636434f42ac4c3df47e8bac43 None NaN 2017-08-04 NaN" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "conn = sqlite3.connect('starbucks.db')\n", "display(pd.read_sql_query('SELECT * FROM profiles LIMIT 5;', con=conn))\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check your work" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "conn_checker = sqlite3.connect('starbucks.db')\n", "tables_to_check = ['transactions', 'profiles']\n", "\n", "# Check if table exists\n", "user_tables = list(pd.read_sql_query('SELECT * FROM sqlite_master WHERE type=\"table\";', con=conn_checker)['tbl_name'])\n", "\n", "for table_to_check in tables_to_check:\n", " tc.assertTrue(table_to_check in user_tables, f'{table_to_check} does not exist in your NWT.db file!')\n", "\n", "conn_checker.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 1: Join `profiles` into `transactions` table\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Write a query that joins the `profiles` table into `transactions`.\n", "- ✔️ Use an inner join.\n", "- ✔️ Select all columns.\n", "- ✔️ Store your query to a new variable named `query_joined1`." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_ideventtimeoffer_idamountmember_idgenderagebecame_member_onincome
078afa995795e4d85b5d9ceeca43f5fefoffer received09b98b8c7a33c4b65b9aebfe6a799e6d9NaN78afa995795e4d85b5d9ceeca43f5fefF75.02017-05-09100000.0
1a03223e636434f42ac4c3df47e8bac43offer received00b1e1539f2cc45b7b9fa7c272da2e1d7NaNa03223e636434f42ac4c3df47e8bac43NoneNaN2017-08-04NaN
2e2127556f4f64592b11af22de27a7932offer received02906b810c7d4411798c6938adc9daaa5NaNe2127556f4f64592b11af22de27a7932M68.02018-04-2670000.0
38ec6ce2a7e7949b1bf142def7d0e0586offer received0fafdcd668e3743c1bb461111dcafc2a4NaN8ec6ce2a7e7949b1bf142def7d0e0586NoneNaN2017-09-25NaN
468617ca6246f4fbc85e91a2a49552598offer received04d5c57ea9a6940dd891ad53e9dbe8da0NaN68617ca6246f4fbc85e91a2a49552598NoneNaN2017-10-02NaN
.................................
306529b3a1272bc9904337b331bf348c3e8c17transaction714None1.59b3a1272bc9904337b331bf348c3e8c17M66.02018-01-0147000.0
30653068213b08d99a4ae1b0dcb72aebd9aa35transaction714None9.5368213b08d99a4ae1b0dcb72aebd9aa35M52.02018-04-0862000.0
306531a00058cf10334a308c68e7631c529907transaction714None3.61a00058cf10334a308c68e7631c529907F63.02013-09-2252000.0
30653276ddbd6576844afe811f1a3c0fbb5bectransaction714None3.5376ddbd6576844afe811f1a3c0fbb5becM57.02016-07-0940000.0
306533c02b10e8752c4d8e9b73f918558531f7transaction714None4.05c02b10e8752c4d8e9b73f918558531f7NoneNaN2015-12-11NaN
\n", "

306534 rows × 10 columns

\n", "
" ], "text/plain": [ " member_id event time \\\n", "0 78afa995795e4d85b5d9ceeca43f5fef offer received 0 \n", "1 a03223e636434f42ac4c3df47e8bac43 offer received 0 \n", "2 e2127556f4f64592b11af22de27a7932 offer received 0 \n", "3 8ec6ce2a7e7949b1bf142def7d0e0586 offer received 0 \n", "4 68617ca6246f4fbc85e91a2a49552598 offer received 0 \n", "... ... ... ... \n", "306529 b3a1272bc9904337b331bf348c3e8c17 transaction 714 \n", "306530 68213b08d99a4ae1b0dcb72aebd9aa35 transaction 714 \n", "306531 a00058cf10334a308c68e7631c529907 transaction 714 \n", "306532 76ddbd6576844afe811f1a3c0fbb5bec transaction 714 \n", "306533 c02b10e8752c4d8e9b73f918558531f7 transaction 714 \n", "\n", " offer_id amount \\\n", "0 9b98b8c7a33c4b65b9aebfe6a799e6d9 NaN \n", "1 0b1e1539f2cc45b7b9fa7c272da2e1d7 NaN \n", "2 2906b810c7d4411798c6938adc9daaa5 NaN \n", "3 fafdcd668e3743c1bb461111dcafc2a4 NaN \n", "4 4d5c57ea9a6940dd891ad53e9dbe8da0 NaN \n", "... ... ... \n", "306529 None 1.59 \n", "306530 None 9.53 \n", "306531 None 3.61 \n", "306532 None 3.53 \n", "306533 None 4.05 \n", "\n", " member_id gender age became_member_on \\\n", "0 78afa995795e4d85b5d9ceeca43f5fef F 75.0 2017-05-09 \n", "1 a03223e636434f42ac4c3df47e8bac43 None NaN 2017-08-04 \n", "2 e2127556f4f64592b11af22de27a7932 M 68.0 2018-04-26 \n", "3 8ec6ce2a7e7949b1bf142def7d0e0586 None NaN 2017-09-25 \n", "4 68617ca6246f4fbc85e91a2a49552598 None NaN 2017-10-02 \n", "... ... ... ... ... \n", "306529 b3a1272bc9904337b331bf348c3e8c17 M 66.0 2018-01-01 \n", "306530 68213b08d99a4ae1b0dcb72aebd9aa35 M 52.0 2018-04-08 \n", "306531 a00058cf10334a308c68e7631c529907 F 63.0 2013-09-22 \n", "306532 76ddbd6576844afe811f1a3c0fbb5bec M 57.0 2016-07-09 \n", "306533 c02b10e8752c4d8e9b73f918558531f7 None NaN 2015-12-11 \n", "\n", " income \n", "0 100000.0 \n", "1 NaN \n", "2 70000.0 \n", "3 NaN \n", "4 NaN \n", "... ... \n", "306529 47000.0 \n", "306530 62000.0 \n", "306531 52000.0 \n", "306532 40000.0 \n", "306533 NaN \n", "\n", "[306534 rows x 10 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# YOUR CODE BEGINS\n", "query_joined1 = '''\n", "SELECT *\n", "FROM transactions\n", "INNER JOIN profiles\n", "ON transactions.member_id == profiles.member_id;\n", "'''\n", "# YOUR CODE ENDS\n", "\n", "conn = sqlite3.connect('starbucks.db')\n", "df_result = pd.read_sql_query(query_joined1, con=conn)\n", "display(df_result)\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check your work" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "3", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "conn = sqlite3.connect('starbucks.db')\n", "df_check = pd.read_sql_query(query_joined1, con=conn)\n", "tc.assertEqual(df_result.shape, (306534, 10), 'Incorrect number of rows and/or columns')\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 2: Join `profiles` into `transactions` table where gender is not missing\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Write a query that joins the `profiles` table into `transactions`.\n", "- ✔️ Only keep rows where the `gender` column's value in the `profiles` table is NOT NULL.\n", "- ✔️ Use an inner join.\n", "- ✔️ Select all columns.\n", "- ✔️ Store your query to a new variable named `query_joined2`." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
member_ideventtimeoffer_idamountmember_idgenderagebecame_member_onincome
00610b486422d4921ae7d2bf64640c50boffer completed5289b98b8c7a33c4b65b9aebfe6a799e6d9NaN0610b486422d4921ae7d2bf64640c50bF55.02017-07-15112000.0
10610b486422d4921ae7d2bf64640c50boffer received4089b98b8c7a33c4b65b9aebfe6a799e6d9NaN0610b486422d4921ae7d2bf64640c50bF55.02017-07-15112000.0
20610b486422d4921ae7d2bf64640c50boffer received5043f207df678b143eea3cee63160fa8bedNaN0610b486422d4921ae7d2bf64640c50bF55.02017-07-15112000.0
30610b486422d4921ae7d2bf64640c50btransaction18None21.510610b486422d4921ae7d2bf64640c50bF55.02017-07-15112000.0
40610b486422d4921ae7d2bf64640c50btransaction144None32.280610b486422d4921ae7d2bf64640c50bF55.02017-07-15112000.0
.................................
272757e4052622e5ba45a8b96b59aba68cf068transaction84None25.19e4052622e5ba45a8b96b59aba68cf068F62.02017-07-2282000.0
272758e4052622e5ba45a8b96b59aba68cf068transaction96None21.53e4052622e5ba45a8b96b59aba68cf068F62.02017-07-2282000.0
272759e4052622e5ba45a8b96b59aba68cf068transaction480None30.57e4052622e5ba45a8b96b59aba68cf068F62.02017-07-2282000.0
272760e4052622e5ba45a8b96b59aba68cf068transaction486None19.47e4052622e5ba45a8b96b59aba68cf068F62.02017-07-2282000.0
272761e4052622e5ba45a8b96b59aba68cf068transaction690None24.71e4052622e5ba45a8b96b59aba68cf068F62.02017-07-2282000.0
\n", "

272762 rows × 10 columns

\n", "
" ], "text/plain": [ " member_id event time \\\n", "0 0610b486422d4921ae7d2bf64640c50b offer completed 528 \n", "1 0610b486422d4921ae7d2bf64640c50b offer received 408 \n", "2 0610b486422d4921ae7d2bf64640c50b offer received 504 \n", "3 0610b486422d4921ae7d2bf64640c50b transaction 18 \n", "4 0610b486422d4921ae7d2bf64640c50b transaction 144 \n", "... ... ... ... \n", "272757 e4052622e5ba45a8b96b59aba68cf068 transaction 84 \n", "272758 e4052622e5ba45a8b96b59aba68cf068 transaction 96 \n", "272759 e4052622e5ba45a8b96b59aba68cf068 transaction 480 \n", "272760 e4052622e5ba45a8b96b59aba68cf068 transaction 486 \n", "272761 e4052622e5ba45a8b96b59aba68cf068 transaction 690 \n", "\n", " offer_id amount \\\n", "0 9b98b8c7a33c4b65b9aebfe6a799e6d9 NaN \n", "1 9b98b8c7a33c4b65b9aebfe6a799e6d9 NaN \n", "2 3f207df678b143eea3cee63160fa8bed NaN \n", "3 None 21.51 \n", "4 None 32.28 \n", "... ... ... \n", "272757 None 25.19 \n", "272758 None 21.53 \n", "272759 None 30.57 \n", "272760 None 19.47 \n", "272761 None 24.71 \n", "\n", " member_id gender age became_member_on \\\n", "0 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 \n", "1 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 \n", "2 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 \n", "3 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 \n", "4 0610b486422d4921ae7d2bf64640c50b F 55.0 2017-07-15 \n", "... ... ... ... ... \n", "272757 e4052622e5ba45a8b96b59aba68cf068 F 62.0 2017-07-22 \n", "272758 e4052622e5ba45a8b96b59aba68cf068 F 62.0 2017-07-22 \n", "272759 e4052622e5ba45a8b96b59aba68cf068 F 62.0 2017-07-22 \n", "272760 e4052622e5ba45a8b96b59aba68cf068 F 62.0 2017-07-22 \n", "272761 e4052622e5ba45a8b96b59aba68cf068 F 62.0 2017-07-22 \n", "\n", " income \n", "0 112000.0 \n", "1 112000.0 \n", "2 112000.0 \n", "3 112000.0 \n", "4 112000.0 \n", "... ... \n", "272757 82000.0 \n", "272758 82000.0 \n", "272759 82000.0 \n", "272760 82000.0 \n", "272761 82000.0 \n", "\n", "[272762 rows x 10 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# YOUR CODE BEGINS\n", "query_joined2 = '''\n", "SELECT *\n", "FROM transactions\n", "INNER JOIN profiles\n", "ON transactions.member_id == profiles.member_id\n", "WHERE profiles.gender IS NOT NULL;\n", "'''\n", "# YOUR CODE ENDS\n", "\n", "conn = sqlite3.connect('starbucks.db')\n", "df_result = pd.read_sql_query(query_joined2, con=conn)\n", "display(df_result)\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check your work" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "3", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "conn = sqlite3.connect('starbucks.db')\n", "df_check = pd.read_sql_query(query_joined2, con=conn)\n", "tc.assertEqual(df_result.shape[0], 272762, 'Incorrect number of rows and/or columns')\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### 🎯 Exercise 3: Average transaction amount by gender\n", "\n", "#### 👇 Tasks\n", "\n", "- ✔️ Write a query that finds the average transaction amount by gender.\n", "- ✔️ Only display rows with a non-missing `gender` value.\n", "- ✔️ Only display the following two columns:\n", " - `gender`\n", " - `average_amount`\n", "- ✔️ Store your query to a new variable named `query_joined3`.\n", "- ✔️ Sort the result by `average_amount` in descending order.\n", "\n", "\n", "#### 🧭 Expected Output\n", "\n", "| | gender | average_amount |\n", "|---:|:---------|-----------------:|\n", "| 0 | F | 17.4901 |\n", "| 1 | O | 14.7987 |\n", "| 2 | M | 11.6066 |" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
genderaverage_amount
0F17.490077
1O14.798731
2M11.606600
\n", "
" ], "text/plain": [ " gender average_amount\n", "0 F 17.490077\n", "1 O 14.798731\n", "2 M 11.606600" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# YOUR CODE BEGINS\n", "query_joined3 = '''\n", "SELECT gender, AVG(amount) AS average_amount\n", "FROM transactions\n", "INNER JOIN profiles\n", "ON transactions.member_id == profiles.member_id\n", "GROUP BY profiles.gender\n", "HAVING gender IS NOT NULL\n", "ORDER BY average_amount DESC;\n", "'''\n", "# YOUR CODE ENDS\n", "\n", "conn = sqlite3.connect('starbucks.db')\n", "df_result = pd.read_sql_query(query_joined3, con=conn)\n", "display(df_result)\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 🧭 Check your work" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "nbgrader": { "grade": true, "grade_id": "exercise-01", "locked": true, "points": "3", "solution": false } }, "outputs": [], "source": [ "# DO NOT CHANGE THE CODE IN THIS CELL\n", "conn = sqlite3.connect('starbucks.db')\n", "decoded_query = base64.b64decode(b'ClNFTEVDVCBnZW5kZXIsIEFWRyhhbW91\\\n", "bnQpIEFTIGF2ZXJhZ2VfYW1vdW50CkZST00gdHJhbnNhY3Rpb25zCklOTkVSIEpPSU4\\\n", "gcHJvZmlsZXMKT04gdHJhbnNhY3Rpb25zLm1lbWJlcl9pZCA9PSBwcm9maWxlcy5tZW\\\n", "1iZXJfaWQKR1JPVVAgQlkgcHJvZmlsZXMuZ2VuZGVyCkhBVklORyBnZW5kZXIgSVMgT\\\n", "k9UIE5VTEwKT1JERVIgQlkgYXZlcmFnZV9hbW91bnQgREVTQzsK').decode()\n", "df_check = pd.read_sql_query(decoded_query, con=conn)\n", "pd.testing.assert_frame_equal(df_result.reset_index(drop=True),\n", " df_check.reset_index(drop=True))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 4 }