{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Answering Business Questions using SQL\n", "\n", "## About the database\n", "We'll be working with a modified version of a database called Chinook. The Chinook database contains information about a fictional digital music shop - kind of like a mini-iTunes store.\n", "\n", "The Chinook database contains information about the artists, songs, and albums from the music shop, as well as information on the shop's employees, customers, and the customers purchases. This information is contained in eleven tables.\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use the following code to connect our Jupyter Notebook to our database file:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "%load_ext sql\n", "%sql sqlite:///chinook.db" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To run SQL queries in Jupyter Notebook, we have to add %%sql on its own line to the start of our query.\n", "\n", "Let's start by getting familiar with our data. " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nametype
albumtable
artisttable
customertable
employeetable
genretable
invoicetable
invoice_linetable
media_typetable
playlisttable
playlist_tracktable
tracktable
country_or_otherview
" ], "text/plain": [ "[('album', 'table'),\n", " ('artist', 'table'),\n", " ('customer', 'table'),\n", " ('employee', 'table'),\n", " ('genre', 'table'),\n", " ('invoice', 'table'),\n", " ('invoice_line', 'table'),\n", " ('media_type', 'table'),\n", " ('playlist', 'table'),\n", " ('playlist_track', 'table'),\n", " ('track', 'table'),\n", " ('country_or_other', 'view')]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT \n", " name,\n", " type\n", "FROM sqlite_master\n", "WHERE type IN (\"table\", \"view\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting albums to purchase\n", "The Chinook record store has just signed a deal with a new record label, and you've been tasked with selecting the first three albums that will be added to the store, from a list of four. All four albums are by artists that don't have any tracks in the store right now - we have the artist names, and the genre of music they produce:\n", "\n", " Artist Name : Genre\n", " Regal : Hip-Hop\n", " Red Tone : Punk\n", " Meteor and the Girls : Pop\n", " Slim Jim Bites : Blues\n", "\n", "The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To do this, we need track, genre, invoice_line, invoice, and customer tables (refer schema).\n", "We first find out the most popular genre across all countries and then limit the result using country = 'USA'." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
genre_nametracks_sold
Rock2635
Metal619
Alternative & Punk492
Latin167
R&B/Soul159
Blues124
Jazz121
Alternative117
Easy Listening74
Pop63
Electronica/Dance55
Classical47
Reggae35
Hip Hop/Rap33
Heavy Metal8
Soundtrack5
TV Shows2
Drama1
" ], "text/plain": [ "[('Rock', 2635),\n", " ('Metal', 619),\n", " ('Alternative & Punk', 492),\n", " ('Latin', 167),\n", " ('R&B/Soul', 159),\n", " ('Blues', 124),\n", " ('Jazz', 121),\n", " ('Alternative', 117),\n", " ('Easy Listening', 74),\n", " ('Pop', 63),\n", " ('Electronica/Dance', 55),\n", " ('Classical', 47),\n", " ('Reggae', 35),\n", " ('Hip Hop/Rap', 33),\n", " ('Heavy Metal', 8),\n", " ('Soundtrack', 5),\n", " ('TV Shows', 2),\n", " ('Drama', 1)]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT \n", " g.name genre_name,\n", " SUM(il.quantity) tracks_sold\n", "FROM track t\n", "INNER JOIN invoice_line il ON il.track_id = t.track_id\n", "INNER JOIN genre g ON g.genre_id = t.genre_id\n", "GROUP BY 1\n", "ORDER BY 2 DESC;" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
genre_nametracks_sold
Rock561
Alternative & Punk130
Metal124
R&B/Soul53
Blues36
Alternative35
Pop22
Latin22
Hip Hop/Rap20
Jazz14
Easy Listening13
Reggae6
Electronica/Dance5
Classical4
Heavy Metal3
Soundtrack2
TV Shows1
" ], "text/plain": [ "[('Rock', 561),\n", " ('Alternative & Punk', 130),\n", " ('Metal', 124),\n", " ('R&B/Soul', 53),\n", " ('Blues', 36),\n", " ('Alternative', 35),\n", " ('Pop', 22),\n", " ('Latin', 22),\n", " ('Hip Hop/Rap', 20),\n", " ('Jazz', 14),\n", " ('Easy Listening', 13),\n", " ('Reggae', 6),\n", " ('Electronica/Dance', 5),\n", " ('Classical', 4),\n", " ('Heavy Metal', 3),\n", " ('Soundtrack', 2),\n", " ('TV Shows', 1)]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT \n", " g.name genre_name,\n", " SUM(il.quantity) tracks_sold\n", "FROM track t\n", "INNER JOIN invoice_line il ON il.track_id = t.track_id\n", "INNER JOIN genre g ON g.genre_id = t.genre_id\n", "INNER JOIN invoice i ON i.invoice_id = il.invoice_id\n", "INNER JOIN customer c ON c.customer_id = i.customer_id\n", "WHERE c.country = \"USA\"\n", "GROUP BY 1\n", "ORDER BY 2 DESC;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on these results Red Tone, Slim Jim Bites, Meteor and the Girls should be purchased for the store.\n", "Also note that Rock seems to be the most popular genre, so we should look out for more albums of the same to increase sales." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Analyzing employee sales performance\n", "Each customer for the Chinook store gets assigned to a sales support agent within the company when they first make a purchase. You have been asked to analyze the purchases of customers belonging to each employee to see if any sales support agent is performing either better or worse than the others." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To do this, we need employee, customer, and invoice tables (refer schema). We would be analyzing performance based on the total dollar amount.\n" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
employee_nametotal_dollar_amount
Jane Peacock1731.51
Margaret Park1584.0
Steve Johnson1393.92
" ], "text/plain": [ "[('Jane Peacock', 1731.51),\n", " ('Margaret Park', 1584.0),\n", " ('Steve Johnson', 1393.92)]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT\n", " e.first_name||' '||e.last_name employee_name,\n", " ROUND(SUM(i.total),2) total_dollar_amount\n", "FROM employee e\n", "INNER JOIN customer c ON c.support_rep_id = e.employee_id\n", "INNER JOIN invoice i ON c.customer_id = i.customer_id\n", "WHERE e.title = \"Sales Support Agent\"\n", "GROUP BY 1\n", "ORDER BY 2 DESC;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We check the employee table to see if we can find out any reason for the observed results." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
employee_idlast_namefirst_nametitlereports_tobirthdatehire_dateaddresscitystatecountrypostal_codephonefaxemail
3PeacockJaneSales Support Agent21973-08-29 00:00:002017-04-01 00:00:001111 6 Ave SWCalgaryABCanadaT2P 5M5+1 (403) 262-3443+1 (403) 262-6712jane@chinookcorp.com
4ParkMargaretSales Support Agent21947-09-19 00:00:002017-05-03 00:00:00683 10 Street SWCalgaryABCanadaT2P 5G3+1 (403) 263-4423+1 (403) 263-4289margaret@chinookcorp.com
5JohnsonSteveSales Support Agent21965-03-03 00:00:002017-10-17 00:00:007727B 41 AveCalgaryABCanadaT3B 1Y71 (780) 836-99871 (780) 836-9543steve@chinookcorp.com
" ], "text/plain": [ "[(3, 'Peacock', 'Jane', 'Sales Support Agent', 2, '1973-08-29 00:00:00', '2017-04-01 00:00:00', '1111 6 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 5M5', '+1 (403) 262-3443', '+1 (403) 262-6712', 'jane@chinookcorp.com'),\n", " (4, 'Park', 'Margaret', 'Sales Support Agent', 2, '1947-09-19 00:00:00', '2017-05-03 00:00:00', '683 10 Street SW', 'Calgary', 'AB', 'Canada', 'T2P 5G3', '+1 (403) 263-4423', '+1 (403) 263-4289', 'margaret@chinookcorp.com'),\n", " (5, 'Johnson', 'Steve', 'Sales Support Agent', 2, '1965-03-03 00:00:00', '2017-10-17 00:00:00', '7727B 41 Ave', 'Calgary', 'AB', 'Canada', 'T3B 1Y7', '1 (780) 836-9987', '1 (780) 836-9543', 'steve@chinookcorp.com')]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT *\n", "FROM employee\n", "WHERE title LIKE \"%support%\";" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
employee_nametotal_dollar_amounthire_datebirthdate
Jane Peacock1731.512017-04-01 00:00:001973-08-29 00:00:00
Margaret Park1584.02017-05-03 00:00:001947-09-19 00:00:00
Steve Johnson1393.922017-10-17 00:00:001965-03-03 00:00:00
" ], "text/plain": [ "[('Jane Peacock', 1731.51, '2017-04-01 00:00:00', '1973-08-29 00:00:00'),\n", " ('Margaret Park', 1584.0, '2017-05-03 00:00:00', '1947-09-19 00:00:00'),\n", " ('Steve Johnson', 1393.92, '2017-10-17 00:00:00', '1965-03-03 00:00:00')]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "WITH support_rep_sales AS\n", " (\n", " SELECT\n", " e.employee_id,\n", " e.first_name||' '||e.last_name employee_name,\n", " ROUND(SUM(i.total),2) total_dollar_amount\n", " FROM employee e\n", " INNER JOIN customer c ON c.support_rep_id = e.employee_id\n", " INNER JOIN invoice i ON c.customer_id = i.customer_id\n", " WHERE e.title = \"Sales Support Agent\"\n", " GROUP BY 1\n", " )\n", "SELECT \n", " srs.employee_name,\n", " srs.total_dollar_amount,\n", " e.hire_date,\n", " e.birthdate\n", "FROM employee e\n", "INNER JOIN support_rep_sales srs ON e.employee_id = srs.employee_id\n", "ORDER BY 2 DESC;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The store has 3 Sales Support Agents, all of them from Canada. All agents seem to have the same amount of experience with the company as seen in their hiring dates. The difference is sales roughly corresponds with the difference in hiring dates. The agent with the highest sales is the youngest, which could possibly reflect in the ability to connect with a younger customer market." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Analyzing sales by country\n", "Your next task is to analyze the sales data for customers from each different country. You have been given guidance to use the country value from the customers table, and ignore the country from the billing address in the invoice table.\n", "\n", "In particular, you have been directed to calculate data, for each country, on the:\n", "\n", "- total number of customers\n", "- total value of sales\n", "- average value of sales per customer\n", "- average order value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To do this, we need customer and invoice tables (refer schema). We will create a temporary table that groups all countries with 1 customer as 'Other'. " ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countryTotal_CustomersTotal_Orders
Argentina15
Australia110
Austria19
Belgium17
Brazil561
Canada876
Chile113
Czech Republic230
Denmark110
Finland111
France550
Germany441
Hungary110
India221
Ireland113
Italy19
Netherlands110
Norway19
Poland110
Portugal229
Spain111
Sweden110
USA13131
United Kingdom328
" ], "text/plain": [ "[('Argentina', 1, 5),\n", " ('Australia', 1, 10),\n", " ('Austria', 1, 9),\n", " ('Belgium', 1, 7),\n", " ('Brazil', 5, 61),\n", " ('Canada', 8, 76),\n", " ('Chile', 1, 13),\n", " ('Czech Republic', 2, 30),\n", " ('Denmark', 1, 10),\n", " ('Finland', 1, 11),\n", " ('France', 5, 50),\n", " ('Germany', 4, 41),\n", " ('Hungary', 1, 10),\n", " ('India', 2, 21),\n", " ('Ireland', 1, 13),\n", " ('Italy', 1, 9),\n", " ('Netherlands', 1, 10),\n", " ('Norway', 1, 9),\n", " ('Poland', 1, 10),\n", " ('Portugal', 2, 29),\n", " ('Spain', 1, 11),\n", " ('Sweden', 1, 10),\n", " ('USA', 13, 131),\n", " ('United Kingdom', 3, 28)]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT\n", " c.country,\n", " COUNT(DISTINCT(c.customer_id)) AS Total_Customers,\n", " COUNT(i.invoice_id) AS Total_Orders\n", "FROM invoice i\n", "INNER JOIN customer c ON c.customer_id = i.customer_id\n", "GROUP BY 1;" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "CREATE VIEW country_or_other AS\n", " SELECT \n", " CASE \n", " WHEN (\n", " SELECT COUNT(*)\n", " FROM customer\n", " WHERE country = c.country\n", " ) = 1 THEN \"Other\"\n", " ELSE c.country\n", " END country,\n", " COUNT(DISTINCT(c.customer_id)) total_customers,\n", " COUNT(i.invoice_id) total_orders,\n", " ROUND(SUM(i.total),2) total_sales\n", " FROM invoice i\n", " INNER JOIN customer c ON c.customer_id = i.customer_id\n", " GROUP BY 1;\n", " " ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrytotal_customerstotal_orderstotal_sales
Brazil561427.68
Canada876535.59
Czech Republic230273.24
France550389.07
Germany441334.62
India221183.15
Other151471094.94
Portugal229185.13
USA131311040.49
United Kingdom328245.52
" ], "text/plain": [ "[('Brazil', 5, 61, 427.68),\n", " ('Canada', 8, 76, 535.59),\n", " ('Czech Republic', 2, 30, 273.24),\n", " ('France', 5, 50, 389.07),\n", " ('Germany', 4, 41, 334.62),\n", " ('India', 2, 21, 183.15),\n", " ('Other', 15, 147, 1094.94),\n", " ('Portugal', 2, 29, 185.13),\n", " ('USA', 13, 131, 1040.49),\n", " ('United Kingdom', 3, 28, 245.52)]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT * FROM country_or_other;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We add another piece of code to ensure that 'Other' is always at the end of the result." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrytotal_customerstotal_orderstotal_salesavg_sales_per_customeravg_order_value
USA131311040.4980.047.94
Canada876535.5966.957.05
Brazil561427.6885.547.01
France550389.0777.817.78
Germany441334.6283.668.16
United Kingdom328245.5281.848.77
Czech Republic230273.24136.629.11
India221183.1591.588.72
Portugal229185.1392.576.38
Other151471094.9473.07.45
" ], "text/plain": [ "[('USA', 13, 131, 1040.49, 80.04, 7.94),\n", " ('Canada', 8, 76, 535.59, 66.95, 7.05),\n", " ('Brazil', 5, 61, 427.68, 85.54, 7.01),\n", " ('France', 5, 50, 389.07, 77.81, 7.78),\n", " ('Germany', 4, 41, 334.62, 83.66, 8.16),\n", " ('United Kingdom', 3, 28, 245.52, 81.84, 8.77),\n", " ('Czech Republic', 2, 30, 273.24, 136.62, 9.11),\n", " ('India', 2, 21, 183.15, 91.58, 8.72),\n", " ('Portugal', 2, 29, 185.13, 92.57, 6.38),\n", " ('Other', 15, 147, 1094.94, 73.0, 7.45)]" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT \n", " country,\n", " total_customers,\n", " total_orders,\n", " total_sales,\n", " avg_sales_per_customer,\n", " avg_order_value\n", "FROM \n", " (\n", " SELECT \n", " co.*,\n", " ROUND(CAST(total_sales AS FLOAT)/total_customers,2) avg_sales_per_customer,\n", " ROUND(CAST(total_sales AS FLOAT)/total_orders,2) avg_order_value,\n", " CASE\n", " WHEN co.country = \"Other\" THEN 1\n", " ELSE 0\n", " END AS sort\n", " FROM country_or_other co\n", " GROUP BY country\n", " ORDER BY sort ASC, total_customers DESC\n", " );" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can observe that, USA is Chinook's largest market, followed by Canada and Brazil. However, we see a clear difference between average sales in all of its countries (around 80 - 90 USD per customer on average) and average sales in Czech Republic of 136 USD per customer. This indicates that in this country average revenue per customer is significantly higher and targeted marketing could potentially be more profitable per customer compared to its other markets. Average order value tends to be approximately 8 USD per order; again, we notice that in the Czech Republic this value is slightly higher than in other countries at 9 USD per order." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Albums vs individual tracks\n", "The Chinook store is setup in a way that allows customer to make purchases in one of the two ways:\n", "\n", "- purchase a whole album\n", "- purchase a collection of one or more individual tracks.\n", "\n", "The store does not let customers purchase a whole album, and then add individual tracks to that same purchase (unless they do that by choosing each track manually). When customers purchase albums they are charged the same price as if they had purchased each of those tracks separately.\n", "\n", "Management are currently considering changing their purchasing strategy to save money. The strategy they are considering is to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.\n", "\n", "We have been asked to find out what percentage of purchases are individual tracks vs whole albums, so that management can use this data to understand the effect this decision might have on overall revenue.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ " In this instance, we have two edge cases to consider:\n", "\n", "- Albums that have only one or two tracks are likely to be purchased by customers as part of a collection of individual tracks.\n", "- Customers may decide to manually select every track from an album, and then add a few individual tracks from other albums to their purchase.\n", "\n", "In the first case, since our analysis is concerned with maximizing revenue we can safely ignore albums consisting of only a few tracks. The company has previously done analysis to confirm that the second case does not happen often, so we can ignore this case also." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To do this, we need the invoice_line, album, and track tables (refer schema).\n", "We are going to check whether each invoice has all the tracks from an album." ] }, { "cell_type": "code", "execution_count": 64, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
invoice_line_idinvoice_idtrack_idunit_pricequantity
1111580.991
2111590.991
3111600.991
4111610.991
5111620.991
6111630.991
7111640.991
8111650.991
9111660.991
10111670.991
" ], "text/plain": [ "[(1, 1, 1158, 0.99, 1),\n", " (2, 1, 1159, 0.99, 1),\n", " (3, 1, 1160, 0.99, 1),\n", " (4, 1, 1161, 0.99, 1),\n", " (5, 1, 1162, 0.99, 1),\n", " (6, 1, 1163, 0.99, 1),\n", " (7, 1, 1164, 0.99, 1),\n", " (8, 1, 1165, 0.99, 1),\n", " (9, 1, 1166, 0.99, 1),\n", " (10, 1, 1167, 0.99, 1)]" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT * FROM invoice_line\n", "LIMIT 10;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One invoice_id is mapped to several track_id.\n", "\n", "We create temporary tables- album_tracks that has album_id and track_id from the tracks included in that album, and invoice_tracks_albums that has invoice_id, track_id and album_id for the tracks that are sold. We then proceed to do something as shown below with these two tables in invoice_full_dictionary. Lastly in invoice_album we have the invoice_id of an order and whether it has a full album or not.\n", "" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
total_invoicestotal_full_albumperc_full_album
61411418.57
" ], "text/plain": [ "[(614, 114, 18.57)]" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "WITH album_tracks AS\n", " (\n", " SELECT \n", " a.album_id,\n", " t.track_id\n", " FROM album a\n", " INNER JOIN track t ON a.album_id = t.album_id\n", " ),\n", " \n", " invoice_tracks_album AS\n", " (\n", " SELECT \n", " i.invoice_id,\n", " at.album_id,\n", " at.track_id\n", " FROM invoice_line i\n", " INNER JOIN album_tracks at ON at.track_id = i.track_id \n", " ),\n", " \n", " invoice_full_dictionary AS\n", " (\n", " SELECT \n", " ita.invoice_id,\n", " CASE \n", " WHEN\n", " (\n", " SELECT track_id \n", " FROM (\n", " SELECT itain.track_id track_id\n", " FROM invoice_tracks_album itain\n", " WHERE itain.invoice_id = ita.invoice_id\n", " )\n", " EXCEPT\n", " SELECT track_id\n", " FROM (\n", " SELECT at.track_id track_id\n", " FROM album_tracks at\n", " WHERE at.album_id = ita.album_id\n", " )\n", " ) IS NULL\n", " AND\n", " (\n", " SELECT track_id \n", " FROM (\n", " SELECT at.track_id track_id\n", " FROM album_tracks at\n", " WHERE at.album_id = ita.album_id\n", " )\n", " EXCEPT\n", " SELECT track_id \n", " FROM (\n", " SELECT itain.track_id track_id\n", " FROM invoice_tracks_album itain\n", " WHERE itain.invoice_id = ita.invoice_id\n", " )\n", " ) IS NULL\n", " THEN 1\n", " ELSE 0\n", " END full_album\n", " FROM invoice_tracks_album ita\n", " ),\n", " \n", " invoice_album AS\n", " (\n", " SELECT \n", " invoice_id,\n", " MAX(full_album) full_album\n", " FROM invoice_full_dictionary\n", " GROUP BY 1\n", " )\n", " \n", "SELECT\n", " COUNT(invoice_id) total_invoices,\n", " SUM(full_album) total_full_album,\n", " ROUND(CAST(SUM(full_album) AS FLOAT)/COUNT(invoice_id) * 100, 2) AS perc_full_album\n", "FROM invoice_album;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can note that 18.5% of invoices are related to the purchase of full albums. Rest 80 percent purchases are made by customers seeking specific tracks. The company should change their strategy to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Popular artists in playlists vs sales\n", "We try to find out which artists are used in most of the playlists.\n", "To do this, we need playlist_track, track, album, and artist tables (refer schema). " ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
name
Music
Movies
TV Shows
Audiobooks
90’s Music
Music Videos
Brazilian Music
Classical
Classical 101 - Deep Cuts
Classical 101 - Next Steps
Classical 101 - The Basics
Grunge
Heavy Metal Classic
On-The-Go 1
" ], "text/plain": [ "[('Music',),\n", " ('Movies',),\n", " ('TV Shows',),\n", " ('Audiobooks',),\n", " ('90’s Music',),\n", " ('Music Videos',),\n", " ('Brazilian Music',),\n", " ('Classical',),\n", " ('Classical 101 - Deep Cuts',),\n", " ('Classical 101 - Next Steps',),\n", " ('Classical 101 - The Basics',),\n", " ('Grunge',),\n", " ('Heavy Metal Classic',),\n", " ('On-The-Go 1',)]" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT DISTINCT name\n", "FROM playlist;" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
artist_nameno_of_playlists
Eugene Ormandy7
English Concert & Trevor Pinnock6
Academy of St. Martin in the Fields & Sir Neville Marriner6
The King's Singers6
Berliner Philharmoniker & Herbert Von Karajan6
Alberto Turco & Nova Schola Gregoriana5
Richard Marlow & The Choir of Trinity College, Cambridge5
Wilhelm Kempff5
Yo-Yo Ma5
Scholars Baroque Ensemble5
" ], "text/plain": [ "[('Eugene Ormandy', 7),\n", " ('English Concert & Trevor Pinnock', 6),\n", " ('Academy of St. Martin in the Fields & Sir Neville Marriner', 6),\n", " (\"The King's Singers\", 6),\n", " ('Berliner Philharmoniker & Herbert Von Karajan', 6),\n", " ('Alberto Turco & Nova Schola Gregoriana', 5),\n", " ('Richard Marlow & The Choir of Trinity College, Cambridge', 5),\n", " ('Wilhelm Kempff', 5),\n", " ('Yo-Yo Ma', 5),\n", " ('Scholars Baroque Ensemble', 5)]" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "WITH pop_album AS \n", " (\n", " SELECT \n", " al.artist_id artist_id,\n", " COUNT(DISTINCT playlist_id) no_of_playlists\n", " FROM album al \n", " INNER JOIN track t ON al.album_id = t.album_id\n", " INNER JOIN playlist_track pt ON t.track_id = pt.track_id\n", " GROUP BY 1\n", " )\n", "SELECT \n", " ar.name artist_name,\n", " pa.no_of_playlists\n", "FROM artist ar\n", "INNER JOIN pop_album pa ON pa.artist_id = ar.artist_id\n", "ORDER BY 2 DESC\n", "LIMIT 10;\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Eugene Ormandy is the most popular artist in playlists. The different kinds of playlists represent the versatility in the kind of music produced by an artist.\n", "\n", "Let's see if the most popular artist in playlists is the best selling artist for Chinook. \n", "To do this, we would need invoice_line, track, album, and artist tables (refer schema)." ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
artist_nameno_of_tracks_sold
Queen192
Jimi Hendrix187
Nirvana130
Red Hot Chili Peppers130
Pearl Jam129
AC/DC124
Guns N' Roses124
Foo Fighters121
The Rolling Stones117
Metallica106
" ], "text/plain": [ "[('Queen', 192),\n", " ('Jimi Hendrix', 187),\n", " ('Nirvana', 130),\n", " ('Red Hot Chili Peppers', 130),\n", " ('Pearl Jam', 129),\n", " ('AC/DC', 124),\n", " (\"Guns N' Roses\", 124),\n", " ('Foo Fighters', 121),\n", " ('The Rolling Stones', 117),\n", " ('Metallica', 106)]" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "WITH invoice_info AS\n", " (\n", " SELECT \n", " al.artist_id,\n", " COUNT(il.track_id) no_of_tracks_sold\n", " FROM album al\n", " INNER JOIN track t ON al.album_id = t.album_id\n", " INNER JOIN invoice_line il ON t.track_id = il.track_id\n", " GROUP BY 1\n", " )\n", "SELECT \n", " ar.name artist_name,\n", " i.no_of_tracks_sold\n", "FROM artist ar\n", "INNER JOIN invoice_info i ON ar.artist_id = i.artist_id\n", "ORDER BY 2 DESC\n", "LIMIT 10;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The sales are mostly dominated by rock bands, which is not surprising based on query 1. \n", "Popularity is a metric, and can be interpreted in different ways based on the information need. As far as sales are concerned, genre seems to affect the sales more than versatility/ playlists." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tracks purchased vs not purchased\n", "The management at Chinook wants to understand what share of tracks in the store are bought by customers to be able to manage the inventory better.\n", "\n", "To do this, we need the track and invoice_line tables (refer schema). invoice_line contains the tracks that are sold while track contains all the tracks in the store." ] }, { "cell_type": "code", "execution_count": 81, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
total_trackstracks_purchasedperc_purchasedperc_not_purchased
4757180637.9762.03
" ], "text/plain": [ "[(4757, 1806, 37.97, 62.03)]" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT\n", " COUNT(t.track_id) total_tracks,\n", " COUNT(DISTINCT i.track_id) tracks_purchased,\n", " ROUND(COUNT(DISTINCT i.track_id)/CAST(COUNT(t.track_id) AS FLOAT)*100,2) perc_purchased,\n", " ROUND(100 - COUNT(DISTINCT i.track_id)/CAST(COUNT(t.track_id) AS FLOAT)*100,2) perc_not_purchased\n", "FROM track t\n", "INNER JOIN invoice_line i ON i.track_id = t.track_id;\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "~38% of unique tracks from the inventory are purchased by the customers, while ~62% are not purchased. The analysis here is done in terms of variety of tracks rather than the volume of sale - say a track is purchased 100 times in comparision to another one which is purchased only once, but both of them are purchased by customers.\n", "The remaining 62% tracks can be analyzed further to understand any possible trends, leading to their unpopularity among customers. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Protected vs Non-protected media types\n", "Do protected vs non-protected media types have an effect on popularity?\n", "\n", "To do this, we need media_type, track, and invoice_line tables (refer schema). From the previous query we know that there are 1806 distinct tracks sold by the store." ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
media_type_idname
1MPEG audio file
2Protected AAC audio file
3Protected MPEG-4 video file
4Purchased AAC audio file
5AAC audio file
" ], "text/plain": [ "[(1, 'MPEG audio file'),\n", " (2, 'Protected AAC audio file'),\n", " (3, 'Protected MPEG-4 video file'),\n", " (4, 'Purchased AAC audio file'),\n", " (5, 'AAC audio file')]" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT * FROM media_type;" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
protectedtracks_soldperc_sold
No165291.47
Yes1548.53
" ], "text/plain": [ "[('No', 1652, 91.47), ('Yes', 154, 8.53)]" ] }, "execution_count": 83, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "WITH track_media AS \n", " (\n", " SELECT \n", " CASE\n", " WHEN m.name LIKE \"%protected%\" \n", " THEN \"Yes\"\n", " ELSE \"No\"\n", " END protected,\n", " t.track_id\n", " FROM media_type m\n", " INNER JOIN track t ON t.media_type_id = m.media_type_id\n", " )\n", "SELECT \n", " tm.protected,\n", " COUNT(DISTINCT i.track_id) tracks_sold,\n", " ROUND(CAST(COUNT(DISTINCT i.track_id) AS FLOAT)/1806*100,2) perc_sold\n", "FROM invoice_line i\n", "LEFT JOIN track_media tm ON i.track_id = tm.track_id\n", "GROUP BY 1;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Out of all the unique tracks sold (1806 tracks) by the store, ~91% comprises of unprotected media." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 2 }