{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Answering Business Questions using SQL\n",
"\n",
"## About the database\n",
"We'll be working with a modified version of a database called Chinook. The Chinook database contains information about a fictional digital music shop - kind of like a mini-iTunes store.\n",
"\n",
"The Chinook database contains information about the artists, songs, and albums from the music shop, as well as information on the shop's employees, customers, and the customers purchases. This information is contained in eleven tables.\n",
""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll use the following code to connect our Jupyter Notebook to our database file:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"%load_ext sql\n",
"%sql sqlite:///chinook.db"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run SQL queries in Jupyter Notebook, we have to add %%sql on its own line to the start of our query.\n",
"\n",
"Let's start by getting familiar with our data. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"
\n",
" \n",
" name | \n",
" type | \n",
"
\n",
" \n",
" album | \n",
" table | \n",
"
\n",
" \n",
" artist | \n",
" table | \n",
"
\n",
" \n",
" customer | \n",
" table | \n",
"
\n",
" \n",
" employee | \n",
" table | \n",
"
\n",
" \n",
" genre | \n",
" table | \n",
"
\n",
" \n",
" invoice | \n",
" table | \n",
"
\n",
" \n",
" invoice_line | \n",
" table | \n",
"
\n",
" \n",
" media_type | \n",
" table | \n",
"
\n",
" \n",
" playlist | \n",
" table | \n",
"
\n",
" \n",
" playlist_track | \n",
" table | \n",
"
\n",
" \n",
" track | \n",
" table | \n",
"
\n",
" \n",
" country_or_other | \n",
" view | \n",
"
\n",
"
"
],
"text/plain": [
"[('album', 'table'),\n",
" ('artist', 'table'),\n",
" ('customer', 'table'),\n",
" ('employee', 'table'),\n",
" ('genre', 'table'),\n",
" ('invoice', 'table'),\n",
" ('invoice_line', 'table'),\n",
" ('media_type', 'table'),\n",
" ('playlist', 'table'),\n",
" ('playlist_track', 'table'),\n",
" ('track', 'table'),\n",
" ('country_or_other', 'view')]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" name,\n",
" type\n",
"FROM sqlite_master\n",
"WHERE type IN (\"table\", \"view\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Selecting albums to purchase\n",
"The Chinook record store has just signed a deal with a new record label, and you've been tasked with selecting the first three albums that will be added to the store, from a list of four. All four albums are by artists that don't have any tracks in the store right now - we have the artist names, and the genre of music they produce:\n",
"\n",
" Artist Name : Genre\n",
" Regal : Hip-Hop\n",
" Red Tone : Punk\n",
" Meteor and the Girls : Pop\n",
" Slim Jim Bites : Blues\n",
"\n",
"The record label specializes in artists from the USA, and they have given Chinook some money to advertise the new albums in the USA, so we're interested in finding out which genres sell the best in the USA."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To do this, we need track, genre, invoice_line, invoice, and customer tables (refer schema).\n",
"We first find out the most popular genre across all countries and then limit the result using country = 'USA'."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" genre_name | \n",
" tracks_sold | \n",
"
\n",
" \n",
" Rock | \n",
" 2635 | \n",
"
\n",
" \n",
" Metal | \n",
" 619 | \n",
"
\n",
" \n",
" Alternative & Punk | \n",
" 492 | \n",
"
\n",
" \n",
" Latin | \n",
" 167 | \n",
"
\n",
" \n",
" R&B/Soul | \n",
" 159 | \n",
"
\n",
" \n",
" Blues | \n",
" 124 | \n",
"
\n",
" \n",
" Jazz | \n",
" 121 | \n",
"
\n",
" \n",
" Alternative | \n",
" 117 | \n",
"
\n",
" \n",
" Easy Listening | \n",
" 74 | \n",
"
\n",
" \n",
" Pop | \n",
" 63 | \n",
"
\n",
" \n",
" Electronica/Dance | \n",
" 55 | \n",
"
\n",
" \n",
" Classical | \n",
" 47 | \n",
"
\n",
" \n",
" Reggae | \n",
" 35 | \n",
"
\n",
" \n",
" Hip Hop/Rap | \n",
" 33 | \n",
"
\n",
" \n",
" Heavy Metal | \n",
" 8 | \n",
"
\n",
" \n",
" Soundtrack | \n",
" 5 | \n",
"
\n",
" \n",
" TV Shows | \n",
" 2 | \n",
"
\n",
" \n",
" Drama | \n",
" 1 | \n",
"
\n",
"
"
],
"text/plain": [
"[('Rock', 2635),\n",
" ('Metal', 619),\n",
" ('Alternative & Punk', 492),\n",
" ('Latin', 167),\n",
" ('R&B/Soul', 159),\n",
" ('Blues', 124),\n",
" ('Jazz', 121),\n",
" ('Alternative', 117),\n",
" ('Easy Listening', 74),\n",
" ('Pop', 63),\n",
" ('Electronica/Dance', 55),\n",
" ('Classical', 47),\n",
" ('Reggae', 35),\n",
" ('Hip Hop/Rap', 33),\n",
" ('Heavy Metal', 8),\n",
" ('Soundtrack', 5),\n",
" ('TV Shows', 2),\n",
" ('Drama', 1)]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" g.name genre_name,\n",
" SUM(il.quantity) tracks_sold\n",
"FROM track t\n",
"INNER JOIN invoice_line il ON il.track_id = t.track_id\n",
"INNER JOIN genre g ON g.genre_id = t.genre_id\n",
"GROUP BY 1\n",
"ORDER BY 2 DESC;"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" genre_name | \n",
" tracks_sold | \n",
"
\n",
" \n",
" Rock | \n",
" 561 | \n",
"
\n",
" \n",
" Alternative & Punk | \n",
" 130 | \n",
"
\n",
" \n",
" Metal | \n",
" 124 | \n",
"
\n",
" \n",
" R&B/Soul | \n",
" 53 | \n",
"
\n",
" \n",
" Blues | \n",
" 36 | \n",
"
\n",
" \n",
" Alternative | \n",
" 35 | \n",
"
\n",
" \n",
" Pop | \n",
" 22 | \n",
"
\n",
" \n",
" Latin | \n",
" 22 | \n",
"
\n",
" \n",
" Hip Hop/Rap | \n",
" 20 | \n",
"
\n",
" \n",
" Jazz | \n",
" 14 | \n",
"
\n",
" \n",
" Easy Listening | \n",
" 13 | \n",
"
\n",
" \n",
" Reggae | \n",
" 6 | \n",
"
\n",
" \n",
" Electronica/Dance | \n",
" 5 | \n",
"
\n",
" \n",
" Classical | \n",
" 4 | \n",
"
\n",
" \n",
" Heavy Metal | \n",
" 3 | \n",
"
\n",
" \n",
" Soundtrack | \n",
" 2 | \n",
"
\n",
" \n",
" TV Shows | \n",
" 1 | \n",
"
\n",
"
"
],
"text/plain": [
"[('Rock', 561),\n",
" ('Alternative & Punk', 130),\n",
" ('Metal', 124),\n",
" ('R&B/Soul', 53),\n",
" ('Blues', 36),\n",
" ('Alternative', 35),\n",
" ('Pop', 22),\n",
" ('Latin', 22),\n",
" ('Hip Hop/Rap', 20),\n",
" ('Jazz', 14),\n",
" ('Easy Listening', 13),\n",
" ('Reggae', 6),\n",
" ('Electronica/Dance', 5),\n",
" ('Classical', 4),\n",
" ('Heavy Metal', 3),\n",
" ('Soundtrack', 2),\n",
" ('TV Shows', 1)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" g.name genre_name,\n",
" SUM(il.quantity) tracks_sold\n",
"FROM track t\n",
"INNER JOIN invoice_line il ON il.track_id = t.track_id\n",
"INNER JOIN genre g ON g.genre_id = t.genre_id\n",
"INNER JOIN invoice i ON i.invoice_id = il.invoice_id\n",
"INNER JOIN customer c ON c.customer_id = i.customer_id\n",
"WHERE c.country = \"USA\"\n",
"GROUP BY 1\n",
"ORDER BY 2 DESC;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Based on these results Red Tone, Slim Jim Bites, Meteor and the Girls should be purchased for the store.\n",
"Also note that Rock seems to be the most popular genre, so we should look out for more albums of the same to increase sales."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analyzing employee sales performance\n",
"Each customer for the Chinook store gets assigned to a sales support agent within the company when they first make a purchase. You have been asked to analyze the purchases of customers belonging to each employee to see if any sales support agent is performing either better or worse than the others."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To do this, we need employee, customer, and invoice tables (refer schema). We would be analyzing performance based on the total dollar amount.\n"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" employee_name | \n",
" total_dollar_amount | \n",
"
\n",
" \n",
" Jane Peacock | \n",
" 1731.51 | \n",
"
\n",
" \n",
" Margaret Park | \n",
" 1584.0 | \n",
"
\n",
" \n",
" Steve Johnson | \n",
" 1393.92 | \n",
"
\n",
"
"
],
"text/plain": [
"[('Jane Peacock', 1731.51),\n",
" ('Margaret Park', 1584.0),\n",
" ('Steve Johnson', 1393.92)]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT\n",
" e.first_name||' '||e.last_name employee_name,\n",
" ROUND(SUM(i.total),2) total_dollar_amount\n",
"FROM employee e\n",
"INNER JOIN customer c ON c.support_rep_id = e.employee_id\n",
"INNER JOIN invoice i ON c.customer_id = i.customer_id\n",
"WHERE e.title = \"Sales Support Agent\"\n",
"GROUP BY 1\n",
"ORDER BY 2 DESC;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We check the employee table to see if we can find out any reason for the observed results."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" employee_id | \n",
" last_name | \n",
" first_name | \n",
" title | \n",
" reports_to | \n",
" birthdate | \n",
" hire_date | \n",
" address | \n",
" city | \n",
" state | \n",
" country | \n",
" postal_code | \n",
" phone | \n",
" fax | \n",
" email | \n",
"
\n",
" \n",
" 3 | \n",
" Peacock | \n",
" Jane | \n",
" Sales Support Agent | \n",
" 2 | \n",
" 1973-08-29 00:00:00 | \n",
" 2017-04-01 00:00:00 | \n",
" 1111 6 Ave SW | \n",
" Calgary | \n",
" AB | \n",
" Canada | \n",
" T2P 5M5 | \n",
" +1 (403) 262-3443 | \n",
" +1 (403) 262-6712 | \n",
" jane@chinookcorp.com | \n",
"
\n",
" \n",
" 4 | \n",
" Park | \n",
" Margaret | \n",
" Sales Support Agent | \n",
" 2 | \n",
" 1947-09-19 00:00:00 | \n",
" 2017-05-03 00:00:00 | \n",
" 683 10 Street SW | \n",
" Calgary | \n",
" AB | \n",
" Canada | \n",
" T2P 5G3 | \n",
" +1 (403) 263-4423 | \n",
" +1 (403) 263-4289 | \n",
" margaret@chinookcorp.com | \n",
"
\n",
" \n",
" 5 | \n",
" Johnson | \n",
" Steve | \n",
" Sales Support Agent | \n",
" 2 | \n",
" 1965-03-03 00:00:00 | \n",
" 2017-10-17 00:00:00 | \n",
" 7727B 41 Ave | \n",
" Calgary | \n",
" AB | \n",
" Canada | \n",
" T3B 1Y7 | \n",
" 1 (780) 836-9987 | \n",
" 1 (780) 836-9543 | \n",
" steve@chinookcorp.com | \n",
"
\n",
"
"
],
"text/plain": [
"[(3, 'Peacock', 'Jane', 'Sales Support Agent', 2, '1973-08-29 00:00:00', '2017-04-01 00:00:00', '1111 6 Ave SW', 'Calgary', 'AB', 'Canada', 'T2P 5M5', '+1 (403) 262-3443', '+1 (403) 262-6712', 'jane@chinookcorp.com'),\n",
" (4, 'Park', 'Margaret', 'Sales Support Agent', 2, '1947-09-19 00:00:00', '2017-05-03 00:00:00', '683 10 Street SW', 'Calgary', 'AB', 'Canada', 'T2P 5G3', '+1 (403) 263-4423', '+1 (403) 263-4289', 'margaret@chinookcorp.com'),\n",
" (5, 'Johnson', 'Steve', 'Sales Support Agent', 2, '1965-03-03 00:00:00', '2017-10-17 00:00:00', '7727B 41 Ave', 'Calgary', 'AB', 'Canada', 'T3B 1Y7', '1 (780) 836-9987', '1 (780) 836-9543', 'steve@chinookcorp.com')]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT *\n",
"FROM employee\n",
"WHERE title LIKE \"%support%\";"
]
},
{
"cell_type": "code",
"execution_count": 37,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" employee_name | \n",
" total_dollar_amount | \n",
" hire_date | \n",
" birthdate | \n",
"
\n",
" \n",
" Jane Peacock | \n",
" 1731.51 | \n",
" 2017-04-01 00:00:00 | \n",
" 1973-08-29 00:00:00 | \n",
"
\n",
" \n",
" Margaret Park | \n",
" 1584.0 | \n",
" 2017-05-03 00:00:00 | \n",
" 1947-09-19 00:00:00 | \n",
"
\n",
" \n",
" Steve Johnson | \n",
" 1393.92 | \n",
" 2017-10-17 00:00:00 | \n",
" 1965-03-03 00:00:00 | \n",
"
\n",
"
"
],
"text/plain": [
"[('Jane Peacock', 1731.51, '2017-04-01 00:00:00', '1973-08-29 00:00:00'),\n",
" ('Margaret Park', 1584.0, '2017-05-03 00:00:00', '1947-09-19 00:00:00'),\n",
" ('Steve Johnson', 1393.92, '2017-10-17 00:00:00', '1965-03-03 00:00:00')]"
]
},
"execution_count": 37,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"WITH support_rep_sales AS\n",
" (\n",
" SELECT\n",
" e.employee_id,\n",
" e.first_name||' '||e.last_name employee_name,\n",
" ROUND(SUM(i.total),2) total_dollar_amount\n",
" FROM employee e\n",
" INNER JOIN customer c ON c.support_rep_id = e.employee_id\n",
" INNER JOIN invoice i ON c.customer_id = i.customer_id\n",
" WHERE e.title = \"Sales Support Agent\"\n",
" GROUP BY 1\n",
" )\n",
"SELECT \n",
" srs.employee_name,\n",
" srs.total_dollar_amount,\n",
" e.hire_date,\n",
" e.birthdate\n",
"FROM employee e\n",
"INNER JOIN support_rep_sales srs ON e.employee_id = srs.employee_id\n",
"ORDER BY 2 DESC;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The store has 3 Sales Support Agents, all of them from Canada. All agents seem to have the same amount of experience with the company as seen in their hiring dates. The difference is sales roughly corresponds with the difference in hiring dates. The agent with the highest sales is the youngest, which could possibly reflect in the ability to connect with a younger customer market."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Analyzing sales by country\n",
"Your next task is to analyze the sales data for customers from each different country. You have been given guidance to use the country value from the customers table, and ignore the country from the billing address in the invoice table.\n",
"\n",
"In particular, you have been directed to calculate data, for each country, on the:\n",
"\n",
"- total number of customers\n",
"- total value of sales\n",
"- average value of sales per customer\n",
"- average order value"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To do this, we need customer and invoice tables (refer schema). We will create a temporary table that groups all countries with 1 customer as 'Other'. "
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" country | \n",
" Total_Customers | \n",
" Total_Orders | \n",
"
\n",
" \n",
" Argentina | \n",
" 1 | \n",
" 5 | \n",
"
\n",
" \n",
" Australia | \n",
" 1 | \n",
" 10 | \n",
"
\n",
" \n",
" Austria | \n",
" 1 | \n",
" 9 | \n",
"
\n",
" \n",
" Belgium | \n",
" 1 | \n",
" 7 | \n",
"
\n",
" \n",
" Brazil | \n",
" 5 | \n",
" 61 | \n",
"
\n",
" \n",
" Canada | \n",
" 8 | \n",
" 76 | \n",
"
\n",
" \n",
" Chile | \n",
" 1 | \n",
" 13 | \n",
"
\n",
" \n",
" Czech Republic | \n",
" 2 | \n",
" 30 | \n",
"
\n",
" \n",
" Denmark | \n",
" 1 | \n",
" 10 | \n",
"
\n",
" \n",
" Finland | \n",
" 1 | \n",
" 11 | \n",
"
\n",
" \n",
" France | \n",
" 5 | \n",
" 50 | \n",
"
\n",
" \n",
" Germany | \n",
" 4 | \n",
" 41 | \n",
"
\n",
" \n",
" Hungary | \n",
" 1 | \n",
" 10 | \n",
"
\n",
" \n",
" India | \n",
" 2 | \n",
" 21 | \n",
"
\n",
" \n",
" Ireland | \n",
" 1 | \n",
" 13 | \n",
"
\n",
" \n",
" Italy | \n",
" 1 | \n",
" 9 | \n",
"
\n",
" \n",
" Netherlands | \n",
" 1 | \n",
" 10 | \n",
"
\n",
" \n",
" Norway | \n",
" 1 | \n",
" 9 | \n",
"
\n",
" \n",
" Poland | \n",
" 1 | \n",
" 10 | \n",
"
\n",
" \n",
" Portugal | \n",
" 2 | \n",
" 29 | \n",
"
\n",
" \n",
" Spain | \n",
" 1 | \n",
" 11 | \n",
"
\n",
" \n",
" Sweden | \n",
" 1 | \n",
" 10 | \n",
"
\n",
" \n",
" USA | \n",
" 13 | \n",
" 131 | \n",
"
\n",
" \n",
" United Kingdom | \n",
" 3 | \n",
" 28 | \n",
"
\n",
"
"
],
"text/plain": [
"[('Argentina', 1, 5),\n",
" ('Australia', 1, 10),\n",
" ('Austria', 1, 9),\n",
" ('Belgium', 1, 7),\n",
" ('Brazil', 5, 61),\n",
" ('Canada', 8, 76),\n",
" ('Chile', 1, 13),\n",
" ('Czech Republic', 2, 30),\n",
" ('Denmark', 1, 10),\n",
" ('Finland', 1, 11),\n",
" ('France', 5, 50),\n",
" ('Germany', 4, 41),\n",
" ('Hungary', 1, 10),\n",
" ('India', 2, 21),\n",
" ('Ireland', 1, 13),\n",
" ('Italy', 1, 9),\n",
" ('Netherlands', 1, 10),\n",
" ('Norway', 1, 9),\n",
" ('Poland', 1, 10),\n",
" ('Portugal', 2, 29),\n",
" ('Spain', 1, 11),\n",
" ('Sweden', 1, 10),\n",
" ('USA', 13, 131),\n",
" ('United Kingdom', 3, 28)]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT\n",
" c.country,\n",
" COUNT(DISTINCT(c.customer_id)) AS Total_Customers,\n",
" COUNT(i.invoice_id) AS Total_Orders\n",
"FROM invoice i\n",
"INNER JOIN customer c ON c.customer_id = i.customer_id\n",
"GROUP BY 1;"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"CREATE VIEW country_or_other AS\n",
" SELECT \n",
" CASE \n",
" WHEN (\n",
" SELECT COUNT(*)\n",
" FROM customer\n",
" WHERE country = c.country\n",
" ) = 1 THEN \"Other\"\n",
" ELSE c.country\n",
" END country,\n",
" COUNT(DISTINCT(c.customer_id)) total_customers,\n",
" COUNT(i.invoice_id) total_orders,\n",
" ROUND(SUM(i.total),2) total_sales\n",
" FROM invoice i\n",
" INNER JOIN customer c ON c.customer_id = i.customer_id\n",
" GROUP BY 1;\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" country | \n",
" total_customers | \n",
" total_orders | \n",
" total_sales | \n",
"
\n",
" \n",
" Brazil | \n",
" 5 | \n",
" 61 | \n",
" 427.68 | \n",
"
\n",
" \n",
" Canada | \n",
" 8 | \n",
" 76 | \n",
" 535.59 | \n",
"
\n",
" \n",
" Czech Republic | \n",
" 2 | \n",
" 30 | \n",
" 273.24 | \n",
"
\n",
" \n",
" France | \n",
" 5 | \n",
" 50 | \n",
" 389.07 | \n",
"
\n",
" \n",
" Germany | \n",
" 4 | \n",
" 41 | \n",
" 334.62 | \n",
"
\n",
" \n",
" India | \n",
" 2 | \n",
" 21 | \n",
" 183.15 | \n",
"
\n",
" \n",
" Other | \n",
" 15 | \n",
" 147 | \n",
" 1094.94 | \n",
"
\n",
" \n",
" Portugal | \n",
" 2 | \n",
" 29 | \n",
" 185.13 | \n",
"
\n",
" \n",
" USA | \n",
" 13 | \n",
" 131 | \n",
" 1040.49 | \n",
"
\n",
" \n",
" United Kingdom | \n",
" 3 | \n",
" 28 | \n",
" 245.52 | \n",
"
\n",
"
"
],
"text/plain": [
"[('Brazil', 5, 61, 427.68),\n",
" ('Canada', 8, 76, 535.59),\n",
" ('Czech Republic', 2, 30, 273.24),\n",
" ('France', 5, 50, 389.07),\n",
" ('Germany', 4, 41, 334.62),\n",
" ('India', 2, 21, 183.15),\n",
" ('Other', 15, 147, 1094.94),\n",
" ('Portugal', 2, 29, 185.13),\n",
" ('USA', 13, 131, 1040.49),\n",
" ('United Kingdom', 3, 28, 245.52)]"
]
},
"execution_count": 33,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT * FROM country_or_other;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We add another piece of code to ensure that 'Other' is always at the end of the result."
]
},
{
"cell_type": "code",
"execution_count": 35,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" country | \n",
" total_customers | \n",
" total_orders | \n",
" total_sales | \n",
" avg_sales_per_customer | \n",
" avg_order_value | \n",
"
\n",
" \n",
" USA | \n",
" 13 | \n",
" 131 | \n",
" 1040.49 | \n",
" 80.04 | \n",
" 7.94 | \n",
"
\n",
" \n",
" Canada | \n",
" 8 | \n",
" 76 | \n",
" 535.59 | \n",
" 66.95 | \n",
" 7.05 | \n",
"
\n",
" \n",
" Brazil | \n",
" 5 | \n",
" 61 | \n",
" 427.68 | \n",
" 85.54 | \n",
" 7.01 | \n",
"
\n",
" \n",
" France | \n",
" 5 | \n",
" 50 | \n",
" 389.07 | \n",
" 77.81 | \n",
" 7.78 | \n",
"
\n",
" \n",
" Germany | \n",
" 4 | \n",
" 41 | \n",
" 334.62 | \n",
" 83.66 | \n",
" 8.16 | \n",
"
\n",
" \n",
" United Kingdom | \n",
" 3 | \n",
" 28 | \n",
" 245.52 | \n",
" 81.84 | \n",
" 8.77 | \n",
"
\n",
" \n",
" Czech Republic | \n",
" 2 | \n",
" 30 | \n",
" 273.24 | \n",
" 136.62 | \n",
" 9.11 | \n",
"
\n",
" \n",
" India | \n",
" 2 | \n",
" 21 | \n",
" 183.15 | \n",
" 91.58 | \n",
" 8.72 | \n",
"
\n",
" \n",
" Portugal | \n",
" 2 | \n",
" 29 | \n",
" 185.13 | \n",
" 92.57 | \n",
" 6.38 | \n",
"
\n",
" \n",
" Other | \n",
" 15 | \n",
" 147 | \n",
" 1094.94 | \n",
" 73.0 | \n",
" 7.45 | \n",
"
\n",
"
"
],
"text/plain": [
"[('USA', 13, 131, 1040.49, 80.04, 7.94),\n",
" ('Canada', 8, 76, 535.59, 66.95, 7.05),\n",
" ('Brazil', 5, 61, 427.68, 85.54, 7.01),\n",
" ('France', 5, 50, 389.07, 77.81, 7.78),\n",
" ('Germany', 4, 41, 334.62, 83.66, 8.16),\n",
" ('United Kingdom', 3, 28, 245.52, 81.84, 8.77),\n",
" ('Czech Republic', 2, 30, 273.24, 136.62, 9.11),\n",
" ('India', 2, 21, 183.15, 91.58, 8.72),\n",
" ('Portugal', 2, 29, 185.13, 92.57, 6.38),\n",
" ('Other', 15, 147, 1094.94, 73.0, 7.45)]"
]
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT \n",
" country,\n",
" total_customers,\n",
" total_orders,\n",
" total_sales,\n",
" avg_sales_per_customer,\n",
" avg_order_value\n",
"FROM \n",
" (\n",
" SELECT \n",
" co.*,\n",
" ROUND(CAST(total_sales AS FLOAT)/total_customers,2) avg_sales_per_customer,\n",
" ROUND(CAST(total_sales AS FLOAT)/total_orders,2) avg_order_value,\n",
" CASE\n",
" WHEN co.country = \"Other\" THEN 1\n",
" ELSE 0\n",
" END AS sort\n",
" FROM country_or_other co\n",
" GROUP BY country\n",
" ORDER BY sort ASC, total_customers DESC\n",
" );"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can observe that, USA is Chinook's largest market, followed by Canada and Brazil. However, we see a clear difference between average sales in all of its countries (around 80 - 90 USD per customer on average) and average sales in Czech Republic of 136 USD per customer. This indicates that in this country average revenue per customer is significantly higher and targeted marketing could potentially be more profitable per customer compared to its other markets. Average order value tends to be approximately 8 USD per order; again, we notice that in the Czech Republic this value is slightly higher than in other countries at 9 USD per order."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Albums vs individual tracks\n",
"The Chinook store is setup in a way that allows customer to make purchases in one of the two ways:\n",
"\n",
"- purchase a whole album\n",
"- purchase a collection of one or more individual tracks.\n",
"\n",
"The store does not let customers purchase a whole album, and then add individual tracks to that same purchase (unless they do that by choosing each track manually). When customers purchase albums they are charged the same price as if they had purchased each of those tracks separately.\n",
"\n",
"Management are currently considering changing their purchasing strategy to save money. The strategy they are considering is to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album.\n",
"\n",
"We have been asked to find out what percentage of purchases are individual tracks vs whole albums, so that management can use this data to understand the effect this decision might have on overall revenue.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
" In this instance, we have two edge cases to consider:\n",
"\n",
"- Albums that have only one or two tracks are likely to be purchased by customers as part of a collection of individual tracks.\n",
"- Customers may decide to manually select every track from an album, and then add a few individual tracks from other albums to their purchase.\n",
"\n",
"In the first case, since our analysis is concerned with maximizing revenue we can safely ignore albums consisting of only a few tracks. The company has previously done analysis to confirm that the second case does not happen often, so we can ignore this case also."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To do this, we need the invoice_line, album, and track tables (refer schema).\n",
"We are going to check whether each invoice has all the tracks from an album."
]
},
{
"cell_type": "code",
"execution_count": 64,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" invoice_line_id | \n",
" invoice_id | \n",
" track_id | \n",
" unit_price | \n",
" quantity | \n",
"
\n",
" \n",
" 1 | \n",
" 1 | \n",
" 1158 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 1 | \n",
" 1159 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" 1 | \n",
" 1160 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
" \n",
" 4 | \n",
" 1 | \n",
" 1161 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
" \n",
" 5 | \n",
" 1 | \n",
" 1162 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
" \n",
" 6 | \n",
" 1 | \n",
" 1163 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
" \n",
" 7 | \n",
" 1 | \n",
" 1164 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
" \n",
" 8 | \n",
" 1 | \n",
" 1165 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
" \n",
" 9 | \n",
" 1 | \n",
" 1166 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
" \n",
" 10 | \n",
" 1 | \n",
" 1167 | \n",
" 0.99 | \n",
" 1 | \n",
"
\n",
"
"
],
"text/plain": [
"[(1, 1, 1158, 0.99, 1),\n",
" (2, 1, 1159, 0.99, 1),\n",
" (3, 1, 1160, 0.99, 1),\n",
" (4, 1, 1161, 0.99, 1),\n",
" (5, 1, 1162, 0.99, 1),\n",
" (6, 1, 1163, 0.99, 1),\n",
" (7, 1, 1164, 0.99, 1),\n",
" (8, 1, 1165, 0.99, 1),\n",
" (9, 1, 1166, 0.99, 1),\n",
" (10, 1, 1167, 0.99, 1)]"
]
},
"execution_count": 64,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT * FROM invoice_line\n",
"LIMIT 10;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One invoice_id is mapped to several track_id.\n",
"\n",
"We create temporary tables- album_tracks that has album_id and track_id from the tracks included in that album, and invoice_tracks_albums that has invoice_id, track_id and album_id for the tracks that are sold. We then proceed to do something as shown below with these two tables in invoice_full_dictionary. Lastly in invoice_album we have the invoice_id of an order and whether it has a full album or not.\n",
""
]
},
{
"cell_type": "code",
"execution_count": 50,
"metadata": {
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" total_invoices | \n",
" total_full_album | \n",
" perc_full_album | \n",
"
\n",
" \n",
" 614 | \n",
" 114 | \n",
" 18.57 | \n",
"
\n",
"
"
],
"text/plain": [
"[(614, 114, 18.57)]"
]
},
"execution_count": 50,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"WITH album_tracks AS\n",
" (\n",
" SELECT \n",
" a.album_id,\n",
" t.track_id\n",
" FROM album a\n",
" INNER JOIN track t ON a.album_id = t.album_id\n",
" ),\n",
" \n",
" invoice_tracks_album AS\n",
" (\n",
" SELECT \n",
" i.invoice_id,\n",
" at.album_id,\n",
" at.track_id\n",
" FROM invoice_line i\n",
" INNER JOIN album_tracks at ON at.track_id = i.track_id \n",
" ),\n",
" \n",
" invoice_full_dictionary AS\n",
" (\n",
" SELECT \n",
" ita.invoice_id,\n",
" CASE \n",
" WHEN\n",
" (\n",
" SELECT track_id \n",
" FROM (\n",
" SELECT itain.track_id track_id\n",
" FROM invoice_tracks_album itain\n",
" WHERE itain.invoice_id = ita.invoice_id\n",
" )\n",
" EXCEPT\n",
" SELECT track_id\n",
" FROM (\n",
" SELECT at.track_id track_id\n",
" FROM album_tracks at\n",
" WHERE at.album_id = ita.album_id\n",
" )\n",
" ) IS NULL\n",
" AND\n",
" (\n",
" SELECT track_id \n",
" FROM (\n",
" SELECT at.track_id track_id\n",
" FROM album_tracks at\n",
" WHERE at.album_id = ita.album_id\n",
" )\n",
" EXCEPT\n",
" SELECT track_id \n",
" FROM (\n",
" SELECT itain.track_id track_id\n",
" FROM invoice_tracks_album itain\n",
" WHERE itain.invoice_id = ita.invoice_id\n",
" )\n",
" ) IS NULL\n",
" THEN 1\n",
" ELSE 0\n",
" END full_album\n",
" FROM invoice_tracks_album ita\n",
" ),\n",
" \n",
" invoice_album AS\n",
" (\n",
" SELECT \n",
" invoice_id,\n",
" MAX(full_album) full_album\n",
" FROM invoice_full_dictionary\n",
" GROUP BY 1\n",
" )\n",
" \n",
"SELECT\n",
" COUNT(invoice_id) total_invoices,\n",
" SUM(full_album) total_full_album,\n",
" ROUND(CAST(SUM(full_album) AS FLOAT)/COUNT(invoice_id) * 100, 2) AS perc_full_album\n",
"FROM invoice_album;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can note that 18.5% of invoices are related to the purchase of full albums. Rest 80 percent purchases are made by customers seeking specific tracks. The company should change their strategy to purchase only the most popular tracks from each album from record companies, instead of purchasing every track from an album."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Popular artists in playlists vs sales\n",
"We try to find out which artists are used in most of the playlists.\n",
"To do this, we need playlist_track, track, album, and artist tables (refer schema). "
]
},
{
"cell_type": "code",
"execution_count": 67,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" name | \n",
"
\n",
" \n",
" Music | \n",
"
\n",
" \n",
" Movies | \n",
"
\n",
" \n",
" TV Shows | \n",
"
\n",
" \n",
" Audiobooks | \n",
"
\n",
" \n",
" 90’s Music | \n",
"
\n",
" \n",
" Music Videos | \n",
"
\n",
" \n",
" Brazilian Music | \n",
"
\n",
" \n",
" Classical | \n",
"
\n",
" \n",
" Classical 101 - Deep Cuts | \n",
"
\n",
" \n",
" Classical 101 - Next Steps | \n",
"
\n",
" \n",
" Classical 101 - The Basics | \n",
"
\n",
" \n",
" Grunge | \n",
"
\n",
" \n",
" Heavy Metal Classic | \n",
"
\n",
" \n",
" On-The-Go 1 | \n",
"
\n",
"
"
],
"text/plain": [
"[('Music',),\n",
" ('Movies',),\n",
" ('TV Shows',),\n",
" ('Audiobooks',),\n",
" ('90’s Music',),\n",
" ('Music Videos',),\n",
" ('Brazilian Music',),\n",
" ('Classical',),\n",
" ('Classical 101 - Deep Cuts',),\n",
" ('Classical 101 - Next Steps',),\n",
" ('Classical 101 - The Basics',),\n",
" ('Grunge',),\n",
" ('Heavy Metal Classic',),\n",
" ('On-The-Go 1',)]"
]
},
"execution_count": 67,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT DISTINCT name\n",
"FROM playlist;"
]
},
{
"cell_type": "code",
"execution_count": 57,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" artist_name | \n",
" no_of_playlists | \n",
"
\n",
" \n",
" Eugene Ormandy | \n",
" 7 | \n",
"
\n",
" \n",
" English Concert & Trevor Pinnock | \n",
" 6 | \n",
"
\n",
" \n",
" Academy of St. Martin in the Fields & Sir Neville Marriner | \n",
" 6 | \n",
"
\n",
" \n",
" The King's Singers | \n",
" 6 | \n",
"
\n",
" \n",
" Berliner Philharmoniker & Herbert Von Karajan | \n",
" 6 | \n",
"
\n",
" \n",
" Alberto Turco & Nova Schola Gregoriana | \n",
" 5 | \n",
"
\n",
" \n",
" Richard Marlow & The Choir of Trinity College, Cambridge | \n",
" 5 | \n",
"
\n",
" \n",
" Wilhelm Kempff | \n",
" 5 | \n",
"
\n",
" \n",
" Yo-Yo Ma | \n",
" 5 | \n",
"
\n",
" \n",
" Scholars Baroque Ensemble | \n",
" 5 | \n",
"
\n",
"
"
],
"text/plain": [
"[('Eugene Ormandy', 7),\n",
" ('English Concert & Trevor Pinnock', 6),\n",
" ('Academy of St. Martin in the Fields & Sir Neville Marriner', 6),\n",
" (\"The King's Singers\", 6),\n",
" ('Berliner Philharmoniker & Herbert Von Karajan', 6),\n",
" ('Alberto Turco & Nova Schola Gregoriana', 5),\n",
" ('Richard Marlow & The Choir of Trinity College, Cambridge', 5),\n",
" ('Wilhelm Kempff', 5),\n",
" ('Yo-Yo Ma', 5),\n",
" ('Scholars Baroque Ensemble', 5)]"
]
},
"execution_count": 57,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"WITH pop_album AS \n",
" (\n",
" SELECT \n",
" al.artist_id artist_id,\n",
" COUNT(DISTINCT playlist_id) no_of_playlists\n",
" FROM album al \n",
" INNER JOIN track t ON al.album_id = t.album_id\n",
" INNER JOIN playlist_track pt ON t.track_id = pt.track_id\n",
" GROUP BY 1\n",
" )\n",
"SELECT \n",
" ar.name artist_name,\n",
" pa.no_of_playlists\n",
"FROM artist ar\n",
"INNER JOIN pop_album pa ON pa.artist_id = ar.artist_id\n",
"ORDER BY 2 DESC\n",
"LIMIT 10;\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Eugene Ormandy is the most popular artist in playlists. The different kinds of playlists represent the versatility in the kind of music produced by an artist.\n",
"\n",
"Let's see if the most popular artist in playlists is the best selling artist for Chinook. \n",
"To do this, we would need invoice_line, track, album, and artist tables (refer schema)."
]
},
{
"cell_type": "code",
"execution_count": 58,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" artist_name | \n",
" no_of_tracks_sold | \n",
"
\n",
" \n",
" Queen | \n",
" 192 | \n",
"
\n",
" \n",
" Jimi Hendrix | \n",
" 187 | \n",
"
\n",
" \n",
" Nirvana | \n",
" 130 | \n",
"
\n",
" \n",
" Red Hot Chili Peppers | \n",
" 130 | \n",
"
\n",
" \n",
" Pearl Jam | \n",
" 129 | \n",
"
\n",
" \n",
" AC/DC | \n",
" 124 | \n",
"
\n",
" \n",
" Guns N' Roses | \n",
" 124 | \n",
"
\n",
" \n",
" Foo Fighters | \n",
" 121 | \n",
"
\n",
" \n",
" The Rolling Stones | \n",
" 117 | \n",
"
\n",
" \n",
" Metallica | \n",
" 106 | \n",
"
\n",
"
"
],
"text/plain": [
"[('Queen', 192),\n",
" ('Jimi Hendrix', 187),\n",
" ('Nirvana', 130),\n",
" ('Red Hot Chili Peppers', 130),\n",
" ('Pearl Jam', 129),\n",
" ('AC/DC', 124),\n",
" (\"Guns N' Roses\", 124),\n",
" ('Foo Fighters', 121),\n",
" ('The Rolling Stones', 117),\n",
" ('Metallica', 106)]"
]
},
"execution_count": 58,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"WITH invoice_info AS\n",
" (\n",
" SELECT \n",
" al.artist_id,\n",
" COUNT(il.track_id) no_of_tracks_sold\n",
" FROM album al\n",
" INNER JOIN track t ON al.album_id = t.album_id\n",
" INNER JOIN invoice_line il ON t.track_id = il.track_id\n",
" GROUP BY 1\n",
" )\n",
"SELECT \n",
" ar.name artist_name,\n",
" i.no_of_tracks_sold\n",
"FROM artist ar\n",
"INNER JOIN invoice_info i ON ar.artist_id = i.artist_id\n",
"ORDER BY 2 DESC\n",
"LIMIT 10;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The sales are mostly dominated by rock bands, which is not surprising based on query 1. \n",
"Popularity is a metric, and can be interpreted in different ways based on the information need. As far as sales are concerned, genre seems to affect the sales more than versatility/ playlists."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tracks purchased vs not purchased\n",
"The management at Chinook wants to understand what share of tracks in the store are bought by customers to be able to manage the inventory better.\n",
"\n",
"To do this, we need the track and invoice_line tables (refer schema). invoice_line contains the tracks that are sold while track contains all the tracks in the store."
]
},
{
"cell_type": "code",
"execution_count": 81,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" total_tracks | \n",
" tracks_purchased | \n",
" perc_purchased | \n",
" perc_not_purchased | \n",
"
\n",
" \n",
" 4757 | \n",
" 1806 | \n",
" 37.97 | \n",
" 62.03 | \n",
"
\n",
"
"
],
"text/plain": [
"[(4757, 1806, 37.97, 62.03)]"
]
},
"execution_count": 81,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT\n",
" COUNT(t.track_id) total_tracks,\n",
" COUNT(DISTINCT i.track_id) tracks_purchased,\n",
" ROUND(COUNT(DISTINCT i.track_id)/CAST(COUNT(t.track_id) AS FLOAT)*100,2) perc_purchased,\n",
" ROUND(100 - COUNT(DISTINCT i.track_id)/CAST(COUNT(t.track_id) AS FLOAT)*100,2) perc_not_purchased\n",
"FROM track t\n",
"INNER JOIN invoice_line i ON i.track_id = t.track_id;\n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"~38% of unique tracks from the inventory are purchased by the customers, while ~62% are not purchased. The analysis here is done in terms of variety of tracks rather than the volume of sale - say a track is purchased 100 times in comparision to another one which is purchased only once, but both of them are purchased by customers.\n",
"The remaining 62% tracks can be analyzed further to understand any possible trends, leading to their unpopularity among customers. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Protected vs Non-protected media types\n",
"Do protected vs non-protected media types have an effect on popularity?\n",
"\n",
"To do this, we need media_type, track, and invoice_line tables (refer schema). From the previous query we know that there are 1806 distinct tracks sold by the store."
]
},
{
"cell_type": "code",
"execution_count": 62,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" media_type_id | \n",
" name | \n",
"
\n",
" \n",
" 1 | \n",
" MPEG audio file | \n",
"
\n",
" \n",
" 2 | \n",
" Protected AAC audio file | \n",
"
\n",
" \n",
" 3 | \n",
" Protected MPEG-4 video file | \n",
"
\n",
" \n",
" 4 | \n",
" Purchased AAC audio file | \n",
"
\n",
" \n",
" 5 | \n",
" AAC audio file | \n",
"
\n",
"
"
],
"text/plain": [
"[(1, 'MPEG audio file'),\n",
" (2, 'Protected AAC audio file'),\n",
" (3, 'Protected MPEG-4 video file'),\n",
" (4, 'Purchased AAC audio file'),\n",
" (5, 'AAC audio file')]"
]
},
"execution_count": 62,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT * FROM media_type;"
]
},
{
"cell_type": "code",
"execution_count": 83,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" protected | \n",
" tracks_sold | \n",
" perc_sold | \n",
"
\n",
" \n",
" No | \n",
" 1652 | \n",
" 91.47 | \n",
"
\n",
" \n",
" Yes | \n",
" 154 | \n",
" 8.53 | \n",
"
\n",
"
"
],
"text/plain": [
"[('No', 1652, 91.47), ('Yes', 154, 8.53)]"
]
},
"execution_count": 83,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"WITH track_media AS \n",
" (\n",
" SELECT \n",
" CASE\n",
" WHEN m.name LIKE \"%protected%\" \n",
" THEN \"Yes\"\n",
" ELSE \"No\"\n",
" END protected,\n",
" t.track_id\n",
" FROM media_type m\n",
" INNER JOIN track t ON t.media_type_id = m.media_type_id\n",
" )\n",
"SELECT \n",
" tm.protected,\n",
" COUNT(DISTINCT i.track_id) tracks_sold,\n",
" ROUND(CAST(COUNT(DISTINCT i.track_id) AS FLOAT)/1806*100,2) perc_sold\n",
"FROM invoice_line i\n",
"LEFT JOIN track_media tm ON i.track_id = tm.track_id\n",
"GROUP BY 1;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Out of all the unique tracks sold (1806 tracks) by the store, ~91% comprises of unprotected media."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}