{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [ "s1", "content", "l1" ] }, "source": [ "# Advanced SQL\n", "\n", "After handling basic sql queries through Python, let us look at slightly more advanced SQL queries and their execution through Python's sqlite module. SQL concepts like nested queries, merges, join operations are some of the basic concepts among advanced sql queries. In order to execute queries and practice these concepts, we need to first load data. Most of these concepts involve more than one table, so we will load and work with data among two tables.\n", "\n", "## Introduction to Fandango dataset\n", "\n", "Fandango is an online ticketing platform for movies. In order to support user's ticket buying decision, Fandango has viewer reviews and ratings for movie listings. This Fandango data set consists of two files - Fandango scrape and Fandango score comparison. The Fandango scrape consists of the list of movies on Fandango and their ratings. The score comparison file consists of the movie name and compares Fandango's movie score with that of other movie review websites like rotten tomatoes and metacritic.\n", "\n", "### Exercise\n", "\n", "Connect to fandangodb.db and load the two data sets, fandango_scrape and fandango_score_comparison into two tables 'fscrape' and 'fscore' respectively." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true, "tags": [ "s1", "ce", "l1" ] }, "outputs": [], "source": [ "import sqlite3\n", "import csv\n", "import pandas as pd\n", "\n", "# Connecting to the database\n", "fancon = sqlite3.connect(':memory:')\n", "\n", "# Reading data into tables\n", "scrapedf = pd.read_csv('https://raw.githubusercontent.com/colaberry/538data/master/fandango/fandango_scrape.csv')\n", "scoredf = pd.read_csv('https://raw.githubusercontent.com/colaberry/538data/master/fandango/fandango_score_comparison.csv')\n", "scrapedf.to_sql(name='fscrape',con=fancon,if_exists='append',index=False)\n", "scoredf.to_sql(name='fscore',con=fancon,if_exists='append',index=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s1", "l1", "hint" ] }, "outputs": [], "source": [ " " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s1", "l1", "ans" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s1", "hid", "l1" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "ref_tmp_var = True\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "l2", "content", "s2" ] }, "source": [ "In the above step we have loaded two files fandango_score_comparison and fandango_scrape into two dataframes. Let us verify the success of the loading operation by visualizing the dataframes.\n", "\n", "### Exercise\n", "\n", "Visualize the heads of both data frames created in the step above. Read and understand the nature of the data." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "l2", "ce", "s2" ] }, "outputs": [], "source": [ "# Use .head(5) function on the dataframes to visualize first 5 rows of the dataframes.\n", "# Use print function to print both the dataframes in a single step." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "l2", "s2", "hint" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "l2", "s2", "ans" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " FILM STARS RATING VOTES\n", "0 Fifty Shades of Grey (2015) 4.0 3.9 34846\n", "1 Jurassic World (2015) 4.5 4.5 34390\n", "2 American Sniper (2015) 5.0 4.8 34085\n", "3 Furious 7 (2015) 5.0 4.8 33538\n", "4 Inside Out (2015) 4.5 4.5 15749 FILM RottenTomatoes RottenTomatoes_User \\\n", "0 Avengers: Age of Ultron (2015) 74 86 \n", "1 Cinderella (2015) 85 80 \n", "2 Ant-Man (2015) 80 90 \n", "3 Do You Believe? (2015) 18 84 \n", "4 Hot Tub Time Machine 2 (2015) 14 28 \n", "\n", " Metacritic Metacritic_User IMDB Fandango_Stars Fandango_Ratingvalue \\\n", "0 66 7.1 7.8 5.0 4.5 \n", "1 67 7.5 7.1 5.0 4.5 \n", "2 64 8.1 7.8 5.0 4.5 \n", "3 22 4.7 5.4 5.0 4.5 \n", "4 29 3.4 5.1 3.5 3.0 \n", "\n", " RT_norm RT_user_norm ... IMDB_norm RT_norm_round \\\n", "0 3.70 4.3 ... 3.90 3.5 \n", "1 4.25 4.0 ... 3.55 4.5 \n", "2 4.00 4.5 ... 3.90 4.0 \n", "3 0.90 4.2 ... 2.70 1.0 \n", "4 0.70 1.4 ... 2.55 0.5 \n", "\n", " RT_user_norm_round Metacritic_norm_round Metacritic_user_norm_round \\\n", "0 4.5 3.5 3.5 \n", "1 4.0 3.5 4.0 \n", "2 4.5 3.0 4.0 \n", "3 4.0 1.0 2.5 \n", "4 1.5 1.5 1.5 \n", "\n", " IMDB_norm_round Metacritic_user_vote_count IMDB_user_vote_count \\\n", "0 4.0 1330 271107 \n", "1 3.5 249 65709 \n", "2 4.0 627 103660 \n", "3 2.5 31 3136 \n", "4 2.5 88 19560 \n", "\n", " Fandango_votes Fandango_Difference \n", "0 14846 0.5 \n", "1 12640 0.5 \n", "2 12055 0.5 \n", "3 1793 0.5 \n", "4 1021 0.5 \n", "\n", "[5 rows x 22 columns]\n" ] } ], "source": [ "print(scrapedf.head(5),scoredf.head(5))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "l2", "hid", "s2" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " scrapedftest = pd.DataFrame([('Fifty Shades of Grey (2015)',4.0,3.9,34846),\n", " ('Jurassic World (2015)',4.5,4.5,34390),\n", " ('American Sniper (2015)',5.0,4.8,34085),\n", " ('Furious 7 (2015)',5.0,4.8,33538),\n", " ('Inside Out (2015)',4.5,4.5,15749)],\n", " columns=['FILM','STARS','RATING','VOTES'])\n", "\n", " scoredftest = pd.DataFrame([('Avengers: Age of Ultron (2015)',74,86,66,7.1,7.8,5.0,4.5,3.70,4.3,3.3,3.55,3.90,3.5,4.5,3.5,3.5,4.0,1330,271107,14846,0.5),\n", " ('Cinderella (2015)',85,80,67,7.5,7.1,5.0,4.5,4.25,4.0,3.35,3.75,3.55,4.5,4.0,3.5,4.0,3.5,249,65709,12640,0.5),\n", " ('Ant-Man (2015)',80,90,64,8.1,7.8,5.0,4.5,4.00,4.5,3.2,4.05,3.90,4.0,4.5,3.0,4.0,4.0,627,103660,12055,0.5),\n", " ('Do You Believe? (2015)',18,84,22,4.7,5.4,5.0,4.5,0.90,4.2,1.1,2.35,2.70,1.0,4.0,1.0,2.5,2.5,31,3136,1793,0.5),\n", " ('Hot Tub Time Machine 2 (2015)',14,28,29,3.4,5.1,3.5,3.0,0.70,1.4,1.45,1.7,2.55,0.5,1.5,1.5,1.5,2.5,88,19560,1021,0.5)],\n", " columns=['FILM','RottenTomatoes','RottenTomatoes_User','Metacritic','Metacritic_User','IMDB','Fandango_Stars','Fandango_Ratingvalue','RT_norm','RT_user_norm','Metacritic_norm','Metacritic_user_nom','IMDB_norm','RT_norm_round','RT_user_norm_round','Metacritic_norm_round','Metacritic_user_norm_round','IMDB_norm_round','Metacritic_user_vote_count','IMDB_user_vote_count','Fandango_votes','Fandango_Difference'])\n", "\n", "\n", " if scrapedftest.equals(scrapedf.head(5)) and scoredftest.equals(scoredf.head(5)):\n", " ref_assert_var = True\n", " ref_tmp_var = True\n", " else:\n", " ref_assert_var = False\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "l3", "s3", "content" ] }, "source": [ "Let us understand a bit more about the data, before using SQL queries to analyze it.\n", "\n", "### Exercise\n", "\n", "Find the total number of rows in both the tables and print them out. You may use the dataframes to count the data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "tags": [ "l3", "s3", "ce" ] }, "outputs": [], "source": [ "# Use a simple print statement to print the number of rows" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "l3", "s3", "hint" ] }, "outputs": [], "source": [ "# Use the length method on the dataframes" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "l3", "s3", "ans" ] }, "outputs": [], "source": [ "print(\"Total number of records in fscrape table=%d and fscore table=%d\"%(len(scrapedf),len(scoredf)))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "tags": [ "l3", "s3", "hid" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", "\n", " var1=510\n", " var2=146\n", "\n", " if var1==len(scrapedf) and var2==len(scoredf):\n", " ref_assert_var = True\n", " ref_tmp_var = True\n", " else:\n", " ref_assert_var = False\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "l4", "s4", "content" ] }, "source": [ "## Union, Intersection and Minus\n", "\n", "![title](img/intersect_union_minus.png)\n", "\n", "Union is a set theory concept which says that it is a collection of all unique elements of two or more sets. When performed among two tables say by specifying column name, it outputs the collection of all unique values that appear in the common column, of either or both tables. Intersection outputs the values that are appear in both tables. Note that when performed on same inputs, intersection will always be a subset of Union. Minus operation results in values/records, featuring in one table but not featuring in the other - i.e. A-B = A-(Intersection(A,B)).\n", "\n", "### Exercise\n", "\n", "Execute Union and Intersection operations on 'FILM' column of 'fscore' and 'fscrape' tables to understand the data better as well as understand the operations better.\n", "* 1) Execute the SQL query\n", "* 2) Fetch the output and store it in an array\n", "* 3) Print the first five rows and size of the array to show total number of records in output" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [ "l4", "s4", "ce" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "fancur = fancon.cursor()\n", "\n", "# Modify the code below\n", "fancur.execute(\"SELECT...UNION...\")\n", "unionout = fancur.fetchall()\n", "fancur.execute(\"...INTERSECT...\")\n", "intersectout = fancur.fetchall()\n", "fancur.execute(\"...EXCEPT...\")\n", "exceptout = fancur.fetchall()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s4", "l4", "hint" ] }, "outputs": [], "source": [ "# 'MINUS' keyword is not supported in sqlite module at present. 'EXCEPT' keyword can be used as a substitute.\n", "# The EXCEPT query returns the same output as MINUS." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [ "s4", "l4", "ans" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[(' Like Sunday, Like Rain (2015)',), (\"'71 (2015)\",), ('1001 Grams (2015)',), ('12 Golden Ducks (2015)',), ('24 Days (2015)',)] [(\"'71 (2015)\",), ('5 Flights Up (2015)',), ('A Little Chaos (2015)',), ('A Most Violent Year (2014)',), ('About Elly (2015)',)] [(' Like Sunday, Like Rain (2015)',), ('1001 Grams (2015)',), ('12 Golden Ducks (2015)',), ('24 Days (2015)',), ('3 1/2 Minutes, Ten Bullets (2015)',)]\n", "total number of rows in union=510, total number of rows in intersection=145, total number of rows in minus=364\n" ] } ], "source": [ "fancur.execute(\"SELECT FILM FROM fscore UNION SELECT FILM FROM fscrape\")\n", "unionout = fancur.fetchall()\n", "fancur.execute(\"SELECT FILM FROM fscore INTERSECT SELECT FILM FROM fscrape\")\n", "intersectout = fancur.fetchall()\n", "fancur.execute(\"SELECT FILM FROM fscrape EXCEPT SELECT FILM FROM fscore\")\n", "exceptout = fancur.fetchall()\n", "print(unionout[:5],intersectout[:5],exceptout[:5])\n", "print(\"total number of rows in union=%d, total number of rows in intersection=%d, total number of rows in minus=%d\"%(len(unionout),len(intersectout),len(exceptout)))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "tags": [ "s4", "hid", "l4" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " test = [(' Like Sunday, Like Rain (2015)',), (\"'71 (2015)\",), ('1001 Grams (2015)',), ('12 Golden Ducks (2015)',), ('24 Days (2015)',)]\n", " test2 = [(\"'71 (2015)\",), ('5 Flights Up (2015)',), ('A Little Chaos (2015)',), ('A Most Violent Year (2014)',), ('About Elly (2015)',)]\n", " test3 = [(' Like Sunday, Like Rain (2015)',), ('1001 Grams (2015)',), ('12 Golden Ducks (2015)',), ('24 Days (2015)',), ('3 1/2 Minutes, Ten Bullets (2015)',)]\n", "\n", " if test==unionout[:5] and len(unionout)==510 and test2==intersectout[:5] and len(intersectout)==145 and test3==exceptout[:5] and len(exceptout)==364:\n", " ref_assert_var = True\n", " ref_tmp_var = True\n", " else:\n", " ref_assert_var = False\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "l5", "content", "s5" ] }, "source": [ "### Nested query\n", "\n", "A Nested query, also referred to as sub-query, is a query within a query. This means that the innermost query is executed first and the output of that query serves as an input to the query which surrounds the embedded query.\n", "\n", "From above analysis we can see that fscrape is a bigger list and fscore is a smaller list of movies. Let us retrieve complete Fandango information for all movies that have an IMDB score of more than 7.5\n", "\n", "#### Exercise\n", "\n", "Retrieve FILM, STARS, RATING, VOTES for all movies that have an IMDB rating greater than 7.5 in the fscore table. Note: Do not use 'JOIN' operation to solve this exercise." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "tags": [ "l5", "ce", "s5" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "# Modify the code below\n", "fancur.execute('SELECT...IN (SELECT...)')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "l5", "s5", "hint" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 9, "metadata": { "tags": [ "l5", "s5", "ans" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[('Inside Out (2015)', 4.5, 4.5, 15749), ('Kingsman: The Secret Service (2015)', 4.5, 4.2, 15205), ('Avengers: Age of Ultron (2015)', 5.0, 4.5, 14846), ('Ant-Man (2015)', 5.0, 4.5, 12055), ('Mad Max: Fury Road (2015)', 4.5, 4.3, 10509)]\n", "total number of rows=27\n" ] } ], "source": [ "fancur.execute('SELECT * FROM fscrape WHERE FILM IN (SELECT FILM FROM fscore WHERE IMDB>7.5)')\n", "nestout = fancur.fetchall()\n", "print(nestout[:5])\n", "print(\"total number of rows=%d\"%len(nestout))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "tags": [ "hid", "l5", "s5" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " test = [('Inside Out (2015)', 4.5, 4.5, 15749), ('Kingsman: The Secret Service (2015)', 4.5, 4.2, 15205), ('Avengers: Age of Ultron (2015)', 5.0, 4.5, 14846), ('Ant-Man (2015)', 5.0, 4.5, 12055), ('Mad Max: Fury Road (2015)', 4.5, 4.3, 10509)]\n", "\n", " if test==nestout[:5] and len(nestout)==27:\n", " ref_assert_var = True\n", " ref_tmp_var = True\n", " else:\n", " ref_assert_var = False\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "content", "l6", "s6" ] }, "source": [ "### Types of joins\n", "\n", "A join is an operation that is used to combine data from two tables, given that there is a common column in both of them. There are four types of joins:\n", "1) Inner join - returns common records among both the tables\n", "2) Right join - returns all records from right table, and the common records from the left table\n", "3) Left join - returns all records from left table, and the common records from the right table\n", "4) Full join - returns all records when the common record is either in table 1 or table 2.\n", "\n", "Lets understand what they actually mean by executing these operations on the two tables.\n", "\n", "#### Exercise (a)\n", "\n", "Execute inner join operation on the 'fscore' and 'fscrape' tables. Note that FILM is the common column among both tables." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "tags": [ "ce", "l6", "s6" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "from tabulate import tabulate\n", "\n", "fancur.execute('''SELECT...INNER JOIN...ON...''')\n", "templist1 = fancur.fetchall()\n", "headers1 = []\n", "print(tabulate(templist1, headers1))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "l6", "s6", "hint" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 15, "metadata": { "tags": [ "l6", "s6", "ans" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FILM rottentomatoes metacritic Imdb fandangorating fandangovotes\n", "---------------------------------------------------------------------- ---------------- ------------ ------ ---------------- ---------------\n", "Avengers: Age of Ultron (2015) 74 66 7.8 4.5 14846\n", "Cinderella (2015) 85 67 7.1 4.5 12640\n", "Ant-Man (2015) 80 64 7.8 4.5 12055\n", "Do You Believe? (2015) 18 22 5.4 4.5 1793\n", "Hot Tub Time Machine 2 (2015) 14 29 5.1 3 1021\n", "The Water Diviner (2015) 63 50 7.2 4 397\n", "Irrational Man (2015) 42 53 6.9 3.5 252\n", "Top Five (2014) 86 81 6.5 3.5 3223\n", "Shaun the Sheep Movie (2015) 99 81 7.4 4 896\n", "Love & Mercy (2015) 89 80 7.8 4 864\n", "Far From The Madding Crowd (2015) 84 71 7.2 4 804\n", "Black Sea (2015) 82 62 6.4 3.5 218\n", "Leviathan (2014) 99 92 7.7 3.5 64\n", "Unbroken (2014) 51 59 7.2 4.1 9443\n", "The Imitation Game (2014) 90 73 8.1 4.6 8055\n", "Taken 3 (2015) 9 26 6.1 4.1 6757\n", "Ted 2 (2015) 46 48 6.6 4.1 6437\n", "Southpaw (2015) 59 57 7.8 4.6 5597\n", "Night at the Museum: Secret of the Tomb (2014) 50 47 6.3 4.1 5445\n", "Pixels (2015) 17 27 5.6 4.1 3886\n", "McFarland, USA (2015) 79 60 7.5 4.6 3364\n", "Insidious: Chapter 3 (2015) 59 52 6.3 4.1 3276\n", "The Man From U.N.C.L.E. (2015) 68 55 7.6 4.1 2686\n", "Run All Night (2015) 60 59 6.6 4.1 2066\n", "Trainwreck (2015) 85 75 6.7 4.1 8381\n", "Selma (2014) 99 89 7.5 4.6 7025\n", "Ex Machina (2015) 92 78 7.7 4.1 3458\n", "Still Alice (2015) 88 72 7.5 4.1 1258\n", "Wild Tales (2014) 96 77 8.2 4.1 235\n", "The End of the Tour (2015) 92 84 7.9 4.1 121\n", "Red Army (2015) 96 82 7.7 4.1 54\n", "When Marnie Was There (2015) 89 71 7.8 4.1 46\n", "The Hunting Ground (2015) 92 77 7.5 4.1 42\n", "The Boy Next Door (2015) 10 30 4.6 3.6 2800\n", "Aloha (2015) 19 40 5.5 3.1 2284\n", "The Loft (2015) 11 24 6.3 3.6 811\n", "5 Flights Up (2015) 52 55 6.1 3.6 79\n", "Welcome to Me (2015) 71 67 5.9 3.6 56\n", "Saint Laurent (2015) 51 52 6.3 3.1 43\n", "Maps to the Stars (2015) 60 67 6.3 3.1 35\n", "I'll See You In My Dreams (2015) 94 75 6.9 3.6 281\n", "Timbuktu (2015) 99 91 7.2 3.6 74\n", "About Elly (2015) 97 87 8.2 3.6 43\n", "The Diary of a Teenage Girl (2015) 95 87 7 3.6 38\n", "Kingsman: The Secret Service (2015) 75 58 7.8 4.2 15205\n", "Tomorrowland (2015) 50 60 6.6 3.7 8077\n", "The Divergent Series: Insurgent (2015) 30 42 6.4 4.2 7123\n", "Annie (2014) 27 33 5.2 4.2 6835\n", "Fantastic Four (2015) 9 27 4 2.7 6288\n", "Terminator Genisys (2015) 26 38 6.9 4.2 6272\n", "Pitch Perfect 2 (2015) 67 63 6.7 4.2 4577\n", "Entourage (2015) 32 38 7.1 4.2 4279\n", "The Age of Adaline (2015) 54 51 7.3 4.2 3325\n", "Hot Pursuit (2015) 8 31 4.9 3.7 2618\n", "The DUFF (2015) 71 56 6.6 4.2 2273\n", "Black or White (2015) 39 45 6.6 4.2 1862\n", "Project Almanac (2015) 34 47 6.4 3.7 1834\n", "Ricki and the Flash (2015) 64 54 6.2 3.7 1462\n", "Seventh Son (2015) 12 30 5.5 3.2 1213\n", "Mortdecai (2015) 12 27 5.5 3.2 1196\n", "Unfinished Business (2015) 11 32 5.4 3.2 821\n", "American Ultra (2015) 46 50 6.5 3.7 638\n", "True Story (2015) 45 50 6.3 3.2 540\n", "Child 44 (2015) 26 41 6.4 3.7 308\n", "Dark Places (2015) 26 39 6.3 3.7 55\n", "Birdman (2014) 92 88 7.9 3.7 4194\n", "The Gift (2015) 93 77 7.6 3.7 2680\n", "Unfriended (2015) 60 59 5.9 2.7 2507\n", "Monkey Kingdom (2015) 94 72 7.3 4.2 701\n", "Mr. Turner (2014) 98 94 6.9 3.2 290\n", "Seymour: An Introduction (2015) 100 83 7.7 4.2 41\n", "The Wrecking Crew (2015) 93 67 7.8 4.2 38\n", "American Sniper (2015) 72 72 7.4 4.8 34085\n", "Furious 7 (2015) 81 67 7.4 4.8 33538\n", "The Hobbit: The Battle of the Five Armies (2014) 61 59 7.5 4.3 15337\n", "San Andreas (2015) 50 43 6.5 4.3 9749\n", "Straight Outta Compton (2015) 90 72 8.4 4.8 8096\n", "Vacation (2015) 27 34 6.3 3.8 3815\n", "Chappie (2015) 30 41 7 3.8 3642\n", "Poltergeist (2015) 31 47 5 2.8 2704\n", "Paper Towns (2015) 55 56 6.9 3.8 1750\n", "Big Eyes (2014) 72 62 7 3.8 1501\n", "Blackhat (2015) 34 51 5.4 2.8 1430\n", "Self/less (2015) 20 34 6.6 3.8 1415\n", "Sinister 2 (2015) 13 31 5.5 3.3 973\n", "Little Boy (2015) 20 30 7.4 4.3 811\n", "Me and Earl and The Dying Girl (2015) 81 74 8.2 4.3 624\n", "Maggie (2015) 54 52 5.6 3.3 95\n", "Mad Max: Fury Road (2015) 97 89 8.3 4.3 10509\n", "Spy (2015) 93 75 7.3 4.3 9418\n", "The SpongeBob Movie: Sponge Out of Water (2015) 78 62 6.1 3.3 4493\n", "Paddington (2015) 98 77 7.2 4.3 4045\n", "Dope (2015) 87 72 7.5 4.3 2195\n", "What We Do in the Shadows (2015) 96 75 7.6 4.3 259\n", "The Overnight (2015) 82 65 6.9 3.3 110\n", "The Salt of the Earth (2015) 96 83 8.4 4.3 83\n", "Song of the Sea (2014) 99 86 8.2 4.3 66\n", "Fifty Shades of Grey (2015) 25 46 4.2 3.9 34846\n", "Get Hard (2015) 29 34 6.1 3.9 5933\n", "Focus (2015) 57 56 6.6 3.9 4933\n", "Jupiter Ascending (2015) 26 40 5.5 3.4 4122\n", "The Gallows (2015) 16 30 4.4 2.9 1896\n", "The Second Best Exotic Marigold Hotel (2015) 62 51 6.6 3.9 1870\n", "Strange Magic (2015) 17 25 5.7 3.4 1117\n", "The Gunman (2015) 17 39 5.8 3.4 996\n", "Hitman: Agent 47 (2015) 7 28 5.9 3.9 917\n", "Cake (2015) 49 49 6.5 3.4 482\n", "The Vatican Tapes (2015) 13 37 4.6 2.9 210\n", "A Little Chaos (2015) 40 51 6.4 3.9 83\n", "The 100-Year-Old Man Who Climbed Out the Window and Disappeared (2015) 67 58 7.1 3.9 63\n", "Escobar: Paradise Lost (2015) 52 56 6.6 3.9 48\n", "Into the Woods (2014) 71 69 6 3.4 13055\n", "It Follows (2015) 96 83 6.9 2.9 2097\n", "Inherent Vice (2014) 73 81 6.7 2.9 1078\n", "A Most Violent Year (2014) 90 79 7.1 3.4 675\n", "While We're Young (2015) 83 76 6.4 2.9 449\n", "Clouds of Sils Maria (2015) 89 78 6.8 3.4 162\n", "Testament of Youth (2015) 81 77 7.3 3.9 127\n", "Infinitely Polar Bear (2015) 80 64 7.2 3.9 124\n", "Phoenix (2015) 99 91 7.2 3.4 70\n", "The Wolfpack (2015) 84 75 7.1 3.4 66\n", "The Stanford Prison Experiment (2015) 84 68 7.1 3.9 51\n", "Tangerine (2015) 95 86 7.4 3.9 36\n", "Magic Mike XXL (2015) 62 60 6.3 4.4 9363\n", "Home (2015) 45 55 6.7 4.4 7705\n", "The Wedding Ringer (2015) 27 35 6.7 4.4 6506\n", "Woman in Gold (2015) 52 51 7.4 4.4 2435\n", "The Last Five Years (2015) 60 60 6 4.4 99\n", "Amy (2015) 97 85 8 4.4 729\n", "Jurassic World (2015) 71 59 7.3 4.5 34390\n", "Minions (2015) 54 56 6.7 4 14998\n", "Max (2015) 35 47 7 4.5 3412\n", "Paul Blart: Mall Cop 2 (2015) 5 13 4.3 3.5 3054\n", "The Longest Ride (2015) 31 33 7.2 4.5 2603\n", "The Lazarus Effect (2015) 14 31 5.2 3 1651\n", "The Woman In Black 2 Angel of Death (2015) 22 42 4.9 3 1333\n", "Danny Collins (2015) 77 58 7.1 4 531\n", "Spare Parts (2015) 52 50 7.2 4.5 450\n", "Serena (2015) 18 36 5.4 3 50\n", "Inside Out (2015) 98 94 8.6 4.5 15749\n", "Mr. Holmes (2015) 87 67 7.4 4 1348\n", "'71 (2015) 97 83 7.2 3.5 192\n", "Two Days, One Night (2014) 97 89 7.4 3.5 118\n", "Gett: The Trial of Viviane Amsalem (2015) 100 90 7.8 3.5 59\n", "Kumiko, The Treasure Hunter (2015) 87 68 6.7 3.5 41\n", "145\n" ] } ], "source": [ "fancur.execute('''SELECT fscore.FILM, fscore.RottenTomatoes, fscore.Metacritic, fscore.IMDB, fscrape.RATING, fscrape.VOTES\n", " FROM fscore\n", " INNER JOIN fscrape ON fscore.FILM=fscrape.FILM''')\n", "templist1 = fancur.fetchall()\n", "headers1 = ['FILM','rottentomatoes','metacritic','Imdb','fandangorating','fandangovotes']\n", "print(tabulate(templist1, headers1))\n", "print(len(templist1))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "tags": [ "hid", "s6", "l6" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " test = [('Avengers: Age of Ultron (2015)',74,66,7.8,4.5,14846),('Cinderella (2015)',85,67,7.1,4.5,12640),('Ant-Man (2015)',80,64,7.8,4.5,12055),('Do You Believe? (2015)',18,22,5.4,4.5,1793),('Hot Tub Time Machine 2 (2015)',14,29,5.1,3,1021)]\n", "\n", " if test==templist1[:5] and len(templist1)==145:\n", " ref_assert_var = True\n", " ref_tmp_var = True\n", " else:\n", " ref_assert_var = False\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "l7", "s7", "content" ] }, "source": [ "Now that we have seen the results of an 'inner join' operation, Let's replicate the same for a 'left join'.\n", "\n", "#### Exercise (b)\n", "\n", "Execute a left join operation on the 'fscore' and 'fscrape' tables. Note that FILM is the common column among both tables." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "tags": [ "l7", "s7", "ce" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "fancur.execute()\n", "templist2 = \n", "headers2 = []" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "l7", "s7", "hint" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 13, "metadata": { "tags": [ "l7", "s7", "ans" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FILM rottentomatoes metacritic Imdb fandangorating fandangovotes\n", "---------------------------------------------------------------------- ---------------- ------------ ------ ---------------- ---------------\n", "Avengers: Age of Ultron (2015) 74 66 7.8 4.5 14846\n", "Cinderella (2015) 85 67 7.1 4.5 12640\n", "Ant-Man (2015) 80 64 7.8 4.5 12055\n", "Do You Believe? (2015) 18 22 5.4 4.5 1793\n", "Hot Tub Time Machine 2 (2015) 14 29 5.1 3 1021\n", "The Water Diviner (2015) 63 50 7.2 4 397\n", "Irrational Man (2015) 42 53 6.9 3.5 252\n", "Top Five (2014) 86 81 6.5 3.5 3223\n", "Shaun the Sheep Movie (2015) 99 81 7.4 4 896\n", "Love & Mercy (2015) 89 80 7.8 4 864\n", "Far From The Madding Crowd (2015) 84 71 7.2 4 804\n", "Black Sea (2015) 82 62 6.4 3.5 218\n", "Leviathan (2014) 99 92 7.7 3.5 64\n", "Unbroken (2014) 51 59 7.2 4.1 9443\n", "The Imitation Game (2014) 90 73 8.1 4.6 8055\n", "Taken 3 (2015) 9 26 6.1 4.1 6757\n", "Ted 2 (2015) 46 48 6.6 4.1 6437\n", "Southpaw (2015) 59 57 7.8 4.6 5597\n", "Night at the Museum: Secret of the Tomb (2014) 50 47 6.3 4.1 5445\n", "Pixels (2015) 17 27 5.6 4.1 3886\n", "McFarland, USA (2015) 79 60 7.5 4.6 3364\n", "Insidious: Chapter 3 (2015) 59 52 6.3 4.1 3276\n", "The Man From U.N.C.L.E. (2015) 68 55 7.6 4.1 2686\n", "Run All Night (2015) 60 59 6.6 4.1 2066\n", "Trainwreck (2015) 85 75 6.7 4.1 8381\n", "Selma (2014) 99 89 7.5 4.6 7025\n", "Ex Machina (2015) 92 78 7.7 4.1 3458\n", "Still Alice (2015) 88 72 7.5 4.1 1258\n", "Wild Tales (2014) 96 77 8.2 4.1 235\n", "The End of the Tour (2015) 92 84 7.9 4.1 121\n", "Red Army (2015) 96 82 7.7 4.1 54\n", "When Marnie Was There (2015) 89 71 7.8 4.1 46\n", "The Hunting Ground (2015) 92 77 7.5 4.1 42\n", "The Boy Next Door (2015) 10 30 4.6 3.6 2800\n", "Aloha (2015) 19 40 5.5 3.1 2284\n", "The Loft (2015) 11 24 6.3 3.6 811\n", "5 Flights Up (2015) 52 55 6.1 3.6 79\n", "Welcome to Me (2015) 71 67 5.9 3.6 56\n", "Saint Laurent (2015) 51 52 6.3 3.1 43\n", "Maps to the Stars (2015) 60 67 6.3 3.1 35\n", "I'll See You In My Dreams (2015) 94 75 6.9 3.6 281\n", "Timbuktu (2015) 99 91 7.2 3.6 74\n", "About Elly (2015) 97 87 8.2 3.6 43\n", "The Diary of a Teenage Girl (2015) 95 87 7 3.6 38\n", "Kingsman: The Secret Service (2015) 75 58 7.8 4.2 15205\n", "Tomorrowland (2015) 50 60 6.6 3.7 8077\n", "The Divergent Series: Insurgent (2015) 30 42 6.4 4.2 7123\n", "Annie (2014) 27 33 5.2 4.2 6835\n", "Fantastic Four (2015) 9 27 4 2.7 6288\n", "Terminator Genisys (2015) 26 38 6.9 4.2 6272\n", "Pitch Perfect 2 (2015) 67 63 6.7 4.2 4577\n", "Entourage (2015) 32 38 7.1 4.2 4279\n", "The Age of Adaline (2015) 54 51 7.3 4.2 3325\n", "Hot Pursuit (2015) 8 31 4.9 3.7 2618\n", "The DUFF (2015) 71 56 6.6 4.2 2273\n", "Black or White (2015) 39 45 6.6 4.2 1862\n", "Project Almanac (2015) 34 47 6.4 3.7 1834\n", "Ricki and the Flash (2015) 64 54 6.2 3.7 1462\n", "Seventh Son (2015) 12 30 5.5 3.2 1213\n", "Mortdecai (2015) 12 27 5.5 3.2 1196\n", "Unfinished Business (2015) 11 32 5.4 3.2 821\n", "American Ultra (2015) 46 50 6.5 3.7 638\n", "True Story (2015) 45 50 6.3 3.2 540\n", "Child 44 (2015) 26 41 6.4 3.7 308\n", "Dark Places (2015) 26 39 6.3 3.7 55\n", "Birdman (2014) 92 88 7.9 3.7 4194\n", "The Gift (2015) 93 77 7.6 3.7 2680\n", "Unfriended (2015) 60 59 5.9 2.7 2507\n", "Monkey Kingdom (2015) 94 72 7.3 4.2 701\n", "Mr. Turner (2014) 98 94 6.9 3.2 290\n", "Seymour: An Introduction (2015) 100 83 7.7 4.2 41\n", "The Wrecking Crew (2015) 93 67 7.8 4.2 38\n", "American Sniper (2015) 72 72 7.4 4.8 34085\n", "Furious 7 (2015) 81 67 7.4 4.8 33538\n", "The Hobbit: The Battle of the Five Armies (2014) 61 59 7.5 4.3 15337\n", "San Andreas (2015) 50 43 6.5 4.3 9749\n", "Straight Outta Compton (2015) 90 72 8.4 4.8 8096\n", "Vacation (2015) 27 34 6.3 3.8 3815\n", "Chappie (2015) 30 41 7 3.8 3642\n", "Poltergeist (2015) 31 47 5 2.8 2704\n", "Paper Towns (2015) 55 56 6.9 3.8 1750\n", "Big Eyes (2014) 72 62 7 3.8 1501\n", "Blackhat (2015) 34 51 5.4 2.8 1430\n", "Self/less (2015) 20 34 6.6 3.8 1415\n", "Sinister 2 (2015) 13 31 5.5 3.3 973\n", "Little Boy (2015) 20 30 7.4 4.3 811\n", "Me and Earl and The Dying Girl (2015) 81 74 8.2 4.3 624\n", "Maggie (2015) 54 52 5.6 3.3 95\n", "Mad Max: Fury Road (2015) 97 89 8.3 4.3 10509\n", "Spy (2015) 93 75 7.3 4.3 9418\n", "The SpongeBob Movie: Sponge Out of Water (2015) 78 62 6.1 3.3 4493\n", "Paddington (2015) 98 77 7.2 4.3 4045\n", "Dope (2015) 87 72 7.5 4.3 2195\n", "What We Do in the Shadows (2015) 96 75 7.6 4.3 259\n", "The Overnight (2015) 82 65 6.9 3.3 110\n", "The Salt of the Earth (2015) 96 83 8.4 4.3 83\n", "Song of the Sea (2014) 99 86 8.2 4.3 66\n", "Fifty Shades of Grey (2015) 25 46 4.2 3.9 34846\n", "Get Hard (2015) 29 34 6.1 3.9 5933\n", "Focus (2015) 57 56 6.6 3.9 4933\n", "Jupiter Ascending (2015) 26 40 5.5 3.4 4122\n", "The Gallows (2015) 16 30 4.4 2.9 1896\n", "The Second Best Exotic Marigold Hotel (2015) 62 51 6.6 3.9 1870\n", "Strange Magic (2015) 17 25 5.7 3.4 1117\n", "The Gunman (2015) 17 39 5.8 3.4 996\n", "Hitman: Agent 47 (2015) 7 28 5.9 3.9 917\n", "Cake (2015) 49 49 6.5 3.4 482\n", "The Vatican Tapes (2015) 13 37 4.6 2.9 210\n", "A Little Chaos (2015) 40 51 6.4 3.9 83\n", "The 100-Year-Old Man Who Climbed Out the Window and Disappeared (2015) 67 58 7.1 3.9 63\n", "Escobar: Paradise Lost (2015) 52 56 6.6 3.9 48\n", "Into the Woods (2014) 71 69 6 3.4 13055\n", "It Follows (2015) 96 83 6.9 2.9 2097\n", "Inherent Vice (2014) 73 81 6.7 2.9 1078\n", "A Most Violent Year (2014) 90 79 7.1 3.4 675\n", "While We're Young (2015) 83 76 6.4 2.9 449\n", "Clouds of Sils Maria (2015) 89 78 6.8 3.4 162\n", "Testament of Youth (2015) 81 77 7.3 3.9 127\n", "Infinitely Polar Bear (2015) 80 64 7.2 3.9 124\n", "Phoenix (2015) 99 91 7.2 3.4 70\n", "The Wolfpack (2015) 84 75 7.1 3.4 66\n", "The Stanford Prison Experiment (2015) 84 68 7.1 3.9 51\n", "Tangerine (2015) 95 86 7.4 3.9 36\n", "Magic Mike XXL (2015) 62 60 6.3 4.4 9363\n", "Home (2015) 45 55 6.7 4.4 7705\n", "The Wedding Ringer (2015) 27 35 6.7 4.4 6506\n", "Woman in Gold (2015) 52 51 7.4 4.4 2435\n", "The Last Five Years (2015) 60 60 6 4.4 99\n", "Mission: Impossible – Rogue Nation (2015) 92 75 7.8\n", "Amy (2015) 97 85 8 4.4 729\n", "Jurassic World (2015) 71 59 7.3 4.5 34390\n", "Minions (2015) 54 56 6.7 4 14998\n", "Max (2015) 35 47 7 4.5 3412\n", "Paul Blart: Mall Cop 2 (2015) 5 13 4.3 3.5 3054\n", "The Longest Ride (2015) 31 33 7.2 4.5 2603\n", "The Lazarus Effect (2015) 14 31 5.2 3 1651\n", "The Woman In Black 2 Angel of Death (2015) 22 42 4.9 3 1333\n", "Danny Collins (2015) 77 58 7.1 4 531\n", "Spare Parts (2015) 52 50 7.2 4.5 450\n", "Serena (2015) 18 36 5.4 3 50\n", "Inside Out (2015) 98 94 8.6 4.5 15749\n", "Mr. Holmes (2015) 87 67 7.4 4 1348\n", "'71 (2015) 97 83 7.2 3.5 192\n", "Two Days, One Night (2014) 97 89 7.4 3.5 118\n", "Gett: The Trial of Viviane Amsalem (2015) 100 90 7.8 3.5 59\n", "Kumiko, The Treasure Hunter (2015) 87 68 6.7 3.5 41\n", "146\n" ] } ], "source": [ "fancur.execute('''SELECT fscore.FILM, fscore.RottenTomatoes, fscore.Metacritic, fscore.IMDB, fscrape.RATING, fscrape.VOTES\n", " FROM fscore\n", " LEFT JOIN fscrape ON fscore.FILM=fscrape.FILM''')\n", "templist2 = fancur.fetchall()\n", "headers2 = ['FILM','rottentomatoes','metacritic','Imdb','fandangorating','fandangovotes']\n", "print(tabulate(templist2, headers2))\n", "print(len(templist2))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "tags": [ "l7", "hid", "s7" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " test = [('Avengers: Age of Ultron (2015)',74,66,7.8,4.5,14846),('Cinderella (2015)',85,67,7.1,4.5,12640),('Ant-Man (2015)',80,64,7.8,4.5,12055),('Do You Believe? (2015)',18,22,5.4,4.5,1793),('Hot Tub Time Machine 2 (2015)',14,29,5.1,3,1021)]\n", "\n", " if test==templist2[:5] and len(templist2)==146:\n", " ref_assert_var = True\n", " ref_tmp_var = True\n", " else:\n", " ref_assert_var = False\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s8", "content", "l8" ] }, "source": [ "From above code we can see that INNER JOIN and LEFT JOIN produce almost the same output. This is because the fscore table is a subset of fscrape table.\n", "\n", "As of now, RIGHT OUTER JOIN and FULL OUTER JOIN are not supported by the sqlite module.\n", "\n", "### Stored Procedures, Triggers and Materialized Views\n", "\n", "Stored procedures, triggers and materialized views are some of the advanced SQL concepts. A stored procedure is a set of instructions/SQL queries which performs a logical action on the database server and is executed on a recurring basis, much like a standard operating procedure. A trigger is a type of stored procedure, which is set to automatically execute when a certain event occurs in the database (Eg. Say a record in a specific table is updated or deleted). A view is the output of a sql query which is presented in the console, but not stored on disk. When this output is stored on disk, in the form of a structure, in order to retrieve it frequently (instead of having to run the query again), it is called a materialized view.\n", "\n", "Some additional reading on the above topics:\n", "Stored procedures: https://en.wikipedia.org/wiki/Stored_procedure, https://www.tutorialspoint.com/t_sql/t_sql_stored_procedures.htm\n", "\n", "Triggers: https://www.tutorialspoint.com/plsql/plsql_triggers.htm\n", "\n", "Materialized views: https://en.wikipedia.org/wiki/Materialized_view, http://www.postgresqltutorial.com/postgresql-materialized-views/\n", "\n", "The above concepts are not supported by sqlite module." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s8", "ce", "l8" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s8", "l8", "hint" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s8", "l8", "ans" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s8", "l8", "hid" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "ref_tmp_var = True\n", "\n", "assert ref_tmp_var" ] } ], "metadata": { "executed_sections": [], "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" }, "rf_version": 1 }, "nbformat": 4, "nbformat_minor": 2 }