{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# TPC-H Queries\n", "\n", "In this notebook, we explore how to reimplement SQL queries from the well-known TPC-H benchmark in Rabbit.\n", "\n", "We start with loading the dataset." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "Region:\n", " name :: ASCIIString\n", " comment :: ASCIIString\n", " nation (inverse of Nation.region) :: Array{Nation,1} # unique, covering\n", "Nation:\n", " name :: ASCIIString\n", " region :: Region\n", " comment :: ASCIIString\n", " customer (inverse of Customer.nation) :: Array{Customer,1} # unique, covering\n", " supplier (inverse of Supplier.nation) :: Array{Supplier,1} # unique, covering\n", "Customer:\n", " name :: ASCIIString\n", " address :: ASCIIString\n", " nation :: Nation\n", " phone :: ASCIIString\n", " acctbal :: RBT.Monetary{:USD}\n", " mktsegment :: ASCIIString\n", " comment :: ASCIIString\n", " order (inverse of Order.customer) :: Array{Order,1} # unique, covering\n", "Supplier:\n", " name :: ASCIIString\n", " address :: ASCIIString\n", " nation :: Nation\n", " phone :: ASCIIString\n", " acctbal :: RBT.Monetary{:USD}\n", " comment :: ASCIIString\n", " partsupp (inverse of Partsupp.supplier) :: Array{Partsupp,1} # unique, covering\n", " lineitem (inverse of Lineitem.supplier) :: Array{Lineitem,1} # unique, covering\n", "Part:\n", " name :: ASCIIString\n", " mfgr :: ASCIIString\n", " brand :: ASCIIString\n", " type_ :: ASCIIString\n", " size :: Int64\n", " container :: ASCIIString\n", " retailprice :: RBT.Monetary{:USD}\n", " comment :: ASCIIString\n", " partsupp (inverse of Partsupp.part) :: Array{Partsupp,1} # unique, covering\n", " lineitem (inverse of Lineitem.part) :: Array{Lineitem,1} # unique, covering\n", "Partsupp:\n", " part :: Part\n", " supplier :: Supplier\n", " availqty :: Int64\n", " supplycost :: RBT.Monetary{:USD}\n", " comment :: ASCIIString\n", " lineitem (inverse of Lineitem.partsupp) :: Array{Lineitem,1} # unique, covering\n", "Order:\n", " customer :: Customer\n", " orderstatus :: ASCIIString\n", " totalprice :: RBT.Monetary{:USD}\n", " orderdate :: Date\n", " orderpriority :: ASCIIString\n", " clerk :: ASCIIString\n", " shippriority :: Int64\n", " comment :: ASCIIString\n", " lineitem (inverse of Lineitem.order) :: Array{Lineitem,1} # unique, covering\n", "Lineitem:\n", " order :: Order\n", " part :: Part\n", " supplier :: Supplier\n", " partsupp :: Partsupp\n", " linenumber :: Int64\n", " quantity :: Int64\n", " extendedprice :: RBT.Monetary{:USD}\n", " discount :: Float64\n", " tax :: Float64\n", " returnflag :: ASCIIString\n", " linestatus :: ASCIIString\n", " shipdate :: Date\n", " commitdate :: Date\n", " receiptdate :: Date\n", " shipinstruct :: ASCIIString\n", " shipmode :: ASCIIString\n", " comment :: ASCIIString" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "include(\"../tpch.jl\")\n", "\n", "using RBT\n", "setdb(tpch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The TPC-H benchmark consists of a generated dataset of customers, suppliers, products and orders as well as a collection of queries that perform various types of business analysis.\n", "\n", "Table `order` lists all orders made by customers." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
nametotalpriceorderdate
1Customer#000000296159877.961996-01-02
2Customer#00000062569858.911996-12-01
3Customer#000000988183718.981993-10-14
4Customer#00000109650463.931995-10-11
5Customer#000000356133409.481994-07-30
6Customer#00000044535731.211992-02-21
7Customer#000000314229881.301996-01-10
8Customer#000001042173503.971995-07-16
9Customer#000000536135080.421993-10-27
10Customer#00000049064891.671998-07-21
" ], "text/plain": [ "12000x3 DataFrames.DataFrame\n", "| Row | name | totalprice | orderdate |\n", "|-------|----------------------|------------|------------|\n", "| 1 | \"Customer#000000296\" | 159877.96 | 1996-01-02 |\n", "⋮\n", "| 11999 | \"Customer#000001036\" | 96525.52 | 1993-08-15 |\n", "| 12000 | \"Customer#000000871\" | 43237.08 | 1992-08-14 |" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " order\n", " :select(\n", " customer.name, totalprice, orderdate))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Individual items in each order are stored in the `lineitem` table. Each item describes the ordered part, the quantity, the supplier and the pricing information." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
customerlinenumberpartsupplierquantityextendedpricediscounttaxshipdate
1Customer#0000002961dodger brown ivory blue pinkSupplier#0000000781719435.080.040.021996-03-13
2Customer#0000002962dodger midnight salmon drab saddleSupplier#0000000603651823.080.090.061996-04-12
3Customer#0000002963blush thistle orchid red laceSupplier#000000031811284.080.10.021996-01-29
4Customer#0000002964turquoise indian lemon lavender mistySupplier#0000000392825704.280.090.061996-04-21
5Customer#0000002965turquoise lime royal metallic azureSupplier#0000000202426236.560.10.041996-03-30
6Customer#0000002966burnished black blue metallic orchidSupplier#0000000093232835.840.070.021996-01-30
7Customer#0000006251peach goldenrod honeydew moccasin siennaSupplier#0000000313866532.300.00.051997-01-28
8Customer#0000009881green blush tomato burlywood seashellSupplier#0000000164542076.350.060.01994-02-02
9Customer#0000009882linen frosted slate coral peruSupplier#0000000574951604.350.10.01993-11-09
10Customer#0000009883blue drab lime khaki siennaSupplier#0000000532725083.540.060.071994-01-16
" ], "text/plain": [ "48214x9 DataFrames.DataFrame\n", "| Row | customer | linenumber |\n", "|-------|----------------------|------------|\n", "| 1 | \"Customer#000000296\" | 1 |\n", "⋮\n", "| 48213 | \"Customer#000001036\" | 3 |\n", "| 48214 | \"Customer#000000871\" | 1 |\n", "\n", "| Row | part | supplier |\n", "|-------|-------------------------------------|----------------------|\n", "| 1 | \"dodger brown ivory blue pink\" | \"Supplier#000000078\" |\n", "⋮\n", "| 48213 | \"peach antique deep peru saddle\" | \"Supplier#000000016\" |\n", "| 48214 | \"burlywood orchid dark drab dodger\" | \"Supplier#000000007\" |\n", "\n", "| Row | quantity | extendedprice | discount | tax | shipdate |\n", "|-------|----------|---------------|----------|------|------------|\n", "| 1 | 17 | 19435.08 | 0.04 | 0.02 | 1996-03-13 |\n", "⋮\n", "| 48213 | 29 | 34489.12 | 0.06 | 0.0 | 1993-11-08 |\n", "| 48214 | 39 | 44703.36 | 0.07 | 0.04 | 1992-11-24 |" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :select(\n", " order.customer, linenumber, part, supplier, quantity, extendedprice, discount, tax, shipdate))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each customer and supplier, we track their country of origin. Table `nation` contains a list of all countries partitioned by region." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
nameregioncust_countsupp_count
1ALGERIAAFRICA513
2ARGENTINAAMERICA503
3BRAZILAMERICA631
4CANADAAMERICA482
5EGYPTMIDDLE EAST564
6ETHIOPIAAFRICA443
7FRANCEEUROPE321
8GERMANYEUROPE414
9INDIAASIA525
10INDONESIAASIA564
" ], "text/plain": [ "25x4 DataFrames.DataFrame\n", "| Row | name | region | cust_count | supp_count |\n", "|-----|------------------|-----------|------------|------------|\n", "| 1 | \"ALGERIA\" | \"AFRICA\" | 51 | 3 |\n", "⋮\n", "| 24 | \"UNITED KINGDOM\" | \"EUROPE\" | 42 | 3 |\n", "| 25 | \"UNITED STATES\" | \"AMERICA\" | 31 | 6 |" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " nation\n", " :select(\n", " name,\n", " region,\n", " cust_count => count(customer),\n", " supp_count => count(supplier)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following query definitions are taken directly from TPC-H documentation. We will show how to incrementally construct these queries in Rabbit." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pricing Summary Report Query (Q1)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 29]*\n", "\n", "This query reports the amount of business that was billed, shipped, and returned.\n", "\n", "### Business Question\n", "The Pricing Summary Report Query provides a summary pricing report for all lineitems shipped as of a given date.\n", "The date is within 60 - 120 days of the greatest ship date contained in the database. The query lists totals for\n", "extended price, discounted extended price, discounted extended price plus tax, average quantity, average extended\n", "price, and average discount. These aggregates are grouped by RETURNFLAG and LINESTATUS, and listed in\n", "ascending order of RETURNFLAG and LINESTATUS. A count of the number of lineitems in each group is\n", "included.\n", " \n", "### Query Definition\n", "```\n", "select\n", " l_returnflag,\n", " l_linestatus,\n", " sum(l_quantity) as sum_qty,\n", " sum(l_extendedprice) as sum_base_price,\n", " sum(l_extendedprice*(1-l_discount)) as sum_disc_price,\n", " sum(l_extendedprice*(1-l_discount)*(1+l_tax)) as sum_charge,\n", " avg(l_quantity) as avg_qty,\n", " avg(l_extendedprice) as avg_price,\n", " avg(l_discount) as avg_disc,\n", " count(*) as count_order\n", "from\n", " lineitem\n", "where\n", " l_shipdate <= date '1998-12-01' - interval '[DELTA]' day\n", "group by\n", " l_returnflag,\n", " l_linestatus\n", "order by\n", " l_returnflag,\n", " l_linestatus;\n", "```\n", "\n", "### Substitution Parameters\n", "1. DELTA = 90." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To generate this report, we use the data from the `lineitem` table." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
orderlinenumbershipdatequantityextendedpricediscounttaxreturnflaglinestatus
1Customer#00000029611996-03-131719435.080.040.02NO
2Customer#00000029621996-04-123651823.080.090.06NO
3Customer#00000029631996-01-29811284.080.10.02NO
4Customer#00000029641996-04-212825704.280.090.06NO
5Customer#00000029651996-03-302426236.560.10.04NO
6Customer#00000029661996-01-303232835.840.070.02NO
7Customer#00000062511997-01-283866532.300.00.05NO
8Customer#00000098811994-02-024542076.350.060.0RF
9Customer#00000098821993-11-094951604.350.10.0RF
10Customer#00000098831994-01-162725083.540.060.07AF
" ], "text/plain": [ "48214x9 DataFrames.DataFrame\n", "| Row | order | linenumber | shipdate | quantity |\n", "|-------|----------------------|------------|------------|----------|\n", "| 1 | \"Customer#000000296\" | 1 | 1996-03-13 | 17 |\n", "⋮\n", "| 48213 | \"Customer#000001036\" | 3 | 1993-11-08 | 29 |\n", "| 48214 | \"Customer#000000871\" | 1 | 1992-11-24 | 39 |\n", "\n", "| Row | extendedprice | discount | tax | returnflag | linestatus |\n", "|-------|---------------|----------|------|------------|------------|\n", "| 1 | 19435.08 | 0.04 | 0.02 | \"N\" | \"O\" |\n", "⋮\n", "| 48213 | 34489.12 | 0.06 | 0.0 | \"R\" | \"F\" |\n", "| 48214 | 44703.36 | 0.07 | 0.04 | \"R\" | \"F\" |" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :select(order, linenumber, shipdate, quantity, extendedprice, discount, tax, returnflag, linestatus))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We start with extracting all lineitems in the selected date range." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
orderpartsupplierlinenumberquantityextendedpricediscounttaxreturnflaglinestatusshipdatecommitdatereceiptdateshipinstructshipmodecomment
1Customer#000000296dodger brown ivory blue pinkSupplier#00000007811719435.080.040.02NO1996-03-131996-02-121996-03-22DELIVER IN PERSONTRUCKegular courts above the
2Customer#000000296dodger midnight salmon drab saddleSupplier#00000006023651823.080.090.06NO1996-04-121996-02-281996-04-20TAKE BACK RETURNMAILly final dependencies: slyly bold
3Customer#000000296blush thistle orchid red laceSupplier#0000000313811284.080.10.02NO1996-01-291996-03-051996-01-31TAKE BACK RETURNREG AIRriously. regular, express dep
4Customer#000000296turquoise indian lemon lavender mistySupplier#00000003942825704.280.090.06NO1996-04-211996-03-301996-05-16NONEAIRlites. fluffily even de
5Customer#000000296turquoise lime royal metallic azureSupplier#00000002052426236.560.10.04NO1996-03-301996-03-141996-04-01NONEFOB pending foxes. slyly re
6Customer#000000296burnished black blue metallic orchidSupplier#00000000963232835.840.070.02NO1996-01-301996-02-071996-02-03DELIVER IN PERSONMAILarefully slyly ex
7Customer#000000625peach goldenrod honeydew moccasin siennaSupplier#00000003113866532.300.00.05NO1997-01-281997-01-141997-02-02TAKE BACK RETURNRAILven requests. deposits breach a
8Customer#000000988green blush tomato burlywood seashellSupplier#00000001614542076.350.060.0RF1994-02-021994-01-041994-02-23NONEAIRongside of the furiously brave acco
9Customer#000000988linen frosted slate coral peruSupplier#00000005724951604.350.10.0RF1993-11-091993-12-201993-11-24TAKE BACK RETURNRAIL unusual accounts. eve
10Customer#000000988blue drab lime khaki siennaSupplier#00000005332725083.540.060.07AF1994-01-161993-11-221994-01-23DELIVER IN PERSONSHIPnal foxes wake.
" ], "text/plain": [ "47535x16 DataFrames.DataFrame\n", "| Row | order | part |\n", "|-------|----------------------|-------------------------------------|\n", "| 1 | \"Customer#000000296\" | \"dodger brown ivory blue pink\" |\n", "⋮\n", "| 47534 | \"Customer#000001036\" | \"peach antique deep peru saddle\" |\n", "| 47535 | \"Customer#000000871\" | \"burlywood orchid dark drab dodger\" |\n", "\n", "| Row | supplier | linenumber | quantity | extendedprice |\n", "|-------|----------------------|------------|----------|---------------|\n", "| 1 | \"Supplier#000000078\" | 1 | 17 | 19435.08 |\n", "⋮\n", "| 47534 | \"Supplier#000000016\" | 3 | 29 | 34489.12 |\n", "| 47535 | \"Supplier#000000007\" | 1 | 39 | 44703.36 |\n", "\n", "| Row | discount | tax | returnflag | linestatus | shipdate | commitdate |\n", "|-------|----------|------|------------|------------|------------|------------|\n", "| 1 | 0.04 | 0.02 | \"N\" | \"O\" | 1996-03-13 | 1996-02-12 |\n", "⋮\n", "| 47534 | 0.06 | 0.0 | \"R\" | \"F\" | 1993-11-08 | 1993-09-29 |\n", "| 47535 | 0.07 | 0.04 | \"R\" | \"F\" | 1992-11-24 | 1992-10-21 |\n", "\n", "| Row | receiptdate | shipinstruct | shipmode |\n", "|-------|-------------|---------------------|----------|\n", "| 1 | 1996-03-22 | \"DELIVER IN PERSON\" | \"TRUCK\" |\n", "⋮\n", "| 47534 | 1993-11-16 | \"TAKE BACK RETURN\" | \"MAIL\" |\n", "| 47535 | 1992-12-12 | \"TAKE BACK RETURN\" | \"MAIL\" |\n", "\n", "| Row | comment |\n", "|-------|----------------------------------------|\n", "| 1 | \"egular courts above the\" |\n", "⋮\n", "| 47534 | \"ackages may serve above the furiousl\" |\n", "| 47535 | \"arefully ironic as\" |" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :filter(shipdate <= date(\"1998-12-01\") - DELTA*days),\n", " DELTA=90)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we group the lineitems by values of `returnflag` and `linestatus` attributes." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
returnflaglinestatuslineitem
1AF
orderpartsupplierlinenumberquantityextendedpricediscounttaxreturnflaglinestatusshipdatecommitdatereceiptdateshipinstructshipmodecomment
1Customer#000000988blue drab lime khaki siennaSupplier#00000005332725083.540.060.07AF1994-01-161993-11-221994-01-23DELIVER IN PERSONSHIPnal foxes wake.
2Customer#000000988salmon antique burlywood linen peachSupplier#000000019422272.460.010.06AF1993-12-041994-01-071994-01-01NONETRUCKy. fluffily pending d
3Customer#000000988sandy sky gainsboro peach cornflowerSupplier#00000001762636360.740.10.02AF1993-10-291993-12-181993-11-04TAKE BACK RETURNRAILges sleep after the caref
4Customer#000000356misty snow lace burnished linenSupplier#00000000535060065.000.080.03AF1994-08-081994-10-131994-08-26DELIVER IN PERSONAIReodolites. fluffily unusual
2NF
orderpartsupplierlinenumberquantityextendedpricediscounttaxreturnflaglinestatusshipdatecommitdatereceiptdateshipinstructshipmodecomment
1Customer#000000262hot midnight orchid dim steelSupplier#00000002742526103.500.040.01NF1995-06-131995-05-231995-06-24TAKE BACK RETURNFOBs-- quickly final accounts
2Customer#000000760tan thistle frosted indian lawnSupplier#00000001113132523.340.00.03NF1995-06-051995-06-181995-06-26COLLECT CODFOBfinal theodolites. fluffil
3Customer#000000380pale navajo royal papaya thistleSupplier#00000004214250160.180.030.0NF1995-06-071995-05-291995-06-23TAKE BACK RETURNSHIPy asymptotes. regular depen
4Customer#000001066navy light red royal oliveSupplier#00000006111618468.000.10.06NF1995-05-231995-05-071995-06-19NONETRUCKun quickly slyly
3NO
orderpartsupplierlinenumberquantityextendedpricediscounttaxreturnflaglinestatusshipdatecommitdatereceiptdateshipinstructshipmodecomment
1Customer#000000296dodger brown ivory blue pinkSupplier#00000007811719435.080.040.02NO1996-03-131996-02-121996-03-22DELIVER IN PERSONTRUCKegular courts above the
2Customer#000000296dodger midnight salmon drab saddleSupplier#00000006023651823.080.090.06NO1996-04-121996-02-281996-04-20TAKE BACK RETURNMAILly final dependencies: slyly bold
3Customer#000000296blush thistle orchid red laceSupplier#0000000313811284.080.10.02NO1996-01-291996-03-051996-01-31TAKE BACK RETURNREG AIRriously. regular, express dep
4Customer#000000296turquoise indian lemon lavender mistySupplier#00000003942825704.280.090.06NO1996-04-211996-03-301996-05-16NONEAIRlites. fluffily even de
4RF
orderpartsupplierlinenumberquantityextendedpricediscounttaxreturnflaglinestatusshipdatecommitdatereceiptdateshipinstructshipmodecomment
1Customer#000000988green blush tomato burlywood seashellSupplier#00000001614542076.350.060.0RF1994-02-021994-01-041994-02-23NONEAIRongside of the furiously brave acco
2Customer#000000988linen frosted slate coral peruSupplier#00000005724951604.350.10.0RF1993-11-091993-12-201993-11-24TAKE BACK RETURNRAIL unusual accounts. eve
3Customer#000000988bisque saddle cyan rose wheatSupplier#00000006052838260.880.040.0RF1993-12-141994-01-101994-01-01TAKE BACK RETURNFOBages nag slyly pending
4Customer#000000356snow blush violet lace ghostSupplier#00000007011526547.900.020.04RF1994-10-311994-08-311994-11-20NONEAIRts wake furiously
" ], "text/plain": [ "4x3 DataFrames.DataFrame\n", "| Row | returnflag | linestatus |\n", "|-----|------------|------------|\n", "| 1 | \"A\" | \"F\" |\n", "| 2 | \"N\" | \"F\" |\n", "| 3 | \"N\" | \"O\" |\n", "| 4 | \"R\" | \"F\" |\n", "\n", "| Row | lineitem |\n", "|-----|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| 1 | 11958x16 DataFrames.DataFrame\n", "| Row | order | part |\n", "|-------|----------------------|--------------------------------------|\n", "| 1 | \"Customer#000000988\" | \"blue drab lime khaki sienna\" |\n", "⋮\n", "| 11957 | \"Customer#000000529\" | \"chiffon royal lime almond midnight\" |\n", "| 11958 | \"Customer#000001036\" | \"medium forest blue ghost black\" |\n", "\n", "| Row | supplier | linenumber | quantity | extendedprice |\n", "|-------|----------------------|------------|----------|---------------|\n", "| 1 | \"Supplier#000000053\" | 3 | 27 | 25083.54 |\n", "⋮\n", "| 11957 | \"Supplier#000000034\" | 4 | 14 | 24960.32 |\n", "| 11958 | \"Supplier#000000063\" | 2 | 42 | 38724.84 |\n", "\n", "| Row | discount | tax | returnflag | linestatus | shipdate | commitdate |\n", "|-------|----------|------|------------|------------|------------|------------|\n", "| 1 | 0.06 | 0.07 | \"A\" | \"F\" | 1994-01-16 | 1993-11-22 |\n", "⋮\n", "| 11957 | 0.08 | 0.07 | \"A\" | \"F\" | 1994-12-10 | 1994-10-07 |\n", "| 11958 | 0.05 | 0.02 | \"A\" | \"F\" | 1993-12-13 | 1993-09-29 |\n", "\n", "| Row | receiptdate | shipinstruct | shipmode | comment |\n", "|-------|-------------|---------------------|----------|--------------------|\n", "| 1 | 1994-01-23 | \"DELIVER IN PERSON\" | \"SHIP\" | \"nal foxes wake. \" |\n", "⋮\n", "| 11957 | 1994-12-27 | \"TAKE BACK RETURN\" | \"TRUCK\" | \" express reques\" |\n", "| 11958 | 1993-12-26 | \"COLLECT COD\" | \"SHIP\" | \"tions. daringl\" | |\n", "| 2 | 288x16 DataFrames.DataFrame\n", "| Row | order | part |\n", "|-----|----------------------|----------------------------------------------|\n", "| 1 | \"Customer#000000262\" | \"hot midnight orchid dim steel\" |\n", "⋮\n", "| 287 | \"Customer#000000397\" | \"wheat blush forest metallic navajo\" |\n", "| 288 | \"Customer#000000397\" | \"burnished seashell floral moccasin antique\" |\n", "\n", "| Row | supplier | linenumber | quantity | extendedprice |\n", "|-----|----------------------|------------|----------|---------------|\n", "| 1 | \"Supplier#000000027\" | 4 | 25 | 26103.50 |\n", "⋮\n", "| 287 | \"Supplier#000000045\" | 1 | 2 | 2048.24 |\n", "| 288 | \"Supplier#000000075\" | 2 | 50 | 85740.50 |\n", "\n", "| Row | discount | tax | returnflag | linestatus | shipdate | commitdate |\n", "|-----|----------|------|------------|------------|------------|------------|\n", "| 1 | 0.04 | 0.01 | \"N\" | \"F\" | 1995-06-13 | 1995-05-23 |\n", "⋮\n", "| 287 | 0.04 | 0.0 | \"N\" | \"F\" | 1995-06-17 | 1995-07-08 |\n", "| 288 | 0.02 | 0.01 | \"N\" | \"F\" | 1995-05-28 | 1995-08-03 |\n", "\n", "| Row | receiptdate | shipinstruct | shipmode |\n", "|-----|-------------|--------------------|----------|\n", "| 1 | 1995-06-24 | \"TAKE BACK RETURN\" | \"FOB\" |\n", "⋮\n", "| 287 | 1995-06-20 | \"NONE\" | \"FOB\" |\n", "| 288 | 1995-06-25 | \"COLLECT COD\" | \"FOB\" |\n", "\n", "| Row | comment |\n", "|-----|------------------------------|\n", "| 1 | \"s-- quickly final accounts\" |\n", "⋮\n", "| 287 | \"refully slyly ironic re\" |\n", "| 288 | \"ly enticing ideas\" | |\n", "| 3 | 23326x16 DataFrames.DataFrame\n", "| Row | order | part |\n", "|-------|----------------------|---------------------------------------|\n", "| 1 | \"Customer#000000296\" | \"dodger brown ivory blue pink\" |\n", "⋮\n", "| 23325 | \"Customer#000000110\" | \"pale tan lavender powder cornflower\" |\n", "| 23326 | \"Customer#000000110\" | \"plum midnight coral snow lemon\" |\n", "\n", "| Row | supplier | linenumber | quantity | extendedprice |\n", "|-------|----------------------|------------|----------|---------------|\n", "| 1 | \"Supplier#000000078\" | 1 | 17 | 19435.08 |\n", "⋮\n", "| 23325 | \"Supplier#000000051\" | 6 | 30 | 36939.90 |\n", "| 23326 | \"Supplier#000000044\" | 7 | 35 | 49893.20 |\n", "\n", "| Row | discount | tax | returnflag | linestatus | shipdate | commitdate |\n", "|-------|----------|------|------------|------------|------------|------------|\n", "| 1 | 0.04 | 0.02 | \"N\" | \"O\" | 1996-03-13 | 1996-02-12 |\n", "⋮\n", "| 23325 | 0.07 | 0.05 | \"N\" | \"O\" | 1997-08-14 | 1997-09-13 |\n", "| 23326 | 0.09 | 0.01 | \"N\" | \"O\" | 1997-09-08 | 1997-08-27 |\n", "\n", "| Row | receiptdate | shipinstruct | shipmode |\n", "|-------|-------------|---------------------|----------|\n", "| 1 | 1996-03-22 | \"DELIVER IN PERSON\" | \"TRUCK\" |\n", "⋮\n", "| 23325 | 1997-09-08 | \"NONE\" | \"TRUCK\" |\n", "| 23326 | 1997-09-15 | \"NONE\" | \"MAIL\" |\n", "\n", "| Row | comment |\n", "|-------|-----------------------------------------------|\n", "| 1 | \"egular courts above the\" |\n", "⋮\n", "| 23325 | \"ously bold packages. foxes will have to wak\" |\n", "| 23326 | \"cuses use finally unusual platel\" | |\n", "| 4 | 11963x16 DataFrames.DataFrame\n", "| Row | order | part |\n", "|-------|----------------------|-----------------------------------------|\n", "| 1 | \"Customer#000000988\" | \"green blush tomato burlywood seashell\" |\n", "⋮\n", "| 11962 | \"Customer#000001036\" | \"peach antique deep peru saddle\" |\n", "| 11963 | \"Customer#000000871\" | \"burlywood orchid dark drab dodger\" |\n", "\n", "| Row | supplier | linenumber | quantity | extendedprice |\n", "|-------|----------------------|------------|----------|---------------|\n", "| 1 | \"Supplier#000000016\" | 1 | 45 | 42076.35 |\n", "⋮\n", "| 11962 | \"Supplier#000000016\" | 3 | 29 | 34489.12 |\n", "| 11963 | \"Supplier#000000007\" | 1 | 39 | 44703.36 |\n", "\n", "| Row | discount | tax | returnflag | linestatus | shipdate | commitdate |\n", "|-------|----------|------|------------|------------|------------|------------|\n", "| 1 | 0.06 | 0.0 | \"R\" | \"F\" | 1994-02-02 | 1994-01-04 |\n", "⋮\n", "| 11962 | 0.06 | 0.0 | \"R\" | \"F\" | 1993-11-08 | 1993-09-29 |\n", "| 11963 | 0.07 | 0.04 | \"R\" | \"F\" | 1992-11-24 | 1992-10-21 |\n", "\n", "| Row | receiptdate | shipinstruct | shipmode |\n", "|-------|-------------|--------------------|----------|\n", "| 1 | 1994-02-23 | \"NONE\" | \"AIR\" |\n", "⋮\n", "| 11962 | 1993-11-16 | \"TAKE BACK RETURN\" | \"MAIL\" |\n", "| 11963 | 1992-12-12 | \"TAKE BACK RETURN\" | \"MAIL\" |\n", "\n", "| Row | comment |\n", "|-------|----------------------------------------|\n", "| 1 | \"ongside of the furiously brave acco\" |\n", "⋮\n", "| 11962 | \"ackages may serve above the furiousl\" |\n", "| 11963 | \"arefully ironic as\" | |" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :filter(shipdate <= date(\"1998-12-01\") - DELTA*days)\n", " :group(returnflag, linestatus),\n", " DELTA=90)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For each distinct combination of `returnflag` and `linestatus` values, we see a collection of corresponding lineitem rows. We can now use aggregate functions to summarize information about matching lineitems.\n", "\n", "For example, to find the total number of matching lineitems, we can use `count` aggregate." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
returnflaglinestatus
1AF11958
2NF288
3NO23326
4RF11963
" ], "text/plain": [ "4x3 DataFrames.DataFrame\n", "| Row | returnflag | linestatus | |\n", "|-----|------------|------------|-------|\n", "| 1 | \"A\" | \"F\" | 11958 |\n", "| 2 | \"N\" | \"F\" | 288 |\n", "| 3 | \"N\" | \"O\" | 23326 |\n", "| 4 | \"R\" | \"F\" | 11963 |" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :filter(shipdate <= date(\"1998-12-01\") - DELTA*days)\n", " :group(returnflag, linestatus)\n", " :select(returnflag, linestatus, count(lineitem)),\n", " DELTA=90)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The query output now has the shape of the report. We only need to calculate the respective fields." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
returnflaglinestatussum_qtysum_base_pricesum_disc_pricesum_chargeavg_qtyavg_priceavg_disccount_order
1AF303771401515429.00381465877.43396851569.6425.4031610637230333577.140.0501195852149188211958
2NF73549721375.749247800.689616674.4325.5347222222222233754.780.04809027777777776288
3NO593340787065026.86748034019.78777970207.0525.43685158192574733741.960.0498851067478349723326
4RF306623406030555.01385792652.67401431970.2225.63094541502967533940.530.0497308367466354111963
" ], "text/plain": [ "4x10 DataFrames.DataFrame\n", "| Row | returnflag | linestatus | sum_qty | sum_base_price | sum_disc_price |\n", "|-----|------------|------------|---------|----------------|----------------|\n", "| 1 | \"A\" | \"F\" | 303771 | 401515429.00 | 381465877.43 |\n", "| 2 | \"N\" | \"F\" | 7354 | 9721375.74 | 9247800.68 |\n", "| 3 | \"N\" | \"O\" | 593340 | 787065026.86 | 748034019.78 |\n", "| 4 | \"R\" | \"F\" | 306623 | 406030555.01 | 385792652.67 |\n", "\n", "| Row | sum_charge | avg_qty | avg_price | avg_disc | count_order |\n", "|-----|--------------|---------|-----------|-----------|-------------|\n", "| 1 | 396851569.64 | 25.4032 | 33577.14 | 0.0501196 | 11958 |\n", "| 2 | 9616674.43 | 25.5347 | 33754.78 | 0.0480903 | 288 |\n", "| 3 | 777970207.05 | 25.4369 | 33741.96 | 0.0498851 | 23326 |\n", "| 4 | 401431970.22 | 25.6309 | 33940.53 | 0.0497308 | 11963 |" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :filter(shipdate <= date(\"1998-12-01\") - DELTA*days)\n", " :group(returnflag, linestatus)\n", " :select(\n", " returnflag,\n", " linestatus,\n", " sum_qty => sum(lineitem.quantity),\n", " sum_base_price => sum(lineitem.extendedprice),\n", " sum_disc_price => sum(lineitem.(extendedprice*(1-discount))),\n", " sum_charge => sum(lineitem.(extendedprice*(1-discount)*(1+tax))),\n", " avg_qty => mean(lineitem.quantity),\n", " avg_price => mean(lineitem.extendedprice),\n", " avg_disc => mean(lineitem.discount),\n", " count_order => count(lineitem)),\n", " DELTA=90)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Minimum Cost Supplier Query (Q2)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 30]*\n", "\n", "This query finds which supplier should be selected to place an order for a given part in a given region.\n", "\n", "### Business Question\n", "The Minimum Cost Supplier Query finds, in a given region, for each part of a certain type and size, the supplier who\n", "can supply it at minimum cost. If several suppliers in that region offer the desired part type and size at the same\n", "(minimum) cost, the query lists the parts from suppliers with the 100 highest account balances. For each supplier,\n", "the query lists the supplier's account balance, name and nation; the part's number and manufacturer; the supplier's\n", "address, phone number and comment information.\n", "\n", "\n", "### Query Definition\n", "```\n", "select\n", " s_acctbal,\n", " s_name,\n", " n_name,\n", " p_partkey,\n", " p_mfgr,\n", " s_address,\n", " s_phone,\n", " s_comment\n", "from\n", " part,\n", " supplier,\n", " partsupp,\n", " nation,\n", " region\n", "where\n", " p_partkey = ps_partkey\n", " and s_suppkey = ps_suppkey\n", " and p_size = [SIZE]\n", " and p_type like '%[TYPE]'\n", " and s_nationkey = n_nationkey\n", " and n_regionkey = r_regionkey\n", " and r_name = '[REGION]'\n", " and ps_supplycost = (\n", " select\n", " min(ps_supplycost)\n", " from\n", " partsupp, supplier,\n", " nation, region\n", " where\n", " p_partkey = ps_partkey\n", " and s_suppkey = ps_suppkey\n", " and s_nationkey = n_nationkey\n", " and n_regionkey = r_regionkey\n", " and r_name = '[REGION]')\n", "order by\n", " s_acctbal desc,\n", " n_name,\n", " s_name,\n", " p_partkey\n", "limit 100;\n", "```\n", "\n", "### Substitution Parameters\n", "1. SIZE = 8;\n", "2. TYPE = BRASS;\n", "3. REGION = EUROPE." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This report extracts data from tables `part`, `supplier` as well as table `partsupp` that relates each supplier to their inventory of parts. We start with getting a list of parts of the given size and type in the inventories of all suppliers." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
partsupplieravailqtysupplycostcomment
1drab aquamarine red papaya paleSupplier#000000009773165.54ronic accounts haggle blithely across the blithely special platelets. furiously final ideas haggle carefully after the fluffily unusual dep
2drab aquamarine red papaya paleSupplier#0000000328847935.64ular, regular requests affix ironically enticing ideas. fur
3drab aquamarine red papaya paleSupplier#0000000556329148.79enticingly final ideas sleep fluffily. blithely final ideas boost fluffily ironic asymptote
4drab aquamarine red papaya paleSupplier#0000000788054425.78the express accounts. even, ironic pinto beans wake furious
5blue gainsboro sky burnished puffSupplier#0000000148029765.35ous packages sleep slyly across the quickly ironic accounts. regular packages thrash above the furiously final theodolites. furi
6blue gainsboro sky burnished puffSupplier#0000000401481319.99coys against the even instructions cajole furiously bold, daring courts. slyly ironic deposits ought to wake blithely regular ideas. slyly regular ideas according to the slyly express ac
7blue gainsboro sky burnished puffSupplier#000000066718634.58furiously slyly ironic packages. pending requests use ruthlessly across the blithely bold deposits. furiously fin
8blue gainsboro sky burnished puffSupplier#0000000122818237.16 the carefully special frays. final excuses haggle slyly. furiously regular deposits across t
9plum indian cornflower frosted purpleSupplier#0000000731076844.74y ironic instructions. even, final requests are. bold instructions use silent, regular accounts-- slyly final excuses doubt fluffily after the quickly p
10plum indian cornflower frosted purpleSupplier#0000000229272904.83ly regular packages. ruthless instructions haggle carefully final, special courts. silently ruthless courts hinder slyly regular packages; regular, regular pac
" ], "text/plain": [ "32x5 DataFrames.DataFrame\n", "| Row | part | supplier | availqty |\n", "|-----|-------------------------------------|----------------------|----------|\n", "| 1 | \"drab aquamarine red papaya pale\" | \"Supplier#000000009\" | 7731 |\n", "⋮\n", "| 31 | \"plum beige misty cream cornflower\" | \"Supplier#000000013\" | 2973 |\n", "| 32 | \"plum beige misty cream cornflower\" | \"Supplier#000000050\" | 5417 |\n", "\n", "| Row | supplycost |\n", "|-----|------------|\n", "| 1 | 65.54 |\n", "⋮\n", "| 31 | 727.39 |\n", "| 32 | 620.51 |\n", "\n", "| Row | comment |\n", "|-----|---------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| 1 | \"ronic accounts haggle blithely across the blithely special platelets. furiously final ideas haggle carefully after the fluffily unusual dep\" |\n", "⋮\n", "| 31 | \"at slyly final packages. enticingly pending excuses across the sometimes final dependencies sleep evenly above the express dep\" |\n", "| 32 | \"requests against the blithely unusual deposits wake furiously furiously special platelets. ironically unusual accounts detect blithely. finally unusu\" |" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " partsupp\n", " :filter(\n", " (part.size == SIZE) &\n", " contains(part.type_, TYPE)),\n", " SIZE=8, TYPE=\"BRASS\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Among all suppliers, we are only interested in those who are situated in the given REGION." ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
partsupplieravailqtysupplycostcomment
1drab aquamarine red papaya paleSupplier#0000000328847935.64ular, regular requests affix ironically enticing ideas. fur
2blue gainsboro sky burnished puffSupplier#0000000401481319.99coys against the even instructions cajole furiously bold, daring courts. slyly ironic deposits ought to wake blithely regular ideas. slyly regular ideas according to the slyly express ac
3blue gainsboro sky burnished puffSupplier#000000066718634.58furiously slyly ironic packages. pending requests use ruthlessly across the blithely bold deposits. furiously fin
4tan ghost cyan salmon goldenrodSupplier#000000066322104.62g the quickly express dolphins. slyly even instructions nag furiously. carefully final ideas are idly above the ironic asymptotes. unusual requests haggle about t
5tan ghost cyan salmon goldenrodSupplier#0000000166830298.77ss orbits. furiously regular deposits haggle blithely carefully final pinto beans. ironic, silent accounts thrash quickly about the accounts. iron
6frosted blue ghost purple coralSupplier#000000025290774.80r pinto beans-- quickly final instructions integrate alongside of the sly, silent th
7burlywood seashell slate white creamSupplier#0000000409737799.94ns cajole idly. deposits are quickly blithely even deposits. quickly regular packages boost carefully. blithely ironic asympto
8burlywood seashell slate white creamSupplier#0000000779194240.78l requests cajole carefully. blithely express gifts
" ], "text/plain": [ "8x5 DataFrames.DataFrame\n", "| Row | part | supplier |\n", "|-----|----------------------------------------|----------------------|\n", "| 1 | \"drab aquamarine red papaya pale\" | \"Supplier#000000032\" |\n", "⋮\n", "| 7 | \"burlywood seashell slate white cream\" | \"Supplier#000000040\" |\n", "| 8 | \"burlywood seashell slate white cream\" | \"Supplier#000000077\" |\n", "\n", "| Row | availqty | supplycost |\n", "|-----|----------|------------|\n", "| 1 | 8847 | 935.64 |\n", "⋮\n", "| 7 | 9737 | 799.94 |\n", "| 8 | 9194 | 240.78 |\n", "\n", "| Row | comment |\n", "|-----|----------------------------------------------------------------------------------------------------------------------------------|\n", "| 1 | \"ular, regular requests affix ironically enticing ideas. fur\" |\n", "⋮\n", "| 7 | \"ns cajole idly. deposits are quickly blithely even deposits. quickly regular packages boost carefully. blithely ironic asympto\" |\n", "| 8 | \"l requests cajole carefully. blithely express gifts\" |" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " partsupp\n", " :filter(\n", " (part.size == SIZE) &\n", " contains(part.type_, TYPE) &\n", " (supplier.nation.region.name == REGION)),\n", " SIZE=8, TYPE=\"BRASS\", REGION=\"EUROPE\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some parts are offered by more than one supplier. We can use combinator `and_around(part)` to relate each row of the output to all the others that provide the same part." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
partsuppliersupplycostothersupplycost
1drab aquamarine red papaya paleSupplier#000000032935.64[935.64]
2blue gainsboro sky burnished puffSupplier#000000040319.99[319.99,634.58]
3blue gainsboro sky burnished puffSupplier#000000066634.58[319.99,634.58]
4tan ghost cyan salmon goldenrodSupplier#000000066104.62[104.62,298.77]
5tan ghost cyan salmon goldenrodSupplier#000000016298.77[104.62,298.77]
6frosted blue ghost purple coralSupplier#000000025774.80[774.80]
7burlywood seashell slate white creamSupplier#000000040799.94[799.94,240.78]
8burlywood seashell slate white creamSupplier#000000077240.78[799.94,240.78]
" ], "text/plain": [ "8x4 DataFrames.DataFrame\n", "| Row | part | supplier |\n", "|-----|----------------------------------------|----------------------|\n", "| 1 | \"drab aquamarine red papaya pale\" | \"Supplier#000000032\" |\n", "⋮\n", "| 7 | \"burlywood seashell slate white cream\" | \"Supplier#000000040\" |\n", "| 8 | \"burlywood seashell slate white cream\" | \"Supplier#000000077\" |\n", "\n", "| Row | supplycost | othersupplycost |\n", "|-----|------------|-----------------|\n", "| 1 | 935.64 | [935.64] |\n", "⋮\n", "| 7 | 799.94 | [799.94,240.78] |\n", "| 8 | 240.78 | [799.94,240.78] |" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " partsupp\n", " :filter(\n", " (part.size == SIZE) &\n", " contains(part.type_, TYPE) &\n", " (supplier.nation.region.name == REGION))\n", " :select(part, supplier, supplycost, othersupplycost => and_around(part).supplycost),\n", " SIZE=8, TYPE=\"BRASS\", REGION=\"EUROPE\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can select rows that provide their part at the minimal cost." ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
partsupplieravailqtysupplycostcomment
1drab aquamarine red papaya paleSupplier#0000000328847935.64ular, regular requests affix ironically enticing ideas. fur
2blue gainsboro sky burnished puffSupplier#0000000401481319.99coys against the even instructions cajole furiously bold, daring courts. slyly ironic deposits ought to wake blithely regular ideas. slyly regular ideas according to the slyly express ac
3tan ghost cyan salmon goldenrodSupplier#000000066322104.62g the quickly express dolphins. slyly even instructions nag furiously. carefully final ideas are idly above the ironic asymptotes. unusual requests haggle about t
4frosted blue ghost purple coralSupplier#000000025290774.80r pinto beans-- quickly final instructions integrate alongside of the sly, silent th
5burlywood seashell slate white creamSupplier#0000000779194240.78l requests cajole carefully. blithely express gifts
" ], "text/plain": [ "5x5 DataFrames.DataFrame\n", "| Row | part | supplier |\n", "|-----|----------------------------------------|----------------------|\n", "| 1 | \"drab aquamarine red papaya pale\" | \"Supplier#000000032\" |\n", "| 2 | \"blue gainsboro sky burnished puff\" | \"Supplier#000000040\" |\n", "| 3 | \"tan ghost cyan salmon goldenrod\" | \"Supplier#000000066\" |\n", "| 4 | \"frosted blue ghost purple coral\" | \"Supplier#000000025\" |\n", "| 5 | \"burlywood seashell slate white cream\" | \"Supplier#000000077\" |\n", "\n", "| Row | availqty | supplycost |\n", "|-----|----------|------------|\n", "| 1 | 8847 | 935.64 |\n", "| 2 | 1481 | 319.99 |\n", "| 3 | 322 | 104.62 |\n", "| 4 | 290 | 774.80 |\n", "| 5 | 9194 | 240.78 |\n", "\n", "| Row | comment |\n", "|-----|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n", "| 1 | \"ular, regular requests affix ironically enticing ideas. fur\" |\n", "| 2 | \"coys against the even instructions cajole furiously bold, daring courts. slyly ironic deposits ought to wake blithely regular ideas. slyly regular ideas according to the slyly express ac\" |\n", "| 3 | \"g the quickly express dolphins. slyly even instructions nag furiously. carefully final ideas are idly above the ironic asymptotes. unusual requests haggle about t\" |\n", "| 4 | \"r pinto beans-- quickly final instructions integrate alongside of the sly, silent th\" |\n", "| 5 | \"l requests cajole carefully. blithely express gifts\" |" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " partsupp\n", " :filter(\n", " (part.size == SIZE) &\n", " contains(part.type_, TYPE) &\n", " (supplier.nation.region.name == REGION))\n", " :filter(supplycost == min(and_around(part).supplycost)),\n", " SIZE=8, TYPE=\"BRASS\", REGION=\"EUROPE\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have the rows that we need in the requested report. We only need to sort them and show the requested columns." ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
acctbalnamenationidmfgraddressphonecomment
19198.31Supplier#000000025RUSSIA1156Manufacturer#5RCQKONXMFnrodzz6w7fObFVV6CUm2q32-431-945-3541ely regular deposits. carefully regular sauternes engage furiously above the regular accounts. idly
24186.95Supplier#000000077GERMANY1362Manufacturer#1wVtcr0uH3CyrSiWMLsqnB09Syo,UuZxPMeBghlY17-281-345-4863the slyly final asymptotes. blithely pending theodoli
33556.47Supplier#000000032UNITED KINGDOM248Manufacturer#1yvoD3TtZSx1skQNCK8agk5bZlZLug33-484-637-7873usly even depths. quickly ironic theodolites s
42455.98Supplier#000000066UNITED KINGDOM805Manufacturer#5qYdruFJQJYYiKvnNVmYfCVydVB8bcW,AW,U6SOV333-300-836-9529ar requests. express orbits de
5-290.06Supplier#000000040RUSSIA493Manufacturer#4zyIeWzbbpkTV37vm1nmSGBxSgd2Kp32-231-247-6991 final patterns. accounts haggle idly pas
" ], "text/plain": [ "5x8 DataFrames.DataFrame\n", "| Row | acctbal | name | nation | id |\n", "|-----|---------|----------------------|------------------|------|\n", "| 1 | 9198.31 | \"Supplier#000000025\" | \"RUSSIA\" | 1156 |\n", "| 2 | 4186.95 | \"Supplier#000000077\" | \"GERMANY\" | 1362 |\n", "| 3 | 3556.47 | \"Supplier#000000032\" | \"UNITED KINGDOM\" | 248 |\n", "| 4 | 2455.98 | \"Supplier#000000066\" | \"UNITED KINGDOM\" | 805 |\n", "| 5 | -290.06 | \"Supplier#000000040\" | \"RUSSIA\" | 493 |\n", "\n", "| Row | mfgr | address |\n", "|-----|------------------|--------------------------------------------|\n", "| 1 | \"Manufacturer#5\" | \"RCQKONXMFnrodzz6w7fObFVV6CUm2q\" |\n", "| 2 | \"Manufacturer#1\" | \"wVtcr0uH3CyrSiWMLsqnB09Syo,UuZxPMeBghlY\" |\n", "| 3 | \"Manufacturer#1\" | \"yvoD3TtZSx1skQNCK8agk5bZlZLug\" |\n", "| 4 | \"Manufacturer#5\" | \"qYdruFJQJYYiKvnNVmYfCVydVB8bcW,AW,U6SOV3\" |\n", "| 5 | \"Manufacturer#4\" | \"zyIeWzbbpkTV37vm1nmSGBxSgd2Kp\" |\n", "\n", "| Row | phone |\n", "|-----|-------------------|\n", "| 1 | \"32-431-945-3541\" |\n", "| 2 | \"17-281-345-4863\" |\n", "| 3 | \"33-484-637-7873\" |\n", "| 4 | \"33-300-836-9529\" |\n", "| 5 | \"32-231-247-6991\" |\n", "\n", "| Row | comment |\n", "|-----|--------------------------------------------------------------------------------------------------------|\n", "| 1 | \"ely regular deposits. carefully regular sauternes engage furiously above the regular accounts. idly \" |\n", "| 2 | \"the slyly final asymptotes. blithely pending theodoli\" |\n", "| 3 | \"usly even depths. quickly ironic theodolites s\" |\n", "| 4 | \"ar requests. express orbits de\" |\n", "| 5 | \" final patterns. accounts haggle idly pas\" |" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " partsupp\n", " :filter(\n", " (part.size == SIZE) &\n", " contains(part.type_, TYPE) &\n", " (supplier.nation.region.name == REGION))\n", " :filter(supplycost == min(and_around(part).supplycost))\n", " :sort(\n", " supplier.acctbal:desc,\n", " supplier.nation.name,\n", " part.name,\n", " part.id)\n", " :select(\n", " supplier.acctbal,\n", " supplier.name,\n", " supplier.nation,\n", " part.id,\n", " part.mfgr,\n", " supplier.address,\n", " supplier.phone,\n", " supplier.comment)\n", " :take(100),\n", " SIZE=8, TYPE=\"BRASS\", REGION=\"EUROPE\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Shipping Priority Query (Q3)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 33]*\n", "\n", "This query retrieves the 10 unshipped orders with the highest value.\n", "\n", "### Business Question\n", "The Shipping Priority Query retrieves the shipping priority and potential revenue, defined as the sum of\n", "`l_extendedprice * (1-l_discount)`, of the orders having the largest revenue among those that had not been shipped as of a given date. Orders are listed in decreasing order of revenue. If more than 10 unshipped orders exist, only the 10 orders with the largest revenue are listed.\n", "\n", "### Query Definition\n", "```\n", "select\n", " l_orderkey,\n", " sum(l_extendedprice*(1-l_discount)) as revenue,\n", " o_orderdate,\n", " o_shippriority\n", "from\n", " customer,\n", " orders,\n", " lineitem\n", "where\n", " c_mktsegment = '[SEGMENT]'\n", " and c_custkey = o_custkey\n", " and l_orderkey = o_orderkey\n", " and o_orderdate < date '[DATE]'\n", " and l_shipdate > date '[DATE]'\n", "group by\n", " l_orderkey,\n", " o_orderdate,\n", " o_shippriority\n", "order by\n", " revenue desc,\n", " o_orderdate\n", "limit 10;\n", "```\n", "\n", "### Substitution Parameters\n", "1. SEGMENT = BUILDING;\n", "2. DATE = 1995-03-15." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
idrevenueorderdateshippriority
16240245018.101995-01-280
25822234486.931995-03-140
39974231804.681995-03-060
44608197118.761995-03-130
54672182178.381995-03-060
67197179200.591995-02-110
7114173955.471995-03-050
811089172336.481995-02-040
91870170114.811995-02-210
107189165761.651995-03-010
" ], "text/plain": [ "10x4 DataFrames.DataFrame\n", "| Row | id | revenue | orderdate | shippriority |\n", "|-----|------|-----------|------------|--------------|\n", "| 1 | 6240 | 245018.10 | 1995-01-28 | 0 |\n", "⋮\n", "| 9 | 1870 | 170114.81 | 1995-02-21 | 0 |\n", "| 10 | 7189 | 165761.65 | 1995-03-01 | 0 |" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " order\n", " :filter(\n", " (customer.mktsegment == SEGMENT) &\n", " (orderdate < DATE))\n", " :select(\n", " id,\n", " revenue =>\n", " lineitem:filter(shipdate > DATE).(extendedprice*(1-discount)):sum,\n", " orderdate,\n", " shippriority)\n", " :sort(revenue:desc, orderdate)\n", " :take(10),\n", " SEGMENT=\"BUILDING\", DATE=Date(\"1995-03-15\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Order Priority Checking Query (Q4)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 35]*\n", "\n", "This query determines how well the order priority system is working and gives an assessment of customer satisfaction.\n", "\n", "### Business Question\n", "The Order Priority Checking Query counts the number of orders ordered in a given quarter of a given year in which\n", "at least one lineitem was received by the customer later than its committed date. The query lists the count of such\n", "orders for each order priority sorted in ascending priority order.\n", "\n", "### Query Definition\n", "```\n", "select\n", " o_orderpriority,\n", " count(*) as order_count\n", "from\n", " orders\n", "where\n", " o_orderdate >= date '[DATE]'\n", " and o_orderdate < date '[DATE]' + interval '3' month\n", " and exists (\n", " select\n", " *\n", " from\n", " lineitem\n", " where\n", " l_orderkey = o_orderkey\n", " and l_commitdate < l_receiptdate)\n", "group by\n", " o_orderpriority\n", "order by\n", " o_orderpriority;\n", "```\n", "\n", "### Substitution Parameters\n", "1. DATE = 1993-07-01." ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
orderpriorityorder_count
11-URGENT78
22-HIGH80
33-MEDIUM89
44-NOT SPECIFIED85
55-LOW105
" ], "text/plain": [ "5x2 DataFrames.DataFrame\n", "| Row | orderpriority | order_count |\n", "|-----|-------------------|-------------|\n", "| 1 | \"1-URGENT\" | 78 |\n", "| 2 | \"2-HIGH\" | 80 |\n", "| 3 | \"3-MEDIUM\" | 89 |\n", "| 4 | \"4-NOT SPECIFIED\" | 85 |\n", "| 5 | \"5-LOW\" | 105 |" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " order\n", " :filter(\n", " DATE <= orderdate < DATE + 3_months &&\n", " any(lineitem.(commitdate < receiptdate)))\n", " :group(orderpriority)\n", " :select(\n", " orderpriority,\n", " order_count => count(order)),\n", " DATE=Date(\"1993-07-01\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Local Supplier Volume Query (Q5)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 36]*\n", "\n", "This query lists the revenue volume done through local suppliers.\n", "\n", "### Business Question\n", "The Local Supplier Volume Query lists for each nation in a region the revenue volume that resulted from lineitem\n", "transactions in which the customer ordering parts and the supplier filling them were both within that nation. The\n", "query is run in order to determine whether to institute local distribution centers in a given region. The query considers only parts ordered in a given year. The query displays the nations and revenue volume in descending order by revenue. Revenue volume for all qualifying lineitems in a particular nation is defined as `sum(l_extendedprice * (1 - l_discount))`.\n", "\n", "### Query Definition\n", "```\n", "select\n", " n_name,\n", " sum(l_extendedprice * (1 - l_discount)) as revenue\n", "from\n", " customer,\n", " orders,\n", " lineitem,\n", " supplier,\n", " nation,\n", " region\n", "where\n", " c_custkey = o_custkey\n", " and l_orderkey = o_orderkey\n", " and l_suppkey = s_suppkey\n", " and c_nationkey = s_nationkey\n", " and s_nationkey = n_nationkey\n", " and n_regionkey = r_regionkey\n", " and r_name = '[REGION]'\n", " and o_orderdate >= date '[DATE]'\n", " and o_orderdate < date '[DATE]' + interval '1' year\n", "group by\n", " n_name\n", "order by\n", " revenue desc;\n", "```\n", "\n", "### Substitution Parameters\n", "1. REGION = ASIA;\n", "2. DATE = 1994-01-01." ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
nationrevenue
1VIETNAM807082.63
2INDIA697063.03
3INDONESIA463882.68
4CHINA440134.00
5JAPAN237479.43
" ], "text/plain": [ "5x2 DataFrames.DataFrame\n", "| Row | nation | revenue |\n", "|-----|-------------|-----------|\n", "| 1 | \"VIETNAM\" | 807082.63 |\n", "| 2 | \"INDIA\" | 697063.03 |\n", "| 3 | \"INDONESIA\" | 463882.68 |\n", "| 4 | \"CHINA\" | 440134.00 |\n", "| 5 | \"JAPAN\" | 237479.43 |" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :filter(\n", " DATE <= order.orderdate < DATE + 1_year &&\n", " supplier.nation == order.customer.nation &&\n", " supplier.nation.region.name == REGION)\n", " :group(supplier.nation)\n", " :select(\n", " nation,\n", " revenue => sum(lineitem.(extendedprice*(1-discount))))\n", " :sort(revenue:desc),\n", " REGION=\"ASIA\", DATE=Date(\"1994-01-01\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Forecasting Revenue Change Query (Q6)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 38]*\n", "\n", "This query quantifies the amount of revenue increase that would have resulted from eliminating certain company-wide discounts in a given percentage range in a given year. Asking this type of \"what if\" query can be used to look\n", "for ways to increase revenues.\n", "\n", "### Business Question\n", "The Forecasting Revenue Change Query considers all the lineitems shipped in a given year with discounts between\n", "DISCOUNT-0.01 and DISCOUNT+0.01. The query lists the amount by which the total revenue would have\n", "increased if these discounts had been eliminated for lineitems with `l_quantity` less than `quantity`. Note that the\n", "potential revenue increase is equal to the sum of `[l_extendedprice * l_discount]` for all lineitems with discounts and quantities in the qualifying range.\n", "\n", "### Query Definition\n", "```\n", "select\n", " sum(l_extendedprice*l_discount) as revenue\n", "from\n", " lineitem\n", "where\n", " l_shipdate >= date '[DATE]'\n", " and l_shipdate < date '[DATE]' + interval '1' year\n", " and l_discount between [DISCOUNT] - 0.01 and [DISCOUNT] + 0.01\n", " and l_quantity < [QUANTITY];\n", "```\n", "\n", "### Substitution Parameters\n", "1. DATE = 1994-01-01;\n", "2. DISCOUNT = 0.06;\n", "3. QUANTITY = 24." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
revenue
1905922.82
" ], "text/plain": [ "1x1 DataFrames.DataFrame\n", "| Row | revenue |\n", "|-----|-----------|\n", "| 1 | 905922.82 |" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :filter(\n", " DATE <= shipdate < DATE + 1_year &&\n", " DISCOUNT-0.011 <= discount <= DISCOUNT+0.011 &&\n", " quantity < QUANTITY)\n", " :group\n", " :select(\n", " revenue => sum(lineitem.(extendedprice*discount))),\n", " DATE=Date(\"1994-01-01\"), DISCOUNT=0.06, QUANTITY=24)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Volume Shipping Query (Q7)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 39]*\n", "\n", "This query determines the value of goods shipped between certain nations to help in the re-negotiation of shipping\n", "contracts.\n", "\n", "### Business Question\n", "The Volume Shipping Query finds, for two given nations, the gross discounted revenues derived from lineitems in\n", "which parts were shipped from a supplier in either nation to a customer in the other nation during 1995 and 1996.\n", "The query lists the supplier nation, the customer nation, the year, and the revenue from shipments that took place in that year. The query orders the answer by Supplier nation, Customer nation, and year (all ascending).\n", "\n", "### Query Definition\n", "```\n", "select\n", " supp_nation,\n", " cust_nation,\n", " l_year,\n", " sum(volume) as revenue\n", "from (\n", " select\n", " n1.n_name as supp_nation,\n", " n2.n_name as cust_nation,\n", " extract(year from l_shipdate) as l_year,\n", " l_extendedprice * (1 - l_discount) as volume\n", " from\n", " supplier,\n", " lineitem,\n", " orders,\n", " customer,\n", " nation n1,\n", " nation n2\n", " where\n", " s_suppkey = l_suppkey\n", " and o_orderkey = l_orderkey\n", " and c_custkey = o_custkey\n", " and s_nationkey = n1.n_nationkey\n", " and c_nationkey = n2.n_nationkey\n", " and (\n", " (n1.n_name = '[NATION1]' and n2.n_name = '[NATION2]')\n", " or (n1.n_name = '[NATION2]' and n2.n_name = '[NATION1]'))\n", " and l_shipdate between date '1995-01-01' and date '1996-12-31') as shipping\n", "group by\n", " supp_nation,\n", " cust_nation,\n", " l_year\n", "order by\n", " supp_nation,\n", " cust_nation,\n", " l_year;\n", "```\n", "\n", "### Substitution Parameters\n", "1. NATION1 = FRANCE;\n", "2. NATION2 = GERMANY." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
supp_nationcust_nationyearrevenue
1FRANCEGERMANY1995263047.87
2FRANCEGERMANY1996154119.14
3GERMANYFRANCE1995205237.67
4GERMANYFRANCE1996407967.21
" ], "text/plain": [ "4x4 DataFrames.DataFrame\n", "| Row | supp_nation | cust_nation | year | revenue |\n", "|-----|-------------|-------------|------|-----------|\n", "| 1 | \"FRANCE\" | \"GERMANY\" | 1995 | 263047.87 |\n", "| 2 | \"FRANCE\" | \"GERMANY\" | 1996 | 154119.14 |\n", "| 3 | \"GERMANY\" | \"FRANCE\" | 1995 | 205237.67 |\n", "| 4 | \"GERMANY\" | \"FRANCE\" | 1996 | 407967.21 |" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :define(\n", " supp_nation => supplier.nation.name,\n", " cust_nation => order.customer.nation.name)\n", " :filter(\n", " date(\"1995-01-01\") <= shipdate <= date(\"1996-12-31\") && (\n", " (supp_nation == NATION1 && cust_nation == NATION2) ||\n", " (supp_nation == NATION2 && cust_nation == NATION1)))\n", " :group(\n", " supp_nation,\n", " cust_nation,\n", " year => year(shipdate))\n", " :select(\n", " supp_nation,\n", " cust_nation,\n", " year,\n", " revenue => sum(lineitem.(extendedprice*(1-discount)))),\n", " NATION1=\"FRANCE\", NATION2=\"GERMANY\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## National Market Share Query (Q8)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 41]*\n", "\n", "This query determines how the market share of a given nation within a given region has changed over two years for\n", "a given part type.\n", "\n", "### Business Question\n", "The market share for a given nation within a given region is defined as the fraction of the revenue, the sum of\n", "`[l_extendedprice * (1-l_discount)]`, from the products of a specified type in that region that was supplied by suppliers from the given nation. The query determines this for the years 1995 and 1996 presented in this order.\n", "\n", "### Functional Query Definition\n", "```\n", "select\n", " o_year,\n", " sum(case\n", " when nation = '[NATION]'\n", " then volume\n", " else 0\n", " end) / sum(volume) as mkt_share\n", "from (\n", " select\n", " extract(year from o_orderdate) as o_year,\n", " l_extendedprice * (1-l_discount) as volume,\n", " n2.n_name as nation\n", " from\n", " part,\n", " supplier,\n", " lineitem,\n", " orders,\n", " customer,\n", " nation n1,\n", " nation n2,\n", " region\n", " where\n", " p_partkey = l_partkey\n", " and s_suppkey = l_suppkey\n", " and l_orderkey = o_orderkey\n", " and o_custkey = c_custkey\n", " and c_nationkey = n1.n_nationkey\n", " and n1.n_regionkey = r_regionkey\n", " and r_name = '[REGION]'\n", " and s_nationkey = n2.n_nationkey\n", " and o_orderdate between date '1995-01-01' and date '1996-12-31'\n", " and p_type = '[TYPE]') as all_nations\n", "group by\n", " o_year\n", "order by\n", " o_year;\n", "```\n", "\n", "### Substitution Parameters\n", "1. NATION = CANADA;\n", "2. REGION = AMERICA;\n", "3. TYPE = ECONOMY ANODIZED STEEL." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
yearmkt_share
119950.1379404076847995
219960.2615672520944461
" ], "text/plain": [ "2x2 DataFrames.DataFrame\n", "| Row | year | mkt_share |\n", "|-----|------|-----------|\n", "| 1 | 1995 | 0.13794 |\n", "| 2 | 1996 | 0.261567 |" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :define(\n", " year => year(order.orderdate),\n", " supp_nation => supplier.nation.name,\n", " volume => extendedprice*(1-discount))\n", " :filter(\n", " part.type_ == TYPE &&\n", " order.customer.nation.region.name == REGION &&\n", " 1995 <= year <= 1996)\n", " :group(year)\n", " :select(\n", " year,\n", " mkt_share =>\n", " sum(lineitem:filter(supp_nation == NATION).volume) / sum(lineitem.volume)),\n", " NATION=\"CANADA\", REGION=\"AMERICA\", TYPE=\"ECONOMY ANODIZED STEEL\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Product Type Profit Measure Query (Q9)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 43]*\n", "\n", "This query determines how much profit is made on a given line of parts, broken out by supplier nation and year.\n", "\n", "### Business Question\n", "The Product Type Profit Measure Query finds, for each nation and each year, the profit for all parts ordered in that\n", "year that contain a specified substring in their names and that were filled by a supplier in that nation. The profit is defined as the sum of `[(l_extendedprice*(1-l_discount)) - (ps_supplycost * l_quantity)]` for all lineitems describing parts in the specified line. The query lists the nations in ascending alphabetical order and, for each nation, the year and profit in descending order by year (most recent first).\n", "\n", "### Query Definition\n", "```\n", "select\n", " nation,\n", " o_year,\n", " sum(amount) as sum_profit\n", "from (\n", " select\n", " n_name as nation,\n", " extract(year from o_orderdate) as o_year,\n", " l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount\n", " from\n", " part,\n", " supplier,\n", " lineitem,\n", " partsupp,\n", " orders,\n", " nation\n", " where\n", " s_suppkey = l_suppkey\n", " and ps_suppkey = l_suppkey\n", " and ps_partkey = l_partkey\n", " and p_partkey = l_partkey\n", " and o_orderkey = l_orderkey\n", " and s_nationkey = n_nationkey\n", " and p_name like '%[COLOR]%') as profit\n", "group by\n", " nation,\n", " o_year\n", "order by\n", " nation,\n", " o_year desc;\n", "```\n", "\n", "### Substitution Parameters\n", "1. COLOR = green." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
nationyearsum_profit
1ALGERIA1998197990.07
2ALGERIA1997209363.95
3ALGERIA1996508610.09
4ALGERIA1995321224.39
5ALGERIA1994323614.10
6ALGERIA1993429217.36
7ALGERIA1992313931.42
8ARGENTINA1998207703.71
9ARGENTINA1997404879.36
10ARGENTINA1996277287.30
" ], "text/plain": [ "173x3 DataFrames.DataFrame\n", "| Row | nation | year | sum_profit |\n", "|-----|-----------|------|------------|\n", "| 1 | \"ALGERIA\" | 1998 | 197990.07 |\n", "⋮\n", "| 172 | \"VIETNAM\" | 1993 | 479879.33 |\n", "| 173 | \"VIETNAM\" | 1992 | 436117.97 |" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :define(\n", " amount =>\n", " extendedprice*(1-discount) - partsupp.supplycost*quantity)\n", " :filter(contains(part.name, COLOR))\n", " :group(\n", " nation => supplier.nation.name,\n", " year => year(order.orderdate))\n", " :sort(nation, year:desc)\n", " :select(\n", " nation,\n", " year,\n", " sum_profit => sum(lineitem.amount)),\n", " COLOR=\"green\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Returned Item Reporting Query (Q10)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 45]*\n", "\n", "The query identifies customers who might be having problems with the parts that are shipped to them.\n", "\n", "### Business Question\n", "The Returned Item Reporting Query finds the top 20 customers, in terms of their effect on lost revenue for a given\n", "quarter, who have returned parts. The query considers only parts that were ordered in the specified quarter. The\n", "query lists the customer's name, address, nation, phone number, account balance, comment information and revenue\n", "lost. The customers are listed in descending order of lost revenue. Revenue lost is defined as\n", "`sum(l_extendedprice*(1-l_discount))` for all qualifying lineitems.\n", "\n", "### Functional Query Definition\n", "```\n", "select\n", " c_custkey,\n", " c_name,\n", " sum(l_extendedprice * (1 - l_discount)) as revenue,\n", " c_acctbal,\n", " n_name,\n", " c_address,\n", " c_phone,\n", " c_comment\n", "from\n", " customer,\n", " orders,\n", " lineitem,\n", " nation\n", "where\n", " c_custkey = o_custkey\n", " and l_orderkey = o_orderkey\n", " and o_orderdate >= date '[DATE]'\n", " and o_orderdate < date '[DATE]' + interval '3' month\n", " and l_returnflag = 'R'\n", " and c_nationkey = n_nationkey\n", "group by\n", " c_custkey,\n", " c_name,\n", " c_acctbal,\n", " c_phone,\n", " n_name,\n", " c_address,\n", " c_comment\n", "order by\n", " revenue desc\n", "limit 20;\n", "```\n", "\n", "### Substitution Parameters\n", "1. DATE = 1993-10-01." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
idnamerevenueacctbalnationaddressphonecomment
1544Customer#000000544391580.054974.68ETHIOPIAJv7vcm,oE,HEyxekXKia1V5H1up2315-572-651-1323bout the packages integrate above the regular instructions. regular ideas hinder s
21105Customer#000001105375872.309491.46RUSSIAcZhhOUzv6,Vbaa2bFT32-885-298-6750y final packages. furiously ironic packages was. fluffily ironic instructions integrate
3961Customer#000000961372764.616963.68JAPAN5,81YDLFuRR47KKzv8GXdmi3zyP37PlPn22-989-463-6089e final requests: busily final accounts believe a
4266Customer#000000266347106.765481.00ALGERIAVSIEruiMdDvjDaTQxkuK60Yw3AGxO10-474-243-3974ccounts. quickly ironic excuses after the regular foxes wake along the ironic, fina
5683Customer#000000683328973.739120.93FRANCEG0, q8c6vBykpiLvcuSJLYvqE16-566-251-5446 somas use-- slyly express foxes wake quickly blithely iro
6415Customer#000000415327104.672317.93UNITED KINGDOM334jCRiUb,gx333-346-876-2972egular deposits. blithely ironic inst
7746Customer#000000746311870.961164.46SAUDI ARABIAJOKj8N2QKUm8Gi,F4qX0fLVy30-154-354-9928 final theodolites. final pinto beans haggle-- furiously
8421Customer#000000421311164.577073.17JORDANit3mUlkZAe9J8gmy23-918-228-2560lithely final deposits haggle furiously above the
9643Customer#000000643281356.655184.70ALGERIA9T 2avhfyF PQ10-978-597-2747fily along the quickly ironic ideas. final, final
101051Customer#000001051275867.469776.39BRAZILiHS,UFudVOOe12-869-221-1428cuses boost furiously silent deposits. quickly silent requests integrate quickly bold asymptotes; slyly regular ide
" ], "text/plain": [ "20x8 DataFrames.DataFrame\n", "| Row | id | name | revenue | acctbal | nation |\n", "|-----|------|----------------------|-----------|---------|-------------|\n", "| 1 | 544 | \"Customer#000000544\" | 391580.05 | 4974.68 | \"ETHIOPIA\" |\n", "⋮\n", "| 19 | 875 | \"Customer#000000875\" | 247406.98 | -949.28 | \"CANADA\" |\n", "| 20 | 1183 | \"Customer#000001183\" | 245991.31 | 4455.76 | \"ARGENTINA\" |\n", "\n", "| Row | address | phone |\n", "|-----|----------------------------------------|-------------------|\n", "| 1 | \"Jv7vcm,oE,HEyxekXKia1V5H1up23\" | \"15-572-651-1323\" |\n", "⋮\n", "| 19 | \"8pQ4YUYox0d\" | \"13-146-810-5423\" |\n", "| 20 | \"qdIqRUfpmvtWo0NGsyi4qyjkwzlImP9,NrSC\" | \"11-968-244-9275\" |\n", "\n", "| Row | comment |\n", "|-----|--------------------------------------------------------------------------------------------------|\n", "| 1 | \"bout the packages integrate above the regular instructions. regular ideas hinder s\" |\n", "⋮\n", "| 19 | \"ar theodolites snooze slyly. furiously express packages cajole blithely around the carefully r\" |\n", "| 20 | \"arefully regular dependencies. quick\" |" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " customer\n", " :define(\n", " returned =>\n", " order:filter(DATE <= orderdate < DATE+3_months).lineitem:filter(returnflag == \"R\"))\n", " :select(\n", " id,\n", " name,\n", " revenue => sum(returned.(extendedprice*(1-discount))),\n", " acctbal,\n", " nation,\n", " address,\n", " phone,\n", " comment)\n", " :sort(revenue:desc)\n", " :take(20),\n", " DATE=Date(\"1993-10-01\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Important Stock Identification Query (Q11)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 47]*\n", "\n", "This query finds the most important subset of suppliers' stock in a given nation.\n", "\n", "### Business Question\n", "The Important Stock Identification Query finds, from scanning the available stock of suppliers in a given nation, all the parts that represent a significant percentage of the total value of all available parts. The query displays the part number and the value of those parts in descending order of value.\n", "\n", "### Query Definition\n", "```\n", "select\n", " ps_partkey,\n", " sum(ps_supplycost * ps_availqty) as value\n", "from\n", " partsupp,\n", " supplier,\n", " nation\n", "where\n", " ps_suppkey = s_suppkey\n", " and s_nationkey = n_nationkey\n", " and n_name = '[NATION]'\n", "group by\n", " ps_partkey\n", "having\n", " sum(ps_supplycost * ps_availqty) > (\n", " select\n", " sum(ps_supplycost * ps_availqty) * [FRACTION]\n", " from\n", " partsupp,\n", " supplier,\n", " nation\n", " where\n", " ps_suppkey = s_suppkey\n", " and s_nationkey = n_nationkey\n", " and n_name = '[NATION]')\n", "order by\n", " value desc;\n", "```\n", "\n", "### Substitution Parameters\n", "1. NATION = GERMANY;\n", "2. FRACTION = 0.0001." ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
idvalue
1108313092535.78
2135611542206.53
37169945808.42
44489941036.40
51119382317.55
614549081368.05
712188999420.40
83248901344.93
913128834080.10
1014728609013.30
" ], "text/plain": [ "287x2 DataFrames.DataFrame\n", "| Row | id | value |\n", "|-----|------|-------------|\n", "| 1 | 1083 | 13092535.78 |\n", "⋮\n", "| 286 | 1172 | 88622.73 |\n", "| 287 | 462 | 86323.96 |" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " partsupp\n", " :filter(supplier.nation.name == NATION)\n", " :group(part)\n", " :define(value => sum(partsupp.(supplycost * availqty)))\n", " :filter(value > sum(and_around.value)*FRACTION)\n", " :select(part.id, value)\n", " :sort(value:desc),\n", " NATION=\"GERMANY\", FRACTION=0.0001)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Shipping Modes and Order Priority Query (Q12)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 49]*\n", "\n", "This query determines whether selecting less expensive modes of shipping is negatively affecting the critical-priority orders by causing more parts to be received by customers after the committed date.\n", "\n", "### Business Question\n", "The Shipping Modes and Order Priority Query counts, by ship mode, for lineitems actually received by customers in\n", "a given year, the number of lineitems belonging to orders for which the `l_receiptdate` exceeds the `l_commitdate` for two different specified ship modes. Only lineitems that were actually shipped before the `l_commitdate` are considered. The late lineitems are partitioned into two groups, those with priority URGENT or HIGH, and those with a\n", "priority other than URGENT or HIGH.\n", "\n", "### Query Definition\n", "```\n", "select\n", " l_shipmode,\n", " sum(case\n", " when o_orderpriority ='1-URGENT' or o_orderpriority ='2-HIGH'\n", " then 1\n", " else 0\n", " end) as high_line_count,\n", " sum(case\n", " when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH'\n", " then 1\n", " else 0\n", " end) as low_line_count\n", "from\n", " orders,\n", " lineitem\n", "where\n", " o_orderkey = l_orderkey\n", " and l_shipmode in ('[SHIPMODE1]', '[SHIPMODE2]')\n", " and l_commitdate < l_receiptdate\n", " and l_shipdate < l_commitdate\n", " and l_receiptdate >= date '[DATE]'\n", " and l_receiptdate < date '[DATE]' + interval '1' year\n", "group by\n", " l_shipmode\n", "order by\n", " l_shipmode;\n", "```\n", "\n", "### Substitution Parameters\n", "1. SHIPMODE1 = MAIL;\n", "2. SHIPMODE2 = SHIP;\n", "3. DATE = 1994-01-01." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
shipmodehigh_line_countlow_line_count
1MAIL4567
2SHIP4569
" ], "text/plain": [ "2x3 DataFrames.DataFrame\n", "| Row | shipmode | high_line_count | low_line_count |\n", "|-----|----------|-----------------|----------------|\n", "| 1 | \"MAIL\" | 45 | 67 |\n", "| 2 | \"SHIP\" | 45 | 69 |" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :define(\n", " high => order.orderpriority in [\"1-URGENT\", \"2-HIGH\"])\n", " :filter(\n", " shipmode in SHIPMODES &&\n", " shipdate < commitdate < receiptdate &&\n", " DATE <= receiptdate < DATE + 1_year)\n", " :group(shipmode)\n", " :select(\n", " shipmode,\n", " high_line_count => count(lineitem:filter(high)),\n", " low_line_count => count(lineitem:filter(!high))),\n", " SHIPMODES=[\"MAIL\", \"SHIP\"], DATE=Date(\"1994-01-01\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Customer Distribution Query (Q13)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 51]*\n", "\n", "This query seeks relationships between customers and the size of their orders.\n", "\n", "\n", "### Business Question\n", "This query determines the distribution of customers by the number of orders they have made, including customers\n", "who have no record of orders, past or present. It counts and reports how many customers have no orders, how many\n", "have 1, 2, 3, etc. A check is made to ensure that the orders counted do not fall into one of several special categories of orders. Special categories are identified in the order comment column by looking for a particular pattern.\n", "\n", "### Query Definition\n", "```\n", "select\n", " c_count,\n", " count(*) as custdist\n", "from (\n", " select\n", " c_custkey,\n", " count(o_orderkey)\n", " from\n", " customer left outer join orders\n", " on c_custkey = o_custkey and o_comment not like '%[WORD1]%[WORD2]%'\n", " group by\n", " c_custkey) as c_orders (c_custkey, c_count)\n", "group by\n", " c_count\n", "order by\n", " custdist desc,\n", " c_count desc;\n", "```\n", "\n", "### Substitution Parameters\n", "1. WORD1 = special.\n", "2. WORD2 = requests." ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
c_countcustdist
10400
21157
31255
41053
5851
6948
71743
81339
92237
102137
" ], "text/plain": [ "33x2 DataFrames.DataFrame\n", "| Row | c_count | custdist |\n", "|-----|---------|----------|\n", "| 1 | 0 | 400 |\n", "⋮\n", "| 32 | 35 | 1 |\n", "| 33 | 33 | 1 |" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " customer\n", " :group(\n", " c_count => count(order:filter(!contains(comment, WORDS))))\n", " :select(\n", " c_count,\n", " custdist => count(customer))\n", " :sort(\n", " custdist:desc,\n", " c_count:desc),\n", " WORDS=r\".*special.*requests.*\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Promotion Effect Query (Q14)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 52]*\n", "\n", "This query monitors the market response to a promotion such as TV advertisements or a special campaign.\n", "\n", "### Business Question\n", "The Promotion Effect Query determines what percentage of the revenue in a given year and month was derived from\n", "promotional parts. The query considers only parts actually shipped in that month and gives the percentage. Revenue\n", "is defined as `(l_extendedprice * (1-l_discount))`.\n", "\n", "### Query Definition\n", "```\n", "select\n", " 100.00 * sum(\n", " case\n", " when p_type like 'PROMO%'\n", " then l_extendedprice*(1-l_discount)\n", " else 0\n", " end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue\n", "from\n", " lineitem,\n", " part\n", "where\n", " l_partkey = p_partkey\n", " and l_shipdate >= date '[DATE]'\n", " and l_shipdate < date '[DATE]' + interval '1' month;\n", "```\n", "\n", "### Substitution Parameters\n", "1. DATE = 1995-09-01." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
promo_revenue
112.409568669150199
" ], "text/plain": [ "1x1 DataFrames.DataFrame\n", "| Row | promo_revenue |\n", "|-----|---------------|\n", "| 1 | 12.4096 |" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :define(\n", " volume => extendedprice*(1-discount),\n", " promo => startswith(part.type_, \"PROMO\"))\n", " :filter(DATE <= shipdate < DATE + 1_month)\n", " :group\n", " :select(\n", " promo_revenue =>\n", " 100*sum(lineitem:filter(promo).volume)/sum(lineitem.volume)),\n", " DATE=Date(\"1995-09-01\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Top Supplier Query (Q15)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 53]*\n", "\n", "This query determines the top supplier so it can be rewarded, given more business, or identified for special recognition.\n", "\n", "### Business Question\n", "The Top Supplier Query finds the supplier who contributed the most to the overall revenue for parts shipped during\n", "a given quarter of a given year. In case of a tie, the query lists all suppliers whose contribution was equal to the\n", "maximum, presented in supplier number order.\n", "\n", "### Query Definition\n", "```\n", "create view revenue (supplier_no, total_revenue) as\n", " select\n", " l_suppkey,\n", " sum(l_extendedprice * (1 - l_discount))\n", " from\n", " lineitem\n", " where\n", " l_shipdate >= date '[DATE]'\n", " and l_shipdate < date '[DATE]' + interval '3' month\n", " group by\n", " l_suppkey;\n", " \n", "select\n", " s_suppkey,\n", " s_name,\n", " s_address,\n", " s_phone,\n", " total_revenue\n", "from\n", " supplier,\n", " revenue\n", "where\n", " s_suppkey = supplier_no\n", " and total_revenue = (\n", " select\n", " max(total_revenue)\n", " from\n", " revenue)\n", "order by\n", " s_suppkey;\n", " \n", "drop view revenue;\n", "```\n", "\n", "### Substitution Parameters\n", "1. DATE = 1996-01-01." ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
idnameaddressphonetotal_revenue
117Supplier#000000017c2d,ESHRSkK3WYnxpgw6aOqN0q29-601-884-92191113801.53
" ], "text/plain": [ "1x5 DataFrames.DataFrame\n", "| Row | id | name | address |\n", "|-----|----|----------------------|------------------------------|\n", "| 1 | 17 | \"Supplier#000000017\" | \"c2d,ESHRSkK3WYnxpgw6aOqN0q\" |\n", "\n", "| Row | phone | total_revenue |\n", "|-----|-------------------|---------------|\n", "| 1 | \"29-601-884-9219\" | 1113801.53 |" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " supplier\n", " :define(\n", " total_revenue =>\n", " sum(lineitem:filter(DATE <= shipdate < DATE+3_months).(extendedprice*(1-discount))))\n", " :first(total_revenue)\n", " :select(\n", " id,\n", " name,\n", " address,\n", " phone,\n", " total_revenue),\n", " DATE=Date(\"1996-01-01\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parts/Supplier Relationship Query (Q16)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 55]*\n", "\n", "This query finds out how many suppliers can supply parts with given attributes. It might be used, for example, to\n", "determine whether there is a sufficient number of suppliers for heavily ordered parts.\n", "\n", "### Business Question\n", "The Parts/Supplier Relationship Query counts the number of suppliers who can supply parts that satisfy a particular\n", "customer's requirements. The customer is interested in parts of eight different sizes as long as they are not of a given type, not of a given brand, and not from a supplier who has had complaints registered at the Better Business Bureau. Results must be presented in descending count and ascending brand, type, and size.\n", "\n", "### Query Definition\n", "```\n", "select\n", " p_brand,\n", " p_type,\n", " p_size,\n", " count(distinct ps_suppkey) as supplier_cnt\n", "from\n", " partsupp,\n", " part\n", "where\n", " p_partkey = ps_partkey\n", " and p_brand <> '[BRAND]'\n", " and p_type not like '[TYPE]%'\n", " and p_size in ([SIZE1], [SIZE2], [SIZE3], [SIZE4], [SIZE5], [SIZE6], [SIZE7], [SIZE8])\n", " and ps_suppkey not in (\n", " select\n", " s_suppkey\n", " from\n", " supplier\n", " where\n", " s_comment like '%Customer%Complaints%')\n", "group by\n", " p_brand,\n", " p_type,\n", " p_size\n", "order by\n", " supplier_cnt desc,\n", " p_brand,\n", " p_type,\n", " p_size;\n", "```\n", "\n", "### Substitution Parameters\n", "1. BRAND = Brand#45.\n", "2. TYPE = MEDIUM POLISHED.\n", "3. SIZE1 = 49\n", "4. SIZE2 = 14\n", "5. SIZE3 = 23\n", "6. SIZE4 = 45\n", "7. SIZE5 = 19\n", "8. SIZE6 = 3\n", "9. SIZE7 = 36\n", "10. SIZE8 = 9." ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
brandtype_sizesupplier_cnt
1Brand#14PROMO BRUSHED STEEL98
2Brand#35SMALL POLISHED COPPER148
3Brand#11LARGE PLATED TIN234
4Brand#11MEDIUM ANODIZED BRASS454
5Brand#11MEDIUM BRUSHED BRASS454
6Brand#11PROMO ANODIZED BRASS494
7Brand#11PROMO ANODIZED TIN454
8Brand#11PROMO BURNISHED BRASS364
9Brand#11SMALL ANODIZED TIN454
10Brand#11SMALL PLATED COPPER454
" ], "text/plain": [ "242x4 DataFrames.DataFrame\n", "| Row | brand | type_ | size | supplier_cnt |\n", "|-----|------------|---------------------------|------|--------------|\n", "| 1 | \"Brand#14\" | \"PROMO BRUSHED STEEL\" | 9 | 8 |\n", "⋮\n", "| 241 | \"Brand#55\" | \"STANDARD BRUSHED COPPER\" | 3 | 4 |\n", "| 242 | \"Brand#55\" | \"STANDARD BRUSHED STEEL\" | 19 | 4 |" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " partsupp\n", " :filter(\n", " part.brand != BRAND &&\n", " !startswith(part.type_, TYPE) &&\n", " part.size in SIZES &&\n", " !contains(supplier.comment, r\".*Customer.*Complaints.*\"))\n", " :group(\n", " part.brand,\n", " part.type_,\n", " part.size)\n", " :select(\n", " brand,\n", " type_,\n", " size,\n", " supplier_cnt => count(unique(partsupp.supplier)))\n", " :sort(supplier_cnt:desc),\n", " BRAND=\"Brand#45\", TYPE=\"MEDIUM POLISHED\", SIZES=[49, 14, 23, 45, 19, 3, 36, 9])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Small-Quantity-Order Revenue Query (Q17)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 57]*\n", "\n", "This query determines how much average yearly revenue would be lost if orders were no longer filled for small\n", "quantities of certain parts. This may reduce overhead expenses by concentrating sales on larger shipments.\n", "\n", "### Business Question\n", "The Small-Quantity-Order Revenue Query considers parts of a given brand and with a given container type and\n", "determines the average lineitem quantity of such parts ordered for all orders (past and pending) in the 7-year database. What would be the average yearly gross (undiscounted) loss in revenue if orders for these parts with a quantity of less than 20% of this average were no longer taken?\n", "\n", "### Functional Query Definition\n", "```\n", "select\n", " sum(l_extendedprice) / 7.0 as avg_yearly\n", "from\n", " lineitem,\n", " part\n", "where\n", " p_partkey = l_partkey\n", " and p_brand = '[BRAND]'\n", " and p_container = '[CONTAINER]'\n", " and l_quantity < (\n", " select\n", " 0.2 * avg(l_quantity)\n", " from\n", " lineitem\n", " where\n", " l_partkey = p_partkey);\n", "```\n", "\n", "### Substitution Parameters\n", "1. BRAND = Brand#23;\n", "2. CONTAINER = MED JAR." ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
avg_yearly
11217.62
" ], "text/plain": [ "1x1 DataFrames.DataFrame\n", "| Row | avg_yearly |\n", "|-----|------------|\n", "| 1 | 1217.62 |" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :filter(part.brand == BRAND && part.container == CONTAINER)\n", " :filter(quantity < 0.2 * mean(and_around(part).quantity))\n", " :group\n", " :select(\n", " avg_yearly => sum(lineitem.extendedprice)/7),\n", " BRAND=\"Brand#23\", CONTAINER=\"MED JAR\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Large Volume Customer Query (Q18)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 58]*\n", "\n", "The Large Volume Customer Query ranks customers based on their having placed a large quantity order. Large\n", "quantity orders are defined as those orders whose total quantity is above a certain level.\n", "\n", "\n", "### Business Question\n", "The Large Volume Customer Query finds a list of the top 100 customers who have ever placed large quantity orders.\n", "The query lists the customer name, customer key, the order key, date and total price and the quantity for the order.\n", "\n", "### Functional Query Definition\n", "```\n", "select\n", " c_name,\n", " c_custkey,\n", " o_orderkey,\n", " o_orderdate,\n", " o_totalprice,\n", " sum(l_quantity)\n", "from\n", " customer,\n", " orders,\n", " lineitem\n", "where\n", " o_orderkey in (\n", " select\n", " l_orderkey\n", " from\n", " lineitem\n", " group by\n", " l_orderkey\n", " having\n", " sum(l_quantity) > [QUANTITY])\n", " and c_custkey = o_custkey\n", " and o_orderkey = l_orderkey\n", "group by\n", " c_name,\n", " c_custkey,\n", " o_orderkey,\n", " o_orderdate,\n", " o_totalprice\n", "order by\n", " o_totalprice desc,\n", " o_orderdate\n", "limit 100;\n", "```\n", "\n", "### Substitution Parameters\n", "1. QUANTITY = 300" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
namecustkeyorderkeyorderdatetotalpricequantity
1Customer#00000014214217221997-04-09423609.13303
2Customer#00000053353372941995-10-21372230.14305
" ], "text/plain": [ "2x6 DataFrames.DataFrame\n", "| Row | name | custkey | orderkey | orderdate | totalprice |\n", "|-----|----------------------|---------|----------|------------|------------|\n", "| 1 | \"Customer#000000142\" | 142 | 1722 | 1997-04-09 | 423609.13 |\n", "| 2 | \"Customer#000000533\" | 533 | 7294 | 1995-10-21 | 372230.14 |\n", "\n", "| Row | quantity |\n", "|-----|----------|\n", "| 1 | 303 |\n", "| 2 | 305 |" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " order\n", " :select(\n", " customer.name,\n", " custkey => customer.id,\n", " orderkey => id,\n", " orderdate,\n", " totalprice,\n", " quantity => sum(lineitem.quantity))\n", " :filter(quantity > QUANTITY)\n", " :sort(\n", " totalprice:desc,\n", " orderdate)\n", " :take(100),\n", " QUANTITY=300)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Discounted Revenue Query (Q19)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 60]*\n", "\n", "The Discounted Revenue Query reports the gross discounted revenue attributed to the sale of selected parts handled\n", "in a particular manner. This query is an example of code such as might be produced programmatically by a data\n", "mining tool.\n", "\n", "### Business Question\n", "The Discounted Revenue query finds the gross discounted revenue for all orders for three different types of parts\n", "that were shipped by air and delivered in person. Parts are selected based on the combination of specific brands, a\n", "list of containers, and a range of sizes.\n", "\n", "### Query Definition\n", "```\n", "select\n", " sum(l_extendedprice * (1 - l_discount) ) as revenue\n", "from\n", " lineitem,\n", " part\n", "where\n", " (\n", " p_partkey = l_partkey\n", " and p_brand = '[BRAND1]'\n", " and p_container in ('SM CASE', 'SM BOX', 'SM PACK', 'SM PKG')\n", " and l_quantity >= [QUANTITY1] and l_quantity <= [QUANTITY1] + 10\n", " and p_size between 1 and 5\n", " and l_shipmode in ('AIR', 'AIR REG')\n", " and l_shipinstruct = 'DELIVER IN PERSON'\n", " )\n", " or\n", " (\n", " p_partkey = l_partkey\n", " and p_brand = '[BRAND2]'\n", " and p_container in ('MED BAG', 'MED BOX', 'MED PKG', 'MED PACK')\n", " and l_quantity >= [QUANTITY2] and l_quantity <= [QUANTITY2] + 10\n", " and p_size between 1 and 10\n", " and l_shipmode in ('AIR', 'AIR REG')\n", " and l_shipinstruct = 'DELIVER IN PERSON'\n", " )\n", " or\n", " (\n", " p_partkey = l_partkey\n", " and p_brand = '[BRAND3]'\n", " and p_container in ('LG CASE', 'LG BOX', 'LG PACK', 'LG PKG')\n", " and l_quantity >= [QUANTITY3] and l_quantity <= [QUANTITY3] + 10\n", " and p_size between 1 and 15\n", " and l_shipmode in ('AIR', 'AIR REG')\n", " and l_shipinstruct = 'DELIVER IN PERSON'\n", " );\n", "```\n", "\n", "### Substitution Parameters\n", "1. QUANTITY1 = 1.\n", "2. QUANTITY2 = 10.\n", "3. QUANTITY3 = 20.\n", "4. BRAND1 = Brand#53.\n", "5. BRAND2 = Brand#35.\n", "6. BRAND3 = Brand#32." ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
revenue
140908.53
" ], "text/plain": [ "1x1 DataFrames.DataFrame\n", "| Row | revenue |\n", "|-----|----------|\n", "| 1 | 40908.53 |" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :filter(\n", " (\n", " part.brand == BRAND1 &&\n", " part.container in [\"SM CASE\", \"SM BOX\", \"SM PACK\", \"SM PKG\"] &&\n", " QUANTITY1 <= quantity <= QUANTITY1+10 &&\n", " 1 <= part.size <= 5 &&\n", " shipmode in [\"AIR\", \"AIR REG\"] &&\n", " shipinstruct == \"DELIVER IN PERSON\"\n", " )\n", " ||\n", " (\n", " part.brand == BRAND2 && \n", " part.container in [\"MED BAG\", \"MED BOX\", \"MED PKG\", \"MED PACK\"] &&\n", " QUANTITY2 <= quantity <= QUANTITY2+10 &&\n", " 1 <= part.size <= 10 &&\n", " shipmode in [\"AIR\", \"AIR REG\"] &&\n", " shipinstruct == \"DELIVER IN PERSON\"\n", " )\n", " ||\n", " (\n", " part.brand == BRAND3 &&\n", " part.container in [\"LG CASE\", \"LG BOX\", \"LG PACK\", \"LG PKG\"] &&\n", " QUANTITY3 <= quantity <= QUANTITY3+10 &&\n", " 1 <= part.size <= 15 &&\n", " shipmode in [\"AIR\", \"AIR REG\"] &&\n", " shipinstruct == \"DELIVER IN PERSON\"\n", " ))\n", " :group\n", " :select(\n", " revenue => sum(lineitem.(extendedprice*(1-discount)))),\n", " QUANTITY1=1, QUANTITY2=10, QUANTITY3=20,\n", " BRAND1=\"Brand#53\", BRAND2=\"Brand#35\", BRAND3=\"Brand#32\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Potential Part Promotion Query (Q20)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 62]*\n", "\n", "The Potential Part Promotion Query identifies suppliers in a particular nation having selected parts that may be candidates for a promotional offer.\n", "\n", "### Business Question\n", "The Potential Part Promotion query identifies suppliers who have an excess of a given part available; an excess is\n", "defined to be more than 50% of the parts like the given part that the supplier shipped in a given year for a given\n", "nation. Only parts whose names share a certain naming convention are considered.\n", "\n", "### Query Definition\n", "```\n", "select\n", " s_name,\n", " s_address\n", "from\n", " supplier,\n", " nation\n", "where\n", " s_suppkey in (\n", " select\n", " ps_suppkey\n", " from\n", " partsupp\n", " where\n", " ps_partkey in (\n", " select\n", " p_partkey\n", " from\n", " part\n", " where\n", " p_name like '[COLOR]%')\n", " and ps_availqty > (\n", " select\n", " 0.5 * sum(l_quantity)\n", " from\n", " lineitem\n", " where\n", " l_partkey = ps_partkey\n", " and l_suppkey = ps_suppkey\n", " and l_shipdate >= date('[DATE]')\n", " and l_shipdate < date('[DATE]') + interval '1' year))\n", " and s_nationkey = n_nationkey\n", " and n_name = '[NATION]'\n", "order by\n", " s_name;\n", "```\n", "\n", "### Substitution Parameters\n", "1. COLOR = forest.\n", "2. DATE = 1994-01-01.\n", "3. NATION = CANADA." ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
nameaddress
1Supplier#000000020iybAE,RmTymrZVYaFZva2SH,j
" ], "text/plain": [ "1x2 DataFrames.DataFrame\n", "| Row | name | address |\n", "|-----|----------------------|-----------------------------|\n", "| 1 | \"Supplier#000000020\" | \"iybAE,RmTymrZVYaFZva2SH,j\" |" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " partsupp\n", " :define(\n", " expectedqty =>\n", " sum(lineitem:filter(DATE <= shipdate < DATE + 1_year).quantity))\n", " :filter(\n", " supplier.nation.name == NATION &&\n", " startswith(part.name, COLOR) &&\n", " availqty > 0.5*expectedqty)\n", " :group(supplier)\n", " :select(\n", " supplier.name,\n", " supplier.address)\n", " :sort(name),\n", " COLOR=\"forest\", DATE=Date(\"1994-01-01\"), NATION=\"CANADA\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Suppliers Who Kept Orders Waiting Query (Q21)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 64]*\n", "\n", "This query identifies certain suppliers who were not able to ship required parts in a timely manner.\n", "\n", "### Business Question\n", "The Suppliers Who Kept Orders Waiting query identifies suppliers, for a given nation, whose product was part of a\n", "multi-supplier order (with current status of 'F') where they were the only supplier who failed to meet the committed\n", "delivery date.\n", "\n", "### Query Definition\n", "```\n", "select\n", " s_name,\n", " count(*) as numwait\n", "from\n", " supplier,\n", " lineitem l1,\n", " orders,\n", " nation\n", "where\n", " s_suppkey = l1.l_suppkey\n", " and o_orderkey = l1.l_orderkey\n", " and o_orderstatus = 'F'\n", " and l1.l_receiptdate > l1.l_commitdate\n", " and exists (\n", " select\n", " *\n", " from\n", " lineitem l2\n", " where\n", " l2.l_orderkey = l1.l_orderkey\n", " and l2.l_suppkey <> l1.l_suppkey)\n", " and not exists (\n", " select\n", " *\n", " from\n", " lineitem l3\n", " where\n", " l3.l_orderkey = l1.l_orderkey\n", " and l3.l_suppkey <> l1.l_suppkey\n", " and l3.l_receiptdate > l3.l_commitdate)\n", " and s_nationkey = n_nationkey\n", " and n_name = '[NATION]'\n", "group by\n", " s_name\n", "order by\n", " numwait desc,\n", " s_name\n", "limit 100;\n", "```\n", "\n", "### Substitution Parameters\n", "1. NATION = SAUDI ARABIA." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
namenumwait
1Supplier#00000007410
" ], "text/plain": [ "1x2 DataFrames.DataFrame\n", "| Row | name | numwait |\n", "|-----|----------------------|---------|\n", "| 1 | \"Supplier#000000074\" | 10 |" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " lineitem\n", " :filter((\n", " order.orderstatus == \"F\" &&\n", " supplier.nation.name == NATION &&\n", " receiptdate > commitdate &&\n", " any(order.lineitem.supplier != SUPP) &&\n", " !any(order.lineitem.(supplier != SUPP && receiptdate > commitdate)))\n", " :given(SUPP => supplier))\n", " :group(supplier)\n", " :select(\n", " supplier.name,\n", " numwait => count(lineitem))\n", " :sort(\n", " numwait:desc,\n", " name)\n", " :take(100),\n", " NATION=\"SAUDI ARABIA\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Global Sales Opportunity Query (Q22)\n", "*[TPC Benchmark (TM) H Standard Specification Revision 2.17.1; Page 66]*\n", "\n", "The Global Sales Opportunity Query identifies geographies where there are customers who may be likely to make a\n", "purchase.\n", "\n", "### Business Question\n", "This query counts how many customers within a specific range of country codes have not placed orders for 7 years\n", "but who have a greater than average “positive” account balance. It also reflects the magnitude of that balance.\n", "Country code is defined as the first two characters of c_phone.\n", "\n", "### Query Definition\n", "```\n", "select\n", " cntrycode,\n", " count(*) as numcust,\n", " sum(c_acctbal) as totacctbal\n", "from (\n", " select\n", " substring(c_phone from 1 for 2) as cntrycode,\n", " c_acctbal\n", " from\n", " customer\n", " where\n", " substring(c_phone from 1 for 2) in\n", " ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]')\n", " and c_acctbal > (\n", " select\n", " avg(c_acctbal)\n", " from\n", " customer\n", " where\n", " c_acctbal > 0.00\n", " and substring (c_phone from 1 for 2) in\n", " ('[I1]','[I2]','[I3]','[I4]','[I5]','[I6]','[I7]'))\n", " and not exists (\n", " select\n", " *\n", " from\n", " orders\n", " where\n", " o_custkey = c_custkey)) as custsale\n", "group by\n", " cntrycode\n", "order by\n", " cntrycode;\n", "```\n", "\n", "### Substitution Parameters\n", "1. I1 = 13.\n", "2. I2 = 31.\n", "3. I3 = 23.\n", "4. I4 = 29.\n", "5. I5 = 30.\n", "6. I6 = 18.\n", "7. I7 = 17." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/html": [ "
cntrycodenumcusttotacctbal
113859306.91
217649853.61
3181189404.25
423325939.81
5291188722.85
6301392703.39
731969434.60
" ], "text/plain": [ "7x3 DataFrames.DataFrame\n", "| Row | cntrycode | numcust | totacctbal |\n", "|-----|-----------|---------|------------|\n", "| 1 | \"13\" | 8 | 59306.91 |\n", "⋮\n", "| 6 | \"30\" | 13 | 92703.39 |\n", "| 7 | \"31\" | 9 | 69434.60 |" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@query(\n", " customer\n", " :define(\n", " cntrycode => substr(phone, 1, 2))\n", " :filter(\n", " cntrycode in IS &&\n", " acctbal > 0_usd)\n", " :filter(\n", " !exists(order) &&\n", " acctbal > mean(and_around.acctbal))\n", " :group(cntrycode)\n", " :select(\n", " cntrycode,\n", " numcust => count(customer),\n", " totacctbal => sum(customer.acctbal)),\n", " IS=[\"13\", \"31\", \"23\", \"29\", \"30\", \"18\", \"17\"])" ] } ], "metadata": { "kernelspec": { "display_name": "Julia 0.5.0-dev", "language": "julia", "name": "julia-0.5" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.5.0" } }, "nbformat": 4, "nbformat_minor": 0 }