{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Extract The Data\n",
"\n",
"We've learned how to create a database, insert data and then update or delete it.\n",
"\n",
"We've also seen how to view the content of a table using pandas.\n",
"\n",
"But we've only seen how to display the entire contents of a single table. What if we only want a subset of the records? What if we want to extract a set based on records in more than one table?\n",
"\n",
"It's time to learn about the SELECT statement.\n",
"\n",
"First, let's connect to our database, empty both tables and create records for three flights.\n",
"\n",
"We'll need to enable foreign keys again as this setting applies to the connection:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" name | \n",
" country_code | \n",
" latitude | \n",
" longitude | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" hab1 | \n",
" GB | \n",
" 51.50722 | \n",
" 0.00000 | \n",
"
\n",
" \n",
" 1 | \n",
" hab2 | \n",
" GB | \n",
" 53.46667 | \n",
" -2.23333 | \n",
"
\n",
" \n",
" 2 | \n",
" hab3 | \n",
" US | \n",
" 40.76480 | \n",
" -73.98080 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" name country_code latitude longitude\n",
"0 hab1 GB 51.50722 0.00000\n",
"1 hab2 GB 53.46667 -2.23333\n",
"2 hab3 US 40.76480 -73.98080"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import sqlalchemy as sa\n",
"import pandas as pd\n",
"engine = sa.create_engine('sqlite:///flight.db')\n",
"connection = engine.connect()\n",
"connection.execute(\"PRAGMA foreign_keys=on\")\n",
"connection.execute(\"DELETE FROM flights\")\n",
"\n",
"sql = \"\"\"\n",
" INSERT INTO flights(name, country_code, latitude, longitude)\n",
" VALUES\n",
" ('hab1', 'GB', 51.50722, 0),\n",
" ('hab2', 'GB', 53.46667, -2.23333),\n",
" ('hab3', 'US', 40.7648, -73.9808)\n",
"\"\"\"\n",
"connection.execute(sql)\n",
"pd.read_sql('flights', connection)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And then generate 20 randomised readings for each flight:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"from collections import namedtuple\n",
"from random import uniform, randint\n",
"\n",
"Reading = namedtuple('Reading', 'flight, ts, temp, pressure, humidity')\n",
"\n",
"readings = [\n",
" Reading(\n",
" flight=flight,\n",
" ts=f'2015-01-01 09:{str(i+1).zfill(2)}:00',\n",
" temp=round(uniform(23, 27), 1),\n",
" pressure=randint(1020, 1025),\n",
" humidity=randint(30, 50))\n",
" for flight in ['hab1', 'hab2', 'hab3']\n",
" for i in range(20)\n",
"]\n",
"\n",
"sql = \"\"\"\n",
" INSERT INTO readings\n",
" (flight, ts, temp, pressure, humidity)\n",
" VALUES\n",
" (?, ?, ?, ?, ?)\n",
"\"\"\"\n",
"\n",
"for reading in readings:\n",
" values = (reading.flight, reading.ts, reading.temp, reading.pressure, reading.humidity)\n",
" connection.execute(sql, values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's extract and display the readings for flight hab2.\n",
"\n",
"We've seen previously that we can pass a table name to `read_sql`. Now, we'll see that we can also pass a SELECT statement:\n",
"\n",
"```sql\n",
"SELECT flight, ts, temp, pressure, humidity\n",
"FROM readings\n",
"WHERE flight = 'hab2'\n",
"```\n",
"\n",
"The SELECT clause defines the columns we want to see.\n",
"\n",
"The FROM clause defines the table(s) we want to query.\n",
"\n",
"We've seen the WHERE clause used previously in INSERT, UPDATE and DELETE statements. It applies a filter for us to define the records we want returned.\n",
"\n",
"Let's run that query:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" flight | \n",
" ts | \n",
" temp | \n",
" pressure | \n",
" humidity | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" hab2 | \n",
" 2015-01-01 09:01:00 | \n",
" 25.9 | \n",
" 1023 | \n",
" 40 | \n",
"
\n",
" \n",
" 1 | \n",
" hab2 | \n",
" 2015-01-01 09:02:00 | \n",
" 25.4 | \n",
" 1025 | \n",
" 40 | \n",
"
\n",
" \n",
" 2 | \n",
" hab2 | \n",
" 2015-01-01 09:03:00 | \n",
" 24.2 | \n",
" 1025 | \n",
" 49 | \n",
"
\n",
" \n",
" 3 | \n",
" hab2 | \n",
" 2015-01-01 09:04:00 | \n",
" 24.8 | \n",
" 1024 | \n",
" 48 | \n",
"
\n",
" \n",
" 4 | \n",
" hab2 | \n",
" 2015-01-01 09:05:00 | \n",
" 25.4 | \n",
" 1025 | \n",
" 31 | \n",
"
\n",
" \n",
" 5 | \n",
" hab2 | \n",
" 2015-01-01 09:06:00 | \n",
" 24.8 | \n",
" 1023 | \n",
" 42 | \n",
"
\n",
" \n",
" 6 | \n",
" hab2 | \n",
" 2015-01-01 09:07:00 | \n",
" 24.1 | \n",
" 1022 | \n",
" 48 | \n",
"
\n",
" \n",
" 7 | \n",
" hab2 | \n",
" 2015-01-01 09:08:00 | \n",
" 25.3 | \n",
" 1024 | \n",
" 42 | \n",
"
\n",
" \n",
" 8 | \n",
" hab2 | \n",
" 2015-01-01 09:09:00 | \n",
" 25.3 | \n",
" 1025 | \n",
" 34 | \n",
"
\n",
" \n",
" 9 | \n",
" hab2 | \n",
" 2015-01-01 09:10:00 | \n",
" 24.2 | \n",
" 1021 | \n",
" 33 | \n",
"
\n",
" \n",
" 10 | \n",
" hab2 | \n",
" 2015-01-01 09:11:00 | \n",
" 25.4 | \n",
" 1022 | \n",
" 44 | \n",
"
\n",
" \n",
" 11 | \n",
" hab2 | \n",
" 2015-01-01 09:12:00 | \n",
" 23.5 | \n",
" 1022 | \n",
" 34 | \n",
"
\n",
" \n",
" 12 | \n",
" hab2 | \n",
" 2015-01-01 09:13:00 | \n",
" 24.5 | \n",
" 1020 | \n",
" 37 | \n",
"
\n",
" \n",
" 13 | \n",
" hab2 | \n",
" 2015-01-01 09:14:00 | \n",
" 26.0 | \n",
" 1022 | \n",
" 33 | \n",
"
\n",
" \n",
" 14 | \n",
" hab2 | \n",
" 2015-01-01 09:15:00 | \n",
" 25.4 | \n",
" 1023 | \n",
" 41 | \n",
"
\n",
" \n",
" 15 | \n",
" hab2 | \n",
" 2015-01-01 09:16:00 | \n",
" 23.2 | \n",
" 1025 | \n",
" 39 | \n",
"
\n",
" \n",
" 16 | \n",
" hab2 | \n",
" 2015-01-01 09:17:00 | \n",
" 24.1 | \n",
" 1020 | \n",
" 36 | \n",
"
\n",
" \n",
" 17 | \n",
" hab2 | \n",
" 2015-01-01 09:18:00 | \n",
" 26.8 | \n",
" 1023 | \n",
" 49 | \n",
"
\n",
" \n",
" 18 | \n",
" hab2 | \n",
" 2015-01-01 09:19:00 | \n",
" 23.8 | \n",
" 1023 | \n",
" 38 | \n",
"
\n",
" \n",
" 19 | \n",
" hab2 | \n",
" 2015-01-01 09:20:00 | \n",
" 25.1 | \n",
" 1022 | \n",
" 44 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" flight ts temp pressure humidity\n",
"0 hab2 2015-01-01 09:01:00 25.9 1023 40\n",
"1 hab2 2015-01-01 09:02:00 25.4 1025 40\n",
"2 hab2 2015-01-01 09:03:00 24.2 1025 49\n",
"3 hab2 2015-01-01 09:04:00 24.8 1024 48\n",
"4 hab2 2015-01-01 09:05:00 25.4 1025 31\n",
"5 hab2 2015-01-01 09:06:00 24.8 1023 42\n",
"6 hab2 2015-01-01 09:07:00 24.1 1022 48\n",
"7 hab2 2015-01-01 09:08:00 25.3 1024 42\n",
"8 hab2 2015-01-01 09:09:00 25.3 1025 34\n",
"9 hab2 2015-01-01 09:10:00 24.2 1021 33\n",
"10 hab2 2015-01-01 09:11:00 25.4 1022 44\n",
"11 hab2 2015-01-01 09:12:00 23.5 1022 34\n",
"12 hab2 2015-01-01 09:13:00 24.5 1020 37\n",
"13 hab2 2015-01-01 09:14:00 26.0 1022 33\n",
"14 hab2 2015-01-01 09:15:00 25.4 1023 41\n",
"15 hab2 2015-01-01 09:16:00 23.2 1025 39\n",
"16 hab2 2015-01-01 09:17:00 24.1 1020 36\n",
"17 hab2 2015-01-01 09:18:00 26.8 1023 49\n",
"18 hab2 2015-01-01 09:19:00 23.8 1023 38\n",
"19 hab2 2015-01-01 09:20:00 25.1 1022 44"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sql = \"\"\"\n",
" SELECT flight, ts, temp, pressure, humidity\n",
" FROM readings\n",
" WHERE flight = 'hab2'\n",
"\"\"\"\n",
"pd.read_sql(sql, connection)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try another example where we might be interested only in the temperature of readings taken before 09:15:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" temp | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 25.9 | \n",
"
\n",
" \n",
" 1 | \n",
" 25.4 | \n",
"
\n",
" \n",
" 2 | \n",
" 24.2 | \n",
"
\n",
" \n",
" 3 | \n",
" 24.8 | \n",
"
\n",
" \n",
" 4 | \n",
" 25.4 | \n",
"
\n",
" \n",
" 5 | \n",
" 24.8 | \n",
"
\n",
" \n",
" 6 | \n",
" 24.1 | \n",
"
\n",
" \n",
" 7 | \n",
" 25.3 | \n",
"
\n",
" \n",
" 8 | \n",
" 25.3 | \n",
"
\n",
" \n",
" 9 | \n",
" 24.2 | \n",
"
\n",
" \n",
" 10 | \n",
" 25.4 | \n",
"
\n",
" \n",
" 11 | \n",
" 23.5 | \n",
"
\n",
" \n",
" 12 | \n",
" 24.5 | \n",
"
\n",
" \n",
" 13 | \n",
" 26.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" temp\n",
"0 25.9\n",
"1 25.4\n",
"2 24.2\n",
"3 24.8\n",
"4 25.4\n",
"5 24.8\n",
"6 24.1\n",
"7 25.3\n",
"8 25.3\n",
"9 24.2\n",
"10 25.4\n",
"11 23.5\n",
"12 24.5\n",
"13 26.0"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sql = \"\"\"\n",
" SELECT temp\n",
" FROM readings\n",
" WHERE\n",
" flight = 'hab2'\n",
" AND ts < '2015-01-01 09:15:00'\n",
"\"\"\"\n",
"pd.read_sql(sql, connection)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But what if we wanted to see the first few readings for any flights that took place in the UK?\n",
"\n",
"We'll use a JOIN clause to join the readings and flights tables together\n",
"\n",
"```sql\n",
"SELECT readings.*\n",
"FROM readings JOIN flights\n",
" ON readings.flight = flights.name\n",
"WHERE \n",
" flights.country_code = 'GB'\n",
" AND readings.ts < '2015-01-01 09:05:00'\n",
"```\n",
"\n",
"Here, we're telling the database engine that, for any record in the readings table, it can use the value in the 'flight' column to find a record in the flights table by matching on the 'name' column.\n",
"\n",
"It's common in SQL to use aliases for table names in queries using joins:\n",
"\n",
"```sql\n",
"SELECT r.*\n",
"FROM readings AS r JOIN flights AS f\n",
" ON r.flight = f.name\n",
"WHERE \n",
" f.country_code = 'GB'\n",
" AND r.ts < '2015-01-01 09:05:00'\n",
"```\n",
"\n",
"We can also include fields from both tables in the select clause:\n",
"\n",
"```sql\n",
"SELECT r.flight, r.ts, r.temp, r.pressure, f.longitude, f.latitude\n",
"FROM readings AS r JOIN flights AS f\n",
" ON r.flight = f.name\n",
"WHERE \n",
" f.country_code = 'GB'\n",
" AND r.ts < '2015-01-01 09:05:00'\n",
"```\n",
"\n",
"Let's try running that last example:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" flight | \n",
" ts | \n",
" temp | \n",
" pressure | \n",
" longitude | \n",
" latitude | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" hab1 | \n",
" 2015-01-01 09:01:00 | \n",
" 24.7 | \n",
" 1024 | \n",
" 0.00000 | \n",
" 51.50722 | \n",
"
\n",
" \n",
" 1 | \n",
" hab1 | \n",
" 2015-01-01 09:02:00 | \n",
" 23.0 | \n",
" 1024 | \n",
" 0.00000 | \n",
" 51.50722 | \n",
"
\n",
" \n",
" 2 | \n",
" hab1 | \n",
" 2015-01-01 09:03:00 | \n",
" 24.1 | \n",
" 1025 | \n",
" 0.00000 | \n",
" 51.50722 | \n",
"
\n",
" \n",
" 3 | \n",
" hab1 | \n",
" 2015-01-01 09:04:00 | \n",
" 26.5 | \n",
" 1020 | \n",
" 0.00000 | \n",
" 51.50722 | \n",
"
\n",
" \n",
" 4 | \n",
" hab2 | \n",
" 2015-01-01 09:01:00 | \n",
" 25.9 | \n",
" 1023 | \n",
" -2.23333 | \n",
" 53.46667 | \n",
"
\n",
" \n",
" 5 | \n",
" hab2 | \n",
" 2015-01-01 09:02:00 | \n",
" 25.4 | \n",
" 1025 | \n",
" -2.23333 | \n",
" 53.46667 | \n",
"
\n",
" \n",
" 6 | \n",
" hab2 | \n",
" 2015-01-01 09:03:00 | \n",
" 24.2 | \n",
" 1025 | \n",
" -2.23333 | \n",
" 53.46667 | \n",
"
\n",
" \n",
" 7 | \n",
" hab2 | \n",
" 2015-01-01 09:04:00 | \n",
" 24.8 | \n",
" 1024 | \n",
" -2.23333 | \n",
" 53.46667 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" flight ts temp pressure longitude latitude\n",
"0 hab1 2015-01-01 09:01:00 24.7 1024 0.00000 51.50722\n",
"1 hab1 2015-01-01 09:02:00 23.0 1024 0.00000 51.50722\n",
"2 hab1 2015-01-01 09:03:00 24.1 1025 0.00000 51.50722\n",
"3 hab1 2015-01-01 09:04:00 26.5 1020 0.00000 51.50722\n",
"4 hab2 2015-01-01 09:01:00 25.9 1023 -2.23333 53.46667\n",
"5 hab2 2015-01-01 09:02:00 25.4 1025 -2.23333 53.46667\n",
"6 hab2 2015-01-01 09:03:00 24.2 1025 -2.23333 53.46667\n",
"7 hab2 2015-01-01 09:04:00 24.8 1024 -2.23333 53.46667"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sql = \"\"\"\n",
" SELECT r.flight, r.ts, r.temp, r.pressure, f.longitude, f.latitude\n",
" FROM readings AS r JOIN flights AS f\n",
" ON r.flight = f.name\n",
" WHERE \n",
" f.country_code = 'GB'\n",
" AND r.ts < '2015-01-01 09:05:00'\n",
"\"\"\"\n",
"pd.read_sql(sql, connection)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:sql_python_tutorial]",
"language": "python",
"name": "conda-env-sql_python_tutorial-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}