{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Aggregation\n", "\n", "SQL's SELECT statement can also perform aggregation on sets of records.\n", "\n", "Let's connect to our database:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sqlalchemy as sa\n", "import pandas as pd\n", "engine = sa.create_engine('sqlite:///flight.db')\n", "connection = engine.connect()\n", "connection.execute(\"PRAGMA foreign_keys=on\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We're expecting have three records in our flights table and sixty readings.\n", "\n", "Let's see if that's correct:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
COUNT(*)
03
\n", "
" ], "text/plain": [ " COUNT(*)\n", "0 3" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_sql(\"SELECT COUNT(*) FROM flights\", connection)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
COUNT(*)
060
\n", "
" ], "text/plain": [ " COUNT(*)\n", "0 60" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.read_sql(\"SELECT COUNT(*) FROM readings\", connection)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "COUNT is one of SQL's aggregation functions. We also have SUM, MAX, MIN and AVG available.\n", "\n", "Let's try some examples:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MAX(pressure)
01025
\n", "
" ], "text/plain": [ " MAX(pressure)\n", "0 1025" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = \"\"\"\n", " SELECT MAX(pressure)\n", " FROM readings \n", " WHERE flight = 'hab1'\n", "\"\"\"\n", "pd.read_sql(sql, connection)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AVG(temp)
025.485
\n", "
" ], "text/plain": [ " AVG(temp)\n", "0 25.485" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = \"\"\"\n", " SELECT AVG(temp)\n", " FROM readings\n", " WHERE flight = 'hab3'\n", "\"\"\"\n", "pd.read_sql(sql, connection)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also group our record set before aggregation using a GROUP BY clause.\n", "\n", "Let's check that our sixty readings is comprised of twenty for each flight:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
flightCOUNT(*)
0hab120
1hab220
2hab320
\n", "
" ], "text/plain": [ " flight COUNT(*)\n", "0 hab1 20\n", "1 hab2 20\n", "2 hab3 20" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = \"\"\"\n", " SELECT flight, COUNT(*)\n", " FROM readings\n", " GROUP BY flight\n", "\"\"\"\n", "pd.read_sql(sql, connection)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or find the average pressure reading for each flight:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
flightAVG(pressure)
0hab11022.00
1hab21022.95
2hab31022.15
\n", "
" ], "text/plain": [ " flight AVG(pressure)\n", "0 hab1 1022.00\n", "1 hab2 1022.95\n", "2 hab3 1022.15" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = \"\"\"\n", " SELECT flight, AVG(pressure)\n", " FROM readings\n", " GROUP BY flight\n", "\"\"\"\n", "pd.read_sql(sql, connection)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There's nothing to stop us joining multiple tables in an aggregation query.\n", "\n", "We can also use aliases in the column names.\n", "\n", "Let's find the maximum and minimum temperature reading by country:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
country_codemax_tempmin_temp
0GB26.923.0
1US26.823.5
\n", "
" ], "text/plain": [ " country_code max_temp min_temp\n", "0 GB 26.9 23.0\n", "1 US 26.8 23.5" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sql = \"\"\"\n", " SELECT\n", " f.country_code,\n", " MAX(r.temp) AS max_temp,\n", " MIN(r.temp) AS min_temp\n", " FROM readings AS r JOIN flights AS f\n", " ON r.flight = f.name\n", " GROUP BY f.country_code\n", "\"\"\"\n", "pd.read_sql(sql, connection)" ] } ], "metadata": { "kernelspec": { "display_name": "Python [conda env:sql_python_tutorial]", "language": "python", "name": "conda-env-sql_python_tutorial-py" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }