{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [ "s1", "content", "l1" ] }, "source": [ "# SQL in Python.\n", "\n", "As mentioned in earlier lessons, Python is very flexible and has a wide range of libraries and third-party modules to support many operations. SQL (Structured Query Language) can be executed from within Python using sqlite3. The sqlite3 module offers support to connect to an external database and execute SQL queries. However, this module does not offer the complete querying capability of a typical SQL engine and functions as a light-weight API version of the querying engine. Other modules like MySQLdb (same as mysql-python), offer a more extensive range of functions and query processing abilities.\n", "\n", "We will be discussing sqlite3 module, as it is the widely used. Though it is a light-weight module, it supports almost all basic sql operations and can be implemented for a database of up to 140 Terabytes in size.\n", "\n", "## Connect() Method - Connecting to the Database \n", "\n", "The first step in executing SQL through Python is connecting to an external database file. The 'connect' method in the sqlite module helps to create a connection with an external database. The method accepts the name of the external database as argument. We can also create a database in-memory by passing \":memory:\" as the argument, however care needs to be taken as this consumes RAM.\n", "\n", "The connection is stored as a connection object. Methods like cursor, commit, close, rollback, execute, create_function, etc., can be called on the connection object. In order to learn the full-range of methods and their descriptions, please refer to the sqlite module documentation (Link: https://docs.python.org/2/library/sqlite3.html#module-sqlite3).\n", "\n", "### Exercise\n", "\n", "Data History: As part of this exercise, we will work with the 'Murders' data. This dataset consists of the number of murders committed in the given metropolitan city. The data for two years, 2014 and 2015 is given, along with a column which shows the change in number of murders.\n", "\n", "Use the connect method to establish a connection with a database file created within memory (Hint: Use :memory: to create an in-memory database)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s1", "ce", "l1" ] }, "outputs": [], "source": [ "import sqlite3\n", "\n", "murcon = sqlite3.connect(':memory:')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s1", "hint", "l1" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s1", "ans", "l1" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s1", "hid", "l1" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " if str(type(murcon)) == \"\":\n", " ref_tmp_var = True\n", " else:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s2", "content", "l2" ] }, "source": [ "## pandas to_sql() - Copying an Existing Data Set\n", "\n", "New tables can be created in a database and data can be inserted through queries, or, data can also be integrated from an existing source table. The 'Murders' data set can be integrated into the murdersdb database using to_sql method of pandas module. There are two steps to do this:
\n", "
1) Load the data from the source file into a pandas data frame\n", "
2) Use the to_sql method to copy the data from the data frame into a new/existing table\n", "
\n", "\n", "### Exercise\n", "\n", "Load the murders data from source file into a table called 'murderstable' in the murdersdb database" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s2", "ce", "l2" ] }, "outputs": [], "source": [ "import csv\n", "import pandas as pd\n", "\n", "murdersdf = pd.read_csv('https://raw.githubusercontent.com/colaberry/538data/master/murder_2016/murder_2015_final.csv')\n", "murdersdf.head(5)\n", "# renaming columns as SQL columns cannot start with a number\n", "murdersdf.columns = ['city','state','murders_2014','murders_2015','change']\n", "murdersdf.to_sql(name='murderstable',con=murcon,if_exists='replace',index=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s2", "hint", "l2" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s2", "ans", "l2" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s2", "hid", "l2" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " if len(murdersdf.columns) == len(['city','state','murders_2014','murders_2015','change']):\n", " ref_tmp_var = True\n", " else:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s3", "content", "l3" ] }, "source": [ "## Cursor() method\n", "\n", "The cursor method can be called on the connection object. This method directs control for carrying out operations through the connection. Once the cursor object is created, it can be used to carry out querying operations by using methods like execute, executemany, executeall, fetchone, fetchmany, fetchall, rowcount, etc.
\n", "
1) Execute: SQL queries can be passed as arguments to this method for execution\n", "
2) fetchone/many/all: this method collects the output of the SQL query, one/many/all rows at a time, and prints them out
\n", "
\n", "### Exercise (a)\n", "\n", "Create a cursor object on the connection created previously. Use the cursor object to execute a 'Select' query to show (use fetchall method to store the query result in a variable 'queryone') the first 5 rows of the table created in the previous exercise. Print 'queryone' variable to see the result." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s3", "ce", "l3" ] }, "outputs": [], "source": [ "murcur = murcon.cursor()\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s3", "hint", "l3" ] }, "outputs": [], "source": [ "# You can use .execute method on a cursor object to execute a SELECT query. Use LIMIT to print first 5 rows." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s3", "ans", "l3" ] }, "outputs": [], "source": [ "murcur.execute(\"SELECT * FROM murderstable LIMIT 5\")\n", "queryone = murcur.fetchall()\n", "print(queryone)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s3", "hid", "l3" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", "\n", " test = [('Baltimore', 'Maryland', 211, 344, 133),\n", " ('Chicago', 'Illinois', 411, 478, 67),\n", " ('Houston', 'Texas', 242, 303, 61),\n", " ('Cleveland', 'Ohio', 63, 120, 57),\n", " ('Washington', 'D.C.', 105, 162, 57)]\n", "\n", " if test == queryone:\n", " ref_assert_var = True\n", " ref_tmp_var = True\n", " else:\n", " ref_assert_var = False\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s4", "content", "l4" ] }, "source": [ "There is an alternate way to execute a query using the read_sql_query() method from pandas. The method can be called on a pandas instance and takes two main arguments - the SQL query to be executed on the data set and the connection object which connects the database.\n", "\n", "### Exercise (b)\n", "\n", "Use the read_sql_query() method on the 'murdersdf' data frame to print the same output as Exercise (a)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s4", "ce", "l4" ] }, "outputs": [], "source": [ "#" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s4", "hint", "l4" ] }, "outputs": [], "source": [ "# The read_sql_query function can be called as 'pd.read_sql_query()'.\n", "# It has two main arguments, the SQL query to be executed and the database connection object." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s4", "ans", "l4" ] }, "outputs": [], "source": [ "print(pd.read_sql_query(\"SELECT * FROM murderstable LIMIT 5\",murcon))" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s4", "hid", "l4" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " test = [('Baltimore', 'Maryland', 211, 344, 133),\n", " ('Chicago', 'Illinois', 411, 478, 67),\n", " ('Houston', 'Texas', 242, 303, 61),\n", " ('Cleveland', 'Ohio', 63, 120, 57),\n", " ('Washington', 'D.C.', 105, 162, 57)]\n", " \n", " testdf = pd.DataFrame(test, columns=['city','state','murders_2014','murders_2015','change'])\n", "\n", " if testdf.equals(pd.read_sql_query(\"SELECT * FROM murderstable LIMIT 5\",murcon)):\n", " ref_assert_var = True\n", " ref_tmp_var = True\n", " else:\n", " ref_assert_var = False\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s5", "content", "l5" ] }, "source": [ "## CRUD Operations\n", "\n", "Create, read, update and delete opearations performed on databases are oftened referred to as CRUD operations. For details on framing SQL queries for these operations please refer to any online SQL tutorial.\n", "\n", "The aim of the following set of exercises is to familiarize us with CRUD operations performed through python. We will the murders data set when needed.\n", "\n", "### Creating a table\n", "\n", "### Exercise\n", "\n", "Create a table with table name as 'murderstabletwo' and the same columns as the murderstable\n", "\n", "Remember: Use triple quotes to enclose arguments which span multiple-lines" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s5", "ce", "l5" ] }, "outputs": [], "source": [ "try:\n", " murcur.execute(\"\"\"CREATE TABLE murderstabletwo (\n", " city TEXT,\n", " state TEXT,\n", " murders_2014 INTEGER,\n", " murders_2015 INTEGER,\n", " change INTEGER)\"\"\")\n", "except Exception as e:\n", " print(e)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s5", "hint", "l5" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s5", "ans", "l5" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s5", "hid", "l5" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "ref_tmp_var = True\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s6", "content", "l6" ] }, "source": [ "We have previously executed a SELECT query to read some data from the newly created table. That is an example of a read query.\n", "\n", "### Reading a table\n", "\n", "### Exercise\n", "\n", "Execute a read query, which reads the contents of the murderstable and sorts the output rows in a descending order of 2015 murders. Store the first five rows of this output in a variable ('topfive') and print it out." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s6", "ce", "l6" ] }, "outputs": [], "source": [ "#" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s6", "hint", "l6" ] }, "outputs": [], "source": [ "# use \"select\" query to read and fetchall to print." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s6", "ans", "l6" ] }, "outputs": [], "source": [ "murcur.execute(\"SELECT * FROM murderstable ORDER BY murders_2015 DESC LIMIT 5\")\n", "topfive = murcur.fetchall()\n", "print(topfive)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s6", "hid", "l6" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " test = [('Chicago', 'Illinois', 411, 478, 67), ('New York', 'New York', 333, 352, 19), ('Baltimore', 'Maryland', 211, 344, 133), ('Houston', 'Texas', 242, 303, 61), ('Detroit', 'Michigan', 298, 295, -3)]\n", "\n", " if test == topfive:\n", " ref_assert_var = True\n", " ref_tmp_var = True\n", " else:\n", " ref_assert_var = False\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true, "tags": [ "s7", "content", "l7" ] }, "source": [ "### Inserting into a table\n", "\n", "### Exercise\n", "\n", "Insert the top five rows of data in the 'topfive' list, into the second table ('murderstabletwo') created previously. Use the 'executemany' function in order to insert multiple records using a single query. Retrieve the contents of the 'murderstabletwo' using a SELECT query and store the output into a variable 'querytwo'. Print out 'querytwo'." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s7", "ce", "l7" ] }, "outputs": [], "source": [ "#" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s7", "hint", "l7" ] }, "outputs": [], "source": [ "# In this case, executemany() method takes two arguments, the sql query and the data frame which contains values of each row.\n", "# Use '?' as placeholder for values within the query. This is a more secure way of passing arguments as is explained below." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s7", "ans", "l7" ] }, "outputs": [], "source": [ "murcur.executemany(\"INSERT INTO murderstabletwo VALUES (?,?,?,?,?)\",topfive)\n", "murcur.execute(\"SELECT * FROM murderstabletwo\")\n", "querytwo = murcur.fetchall()\n", "print(querytwo)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s7", "hid", "l7" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " ref_assert_var = False\n", " test = [('Chicago', 'Illinois', 411, 478, 67), ('New York', 'New York', 333, 352, 19), ('Baltimore', 'Maryland', 211, 344, 133), ('Houston', 'Texas', 242, 303, 61), ('Detroit', 'Michigan', 298, 295, -3)]\n", " if querytwo:\n", " ref_assert_var = True\n", " for i in test:\n", " if not i in querytwo:\n", " ref_assert_var = False\n", " break\n", " if ref_assert_var:\n", " ref_tmp_var = True\n", " else:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", " else:\n", " print('querytwo is empty, please follow instructions.')\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s8", "content", "l8" ] }, "source": [ "We used '?' as a placeholder in the above query. This is because, using %s to pass arguments to the SQL execute method is insecure, as the string input is taken in as entered, and it may contain SQL key words which may perform unintentional actions and lead to a SQL injection attack. Using ? ensures that the user input is sanitized and processed in the query.\n", "\n", "### Updating a row in the table\n", "\n", "### Exercise\n", "\n", "Make 'Chicago' the safest city. Update the 2015 murders value for Chicago as zero. Retreive the updated contents of 'murderstabletwo' and store them in a variable 'querythree'. Print 'querythree'." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s8", "ce", "l8" ] }, "outputs": [], "source": [ "#" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s8", "hint", "l8" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s8", "ans", "l8" ] }, "outputs": [], "source": [ "murcur.execute(\"UPDATE murderstabletwo SET murders_2015=0 WHERE city='Chicago'\")\n", "murcur.execute(\"SELECT * FROM murderstabletwo\")\n", "querythree = murcur.fetchall()\n", "print(querythree)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s8", "hid", "l8" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "try:\n", " test = [('Chicago', 'Illinois', 411, 0, 67),\n", " ('New York', 'New York', 333, 352, 19),\n", " ('Baltimore', 'Maryland', 211, 344, 133),\n", " ('Houston', 'Texas', 242, 303, 61),\n", " ('Detroit', 'Michigan', 298, 295, -3)]\n", "\n", " if querythree:\n", " ref_assert_var = True\n", " for i in test:\n", " if not i in querythree:\n", " ref_assert_var = False\n", " break\n", " if ref_assert_var:\n", " ref_tmp_var = True\n", " else:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", " else:\n", " print('querythree is empty, please follow instructions.')\n", "\n", "except Exception:\n", " print('Please follow the instructions given and use the same variables provided in the instructions.')\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s9", "content", "l9" ] }, "source": [ "### Deleting Rows and Dropping Tables\n", "\n", "Deleting rows is the same as Update query. As in SQL, deleting only deletes the data and does not delete the table itself. 'Drop' is the command that can be used to delete a table\n", "\n", "### Exercise\n", "\n", "Delete the table 'murderstabletwo' and 'murderstable'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s9", "ce", "l9" ] }, "outputs": [], "source": [ "try:\n", " murcur.execute(\"DROP TABLE murderstable\")\n", " murcur.execute(\"DROP TABLE murderstabletwo\")\n", "except Exception as e:\n", " print(e)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s9", "hint", "l9" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s9", "ans", "l9" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s9", "hid", "l9" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "ref_tmp_var = True\n", "\n", "assert ref_tmp_var" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "s10", "content", "l10" ] }, "source": [ "## Commit, rollback and close\n", "\n", "The operations performed to the database can all be saved by calling the commit method on the connection object. Another method 'rollback' can rollback all the changes done to the database after the last commit. 'Close' method called on the connection object closes the connection to the database.\n", "\n", "### Exercise\n", "\n", "Rollback changes made to the database. Commit the changes and close the connection." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "s10", "ce", "l10" ] }, "outputs": [], "source": [ "try:\n", " murcon.rollback()\n", " murcon.commit()\n", " murcon.close()\n", "except Exception as e:\n", " print(e)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s10", "hint", "l10" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s10", "ans", "l10" ] }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true, "tags": [ "s10", "hid", "l10" ] }, "outputs": [], "source": [ "ref_tmp_var = False\n", "\n", "ref_tmp_var = True\n", "\n", "assert ref_tmp_var" ] } ], "metadata": { "celltoolbar": "Tags", "executed_sections": [], "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }