{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#Writing SQL Queries in Python: The Basics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We started you off by having you write SQL queries in PostgreSQL's interactive shell (`psql`). Which is cool, but you're probably wondering: \"Hey how do I actually get information from a SQL database into my Python programs? That's where I can do some real damage.\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Python PostgreSQL libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to connect to a SQL database and make queries in Python, we need to use a *database adapter library*. Such libraries provide the functionality needed to connect to a database, send SQL statements, and receive responses, taking all necessary steps to ensure that (e.g.) SQL types are converted safely to Python types, etc.\n", "\n", "Each RDBMS has several different adapter libraries. For this tutorial, we're going to use a library called [psycopg2](http://initd.org/psycopg/docs/index.html). There are several others (including [pg8000](https://github.com/mfenniak/pg8000)), but the main benefits of psycopg2 is that it comes pre-installed with Anaconda. (And it's based on a C library, so it's pretty fast.)\n", "\n", "This library assumes that you've completed the tutorial in the [SQL introduction](04_SQL_Part1.md) notes, and that you have a PostgreSQL server with a copy of the [MONDIAL database](http://www.dbis.informatik.uni-goettingen.de/Mondial/) running on your computer on the default port." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Connecting to a database with psycopg2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When using a SQL server from Python, you'll be working with several different kinds of objects:\n", "\n", "* A *connection object*, which gives you access to the server; and\n", "* *Cursor objects*, which you use to make SQL queries and retrieve data returned from those queries.\n", "\n", "To create a connection object, call `psycopg2`'s `connect()` function:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "import psycopg2\n", "conn = psycopg2.connect(database=\"mondial\")\n", "print type(conn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `connect()` function takes a number of named parameters. The only one we're using here is `database`, which specifies which database to connect to. If we were attempting to connect to a PostgreSQL server on someone else's machine, we might need to use the `host` and `port` parameters. [Consult the documentation](http://initd.org/psycopg/docs/module.html#psycopg2.connect) for more information." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Making a query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have an open connection, let's attempt to query the database. To perform a query, we first need a cursor object, which we can create using the connection object's `.cursor()` method:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "cursor = conn.cursor()\n", "print type(cursor)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cursor object has several methods of interest to us. The first and most important is `.execute()`, which takes a SQL statement (in a Python string) as a parameter:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cursor.execute(\"SELECT name, length FROM river WHERE length > 4000\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.execute()` performs the query, but doesn't evaluate to anything. After calling `.execute()`, you can call the cursor's `.fetchone()` method to get the first row returned from the query:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "('Irtysch', Decimal('4248'))" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.fetchone()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Subsequent requests to `.fetchone()` will return subsequent rows:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "('Jenissej', Decimal('4092'))" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.fetchone()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To retrieve all of the rows returned from a query, you can use the cursor object in a `for` loop, like so:" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('Irtysch', Decimal('4248'))\n", "('Jenissej', Decimal('4092'))\n", "('Lena', Decimal('4400'))\n", "('Hwangho', Decimal('4845'))\n", "('Jangtse', Decimal('6380'))\n", "('Mekong', Decimal('4350'))\n", "('Missouri', Decimal('4130'))\n", "('Amazonas', Decimal('6448'))\n", "('Niger', Decimal('4184'))\n", "('Zaire', Decimal('4374'))\n" ] } ], "source": [ "cursor.execute(\"SELECT name, length FROM river WHERE length > 4000\")\n", "for row in cursor:\n", " print row" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling the `.fetchone()` method, or iterating over the cursor object, yields a series of n-tuples, with one element in the tuple per field requested in the query." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Interpolating values in a query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say that we're starting with data from some other source, say, a list of cities whose population we're interested in." ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "collapsed": true }, "outputs": [], "source": [ "cities_of_interest = ['Paris', 'Nairobi', 'Buenos Aires', 'Kyoto']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to perform queries on the MONDIAL database to get the population for each of these cities. Somehow, we need to *build* a series of SQL queries in Python that include the names in the list.\n", "\n", "You might think that you could simply do something like this:" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SELECT population FROM city WHERE name = 'Paris'\n" ] } ], "source": [ "query = \"SELECT population FROM city WHERE name = '\" + cities_of_interest[0] + \"'\";\n", "print query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This looks good, until you have a name with problematic punctuation:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SELECT population FROM city WHERE name = 'Martha's Vineyard'\n" ] } ], "source": [ "problematic_city = \"Martha's Vineyard\"\n", "query = \"SELECT population FROM city WHERE name = '\" + problematic_city + \"'\"\n", "print query" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See the trouble? The apostrophe in the name of the city made its way into our query string. This query would be a syntax error in SQL, since SQL will believe the string to have ended at the apostrophe in `Martha's`. Troublesome!\n", "\n", "To solve this problem, the cursor object's `.execute()` method comes with a built-in means of interpolating values into queries. Simply put `%s` in your query string wherever you want to insert a value, and then pass as a second parameter to `.execute()` a list of values that you want to be included in the query:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cursor.execute(\"SELECT population FROM city WHERE name = %s\",\n", " [\"Martha's Vineyard\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Psycopg2 will take care of the nasty business of quoting your string for you, and you'll be protected from [SQL injection attacks](https://en.wikipedia.org/wiki/SQL_injection).\n", "\n", "Here's a complete example, iterating over a list of cities and getting the population for each one:" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Paris 2152423\n", "Nairobi 1346000\n", "Buenos Aires 2988006\n", "Kyoto 1415000\n" ] } ], "source": [ "for city_name in cities_of_interest:\n", " cursor.execute(\"SELECT population FROM city WHERE name = %s\",\n", " [city_name])\n", " population = cursor.fetchone()[0] # fetchone() returns a tuple w/1 val\n", " print city_name, population" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##Coping with errors\n", "\n", "Psycopg2 is very persnickety about errors. If you have a syntax error, like so:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "collapsed": false }, "outputs": [ { "ename": "ProgrammingError", "evalue": "syntax error at or near \"SMELLECT\"\nLINE 1: SMELLECT * FORM cheese WERE stink > 15 ODOR DESC\n ^\n", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mProgrammingError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mcursor\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcursor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mcursor\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"SMELLECT * FORM cheese WERE stink > 15 ODOR DESC\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mProgrammingError\u001b[0m: syntax error at or near \"SMELLECT\"\nLINE 1: SMELLECT * FORM cheese WERE stink > 15 ODOR DESC\n ^\n" ] } ], "source": [ "cursor = conn.cursor()\n", "cursor.execute(\"SMELLECT * FORM cheese WERE stink > 15 ODOR DESC\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "... you'll get a syntax error as expected. But then subsequent attempts to use the cursor will frustratingly fail:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "collapsed": false }, "outputs": [ { "ename": "InternalError", "evalue": "current transaction is aborted, commands ignored until end of transaction block\n", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mInternalError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mcursor\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"SELECT population FROM city WHERE name = 'Paris'\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mInternalError\u001b[0m: current transaction is aborted, commands ignored until end of transaction block\n" ] } ], "source": [ "cursor.execute(\"SELECT population FROM city WHERE name = 'Paris'\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The way to fix this problem is to close the connection and re-open it, or simply call the connection object's `rollback` method:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "collapsed": true }, "outputs": [], "source": [ "conn.rollback()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now your queries can proceed as planned:" ] }, { "cell_type": "code", "execution_count": 59, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "(Decimal('2152423'),)" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cursor.execute(\"SELECT population FROM city WHERE name = 'Paris'\")\n", "cursor.fetchone()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##More information\n", "\n", "For more information, consult [psychopg2's documentation](http://initd.org/psycopg/docs/index.html)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }