{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import sqlite3"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 9.1 Reading data from SQL databases"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So far we've only talked about reading data from CSV files. That's a pretty common way to store data, but there are many others! Pandas can read from HTML, JSON, SQL, Excel (!!!), HDF5, Stata, and a few other things. In this chapter we'll talk about reading data from SQL databases.\n",
"\n",
"You can read data from a SQL database using the `pd.read_sql` function. `read_sql` automatically uses the column names from your query as the dataframe's column names.\n",
"\n",
"`read_sql` takes 2 arguments: a `SELECT` statement, and a database connection object. This is great because it means you can read from *any* kind of SQL database -- it doesn't matter if it's MySQL, SQLite, PostgreSQL, or something else.\n",
"\n",
"This example reads from a SQLite database, but any other database would work the same way."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" id date_time temp\n",
"0 1 2012-01-01 00:00:00 -1.8\n",
"1 2 2012-01-01 01:00:00 -1.8\n",
"2 3 2012-01-01 02:00:00 -1.8"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"con = sqlite3.connect(\"../data/weather_2012.sqlite\")\n",
"df = pd.read_sql(\"SELECT * from weather_2012 LIMIT 3\", con)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`read_sql` doesn't automatically set the primary key (`id`) to be the index of the dataframe. You can make it do that by adding an `index_col` argument to `read_sql`. \n",
"\n",
"If you've used `read_csv` a lot, you may have seen that it has an `index_col` argument as well. This one behaves the same way."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" date_time temp\n",
"id \n",
"1 2012-01-01 00:00:00 -1.8\n",
"2 2012-01-01 01:00:00 -1.8\n",
"3 2012-01-01 02:00:00 -1.8"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_sql(\"SELECT * from weather_2012 LIMIT 3\", con, index_col='id')\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want your dataframe to be indexed by more than one column, you can give a list of columns to `index_col`:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" temp\n",
"id date_time \n",
"1 2012-01-01 00:00:00 -1.8\n",
"2 2012-01-01 01:00:00 -1.8\n",
"3 2012-01-01 02:00:00 -1.8"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = pd.read_sql(\"SELECT * from weather_2012 LIMIT 3\", con, \n",
" index_col=['id', 'date_time'])\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 9.2 Writing to a SQLite database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dataframes have a `to_sql` method which creates a database table from a dataframe. With a plain `sqlite3` connection like the one below, this only works for SQLite databases; for other databases you'd pass an SQLAlchemy engine instead. Let's use it to move our 2012 weather data into SQL.\n",
"\n",
"The machinery behind this lives in `pd.io.sql`. There are a ton of useful functions for reading and writing various kinds of data in `pd.io`, and it's worth spending some time exploring them. ([see the documentation!](http://pandas.pydata.org/pandas-docs/stable/io.html))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/lib64/python2.7/site-packages/pandas/io/sql.py:1309: UserWarning: The spaces in these column names will not be changed. In pandas versions < 0.14, spaces were converted to underscores.\n",
" warnings.warn(_SAFE_NAMES_WARNING)\n"
]
}
],
"source": [
"weather_df = pd.read_csv('../data/weather_2012.csv')\n",
"con = sqlite3.connect(\"../data/test_db.sqlite\")\n",
"con.execute(\"DROP TABLE IF EXISTS weather_2012\")\n",
"weather_df.to_sql(\"weather_2012\", con)"
]
},
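{
"cell_type": "markdown",
"metadata": {},
"source": [
"By the way: instead of dropping the table by hand first, you can pass an `if_exists` argument to `to_sql` -- `if_exists=\"replace\"` drops and recreates the table for you. (A quick sketch; check your pandas version's documentation for the exact options.)"
]
},
{
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": [
"weather_df.to_sql(\"weather_2012\", con, if_exists=\"replace\")"
]
},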
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now read from the `weather_2012` table in `test_db.sqlite`, and we see that we get the same data back:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" index Date/Time Temp (C) Dew Point Temp (C) Rel Hum (%) \\\n",
"0 0 2012-01-01 00:00:00 -1.8 -3.9 86 \n",
"1 1 2012-01-01 01:00:00 -1.8 -3.7 87 \n",
"2 2 2012-01-01 02:00:00 -1.8 -3.4 89 \n",
"\n",
" Wind Spd (km/h) Visibility (km) Stn Press (kPa) Weather \n",
"0 4 8 101.24 Fog \n",
"1 4 8 101.24 Fog \n",
"2 7 4 101.26 Freezing Drizzle,Fog "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"con = sqlite3.connect(\"../data/test_db.sqlite\")\n",
"df = pd.read_sql(\"SELECT * from weather_2012 LIMIT 3\", con)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The nice thing about having your data in a database is that you can do arbitrary SQL queries. This is especially cool if you're more familiar with SQL than with pandas. Here's an example of sorting by the `Weather` column:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
" index Date/Time Temp (C) Dew Point Temp (C) Rel Hum (%) \\\n",
"0 67 2012-01-03 19:00:00 -16.9 -24.8 50 \n",
"1 114 2012-01-05 18:00:00 -7.1 -14.4 56 \n",
"2 115 2012-01-05 19:00:00 -9.2 -15.4 61 \n",
"\n",
" Wind Spd (km/h) Visibility (km) Stn Press (kPa) Weather \n",
"0 24 25 101.74 Clear \n",
"1 11 25 100.71 Clear \n",
"2 7 25 100.80 Clear "
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"con = sqlite3.connect(\"../data/test_db.sqlite\")\n",
"df = pd.read_sql(\"SELECT * from weather_2012 ORDER BY Weather LIMIT 3\", con)\n",
"df"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you have a PostgreSQL database or MySQL database, reading from it works exactly the same way as reading from a SQLite database. You create a connection using `psycopg2.connect()` or `MySQLdb.connect()`, and then use\n",
"\n",
"`pd.read_sql(\"SELECT whatever from your_table\", con)`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 9.3 Connecting to other kinds of databases"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To connect to a MySQL database:\n",
"\n",
"*Note: For these to work, you will need a running MySQL / PostgreSQL database, with the correct host, database name, etc.*"
]
},
{
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": [
"import MySQLdb\n",
"con = MySQLdb.connect(host=\"localhost\", db=\"test\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To connect to a PostgreSQL database:"
]
},
{
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": [
"import psycopg2\n",
"con = psycopg2.connect(host=\"localhost\")"
]
},
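{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you've got a connection, reading works exactly like it did with SQLite. Here's a sketch -- the database and table names are made up, so substitute your own:"
]
},
{
"cell_type": "raw",
"metadata": {
"collapsed": false
},
"source": [
"import psycopg2\n",
"con = psycopg2.connect(host=\"localhost\", dbname=\"test\")\n",
"df = pd.read_sql(\"SELECT * from your_table LIMIT 3\", con)"
]
},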
{
"cell_type": "markdown",
"metadata": {},
"source": [
"