{
 "metadata": {
  "name": ""
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "SQL"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Preface - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This captures all the `SQL` stuff that I wanted collate and have in one place."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import sys\n",
      "import os\n",
      "import numpy as np\n",
      "import pandas as pd\n",
      "import seaborn as sns\n",
      "from pandas.io import sql\n",
      "import matplotlib\n",
      "import matplotlib.pyplot as plt \n",
      "%matplotlib inline"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Basic database and DB connection - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Recall that `SQL` is Declarative! \n",
      "\n",
      "Just, specify what you want in the `select` statement.\n",
      "\n",
      "Recall the lexical ordering for writing queries:\n",
      "\n",
      "* SELECT\n",
      "* FROM\n",
      "* WHERE\n",
      "* GROUP BY\n",
      "* HAVING\n",
      "* UNION\n",
      "* ORDER BY\n",
      "\n",
      "But, the order of execution is different:\n",
      "\n",
      "* `FROM` - Generates the data set (with join).\n",
      "* `WHERE` - Reduces the generated data set.\n",
      "* `GROUPBY` - Aggregates the reduced data set.\n",
      "* `HAVING` - Reduces the aggregated data set.\n",
      "* `SELECT` - Transforms the reduced aggregated data set.\n",
      "* `ORDER` - BY sorts the transformed data set.\n",
      "* `LIMIT` - OFFSET frames the sorted data set.\n"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Connecting - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "To connect normally, I use `psql`.\n",
      "\n",
      "This is simply: \n",
      "\n",
      "`$ psql test`.\n",
      "\n",
      "You then enter the database and can very easily commit changes:\n",
      "\n",
      "`test=# DROP TABLE posts;`"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "SQLAlchemy - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "`sqlalchemy` is nice if we want to talk to the database from `python` directly.\n",
      "  \n",
      "Create db connection using `sqlalchemy`.\n",
      "\n",
      "The return value of `create_engine()` is an instance of `Engine`, and it represents the core interface to the database"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import sqlalchemy\n",
      "from sqlalchemy import create_engine\n",
      "engine = create_engine('postgresql://lmartin@localhost:5432/test')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 3
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "WHERE - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "So, I have a test `postgres` database in which one column is `timestamp`.\n",
      "\n",
      "I can select dates from this column easily (below).\n",
      "\n",
      "Note that there are some critical rules about working with [`timestamp` objects](http://www.postgresql.org/docs/9.0/static/datatype-datetime.html):\n",
      "\n",
      "* The time-zero is specified relative to UTC. \n",
      "* For example, `-07:00` is PST.\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "l='2013-1-01 10:00:00-07:00' # Note that the timezone relative to UTC is specified in the query! \n",
      "u='2013-12-22 17:00:00-07:00'\n",
      "infs=\" 'CMV','ASPERGILLUS','GPR' \"\n",
      "\n",
      "all_tests=sql.read_sql(\"select specimen as s,tested_infection as t \"\n",
      "                \"from Clinical_Table \"\n",
      "                \"WHERE tx_date < '%s' AND tx_date > '%s' AND tested_infection IN (%s) \"%(u,l,infs),engine)\n",
      "all_tests.head(3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>s</th>\n",
        "      <th>t</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> Serum</td>\n",
        "      <td> CMV</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> Serum</td>\n",
        "      <td> CMV</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> Serum</td>\n",
        "      <td> CMV</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 10,
       "text": [
        "       s    t\n",
        "0  Serum  CMV\n",
        "1  Serum  CMV\n",
        "2  Serum  CMV"
       ]
      }
     ],
     "prompt_number": 10
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "JOIN - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Inner join simply entries common to both tables."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Pull data for patient \n",
      "i_join=sql.read_sql(\"select stat_table.patient_id as i  \"\n",
      "                 \"from stat_table \"\n",
      "                 \"INNER JOIN sample_table \"\n",
      "                 \"ON stat_table.patient_id=sample_table.patient_id \",engine)\n",
      "i_join.head(3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>i</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> I10</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> I10</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> I10</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 8,
       "text": [
        "     i\n",
        "0  I10\n",
        "1  I10\n",
        "2  I10"
       ]
      }
     ],
     "prompt_number": 8
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Semi-join filters records based upon data in another table.\n",
      "\n",
      "Note that we are including a `sub-query` in the `WHERE` clause."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "s_join=sql.read_sql(\"select stat_table.patient_id as i \"\n",
      "                \"from stat_table \"\n",
      "                \"WHERE patient_id IN (SELECT sample_table.patient_id FROM sample_table) \",engine)\n",
      "\n",
      "s_join.head(3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>i</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> I10</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> I10</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> I10</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 9,
       "text": [
        "     i\n",
        "0  I10\n",
        "1  I10\n",
        "2  I10"
       ]
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "GROUPBY - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The columns that we define in the `GROUP BY` are the keys that we can use:\n",
      "\n",
      "If you have a `GROUP BY` clause, you may only reference:\n",
      "\n",
      "* (1) columns from that clause \n",
      "* (2) aggregate functions\n",
      "\n",
      "Common aggregate functions that we can apply following `GROUP BY`:\n",
      "\n",
      "http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html\n",
      "\n",
      "For example, we compute the total number of records within each group!\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cd=sql.read_sql(\"select specimen random_name, count(*) AS c \"\n",
      "                \"from Clinical_Table \"\n",
      "                \"GROUP BY random_name \",engine)\n",
      "cd.head(3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>random_name</th>\n",
        "      <th>c</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>          Lumen</td>\n",
        "      <td>  460</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>           Skin</td>\n",
        "      <td>   46</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> Nasopharyngeal</td>\n",
        "      <td> 1762</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 13,
       "text": [
        "      random_name     c\n",
        "0           Lumen   460\n",
        "1            Skin    46\n",
        "2  Nasopharyngeal  1762"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can the `HAVING` clause to filter grouped categories.\n",
      "\n",
      "`ORDER` let's us order the results by a specified column and `LIMIT` simply restricts the output count:\n",
      "\n",
      "* http://blog.jooq.org/2014/12/04/do-you-really-understand-sqls-group-by-and-having-clauses/"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "out=sql.read_sql(\"select specimen random_name, count(*) AS c \"\n",
      "                \"from Clinical_Table \"\n",
      "                \"GROUP BY specimen \" \n",
      "                \"HAVING sum(test_result) > 1 \"\n",
      "                \"order by 2 desc \"\n",
      "                \"limit 3 \",engine)\n",
      "cd"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>random_name</th>\n",
        "      <th>c</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> Serum</td>\n",
        "      <td> 12502</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td>   BAL</td>\n",
        "      <td>  9969</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> Urine</td>\n",
        "      <td>  5823</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 11,
       "text": [
        "  random_name      c\n",
        "0       Serum  12502\n",
        "1         BAL   9969\n",
        "2       Urine   5823"
       ]
      }
     ],
     "prompt_number": 11
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We count the unique number of tested infections per specimen group."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cd=sql.read_sql(\"select specimen s, count(distinct tested_infection) AS counts \"\n",
      "                \"from Clinical_Table \"\n",
      "                \"GROUP BY s \",engine)\n",
      "print cd.shape\n",
      "cd.head(3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "(21, 2)\n"
       ]
      },
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>s</th>\n",
        "      <th>counts</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td>         BAL</td>\n",
        "      <td> 75</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> Bone Marrow</td>\n",
        "      <td>  3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td>         CSF</td>\n",
        "      <td> 53</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 250,
       "text": [
        "             s  counts\n",
        "0          BAL      75\n",
        "1  Bone Marrow       3\n",
        "2          CSF      53"
       ]
      }
     ],
     "prompt_number": 250
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A nice things to recall is that we can partition `timestamp` objects into weeks.\n",
      "\n",
      "There are many useful function we can apply to datetime objecets:\n",
      "\n",
      "* http://www.postgresql.org/docs/8.1/static/functions-datetime.html"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "l='2013-1-01 10:00:00-07:00' # Note that the timezone relative to UTC is specified in the query! \n",
      "u='2013-12-22 17:00:00-07:00'\n",
      "infs=\" 'CMV','ASPERGILLUS','GPR' \"\n",
      "t=sql.read_sql(\" select EXTRACT(WEEK FROM tx_date) as week,tested_infection,specimen,COUNT(*)\"\n",
      "               \" from Clinical_Table \"\n",
      "               \" WHERE tx_date < '%s' AND tx_date > '%s' AND tested_infection in (%s) \"\n",
      "               \" GROUP BY EXTRACT(WEEK FROM tx_date),tested_infection,specimen\"%(u,l,infs),engine)\n",
      "t.head(3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>week</th>\n",
        "      <th>tested_infection</th>\n",
        "      <th>specimen</th>\n",
        "      <th>count</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 43</td>\n",
        "      <td>         CMV</td>\n",
        "      <td>      Tissue</td>\n",
        "      <td>  1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 51</td>\n",
        "      <td> ASPERGILLUS</td>\n",
        "      <td>       Stool</td>\n",
        "      <td>  3</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 35</td>\n",
        "      <td> ASPERGILLUS</td>\n",
        "      <td> Whole Blood</td>\n",
        "      <td> 18</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 265,
       "text": [
        "   week tested_infection     specimen  count\n",
        "0    43              CMV       Tissue      1\n",
        "1    51      ASPERGILLUS        Stool      3\n",
        "2    35      ASPERGILLUS  Whole Blood     18"
       ]
      }
     ],
     "prompt_number": 265
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Window functions - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "A window function:\n",
      "\n",
      "* Performs a calculation across a set of table rows that are somehow related to the current row. \n",
      "* Does not cause rows to become grouped into a single output row.\n",
      "* The rows retain their separate identities.\n",
      "\n",
      "http://postgresguide.com/tips/window.html\n",
      "\n",
      "\n",
      "Ok, so let's assume we want to rank the specimens sampled per week per tested infection!\n",
      "\n",
      "First, recall a basic query below."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "l='2013-1-01 10:00:00-07:00' # Note that the timezone relative to UTC is specified in the query! \n",
      "u='2013-12-22 17:00:00-07:00'\n",
      "infs=\" 'CMV','ASPERGILLUS','GPR' \"\n",
      "t=sql.read_sql(\" select tx_date d,tested_infection t,specimen s\"\n",
      "               \" from Clinical_Table \"\n",
      "               \" WHERE tx_date < '%s' AND tx_date > '%s' AND tested_infection in (%s) \"%(u,l,infs),engine)\n",
      "t.head(3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>d</th>\n",
        "      <th>t</th>\n",
        "      <th>s</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 2013-08-30 00:00:00-07:00</td>\n",
        "      <td> CMV</td>\n",
        "      <td> Serum</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 2013-08-30 00:00:00-07:00</td>\n",
        "      <td> CMV</td>\n",
        "      <td> Serum</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 2013-08-30 00:00:00-07:00</td>\n",
        "      <td> CMV</td>\n",
        "      <td> Serum</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 13,
       "text": [
        "                           d    t      s\n",
        "0  2013-08-30 00:00:00-07:00  CMV  Serum\n",
        "1  2013-08-30 00:00:00-07:00  CMV  Serum\n",
        "2  2013-08-30 00:00:00-07:00  CMV  Serum"
       ]
      }
     ],
     "prompt_number": 13
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can use `window functions` to rank over a certain grouping, as needed.\n",
      "\n",
      "https://robots.thoughtbot.com/postgres-window-functions"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "l='2013-1-01 10:00:00-07:00' # Note that the timezone relative to UTC is specified in the query! \n",
      "u='2013-12-22 17:00:00-07:00'\n",
      "infs=\" 'CMV','ASPERGILLUS','GPR' \"\n",
      "t=sql.read_sql(\" select tested_infection as a,specimen as s,rank() OVER (PARTITION BY tested_infection ORDER BY specimen DESC)\"\n",
      "               \" from Clinical_Table \"\n",
      "               \" WHERE tx_date < '%s' AND tx_date > '%s' AND tested_infection in (%s) \"%(u,l,infs),engine)\n",
      "t.head(5)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>a</th>\n",
        "      <th>s</th>\n",
        "      <th>rank</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> ASPERGILLUS</td>\n",
        "      <td> Whole Blood</td>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> ASPERGILLUS</td>\n",
        "      <td> Whole Blood</td>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> ASPERGILLUS</td>\n",
        "      <td> Whole Blood</td>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> ASPERGILLUS</td>\n",
        "      <td> Whole Blood</td>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td> ASPERGILLUS</td>\n",
        "      <td> Whole Blood</td>\n",
        "      <td> 1</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 12,
       "text": [
        "             a            s  rank\n",
        "0  ASPERGILLUS  Whole Blood     1\n",
        "1  ASPERGILLUS  Whole Blood     1\n",
        "2  ASPERGILLUS  Whole Blood     1\n",
        "3  ASPERGILLUS  Whole Blood     1\n",
        "4  ASPERGILLUS  Whole Blood     1"
       ]
      }
     ],
     "prompt_number": 12
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "It's computed the frequency of items in the group and ranked accordingly."
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "ORMs - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "ORM presents a high level and abstracted pattern of usage assocaited with `sqlalchemy`.\n",
      "\n",
      "When using the ORM, the configurational process starts by describing the database tables we\u2019ll be dealing with.\n",
      "\n",
      "http://docs.sqlalchemy.org/en/rel_0_9/orm/tutorial.html"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sqlalchemy.ext.declarative import declarative_base\n",
      "Base = declarative_base()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 14
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now that we have a \u201cbase\u201d, we can define any number of mapped classes in terms of it:\n",
      "\n",
      "* at a minimum needs a __tablename__ attribute\n",
      "* at least one Column which is part of a primary key\n",
      "\n",
      "Here's a new table `users`."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sqlalchemy import Column, Integer, String\n",
      "\n",
      "class User(Base):\n",
      "    __tablename__ = 'users'\n",
      "    id = Column(Integer, primary_key=True)\n",
      "    name = Column(String)\n",
      "    fullname = Column(String)\n",
      "    password = Column(String)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 16
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "With our User class constructed via the Declarative system, we have defined information about our table, known as table metadata. "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "User.__table__"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 17,
       "text": [
        "Table('users', MetaData(bind=None), Column('id', Integer(), table=<users>, primary_key=True, nullable=False), Column('name', String(), table=<users>), Column('fullname', String(), table=<users>), Column('password', String(), table=<users>), schema=None)"
       ]
      }
     ],
     "prompt_number": 17
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      " As our database does not actually have a users table present, we can use MetaData to issue CREATE TABLE.\n",
      " \n",
      " Now the table exists."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "Base.metadata.create_all(engine) "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 23
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We\u2019re now ready to start talking to the database."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sqlalchemy.orm import sessionmaker\n",
      "Session = sessionmaker(bind=engine)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 18
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Whenever you need to have a conversation with the database, you instantiate a Session."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "session = Session()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 24
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Whenever you need to have a conversation with the database, you instantiate a Session."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "ed_user = User(name='ed', fullname='Ed Jones', password='edspassword')\n",
      "session.add(ed_user)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 25
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "session.add_all([\n",
      "    User(name='wendy', fullname='Wendy Williams', password='foobar'),\n",
      "    User(name='mary', fullname='Mary Contrary', password='xxg527'),\n",
      "    User(name='fred', fullname='Fred Flinstone', password='blah')])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 26
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We tell the Session that we\u2019d like to issue all remaining changes to the database and commit the transaction."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "session.commit()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 27
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, let's make sure we can read it back out."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "t=sql.read_sql(\" select *\"\n",
      "               \" from Users \",engine)\n",
      "t.head(3)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>id</th>\n",
        "      <th>name</th>\n",
        "      <th>fullname</th>\n",
        "      <th>password</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1</td>\n",
        "      <td>    ed</td>\n",
        "      <td>       Ed Jones</td>\n",
        "      <td> edspassword</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 2</td>\n",
        "      <td> wendy</td>\n",
        "      <td> Wendy Williams</td>\n",
        "      <td>      foobar</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 3</td>\n",
        "      <td>  mary</td>\n",
        "      <td>  Mary Contrary</td>\n",
        "      <td>      xxg527</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 28,
       "text": [
        "   id   name        fullname     password\n",
        "0   1     ed        Ed Jones  edspassword\n",
        "1   2  wendy  Wendy Williams       foobar\n",
        "2   3   mary   Mary Contrary       xxg527"
       ]
      }
     ],
     "prompt_number": 28
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "More complex window functions - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's define a new table."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "Base = declarative_base()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 92
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Note that we can re-define existing `class` defined:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "Post.__table_args__ = {'extend_existing': True}\n",
      "Comments.__table_args__ = {'extend_existing': True}"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 90
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from sqlalchemy.dialects.postgresql import CHAR, TIMESTAMP\n",
      "class Post(Base):\n",
      "    __tablename__ = 'posts'\n",
      "    id = Column(Integer, primary_key=True)\n",
      "    body = Column(String)    \n",
      "    created_at = Column(TIMESTAMP)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 93
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "class Comments(Base):\n",
      "    __tablename__ = 'comments'\n",
      "    id = Column(Integer, primary_key=True)\n",
      "    post_id = Column(Integer) \n",
      "    body = Column(String) \n",
      "    created_at = Column(TIMESTAMP)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 94
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Create the tables."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "Base.metadata.create_all(engine) "
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 95
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Add data to the new tables."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "session = Session()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 96
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "session.add_all([\n",
      "    Post(id=1, body='foo'),\n",
      "    Post(id=2, body='bar'),])\n",
      "\n",
      "session.add_all([\n",
      "    Comments(id=1, post_id=1, body='foo old'),\n",
      "    Comments(id=2, post_id=1, body='foo new'),\n",
      "    Comments(id=3, post_id=1, body='foo newer'),\n",
      "    Comments(id=4, post_id=1, body='foo newest'),])\n",
      "\n",
      "session.add_all([\n",
      "    Comments(id=5, post_id=2, body='bar old'),\n",
      "    Comments(id=6, post_id=2, body='bar new'),\n",
      "    Comments(id=7, post_id=2, body='bar newer'),\n",
      "    Comments(id=8, post_id=2, body='bar newest'),])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 97
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "session.commit()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 98
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "p=sql.read_sql(\" select *\"\n",
      "               \" from posts \",engine)\n",
      "p"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>id</th>\n",
        "      <th>body</th>\n",
        "      <th>created_at</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1</td>\n",
        "      <td> foo</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 2</td>\n",
        "      <td> bar</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 101,
       "text": [
        "   id body created_at\n",
        "0   1  foo       None\n",
        "1   2  bar       None"
       ]
      }
     ],
     "prompt_number": 101
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "c=sql.read_sql(\" select *\"\n",
      "               \" from Comments \",engine)\n",
      "c"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>id</th>\n",
        "      <th>post_id</th>\n",
        "      <th>body</th>\n",
        "      <th>created_at</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "      <td>    foo old</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 2</td>\n",
        "      <td> 1</td>\n",
        "      <td>    foo new</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 3</td>\n",
        "      <td> 1</td>\n",
        "      <td>  foo newer</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> 4</td>\n",
        "      <td> 1</td>\n",
        "      <td> foo newest</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td> 5</td>\n",
        "      <td> 2</td>\n",
        "      <td>    bar old</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>5</th>\n",
        "      <td> 6</td>\n",
        "      <td> 2</td>\n",
        "      <td>    bar new</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>6</th>\n",
        "      <td> 7</td>\n",
        "      <td> 2</td>\n",
        "      <td>  bar newer</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>7</th>\n",
        "      <td> 8</td>\n",
        "      <td> 2</td>\n",
        "      <td> bar newest</td>\n",
        "      <td> None</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 102,
       "text": [
        "   id  post_id        body created_at\n",
        "0   1        1     foo old       None\n",
        "1   2        1     foo new       None\n",
        "2   3        1   foo newer       None\n",
        "3   4        1  foo newest       None\n",
        "4   5        2     bar old       None\n",
        "5   6        2     bar new       None\n",
        "6   7        2   bar newer       None\n",
        "7   8        2  bar newest       None"
       ]
      }
     ],
     "prompt_number": 102
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "c=sql.read_sql(\" select posts.id AS post_id, comments.id AS comment_ids, comments.body AS body \"\n",
      "               \" from posts LEFT OUTER JOIN comments on posts.id=comments.post_id \",engine)\n",
      "c"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>post_id</th>\n",
        "      <th>comment_ids</th>\n",
        "      <th>body</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>0</th>\n",
        "      <td> 1</td>\n",
        "      <td> 1</td>\n",
        "      <td>    foo old</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>1</th>\n",
        "      <td> 1</td>\n",
        "      <td> 2</td>\n",
        "      <td>    foo new</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>2</th>\n",
        "      <td> 1</td>\n",
        "      <td> 3</td>\n",
        "      <td>  foo newer</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>3</th>\n",
        "      <td> 1</td>\n",
        "      <td> 4</td>\n",
        "      <td> foo newest</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>4</th>\n",
        "      <td> 2</td>\n",
        "      <td> 5</td>\n",
        "      <td>    bar old</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>5</th>\n",
        "      <td> 2</td>\n",
        "      <td> 6</td>\n",
        "      <td>    bar new</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>6</th>\n",
        "      <td> 2</td>\n",
        "      <td> 7</td>\n",
        "      <td>  bar newer</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>7</th>\n",
        "      <td> 2</td>\n",
        "      <td> 8</td>\n",
        "      <td> bar newest</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 106,
       "text": [
        "   post_id  comment_ids        body\n",
        "0        1            1     foo old\n",
        "1        1            2     foo new\n",
        "2        1            3   foo newer\n",
        "3        1            4  foo newest\n",
        "4        2            5     bar old\n",
        "5        2            6     bar new\n",
        "6        2            7   bar newer\n",
        "7        2            8  bar newest"
       ]
      }
     ],
     "prompt_number": 106
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "Derived tables - "
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's say we want something more complex, like the fraction of tests for a given infection that are from a given specimen."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cd=sql.read_sql(\"select * from Clinical_Table \",engine2)\n",
      "a=cd.groupby(['tested_infection']).size()\n",
      "a.sort(ascending=False)\n",
      "b=cd[cd['tested_infection'] == 'CMV']\n",
      "ser=b[b['specimen'] == 'Serum']\n",
      "ser.shape[0]/float(b.shape[0])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "(43251, 10)\n"
       ]
      },
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 234,
       "text": [
        "0.835023664638269"
       ]
      }
     ],
     "prompt_number": 234
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "`Group by` using `Pandas` is nice for this!"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "all_tests=sql.read_sql(\"select * from Clinical_Table\",engine)\n",
      "a=all_tests[['tested_infection','specimen']]\n",
      "b=a.groupby('tested_infection').agg(lambda x:float(x[x.specimen=='Serum'].shape[0])/x.shape[0])\n",
      "b.sort('specimen',ascending=False,inplace=True)\n",
      "b.head(4)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>specimen</th>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>tested_infection</th>\n",
        "      <th></th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>Hepatitis A AB</th>\n",
        "      <td> 1.000000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>HEPATITIS B</th>\n",
        "      <td> 1.000000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>EBV</th>\n",
        "      <td> 0.881657</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>CMV</th>\n",
        "      <td> 0.835024</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 235,
       "text": [
        "                  specimen\n",
        "tested_infection          \n",
        "Hepatitis A AB    1.000000\n",
        "HEPATITIS B       1.000000\n",
        "EBV               0.881657\n",
        "CMV               0.835024"
       ]
      }
     ],
     "prompt_number": 235
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "But, we can also use Derived tables to get this same result!\n",
      "\n",
      "These are basically like variables.\n",
      "\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "tmp=sql.read_sql(\" select tested_infection,COUNT(*)/CAST(total AS float) as pct \"\n",
      "                 \" from Clinical_Table c \"\n",
      "                 \" inner join \"\n",
      "                 \" ( \"  \n",
      "                 \"   SELECT tested_infection as inf, COUNT(*) as total \"\n",
      "                 \"   from Clinical_Table \"\n",
      "                 \"   GROUP BY tested_infection \"              \n",
      "                 \" ) d \"\n",
      "                 \" on c.tested_infection = d.inf \"\n",
      "                 \" WHERE c.specimen = 'Serum' \"\n",
      "                 \" GROUP BY tested_infection,total \",engine)\n",
      "\n",
      "print tmp.shape\n",
      "tmp.sort('pct',inplace=True,ascending=False)\n",
      "tmp.head(4)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "(57, 2)\n"
       ]
      },
      {
       "html": [
        "<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
        "<table border=\"1\" class=\"dataframe\">\n",
        "  <thead>\n",
        "    <tr style=\"text-align: right;\">\n",
        "      <th></th>\n",
        "      <th>tested_infection</th>\n",
        "      <th>pct</th>\n",
        "    </tr>\n",
        "  </thead>\n",
        "  <tbody>\n",
        "    <tr>\n",
        "      <th>2 </th>\n",
        "      <td>    HEPATITIS B</td>\n",
        "      <td> 1.000000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>6 </th>\n",
        "      <td> Hepatitis A AB</td>\n",
        "      <td> 1.000000</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>17</th>\n",
        "      <td>            EBV</td>\n",
        "      <td> 0.881657</td>\n",
        "    </tr>\n",
        "    <tr>\n",
        "      <th>9 </th>\n",
        "      <td>            CMV</td>\n",
        "      <td> 0.835024</td>\n",
        "    </tr>\n",
        "  </tbody>\n",
        "</table>\n",
        "</div>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 236,
       "text": [
        "   tested_infection       pct\n",
        "2       HEPATITIS B  1.000000\n",
        "6    Hepatitis A AB  1.000000\n",
        "17              EBV  0.881657\n",
        "9               CMV  0.835024"
       ]
      }
     ],
     "prompt_number": 236
    }
   ],
   "metadata": {}
  }
 ]
}