{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Measure calculations with Pandas\n",
    "\n",
    "This notebook demonstrates how to calculate measure ratios and percentiles, using the [Methotrexate measure](https://openprescribing.net/measure/methotrexate/) for CCGs as an example.  It is straightforward to perform calculations for practices instead, and we will note how to do this below.\n",
    "\n",
    "Calculations are performed using data in the `hscic.normalised_prescribing_standard` table in BigQuery."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "We need `pandas` for the computation, and `requests` to validate the computation against the OpenPrescribing implementation, via the OpenPrescribing API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import requests"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting the data\n",
    "\n",
    "We'll make two queries against BigQuery, one for the numerator values and the other for the denominator values.\n",
    "\n",
    "Some notes:\n",
    "\n",
    "* The `WHERE` conditions on the `bnf_code` column come from the [measure definition](https://github.com/ebmdatalab/openprescribing/blob/master/openprescribing/frontend/management/commands/measure_definitions/methotrexate.json).\n",
    "* In order to match the OpenPrescibing implementation exactly, we need to restrict prescriptions to those prescribed by GP practices (`setting = 4`) in CCGs (`org_type = 'CCG'`).\n",
    "* To calculate ratios and percentiles for practices instead of CCGs, replace `pct` with `practice` in the `SELECT` and `GROUP BY` clauses.\n",
    "* The `CONCAT/CAST/EXTRACT/LPAD` dance converts the dates stored in BigQuery to strings of the form `YYYY_MM`.  This is not necessary, but makes the data easier to work with.\n",
    "* Here we're only retrieving data from 2018_06 onwards, but there is data going back to 2010_08."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "project_name = 'ebmdatalab'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "numerator_query = '''\n",
    "SELECT\n",
    "    pct,\n",
    "    CONCAT(\n",
    "        CAST(EXTRACT(YEAR FROM month) AS STRING),\n",
    "        \"_\",\n",
    "        LPAD(CAST(EXTRACT(MONTH FROM month) AS STRING), 2, \"0\")\n",
    "    ) AS month,\n",
    "    SUM(items) AS value\n",
    "FROM\n",
    "    hscic.normalised_prescribing_standard AS prescriptions\n",
    "INNER JOIN hscic.practices\n",
    "    ON prescriptions.practice = practices.code\n",
    "INNER JOIN hscic.ccgs\n",
    "    ON prescriptions.pct = ccgs.code\n",
    "WHERE\n",
    "    bnf_code LIKE '1001030U0%AC'\n",
    "    AND setting = 4\n",
    "    AND org_type = 'CCG'\n",
    "    AND month >= TIMESTAMP('2018-06-01')\n",
    "GROUP BY pct, month\n",
    "'''\n",
    "\n",
    "denominator_query = '''\n",
    "SELECT\n",
    "    pct,\n",
    "    CONCAT(\n",
    "        CAST(EXTRACT(YEAR FROM prescriptions.month) AS STRING),\n",
    "        \"_\",\n",
    "        LPAD(CAST(EXTRACT(MONTH FROM prescriptions.month) AS STRING), 2, \"0\")\n",
    "    ) AS month,\n",
    "    SUM(items) AS items,\n",
    "    SUM(total_list_size) AS population,\n",
    "    SUM(CAST(JSON_EXTRACT(star_pu, '$.oral_antibacterials_item') AS FLOAT64)) AS star_pu\n",
    "FROM\n",
    "    hscic.normalised_prescribing_standard AS prescriptions\n",
    "INNER JOIN hscic.practices\n",
    "    ON prescriptions.practice = practices.code\n",
    "INNER JOIN hscic.practice_statistics_all_years stats\n",
    "    ON prescriptions.practice = stats.practice\n",
    "    AND prescriptions.month = stats.month\n",
    "INNER JOIN hscic.ccgs\n",
    "    ON prescriptions.pct = ccgs.code\n",
    "WHERE\n",
    "    (bnf_code LIKE '1001030U0%AB' OR bnf_code LIKE '1001030U0%AC')\n",
    "    AND setting = 4\n",
    "    AND org_type = 'CCG'\n",
    "    AND prescriptions.month >= TIMESTAMP('2018-06-01')\n",
    "GROUP BY pct, month\n",
    "'''"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requesting query... ok.\n",
      "Job ID: 3f017fdd-2a4a-4d93-976c-cc69211526ff\n",
      "Query running...\n",
      "Query done.\n",
      "Processed: 38.5 GB Billed: 38.5 GB\n",
      "Standard price: $0.19 USD\n",
      "\n",
      "Retrieving results...\n",
      "Got 1159 rows.\n",
      "\n",
      "Total time taken 3.08 s.\n",
      "Finished at 2019-02-25 12:22:11.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pct</th>\n",
       "      <th>month</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>06P</td>\n",
       "      <td>2018_08</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10Q</td>\n",
       "      <td>2018_10</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>09E</td>\n",
       "      <td>2018_08</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>01K</td>\n",
       "      <td>2018_11</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>06V</td>\n",
       "      <td>2018_11</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   pct    month  value\n",
       "0  06P  2018_08      3\n",
       "1  10Q  2018_10      5\n",
       "2  09E  2018_08      9\n",
       "3  01K  2018_11      3\n",
       "4  06V  2018_11      1"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "numerators_raw = pd.read_gbq(numerator_query, project_name, dialect='standard')\n",
    "numerators_raw.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requesting query... ok.\n",
      "Job ID: 110b5c54-f744-4b56-8d90-40ed3ac50ded\n",
      "Query running...\n",
      "Query done.\n",
      "Cache hit.\n",
      "\n",
      "Retrieving results...\n",
      "Got 1365 rows.\n",
      "\n",
      "Total time taken 0.8 s.\n",
      "Finished at 2019-02-25 12:22:13.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pct</th>\n",
       "      <th>month</th>\n",
       "      <th>items</th>\n",
       "      <th>population</th>\n",
       "      <th>star_pu</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>06D</td>\n",
       "      <td>2018_09</td>\n",
       "      <td>403</td>\n",
       "      <td>124796</td>\n",
       "      <td>75059.487761</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>01Y</td>\n",
       "      <td>2018_09</td>\n",
       "      <td>549</td>\n",
       "      <td>278827</td>\n",
       "      <td>157100.837373</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>04Q</td>\n",
       "      <td>2018_12</td>\n",
       "      <td>444</td>\n",
       "      <td>141740</td>\n",
       "      <td>84531.203842</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>05H</td>\n",
       "      <td>2018_10</td>\n",
       "      <td>380</td>\n",
       "      <td>177051</td>\n",
       "      <td>102371.390365</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>01J</td>\n",
       "      <td>2018_06</td>\n",
       "      <td>426</td>\n",
       "      <td>165352</td>\n",
       "      <td>92706.343987</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   pct    month  items  population        star_pu\n",
       "0  06D  2018_09    403      124796   75059.487761\n",
       "1  01Y  2018_09    549      278827  157100.837373\n",
       "2  04Q  2018_12    444      141740   84531.203842\n",
       "3  05H  2018_10    380      177051  102371.390365\n",
       "4  01J  2018_06    426      165352   92706.343987"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "denominators_raw = pd.read_gbq(denominator_query, project_name, dialect='standard')\n",
    "denominators_raw.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "### Select desired denominator"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>pct</th>\n",
       "      <th>month</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>06D</td>\n",
       "      <td>2018_09</td>\n",
       "      <td>403</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>01Y</td>\n",
       "      <td>2018_09</td>\n",
       "      <td>549</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>04Q</td>\n",
       "      <td>2018_12</td>\n",
       "      <td>444</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>05H</td>\n",
       "      <td>2018_10</td>\n",
       "      <td>380</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>01J</td>\n",
       "      <td>2018_06</td>\n",
       "      <td>426</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   pct    month  value\n",
       "0  06D  2018_09    403\n",
       "1  01Y  2018_09    549\n",
       "2  04Q  2018_12    444\n",
       "3  05H  2018_10    380\n",
       "4  01J  2018_06    426"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# refer to https://docs.google.com/spreadsheets/d/1F7a92URkQgX244LPFvZxl6tEmWdJbVELm3R1BfKpspw/edit#gid=187146618\n",
    "\n",
    "denominators_select = denominators_raw.copy()\n",
    "\n",
    "denominators_select[\"value\"] = denominators_select[\"items\"] # or \"population\" or \"star_pu\"\n",
    "    \n",
    "denominators_select = denominators_select[[\"pct\",\"month\",\"value\"]]\n",
    "denominators_select.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reshaping the data\n",
    "\n",
    "Querying BigQuery gives us a `DataFrame` with one row per CCG per month.  Instead, we want a `DataFrame` with one row per CCG and one column per month.\n",
    "\n",
    "We can achieve this with [`set_index()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html) and [`unstack()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.unstack.html).\n",
    "\n",
    "Some notes:\n",
    "\n",
    "* If there is no row for a given CCG and month in the raw table, then the corresponding value in the unstacked table will be `NaN`.  We'll resolve this when calculating the ratios below.\n",
    "* To calculate ratios and percentiles for practices instead of CCGs, replace `pct` with `practice`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>month</th>\n",
       "      <th>2018_06</th>\n",
       "      <th>2018_07</th>\n",
       "      <th>2018_08</th>\n",
       "      <th>2018_09</th>\n",
       "      <th>2018_10</th>\n",
       "      <th>2018_11</th>\n",
       "      <th>2018_12</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pct</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>00D</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2.0</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00J</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00K</th>\n",
       "      <td>2.0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00L</th>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00M</th>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>3.0</td>\n",
       "      <td>5.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "month  2018_06  2018_07  2018_08  2018_09  2018_10  2018_11  2018_12\n",
       "pct                                                                 \n",
       "00D        NaN      NaN      NaN      NaN      NaN      2.0      1.0\n",
       "00J        NaN      NaN      NaN      NaN      1.0      NaN      NaN\n",
       "00K        2.0      NaN      NaN      NaN      NaN      NaN      1.0\n",
       "00L        1.0      1.0      1.0      3.0      1.0      3.0      3.0\n",
       "00M        3.0      3.0      5.0      2.0      3.0      3.0      5.0"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "numerators = numerators_raw.set_index(['pct', 'month']).unstack()['value']\n",
    "numerators.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>month</th>\n",
       "      <th>2018_06</th>\n",
       "      <th>2018_07</th>\n",
       "      <th>2018_08</th>\n",
       "      <th>2018_09</th>\n",
       "      <th>2018_10</th>\n",
       "      <th>2018_11</th>\n",
       "      <th>2018_12</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pct</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>00C</th>\n",
       "      <td>324</td>\n",
       "      <td>347</td>\n",
       "      <td>343</td>\n",
       "      <td>325</td>\n",
       "      <td>364</td>\n",
       "      <td>346</td>\n",
       "      <td>355</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00D</th>\n",
       "      <td>856</td>\n",
       "      <td>869</td>\n",
       "      <td>883</td>\n",
       "      <td>829</td>\n",
       "      <td>897</td>\n",
       "      <td>828</td>\n",
       "      <td>851</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00J</th>\n",
       "      <td>790</td>\n",
       "      <td>800</td>\n",
       "      <td>833</td>\n",
       "      <td>763</td>\n",
       "      <td>803</td>\n",
       "      <td>780</td>\n",
       "      <td>790</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00K</th>\n",
       "      <td>544</td>\n",
       "      <td>586</td>\n",
       "      <td>551</td>\n",
       "      <td>531</td>\n",
       "      <td>549</td>\n",
       "      <td>549</td>\n",
       "      <td>532</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00L</th>\n",
       "      <td>1032</td>\n",
       "      <td>1086</td>\n",
       "      <td>1122</td>\n",
       "      <td>1001</td>\n",
       "      <td>1098</td>\n",
       "      <td>1072</td>\n",
       "      <td>1078</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "month  2018_06  2018_07  2018_08  2018_09  2018_10  2018_11  2018_12\n",
       "pct                                                                 \n",
       "00C        324      347      343      325      364      346      355\n",
       "00D        856      869      883      829      897      828      851\n",
       "00J        790      800      833      763      803      780      790\n",
       "00K        544      586      551      531      549      549      532\n",
       "00L       1032     1086     1122     1001     1098     1072     1078"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "denominators = denominators_select.set_index(['pct', 'month']).unstack()['value']\n",
    "denominators.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Doing the calculations\n",
    "\n",
    "With the data in the right form, we can now do the calculations."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Ratios"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To match the OpenPrescribing implementation, if either the numerator or denominator is missing for a given CCG and month, we set the ratio to zero.  This is what `fillna()` is doing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>month</th>\n",
       "      <th>2018_06</th>\n",
       "      <th>2018_07</th>\n",
       "      <th>2018_08</th>\n",
       "      <th>2018_09</th>\n",
       "      <th>2018_10</th>\n",
       "      <th>2018_11</th>\n",
       "      <th>2018_12</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pct</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>00C</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00D</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.002415</td>\n",
       "      <td>0.001175</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00J</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.001245</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00K</th>\n",
       "      <td>0.003676</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.001880</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00L</th>\n",
       "      <td>0.000969</td>\n",
       "      <td>0.000921</td>\n",
       "      <td>0.000891</td>\n",
       "      <td>0.002997</td>\n",
       "      <td>0.000911</td>\n",
       "      <td>0.002799</td>\n",
       "      <td>0.002783</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "month   2018_06   2018_07   2018_08   2018_09   2018_10   2018_11   2018_12\n",
       "pct                                                                        \n",
       "00C    0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000\n",
       "00D    0.000000  0.000000  0.000000  0.000000  0.000000  0.002415  0.001175\n",
       "00J    0.000000  0.000000  0.000000  0.000000  0.001245  0.000000  0.000000\n",
       "00K    0.003676  0.000000  0.000000  0.000000  0.000000  0.000000  0.001880\n",
       "00L    0.000969  0.000921  0.000891  0.002997  0.000911  0.002799  0.002783"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ratios = (numerators / denominators).fillna(0)\n",
    "ratios.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Percentile ranks\n",
    "\n",
    "The simpler `ratios.rank(method='min', pct=True) * 100` doesn't produce quite the same results as the OpenPrescribing implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>month</th>\n",
       "      <th>2018_06</th>\n",
       "      <th>2018_07</th>\n",
       "      <th>2018_08</th>\n",
       "      <th>2018_09</th>\n",
       "      <th>2018_10</th>\n",
       "      <th>2018_11</th>\n",
       "      <th>2018_12</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>pct</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>00C</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00D</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>25.773196</td>\n",
       "      <td>17.525773</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00J</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>20.618557</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00K</th>\n",
       "      <td>33.505155</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>23.195876</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>00L</th>\n",
       "      <td>15.979381</td>\n",
       "      <td>18.556701</td>\n",
       "      <td>17.010309</td>\n",
       "      <td>27.835052</td>\n",
       "      <td>16.494845</td>\n",
       "      <td>27.835052</td>\n",
       "      <td>29.896907</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "month    2018_06    2018_07    2018_08    2018_09    2018_10    2018_11  \\\n",
       "pct                                                                       \n",
       "00C     0.000000   0.000000   0.000000   0.000000   0.000000   0.000000   \n",
       "00D     0.000000   0.000000   0.000000   0.000000   0.000000  25.773196   \n",
       "00J     0.000000   0.000000   0.000000   0.000000  20.618557   0.000000   \n",
       "00K    33.505155   0.000000   0.000000   0.000000   0.000000   0.000000   \n",
       "00L    15.979381  18.556701  17.010309  27.835052  16.494845  27.835052   \n",
       "\n",
       "month    2018_12  \n",
       "pct               \n",
       "00C     0.000000  \n",
       "00D    17.525773  \n",
       "00J     0.000000  \n",
       "00K    23.195876  \n",
       "00L    29.896907  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "percentile_ranks = (ratios.rank(method='min') - 1) / (ratios.count() - 1) * 100\n",
    "percentile_ranks.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deciles\n",
    "\n",
    "We use strings for the index keys, as they are easier to work with than floats."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>month</th>\n",
       "      <th>2018_06</th>\n",
       "      <th>2018_07</th>\n",
       "      <th>2018_08</th>\n",
       "      <th>2018_09</th>\n",
       "      <th>2018_10</th>\n",
       "      <th>2018_11</th>\n",
       "      <th>2018_12</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>0.001537</td>\n",
       "      <td>0.001064</td>\n",
       "      <td>0.001243</td>\n",
       "      <td>0.001538</td>\n",
       "      <td>0.001215</td>\n",
       "      <td>0.001436</td>\n",
       "      <td>0.001590</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>0.003179</td>\n",
       "      <td>0.003416</td>\n",
       "      <td>0.002815</td>\n",
       "      <td>0.003464</td>\n",
       "      <td>0.002763</td>\n",
       "      <td>0.003132</td>\n",
       "      <td>0.002795</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>0.005211</td>\n",
       "      <td>0.004675</td>\n",
       "      <td>0.005813</td>\n",
       "      <td>0.005738</td>\n",
       "      <td>0.004805</td>\n",
       "      <td>0.005750</td>\n",
       "      <td>0.005322</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50</th>\n",
       "      <td>0.010710</td>\n",
       "      <td>0.008696</td>\n",
       "      <td>0.009756</td>\n",
       "      <td>0.008830</td>\n",
       "      <td>0.009443</td>\n",
       "      <td>0.009195</td>\n",
       "      <td>0.009009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>60</th>\n",
       "      <td>0.016117</td>\n",
       "      <td>0.014645</td>\n",
       "      <td>0.014059</td>\n",
       "      <td>0.013013</td>\n",
       "      <td>0.013431</td>\n",
       "      <td>0.013701</td>\n",
       "      <td>0.012082</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>70</th>\n",
       "      <td>0.022892</td>\n",
       "      <td>0.022638</td>\n",
       "      <td>0.020071</td>\n",
       "      <td>0.022348</td>\n",
       "      <td>0.023063</td>\n",
       "      <td>0.022093</td>\n",
       "      <td>0.020681</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>0.049535</td>\n",
       "      <td>0.052413</td>\n",
       "      <td>0.049464</td>\n",
       "      <td>0.046032</td>\n",
       "      <td>0.043519</td>\n",
       "      <td>0.046047</td>\n",
       "      <td>0.043762</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>0.104575</td>\n",
       "      <td>0.100952</td>\n",
       "      <td>0.097624</td>\n",
       "      <td>0.093164</td>\n",
       "      <td>0.090902</td>\n",
       "      <td>0.093508</td>\n",
       "      <td>0.088868</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>0.338286</td>\n",
       "      <td>0.287554</td>\n",
       "      <td>0.319460</td>\n",
       "      <td>0.310615</td>\n",
       "      <td>0.291085</td>\n",
       "      <td>0.307860</td>\n",
       "      <td>0.297619</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "month   2018_06   2018_07   2018_08   2018_09   2018_10   2018_11   2018_12\n",
       "0      0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000\n",
       "10     0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000\n",
       "20     0.001537  0.001064  0.001243  0.001538  0.001215  0.001436  0.001590\n",
       "30     0.003179  0.003416  0.002815  0.003464  0.002763  0.003132  0.002795\n",
       "40     0.005211  0.004675  0.005813  0.005738  0.004805  0.005750  0.005322\n",
       "50     0.010710  0.008696  0.009756  0.008830  0.009443  0.009195  0.009009\n",
       "60     0.016117  0.014645  0.014059  0.013013  0.013431  0.013701  0.012082\n",
       "70     0.022892  0.022638  0.020071  0.022348  0.023063  0.022093  0.020681\n",
       "80     0.049535  0.052413  0.049464  0.046032  0.043519  0.046047  0.043762\n",
       "90     0.104575  0.100952  0.097624  0.093164  0.090902  0.093508  0.088868\n",
       "100    0.338286  0.287554  0.319460  0.310615  0.291085  0.307860  0.297619"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "deciles = pd.DataFrame(\n",
    "    [ratios.quantile(i * 0.1) for i in range(11)],\n",
    "    index=[str(i * 10) for i in range(11)]\n",
    ")\n",
    "deciles"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Verifying the calculations\n",
    "\n",
    "We can compare our calculations against the OpenPrescribing implementation by querying the OpenPrescribing API for all deciles, and for a handful of CCGs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "url = 'https://openprescribing.net/api/1.0/measure/'\n",
    "params = {\n",
    "    'format': 'json',\n",
    "    'measure': 'methotrexate',\n",
    "}\n",
    "rsp = requests.get(url, params)\n",
    "\n",
    "for record in rsp.json()['measures'][0]['data']:\n",
    "    month = record['date'][:4] + '_' + record['date'][5:7]\n",
    "    if month < '2018_06':\n",
    "        continue\n",
    "    for k in record['percentiles']['ccg']:\n",
    "        if abs(record['percentiles']['ccg'][k] - deciles[month][k]) > 0.001:\n",
    "            print('Decile', k, 'differs in month', month)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Percentile differs for CCG 00R in month 2018_08\n"
     ]
    }
   ],
   "source": [
    "url = 'https://openprescribing.net/api/1.0/measure_by_ccg/'\n",
    "\n",
    "for ccg_id in ratios.index.to_series().sample(4):\n",
    "    params = {\n",
    "        'format': 'json',\n",
    "        'measure': 'methotrexate',\n",
    "        'org': ccg_id,\n",
    "    }\n",
    "    rsp = requests.get(url, params)\n",
    "\n",
    "    for record in rsp.json()['measures'][0]['data']:\n",
    "        month = record['date'][:4] + '_' + record['date'][5:7]\n",
    "        if month < '2018_06':\n",
    "            continue\n",
    "        if abs(record['calc_value'] - ratios[month][ccg_id]) > 0.001:\n",
    "            print('Ratio differs for CCG', ccg_id, 'in month', month)\n",
    "        if abs(record['percentile'] - percentile_ranks[month][ccg_id]) > 0.001:\n",
    "            print('Percentile differs for CCG', ccg_id, 'in month', month)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "jupytext": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}