{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Measure calculations with Pandas\n",
"\n",
"This notebook demonstrates how to calculate measure ratios and percentiles, using the [Methotrexate measure](https://openprescribing.net/measure/methotrexate/) for CCGs as an example. It is straightforward to perform calculations for practices instead, and we will note how to do this below.\n",
"\n",
"Calculations are performed using data in the `hscic.normalised_prescribing_standard` table in BigQuery."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"We need `pandas` for the computation, and `requests` to validate the computation against the OpenPrescribing implementation, via the OpenPrescribing API."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import pandas as pd\n",
"import requests"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Getting the data\n",
"\n",
"We'll make two queries against BigQuery, one for the numerator values and the other for the denominator values.\n",
"\n",
"Some notes:\n",
"\n",
"* The `WHERE` conditions on the `bnf_code` column come from the [measure definition](https://github.com/ebmdatalab/openprescribing/blob/master/openprescribing/frontend/management/commands/measure_definitions/methotrexate.json).\n",
"* In order to match the OpenPrescibing implementation exactly, we need to restrict prescriptions to those prescribed by GP practices (`setting = 4`) in CCGs (`org_type = 'CCG'`).\n",
"* To calculate ratios and percentiles for practices instead of CCGs, replace `pct` with `practice` in the `SELECT` and `GROUP BY` clauses.\n",
"* The `CONCAT/CAST/EXTRACT/LPAD` dance converts the dates stored in BigQuery to strings of the form `YYYY_MM`. This is not necessary, but makes the data easier to work with.\n",
"* Here we're only retrieving data from 2018_06 onwards, but there is data going back to 2010_08."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"project_name = 'ebmdatalab'"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"numerator_query = '''\n",
"SELECT\n",
" pct,\n",
" CONCAT(\n",
" CAST(EXTRACT(YEAR FROM month) AS STRING),\n",
" \"_\",\n",
" LPAD(CAST(EXTRACT(MONTH FROM month) AS STRING), 2, \"0\")\n",
" ) AS month,\n",
" SUM(items) AS value\n",
"FROM\n",
" hscic.normalised_prescribing_standard AS prescriptions\n",
"INNER JOIN hscic.practices\n",
" ON prescriptions.practice = practices.code\n",
"INNER JOIN hscic.ccgs\n",
" ON prescriptions.pct = ccgs.code\n",
"WHERE\n",
" bnf_code LIKE '1001030U0%AC'\n",
" AND setting = 4\n",
" AND org_type = 'CCG'\n",
" AND month >= TIMESTAMP('2018-06-01')\n",
"GROUP BY pct, month\n",
"'''\n",
"\n",
"denominator_query = '''\n",
"SELECT\n",
" pct,\n",
" CONCAT(\n",
" CAST(EXTRACT(YEAR FROM prescriptions.month) AS STRING),\n",
" \"_\",\n",
" LPAD(CAST(EXTRACT(MONTH FROM prescriptions.month) AS STRING), 2, \"0\")\n",
" ) AS month,\n",
" SUM(items) AS items,\n",
" SUM(total_list_size) AS population,\n",
" SUM(CAST(JSON_EXTRACT(star_pu, '$.oral_antibacterials_item') AS FLOAT64)) AS star_pu\n",
"FROM\n",
" hscic.normalised_prescribing_standard AS prescriptions\n",
"INNER JOIN hscic.practices\n",
" ON prescriptions.practice = practices.code\n",
"INNER JOIN hscic.practice_statistics_all_years stats\n",
" ON prescriptions.practice = stats.practice\n",
" AND prescriptions.month = stats.month\n",
"INNER JOIN hscic.ccgs\n",
" ON prescriptions.pct = ccgs.code\n",
"WHERE\n",
" (bnf_code LIKE '1001030U0%AB' OR bnf_code LIKE '1001030U0%AC')\n",
" AND setting = 4\n",
" AND org_type = 'CCG'\n",
" AND prescriptions.month >= TIMESTAMP('2018-06-01')\n",
"GROUP BY pct, month\n",
"'''"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requesting query... ok.\n",
"Job ID: 3f017fdd-2a4a-4d93-976c-cc69211526ff\n",
"Query running...\n",
"Query done.\n",
"Processed: 38.5 GB Billed: 38.5 GB\n",
"Standard price: $0.19 USD\n",
"\n",
"Retrieving results...\n",
"Got 1159 rows.\n",
"\n",
"Total time taken 3.08 s.\n",
"Finished at 2019-02-25 12:22:11.\n"
]
},
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" pct | \n",
" month | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 06P | \n",
" 2018_08 | \n",
" 3 | \n",
"
\n",
" \n",
" 1 | \n",
" 10Q | \n",
" 2018_10 | \n",
" 5 | \n",
"
\n",
" \n",
" 2 | \n",
" 09E | \n",
" 2018_08 | \n",
" 9 | \n",
"
\n",
" \n",
" 3 | \n",
" 01K | \n",
" 2018_11 | \n",
" 3 | \n",
"
\n",
" \n",
" 4 | \n",
" 06V | \n",
" 2018_11 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" pct month value\n",
"0 06P 2018_08 3\n",
"1 10Q 2018_10 5\n",
"2 09E 2018_08 9\n",
"3 01K 2018_11 3\n",
"4 06V 2018_11 1"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numerators_raw = pd.read_gbq(numerator_query, project_name, dialect='standard')\n",
"numerators_raw.head()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requesting query... ok.\n",
"Job ID: 110b5c54-f744-4b56-8d90-40ed3ac50ded\n",
"Query running...\n",
"Query done.\n",
"Cache hit.\n",
"\n",
"Retrieving results...\n",
"Got 1365 rows.\n",
"\n",
"Total time taken 0.8 s.\n",
"Finished at 2019-02-25 12:22:13.\n"
]
},
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" pct | \n",
" month | \n",
" items | \n",
" population | \n",
" star_pu | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 06D | \n",
" 2018_09 | \n",
" 403 | \n",
" 124796 | \n",
" 75059.487761 | \n",
"
\n",
" \n",
" 1 | \n",
" 01Y | \n",
" 2018_09 | \n",
" 549 | \n",
" 278827 | \n",
" 157100.837373 | \n",
"
\n",
" \n",
" 2 | \n",
" 04Q | \n",
" 2018_12 | \n",
" 444 | \n",
" 141740 | \n",
" 84531.203842 | \n",
"
\n",
" \n",
" 3 | \n",
" 05H | \n",
" 2018_10 | \n",
" 380 | \n",
" 177051 | \n",
" 102371.390365 | \n",
"
\n",
" \n",
" 4 | \n",
" 01J | \n",
" 2018_06 | \n",
" 426 | \n",
" 165352 | \n",
" 92706.343987 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" pct month items population star_pu\n",
"0 06D 2018_09 403 124796 75059.487761\n",
"1 01Y 2018_09 549 278827 157100.837373\n",
"2 04Q 2018_12 444 141740 84531.203842\n",
"3 05H 2018_10 380 177051 102371.390365\n",
"4 01J 2018_06 426 165352 92706.343987"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"denominators_raw = pd.read_gbq(denominator_query, project_name, dialect='standard')\n",
"denominators_raw.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"### Select desired denominator"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" pct | \n",
" month | \n",
" value | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 06D | \n",
" 2018_09 | \n",
" 403 | \n",
"
\n",
" \n",
" 1 | \n",
" 01Y | \n",
" 2018_09 | \n",
" 549 | \n",
"
\n",
" \n",
" 2 | \n",
" 04Q | \n",
" 2018_12 | \n",
" 444 | \n",
"
\n",
" \n",
" 3 | \n",
" 05H | \n",
" 2018_10 | \n",
" 380 | \n",
"
\n",
" \n",
" 4 | \n",
" 01J | \n",
" 2018_06 | \n",
" 426 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" pct month value\n",
"0 06D 2018_09 403\n",
"1 01Y 2018_09 549\n",
"2 04Q 2018_12 444\n",
"3 05H 2018_10 380\n",
"4 01J 2018_06 426"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# refer to https://docs.google.com/spreadsheets/d/1F7a92URkQgX244LPFvZxl6tEmWdJbVELm3R1BfKpspw/edit#gid=187146618\n",
"\n",
"denominators_select = denominators_raw.copy()\n",
"\n",
"denominators_select[\"value\"] = denominators_select[\"items\"] # or \"population\" or \"star_pu\"\n",
" \n",
"denominators_select = denominators_select[[\"pct\",\"month\",\"value\"]]\n",
"denominators_select.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reshaping the data\n",
"\n",
"Querying BigQuery gives us a `DataFrame` with one row per CCG per month. Instead, we want a `DataFrame` with one row per CCG and one column per month.\n",
"\n",
"We can achieve this with [`set_index()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html) and [`unstack()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.unstack.html).\n",
"\n",
"Some notes:\n",
"\n",
"* If there is no row for a given CCG and month in the raw table, then the corresponding value in the unstacked table will be `NaN`. We'll resolve this when calculating the ratios below.\n",
"* To calculate ratios and percentiles for practices instead of CCGs, replace `pct` with `practice`."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" month | \n",
" 2018_06 | \n",
" 2018_07 | \n",
" 2018_08 | \n",
" 2018_09 | \n",
" 2018_10 | \n",
" 2018_11 | \n",
" 2018_12 | \n",
"
\n",
" \n",
" pct | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 00D | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 2.0 | \n",
" 1.0 | \n",
"
\n",
" \n",
" 00J | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.0 | \n",
" NaN | \n",
" NaN | \n",
"
\n",
" \n",
" 00K | \n",
" 2.0 | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" NaN | \n",
" 1.0 | \n",
"
\n",
" \n",
" 00L | \n",
" 1.0 | \n",
" 1.0 | \n",
" 1.0 | \n",
" 3.0 | \n",
" 1.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
"
\n",
" \n",
" 00M | \n",
" 3.0 | \n",
" 3.0 | \n",
" 5.0 | \n",
" 2.0 | \n",
" 3.0 | \n",
" 3.0 | \n",
" 5.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"month 2018_06 2018_07 2018_08 2018_09 2018_10 2018_11 2018_12\n",
"pct \n",
"00D NaN NaN NaN NaN NaN 2.0 1.0\n",
"00J NaN NaN NaN NaN 1.0 NaN NaN\n",
"00K 2.0 NaN NaN NaN NaN NaN 1.0\n",
"00L 1.0 1.0 1.0 3.0 1.0 3.0 3.0\n",
"00M 3.0 3.0 5.0 2.0 3.0 3.0 5.0"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"numerators = numerators_raw.set_index(['pct', 'month']).unstack()['value']\n",
"numerators.head()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" month | \n",
" 2018_06 | \n",
" 2018_07 | \n",
" 2018_08 | \n",
" 2018_09 | \n",
" 2018_10 | \n",
" 2018_11 | \n",
" 2018_12 | \n",
"
\n",
" \n",
" pct | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 00C | \n",
" 324 | \n",
" 347 | \n",
" 343 | \n",
" 325 | \n",
" 364 | \n",
" 346 | \n",
" 355 | \n",
"
\n",
" \n",
" 00D | \n",
" 856 | \n",
" 869 | \n",
" 883 | \n",
" 829 | \n",
" 897 | \n",
" 828 | \n",
" 851 | \n",
"
\n",
" \n",
" 00J | \n",
" 790 | \n",
" 800 | \n",
" 833 | \n",
" 763 | \n",
" 803 | \n",
" 780 | \n",
" 790 | \n",
"
\n",
" \n",
" 00K | \n",
" 544 | \n",
" 586 | \n",
" 551 | \n",
" 531 | \n",
" 549 | \n",
" 549 | \n",
" 532 | \n",
"
\n",
" \n",
" 00L | \n",
" 1032 | \n",
" 1086 | \n",
" 1122 | \n",
" 1001 | \n",
" 1098 | \n",
" 1072 | \n",
" 1078 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"month 2018_06 2018_07 2018_08 2018_09 2018_10 2018_11 2018_12\n",
"pct \n",
"00C 324 347 343 325 364 346 355\n",
"00D 856 869 883 829 897 828 851\n",
"00J 790 800 833 763 803 780 790\n",
"00K 544 586 551 531 549 549 532\n",
"00L 1032 1086 1122 1001 1098 1072 1078"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"denominators = denominators_select.set_index(['pct', 'month']).unstack()['value']\n",
"denominators.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Doing the calculations\n",
"\n",
"With the data in the right form, we can now do the calculations."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Ratios"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To match the OpenPrescribing implementation, if either the numerator or denominator is missing for a given CCG and month, we set the ratio to zero. This is what `fillna()` is doing."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" month | \n",
" 2018_06 | \n",
" 2018_07 | \n",
" 2018_08 | \n",
" 2018_09 | \n",
" 2018_10 | \n",
" 2018_11 | \n",
" 2018_12 | \n",
"
\n",
" \n",
" pct | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 00C | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 00D | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.002415 | \n",
" 0.001175 | \n",
"
\n",
" \n",
" 00J | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.001245 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 00K | \n",
" 0.003676 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.001880 | \n",
"
\n",
" \n",
" 00L | \n",
" 0.000969 | \n",
" 0.000921 | \n",
" 0.000891 | \n",
" 0.002997 | \n",
" 0.000911 | \n",
" 0.002799 | \n",
" 0.002783 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"month 2018_06 2018_07 2018_08 2018_09 2018_10 2018_11 2018_12\n",
"pct \n",
"00C 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000\n",
"00D 0.000000 0.000000 0.000000 0.000000 0.000000 0.002415 0.001175\n",
"00J 0.000000 0.000000 0.000000 0.000000 0.001245 0.000000 0.000000\n",
"00K 0.003676 0.000000 0.000000 0.000000 0.000000 0.000000 0.001880\n",
"00L 0.000969 0.000921 0.000891 0.002997 0.000911 0.002799 0.002783"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ratios = (numerators / denominators).fillna(0)\n",
"ratios.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Percentile ranks\n",
"\n",
"The simpler `ratios.rank(method='min', pct=True) * 100` doesn't produce quite the same results as the OpenPrescribing implementation."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" month | \n",
" 2018_06 | \n",
" 2018_07 | \n",
" 2018_08 | \n",
" 2018_09 | \n",
" 2018_10 | \n",
" 2018_11 | \n",
" 2018_12 | \n",
"
\n",
" \n",
" pct | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
" | \n",
"
\n",
" \n",
" \n",
" \n",
" 00C | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 00D | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 25.773196 | \n",
" 17.525773 | \n",
"
\n",
" \n",
" 00J | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 20.618557 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 00K | \n",
" 33.505155 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 23.195876 | \n",
"
\n",
" \n",
" 00L | \n",
" 15.979381 | \n",
" 18.556701 | \n",
" 17.010309 | \n",
" 27.835052 | \n",
" 16.494845 | \n",
" 27.835052 | \n",
" 29.896907 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"month 2018_06 2018_07 2018_08 2018_09 2018_10 2018_11 \\\n",
"pct \n",
"00C 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"00D 0.000000 0.000000 0.000000 0.000000 0.000000 25.773196 \n",
"00J 0.000000 0.000000 0.000000 0.000000 20.618557 0.000000 \n",
"00K 33.505155 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
"00L 15.979381 18.556701 17.010309 27.835052 16.494845 27.835052 \n",
"\n",
"month 2018_12 \n",
"pct \n",
"00C 0.000000 \n",
"00D 17.525773 \n",
"00J 0.000000 \n",
"00K 23.195876 \n",
"00L 29.896907 "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"percentile_ranks = (ratios.rank(method='min') - 1) / (ratios.count() - 1) * 100\n",
"percentile_ranks.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Deciles\n",
"\n",
"We use strings for the index keys, as they are easier to work with than floats."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" month | \n",
" 2018_06 | \n",
" 2018_07 | \n",
" 2018_08 | \n",
" 2018_09 | \n",
" 2018_10 | \n",
" 2018_11 | \n",
" 2018_12 | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 10 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
" 0.000000 | \n",
"
\n",
" \n",
" 20 | \n",
" 0.001537 | \n",
" 0.001064 | \n",
" 0.001243 | \n",
" 0.001538 | \n",
" 0.001215 | \n",
" 0.001436 | \n",
" 0.001590 | \n",
"
\n",
" \n",
" 30 | \n",
" 0.003179 | \n",
" 0.003416 | \n",
" 0.002815 | \n",
" 0.003464 | \n",
" 0.002763 | \n",
" 0.003132 | \n",
" 0.002795 | \n",
"
\n",
" \n",
" 40 | \n",
" 0.005211 | \n",
" 0.004675 | \n",
" 0.005813 | \n",
" 0.005738 | \n",
" 0.004805 | \n",
" 0.005750 | \n",
" 0.005322 | \n",
"
\n",
" \n",
" 50 | \n",
" 0.010710 | \n",
" 0.008696 | \n",
" 0.009756 | \n",
" 0.008830 | \n",
" 0.009443 | \n",
" 0.009195 | \n",
" 0.009009 | \n",
"
\n",
" \n",
" 60 | \n",
" 0.016117 | \n",
" 0.014645 | \n",
" 0.014059 | \n",
" 0.013013 | \n",
" 0.013431 | \n",
" 0.013701 | \n",
" 0.012082 | \n",
"
\n",
" \n",
" 70 | \n",
" 0.022892 | \n",
" 0.022638 | \n",
" 0.020071 | \n",
" 0.022348 | \n",
" 0.023063 | \n",
" 0.022093 | \n",
" 0.020681 | \n",
"
\n",
" \n",
" 80 | \n",
" 0.049535 | \n",
" 0.052413 | \n",
" 0.049464 | \n",
" 0.046032 | \n",
" 0.043519 | \n",
" 0.046047 | \n",
" 0.043762 | \n",
"
\n",
" \n",
" 90 | \n",
" 0.104575 | \n",
" 0.100952 | \n",
" 0.097624 | \n",
" 0.093164 | \n",
" 0.090902 | \n",
" 0.093508 | \n",
" 0.088868 | \n",
"
\n",
" \n",
" 100 | \n",
" 0.338286 | \n",
" 0.287554 | \n",
" 0.319460 | \n",
" 0.310615 | \n",
" 0.291085 | \n",
" 0.307860 | \n",
" 0.297619 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
"month 2018_06 2018_07 2018_08 2018_09 2018_10 2018_11 2018_12\n",
"0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000\n",
"10 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000\n",
"20 0.001537 0.001064 0.001243 0.001538 0.001215 0.001436 0.001590\n",
"30 0.003179 0.003416 0.002815 0.003464 0.002763 0.003132 0.002795\n",
"40 0.005211 0.004675 0.005813 0.005738 0.004805 0.005750 0.005322\n",
"50 0.010710 0.008696 0.009756 0.008830 0.009443 0.009195 0.009009\n",
"60 0.016117 0.014645 0.014059 0.013013 0.013431 0.013701 0.012082\n",
"70 0.022892 0.022638 0.020071 0.022348 0.023063 0.022093 0.020681\n",
"80 0.049535 0.052413 0.049464 0.046032 0.043519 0.046047 0.043762\n",
"90 0.104575 0.100952 0.097624 0.093164 0.090902 0.093508 0.088868\n",
"100 0.338286 0.287554 0.319460 0.310615 0.291085 0.307860 0.297619"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"deciles = pd.DataFrame(\n",
" [ratios.quantile(i * 0.1) for i in range(11)],\n",
" index=[str(i * 10) for i in range(11)]\n",
")\n",
"deciles"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Verifying the calculations\n",
"\n",
"We can compare our calculations against the OpenPrescribing implementation by querying the OpenPrescribing API for all deciles, and for a handful of CCGs."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"url = 'https://openprescribing.net/api/1.0/measure/'\n",
"params = {\n",
" 'format': 'json',\n",
" 'measure': 'methotrexate',\n",
"}\n",
"rsp = requests.get(url, params)\n",
"\n",
"for record in rsp.json()['measures'][0]['data']:\n",
" month = record['date'][:4] + '_' + record['date'][5:7]\n",
" if month < '2018_06':\n",
" continue\n",
" for k in record['percentiles']['ccg']:\n",
" if abs(record['percentiles']['ccg'][k] - deciles[month][k]) > 0.001:\n",
" print('Decile', k, 'differs in month', month)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Percentile differs for CCG 00R in month 2018_08\n"
]
}
],
"source": [
"url = 'https://openprescribing.net/api/1.0/measure_by_ccg/'\n",
"\n",
"for ccg_id in ratios.index.to_series().sample(4):\n",
" params = {\n",
" 'format': 'json',\n",
" 'measure': 'methotrexate',\n",
" 'org': ccg_id,\n",
" }\n",
" rsp = requests.get(url, params)\n",
"\n",
" for record in rsp.json()['measures'][0]['data']:\n",
" month = record['date'][:4] + '_' + record['date'][5:7]\n",
" if month < '2018_06':\n",
" continue\n",
" if abs(record['calc_value'] - ratios[month][ccg_id]) > 0.001:\n",
" print('Ratio differs for CCG', ccg_id, 'in month', month)\n",
" if abs(record['percentile'] - percentile_ranks[month][ccg_id]) > 0.001:\n",
" print('Percentile differs for CCG', ccg_id, 'in month', month)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"jupytext": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}