{ "cells": [ { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import altair as alt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Transforming Data\n", "\n", "The bread and butter of a data workflow is cleaning and preparation, taking raw datasets and transforming them into a useful form.\n", "\n", "Today, we'll use some prescription cost data from the NHS to make a chart of the top 50 most used Cardiovascular substances.\n", "\n", "



\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introducing Tools: Pandas\n", "\n", "The first tool we'll use today is `Pandas`, a Python library used to work with datasets. It provides access to `DataFrames` - tables we analyse with code.\n", "\n", "Python already has a few built in data structures, for example lists and dictionaries:\n", "\n", "


" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "london = {\n", " \"name\": \"London\",\n", " \"population\": 8308369,\n", " \"area\": 1572\n", "} # This is an example of a dictionary\n", "\n", "locations = [\n", " {\n", " \"name\": \"London\",\n", " \"population\": 8_982_000,\n", " \"area\": 606\n", " },\n", " {\n", " \"name\": \"Newport\",\n", " \"population\": 128_060,\n", " \"area\": 32.52\n", " },\n", " {\n", " \"name\": \"Darlington\",\n", " \"population\": 93_015,\n", " \"area\": 7.62\n", " },\n", "\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "
\n", "
\n", "
\n", "Which we can turn into Pandas `DataFrames`" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namepopulationarea
0London8982000606.00
1Newport12806032.52
2Darlington930157.62
\n", "
" ], "text/plain": [ " name population area\n", "0 London 8982000 606.00\n", "1 Newport 128060 32.52\n", "2 Darlington 93015 7.62" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(locations)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "
\n", "


\n", "\n", "
\n", "\n", "and manipulate in different ways.\n", "\n", "For example, we can add a density column:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namepopulationareadensity
0London8982000606.0014821.782178
1Newport12806032.523937.884379
2Darlington930157.6212206.692913
\n", "
" ], "text/plain": [ " name population area density\n", "0 London 8982000 606.00 14821.782178\n", "1 Newport 128060 32.52 3937.884379\n", "2 Darlington 93015 7.62 12206.692913" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['density'] = df['population'] / df['area']\n", "df" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namepopulationareadensity
0London8982000606.0014821.782178
2Darlington930157.6212206.692913
1Newport12806032.523937.884379
\n", "
" ], "text/plain": [ " name population area density\n", "0 London 8982000 606.00 14821.782178\n", "2 Darlington 93015 7.62 12206.692913\n", "1 Newport 128060 32.52 3937.884379" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sorted_df = df.sort_values(by=\"density\", ascending=False)\n", "sorted_df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Your Own Data: Prescription Cost Analysis\n", "\n", "Let's use this dataset of prescription costs to make a scatter chart of the most prescribed substances in the category 'Cardiovascular System'/\n", "\n", "We will:\n", "\n", "1. Load in the data, from an `.xlsx` document with lots of sheets.\n", "2. Rename and restructure our fields.\n", "3. Filter our data.\n", "4. Export it to a csv for use with Vega-lite.\n", "\n", "


\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading our Data\n", "\n", "I have uploaded the `xlsx` spreedsheet to GitHub but you could easily use a file from your computer.\n", "\n", "We want data just from the sheet 'Chemical_Substances'." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "df = pd.read_excel(\"https://github.com/EconomicsObservatory/courses/raw/main/5/sample_data/pca_summary_tables_2023_24_v001.xlsx\", sheet_name=\"Chemical_Substances\", skiprows=3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at the DataFrame." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Financial YearBNF Chemical Substance CodeBNF Chemical Substance NameBNF Paragraph CodeBNF Paragraph NameBNF Section CodeBNF Section NameBNF Chapter CodeBNF Chapter NameTotal ItemsTotal Cost (£)Cost Per Item (£)
02023/20240101010C0Aluminium hydroxide10101Antacids and simeticone101Dyspepsia and gastro-oesophageal reflux disease1Gastro-Intestinal System113.7113.710000
12023/20240101010F0Magnesium carbonate10101Antacids and simeticone101Dyspepsia and gastro-oesophageal reflux disease1Gastro-Intestinal System39859544.37149.608970
22023/20240101010G0Co-magaldrox (Magnesium/aluminium hydroxide)10101Antacids and simeticone101Dyspepsia and gastro-oesophageal reflux disease1Gastro-Intestinal System28198144222.895.114650
32023/20240101010I0Magnesium oxide10101Antacids and simeticone101Dyspepsia and gastro-oesophageal reflux disease1Gastro-Intestinal System4439421785.4295.018117
42023/20240101010J0Magnesium trisilicate10101Antacids and simeticone101Dyspepsia and gastro-oesophageal reflux disease1Gastro-Intestinal System10686383.255.976826
\n", "
" ], "text/plain": [ " Financial Year BNF Chemical Substance Code \\\n", "0 2023/2024 0101010C0 \n", "1 2023/2024 0101010F0 \n", "2 2023/2024 0101010G0 \n", "3 2023/2024 0101010I0 \n", "4 2023/2024 0101010J0 \n", "\n", " BNF Chemical Substance Name BNF Paragraph Code \\\n", "0 Aluminium hydroxide 10101 \n", "1 Magnesium carbonate 10101 \n", "2 Co-magaldrox (Magnesium/aluminium hydroxide) 10101 \n", "3 Magnesium oxide 10101 \n", "4 Magnesium trisilicate 10101 \n", "\n", " BNF Paragraph Name BNF Section Code \\\n", "0 Antacids and simeticone 101 \n", "1 Antacids and simeticone 101 \n", "2 Antacids and simeticone 101 \n", "3 Antacids and simeticone 101 \n", "4 Antacids and simeticone 101 \n", "\n", " BNF Section Name BNF Chapter Code \\\n", "0 Dyspepsia and gastro-oesophageal reflux disease 1 \n", "1 Dyspepsia and gastro-oesophageal reflux disease 1 \n", "2 Dyspepsia and gastro-oesophageal reflux disease 1 \n", "3 Dyspepsia and gastro-oesophageal reflux disease 1 \n", "4 Dyspepsia and gastro-oesophageal reflux disease 1 \n", "\n", " BNF Chapter Name Total Items Total Cost (£) Cost Per Item (£) \n", "0 Gastro-Intestinal System 1 13.71 13.710000 \n", "1 Gastro-Intestinal System 398 59544.37 149.608970 \n", "2 Gastro-Intestinal System 28198 144222.89 5.114650 \n", "3 Gastro-Intestinal System 4439 421785.42 95.018117 \n", "4 Gastro-Intestinal System 1068 6383.25 5.976826 " ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "## Renaming and Restructuring\n", "\n", "We have lots of columns in our data, not all of which we need. Let's rename the columns and only keep the ones we want." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "df = df.rename(columns={ # specifying that we want to rename columns\n", " \"BNF Chemical Substance Name\": \"substance_name\",\n", " \"BNF Paragraph Name\": \"paragraph_name\",\n", " \"BNF Section Name\": \"section_name\",\n", " \"BNF Chapter Name\": \"chapter_name\",\n", " \"Total Items\": \"total_items\",\n", " \"Total Cost (£)\": \"total_cost\",\n", " \"Cost Per Item (£)\": \"cost_per_item\",\n", "})\n", "\n", "df = df[['subtance_name', 'paragraph_name', 'section_name', 'chapter_name', 'total_items', 'total_cost', 'cost_per_item']] # specifying just the columns we want to keep" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "


\n", "\n", "## Filtering our Data\n", "\n", "We only want data for the 'Cardiovascular System' chapter. Let's filter using `df.query`." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subtance_nameparagraph_namesection_namechapter_nametotal_itemstotal_costcost_per_item
111DigoxinCardiac glycosidesPositive inotropic drugsCardiovascular System25615068415584.683.285405
112EnoximonePhosphodiesterase Type-3 inhibitorsPositive inotropic drugsCardiovascular System115.0215.020000
113BendroflumethiazideThiazides and related diureticsDiureticsCardiovascular System58954694962387.040.841729
114ChlorothiazideThiazides and related diureticsDiureticsCardiovascular System2314106645.4946.087074
115ChlortalidoneThiazides and related diureticsDiureticsCardiovascular System155461154916.5174.290268
........................
286GemfibrozilLipid-regulating drugsLipid-regulating drugsCardiovascular System6040198960.8432.940536
287Nicotinic acidLipid-regulating drugsLipid-regulating drugsCardiovascular System4133.3233.330000
288Pravastatin sodiumLipid-regulating drugsLipid-regulating drugsCardiovascular System22337904518365.472.022735
289SimvastatinLipid-regulating drugsLipid-regulating drugsCardiovascular System1366672514597488.571.068104
290Sodium tetradecyl sulphateLocal sclerosantsLocal sclerosantsCardiovascular System29.084.540000
\n", "

180 rows × 7 columns

\n", "
" ], "text/plain": [ " subtance_name paragraph_name \\\n", "111 Digoxin Cardiac glycosides \n", "112 Enoximone Phosphodiesterase Type-3 inhibitors \n", "113 Bendroflumethiazide Thiazides and related diuretics \n", "114 Chlorothiazide Thiazides and related diuretics \n", "115 Chlortalidone Thiazides and related diuretics \n", ".. ... ... \n", "286 Gemfibrozil Lipid-regulating drugs \n", "287 Nicotinic acid Lipid-regulating drugs \n", "288 Pravastatin sodium Lipid-regulating drugs \n", "289 Simvastatin Lipid-regulating drugs \n", "290 Sodium tetradecyl sulphate Local sclerosants \n", "\n", " section_name chapter_name total_items \\\n", "111 Positive inotropic drugs Cardiovascular System 2561506 \n", "112 Positive inotropic drugs Cardiovascular System 1 \n", "113 Diuretics Cardiovascular System 5895469 \n", "114 Diuretics Cardiovascular System 2314 \n", "115 Diuretics Cardiovascular System 15546 \n", ".. ... ... ... \n", "286 Lipid-regulating drugs Cardiovascular System 6040 \n", "287 Lipid-regulating drugs Cardiovascular System 4 \n", "288 Lipid-regulating drugs Cardiovascular System 2233790 \n", "289 Lipid-regulating drugs Cardiovascular System 13666725 \n", "290 Local sclerosants Cardiovascular System 2 \n", "\n", " total_cost cost_per_item \n", "111 8415584.68 3.285405 \n", "112 15.02 15.020000 \n", "113 4962387.04 0.841729 \n", "114 106645.49 46.087074 \n", "115 1154916.51 74.290268 \n", ".. ... ... \n", "286 198960.84 32.940536 \n", "287 133.32 33.330000 \n", "288 4518365.47 2.022735 \n", "289 14597488.57 1.068104 \n", "290 9.08 4.540000 \n", "\n", "[180 rows x 7 columns]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.query(\"chapter_name == 'Cardiovascular System'\")\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have 180 rows. Let's just keep the top 50 with the highest item count, which we can do by sorting and taking the top 50 rows:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subtance_nameparagraph_namesection_namechapter_nametotal_itemstotal_costcost_per_item
279AtorvastatinLipid-regulating drugsLipid-regulating drugsCardiovascular System654753561.517661e+082.317912
217AmlodipineCalcium-channel blockersNitrates, calcium-channel blockers & other ant...Cardiovascular System374213993.695931e+070.987652
191RamiprilRenin-angiotensin system drugsHypertension and heart failureCardiovascular System334477354.859265e+071.452794
151Bisoprolol fumarateBeta-adrenoceptor blocking drugsBeta-adrenoceptor blocking drugsCardiovascular System287994552.899027e+071.006626
259AspirinAntiplatelet drugsAntiplatelet drugsCardiovascular System201997641.858923e+070.920270
........................
173IndoraminAlpha-adrenoceptor blocking drugsHypertension and heart failureCardiovascular System14.017000e+0140.170000
112EnoximonePhosphodiesterase Type-3 inhibitorsPositive inotropic drugsCardiovascular System11.502000e+0115.020000
131Co-triamterzide(Triamterene/hydrochlorothiazide)Potassium sparing diuretics and compoundsDiureticsCardiovascular System19.500000e-010.950000
240Ephedrine hydrochlorideVasoconstrictor sympathomimeticsSympathomimetics and other vasoconstrictor drugsCardiovascular System11.909000e+0119.090000
181Co-zidocapt (Hydrochlorothiazide/captopril)Renin-angiotensin system drugsHypertension and heart failureCardiovascular System11.400000e+0114.000000
\n", "

180 rows × 7 columns

\n", "
" ], "text/plain": [ " subtance_name \\\n", "279 Atorvastatin \n", "217 Amlodipine \n", "191 Ramipril \n", "151 Bisoprolol fumarate \n", "259 Aspirin \n", ".. ... \n", "173 Indoramin \n", "112 Enoximone \n", "131 Co-triamterzide(Triamterene/hydrochlorothiazide) \n", "240 Ephedrine hydrochloride \n", "181 Co-zidocapt (Hydrochlorothiazide/captopril) \n", "\n", " paragraph_name \\\n", "279 Lipid-regulating drugs \n", "217 Calcium-channel blockers \n", "191 Renin-angiotensin system drugs \n", "151 Beta-adrenoceptor blocking drugs \n", "259 Antiplatelet drugs \n", ".. ... \n", "173 Alpha-adrenoceptor blocking drugs \n", "112 Phosphodiesterase Type-3 inhibitors \n", "131 Potassium sparing diuretics and compounds \n", "240 Vasoconstrictor sympathomimetics \n", "181 Renin-angiotensin system drugs \n", "\n", " section_name chapter_name \\\n", "279 Lipid-regulating drugs Cardiovascular System \n", "217 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "191 Hypertension and heart failure Cardiovascular System \n", "151 Beta-adrenoceptor blocking drugs Cardiovascular System \n", "259 Antiplatelet drugs Cardiovascular System \n", ".. ... ... \n", "173 Hypertension and heart failure Cardiovascular System \n", "112 Positive inotropic drugs Cardiovascular System \n", "131 Diuretics Cardiovascular System \n", "240 Sympathomimetics and other vasoconstrictor drugs Cardiovascular System \n", "181 Hypertension and heart failure Cardiovascular System \n", "\n", " total_items total_cost cost_per_item \n", "279 65475356 1.517661e+08 2.317912 \n", "217 37421399 3.695931e+07 0.987652 \n", "191 33447735 4.859265e+07 1.452794 \n", "151 28799455 2.899027e+07 1.006626 \n", "259 20199764 1.858923e+07 0.920270 \n", ".. ... ... ... \n", "173 1 4.017000e+01 40.170000 \n", "112 1 1.502000e+01 15.020000 \n", "131 1 9.500000e-01 0.950000 \n", "240 1 1.909000e+01 19.090000 \n", "181 1 1.400000e+01 14.000000 \n", "\n", "[180 rows x 7 columns]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.sort_values(\"total_items\", ascending=False)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "Our data is sorted - let's just take the top 50 rows." ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "df = df.head(50)" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
subtance_nameparagraph_namesection_namechapter_nametotal_itemstotal_costcost_per_item
279AtorvastatinLipid-regulating drugsLipid-regulating drugsCardiovascular System654753561.517661e+082.317912
217AmlodipineCalcium-channel blockersNitrates, calcium-channel blockers & other ant...Cardiovascular System374213993.695931e+070.987652
191RamiprilRenin-angiotensin system drugsHypertension and heart failureCardiovascular System334477354.859265e+071.452794
151Bisoprolol fumarateBeta-adrenoceptor blocking drugsBeta-adrenoceptor blocking drugsCardiovascular System287994552.899027e+071.006626
259AspirinAntiplatelet drugsAntiplatelet drugsCardiovascular System201997641.858923e+070.920270
289SimvastatinLipid-regulating drugsLipid-regulating drugsCardiovascular System136667251.459749e+071.068104
205Losartan potassiumRenin-angiotensin system drugsHypertension and heart failureCardiovascular System119122961.589276e+071.334147
260ClopidogrelAntiplatelet drugsAntiplatelet drugsCardiovascular System113527191.459731e+071.285799
121FurosemideLoop diureticsDiureticsCardiovascular System110716181.083850e+070.978945
258ApixabanOral anticoagulantsAnticoagulants and protamineCardiovascular System84704752.388879e+0828.202419
203Candesartan cilexetilRenin-angiotensin system drugsHypertension and heart failureCardiovascular System79090861.863590e+072.356265
186LisinoprilRenin-angiotensin system drugsHypertension and heart failureCardiovascular System76952061.051503e+071.366439
156Propranolol hydrochlorideBeta-adrenoceptor blocking drugsBeta-adrenoceptor blocking drugsCardiovascular System74758452.274102e+073.041933
172Doxazosin mesilateAlpha-adrenoceptor blocking drugsHypertension and heart failureCardiovascular System72120821.553900e+072.154579
113BendroflumethiazideThiazides and related diureticsDiureticsCardiovascular System58954694.962387e+060.841729
117IndapamideThiazides and related diureticsDiureticsCardiovascular System56685111.131126e+071.995456
268Rosuvastatin calciumLipid-regulating drugsLipid-regulating drugsCardiovascular System48407461.040777e+072.150035
249EdoxabanOral anticoagulantsAnticoagulants and protamineCardiovascular System45245222.281015e+0850.414491
216Isosorbide mononitrateNitratesNitrates, calcium-channel blockers & other ant...Cardiovascular System44491731.668716e+073.750620
150AtenololBeta-adrenoceptor blocking drugsBeta-adrenoceptor blocking drugsCardiovascular System44470116.378309e+061.434291
257RivaroxabanOral anticoagulantsAnticoagulants and protamineCardiovascular System42759062.256078e+0852.762567
220FelodipineCalcium-channel blockersNitrates, calcium-channel blockers & other ant...Cardiovascular System38991442.209864e+075.667563
187Perindopril erbumineRenin-angiotensin system drugsHypertension and heart failureCardiovascular System37035738.244155e+062.226000
222Lercanidipine hydrochlorideCalcium-channel blockersNitrates, calcium-channel blockers & other ant...Cardiovascular System36379082.180692e+075.994357
254Warfarin sodiumOral anticoagulantsAnticoagulants and protamineCardiovascular System35026367.181262e+062.050245
124SpironolactonePotassium-sparing diuretics and aldosterone an...DiureticsCardiovascular System34956871.302949e+073.727303
283EzetimibeLipid-regulating drugsLipid-regulating drugsCardiovascular System32433952.758251e+078.504209
111DigoxinCardiac glycosidesPositive inotropic drugsCardiovascular System25615068.415585e+063.285405
288Pravastatin sodiumLipid-regulating drugsLipid-regulating drugsCardiovascular System22337904.518365e+062.022735
219Diltiazem hydrochlorideCalcium-channel blockersNitrates, calcium-channel blockers & other ant...Cardiovascular System21198891.825240e+078.610073
120BumetanideLoop diureticsDiureticsCardiovascular System19009638.161129e+064.293155
214Glyceryl trinitrateNitratesNitrates, calcium-channel blockers & other ant...Cardiovascular System16660734.859361e+062.916655
204IrbesartanRenin-angiotensin system drugsHypertension and heart failureCardiovascular System16054692.701446e+061.682652
125EplerenonePotassium-sparing diuretics and aldosterone an...DiureticsCardiovascular System14743579.841154e+066.674879
225NifedipineCalcium-channel blockersNitrates, calcium-channel blockers & other ant...Cardiovascular System12757751.620652e+0712.703277
201Sacubitril/valsartanRenin-angiotensin system drugsHypertension and heart failureCardiovascular System12490371.087769e+0887.088650
183Enalapril maleateRenin-angiotensin system drugsHypertension and heart failureCardiovascular System12256363.127985e+062.552132
228NicorandilOther antianginal drugsNitrates, calcium-channel blockers & other ant...Cardiovascular System9808411.810236e+061.845595
230RanolazineOther antianginal drugsNitrates, calcium-channel blockers & other ant...Cardiovascular System9285133.887056e+0741.863237
147CarvedilolBeta-adrenoceptor blocking drugsBeta-adrenoceptor blocking drugsCardiovascular System6835671.536766e+062.248157
148NebivololBeta-adrenoceptor blocking drugsBeta-adrenoceptor blocking drugsCardiovascular System6432566.564158e+0610.204581
285FenofibrateLipid-regulating drugsLipid-regulating drugsCardiovascular System5921772.576592e+064.351050
221LacidipineCalcium-channel blockersNitrates, calcium-channel blockers & other ant...Cardiovascular System5854842.772281e+064.735024
265Tranexamic acidAntifibrinolytic drugs and haemostaticsAntifibrinolytic drugs and haemostaticsCardiovascular System5841623.454837e+065.914176
229IvabradineOther antianginal drugsNitrates, calcium-channel blockers & other ant...Cardiovascular System5770521.386686e+0724.030516
226Verapamil hydrochlorideCalcium-channel blockersNitrates, calcium-channel blockers & other ant...Cardiovascular System5746294.675124e+068.135899
264TicagrelorAntiplatelet drugsAntiplatelet drugsCardiovascular System5146042.817548e+0754.751768
138Flecainide acetateDrugs for arrhythmiasAnti-arrhythmic drugsCardiovascular System4600731.705849e+063.707778
135Amiodarone hydrochlorideDrugs for arrhythmiasAnti-arrhythmic drugsCardiovascular System4136247.689201e+051.858983
157Sotalol hydrochlorideBeta-adrenoceptor blocking drugsBeta-adrenoceptor blocking drugsCardiovascular System4058261.000734e+062.465919
\n", "
" ], "text/plain": [ " subtance_name \\\n", "279 Atorvastatin \n", "217 Amlodipine \n", "191 Ramipril \n", "151 Bisoprolol fumarate \n", "259 Aspirin \n", "289 Simvastatin \n", "205 Losartan potassium \n", "260 Clopidogrel \n", "121 Furosemide \n", "258 Apixaban \n", "203 Candesartan cilexetil \n", "186 Lisinopril \n", "156 Propranolol hydrochloride \n", "172 Doxazosin mesilate \n", "113 Bendroflumethiazide \n", "117 Indapamide \n", "268 Rosuvastatin calcium \n", "249 Edoxaban \n", "216 Isosorbide mononitrate \n", "150 Atenolol \n", "257 Rivaroxaban \n", "220 Felodipine \n", "187 Perindopril erbumine \n", "222 Lercanidipine hydrochloride \n", "254 Warfarin sodium \n", "124 Spironolactone \n", "283 Ezetimibe \n", "111 Digoxin \n", "288 Pravastatin sodium \n", "219 Diltiazem hydrochloride \n", "120 Bumetanide \n", "214 Glyceryl trinitrate \n", "204 Irbesartan \n", "125 Eplerenone \n", "225 Nifedipine \n", "201 Sacubitril/valsartan \n", "183 Enalapril maleate \n", "228 Nicorandil \n", "230 Ranolazine \n", "147 Carvedilol \n", "148 Nebivolol \n", "285 Fenofibrate \n", "221 Lacidipine \n", "265 Tranexamic acid \n", "229 Ivabradine \n", "226 Verapamil hydrochloride \n", "264 Ticagrelor \n", "138 Flecainide acetate \n", "135 Amiodarone hydrochloride \n", "157 Sotalol hydrochloride \n", "\n", " paragraph_name \\\n", "279 Lipid-regulating drugs \n", "217 Calcium-channel blockers \n", "191 Renin-angiotensin system drugs \n", "151 Beta-adrenoceptor blocking drugs \n", "259 Antiplatelet drugs \n", "289 Lipid-regulating drugs \n", "205 Renin-angiotensin system drugs \n", "260 Antiplatelet drugs \n", "121 Loop diuretics \n", "258 Oral anticoagulants \n", "203 Renin-angiotensin system drugs \n", "186 Renin-angiotensin system drugs \n", "156 Beta-adrenoceptor blocking drugs \n", "172 Alpha-adrenoceptor blocking drugs \n", "113 Thiazides and related diuretics \n", "117 Thiazides and related diuretics \n", "268 Lipid-regulating drugs \n", "249 Oral anticoagulants \n", "216 Nitrates \n", "150 Beta-adrenoceptor blocking drugs \n", "257 Oral anticoagulants \n", "220 Calcium-channel blockers \n", "187 Renin-angiotensin system drugs \n", "222 Calcium-channel blockers \n", "254 Oral anticoagulants \n", "124 Potassium-sparing diuretics and aldosterone an... \n", "283 Lipid-regulating drugs \n", "111 Cardiac glycosides \n", "288 Lipid-regulating drugs \n", "219 Calcium-channel blockers \n", "120 Loop diuretics \n", "214 Nitrates \n", "204 Renin-angiotensin system drugs \n", "125 Potassium-sparing diuretics and aldosterone an... \n", "225 Calcium-channel blockers \n", "201 Renin-angiotensin system drugs \n", "183 Renin-angiotensin system drugs \n", "228 Other antianginal drugs \n", "230 Other antianginal drugs \n", "147 Beta-adrenoceptor blocking drugs \n", "148 Beta-adrenoceptor blocking drugs \n", "285 Lipid-regulating drugs \n", "221 Calcium-channel blockers \n", "265 Antifibrinolytic drugs and haemostatics \n", "229 Other antianginal drugs \n", "226 Calcium-channel blockers \n", "264 Antiplatelet drugs \n", "138 Drugs for arrhythmias \n", "135 Drugs for arrhythmias \n", "157 Beta-adrenoceptor blocking drugs \n", "\n", " section_name chapter_name \\\n", "279 Lipid-regulating drugs Cardiovascular System \n", "217 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "191 Hypertension and heart failure Cardiovascular System \n", "151 Beta-adrenoceptor blocking drugs Cardiovascular System \n", "259 Antiplatelet drugs Cardiovascular System \n", "289 Lipid-regulating drugs Cardiovascular System \n", "205 Hypertension and heart failure Cardiovascular System \n", "260 Antiplatelet drugs Cardiovascular System \n", "121 Diuretics Cardiovascular System \n", "258 Anticoagulants and protamine Cardiovascular System \n", "203 Hypertension and heart failure Cardiovascular System \n", "186 Hypertension and heart failure Cardiovascular System \n", "156 Beta-adrenoceptor blocking drugs Cardiovascular System \n", "172 Hypertension and heart failure Cardiovascular System \n", "113 Diuretics Cardiovascular System \n", "117 Diuretics Cardiovascular System \n", "268 Lipid-regulating drugs Cardiovascular System \n", "249 Anticoagulants and protamine Cardiovascular System \n", "216 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "150 Beta-adrenoceptor blocking drugs Cardiovascular System \n", "257 Anticoagulants and protamine Cardiovascular System \n", "220 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "187 Hypertension and heart failure Cardiovascular System \n", "222 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "254 Anticoagulants and protamine Cardiovascular System \n", "124 Diuretics Cardiovascular System \n", "283 Lipid-regulating drugs Cardiovascular System \n", "111 Positive inotropic drugs Cardiovascular System \n", "288 Lipid-regulating drugs Cardiovascular System \n", "219 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "120 Diuretics Cardiovascular System \n", "214 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "204 Hypertension and heart failure Cardiovascular System \n", "125 Diuretics Cardiovascular System \n", "225 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "201 Hypertension and heart failure Cardiovascular System \n", "183 Hypertension and heart failure Cardiovascular System \n", "228 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "230 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "147 Beta-adrenoceptor blocking drugs Cardiovascular System \n", "148 Beta-adrenoceptor blocking drugs Cardiovascular System \n", "285 Lipid-regulating drugs Cardiovascular System \n", "221 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "265 Antifibrinolytic drugs and haemostatics Cardiovascular System \n", "229 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "226 Nitrates, calcium-channel blockers & other ant... Cardiovascular System \n", "264 Antiplatelet drugs Cardiovascular System \n", "138 Anti-arrhythmic drugs Cardiovascular System \n", "135 Anti-arrhythmic drugs Cardiovascular System \n", "157 Beta-adrenoceptor blocking drugs Cardiovascular System \n", "\n", " total_items total_cost cost_per_item \n", "279 65475356 1.517661e+08 2.317912 \n", "217 37421399 3.695931e+07 0.987652 \n", "191 33447735 4.859265e+07 1.452794 \n", "151 28799455 2.899027e+07 1.006626 \n", "259 20199764 1.858923e+07 0.920270 \n", "289 13666725 1.459749e+07 1.068104 \n", "205 11912296 1.589276e+07 1.334147 \n", "260 11352719 1.459731e+07 1.285799 \n", "121 11071618 1.083850e+07 0.978945 \n", "258 8470475 2.388879e+08 28.202419 \n", "203 7909086 1.863590e+07 2.356265 \n", "186 7695206 1.051503e+07 1.366439 \n", "156 7475845 2.274102e+07 3.041933 \n", "172 7212082 1.553900e+07 2.154579 \n", "113 5895469 4.962387e+06 0.841729 \n", "117 5668511 1.131126e+07 1.995456 \n", "268 4840746 1.040777e+07 2.150035 \n", "249 4524522 2.281015e+08 50.414491 \n", "216 4449173 1.668716e+07 3.750620 \n", "150 4447011 6.378309e+06 1.434291 \n", "257 4275906 2.256078e+08 52.762567 \n", "220 3899144 2.209864e+07 5.667563 \n", "187 3703573 8.244155e+06 2.226000 \n", "222 3637908 2.180692e+07 5.994357 \n", "254 3502636 7.181262e+06 2.050245 \n", "124 3495687 1.302949e+07 3.727303 \n", "283 3243395 2.758251e+07 8.504209 \n", "111 2561506 8.415585e+06 3.285405 \n", "288 2233790 4.518365e+06 2.022735 \n", "219 2119889 1.825240e+07 8.610073 \n", "120 1900963 8.161129e+06 4.293155 \n", "214 1666073 4.859361e+06 2.916655 \n", "204 1605469 2.701446e+06 1.682652 \n", "125 1474357 9.841154e+06 6.674879 \n", "225 1275775 1.620652e+07 12.703277 \n", "201 1249037 1.087769e+08 87.088650 \n", "183 1225636 3.127985e+06 2.552132 \n", "228 980841 1.810236e+06 1.845595 \n", "230 928513 3.887056e+07 41.863237 \n", "147 683567 1.536766e+06 2.248157 \n", "148 643256 6.564158e+06 10.204581 \n", "285 592177 2.576592e+06 4.351050 \n", "221 585484 2.772281e+06 4.735024 \n", "265 584162 3.454837e+06 5.914176 \n", "229 577052 1.386686e+07 24.030516 \n", "226 574629 4.675124e+06 8.135899 \n", "264 514604 2.817548e+07 54.751768 \n", "138 460073 1.705849e+06 3.707778 \n", "135 413624 7.689201e+05 1.858983 \n", "157 405826 1.000734e+06 2.465919 " ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", "\n", "# Exporting the Data\n", "\n", "Finally let's save the cleaned and filtered data as a `CSV` to use with Vega-lite." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "df.to_csv(\"sample_data/cardiovascular_system_substances.csv\", index=False)" ] } ], "metadata": { "kernelspec": { "display_name": "base", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 2 }