{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Course 2 week 1 lecture notebook Ex 02\n", "# Risk Scores, Pandas and Numpy\n", "\n", "Here, you'll get a chance to see the risk scores implemented as Python functions.\n", "- Atrial fibrillation: Chads-vasc score\n", "- Liver disease: MELD score\n", "- Heart disease: ASCVD score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Compute the chads-vasc risk score for atrial fibrillation. \n", "- Look for the `# TODO` comments to see which parts you will complete." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Complete the function that calculates the chads-vasc score. \n", "# Look for the # TODO comments to see which sections you should fill in.\n", "\n", "def chads_vasc_score(input_c, input_h, input_a2, input_d, input_s2, input_v, input_a, input_sc):\n", " # congestive heart failure\n", " coef_c = 1 \n", " \n", " # Coefficient for hypertension\n", " coef_h = 1 \n", " \n", " # Coefficient for Age >= 75 years\n", " coef_a2 = 2\n", " \n", " # Coefficient for diabetes mellitus\n", " coef_d = 1\n", " \n", " # Coefficient for stroke\n", " coef_s2 = 2\n", " \n", " # Coefficient for vascular disease\n", " coef_v = 1\n", " \n", " # Coefficient for age 65 to 74 years\n", " coef_a = 1\n", " \n", " # TODO Coefficient for female\n", " coef_sc = 1\n", " \n", " # Calculate the risk score\n", " risk_score = (input_c * coef_c) +\\\n", " (input_h * coef_h) +\\\n", " (input_a2 * coef_a2) +\\\n", " (input_d * coef_d) +\\\n", " (input_s2 * coef_s2) +\\\n", " (input_v * coef_v) +\\\n", " (input_a * coef_a) +\\\n", " (input_sc * coef_sc)\n", " \n", " return risk_score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Calculate the risk score\n", "\n", "Calculate the chads-vasc score for a patient who has the following attributes:\n", "- Congestive heart failure? 
no\n", "- Hypertension: yes\n", "- Age 75 or older: no\n", "- Diabetes mellitus: no\n", "- Stroke: no\n", "- Vascular disease: yes\n", "- Age 65 to 74: no\n", "- Female: yes" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The chads-vasc score for this patient is 3\n" ] } ], "source": [ "# Calculate the patient's Chads-vasc risk score\n", "tmp_c = 0\n", "tmp_h = 1\n", "tmp_a2 = 0\n", "tmp_d = 0\n", "tmp_s2 = 0\n", "tmp_v = 1\n", "tmp_a = 0\n", "tmp_sc = 1\n", "\n", "print(f\"The chads-vasc score for this patient is\",\n", "      f\"{chads_vasc_score(tmp_c, tmp_h, tmp_a2, tmp_d, tmp_s2, tmp_v, tmp_a, tmp_sc)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Expected output\n", "```CPP\n", "The chads-vasc score for this patient is 3\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Risk score for liver disease\n", "\n", "Complete the implementation of the MELD score and use it to calculate the risk score for a particular patient.\n", "- Look for the `# TODO` comments to see which parts you will complete." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def liver_disease_mortality(input_creatine, input_bilirubin, input_inr):\n", "    \"\"\"\n", "    Calculate the MELD score, a risk score for mortality in patients\n", "    with liver disease.\n", "    \n", "    Parameters:\n", "        input_creatine: serum creatinine, mg/dL\n", "        input_bilirubin: bilirubin, mg/dL\n", "        input_inr: INR (unitless ratio)\n", "    \"\"\"\n", "    # Coefficient values\n", "    coef_creatine = 0.957\n", "    coef_bilirubin = 0.378\n", "    coef_inr = 1.12\n", "    intercept = 0.643\n", "    # Calculate the natural logarithm of input variables\n", "    log_cre = np.log(input_creatine)\n", "    log_bil = np.log(input_bilirubin)\n", "    \n", "    # TODO: Calculate the natural log of input_inr\n", "    log_inr = np.log(input_inr)\n", "    \n", "    # Compute output\n", "    meld_score = (coef_creatine * log_cre) +\\\n", "                 (coef_bilirubin * log_bil) +\\\n", "                 (coef_inr * log_inr) +\\\n", "                 intercept\n", "    \n", "    # TODO: Multiply meld_score by 10 to get the final risk score\n", "    meld_score = meld_score * 10\n", "    \n", "    return meld_score" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calculate the MELD score for a patient who has:\n", "- Creatinine: 1 mg/dL\n", "- Bilirubin: 2 mg/dL\n", "- INR: 1.1" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The patient's MELD score is: 10.12\n" ] } ], "source": [ "tmp_meld_score = liver_disease_mortality(1.0, 2.0, 1.1)\n", "print(f\"The patient's MELD score is: {tmp_meld_score:.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Expected output\n", "```CPP\n", "The patient's MELD score is: 10.12\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ASCVD Risk score for heart disease\n", "\n", "Complete the function that calculates the ASCVD risk score, using these inputs and coefficients:\n", "\n", "- Ln(Age): coefficient is 17.114\n", "- Ln(total cholesterol): coefficient is 0.94\n", "- Ln(HDL-C): coefficient is -18.920\n", "- Ln(Age) x Ln(HDL-C): coefficient is 4.475\n", "- Ln(Untreated systolic BP): coefficient is 27.820\n", "- Ln(Age) x Ln(Untreated systolic BP): coefficient is -6.087\n", "- Current smoker (1 or 0): coefficient is 0.691\n", "- Diabetes (1 or 0): coefficient is 0.874\n",
"\n", "\n", "After you calculate the sum of the products (of inputs and coefficients), use this formula to get the risk score:\n", "\n", "$$Risk = 1 - 0.9533^{e^{sumProd - 86.61}}$$\n", "\n", "This is 0.9533 raised to the power of this expression: $e^{sumProd - 86.61}$, and not 0.9533 multiplied by that exponential.\n", "\n", "- Look for the `# TODO` comments to see which parts you will complete." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def ascvd(x_age,\n", "          x_cho,\n", "          x_hdl,\n", "          x_sbp,\n", "          x_smo,\n", "          x_dia,\n", "          verbose=False\n", "          ):\n", "    \"\"\"\n", "    Atherosclerotic Cardiovascular Disease\n", "    (ASCVD) Risk Estimator Plus\n", "    \"\"\"\n", "    \n", "    # Define the coefficients\n", "    b_age = 17.114\n", "    b_cho = 0.94\n", "    b_hdl = -18.92\n", "    b_age_hdl = 4.475\n", "    b_sbp = 27.82\n", "    b_age_sbp = -6.087\n", "    b_smo = 0.691\n", "    b_dia = 0.874\n", "    \n", "    # Calculate the sum of the products of inputs and coefficients\n", "    sum_prod = b_age * np.log(x_age) + \\\n", "               b_cho * np.log(x_cho) + \\\n", "               b_hdl * np.log(x_hdl) + \\\n", "               b_age_hdl * np.log(x_age) * np.log(x_hdl) + \\\n", "               b_sbp * np.log(x_sbp) + \\\n", "               b_age_sbp * np.log(x_age) * np.log(x_sbp) + \\\n", "               b_smo * x_smo + \\\n", "               b_dia * x_dia\n", "    \n", "    # Optionally print the intermediate terms for debugging\n", "    if verbose:\n", "        print(f\"np.log(x_age): {np.log(x_age):.2f}\")\n", "        print(f\"np.log(x_cho): {np.log(x_cho):.2f}\")\n", "        print(f\"np.log(x_hdl): {np.log(x_hdl):.2f}\")\n", "        print(f\"np.log(x_age) * np.log(x_hdl): {np.log(x_age) * np.log(x_hdl):.2f}\")\n", "        print(f\"np.log(x_sbp): {np.log(x_sbp):.2f}\")\n", "        print(f\"np.log(x_age) * np.log(x_sbp): {np.log(x_age) * np.log(x_sbp):.2f}\")\n", "        print(f\"sum_prod {sum_prod:.2f}\")\n", "    \n", "    # TODO: Risk Score = 1 - (0.9533^( e^(sum - 86.61) ) )\n", "    risk_score = 1 - np.power(0.9533, np.exp(sum_prod - 86.61))\n", "    \n", "    return risk_score" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], 
"source": [ "tmp_risk_score = ascvd(x_age=55,\n", " x_cho=213,\n", " x_hdl=50,\n", " x_sbp=120,\n", " x_smo=0,\n", " x_dia=0, \n", " verbose=True\n", " )\n", "print(f\"\\npatient's ascvd risk score is {tmp_risk_score:.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Expected output\n", "```CPP\n", "patient's ascvd risk score is 0.03\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Solution:\n", "\n", "`risk_score = 1 - 0.9533**(np.exp(86.16-86.61))`\n", "
\n", "\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Numpy and Pandas Operations\n", "\n", "In this exercise, you will load a small dataset and compare how the default behaviors of pandas and numpy functions differ. This exercise will help you when you pre-process the data in this week's assignment." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Import packages\n", "import numpy as np\n", "import pandas as pd\n", "\n", "# Import a predefined function that will generate data\n", "from utils import load_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Generate the features 'X' and labels 'y'\n", "X, y = load_data(100)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# View the first few rows and column names of the features data frame\n", "X.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# View the labels\n", "y.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### How does `.mean` differ between pandas and numpy?\n", "\n", "Even though you've likely used numpy and pandas before, it helps to pay attention to how their default behaviors differ.\n", "\n", "See how calculating the mean with pandas differs from calculating the mean with numpy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pandas.DataFrame.mean\n", "\n", "Call the `.mean` function of the pandas DataFrame." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Call the .mean function of the data frame without choosing an axis\n", "print(f\"Pandas: X.mean():\\n{X.mean()}\")\n", "print()\n", "# Call the .mean function of the data frame, choosing axis=0\n", "print(f\"Pandas: X.mean(axis=0)\\n{X.mean(axis=0)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For pandas DataFrames:\n", "- By default, pandas treats each column separately. \n", "- You can also explicitly instruct the function to calculate the mean for each column by setting axis=0.\n", "- In both cases, you get the same result." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Numpy.ndarray.mean\n", "Compare this with what happens when you call `.mean` and the object is a numpy array.\n", "\n", "First store the tabular data into a numpy ndarray." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Store the data frame data into a numpy array\n", "X_np = np.array(X)\n", "\n", "# view the first 2 rows of the numpy array\n", "print(f\"First 2 rows of the numpy array:\\n{X_np[0:2,:]}\")\n", "print()\n", "\n", "# Call the .mean function of the numpy array without choosing an axis\n", "print(f\"Numpy.ndarray.mean: X_np.mean:\\n{X_np.mean()}\")\n", "print()\n", "# Call the .mean function of the numpy array, choosing axis=0\n", "print(f\"Numpy.ndarray.mean: X_np.mean(axis=0):\\n{X_np.mean(axis=0)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice how the default behavior of numpy.ndarray.mean differs.\n", "- By default, the mean is calculated for all values in the rows and columns. You get a single mean for the entire 2D array.\n", "- To explicitly calculate the mean for each column separately, you can set axis=0." 
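, "\n", "\n", "
The difference in defaults can be checked on a tiny table. This is a minimal sketch, assuming a made-up 3x2 DataFrame named `df` (hypothetical values, not the data returned by `load_data`):

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 table standing in for the features X (made-up values)
df = pd.DataFrame({'Age': [50.0, 60.0, 70.0],
                   'Systolic_BP': [110.0, 120.0, 130.0]})
arr = np.array(df)

# Default calls: pandas averages each column, numpy averages all six values
pandas_default = df.mean()
numpy_default = arr.mean()

# Explicit axis=0: both compute one mean per column
pandas_cols = df.mean(axis=0)
numpy_cols = arr.mean(axis=0)

print('pandas default:', pandas_default.to_numpy())
print('numpy default: ', numpy_default)
print('axis=0 results agree:', np.allclose(pandas_cols.to_numpy(), numpy_cols))
```

Here the numpy default collapses the whole array into one number, while the pandas default and both `axis=0` calls return one mean per column.
"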
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Question\n", "If you know that you want to calculate the mean for each column, how will you choose to call the .mean function if you want this to work for both pandas DataFrames and numpy arrays?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### This is the end of this practice section.\n", "\n", "Please continue on with the lecture videos!\n", "\n", "---" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 4 }