{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Small diabetes study\n", "\n", "In this assignment, we will work with a small dataset of diabetes patients taken from [here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html).\n", "\n", "\n", "## Introduction to probability and statistics" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib\n", "import pytest\n", "import ipytest\n", "import unittest\n", "\n", "ipytest.autoconfig()\n", "\n", "df = pd.read_csv(\"../../assets/data/diabetes.tsv\",sep='\\t')\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "In this dataset, the columns are as follows:\n", "* Age and sex are self-explanatory\n", "* BMI is body mass index\n", "* BP is average blood pressure\n", "* S1 through S6 are six blood serum measurements\n", "* Y is a quantitative measure of disease progression one year after baseline\n", "\n", "Let's study this dataset using methods of probability and statistics.\n", "\n", "### Task 1: Compute the mean and standard deviation of every column" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_df_mean(df):\n", "    if df is None:\n", "        raise Exception('df cannot be None.')\n", "    return df____\n", "\n", "def get_df_std(df):\n", "    if df is None:\n", "        raise Exception('df cannot be None.')\n", "    return df____\n", "\n", "df_mean = get_df_mean(df)\n", "df_std = get_df_std(df)\n", "\n", "print(df_mean, df_std)" ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "
pandas.DataFrame.mean and pandas.DataFrame.std.\n",
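"\n",
"A minimal sketch of both calls on a small, made-up DataFrame (the column names mirror the dataset, but the values are invented for illustration). `numeric_only=True` restricts the statistics to numeric columns, and `std` uses the sample formula (`ddof=1`) by default; `DataFrame.var` gives the variance, which is the square of the standard deviation.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not taken from diabetes.tsv\n",
"toy = pd.DataFrame({'AGE': [50, 60, 70], 'BMI': [20.0, 25.0, 30.0]})\n",
"\n",
"means = toy.mean(numeric_only=True)  # one mean per column\n",
"stds = toy.std(numeric_only=True)  # one sample standard deviation per column\n",
"print(means)\n",
"print(stds)\n",
"```\n",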
"\n",
"pandas.DataFrame.boxplot.\n",
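"\n",
"A minimal sketch, assuming the goal is one box per group (for example, BMI split by SEX). The toy values are invented for illustration; `column` picks the variable and `by` the grouping column. The group medians computed at the end are the values drawn as the line inside each box.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not taken from diabetes.tsv\n",
"toy = pd.DataFrame({'SEX': [1, 1, 2, 2], 'BMI': [22.0, 26.0, 24.0, 30.0]})\n",
"\n",
"# Draws one box per SEX value for the BMI column (requires matplotlib)\n",
"toy.boxplot(column='BMI', by='SEX')\n",
"\n",
"# The median line of each box, computed directly\n",
"medians = toy.groupby('SEX')['BMI'].median()\n",
"print(medians)\n",
"```\n",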
"\n",
"pandas.DataFrame.plot.\n",
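"\n",
"A minimal sketch for inspecting a distribution with `DataFrame.plot`, assuming a histogram is the plot you want (`kind='hist'`); the toy values are invented for illustration and `bins=3` is an arbitrary choice.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not taken from diabetes.tsv\n",
"toy = pd.DataFrame({'AGE': [25, 30, 45, 50, 52, 60, 61, 70]})\n",
"\n",
"# Histogram of one column; returns the matplotlib Axes it drew on\n",
"ax = toy.plot(y='AGE', kind='hist', bins=3, title='AGE')\n",
"print(len(ax.patches))  # one bar per bin\n",
"```\n",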
"\n",
"pandas.DataFrame.corr.\n",
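"\n",
"A minimal sketch of `DataFrame.corr` on made-up values (invented for illustration): it returns the pairwise Pearson correlation matrix by default, so the `Y` column of the result holds each variable's correlation with disease progression.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not taken from diabetes.tsv\n",
"toy = pd.DataFrame({'BMI': [20.0, 25.0, 30.0, 35.0],\n",
"                    'AGE': [60.0, 40.0, 60.0, 40.0],\n",
"                    'Y': [100.0, 150.0, 200.0, 250.0]})\n",
"\n",
"corr = toy.corr()  # Pearson by default; method='spearman' or 'kendall' also exist\n",
"print(corr['Y'])  # BMI is exactly linear in Y here, so its value is 1 up to rounding\n",
"```\n",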
"\n",
"pandas.DataFrame.corrwith to get the correlation, and pandas.DataFrame.plot.scatter to plot the scatterplots.\n",
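"\n",
"A minimal sketch combining both calls on made-up values (invented for illustration): `corrwith` correlates every column with one Series, here the target `Y`, and `plot.scatter` then draws one scatter plot per remaining column against `Y`.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data, not taken from diabetes.tsv\n",
"toy = pd.DataFrame({'BMI': [20.0, 25.0, 30.0, 35.0],\n",
"                    'AGE': [60.0, 40.0, 60.0, 40.0],\n",
"                    'Y': [100.0, 150.0, 200.0, 250.0]})\n",
"\n",
"# Pearson correlation of each feature column with Y\n",
"corr_with_y = toy.drop(columns='Y').corrwith(toy['Y'])\n",
"print(corr_with_y)\n",
"\n",
"# One scatter plot per feature, with Y on the vertical axis\n",
"for col in corr_with_y.index:\n",
"    toy.plot.scatter(x=col, y='Y', title=f'{col} vs Y')\n",
"```\n",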
"\n",
"