{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Estimation of COVID-19 pandemic\n",
    "\n",
    "## Loading data\n",
    "\n",
    "We will use data on COVID-19 infected individuals, provided by the [Center for Systems Science and Engineering](https://systems.jhu.edu/) (CSSE) at [Johns Hopkins University](https://jhu.edu/). Dataset is available in [this GitHub Repository](https://github.com/CSSEGISandData/COVID-19)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pytest\n",
    "import ipytest\n",
    "import unittest\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "from pandas.testing import assert_frame_equal\n",
    "from pandas.testing import assert_series_equal\n",
    "\n",
    "ipytest.autoconfig()\n",
    "plt.rcParams[\"figure.figsize\"] = (10, 3)  # make figures larger"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can load the most recent data directly from GitHub using `pd.read_csv`. If for some reason the data is not available, you can always use the copy available locally in the `data` folder - just uncomment the line below that defines `base_url`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# base_url = \"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/\"  # loading from Internet\n",
    "base_url = \"../../assets/data/estimation-covid-19/\"  # loading from disk\n",
    "infected_dataset_url = base_url + \"time_series_covid19_confirmed_global.csv\"\n",
    "recovered_dataset_url = base_url + \"time_series_covid19_recovered_global.csv\"\n",
    "deaths_dataset_url = base_url + \"time_series_covid19_deaths_global.csv\"\n",
    "countries_dataset_url = base_url + \"UID_ISO_FIPS_LookUp_Table.csv\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's now load the data for infected individuals and see how the data looks like:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "infected = pd.read_csv(infected_dataset_url)\n",
    "infected.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that each row of the table defines the number of infected individuals for each country and/or province, and columns correspond to dates. Similar tables can be loaded for other data, such as number of recovered and number of deaths."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "recovered = pd.read_csv(recovered_dataset_url)\n",
    "deaths = pd.read_csv(deaths_dataset_url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Making sense of the data\n",
    "\n",
    "From the table above the role of province column is not clear. Let's see the different values that are present in `Province/State` column:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "infected[\"Province/State\"].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From the names we can deduce that countries like Australia and China have more detailed breakdown by provinces. Let's look for information on China to see the example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def column_filter(df, column_name, column_value):\n",
    "    \"\"\"\n",
    "    Filters a pandas DataFrame based on a column value.\n",
    "\n",
    "    Returns:\n",
    "        pandas.DataFrame: The filtered DataFrame.\n",
    "    \"\"\"\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    if column_name not in df.columns:\n",
    "        raise Exception(f\"{column_name} does not exist in df\")\n",
    "    return df[df[_______] == ______]\n",
    "\n",
    "column_filter(infected, \"Country/Region\", \"China\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    return pd.DataFrame(\n",
    "        {\"numbers\": [1, 2, 3, 4, 5], \"bools\": [False, False, True, True, True]}\n",
    "    )\n",
    "\n",
    "\n",
    "class TestColumnFilter(unittest.TestCase):\n",
    "    def test_column_filter_happy_case(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "        expected_result = pd.DataFrame({\"numbers\": [3], \"bools\": [True]})\n",
    "\n",
    "        # act\n",
    "        result = column_filter(test_df, \"numbers\", 3)\n",
    "\n",
    "        # assert\n",
    "        assert result.reset_index(drop=True).equals(expected_result)\n",
    "        \n",
    "    def test_column_filter_with_none_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            column_filter(None, \"numbers\", 3)\n",
    "    \n",
    "    def test_column_filter_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            column_filter(pd.DataFrame(), \"numbers\", 3)\n",
    "    \n",
    "    def test_column_filter_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            column_filter(1, \"numbers\", 3)\n",
    "    \n",
    "    def test_column_filter_with_invalid_column_name_type(self):\n",
    "        #assign\n",
    "        test_df = create_test_df()\n",
    "        \n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            column_filter(test_df, 123, 3)\n",
    "\n",
    "    def test_column_filter_with_empty_column_name(self):\n",
    "        #assign\n",
    "        test_df = create_test_df()\n",
    "        \n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            column_filter(test_df, \"\", 3)\n",
    "    \n",
    "    def test_column_filter_with_none_column_name(self):\n",
    "        #assign\n",
    "        test_df = create_test_df()\n",
    "        \n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            column_filter(test_df, None, 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to fill <code>column_name</code> and <code>column_value</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Pre-processing the data \n",
    "\n",
    "We are not interested in breaking countries down to further territories, thus we would first get rid of this breakdown and add information on all territories together, to get info for the whole country. This can be done using `groupby`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def groupby_sum(df, column_name):\n",
    "    \"\"\"\n",
    "    Groups a column in a Pandas DataFrame and computes the sum of the values in each group.\n",
    "\n",
    "    Returns:\n",
    "        pd.DataFrame: A Pandas DataFrame containing the groupby and sum results.\n",
    "    \"\"\"\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    if column_name not in df.columns:\n",
    "        raise Exception(\"Column does not exist.\")\n",
    "    # Group and aggregate data\n",
    "    return df.________\n",
    "\n",
    "# Group and sum infected cases by country/region\n",
    "infected = ______\n",
    "# Group and sum recovered cases by country/region\n",
    "recovered = ______\n",
    "# Group and sum deaths cases by country/region\n",
    "deaths = ______\n",
    "\n",
    "infected.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    return pd.DataFrame({\"c1\": [1, 1, 1, 2, 2], \"c2\": [6, 7, 8, 9, 10]})\n",
    "\n",
    "\n",
    "class TestGroupbySum(unittest.TestCase):\n",
    "    def test_groupby_sum_happy_case(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "        expect_result = pd.DataFrame(data=[[21], [19]], index=[1, 2], columns=[\"c2\"])\n",
    "\n",
    "        # act\n",
    "        actual_result = groupby_sum(test_df, \"c1\")\n",
    "\n",
    "        # assert\n",
    "        assert_frame_equal(actual_result, expect_result, check_names=False)\n",
    "\n",
    "    def test_groupby_sum_with_none_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            groupby_sum(None, \"c1\")\n",
    "    \n",
    "    def test_groupby_sum_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            groupby_sum(pd.DataFrame(), \"c1\")\n",
    "    \n",
    "    def test_groupby_sum_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            groupby_sum(123, \"c1\")\n",
    "\n",
    "    def test_groupby_sum_with_invalid_column_name(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            groupby_sum(test_df, \"c100\")\n",
    "    \n",
    "    def test_groupby_sum_with_invalid_column_name_type(self):\n",
    "        #assign\n",
    "        test_df = create_test_df()\n",
    "        \n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter(test_df, 123)\n",
    "\n",
    "    def test_groupby_sum_with_empty_column_name(self):\n",
    "        #assign\n",
    "        test_df = create_test_df()\n",
    "        \n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter(test_df, \"\")\n",
    "    \n",
    "    def test_groupby_sum_with_none_column_name(self):\n",
    "        #assign\n",
    "        test_df = create_test_df()\n",
    "        \n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter(test_df, None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use <code>pandas.DataFrame.groupby()</code> and <code>aggregation function sum()</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that due to using `groupby` all DataFrames are now indexed by Country/Region. We can thus access the data for a specific country by using `.loc`:|"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_infected_vs_recovered(column_name):\n",
    "    infected.loc[column_name][2:].plot()\n",
    "    recovered.loc[column_name][2:].plot()\n",
    "    plt.show()\n",
    "\n",
    "plot_infected_vs_recovered(\"US\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
 **Note** how we use">
    "> **Note** how we use `[2:]` to skip the first two elements of the sequence, which contain the geolocation of a country. We can also drop those two columns altogether:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def drop_columns(df, columns):\n",
    "    \"\"\"\n",
    "    Drops the specified columns from a Pandas DataFrame.\n",
    "    \n",
    "    Returns:\n",
    "        df after dropping\n",
    "    \"\"\"\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    if columns is None or not isinstance(columns, list) or len(columns) == 0:\n",
    "        raise Exception(\"columns is not a valid list\")\n",
    "    if not set(columns).issubset(set(df.columns)):\n",
    "        raise Exception(\"columns contains invalid column names\")\n",
    "    return df._________\n",
    "\n",
    "# Dropping the \"Lat\" and \"Long\" columns from infected, recovered, deaths DataFrame.\n",
    "______\n",
    "______\n",
    "______"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    return pd.DataFrame(\n",
    "        {\n",
    "            \"c1\": [1, 2, 3, 4, 5],\n",
    "            \"c2\": [6, 7, 8, 9, 10],\n",
    "            \"c3\": [11, 12, 13, 14, 15],\n",
    "            \"c4\": [16, 17, 18, 19, 20],\n",
    "        }\n",
    "    )\n",
    "\n",
    "class TestDropColumns(unittest.TestCase):\n",
    "    def test_drop_columns_with_empty_df(self):\n",
    "        # act\n",
    "        with pytest.raises(Exception):\n",
    "            drop_columns(pd.DataFrame(), \"c1\")\n",
    "\n",
    "    def test_drop_columns_happy_case(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "        expected_result = pd.DataFrame(\n",
    "            {\n",
    "                \"c3\": [11, 12, 13, 14, 15],\n",
    "                \"c4\": [16, 17, 18, 19, 20],\n",
    "            }\n",
    "        )\n",
    "        # act\n",
    "        drop_columns(test_df, [\"c1\", \"c2\"])\n",
    "\n",
    "        # assert\n",
    "        assert_frame_equal(test_df, expected_result)\n",
    "\n",
    "    def test_drop_columns_with_none_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            drop_columns(None, \"c1\")\n",
    "\n",
    "    def test_drop_columns_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            drop_columns(123, \"c1\")\n",
    "    \n",
    "    def test_drop_columns_with_none_columns(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            drop_columns(test_df, None)\n",
    "    \n",
    "    def test_drop_columns_with_empty_columns(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            drop_columns(test_df, [])\n",
    "    \n",
    "    def test_drop_columns_with_invalid_columns_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            drop_columns(test_df, 123)\n",
    "    \n",
    "    def test_drop_columns_with_invalid_columns_name(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            drop_columns(test_df, [\"c1\", \"c100\"])\n",
    "\n",
    "    def test_drop_columns_with_invalid_columns_input(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            drop_columns(test_df, \"c1000\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use <code>pandas.DataFrame.drop(columns=coulumns, inplace=True)</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Investigating the data\n",
    "\n",
    "Let's now switch to investigating a specific country. Let's create a frame that contains the data on infections indexed by date:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def mkframe(infected_df, recovered_df, deaths_df, index_name):\n",
    "    \"\"\"\n",
    "    This function creates a new DataFrame by merging three input DataFrames and \n",
    "    converting the index to datetime format.\n",
    "\n",
    "    Returns:\n",
    "        pandas.DataFrame: A new DataFrame containing columns for infected, recovered, and deaths, \n",
    "        with the index converted to datetime format.\n",
    "    \"\"\"\n",
    "    if infected_df is None or not isinstance(infected_df, pd.DataFrame) or infected_df.empty:\n",
    "        raise Exception(\"invalid infected_df\")\n",
    "    if recovered_df is None or not isinstance(recovered_df, pd.DataFrame) or recovered_df.empty:\n",
    "        raise Exception(\"invalid recovered_df\")\n",
    "    if deaths_df is None or not isinstance(deaths_df, pd.DataFrame) or deaths_df.empty:\n",
    "        raise Exception(\"invalid deaths_df\")\n",
    "    if not isinstance(index_name, str) or index_name is None or not index_name.strip():\n",
    "        raise Exception(\"column_name is not a valid string\")    \n",
    "    if index_name not in infected_df.index:\n",
    "        raise Exception(f\"{index_name} does not exist in {infected_df}\")\n",
    "    if index_name not in recovered_df.index:\n",
    "        raise Exception(f\"{index_name} does not exist in {recovered_df}\")\n",
    "    if index_name not in deaths_df.index:\n",
    "        raise Exception(f\"{index_name} does not exist in {deaths_df}\")\n",
    "    df = pd.DataFrame(\n",
    "        {\n",
    "            # Select the row with index_name from three DataFrames\n",
    "            \"infected\": infected_df.______,\n",
    "            \"recovered\": recovered_df.______,\n",
    "            \"deaths\": deaths_df.______,\n",
    "        }\n",
    "    )\n",
    "    df.index = pd.to_datetime(df.index)\n",
    "    return df\n",
    "\n",
    "# Merge the three DataFrame infected, recovered, and deaths into a new DataFrame\n",
    "# and use the \"US\" column as the index of the new DataFrame.\n",
    "df = ______"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df_1():\n",
    "    return pd.DataFrame(\n",
    "        data=[[2, 5, 9], [3, 4, 10], [9, 9, 8]],\n",
    "        columns=[\"1/22/20\", \"1/23/20\", \"1/24/20\"],\n",
    "        index=[\"US\", \"UK\", \"FR\"],\n",
    "    )\n",
    "\n",
    "def create_test_df_2():\n",
    "    return pd.DataFrame(\n",
    "        data=[[9, 9, 8], [2, 5, 9], [3, 4, 10]],\n",
    "        columns=[\"1/22/20\", \"1/23/20\", \"1/24/20\"],\n",
    "        index=[\"US\", \"UK\", \"FR\"],\n",
    "    )\n",
    "\n",
    "def create_test_df_3():\n",
    "    return pd.DataFrame(\n",
    "        data=[[3, 4, 10], [9, 9, 8], [2, 5, 9]],\n",
    "        columns=[\"1/22/20\", \"1/23/20\", \"1/24/20\"],\n",
    "        index=[\"US\", \"UK\", \"FR\"],\n",
    "    )\n",
    "\n",
    "class TestMkframe(unittest.TestCase):\n",
    "    def test_mkframe_happy_case(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_2 = create_test_df_2()\n",
    "        test_df_3 = create_test_df_3()\n",
    "        expected_result = pd.DataFrame(\n",
    "            data=[[2, 9, 3], [5, 9, 4], [9, 8, 10]],\n",
    "            columns=[\"infected\", \"recovered\", \"deaths\"],\n",
    "            index=[\"2020-01-22\", \"2020-01-23\", \"2020-01-24\"],\n",
    "        )\n",
    "        expected_result.index = pd.to_datetime(expected_result.index)\n",
    "\n",
    "        # act\n",
    "        test_df = mkframe(test_df_1, test_df_2, test_df_3, \"US\")\n",
    "\n",
    "        # assert\n",
    "        assert_frame_equal(test_df, expected_result)\n",
    "\n",
    "    def test_mkframe_with_none_df_1(self):\n",
    "        # assign\n",
    "        test_df_2 = create_test_df_2()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(None, test_df_2, test_df_3, \"US\")\n",
    "\n",
    "    def test_mkframe_with_none_df_2(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, None, test_df_3, \"US\")\n",
    "\n",
    "    def test_mkframe_with_none_df_3(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_2 = create_test_df_2()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, test_df_2, None, \"US\")\n",
    "    \n",
    "    def test_mkframe_with_empty_df_1(self):\n",
    "        # assign\n",
    "        test_df_2 = create_test_df_2()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(pd.DataFrame(), test_df_2, test_df_3, \"US\")\n",
    "    \n",
    "    def test_mkframe_with_empty_df_2(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, pd.DataFrame(), test_df_3, \"US\")\n",
    "    \n",
    "    def test_mkframe_with_empty_df_3(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_2 = create_test_df_2()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, test_df_2, pd.DataFrame(), \"US\")\n",
    "    \n",
    "    def test_mkframe_with_invalid_df_1_type(self):\n",
    "        # assign\n",
    "        test_df_2 = create_test_df_2()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(123, test_df_2, test_df_3, \"US\")\n",
    "    \n",
    "    def test_mkframe_with_invalid_df_2_type(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, 123, test_df_3, \"US\")\n",
    "    \n",
    "    def test_mkframe_with_invalid_df_1_type(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_2 = create_test_df_2()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, test_df_2, 123, \"US\")\n",
    "\n",
    "    def test_mkframe_with_invalid_column_name(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_2 = create_test_df_2()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, test_df_2, test_df_3, \"China\")\n",
    "    \n",
    "    def test_mkframe_with_empty_column_name(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_2 = create_test_df_2()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, test_df_2, test_df_3, \"\")\n",
    "    \n",
    "    def test_mkframe_with_none_column_name(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_2 = create_test_df_2()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, test_df_2, test_df_3, None)\n",
    "\n",
    "    def test_mkframe_with_invalid_column_type(self):\n",
    "        # assign\n",
    "        test_df_1 = create_test_df_1()\n",
    "        test_df_2 = create_test_df_2()\n",
    "        test_df_3 = create_test_df_3()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            mkframe(test_df_1, test_df_2, test_df_3, 123)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use <code>pandas.DataFrame.loc[]</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.plot()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's compute the number of new infected people each day. This will allow us to see the speed at which pandemic progresses. The easiest day to do it is to use `diff`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def append_diff_column(df, new_column, column_to_diff):\n",
    "    \"\"\"\n",
    "    Append a new column to a dataframe, where the values in the new column are calculated as the difference\n",
    "    between consecutive values in an existing column.\n",
    "\n",
    "    Returns:\n",
    "        pandas.Series: The newly created column containing the differences between consecutive values\n",
    "        in the original column.\n",
    "    \"\"\"\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    if column_to_diff not in df.columns:\n",
    "        raise Exception(\"column_name_to_diff not exist in df\")\n",
    "    if new_column is None or not isinstance(new_column, str) or not new_column.strip():\n",
    "        raise Exception(\"new_column is not a valid string\")\n",
    "    if column_to_diff is None or not isinstance(column_to_diff, str) or not column_to_diff.strip():\n",
    "        raise Exception(\"column_to_diff is not a valid string\")\n",
    "    # The values in the new_column are calculated as the difference between consecutive values in column_to_diff\n",
    "    df[new_column] = df[______].______\n",
    "    return df[new_column]\n",
    "\n",
    "# Add a new column \"ninfected\" diffed by \"infected\" column to the DataFrame \"df\", and display the plot\n",
    "______.plot()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    return pd.DataFrame(\n",
    "        {\n",
    "            \"date\": [\n",
    "                \"2022-01-01\",\n",
    "                \"2022-01-02\",\n",
    "                \"2022-01-03\",\n",
    "                \"2022-01-04\",\n",
    "                \"2022-01-05\",\n",
    "                \"2022-01-06\",\n",
    "            ],\n",
    "            \"column1\": [1, 2, 4, 6, 9, 13],\n",
    "            \"column2\": [1, 3, 6, 10, 15, 21],\n",
    "        }\n",
    "    )\n",
    "\n",
    "\n",
    "class TestAppendDiffColumn(unittest.TestCase):\n",
    "    def test_append_diff_column_happy_case(self):\n",
    "        # assign\n",
    "        df = create_test_df()\n",
    "        expected_result = pd.Series(\n",
    "            [np.nan, 1.0, 2.0, 2.0, 3.0, 4.0], name=\"new_column\"\n",
    "        )\n",
    "\n",
    "        # act\n",
    "        actual_result = append_diff_column(\n",
    "            df, \"new_column\", \"column1\"\n",
    "        )\n",
    "\n",
    "        # assert\n",
    "        assert_series_equal(actual_result, expected_result)\n",
    "\n",
    "    def test_append_diff_column_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                pd.DataFrame(), \"new_column\", \"column_to_diff\"\n",
    "            )\n",
    "\n",
    "    def test_append_diff_column_with_none_df(self):\n",
    "        # act\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                None, \"new_column\", \"column1\"\n",
    "            )\n",
    "\n",
    "    def test_append_diff_column_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                \"invalid_df\", \"new_column\", \"column_to_diff\"\n",
    "            )\n",
    "    \n",
    "    def test_append_diff_column_with_invalid_new_column_type(self):\n",
    "        # assign\n",
    "        df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                df, 123, \"column_to_diff\"\n",
    "            )   \n",
    "\n",
    "    def test_append_diff_column_with_none_new_column(self):\n",
    "        # assign\n",
    "        df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                df, None, \"column_to_diff\"\n",
    "            ) \n",
    "    \n",
    "    def test_append_diff_column_with_empty_new_column(self):\n",
    "        # assign\n",
    "        df = create_test_df()\n",
    "        \n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                df, \"\", \"column_to_diff\"\n",
    "            ) \n",
    "\n",
    "    def test_append_diff_column_with_none_column_to_diff(\n",
    "        self,\n",
    "    ):\n",
    "        # assign\n",
    "        df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                df, \"new_column\", None\n",
    "            )\n",
    "\n",
    "    def test_append_diff_column_with_empty_column_to_diff(\n",
    "        self,\n",
    "    ):\n",
    "        # assign\n",
    "        df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                df, \"new_column\", \"\"\n",
    "            )\n",
    "\n",
    "    def test_append_diff_column_with_invalid_column_to_diff_name(\n",
    "        self,\n",
    "    ):\n",
    "        # assign\n",
    "        df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                df, \"new_column\", \"invalid_column\"\n",
    "            )\n",
    "\n",
    "    def test_append_diff_column_with_invalid_column_to_diff_type(\n",
    "        self,\n",
    "    ):\n",
    "        # assign\n",
    "        df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            append_diff_column(\n",
    "                df, \"new_column\", 123\n",
    "            )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use <code>pandas.DataFrame.diff()</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see high fluctuations in data. Let's look closer at one of the months:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def filter_ninfected_by_year_and_month(df, year, month):\n",
    "    \"\"\"\n",
    "    Filter a DataFrame by year and month, and return a column.\n",
    "\n",
    "    Returns:\n",
    "        pandas.Series: A Series object containing the filtered \"ninfected\" column.\n",
    "    \"\"\"\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    if year is None or not isinstance(year, int) or year < 0:\n",
    "        raise Exception(\"invalid year\")\n",
    "    if month is None or not isinstance(month, int) or month > 13 or month < 0:\n",
    "        raise Exception(\"invalid month\")\n",
    "    return df[______ & ______][\"ninfected\"]\n",
    "\n",
    "filter_ninfected_by_year_and_month(df, 2020, 7).plot()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    test_df = pd.DataFrame(\n",
    "        data=[[2, 9, 3, None], [5, 9, 4, 3], [9, 8, 10, 4]],\n",
    "        columns=[\"infected\", \"recovered\", \"deaths\", \"ninfected\"],\n",
    "        index=[\"2020-01-22\", \"2020-01-23\", \"2020-01-24\"],\n",
    "    )\n",
    "    test_df.index = pd.to_datetime(test_df.index)\n",
    "    return test_df\n",
    "\n",
    "\n",
    "class TestFilterNinfectedByYearAndMonth(unittest.TestCase):\n",
    "    def test_filter_ninfected_by_year_and_month_happy_case(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "        expected_result = pd.Series(\n",
    "            [None, 3, 4],\n",
    "            index=pd.to_datetime([\"2020-01-22\", \"2020-01-23\", \"2020-01-24\"]),\n",
    "            name=\"ninfected\",\n",
    "        )\n",
    "\n",
    "        # act\n",
    "        result = filter_ninfected_by_year_and_month(test_df, 2020, 1)\n",
    "\n",
    "        # assert\n",
    "        assert result.equals(expected_result)\n",
    "\n",
    "    def test_filter_ninfected_by_year_and_month_with_none_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter_ninfected_by_year_and_month(None, 2020, 1)\n",
    "\n",
    "    def test_filter_ninfected_by_year_and_month_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter_ninfected_by_year_and_month(pd.DataFrame, 2020, 1)\n",
    "    \n",
    "    def test_filter_ninfected_by_year_and_month_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter_ninfected_by_year_and_month(123, 2020, 1)\n",
    "\n",
    "    def test_filter_ninfected_by_year_and_month_with_none_year(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter_ninfected_by_year_and_month(test_df, None, 1)\n",
    "    \n",
    "    def test_filter_ninfected_by_year_and_month_with_invalid_year_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter_ninfected_by_year_and_month(test_df, \"invalid_year_type\", 1)\n",
    "\n",
    "    def test_filter_ninfected_by_year_and_month_with_invalid_year_number(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter_ninfected_by_year_and_month(test_df, -10000, 1)\n",
    "\n",
    "    def test_filter_ninfected_by_year_and_month_with_none_month(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter_ninfected_by_year_and_month(test_df, 2020, None)\n",
    "    \n",
    "    def test_filter_ninfected_by_year_and_month_with_invalid_month_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter_ninfected_by_year_and_month(test_df, 2020, \"invalid_month_type\")\n",
    "\n",
    "    def test_filter_ninfected_by_year_and_month_with_invalid_year_number(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            filter_ninfected_by_year_and_month(test_df, 2020, 10000)\n",
    "\n",
    "    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use <code>pandas.DataFrame.index</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It clearly looks like there are weekly fluctuations in data. Because we want to be able to see the trends, it makes sense to smooth out the curve by computing running average (i.e. for each day we will compute the average value of the previous several days):"
   ]
  },
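  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the idea concrete, here is a minimal sketch of a running average on a small made-up series (the name `toy` and its values are assumptions for illustration, not part of the dataset). With a window of 3, each value is the mean of the current entry and the two before it, and the first two positions stay `NaN` because a full window is not yet available:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, used only to illustrate the smoothing step\n",
    "toy = pd.Series([1, 2, 3, 4, 5])\n",
    "\n",
    "# Mean over a trailing window of 3 values\n",
    "toy_smoothed = toy.rolling(3).mean()\n",
    "print(toy_smoothed.tolist())  # [nan, nan, 2.0, 3.0, 4.0]\n",
    "```\n",
    "\n",
    "A 7-day window is a natural choice for daily case counts, since it averages over one full weekly cycle."
   ]
  },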
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_rolling_window(df, column, window):\n",
    "    \"\"\"\n",
    "    Returns a rolling window object of the specified column with the specified window size.\n",
    "    \n",
    "    Returns:\n",
    "        A rolling window object of the specified column with the specified window size.\n",
    "    \"\"\"\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    if column not in df.columns:\n",
    "        raise Exception(\"invalid column\")\n",
    "    if window is None or not isinstance(window, int) or window <= 0 or window >= len(df.index):\n",
    "        raise Exception(\"invalid window\")\n",
    "    # Calculate the moving average\n",
    "    return ______\n",
    "\n",
    "# Calculate the rolling window with a window size of 7 on the 'ninfected' column, \n",
    "# then calculate the mean\n",
    "df[\"ninfav\"] = ______\n",
    "df[\"ninfav\"].plot()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "class TestGetRollingWindow(unittest.TestCase):\n",
    "    def test_get_rolling_window_happy_case(self):\n",
    "        # assign\n",
    "        test_df = pd.DataFrame({\n",
    "            'a': [1, 2, 3, 4, 5],\n",
    "            'b': [5, 4, 3, 2, 1]\n",
    "        })\n",
    "\n",
    "        # act\n",
    "        result = get_rolling_window(test_df, 'a', 3)\n",
    "\n",
    "        # assert\n",
    "        assert isinstance(result, pd.core.window.Rolling)\n",
    "\n",
    "    def test_get_rolling_window_with_none_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(None, 'a', 3)\n",
    "\n",
    "    def test_get_rolling_window_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(pd.DataFrame(), 'a', 3)\n",
    "\n",
    "    def test_get_rolling_window_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(123, 'a', 3)\n",
    "\n",
    "    def test_get_rolling_window_with_none_column(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, None, 3)\n",
    "\n",
    "    def test_get_rolling_window_with_invalid_column_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, 123, 3)\n",
    "\n",
    "    def test_get_rolling_window_with_invalid_column_name(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, 'c', 3)\n",
    "    \n",
    "    def test_get_rolling_window_with_empty_column(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, \"\", 3)\n",
    "\n",
    "    def test_get_rolling_window_with_none_window(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, 'a', None)\n",
    "\n",
    "    def test_get_rolling_window_with_invalid_window_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, \"infected\", \"invalid_window_type\")\n",
    "\n",
    "    def test_get_rolling_window_with_negative_window(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, \"infected\", -10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to select a column and use <code>pandas.DataFrame.rolling(window)</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to be able to compare several countries, we might want to take the country's population into account, and compare the percentage of infected individuals with respect to country's population. In order to get country's population, let's load the dataset of countries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "countries = pd.read_csv(countries_dataset_url)\n",
    "countries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because this dataset contains information on both countries and provinces, to get the population of the whole country we need to be a little bit clever: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def filter_by_country_region(df, countries_and_region):\n",
    "    \"\"\"\n",
    "    Filter the DataFrame by the given countries_and_region name and return rows with NaN Province_State.\n",
    "\n",
    "    Returns:\n",
    "        pandas DataFrame: the filtered DataFrame\n",
    "    \"\"\"\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    if countries_and_region not in df[\"Country_Region\"].unique():\n",
    "        raise Exception(\"countries_and_region name is wrong.\")\n",
    "    # Missing values are checked and processed quickly.\n",
    "    return df[\n",
    "        (df[\"Country_Region\"] == ______) & df[\"Province_State\"].______\n",
    "    ]\n",
    "\n",
    "filter_by_country_region(countries, \"US\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    return pd.DataFrame(\n",
    "        {\n",
    "            \"Country_Region\": [\"US\", \"US\", \"UK\", \"FR\", \"JP\"],\n",
    "            \"Province_State\": [None, \"California\", None, None, \"Tokyo\"],\n",
    "            \"Confirmed\": [100, 50, 70, 80, 90],\n",
    "            \"Deaths\": [10, 5, 7, 8, 9],\n",
    "            \"Recovered\": [20, 10, 14, 16, 18],\n",
    "        }\n",
    "    )\n",
    "\n",
    "\n",
    "class TestFilterByCountryRegion(unittest.TestCase):\n",
    "    def test_filter_by_country_region_happy_case(self):\n",
    "        # assign\n",
    "\n",
    "        test_df = create_test_df()\n",
    "        expected_result = pd.DataFrame(\n",
    "            {\n",
    "                \"Country_Region\": [\"US\"],\n",
    "                \"Province_State\": [None],\n",
    "                \"Confirmed\": [100],\n",
    "                \"Deaths\": [10],\n",
    "                \"Recovered\": [20],\n",
    "            }\n",
    "        )\n",
    "\n",
    "        # act\n",
    "        actual_result = filter_by_country_region(test_df, \"US\")\n",
    "\n",
    "        # assert\n",
    "        assert_frame_equal(expected_result, actual_result)\n",
    "\n",
    "    def test_filter_by_country_region_without_None_Province_State(self):\n",
    "        # arrange\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act\n",
    "        result = filter_by_country_region(test_df, \"JP\")\n",
    "\n",
    "        # assert\n",
    "        assert result.empty\n",
    "\n",
    "    def test_filter_by_country_region_with_wrong_country_region_name(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with self.assertRaises(Exception):\n",
    "            filter_by_country_region(test_df, \"Wrong_name\")\n",
    "\n",
    "    def test_filter_by_country_region_with_none_df(self):\n",
    "        # act & assert\n",
    "        with self.assertRaises(Exception):\n",
    "            filter_by_country_region(None, \"US\")\n",
    "\n",
    "    def test_filter_by_country_region_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with self.assertRaises(Exception):\n",
    "            filter_by_country_region(pd.DataFrame(), \"US\")\n",
    "\n",
    "    def test_filter_by_country_region_with_none_country_region_name(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with self.assertRaises(Exception):\n",
    "            filter_by_country_region(test_df, None)\n",
    "\n",
    "    def test_filter_by_country_region_with_empty_country_region_name(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with self.assertRaises(Exception):\n",
    "            filter_by_country_region(test_df, \"\")\n",
    "\n",
    "    def test_filter_by_country_region_with_invalid_country_region_name_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with self.assertRaises(Exception):\n",
    "            filter_by_country_region(test_df, 123)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to select a certain column and use <code>pandas.DataFrame.isna()</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_pinfected(df):\n",
    "    \"\"\"\n",
    "    Computes the percentage of infected people in a given DataFrame `df`.\n",
    "\n",
    "    Returns:\n",
    "        pandas.Series: A new Series containing the percentage of infected people in the input DataFrame.\n",
    "    \"\"\"\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    pop = ______(countries, \"US\")[\"Population\"].______\n",
    "    return df[\"infected\"] * 100 / pop\n",
    "\n",
    "df[\"pinfected\"] = get_pinfected(df)\n",
    "df[\"pinfected\"].plot(figsize=(10, 3))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    return pd.DataFrame(\n",
    "        {\n",
    "            \"Country_Region\": [\"US\", \"US\", \"Canada\", \"Canada\"],\n",
    "            \"Province_State\": [\"California\", \"New York\", \"Ontario\", \"Quebec\"],\n",
    "            \"Population\": [10000, 20000, 30000, 40000],\n",
    "            \"infected\": [1000, 2000, 3000, 4000],\n",
    "        }\n",
    "    )\n",
    "\n",
    "\n",
    "class TestGetPinfected(unittest.TestCase):\n",
    "    def test_get_pinfected_happy_case(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "        expected_result = pd.Series(\n",
    "            [\n",
    "                0.00030352119521741776,\n",
    "                0.0006070423904348355,\n",
    "                0.0009105635856522532,\n",
    "                0.001214084780869671,\n",
    "            ],\n",
    "            name=\"infected\",\n",
    "        )\n",
    "\n",
    "        # act\n",
    "        actual_result = get_pinfected(test_df)\n",
    "\n",
    "        # assert\n",
    "        assert_series_equal(expected_result, actual_result, rtol=1e-3)\n",
    "\n",
    "    def test_get_pinfected_with_none_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_pinfected(None)\n",
    "\n",
    "    def test_get_pinfected_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_pinfected(pd.DataFrame())\n",
    "    \n",
    "    def test_get_pinfected_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_pinfected(123)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use the function defined before <code>filter_by_country_region()</code> and use<code>pandas.DataFrame.iloc[]</code> to select the first series number.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## Computing $R_t$\n",
    "\n",
    "To see how infectious is the disease, we look at the **basic reproduction number** $R_0$, which indicated the number of people that an infected person would further infect. When $R_0$ is more than 1, the epidemic is likely to spread.\n",
    "\n",
    "$R_0$ is a property of the disease itself, and does not take into account some protective measures that people may take to slow down the pandemic. During the pandemic progression, we can estimate the reproduction number $R_t$ at any given time $t$. It has been shown that this number can be roughly estimated by taking a window of 8 days, and computing $$R_t=\\frac{I_{t-7}+I_{t-6}+I_{t-5}+I_{t-4}}{I_{t-3}+I_{t-2}+I_{t-1}+I_t}$$\n",
    "where $I_t$ is the number of newly infected individuals on day $t$.\n",
    "\n",
    "Let's compute $R_t$ for our pandemic data. To do this, we will take a rolling window of 8 `ninfected` values, and apply the function to compute the ratio above:"
   ]
  },
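  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small worked example of the ratio, take one made-up 8-day window of daily new cases (the array `window_values` is hypothetical). The sketch below divides the sum of the last four days by the sum of the first four days, which is the ordering used by the exercise code and its test in this notebook:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Hypothetical daily new cases for days t-7 ... t\n",
    "window_values = np.array([10, 15, 20, 30, 35, 40, 45, 50])\n",
    "\n",
    "older = window_values[:4].sum()   # days t-7 ... t-4 -> 75\n",
    "recent = window_values[4:].sum()  # days t-3 ... t   -> 170\n",
    "\n",
    "rt_estimate = recent / older\n",
    "print(round(rt_estimate, 4))  # 2.2667\n",
    "```\n",
    "\n",
    "A value above 1 means that the most recent four days saw more new cases than the four days before them."
   ]
  },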
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_rt(df, column_name, window):\n",
    "    \"\"\"\n",
    "    Calculate the Rt value of a given column in a DataFrame, using a rolling window.\n",
    "\n",
    "    Returns:\n",
    "        pandas.Series: A series containing the calculated Rt values.\n",
    "    \"\"\" \n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    if column_name not in df.columns:\n",
    "        raise Exception(\"invalid column\")\n",
    "    if window is None or not isinstance(window, int) or window <= 0 or window >= len(df.index):\n",
    "        raise Exception(\"invalid window\")\n",
    "    # Calculate Rt using a rolling window and a lambda function to sum the values\n",
    "    # from the fourth day of the window onwards, and divide by the sum of the values\n",
    "    # up to the third day of the window.\n",
    "    df[\"Rt\"] = get_rolling_window(df, column_name, window).apply(\n",
    "        ______ x: x[4:].______ / x[:4].______\n",
    "    )\n",
    "    return df[\"Rt\"]\n",
    "\n",
    "get_rt(df, \"ninfected\", 8)\n",
    "df[\"Rt\"].plot()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    return pd.DataFrame(\n",
    "        {\n",
    "            \"date\": pd.date_range(\"2022-01-01\", periods=18),\n",
    "            \"infected\": [\n",
    "                10,\n",
    "                15,\n",
    "                20,\n",
    "                30,\n",
    "                35,\n",
    "                40,\n",
    "                45,\n",
    "                50,\n",
    "                55,\n",
    "                60,\n",
    "                70,\n",
    "                80,\n",
    "                90,\n",
    "                100,\n",
    "                110,\n",
    "                120,\n",
    "                130,\n",
    "                140,\n",
    "            ],\n",
    "        }\n",
    "    )\n",
    "\n",
    "\n",
    "class TestGetRt(unittest.TestCase):\n",
    "    def test_get_rt_happy_case(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "        expected_output = pd.Series(\n",
    "            [\n",
    "                None,\n",
    "                None,\n",
    "                None,\n",
    "                None,\n",
    "                None,\n",
    "                None,\n",
    "                None,\n",
    "                2.2666666666666666,\n",
    "                1.9,\n",
    "                1.68,\n",
    "                1.5666666666666667,\n",
    "                1.5588235294117647,\n",
    "                1.5789473684210527,\n",
    "                1.619047619047619,\n",
    "                1.6170212765957446,\n",
    "                1.5849056603773586,\n",
    "                1.5333333333333334,\n",
    "                1.4705882352941178,\n",
    "            ],\n",
    "            dtype=np.float64,\n",
    "        )\n",
    "\n",
    "        # act\n",
    "        result = get_rt(test_df, \"infected\", 8)\n",
    "\n",
    "        # assert\n",
    "        assert_series_equal(\n",
    "            result, expected_output, rtol=0.001, check_dtype=False, check_names=False\n",
    "        )\n",
    "\n",
    "    def test_get_rt_with_none_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(None, 'a', 3)\n",
    "\n",
    "    def test_get_rt_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(pd.DataFrame(), 'a', 3)\n",
    "\n",
    "    def test_get_rt_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(123, 'a', 3)\n",
    "\n",
    "    def test_get_rt_with_none_column(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, None, 3)\n",
    "\n",
    "    def test_get_rt_with_invalid_column_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, 123, 3)\n",
    "\n",
    "    def test_get_rt_with_invalid_column_name(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, 'c', 3)\n",
    "    \n",
    "    def test_get_rt_with_empty_column(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, \"\", 3)\n",
    "\n",
    "    def test_get_rt_with_none_window(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, 'a', None)\n",
    "\n",
    "    def test_get_rt_with_invalid_window_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, \"infected\", \"invalid_window_type\")\n",
    "\n",
    "    def test_get_rt_with_negative_window(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_rolling_window(test_df, \"infected\", -10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use <code>lambda</code> and <code>sum()</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that there are some gaps in the graph. Those can be caused by either `NaN`, if  `inf` values being present in the dataset. `inf` may be caused by division by 0, and `NaN` can indicate missing data, or no data available to compute the result (like in the very beginning of our frame, where rolling window of width 8 is not yet available). To make the graph nicer, we need to fill those values using `replace` and `fillna` function.\n",
    "\n",
    "Let's further look at the beginning of the pandemic. We will also limit the y-axis values to show only values below 6, in order to see better, and draw horizontal line at 1."
   ]
  },
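  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cleaning step can be sketched on a small made-up series (the name `toy_rt` and its values are assumptions for illustration). Infinite values are first turned into `NaN`, and each `NaN` is then replaced by the last valid value before it. This sketch uses `ffill()`, which is equivalent to filling with the `pad` method; a leading `NaN` stays as it is because there is nothing earlier to copy from:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical Rt values with missing entries and a division-by-zero artifact\n",
    "toy_rt = pd.Series([np.nan, 1.5, np.inf, 1.2, np.nan])\n",
    "\n",
    "# inf -> NaN, then carry the last valid value forward\n",
    "toy_rt_clean = toy_rt.replace(np.inf, np.nan).ffill()\n",
    "print(toy_rt_clean.tolist())  # [nan, 1.5, 1.5, 1.2, 1.2]\n",
    "```"
   ]
  },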
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def rt_with_na_filled(df):\n",
    "    \"\"\"\n",
    "    Calculate Rt with NA filled.\n",
    "    \n",
    "    Returns:\n",
    "        A pandas Series object that contains Rt values with missing values (NaN) filled using the last non-missing value.\n",
    "    \"\"\"\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    # Filter out the data after May 1st 2020, replace infinite values with NaN, \n",
    "    # and fill the missing values using the last non-missing value.\n",
    "    return (\n",
    "        df[df.index < \"2020-05-01\"][\"Rt\"].______(np.inf, np.nan).______(method=\"pad\")\n",
    "    )\n",
    "\n",
    "ax = rt_with_na_filled(df).plot(figsize=(10, 3))\n",
    "ax.set_ylim([0, 6])\n",
    "ax.axhline(1, linestyle=\"--\", color=\"red\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    return pd.DataFrame(\n",
    "        {\"Rt\": [1.5, np.inf, 1.2, np.inf]},\n",
    "        index=pd.to_datetime([\"2020-04-29\", \"2020-04-30\", \"2020-05-01\", \"2020-05-02\"]),\n",
    "    )\n",
    "\n",
    "\n",
    "class TestRtWithNaFilled(unittest.TestCase):\n",
    "    def test_rt_with_na_filled_happy_case(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "        expected_result = pd.Series(\n",
    "            [1.5, 1.5], index=pd.to_datetime([\"2020-04-29\", \"2020-04-30\"]), name=\"Rt\"\n",
    "        )\n",
    "        # act\n",
    "        result = rt_with_na_filled(test_df)\n",
    "\n",
    "        # assert\n",
    "        pd.testing.assert_series_equal(result, expected_result)\n",
    "\n",
    "    def test_rt_with_na_filled_with_none_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            rt_with_na_filled(None)\n",
    "\n",
    "    def test_rt_with_na_filled_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            rt_with_na_filled(pd.DataFrame())\n",
    "    \n",
    "    def test_rt_with_na_filled_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            rt_with_na_filled(123)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use <code>pandas.DataFrame.replace()</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another interesting indicator of the pandemic is the **derivative**, or **daily difference** in new cases. It allows us to see clearly when pandemic is increasing or declining. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_df_column_diff(df, column_name):\n",
    "    if df is None or not isinstance(df, pd.DataFrame) or df.empty:\n",
    "        raise Exception(\"df is not a valid DataFrame\")\n",
    "    if column_name not in df.columns:\n",
    "        raise Exception(\"invalid column\")\n",
    "    # Calculate the difference between the current and the previous row's values for the given column\n",
    "    return df[column_name].______\n",
    "\n",
    "diff_series = get_df_column_diff(df, \"ninfected\")\n",
    "diff_series.plot()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "    test_df = pd.DataFrame(\n",
    "        {\n",
    "            \"date\": [\n",
    "                \"2022-01-01\",\n",
    "                \"2022-01-02\",\n",
    "                \"2022-01-03\",\n",
    "                \"2022-01-04\",\n",
    "                \"2022-01-05\",\n",
    "                \"2022-01-06\",\n",
    "            ],\n",
    "            \"ninfected\": [100, 110, 120, 130, 140, 150],\n",
    "        }\n",
    "    )\n",
    "    test_df[\"date\"] = pd.to_datetime(test_df[\"date\"])\n",
    "    test_df.set_index(\"date\", inplace=True)\n",
    "    return test_df\n",
    "\n",
    "\n",
    "class TestGetDfColumnDiff(unittest.TestCase):\n",
    "    def test_get_df_column_diff_happy_case(Self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "        expected_diff = pd.Series(\n",
    "            [None, 10, 10, 10, 10, 10],\n",
    "            index=pd.to_datetime(\n",
    "                [\n",
    "                    \"2022-01-01\",\n",
    "                    \"2022-01-02\",\n",
    "                    \"2022-01-03\",\n",
    "                    \"2022-01-04\",\n",
    "                    \"2022-01-05\",\n",
    "                    \"2022-01-06\",\n",
    "                ]\n",
    "            ),\n",
    "        )\n",
    "\n",
    "        # act\n",
    "        column_diff = get_df_column_diff(test_df, \"ninfected\")\n",
    "\n",
    "        # assert\n",
    "        assert_series_equal(\n",
    "            column_diff, expected_diff, check_dtype=False, check_names=False\n",
    "        )\n",
    "\n",
    "    def test_get_df_column_diff_with_none_df(Self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_df_column_diff(None, \"ninfected\")\n",
    "\n",
    "    def test_get_df_column_diff_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_df_column_diff(pd.DataFrame(), \"ninfected\")\n",
    "    \n",
    "    def test_get_df_column_diff_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_df_column_diff(123, \"ninfected\")\n",
    "\n",
    "    def test_get_df_column_diff_with_invalid_column_name(Self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_df_column_diff(test_df, \"invalid_column_name\")\n",
    "    \n",
    "    def test_get_df_column_diff_with_none_column_name(Self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_df_column_diff(test_df, None)\n",
    "\n",
    "    def test_get_df_column_diff_with_none_column_type(Self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_df_column_diff(test_df, 123)\n",
    "    \n",
    "    def test_get_df_column_diff_with_invalid_column_name(Self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_df_column_diff(test_df, \"invalid_column_name\")\n",
    "    \n",
    "    def test_get_df_column_diff_with_empty_column(Self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_df_column_diff(test_df, \"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use <code>pandas.DataFrame.diff()</code> \n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given the fact that there are a lot of fluctuations in data caused by reporting, it makes sense to smooth the curve by running rolling average to get the overall picture. Let's again focus on the first months of the pandemic:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_smoothed_ax(df, column_name, datetime, window):\n",
    "    \"\"\"\n",
    "    Returns a rolling mean of the diff of a column in a DataFrame up to a specific datetime.\n",
    "   \n",
    "    Returns:\n",
    "        pandas Series with the smoothed values\n",
    "    \"\"\"\n",
    "    if df is None:\n",
    "        raise Exception(\"df cannot be None\")\n",
    "    if df.empty:\n",
    "        raise Exception(\"df cannot be empty\")\n",
    "    if column_name not in df.columns:\n",
    "        raise Exception(\"column not exist\")\n",
    "    # Filter the DataFrame to only include rows up to the datetime\n",
    "    df_filtered = df[______]\n",
    "    df_diff = df_filtered[column_name].diff()\n",
    "    # Calculate the rolling mean of the diff\n",
    "    df_rolling_mean = df_diff.rolling(window).______\n",
    "    return df_rolling_mean\n",
    "\n",
    "df_rolling_mean = get_smoothed_ax(df, \"ninfected\", \"2020-06-01\", 7)\n",
    "ax = df_rolling_mean.plot()\n",
    "ax.axhline(0, linestyle=\"-.\", color=\"red\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5><font color=blue>Check result by executing below... 📝</font></h5>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "jupyter": {
     "source_hidden": true
    },
    "tags": [
     "hide-input"
    ]
   },
   "outputs": [],
   "source": [
    "%%ipytest -qq\n",
    "\n",
    "def create_test_df():\n",
    "        test_df = pd.DataFrame(\n",
    "            data=[[2, 9, 3, None], [5, 9, 4, 3], [9, 8, 10, 4]],\n",
    "            columns=[\"infected\", \"recovered\", \"deaths\", \"ninfected\"],\n",
    "            index=[\"2020-01-22\", \"2020-01-23\", \"2020-01-24\"],\n",
    "        )\n",
    "        test_df.index = pd.to_datetime(test_df.index)\n",
    "        return test_df\n",
    "\n",
    "class TestGetSmoothedAx(unittest.TestCase):\n",
    "    def test_get_smoothed_ax_happy_case(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "        expected_result = pd.Series([None, None, 1], index=test_df.index[0:], name=\"ninfected\")\n",
    "\n",
    "        # act\n",
    "        result = get_smoothed_ax(test_df, \"ninfected\", \"2020-01-25\", 1)\n",
    "\n",
    "        # assert\n",
    "        assert result.equals(expected_result)\n",
    "\n",
    "    def test_get_smoothed_ax_with_none_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(None, \"ninfected\", \"2020-01-24\", 2)\n",
    "\n",
    "    def test_get_smoothed_ax_with_empty_df(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(pd.DataFrame, \"ninfected\", \"2020-01-24\", 2)\n",
    "    \n",
    "    def test_get_smoothed_ax_with_invalid_df_type(self):\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(123, \"ninfected\", \"2020-01-24\", 2)\n",
    "\n",
    "    def test_get_smoothed_ax_with_none_column_name(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(test_df, None, \"2020-01-24\", 2)\n",
    "\n",
    "    def test_get_smoothed_ax_with_invalid_column_name_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(test_df, 123, \"2020-01-24\", 2)\n",
    "    \n",
    "    def test_get_smoothed_ax_with_nonexistent_column(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(test_df, \"nonexistent_column\", \"2020-01-24\", 2)\n",
    "\n",
    "    def test_get_smoothed_ax_with_empty_column_name(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(test_df, \"\", \"2020-01-24\", 2)\n",
    "\n",
    "    def test_get_smoothed_ax_with_invalid_window_type(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(test_df, \"ninfected\", \"2020-01-24\", \"invalid_window_type\")\n",
    "    \n",
    "    def test_get_smoothed_ax_with_none_window(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(test_df, \"ninfected\", \"2020-01-24\", None)\n",
    "    \n",
    "    def test_get_smoothed_ax_with_invalid_window_number(self):\n",
    "        # assign\n",
    "        test_df = create_test_df()\n",
    "\n",
    "        # act & assert\n",
    "        with pytest.raises(Exception):\n",
    "            get_smoothed_ax(test_df, \"ninfected\", \"2020-01-24\", -1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "    \n",
    "<details><summary>👩‍💻 <b>Hint</b></summary>\n",
    "\n",
    "You can consider to use <code>pandas.DataFrame.index</code> and <code>mean()</code>.\n",
    "\n",
    "</details>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## Challenge\n",
    "\n",
    "Now it is time for you to play more with the code and data! Here are a few suggestions you can experiment with:\n",
    "* See the spread of the pandemic in different countries.\n",
    "* Plot $R_t$ graphs for several countries on one plot for comparison, or make several plots side-by-side\n",
    "* See how the number of deaths and recoveries correlate with number of infected cases.\n",
    "* Try to find out how long a typical disease lasts by visually correlating infection rate and deaths rate and looking for some anomalies. You may need to look at different countries to find that out.\n",
    "* Calculate the fatality rate and how it changes over time. You may want to take into account the length of the disease in days to shift one time series before doing calculations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References\n",
    "\n",
    "You may look at further studies of COVID epidemic spread in the following publications:\n",
    "* [Sliding SIR Model for Rt Estimation during COVID Pandemic](https://soshnikov.com/science/sliding-sir-model-for-rt-estimation/), blog post by [Dmitry Soshnikov](http://soshnikov.com)\n",
    "* T.Petrova, D.Soshnikov, A.Grunin. [Estimation of Time-Dependent Reproduction Number for Global COVID-19 Outbreak](https://www.preprints.org/manuscript/202006.0289/v1). *Preprints* **2020**, 2020060289 (doi: 10.20944/preprints202006.0289.v1)\n",
    "* [Code for the above paper on GitHub](https://github.com/shwars/SlidingSIR)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Acknowledgments\n",
    "\n",
    "Thanks to Microsoft for creating the open-source course [Data Science for Beginners](https://github.com/microsoft/Data-Science-For-Beginners). It inspires the majority of the content in this chapter.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.16"
  },
  "vscode": {
   "interpreter": {
    "hash": "aee8b7b246df8f9039afb4144a1f6fd8d2ca17a180786b69acc140d282b71a49"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}