{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Covid detection in Ireland\n", "Statistics on detected spread of Coronavirus in Ireland. \n", "The dictionary below shows several pages from the source at https://www.worldometers.info/coronavirus/#countries.\n", "To chose a country other than Ireland, use 'country = key' in line 2 below.\n", "You can add to the list by exploring the sibling pages and adding their URLs to the dictionary below." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Locate and frame data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Choose country here...\n", "country = 'Ireland'\n", "\n", "# From this dictionary\n", "countries = {\n", "\t'Ireland': \t\t'https://www.worldometers.info/coronavirus/country/ireland/',\n", "\t'UK': \t\t\t'https://www.worldometers.info/coronavirus/country/uk/',\n", "\t'USA': \t\t\t'https://www.worldometers.info/coronavirus/country/us/',\n", "\t'Italy': \t\t'https://www.worldometers.info/coronavirus/country/italy/',\n", "\t'South Korea': \t'https://www.worldometers.info/coronavirus/country/south-korea',\n", "\t'Poland':\t\t'https://www.worldometers.info/coronavirus/country/poland/',\n", "\t'China':\t\t'https://www.worldometers.info/coronavirus/country/china/'\n", "}\n", "# You can add to this. The parent page is https://www.worldometers.info/coronavirus/#countries\n", "# Select the country, and paste the key: Url pair above" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we scrape data from the page source" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "ename": "ModuleNotFoundError", "evalue": "No module named 'BeautifulSoup'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mrequests\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mBeautifulSoup\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mBeautifulSoup\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mre\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpandas\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'BeautifulSoup'" ] } ], "source": [ "import requests\n", "from bs4 import BeautifulSoup\n", "import re\n", "import pandas as pd\n", "\n", "\n", "url = countries[country]\n", "content = requests.get(url).text\n", "soup = BeautifulSoup(content, features='html.parser')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use this function again later, so define it here as a function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_data(div_id):\n", " '''Find the script element following div with id div_id'''\n", " script = soup.find('div', id = div_id).findNext('script')\n", "\n", " # Find the data line\n", " lines = script.text.split('\\n')\n", " datas = [l for l in lines if l.strip().startswith('data')]\n", " data_line = datas[0].strip()\n", "\n", " # Extract the data from the data_line\n", " counts_text = re.findall('[0-9]+', data_line)\n", " counts = list(map(int, counts_text))\n", "\n", " # Create DataFrame with Cumulative Sum of the daily data\n", " df = pd.DataFrame(counts, columns=['DailyIncrease'])\n", " df['Total'] = df.DailyIncrease.cumsum()\n", " \n", " return df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = get_data('graph-cases-daily')\n", "print(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plots\n", "We can plot this raw data already" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "\n", "plt.plot(df.DailyIncrease, color = '#888800FF', label = 'Daily Increase')\n", "plt.plot(df.Total, color = '#008800FF', label = 'Total Cases')\n", "\n", "plt.title(country)\n", "plt.legend()\n", "plt.grid()\n", "plt.xlabel('Day')\n", "plt.ylabel('Count')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can fit these curves to the nearest matching exponential. \n", "First, define the exponential function, and then use scypy to fit the curve" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# The function to fit datapoints to is exponential. B is the growth rate.\n", "import numpy as np\n", "def exponential(x, a, b, c):\n", " return a * np.exp(b * x) + c\n", "\n", "# Use scypy\n", "from scipy.optimize import curve_fit\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Plot again, now with the smoothened version overlaid." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "'''Plot the daily and total rates, and then the smootened version'''\n", "params_daily, _ = curve_fit(exponential, df.index, df.DailyIncrease)\n", "params_total, _ = curve_fit(exponential, df.index, df.Total)\n", "# Plot the data\n", "plt.plot(df.DailyIncrease, color = '#888800FF', label = 'Daily Increase')\n", "plt.plot(exponential(df.index, *params_daily), 'r-', color = '#88880088', label = 'Daily Increase (smooth)')\n", "\n", "plt.plot(df.Total, color = '#008800FF', label = 'Total Cases')\n", "plt.plot(exponential(df.index, *params_total), 'r-',color = '#00880088', label = 'Total (smooth)')\n", "\n", "plt.title(country)\n", "plt.legend()\n", "plt.grid()\n", "plt.xlabel('Day')\n", "plt.ylabel('Count')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parameters of interest\n", "In the exponential function above, the 'b' parameter represented the growth rate. We can use the rule of seventy to estimate the doubling rate from this." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A quick lambda to limit decimal places displayed\n", "short = lambda x: '{:.1f}'.format(x)\n", "\n", "# The true value of seventy in the rule of seventy\n", "seventy = np.log(2)\n", "\n", "# And a lambda for the rule of seventy\n", "doubling_time = lambda x: seventy/x\n", "\n", "daily_growth_rate = params_daily[1]\n", "total_growth_rate = params_total[1]\n", "\n", "daily_doubling_rate = doubling_time(daily_growth_rate)\n", "total_doubling_rate = doubling_time(total_growth_rate)\n", "\n", "\n", "print(f'Daily Detections: growth rate = {short(daily_growth_rate)}, doubling time = {short(daily_doubling_rate)} days')\n", "print(f'Total Detections: growth rate = {short(total_growth_rate)}, doubling time = {short(total_doubling_rate)} days')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Rate of change in parameters of interest\n", "We don't really have enough data points for this, but let's look to see if there is any emerging trend in parameters like the doubling rates.\n", "\n", "Using the dataframe, for each row, recalculate the exponential curve, and from that, what the estimated doubling rate was on that day. That is, we show the history of estimates of doubling rates as they would have been calculated on that day." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_stats(df):\n", "# print('start')\n", " params_daily, _ = curve_fit(exponential, df.index, df.DailyIncrease)\n", " params_total, _ = curve_fit(exponential, df.index, df.Total)\n", "# print('fin')\n", " daily_growth_rate = params_daily[1]\n", " total_growth_rate = params_total[1]\n", "\n", " daily_doubling_rate = doubling_time(daily_growth_rate)\n", " total_doubling_rate = doubling_time(total_growth_rate)\n", "\n", " return daily_growth_rate, daily_doubling_rate, total_growth_rate, total_doubling_rate" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "minimum = 8\n", "\n", "for index, row in df.iterrows():\n", " \n", " head = df.head(index)\n", "\n", " try:\n", " daily_growth_rate, daily_doubling_rate, total_growth_rate, total_doubling_rate = get_stats(head)\n", " except:\n", " daily, total = 0, 0\n", " \n", " df.at[index, 'DGR'] = daily_growth_rate\n", " df.at[index, 'DDR'] = daily_doubling_rate\n", " df.at[index, 'TGR'] = total_growth_rate\n", " df.at[index, 'TDR'] = total_doubling_rate\n", " \n", "# print(df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the plot below, we want consistently negative growth rates, meaning that the number of new cases is declining daily." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(df.DGR, color = '#888800FF', label = 'Daily Growth Rate')\n", "plt.plot(df.TGR, color = '#008800FF', label = 'Total Growth Rate')\n", "\n", "plt.title(country)\n", "plt.legend()\n", "plt.grid()\n", "plt.xlabel('Day')\n", "plt.ylabel('Count')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Death Rates\n", "I don't want to be morbid, but these are starting to ramp up now too." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = get_data('graph-deaths-daily')\n", "print(df)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "# Plot the data\n", "plt.plot(df.DailyIncrease, color = '#888800FF', label = 'Daily Deaths')\n", "plt.plot(df.Total, color = '#008800FF', label = 'Total Deaths')\n", "\n", "plt.title(country)\n", "plt.legend()\n", "plt.grid()\n", "plt.xlabel('Day')\n", "plt.ylabel('Count')\n", "plt.show()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }