US COVID-19 cases and deaths by state

{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "BaVNFpZJgECX" }, "source": [ "# COVID-19 Data Collection and Analysis" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dLS-vnAJkZRc", "outputId": "59cc19f8-6b60-4e19-8d95-de0aff4428ba" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" ] } ], "source": [ "from google.colab import drive\n", "drive.mount(\"/content/drive\")" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "7PmTSUTFhqg6" }, "outputs": [], "source": [ "import re\n", "import requests\n", "import pandas as pd\n", "import numpy as np\n", "from time import sleep\n", "from bs4 import BeautifulSoup\n", "import pickle \n", "from typing import Optional" ] }, { "cell_type": "markdown", "metadata": { "id": "cmZHCw15fh_O" }, "source": [ "# 1. Data Collection\n", "\n", "We are interested in the degree to which the SARS-CoV-2 virus has affected United States citizens (SARS-CoV-2 is the virus that causes the COVID-19 disease). The Centers for Disease Control and Prevention (CDC) provides relevant data from USAFacts.org that includes the number of confirmed COVID-19 cases on a per-county basis. At the bottom of the web page (https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/), a blue table provides a list of states, each of which has its own web page displaying the reported numbers of cases and deaths.\n", "\n", "We automatically downloaded each state's data with Requests and then manipulated it with BeautifulSoup. Specifically, we first fetched the web page located at `base_url` and save the request's returned object (a respond object) to `home_page`. We then used the BeautifulSoup object to parse the home page as an HTML document in order to extract the link for every state. With these extracted URLs, we populated a `state_urls` dictionary by setting each key to be the state name and the value to be the full URL. To avoid download state web pages multiple times frequently, we iterated through the `state_urls`, make a web request for each URL, and save the contents out to a file on the hard drive." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "28WF-Ye9kzsh" }, "outputs": [], "source": [ "# Every state's url begins with this prefix\n", "base_url = 'https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/'\n", "# Datasets will be saved to this directory\n", "state_dir = \"./drive/MyDrive/state_data/\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JdtBOR97ffGn", "outputId": "47f3d8e9-c96d-4283-9bff-8dd62f295bee" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "200\n", "US COVID-19 cases and deaths by state | USAFacts Optional[int]:\n", " try:\n", " return population_dict.get(state).get(county)\n", " except AttributeError:\n", " print('incorrect state name!')\n", " return None\n", "\n", "def load_covid_data(state_info):\n", " covid_data = {}\n", " for (state, state_path) in state_info:\n", " covid_data[state] = []\n", " with open(state_path, 'r') as f:\n", " soup = BeautifulSoup(f.read(), 'html.parser')\n", " counties = soup.find_all('a', href=re.compile('county/'))\n", " for c in counties:\n", " row = c.find_parent('tr')\n", " cols = [col.text.replace(',','') for col in row.find_all('td')]\n", "\n", " county_name = c.text\n", " pop = get_pop(state, county_name)\n", " if ((get_pop(state, county_name)) is None) or (pop == 0):\n", " continue\n", " covid_data[state].append({'county_name': county_name,\n", " 'population': pop,\n", " '7_day_avg_cases': float(cols[0]),\n", " '7_date_ave_deaths': float(cols[1]),\n", " 'cases': int(cols[2]),\n", " 'deaths': int(cols[3])})\n", " return covid_data" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "Ckoiwt0m2KKQ" }, "outputs": [], "source": [ "state_info = [(state, state_dir + state) for state in state_urls.keys()]\n", "covid_data = load_covid_data(state_info)" ] }, { "cell_type": "markdown", "metadata": { "id": "WSEPIePT6U62" }, "source": [ "# 3. Exploratory Data Analysis (EDA) I \n", "\n", "We first observed the single-most extreme counties and states, then inspected all states, after having sorted the data by some features." ] }, { "cell_type": "markdown", "metadata": { "id": "d3KPJed-3XX4" }, "source": [ "\n", "We computed \n", "1. The single county (and the state to which it belongs) that has the lowest rate of COVID cases per 100k people.\n", "1. The single county (and the state to which it belongs) that has the highest rate of COVID cases per 100k people." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "8IOTzSlb3bw3", "outputId": "87f9c267-73fd-4b7b-d979-e782a4b31653" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hoonah-Angoon Census Area (Alaska) has the lowest COVID cases per 100k: 0.0\n", "Loving County (Texas) has the highest COVID cases per 100k: 115976.33\n" ] } ], "source": [ "def calculate_county_stats(covid_data):\n", " \n", " min_county_count = 999999\n", " min_county_name = \"\"\n", " max_county_count = -1\n", " max_county_name = \"\"\n", " \n", " # looks through every county in every state, while checking if we have a new low or high\n", " for state in covid_data.keys():\n", " for county in covid_data[state]:\n", " pop = county['population']\n", " if (pop is None) or (pop == 0):\n", " continue\n", " covid_rate = round(county['cases'] / (pop/100000),2)\n", " if covid_rate < min_county_count:\n", " min_county_count = covid_rate\n", " min_county_name = county['county_name'] + \" (\" + state + \")\"\n", " if covid_rate > max_county_count:\n", " max_county_count = covid_rate\n", " max_county_name = county['county_name'] + \" (\" + state + \")\"\n", "\n", " print(min_county_name + \" has the lowest COVID cases per 100k: \" + str(float(min_county_count)))\n", " print(max_county_name + \" has the highest COVID cases per 100k: \" + str(float(max_county_count))) \n", "\n", "calculate_county_stats(covid_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "uMLxRaVT5WQf" }, "source": [ "We calculated\n", "1. The state that has the lowest number of deaths\n", "1. The state that has the highest number of deaths\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1PrUM8B05crZ", "outputId": "0a8b05fd-cec5-46b4-c8f6-a7296efcfa1b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Vermont has the fewest COVID deaths: 646\n", "California has the most COVID deaths: 89931\n" ] } ], "source": [ "def calculate_state_deaths(covid_data):\n", " \n", " min_state_deaths = 999999\n", " min_state_name = \"\"\n", " max_state_deaths = -1\n", " max_state_name = \"\"\n", " for state in covid_data.keys():\n", " cur_state_count = 0\n", " for county in covid_data[state]:\n", " cur_state_count += county['deaths']\n", " \n", " if cur_state_count < min_state_deaths:\n", " min_state_deaths = cur_state_count\n", " min_state_name = state\n", " if cur_state_count > max_state_deaths:\n", " max_state_deaths = cur_state_count\n", " max_state_name = state\n", "\n", " print(min_state_name + \" has the fewest COVID deaths: \" + str(min_state_deaths))\n", " print(max_state_name + \" has the most COVID deaths: \" + str(max_state_deaths)) \n", "\n", "calculate_state_deaths(covid_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "Trsm-caH59xZ" }, "source": [ "We calculated\n", "1. The state that has the lowest rate of deaths based on its entire population\n", "1. The state that has the highest rate of deaths based on its entire population\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7uxLpphX59jM", "outputId": "7ce6b554-af5c-4794-8f2d-3e62fec13975" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hawaii has the lowest COVID death rate; 1 out of every 996 people has died\n", "Mississippi has the highest COVID death rate; 1 out of every 239 people has died\n" ] } ], "source": [ "def calculate_state_deathrate(covid_data):\n", " \n", " min_state_death_rate = -1\n", " min_state_name = \"\"\n", " max_state_death_rate = 9999999\n", " max_state_name = \"\"\n", " \n", " for state in covid_data.keys():\n", " cur_state_deaths = 0\n", " cur_state_population = 0\n", " for county in covid_data[state]:\n", " pop = county['population']\n", " if (county['cases'] > 0) and (pop is not None):\n", " cur_state_population += pop\n", " cur_state_deaths += county['deaths']\n", " \n", " cur_state_deathrate = float(cur_state_population) / cur_state_deaths\n", " \n", " if cur_state_deathrate > min_state_death_rate:\n", " min_state_death_rate = cur_state_deathrate\n", " min_state_name = state\n", " if cur_state_deathrate < max_state_death_rate:\n", " max_state_death_rate = cur_state_deathrate\n", " max_state_name = state\n", " \n", " print(min_state_name + \" has the lowest COVID death rate; 1 out of every \" + str(round(min_state_death_rate)) + \" people has died\")\n", " print(max_state_name + \" has the highest COVID death rate; 1 out of every \" + str(round(max_state_death_rate)) + \" people has died\")\n", "\n", "calculate_state_deathrate(covid_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "x7Qo8Ozv7eXz" }, "source": [ "Complicated analysis requires a better data structure like pandas dataframe. We now convert the previous dictionary of lists of dictionaries to a pandas dataframe. Each row corresponds to a unique county. Five columns are county, state, # total covid cases (integer), # covid case per 100k (float), and # covid deaths (integer)." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 224 }, "id": "52E_9h4Z7SMv", "outputId": "2df05e69-e74c-4351-9d50-a4110a5e31fd" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3118, 5)\n" ] }, { "data": { "text/html": [ "

	county	state	# total covid cases	# covid cases per 100k	# covid deaths
0	Autauga County	Alabama	15863	28393.205534	216
1	Baldwin County	Alabama	55862	25023.965883	681
2	Barbour County	Alabama	5681	23013.043831	98
3	Bibb County	Alabama	6457	28833.616147	105
4	Blount County	Alabama	15005	25948.535261	243

\n", "

" ], "text/plain": [ " county state # total covid cases # covid cases per 100k \\\n", "0 Autauga County Alabama 15863 28393.205534 \n", "1 Baldwin County Alabama 55862 25023.965883 \n", "2 Barbour County Alabama 5681 23013.043831 \n", "3 Bibb County Alabama 6457 28833.616147 \n", "4 Blount County Alabama 15005 25948.535261 \n", "\n", " # covid deaths \n", "0 216 \n", "1 681 \n", "2 98 \n", "3 105 \n", "4 243 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def convert_to_pandas(covid_data):\n", " \n", " covid_data_flipped = []\n", " for state, counties in covid_data.items():\n", " for county in counties: \n", " pop = county['population']\n", " if (pop is None) or (pop == 0):\n", " continue\n", " cases = county['cases']\n", " cur_dict = {\"county\":county['county_name'], \"state\":state,\n", " \"# total covid cases\": cases,\n", " \"# covid cases per 100k\": cases/(pop/100000),\n", " \"# covid deaths\": county['deaths']}\n", " covid_data_flipped.append(cur_dict)\n", " covid_df = pd.json_normalize(covid_data_flipped)\n", " return covid_df\n", "\n", "covid_df = convert_to_pandas(covid_data)\n", "print(covid_df.shape)\n", "covid_df.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "covid_df.to_csv('./combined_data/covid_df.csv', index=False)" ] }, { "cell_type": "markdown", "metadata": { "id": "vMVBrhvS8qE0" }, "source": [ "We can use this dataframe to compute same quantities as done above more easily\n", "\n", "1. the single county (and the state to which it belongs) that has the lowest rate of COVID cases per 100k people\n", "1. the single county (and the state to which it belongs) that has the highest rate of COVID cases per 100k people\n", "\n", "\n", "1. the state that has the lowest number of deaths\n", "1. the state that has the highest number of deaths\n", "\n", "\n", "1. The state that has the lowest rate of deaths based on its entire population\n", "1. The state that has the highest rate of deaths based on its entire population\n", "\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "BEsrXQAU8pP-", "outputId": "1e97dd6f-5195-43cd-8b01-a5f79ae017de" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Kalawao County (Hawaii) has the lowest rate of confirmed COVID cases per 100k: 0.00\n", "Loving County (Texas) has the highest rate of confirmed COVID cases per 100k: 113,017.75\n" ] } ], "source": [ "def calculate_county_stats2(covid_df):\n", "\n", " sorted_df = covid_df.sort_values(by=['# covid cases per 100k'])\n", " lowest = sorted_df.iloc[0]\n", " highest = sorted_df.iloc[-1]\n", "\n", " print(f\"{lowest['county']} ({lowest['state']}) has the lowest rate of confirmed COVID cases per 100k: {lowest['# covid cases per 100k']:,.2f}\")\n", " print(f\"{highest['county']} ({highest['state']}) has the highest rate of confirmed COVID cases per 100k: {highest['# covid cases per 100k']:,.2f}\")\n", " \n", "calculate_county_stats2(covid_df)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Nhh5OeuB9Fao", "outputId": "3d19ce1c-7892-4c5a-c876-30386aaf95f4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Vermont has the fewest COVID deaths: 640.0\n", "California has the most COVID deaths: 89667.0\n" ] } ], "source": [ "def calculate_state_deaths2(covid_df):\n", " \n", " state_deaths = covid_df.groupby('state').sum().sort_values(by=['# covid deaths'])\n", " lowest = state_deaths.iloc[0]\n", " highest = state_deaths.iloc[-1]\n", "\n", " print(lowest.name + \" has the fewest COVID deaths: \" + str(lowest['# covid deaths']))\n", " print(highest.name + \" has the most COVID deaths: \" + str(highest['# covid deaths']))\n", "\n", "calculate_state_deaths2(covid_df)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "PukG7pDl9Uw_", "outputId": "1c0bcc88-abfa-41c2-9a5e-547fc53f9206" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hawaii has the lowest COVID death rate; 1 out of every 995 people has died\n", "Mississippi has the highest COVID death rate; 1 out of every 238 people has died\n" ] } ], "source": [ "def calculate_state_deathrate2(covid_df):\n", " \n", " covid_df2 = covid_df\n", " covid_df2['population'] = 100000*covid_df2['# total covid cases'] / covid_df2['# covid cases per 100k']\n", " covid_df2 = covid_df2.groupby('state').sum()\n", " covid_df2['death_rate'] = covid_df2['population'] / covid_df2['# covid deaths']\n", " covid_df2 = covid_df2.sort_values(by=['death_rate'])\n", "\n", " print(covid_df2.iloc[-1].name + \" has the lowest COVID death rate; 1 out of every \" + str(int(covid_df2.iloc[-1]['death_rate'])) + \" people has died\")\n", " print(covid_df2.iloc[0].name + \" has the highest COVID death rate; 1 out of every \" + str(int(covid_df2.iloc[0]['death_rate'])) + \" people has died\")\n", "\n", "calculate_state_deathrate2(covid_df)" ] }, { "cell_type": "markdown", "metadata": { "id": "Y9KrpyIp8_Ze" }, "source": [ "Furthermore, considering that the data is messy and some are not reliable, we attempted to understand some of the uncertainty around COVID data. We consider that false negatives of deaths of COVID-19 is minimal. Every disease has a mortality rate and we can consider it's constant throughout all people in the US. Although some are at highe risk (e.g. older folks, people with pre-existing conditions, etc), we can imagine that this variance in the population to be fairly uniform throughout the USA. Therefore, if all counties were equal in their testing, we are supposed to see a consistent ratio between # people who died from COVID and # of people who tested positive for COVID, which is called 'case fatality rate'." ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 502 }, "id": "de9sEr-hBqN1", "outputId": "0bca88da-5f15-4d33-aa49-e1016b452bd2" }, "outputs": [ { "data": { "text/html": [ "

	county	state	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate
1702	Loup County	Nebraska	87	13102.409639	0	663	0	0.000000
1696	Keya Paha County	Nebraska	118	14640.198511	0	805	0	0.000000
544	Kalawao County	Hawaii	0	0.000000	0	0	0	0.000000
87	Skagway Municipality	Alaska	30	2535.925613	0	1182	0	0.000000
269	Jackson County	Colorado	166	11925.287356	0	1391	0	0.000000
...	...	...	...	...	...	...	...	...
2715	Sabine County	Texas	1254	11895.276039	89	10541	844	0.070973
427	Dodge County	Georgia	2110	10240.232953	154	20604	747	0.072986
1752	Storey County	Nevada	133	3225.806452	11	4122	266	0.082707
538	Wilcox County	Georgia	828	9588.882455	71	8634	822	0.085749
444	Glascock County	Georgia	269	9054.190508	25	2970	841	0.092937

\n", "

3118 rows × 8 columns

\n", "

" ], "text/plain": [ " county state # total covid cases \\\n", "1702 Loup County Nebraska 87 \n", "1696 Keya Paha County Nebraska 118 \n", "544 Kalawao County Hawaii 0 \n", "87 Skagway Municipality Alaska 30 \n", "269 Jackson County Colorado 166 \n", "... ... ... ... \n", "2715 Sabine County Texas 1254 \n", "427 Dodge County Georgia 2110 \n", "1752 Storey County Nevada 133 \n", "538 Wilcox County Georgia 828 \n", "444 Glascock County Georgia 269 \n", "\n", " # covid cases per 100k # covid deaths population \\\n", "1702 13102.409639 0 663 \n", "1696 14640.198511 0 805 \n", "544 0.000000 0 0 \n", "87 2535.925613 0 1182 \n", "269 11925.287356 0 1391 \n", "... ... ... ... \n", "2715 11895.276039 89 10541 \n", "427 10240.232953 154 20604 \n", "1752 3225.806452 11 4122 \n", "538 9588.882455 71 8634 \n", "444 9054.190508 25 2970 \n", "\n", " # covid deaths per 100k case_fatality_rate \n", "1702 0 0.000000 \n", "1696 0 0.000000 \n", "544 0 0.000000 \n", "87 0 0.000000 \n", "269 0 0.000000 \n", "... ... ... \n", "2715 844 0.070973 \n", "427 747 0.072986 \n", "1752 266 0.082707 \n", "538 822 0.085749 \n", "444 841 0.092937 \n", "\n", "[3118 rows x 8 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def add_death_stats(covid_df):\n", " \n", " # can add an infintesimal or fillna after the fact to handle nans from divide by 0.\n", " \n", " covid_df['population'] = 100000*covid_df['# total covid cases'] / (covid_df['# covid cases per 100k']+0.0001)\n", "# covid_df.fillna(0, inplace=True)\n", " covid_df[\"population\"] = covid_df[\"population\"].astype('int32')\n", " \n", " covid_df['# covid deaths per 100k'] = 100000*covid_df['# covid deaths'] / (covid_df['population']+0.0001)\n", "# covid_df.fillna(0, inplace=True)\n", " covid_df[\"# covid deaths per 100k\"] = covid_df[\"# covid deaths per 100k\"].astype('int32')\n", " \n", " covid_df['case_fatality_rate'] = covid_df['# covid deaths'] / (covid_df['# total covid cases']+0.0001)\n", "# covid_df.fillna(0, inplace=True)\n", " covid_df = covid_df.sort_values(by=['case_fatality_rate'])\n", "\n", " return covid_df\n", "\n", "covid_updated = add_death_stats(covid_df)\n", "covid_updated" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "covid_updated.to_csv('./combined_data/covid_updated.csv', index = False)" ] }, { "cell_type": "markdown", "metadata": { "id": "2gVw9sYECdR1" }, "source": [ "From all those analyses above, we learned that states vary wildly in their death rate (e.g., The number of deaths in New Jersey or California is orders of magnitude higher than those in Hawaii or Alaska) and COVID testing. States also fluctuate a lot amongst their counties, as some counties with very bad statistics are within states with good statistics. \n", "\n", "When it comes to data reliability, some states and counties are probably more proactive when it comes to testing, so they could have higher cases counts. Other counties might have a similar number of cases or higher, but they are just not being represented in the data due to lower testing. Deaths are thus harder to overlook, so states with lax testing policies may have inflated deaths per case metrics. Perhaps we could supplement the data with some measure of testing rates in the county or state." ] }, { "cell_type": "markdown", "metadata": { "id": "RbR70HJmEm6a" }, "source": [ "# 4. Incorporate More Data\n", "\n", "We are also interested in how COVID has impacted our world. We can better understand this by looking at how it relates to demographics, income, education, health, and politicala voting. \n", "\n", "Our `case_fatality_rate` column can be viewed as an approximation of how effective and thorough COVID testing is for a given county. Our `# covid deaths` column can be viewed as an extreme indication of how severe COVID has impacted a given county. Our `# covid cases per 100k` column be viewed as middle-ground between the two aforementioned features. That is, it measures the impact of the disease and is influenced by the thoroughness of COVID testing. \n", "\n", "Using these three informative features, we can inspect how impacted each county is, while correlating this with other features of each county, such as income-level, health metrics, demographics, etc. \n", "\n", "In this project, we merged our COVID case data with 'election2020_by_county.csv' dataset. We only care about 15 columns which are hispanic, minority, female, unemployed, income, nodegree, bachelor, inactivity, obesity, desity, cancer, voter_turnout, voter_gap, trump, biden. We droppde fipscode and population columns.\n", "\n", "A data description is as follows:\n", "\n", "- state: the state in which the county lies\n", "- fipscode: an ID to identify each county\n", "- county: the name of each county\n", "- population: total population\n", "- hispanic: percent of adults that are hispanic\n", "- minority: percent of adults that are nonwhite\n", "- female: percent of adults that are female\n", "- unemployed: unemployment rate, as a percent\n", "- income: median income\n", "- nodegree: percent of adults who have not completed high school\n", "- bachelor: percent of adults with a bachelor’s degree\n", "- inactive: percent of adults who do not exercise in their leisure time\n", "- obesity: percent of adults with BMI > 30\n", "- density: population density, persons per square mile of land\n", "- cancer: prevalence of cancer per 100,000 individuals\n", "- voter_turnout: percentage of voting age population that voted\n", "- voter_gap: percentage point gap in 2020 presidential voting: trump-briden\n", "\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 473 }, "id": "Za799tkjB4Hs", "outputId": "3fba53e1-a3a9-4f28-a50d-f86f68e811ca" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3044, 23)\n" ] }, { "data": { "text/html": [ "

	county	state	# total covid cases	# covid cases per 100k	population	hispanic	minority	...	nodegree	bachelor	inactivity	obesity	density	cancer	voter_turnout	voter_gap	trump	biden
0	Loup County	Nebraska	87	13102.409639	663	0.0	0.4	...	6.2	14.4	33.0	30.7	1.3	NaN	-4.849885	65.0	81.5	16.5
1	Keya Paha County	Nebraska	118	14640.198511	805	1.1	2.2	...	8.0	15.8	30.2	29.6	7.9	256.3	8.000000	80.7	90.0	9.3
2	Jackson County	Colorado	166	11925.287356	1391	24.4	25.1	...	15.5	17.5	20.5	21.4	0.9	207.1	12.874251	58.1	77.9	19.8
3	Daggett County	Utah	37	3894.736842	949	3.3	5.1	...	12.4	19.3	20.3	26.1	14.5	185.3	-9.461967	62.4	80.2	17.8
4	Hinsdale County	Colorado	131	15975.609756	819	6.0	9.1	...	5.1	41.3	14.8	20.5	0.8	NaN	5.952381	15.6	55.9	40.3

\n", "

5 rows × 23 columns

\n", "

" ], "text/plain": [ " county state # total covid cases # covid cases per 100k \\\n", "0 Loup County Nebraska 87 13102.409639 \n", "1 Keya Paha County Nebraska 118 14640.198511 \n", "2 Jackson County Colorado 166 11925.287356 \n", "3 Daggett County Utah 37 3894.736842 \n", "4 Hinsdale County Colorado 131 15975.609756 \n", "\n", " # covid deaths population # covid deaths per 100k case_fatality_rate \\\n", "0 0 663 0 0.0 \n", "1 0 805 0 0.0 \n", "2 0 1391 0 0.0 \n", "3 0 949 0 0.0 \n", "4 0 819 0 0.0 \n", "\n", " hispanic minority ... nodegree bachelor inactivity obesity density \\\n", "0 0.0 0.4 ... 6.2 14.4 33.0 30.7 1.3 \n", "1 1.1 2.2 ... 8.0 15.8 30.2 29.6 7.9 \n", "2 24.4 25.1 ... 15.5 17.5 20.5 21.4 0.9 \n", "3 3.3 5.1 ... 12.4 19.3 20.3 26.1 14.5 \n", "4 6.0 9.1 ... 5.1 41.3 14.8 20.5 0.8 \n", "\n", " cancer voter_turnout voter_gap trump biden \n", "0 NaN -4.849885 65.0 81.5 16.5 \n", "1 256.3 8.000000 80.7 90.0 9.3 \n", "2 207.1 12.874251 58.1 77.9 19.8 \n", "3 185.3 -9.461967 62.4 80.2 17.8 \n", "4 NaN 5.952381 15.6 55.9 40.3 \n", "\n", "[5 rows x 23 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def merge_data(covid_updated, filepath):\n", " \n", " data2020 = pd.read_csv(filepath).drop(columns=['fipscode', 'population'])\n", " return pd.merge(covid_updated, data2020, on=['state', 'county'])\n", "\n", "merged = merge_data(covid_updated, './drive/MyDrive/election2020_by_county.csv')\n", "print(merged.shape)\n", "merged.head()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "merged.to_csv('./combined_data/merged.csv', index = False)" ] }, { "cell_type": "markdown", "metadata": { "id": "FC2onjm5Guqg" }, "source": [ "Due to mismatching happened during merging, we have lost some rows. " ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zBf42-18B4B8", "outputId": "d76f87c3-6597-453f-dcd2-1e601c2bfdc6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "74\n" ] } ], "source": [ "print(len(covid_updated) - len(merged))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 520 }, "id": "uJ6uylCdGen9", "outputId": "fb258a24-d339-4092-f243-c5256dc4e127" }, "outputs": [ { "data": { "text/html": [ "\n", "

\n", "

	county	state	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate
87	Skagway Municipality	Alaska	30	2535.925613	0	1182	0	0.000000
90	Wrangell City and Borough	Alaska	71	2837.729816	0	2501	0	0.000000
75	Hoonah-Angoon Census Area	Alaska	0	0.000000	0	0	0	0.000000
544	Kalawao County	Hawaii	0	0.000000	0	0	0	0.000000
68	Aleutians West Census Area	Alaska	863	15317.713880	2	5633	35	0.002317
...	...	...	...	...	...	...	...	...
1127	East Feliciana Parish	Louisiana	7320	38254.507447	170	19134	888	0.023224
1122	Claiborne Parish	Louisiana	3280	20931.716656	77	15669	491	0.023476
1129	Franklin Parish	Louisiana	7347	36707.469398	178	20014	889	0.024228
1148	Red River Parish	Louisiana	2110	24994.077233	53	8441	627	0.025118
1115	Bienville Parish	Louisiana	3990	30133.675704	110	13240	830	0.027569

\n", "

74 rows × 8 columns

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " county state # total covid cases \\\n", "87 Skagway Municipality Alaska 30 \n", "90 Wrangell City and Borough Alaska 71 \n", "75 Hoonah-Angoon Census Area Alaska 0 \n", "544 Kalawao County Hawaii 0 \n", "68 Aleutians West Census Area Alaska 863 \n", "... ... ... ... \n", "1127 East Feliciana Parish Louisiana 7320 \n", "1122 Claiborne Parish Louisiana 3280 \n", "1129 Franklin Parish Louisiana 7347 \n", "1148 Red River Parish Louisiana 2110 \n", "1115 Bienville Parish Louisiana 3990 \n", "\n", " # covid cases per 100k # covid deaths population \\\n", "87 2535.925613 0 1182 \n", "90 2837.729816 0 2501 \n", "75 0.000000 0 0 \n", "544 0.000000 0 0 \n", "68 15317.713880 2 5633 \n", "... ... ... ... \n", "1127 38254.507447 170 19134 \n", "1122 20931.716656 77 15669 \n", "1129 36707.469398 178 20014 \n", "1148 24994.077233 53 8441 \n", "1115 30133.675704 110 13240 \n", "\n", " # covid deaths per 100k case_fatality_rate \n", "87 0 0.000000 \n", "90 0 0.000000 \n", "75 0 0.000000 \n", "544 0 0.000000 \n", "68 35 0.002317 \n", "... ... ... \n", "1127 888 0.023224 \n", "1122 491 0.023476 \n", "1129 889 0.024228 \n", "1148 627 0.025118 \n", "1115 830 0.027569 \n", "\n", "[74 rows x 8 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_counties = set()\n", "merged_counties = set()\n", "for index, row in merged.iterrows():\n", " merged_counties.add(row['county'].lower() + \"_\" + row['state'].lower())\n", "\n", "missing_idxs = []\n", "for index, row in covid_updated.iterrows():\n", " cur_county = row['county'].lower() + \"_\" + row['state'].lower()\n", " if cur_county not in merged_counties:\n", " # print(\"missing\",cur_county)\n", " missing_idxs.append(index)\n", " missing_counties.add(cur_county)\n", "\n", "covid_updated.loc[missing_idxs]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 428 }, "id": "LXr84099G3Ui", "outputId": "b7656532-271a-4974-85c2-11ca6ad9d901" }, "outputs": [ { "data": { "text/html": [ "\n", "

\n", "

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "

	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	hispanic	minority	female	unemployed	...	nodegree	bachelor	inactivity	obesity	density	cancer	voter_turnout	voter_gap	trump	biden
count	3.021000e+03	3021.000000	3021.000000	3.021000e+03	3021.000000	3021.000000	3021.000000	3021.000000	3021.000000	3021.000000	...	3021.000000	3021.000000	3021.000000	3021.000000	3021.000000	2979.000000	2984.000000	2983.000000	2983.000000	2983.000000
mean	2.546425e+04	24314.359415	302.067196	1.055975e+05	366.179742	0.015760	9.259484	22.694009	49.932903	5.495035	...	14.970275	20.066137	25.910824	30.947104	250.792122	228.507486	35.459084	32.702749	65.487026	32.784278
std	8.662408e+04	6015.982401	1006.517139	3.391059e+05	162.864254	0.008579	13.881480	19.920798	2.375417	1.969195	...	6.740886	8.846873	5.161410	4.467600	1741.445511	56.125890	13.868524	31.279827	15.699166	15.590596
min	3.200000e+01	3274.314819	1.000000	1.680000e+02	11.000000	0.000527	0.000000	0.000000	19.166215	1.800000	...	1.900000	2.600000	8.100000	11.800000	0.100000	46.200000	-168.323353	-90.000000	4.000000	3.100000
25%	2.560000e+03	20683.907225	41.000000	1.106500e+04	249.000000	0.010637	2.000000	6.900000	49.462074	4.100000	...	9.900000	14.000000	22.600000	28.300000	17.000000	193.300000	27.675757	15.050000	56.700000	20.800000
50%	6.401000e+03	24365.817878	96.000000	2.584300e+04	358.000000	0.014321	4.000000	15.200000	50.392410	5.300000	...	13.500000	17.900000	25.800000	31.100000	45.300000	230.300000	35.055497	39.100000	68.700000	29.600000
75%	1.705000e+04	27802.251591	229.000000	6.784200e+04	468.000000	0.018885	9.500000	33.900000	51.080252	6.500000	...	19.200000	23.600000	29.400000	33.700000	112.900000	265.100000	42.473125	56.800000	77.500000	41.650000
max	2.751220e+06	113017.751479	31736.000000	1.003911e+07	1229.000000	0.092937	99.200000	99.400000	58.100420	24.000000	...	53.300000	75.100000	41.400000	47.600000	69468.400000	458.300000	100.000000	93.100000	96.200000	94.000000

\n", "

8 rows × 21 columns

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 3.021000e+03 3021.000000 3021.000000 \n", "mean 2.546425e+04 24314.359415 302.067196 \n", "std 8.662408e+04 6015.982401 1006.517139 \n", "min 3.200000e+01 3274.314819 1.000000 \n", "25% 2.560000e+03 20683.907225 41.000000 \n", "50% 6.401000e+03 24365.817878 96.000000 \n", "75% 1.705000e+04 27802.251591 229.000000 \n", "max 2.751220e+06 113017.751479 31736.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 3.021000e+03 3021.000000 3021.000000 3021.000000 \n", "mean 1.055975e+05 366.179742 0.015760 9.259484 \n", "std 3.391059e+05 162.864254 0.008579 13.881480 \n", "min 1.680000e+02 11.000000 0.000527 0.000000 \n", "25% 1.106500e+04 249.000000 0.010637 2.000000 \n", "50% 2.584300e+04 358.000000 0.014321 4.000000 \n", "75% 6.784200e+04 468.000000 0.018885 9.500000 \n", "max 1.003911e+07 1229.000000 0.092937 99.200000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 3021.000000 3021.000000 3021.000000 ... 3021.000000 3021.000000 \n", "mean 22.694009 49.932903 5.495035 ... 14.970275 20.066137 \n", "std 19.920798 2.375417 1.969195 ... 6.740886 8.846873 \n", "min 0.000000 19.166215 1.800000 ... 1.900000 2.600000 \n", "25% 6.900000 49.462074 4.100000 ... 9.900000 14.000000 \n", "50% 15.200000 50.392410 5.300000 ... 13.500000 17.900000 \n", "75% 33.900000 51.080252 6.500000 ... 19.200000 23.600000 \n", "max 99.400000 58.100420 24.000000 ... 53.300000 75.100000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 3021.000000 3021.000000 3021.000000 2979.000000 2984.000000 \n", "mean 25.910824 30.947104 250.792122 228.507486 35.459084 \n", "std 5.161410 4.467600 1741.445511 56.125890 13.868524 \n", "min 8.100000 11.800000 0.100000 46.200000 -168.323353 \n", "25% 22.600000 28.300000 17.000000 193.300000 27.675757 \n", "50% 25.800000 31.100000 45.300000 230.300000 35.055497 \n", "75% 29.400000 33.700000 112.900000 265.100000 42.473125 \n", "max 41.400000 47.600000 69468.400000 458.300000 100.000000 \n", "\n", " voter_gap trump biden \n", "count 2983.000000 2983.000000 2983.000000 \n", "mean 32.702749 65.487026 32.784278 \n", "std 31.279827 15.699166 15.590596 \n", "min -90.000000 4.000000 3.100000 \n", "25% 15.050000 56.700000 20.800000 \n", "50% 39.100000 68.700000 29.600000 \n", "75% 56.800000 77.500000 41.650000 \n", "max 93.100000 96.200000 94.000000 \n", "\n", "[8 rows x 21 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Remove rows with 0 deaths\n", "merged = merged.loc[merged['# covid deaths'] != 0]\n", "\n", "# Summary statistics\n", "merged.describe()" ] }, { "cell_type": "markdown", "metadata": { "id": "3jmCzIx7IRiQ" }, "source": [ "# 5. Exploratory Data Analysis (EDA) II\n", "\n", "We can partition any quantitative feature by using quantiles. With arbitrarily chosen minv and maxv, we can partition certain feature of interest multiple times and observe interesting relationships." ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "id": "2RzKgq32IHaD" }, "outputs": [], "source": [ "# Given minv and maxv, this function returns a subset of the dataframe that has feature values between minv and maxv inclusive.\n", "def partition_df(df, column_name, minv, maxv):\n", " return df.loc[(merged[column_name] >= minv) & (merged[column_name] <= maxv)]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 428 }, "id": "ErmhoSBhJnPV", "outputId": "fc28aeea-e89c-4fe7-d2f7-1d5fe0bcf66f" }, "outputs": [ { "data": { "text/html": [ "\n", "

\n", "

	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	hispanic	minority	female	unemployed	...	nodegree	bachelor	inactivity	obesity	density	cancer	voter_turnout	voter_gap	trump	biden
count	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000	...	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000	117.000000
mean	4634.606838	26336.198665	85.504274	16558.410256	520.239316	0.021598	8.474359	43.990598	49.624858	9.065812	...	26.895726	11.892308	31.676923	36.230769	84.973504	244.556410	44.983493	13.635043	56.181197	42.546154
std	5658.986674	7096.649003	87.830030	17899.159118	152.402917	0.010778	21.306839	32.372695	3.836346	2.228389	...	5.736489	3.710822	4.358590	5.410948	176.212760	47.754773	10.879040	45.685134	22.877081	22.815828
min	184.000000	7931.904161	3.000000	1536.000000	174.000000	0.004765	0.000000	0.400000	35.462777	4.400000	...	8.000000	5.800000	19.700000	21.000000	1.400000	99.200000	-10.411765	-71.600000	13.500000	9.300000
25%	2003.000000	22647.596516	43.000000	8110.000000	434.000000	0.015873	0.700000	5.800000	48.747893	7.700000	...	23.400000	9.200000	29.000000	32.600000	17.500000	215.200000	39.175362	-27.900000	35.100000	19.900000
50%	2889.000000	27116.874368	56.000000	11839.000000	515.000000	0.018861	1.600000	49.700000	50.691193	8.700000	...	26.900000	11.300000	32.000000	36.400000	36.800000	243.900000	44.536665	14.200000	54.900000	40.800000
75%	5247.000000	31600.407747	96.000000	18067.000000	599.000000	0.022789	3.400000	73.100000	51.953125	10.100000	...	29.800000	13.800000	35.000000	40.300000	74.100000	272.700000	51.138995	58.500000	78.900000	63.000000
max	40588.000000	39821.693908	589.000000	130624.000000	1064.000000	0.067568	99.200000	99.400000	56.526573	17.600000	...	53.300000	27.900000	41.300000	47.600000	1261.500000	380.000000	70.003709	80.400000	89.700000	85.100000

\n", "

8 rows × 21 columns

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 117.000000 117.000000 117.000000 \n", "mean 4634.606838 26336.198665 85.504274 \n", "std 5658.986674 7096.649003 87.830030 \n", "min 184.000000 7931.904161 3.000000 \n", "25% 2003.000000 22647.596516 43.000000 \n", "50% 2889.000000 27116.874368 56.000000 \n", "75% 5247.000000 31600.407747 96.000000 \n", "max 40588.000000 39821.693908 589.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 117.000000 117.000000 117.000000 117.000000 \n", "mean 16558.410256 520.239316 0.021598 8.474359 \n", "std 17899.159118 152.402917 0.010778 21.306839 \n", "min 1536.000000 174.000000 0.004765 0.000000 \n", "25% 8110.000000 434.000000 0.015873 0.700000 \n", "50% 11839.000000 515.000000 0.018861 1.600000 \n", "75% 18067.000000 599.000000 0.022789 3.400000 \n", "max 130624.000000 1064.000000 0.067568 99.200000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 117.000000 117.000000 117.000000 ... 117.000000 117.000000 \n", "mean 43.990598 49.624858 9.065812 ... 26.895726 11.892308 \n", "std 32.372695 3.836346 2.228389 ... 5.736489 3.710822 \n", "min 0.400000 35.462777 4.400000 ... 8.000000 5.800000 \n", "25% 5.800000 48.747893 7.700000 ... 23.400000 9.200000 \n", "50% 49.700000 50.691193 8.700000 ... 26.900000 11.300000 \n", "75% 73.100000 51.953125 10.100000 ... 29.800000 13.800000 \n", "max 99.400000 56.526573 17.600000 ... 53.300000 27.900000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 117.000000 117.000000 117.000000 117.000000 117.000000 \n", "mean 31.676923 36.230769 84.973504 244.556410 44.983493 \n", "std 4.358590 5.410948 176.212760 47.754773 10.879040 \n", "min 19.700000 21.000000 1.400000 99.200000 -10.411765 \n", "25% 29.000000 32.600000 17.500000 215.200000 39.175362 \n", "50% 32.000000 36.400000 36.800000 243.900000 44.536665 \n", "75% 35.000000 40.300000 74.100000 272.700000 51.138995 \n", "max 41.300000 47.600000 1261.500000 380.000000 70.003709 \n", "\n", " voter_gap trump biden \n", "count 117.000000 117.000000 117.000000 \n", "mean 13.635043 56.181197 42.546154 \n", "std 45.685134 22.877081 22.815828 \n", "min -71.600000 13.500000 9.300000 \n", "25% -27.900000 35.100000 19.900000 \n", "50% 14.200000 54.900000 40.800000 \n", "75% 58.500000 78.900000 63.000000 \n", "max 80.400000 89.700000 85.100000 \n", "\n", "[8 rows x 21 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "partition_df(merged, 'income', 21000, 31000).describe()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "gTw1U7oEJwxU", "outputId": "ac425324-7f5a-49e9-878a-bad283604ada" }, "outputs": [ { "data": { "text/html": [ "\n", "

\n", "

	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	hispanic	minority	female	unemployed	...	nodegree	bachelor	inactivity	obesity	density	cancer	voter_turnout	voter_gap	trump	biden
count	1.033000e+03	1033.000000	1033.000000	1.033000e+03	1033.000000	1033.000000	1033.000000	1033.000000	1033.000000	1033.000000	...	1033.000000	1033.000000	1033.000000	1033.000000	1033.000000	1013.000000	1014.000000	1014.000000	1014.000000	1014.000000
mean	4.635578e+04	22771.754766	493.433688	1.961860e+05	313.152953	0.014420	14.784705	24.974831	49.784908	5.065247	...	13.033495	25.626718	22.030591	26.236883	365.470474	213.825469	32.293654	24.426331	61.280375	36.854043
std	1.397306e+05	6629.900988	1602.636109	5.424150e+05	172.554693	0.009160	17.944395	19.940681	2.537714	1.874395	...	6.728381	10.933433	4.181953	2.969353	1641.553548	61.119227	17.665726	36.403793	18.306117	18.108292
min	3.500000e+01	3274.314819	1.000000	1.680000e+02	11.000000	0.001057	0.000000	0.000000	31.955024	1.800000	...	1.900000	2.600000	8.100000	11.800000	0.100000	46.200000	-21.417069	-90.000000	4.000000	3.100000
25%	2.310000e+03	18617.709313	33.000000	1.072400e+04	180.000000	0.008413	3.100000	9.800000	49.321586	3.800000	...	8.300000	17.300000	19.200000	25.000000	11.000000	169.700000	21.126119	-2.200000	47.850000	21.900000
50%	7.027000e+03	22735.283123	93.000000	3.256400e+04	285.000000	0.012746	7.400000	18.600000	50.331365	4.800000	...	11.300000	23.300000	22.400000	27.100000	45.800000	212.600000	29.870988	27.500000	62.800000	35.200000
75%	3.332500e+04	26132.178794	335.000000	1.529390e+05	414.000000	0.017883	19.200000	35.200000	51.018405	5.900000	...	16.300000	32.200000	25.000000	28.400000	162.300000	251.600000	40.219742	54.300000	76.275000	49.975000
max	2.751220e+06	113017.751479	31736.000000	1.003911e+07	1229.000000	0.092937	95.300000	97.400000	58.100420	24.000000	...	46.800000	75.100000	38.800000	29.400000	32903.300000	445.400000	99.552895	93.100000	96.200000	94.000000

\n", "

8 rows × 21 columns

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1.033000e+03 1033.000000 1033.000000 \n", "mean 4.635578e+04 22771.754766 493.433688 \n", "std 1.397306e+05 6629.900988 1602.636109 \n", "min 3.500000e+01 3274.314819 1.000000 \n", "25% 2.310000e+03 18617.709313 33.000000 \n", "50% 7.027000e+03 22735.283123 93.000000 \n", "75% 3.332500e+04 26132.178794 335.000000 \n", "max 2.751220e+06 113017.751479 31736.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.033000e+03 1033.000000 1033.000000 1033.000000 \n", "mean 1.961860e+05 313.152953 0.014420 14.784705 \n", "std 5.424150e+05 172.554693 0.009160 17.944395 \n", "min 1.680000e+02 11.000000 0.001057 0.000000 \n", "25% 1.072400e+04 180.000000 0.008413 3.100000 \n", "50% 3.256400e+04 285.000000 0.012746 7.400000 \n", "75% 1.529390e+05 414.000000 0.017883 19.200000 \n", "max 1.003911e+07 1229.000000 0.092937 95.300000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1033.000000 1033.000000 1033.000000 ... 1033.000000 1033.000000 \n", "mean 24.974831 49.784908 5.065247 ... 13.033495 25.626718 \n", "std 19.940681 2.537714 1.874395 ... 6.728381 10.933433 \n", "min 0.000000 31.955024 1.800000 ... 1.900000 2.600000 \n", "25% 9.800000 49.321586 3.800000 ... 8.300000 17.300000 \n", "50% 18.600000 50.331365 4.800000 ... 11.300000 23.300000 \n", "75% 35.200000 51.018405 5.900000 ... 16.300000 32.200000 \n", "max 97.400000 58.100420 24.000000 ... 46.800000 75.100000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1033.000000 1033.000000 1033.000000 1013.000000 1014.000000 \n", "mean 22.030591 26.236883 365.470474 213.825469 32.293654 \n", "std 4.181953 2.969353 1641.553548 61.119227 17.665726 \n", "min 8.100000 11.800000 0.100000 46.200000 -21.417069 \n", "25% 19.200000 25.000000 11.000000 169.700000 21.126119 \n", "50% 22.400000 27.100000 45.800000 212.600000 29.870988 \n", "75% 25.000000 28.400000 162.300000 251.600000 40.219742 \n", "max 38.800000 29.400000 32903.300000 445.400000 99.552895 \n", "\n", " voter_gap trump biden \n", "count 1014.000000 1014.000000 1014.000000 \n", "mean 24.426331 61.280375 36.854043 \n", "std 36.403793 18.306117 18.108292 \n", "min -90.000000 4.000000 3.100000 \n", "25% -2.200000 47.850000 21.900000 \n", "50% 27.500000 62.800000 35.200000 \n", "75% 54.300000 76.275000 49.975000 \n", "max 93.100000 96.200000 94.000000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

\n", "

	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	hispanic	minority	female	unemployed	...	nodegree	bachelor	inactivity	obesity	density	cancer	voter_turnout	voter_gap	trump	biden
count	1015.000000	1015.000000	1015.000000	1.015000e+03	1015.000000	1015.000000	1015.000000	1015.000000	1015.000000	1015.000000	...	1015.000000	1015.000000	1015.000000	1015.000000	1015.000000	1002.000000	1004.000000	1003.000000	1003.000000	1003.000000
mean	17413.966502	24372.422566	226.448276	7.111427e+04	373.295567	0.016179	7.553990	18.795961	49.890443	5.326305	...	14.513399	18.695074	26.218128	31.132217	205.667586	234.567764	34.878074	38.696510	68.491426	29.794915
std	35239.630504	5506.207523	453.917323	1.464475e+05	157.158042	0.008917	11.862685	17.305463	2.238655	1.872417	...	6.318933	6.224281	3.983797	1.018496	2212.807074	54.672491	12.708142	24.848158	12.469813	12.388141
min	32.000000	6911.447084	1.000000	4.620000e+02	27.000000	0.000809	0.000000	1.500000	19.166215	1.800000	...	3.800000	4.400000	15.800000	29.400000	0.100000	82.500000	-168.323353	-65.600000	16.900000	5.600000
25%	2331.500000	21097.270656	37.000000	1.039850e+04	262.000000	0.010903	2.000000	6.000000	49.403040	4.100000	...	9.700000	14.300000	23.300000	30.200000	16.700000	199.825000	28.279638	23.300000	60.750000	20.300000
50%	6279.000000	24563.536449	98.000000	2.640300e+04	359.000000	0.014236	3.500000	12.200000	50.314825	5.100000	...	12.900000	17.600000	26.100000	31.100000	42.900000	236.200000	34.537246	42.100000	70.100000	28.000000
75%	15557.000000	27653.785878	219.500000	6.302800e+04	468.000000	0.019337	7.300000	26.850000	50.970982	6.300000	...	18.200000	21.700000	28.800000	32.000000	98.600000	269.675000	40.807564	57.600000	78.050000	37.500000
max	402352.000000	61934.129379	7728.000000	1.584063e+06	1212.000000	0.085749	99.200000	99.400000	56.633907	21.800000	...	53.300000	44.600000	41.400000	32.800000	69468.400000	425.400000	100.000000	86.000000	92.000000	82.500000

\n", "

8 rows × 21 columns

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1015.000000 1015.000000 1015.000000 \n", "mean 17413.966502 24372.422566 226.448276 \n", "std 35239.630504 5506.207523 453.917323 \n", "min 32.000000 6911.447084 1.000000 \n", "25% 2331.500000 21097.270656 37.000000 \n", "50% 6279.000000 24563.536449 98.000000 \n", "75% 15557.000000 27653.785878 219.500000 \n", "max 402352.000000 61934.129379 7728.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.015000e+03 1015.000000 1015.000000 1015.000000 \n", "mean 7.111427e+04 373.295567 0.016179 7.553990 \n", "std 1.464475e+05 157.158042 0.008917 11.862685 \n", "min 4.620000e+02 27.000000 0.000809 0.000000 \n", "25% 1.039850e+04 262.000000 0.010903 2.000000 \n", "50% 2.640300e+04 359.000000 0.014236 3.500000 \n", "75% 6.302800e+04 468.000000 0.019337 7.300000 \n", "max 1.584063e+06 1212.000000 0.085749 99.200000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1015.000000 1015.000000 1015.000000 ... 1015.000000 1015.000000 \n", "mean 18.795961 49.890443 5.326305 ... 14.513399 18.695074 \n", "std 17.305463 2.238655 1.872417 ... 6.318933 6.224281 \n", "min 1.500000 19.166215 1.800000 ... 3.800000 4.400000 \n", "25% 6.000000 49.403040 4.100000 ... 9.700000 14.300000 \n", "50% 12.200000 50.314825 5.100000 ... 12.900000 17.600000 \n", "75% 26.850000 50.970982 6.300000 ... 18.200000 21.700000 \n", "max 99.400000 56.633907 21.800000 ... 53.300000 44.600000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1015.000000 1015.000000 1015.000000 1002.000000 1004.000000 \n", "mean 26.218128 31.132217 205.667586 234.567764 34.878074 \n", "std 3.983797 1.018496 2212.807074 54.672491 12.708142 \n", "min 15.800000 29.400000 0.100000 82.500000 -168.323353 \n", "25% 23.300000 30.200000 16.700000 199.825000 28.279638 \n", "50% 26.100000 31.100000 42.900000 236.200000 34.537246 \n", "75% 28.800000 32.000000 98.600000 269.675000 40.807564 \n", "max 41.400000 32.800000 69468.400000 425.400000 100.000000 \n", "\n", " voter_gap trump biden \n", "count 1003.000000 1003.000000 1003.000000 \n", "mean 38.696510 68.491426 29.794915 \n", "std 24.848158 12.469813 12.388141 \n", "min -65.600000 16.900000 5.600000 \n", "25% 23.300000 60.750000 20.300000 \n", "50% 42.100000 70.100000 28.000000 \n", "75% 57.600000 78.050000 37.500000 \n", "max 86.000000 92.000000 82.500000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

\n", "

	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	hispanic	minority	female	unemployed	...	nodegree	bachelor	inactivity	obesity	density	cancer	voter_turnout	voter_gap	trump	biden
count	1036.000000	1036.000000	1036.000000	1.036000e+03	1036.000000	1036.000000	1036.000000	1036.000000	1036.000000	1036.000000	...	1036.000000	1036.000000	1036.000000	1036.000000	1036.000000	1026.000000	1029.000000	1029.000000	1029.000000	1029.000000
mean	11955.729730	25787.480672	178.888031	4.675872e+04	411.768340	0.016674	5.380309	23.938707	50.117482	6.053378	...	17.302220	15.871815	29.476834	35.479826	174.186969	237.092982	39.012722	35.537026	66.960641	31.423615
std	23933.119885	5361.167248	374.326997	9.641718e+04	141.464044	0.007432	8.434501	21.605540	2.295568	2.021584	...	6.487572	4.850790	4.223773	2.422581	1164.339521	48.778309	8.717395	29.339460	14.662722	14.686429
min	96.000000	9330.346798	1.000000	7.210000e+02	14.000000	0.000527	0.000000	0.400000	34.868478	1.800000	...	3.100000	5.800000	15.600000	32.800000	0.100000	61.800000	12.181303	-80.800000	8.800000	5.400000
25%	3003.250000	22698.614312	49.000000	1.222000e+04	321.750000	0.012394	1.600000	6.000000	49.643073	4.800000	...	12.275000	12.400000	26.500000	33.700000	22.950000	206.650000	33.091367	21.900000	60.000000	20.900000
50%	6047.000000	26021.133583	98.000000	2.313750e+04	411.000000	0.015669	2.800000	15.300000	50.510986	5.900000	...	16.400000	15.000000	29.500000	34.800000	46.350000	237.000000	39.270875	42.800000	70.300000	27.800000
75%	11853.500000	28860.094198	180.000000	4.566200e+04	494.000000	0.019305	5.400000	37.925000	51.268131	7.200000	...	22.200000	18.600000	32.400000	36.700000	102.325000	267.000000	44.732536	56.900000	77.500000	38.100000
max	410889.000000	72727.272727	7953.000000	1.749342e+06	996.000000	0.066603	91.800000	93.400000	56.526573	16.900000	...	40.500000	42.600000	41.300000	47.600000	35369.200000	458.300000	70.003709	87.900000	93.300000	89.600000

\n", "

8 rows × 21 columns

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1036.000000 1036.000000 1036.000000 \n", "mean 11955.729730 25787.480672 178.888031 \n", "std 23933.119885 5361.167248 374.326997 \n", "min 96.000000 9330.346798 1.000000 \n", "25% 3003.250000 22698.614312 49.000000 \n", "50% 6047.000000 26021.133583 98.000000 \n", "75% 11853.500000 28860.094198 180.000000 \n", "max 410889.000000 72727.272727 7953.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.036000e+03 1036.000000 1036.000000 1036.000000 \n", "mean 4.675872e+04 411.768340 0.016674 5.380309 \n", "std 9.641718e+04 141.464044 0.007432 8.434501 \n", "min 7.210000e+02 14.000000 0.000527 0.000000 \n", "25% 1.222000e+04 321.750000 0.012394 1.600000 \n", "50% 2.313750e+04 411.000000 0.015669 2.800000 \n", "75% 4.566200e+04 494.000000 0.019305 5.400000 \n", "max 1.749342e+06 996.000000 0.066603 91.800000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1036.000000 1036.000000 1036.000000 ... 1036.000000 1036.000000 \n", "mean 23.938707 50.117482 6.053378 ... 17.302220 15.871815 \n", "std 21.605540 2.295568 2.021584 ... 6.487572 4.850790 \n", "min 0.400000 34.868478 1.800000 ... 3.100000 5.800000 \n", "25% 6.000000 49.643073 4.800000 ... 12.275000 12.400000 \n", "50% 15.300000 50.510986 5.900000 ... 16.400000 15.000000 \n", "75% 37.925000 51.268131 7.200000 ... 22.200000 18.600000 \n", "max 93.400000 56.526573 16.900000 ... 40.500000 42.600000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1036.000000 1036.000000 1036.000000 1026.000000 1029.000000 \n", "mean 29.476834 35.479826 174.186969 237.092982 39.012722 \n", "std 4.223773 2.422581 1164.339521 48.778309 8.717395 \n", "min 15.600000 32.800000 0.100000 61.800000 12.181303 \n", "25% 26.500000 33.700000 22.950000 206.650000 33.091367 \n", "50% 29.500000 34.800000 46.350000 237.000000 39.270875 \n", "75% 32.400000 36.700000 102.325000 267.000000 44.732536 \n", "max 41.300000 47.600000 35369.200000 458.300000 70.003709 \n", "\n", " voter_gap trump biden \n", "count 1029.000000 1029.000000 1029.000000 \n", "mean 35.537026 66.960641 31.423615 \n", "std 29.339460 14.662722 14.686429 \n", "min -80.800000 8.800000 5.400000 \n", "25% 21.900000 60.000000 20.900000 \n", "50% 42.800000 70.300000 27.800000 \n", "75% 56.900000 77.500000 38.100000 \n", "max 87.900000 93.300000 89.600000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def view_partitions(df, feature, n_partitions=3, cols=None):\n", " if cols is None:\n", " cols = df.columns\n", " start = 0\n", " for i in range(n_partitions):\n", " stop = start + (1/n_partitions)\n", " display(partition_df(merged, feature,\n", " merged[feature].quantile(start),\n", " merged[feature].quantile(stop))[cols].describe())\n", " start = stop\n", "view_partitions(merged, 'obesity')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "dK1SwKuQJ7v9", "outputId": "d8612ca4-1607-4777-cecd-2a46370e4500" }, "outputs": [ { "data": { "text/html": [ "\n", "

\n", "

	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	hispanic	minority	female	unemployed	...	nodegree	bachelor	inactivity	obesity	density	cancer	voter_turnout	voter_gap	trump	biden
count	1.007000e+03	1007.000000	1007.000000	1.007000e+03	1007.000000	1007.000000	1007.000000	1007.000000	1007.000000	1007.000000	...	1007.000000	1007.000000	1007.000000	1007.000000	1007.000000	992.000000	982.000000	982.000000	982.000000	982.000000
mean	4.741475e+04	23140.636087	483.670308	2.030346e+05	271.937438	0.012093	12.365045	22.890566	49.707254	4.982423	...	11.348163	26.617577	20.354816	27.668818	461.912115	213.861895	31.293156	17.149898	57.540224	40.390326
std	1.367664e+05	6443.122064	1512.796366	5.351729e+05	139.972161	0.006343	15.732390	18.824834	2.499203	1.795947	...	5.740513	10.472965	2.836158	4.213967	2770.745246	61.473242	17.404163	32.866155	16.459077	16.417551
min	6.300000e+01	3274.314819	1.000000	1.680000e+02	11.000000	0.001057	0.000000	0.000000	19.166215	1.900000	...	1.900000	2.600000	8.100000	11.800000	0.100000	46.200000	-21.417069	-90.000000	4.000000	3.100000
25%	3.455000e+03	19408.061608	38.000000	1.509950e+04	173.000000	0.007854	3.000000	8.600000	49.336073	3.800000	...	7.700000	18.700000	18.700000	25.300000	12.250000	166.950000	21.203168	-5.675000	46.025000	28.400000
50%	1.100700e+04	22993.187599	120.000000	4.758000e+04	255.000000	0.011327	5.900000	16.200000	50.208257	4.700000	...	10.100000	24.800000	21.200000	28.100000	46.900000	211.450000	28.190647	19.750000	59.000000	39.200000
75%	3.888200e+04	26299.994779	379.000000	1.662710e+05	342.000000	0.014652	14.350000	31.900000	50.863073	5.800000	...	13.100000	32.450000	22.600000	30.600000	177.150000	254.500000	37.751744	40.775000	69.275000	51.800000
max	2.751220e+06	113017.751479	31736.000000	1.003911e+07	994.000000	0.081481	94.100000	94.800000	58.100420	24.000000	...	43.900000	75.100000	23.700000	39.700000	69468.400000	445.400000	99.552895	93.100000	96.200000	94.000000

\n", "

8 rows × 21 columns

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1.007000e+03 1007.000000 1007.000000 \n", "mean 4.741475e+04 23140.636087 483.670308 \n", "std 1.367664e+05 6443.122064 1512.796366 \n", "min 6.300000e+01 3274.314819 1.000000 \n", "25% 3.455000e+03 19408.061608 38.000000 \n", "50% 1.100700e+04 22993.187599 120.000000 \n", "75% 3.888200e+04 26299.994779 379.000000 \n", "max 2.751220e+06 113017.751479 31736.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.007000e+03 1007.000000 1007.000000 1007.000000 \n", "mean 2.030346e+05 271.937438 0.012093 12.365045 \n", "std 5.351729e+05 139.972161 0.006343 15.732390 \n", "min 1.680000e+02 11.000000 0.001057 0.000000 \n", "25% 1.509950e+04 173.000000 0.007854 3.000000 \n", "50% 4.758000e+04 255.000000 0.011327 5.900000 \n", "75% 1.662710e+05 342.000000 0.014652 14.350000 \n", "max 1.003911e+07 994.000000 0.081481 94.100000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1007.000000 1007.000000 1007.000000 ... 1007.000000 1007.000000 \n", "mean 22.890566 49.707254 4.982423 ... 11.348163 26.617577 \n", "std 18.824834 2.499203 1.795947 ... 5.740513 10.472965 \n", "min 0.000000 19.166215 1.900000 ... 1.900000 2.600000 \n", "25% 8.600000 49.336073 3.800000 ... 7.700000 18.700000 \n", "50% 16.200000 50.208257 4.700000 ... 10.100000 24.800000 \n", "75% 31.900000 50.863073 5.800000 ... 13.100000 32.450000 \n", "max 94.800000 58.100420 24.000000 ... 43.900000 75.100000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1007.000000 1007.000000 1007.000000 992.000000 982.000000 \n", "mean 20.354816 27.668818 461.912115 213.861895 31.293156 \n", "std 2.836158 4.213967 2770.745246 61.473242 17.404163 \n", "min 8.100000 11.800000 0.100000 46.200000 -21.417069 \n", "25% 18.700000 25.300000 12.250000 166.950000 21.203168 \n", "50% 21.200000 28.100000 46.900000 211.450000 28.190647 \n", "75% 22.600000 30.600000 177.150000 254.500000 37.751744 \n", "max 23.700000 39.700000 69468.400000 445.400000 99.552895 \n", "\n", " voter_gap trump biden \n", "count 982.000000 982.000000 982.000000 \n", "mean 17.149898 57.540224 40.390326 \n", "std 32.866155 16.459077 16.417551 \n", "min -90.000000 4.000000 3.100000 \n", "25% -5.675000 46.025000 28.400000 \n", "50% 19.750000 59.000000 39.200000 \n", "75% 40.775000 69.275000 51.800000 \n", "max 93.100000 96.200000 94.000000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

\n", "

	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	hispanic	minority	female	unemployed	...	nodegree	bachelor	inactivity	obesity	density	cancer	voter_turnout	voter_gap	trump	biden
count	1028.000000	1028.000000	1028.000000	1.028000e+03	1028.000000	1028.000000	1028.000000	1028.000000	1028.000000	1028.000000	...	1028.000000	1028.000000	1028.000000	1028.000000	1028.000000	1009.000000	1017.000000	1016.000000	1016.000000	1016.000000
mean	19504.956226	23995.600128	270.469844	7.753021e+04	392.975681	0.017394	10.354864	22.804086	49.936301	5.255837	...	14.964105	18.713132	25.842802	31.003794	183.769747	229.305055	34.853118	37.605906	67.993012	30.387106
std	51624.583537	5760.949741	767.045525	1.961051e+05	165.631212	0.009503	15.641803	20.733552	2.292643	1.910234	...	6.407062	5.868454	1.219701	3.033814	1134.003874	54.053794	12.759806	27.366819	13.721102	13.655311
min	35.000000	6103.922159	1.000000	4.860000e+02	14.000000	0.000809	0.000000	0.200000	32.813627	1.800000	...	3.100000	4.400000	23.800000	22.400000	0.100000	70.000000	-168.323353	-76.800000	11.200000	3.900000
25%	2028.000000	20493.710413	34.000000	8.923250e+03	283.000000	0.011700	2.100000	6.400000	49.342871	4.100000	...	10.500000	14.500000	24.900000	28.700000	13.350000	194.300000	28.566943	22.500000	60.175000	20.300000
50%	5230.000000	24010.922870	87.000000	2.177150e+04	378.000000	0.015793	4.200000	14.500000	50.324591	5.000000	...	13.400000	17.900000	25.800000	31.000000	40.100000	230.700000	34.537328	41.200000	69.800000	28.450000
75%	14111.500000	27195.090081	202.250000	5.723600e+04	480.000000	0.020755	10.600000	35.025000	51.076514	6.200000	...	18.100000	21.600000	26.800000	33.100000	107.425000	263.200000	40.992547	58.100000	78.200000	38.025000
max	714053.000000	61934.129379	12823.000000	2.559902e+06	1229.000000	0.092937	99.200000	99.400000	56.633907	21.800000	...	53.300000	46.300000	28.000000	42.000000	32903.300000	433.900000	100.000000	92.000000	95.900000	88.000000

\n", "

8 rows × 21 columns

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1028.000000 1028.000000 1028.000000 \n", "mean 19504.956226 23995.600128 270.469844 \n", "std 51624.583537 5760.949741 767.045525 \n", "min 35.000000 6103.922159 1.000000 \n", "25% 2028.000000 20493.710413 34.000000 \n", "50% 5230.000000 24010.922870 87.000000 \n", "75% 14111.500000 27195.090081 202.250000 \n", "max 714053.000000 61934.129379 12823.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.028000e+03 1028.000000 1028.000000 1028.000000 \n", "mean 7.753021e+04 392.975681 0.017394 10.354864 \n", "std 1.961051e+05 165.631212 0.009503 15.641803 \n", "min 4.860000e+02 14.000000 0.000809 0.000000 \n", "25% 8.923250e+03 283.000000 0.011700 2.100000 \n", "50% 2.177150e+04 378.000000 0.015793 4.200000 \n", "75% 5.723600e+04 480.000000 0.020755 10.600000 \n", "max 2.559902e+06 1229.000000 0.092937 99.200000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1028.000000 1028.000000 1028.000000 ... 1028.000000 1028.000000 \n", "mean 22.804086 49.936301 5.255837 ... 14.964105 18.713132 \n", "std 20.733552 2.292643 1.910234 ... 6.407062 5.868454 \n", "min 0.200000 32.813627 1.800000 ... 3.100000 4.400000 \n", "25% 6.400000 49.342871 4.100000 ... 10.500000 14.500000 \n", "50% 14.500000 50.324591 5.000000 ... 13.400000 17.900000 \n", "75% 35.025000 51.076514 6.200000 ... 18.100000 21.600000 \n", "max 99.400000 56.633907 21.800000 ... 53.300000 46.300000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1028.000000 1028.000000 1028.000000 1009.000000 1017.000000 \n", "mean 25.842802 31.003794 183.769747 229.305055 34.853118 \n", "std 1.219701 3.033814 1134.003874 54.053794 12.759806 \n", "min 23.800000 22.400000 0.100000 70.000000 -168.323353 \n", "25% 24.900000 28.700000 13.350000 194.300000 28.566943 \n", "50% 25.800000 31.000000 40.100000 230.700000 34.537328 \n", "75% 26.800000 33.100000 107.425000 263.200000 40.992547 \n", "max 28.000000 42.000000 32903.300000 433.900000 100.000000 \n", "\n", " voter_gap trump biden \n", "count 1016.000000 1016.000000 1016.000000 \n", "mean 37.605906 67.993012 30.387106 \n", "std 27.366819 13.721102 13.655311 \n", "min -76.800000 11.200000 3.900000 \n", "25% 22.500000 60.175000 20.300000 \n", "50% 41.200000 69.800000 28.450000 \n", "75% 58.100000 78.200000 38.025000 \n", "max 92.000000 95.900000 88.000000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

\n", "

	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	hispanic	minority	female	unemployed	...	nodegree	bachelor	inactivity	obesity	density	cancer	voter_turnout	voter_gap	trump	biden
count	1018.000000	1018.000000	1018.000000	1.018000e+03	1018.000000	1018.000000	1018.000000	1018.000000	1018.000000	1018.000000	...	1018.000000	1018.000000	1018.000000	1018.000000	1018.000000	1010.000000	1017.000000	1017.000000	1017.000000	1017.000000
mean	10061.785855	25805.500407	163.643418	3.817520e+04	433.435167	0.017819	4.975049	22.328193	50.173025	6.245678	...	18.565914	14.874361	31.541159	34.180157	105.106189	242.664257	40.216906	43.029695	70.764503	27.734808
std	27440.411056	5519.297680	481.803986	9.681484e+04	134.701725	0.008467	7.262961	20.043183	2.303834	1.958725	...	5.977301	4.333156	2.735776	3.406723	212.876177	48.768550	8.482994	27.054607	13.510946	13.552863
min	32.000000	6911.447084	1.000000	4.620000e+02	14.000000	0.000527	0.000000	0.400000	35.150327	1.800000	...	3.900000	5.800000	28.000000	23.800000	0.300000	94.400000	1.730104	-71.600000	13.500000	5.000000
25%	2675.000000	22639.004381	48.000000	1.110300e+04	353.000000	0.013173	1.500000	5.800000	49.796348	5.000000	...	14.200000	11.800000	29.300000	31.900000	24.525000	212.425000	34.466113	33.900000	65.900000	18.800000
50%	5405.000000	26231.941999	92.000000	2.070000e+04	431.000000	0.016078	2.600000	15.000000	50.616007	6.100000	...	18.300000	14.300000	31.000000	33.800000	46.450000	242.650000	40.067710	50.200000	74.200000	24.100000
75%	10497.250000	29331.867235	169.000000	4.047700e+04	521.000000	0.020233	5.100000	34.900000	51.302674	7.400000	...	22.600000	17.300000	33.375000	36.300000	93.525000	272.075000	45.901233	60.900000	79.800000	32.400000
max	661165.000000	53241.683639	11854.000000	2.253857e+06	996.000000	0.072477	63.500000	91.200000	56.633907	16.900000	...	40.500000	36.900000	41.400000	47.600000	2800.000000	458.300000	70.003709	88.300000	93.300000	85.100000

\n", "

8 rows × 21 columns

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1018.000000 1018.000000 1018.000000 \n", "mean 10061.785855 25805.500407 163.643418 \n", "std 27440.411056 5519.297680 481.803986 \n", "min 32.000000 6911.447084 1.000000 \n", "25% 2675.000000 22639.004381 48.000000 \n", "50% 5405.000000 26231.941999 92.000000 \n", "75% 10497.250000 29331.867235 169.000000 \n", "max 661165.000000 53241.683639 11854.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.018000e+03 1018.000000 1018.000000 1018.000000 \n", "mean 3.817520e+04 433.435167 0.017819 4.975049 \n", "std 9.681484e+04 134.701725 0.008467 7.262961 \n", "min 4.620000e+02 14.000000 0.000527 0.000000 \n", "25% 1.110300e+04 353.000000 0.013173 1.500000 \n", "50% 2.070000e+04 431.000000 0.016078 2.600000 \n", "75% 4.047700e+04 521.000000 0.020233 5.100000 \n", "max 2.253857e+06 996.000000 0.072477 63.500000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1018.000000 1018.000000 1018.000000 ... 1018.000000 1018.000000 \n", "mean 22.328193 50.173025 6.245678 ... 18.565914 14.874361 \n", "std 20.043183 2.303834 1.958725 ... 5.977301 4.333156 \n", "min 0.400000 35.150327 1.800000 ... 3.900000 5.800000 \n", "25% 5.800000 49.796348 5.000000 ... 14.200000 11.800000 \n", "50% 15.000000 50.616007 6.100000 ... 18.300000 14.300000 \n", "75% 34.900000 51.302674 7.400000 ... 22.600000 17.300000 \n", "max 91.200000 56.633907 16.900000 ... 40.500000 36.900000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1018.000000 1018.000000 1018.000000 1010.000000 1017.000000 \n", "mean 31.541159 34.180157 105.106189 242.664257 40.216906 \n", "std 2.735776 3.406723 212.876177 48.768550 8.482994 \n", "min 28.000000 23.800000 0.300000 94.400000 1.730104 \n", "25% 29.300000 31.900000 24.525000 212.425000 34.466113 \n", "50% 31.000000 33.800000 46.450000 242.650000 40.067710 \n", "75% 33.375000 36.300000 93.525000 272.075000 45.901233 \n", "max 41.400000 47.600000 2800.000000 458.300000 70.003709 \n", "\n", " voter_gap trump biden \n", "count 1017.000000 1017.000000 1017.000000 \n", "mean 43.029695 70.764503 27.734808 \n", "std 27.054607 13.510946 13.552863 \n", "min -71.600000 13.500000 5.000000 \n", "25% 33.900000 65.900000 18.800000 \n", "50% 50.200000 74.200000 24.100000 \n", "75% 60.900000 79.800000 32.400000 \n", "max 88.300000 93.300000 85.100000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "view_partitions(merged, 'inactivity')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "nyexzlhQJ-7K", "outputId": "3ddab391-c0b1-42f0-b209-e423d2121da9" }, "outputs": [ { "data": { "text/html": [ "\n", "

\n", "

	income	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	obesity	inactivity	trump	biden
count	1007.000000	1007.000000	1007.000000	1007.000000	1.007000e+03	1007.000000	1007.000000	1007.000000	1007.000000	1002.000000	1002.000000
mean	35886.695134	8288.693148	25014.583343	142.903674	3.293902e+04	464.244290	0.020079	32.942304	29.309037	67.414072	31.157685
std	3799.138075	20595.465992	6509.586741	360.681418	8.483606e+04	157.840313	0.010388	4.450842	4.676727	15.976406	15.928395
min	21658.000000	35.000000	7186.858316	1.000000	4.030000e+02	48.000000	0.002333	17.400000	13.600000	11.200000	7.100000
25%	33575.000000	2237.500000	20856.173791	45.000000	9.733000e+03	367.500000	0.014051	30.100000	25.900000	58.600000	18.925000
50%	36566.000000	4471.000000	25428.263215	81.000000	1.778100e+04	458.000000	0.017502	32.900000	29.600000	72.200000	26.300000
75%	38891.000000	8407.500000	29159.082843	143.500000	3.193600e+04	551.500000	0.022740	35.900000	32.800000	79.600000	39.975000
max	41150.000000	402352.000000	56864.875543	7728.000000	1.584063e+06	1229.000000	0.092937	47.600000	41.400000	92.600000	88.000000

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " income # total covid cases # covid cases per 100k \\\n", "count 1007.000000 1007.000000 1007.000000 \n", "mean 35886.695134 8288.693148 25014.583343 \n", "std 3799.138075 20595.465992 6509.586741 \n", "min 21658.000000 35.000000 7186.858316 \n", "25% 33575.000000 2237.500000 20856.173791 \n", "50% 36566.000000 4471.000000 25428.263215 \n", "75% 38891.000000 8407.500000 29159.082843 \n", "max 41150.000000 402352.000000 56864.875543 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 142.903674 3.293902e+04 464.244290 \n", "std 360.681418 8.483606e+04 157.840313 \n", "min 1.000000 4.030000e+02 48.000000 \n", "25% 45.000000 9.733000e+03 367.500000 \n", "50% 81.000000 1.778100e+04 458.000000 \n", "75% 143.500000 3.193600e+04 551.500000 \n", "max 7728.000000 1.584063e+06 1229.000000 \n", "\n", " case_fatality_rate obesity inactivity trump biden \n", "count 1007.000000 1007.000000 1007.000000 1002.000000 1002.000000 \n", "mean 0.020079 32.942304 29.309037 67.414072 31.157685 \n", "std 0.010388 4.450842 4.676727 15.976406 15.928395 \n", "min 0.002333 17.400000 13.600000 11.200000 7.100000 \n", "25% 0.014051 30.100000 25.900000 58.600000 18.925000 \n", "50% 0.017502 32.900000 29.600000 72.200000 26.300000 \n", "75% 0.022740 35.900000 32.800000 79.600000 39.975000 \n", "max 0.092937 47.600000 41.400000 92.600000 88.000000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

\n", "

	income	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	obesity	inactivity	trump	biden
count	1007.000000	1.007000e+03	1007.000000	1007.000000	1.007000e+03	1007.000000	1007.000000	1007.000000	1007.000000	998.000000	998.000000
mean	45378.542205	2.117232e+04	24204.352233	261.272095	8.303669e+04	356.008937	0.015158	30.918272	25.678153	67.789178	30.421844
std	2620.626459	6.098195e+04	5334.500991	680.338463	2.086682e+05	139.711464	0.006599	3.699632	4.176813	13.072600	12.934935
min	41161.000000	3.200000e+01	6911.447084	1.000000	4.620000e+02	14.000000	0.000809	14.800000	11.200000	14.100000	5.000000
25%	43107.500000	2.489500e+03	20834.832707	36.000000	1.075750e+04	259.500000	0.010992	28.700000	23.000000	60.525000	21.000000
50%	45207.000000	6.889000e+03	24120.603015	100.000000	2.887900e+04	359.000000	0.014220	31.100000	25.700000	70.300000	27.900000
75%	47539.500000	1.574600e+04	27296.568873	229.000000	6.443850e+04	447.000000	0.018293	33.400000	28.300000	77.300000	37.600000
max	50134.000000	1.209302e+06	61745.301879	12823.000000	2.716939e+06	994.000000	0.048193	41.800000	37.800000	94.000000	85.000000

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " income # total covid cases # covid cases per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 45378.542205 2.117232e+04 24204.352233 \n", "std 2620.626459 6.098195e+04 5334.500991 \n", "min 41161.000000 3.200000e+01 6911.447084 \n", "25% 43107.500000 2.489500e+03 20834.832707 \n", "50% 45207.000000 6.889000e+03 24120.603015 \n", "75% 47539.500000 1.574600e+04 27296.568873 \n", "max 50134.000000 1.209302e+06 61745.301879 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 261.272095 8.303669e+04 356.008937 \n", "std 680.338463 2.086682e+05 139.711464 \n", "min 1.000000 4.620000e+02 14.000000 \n", "25% 36.000000 1.075750e+04 259.500000 \n", "50% 100.000000 2.887900e+04 359.000000 \n", "75% 229.000000 6.443850e+04 447.000000 \n", "max 12823.000000 2.716939e+06 994.000000 \n", "\n", " case_fatality_rate obesity inactivity trump biden \n", "count 1007.000000 1007.000000 1007.000000 998.000000 998.000000 \n", "mean 0.015158 30.918272 25.678153 67.789178 30.421844 \n", "std 0.006599 3.699632 4.176813 13.072600 12.934935 \n", "min 0.000809 14.800000 11.200000 14.100000 5.000000 \n", "25% 0.010992 28.700000 23.000000 60.525000 21.000000 \n", "50% 0.014220 31.100000 25.700000 70.300000 27.900000 \n", "75% 0.018293 33.400000 28.300000 77.300000 37.600000 \n", "max 0.048193 41.800000 37.800000 94.000000 85.000000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

\n", "

	income	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	obesity	inactivity	trump	biden
count	1007.000000	1.007000e+03	1007.000000	1007.000000	1.007000e+03	1007.000000	1007.000000	1007.000000	1007.000000	983.000000	983.000000
mean	60131.244290	4.693174e+04	23724.142668	502.025819	2.008166e+05	278.285998	0.012043	28.980735	22.745283	61.185453	36.840793
std	10901.921415	1.326955e+05	6079.872377	1543.128483	5.287700e+05	133.159375	0.006049	4.308677	4.360248	16.942304	16.875312
min	50154.000000	6.300000e+01	3274.314819	1.000000	1.680000e+02	11.000000	0.000527	11.800000	8.100000	4.000000	3.100000
25%	52596.000000	3.365500e+03	20339.539455	40.000000	1.461700e+04	182.500000	0.008142	26.800000	20.100000	50.700000	24.650000
50%	56270.000000	1.146300e+04	23930.162156	127.000000	4.890400e+04	270.000000	0.011547	29.400000	22.900000	63.100000	35.000000
75%	62830.500000	4.015200e+04	26817.374363	414.000000	1.692480e+05	351.000000	0.014857	32.100000	25.700000	73.400000	46.950000
max	125635.000000	2.751220e+06	113017.751479	31736.000000	1.003911e+07	1212.000000	0.081481	39.700000	37.800000	96.200000	94.000000

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " income # total covid cases # covid cases per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 60131.244290 4.693174e+04 23724.142668 \n", "std 10901.921415 1.326955e+05 6079.872377 \n", "min 50154.000000 6.300000e+01 3274.314819 \n", "25% 52596.000000 3.365500e+03 20339.539455 \n", "50% 56270.000000 1.146300e+04 23930.162156 \n", "75% 62830.500000 4.015200e+04 26817.374363 \n", "max 125635.000000 2.751220e+06 113017.751479 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 502.025819 2.008166e+05 278.285998 \n", "std 1543.128483 5.287700e+05 133.159375 \n", "min 1.000000 1.680000e+02 11.000000 \n", "25% 40.000000 1.461700e+04 182.500000 \n", "50% 127.000000 4.890400e+04 270.000000 \n", "75% 414.000000 1.692480e+05 351.000000 \n", "max 31736.000000 1.003911e+07 1212.000000 \n", "\n", " case_fatality_rate obesity inactivity trump biden \n", "count 1007.000000 1007.000000 1007.000000 983.000000 983.000000 \n", "mean 0.012043 28.980735 22.745283 61.185453 36.840793 \n", "std 0.006049 4.308677 4.360248 16.942304 16.875312 \n", "min 0.000527 11.800000 8.100000 4.000000 3.100000 \n", "25% 0.008142 26.800000 20.100000 50.700000 24.650000 \n", "50% 0.011547 29.400000 22.900000 63.100000 35.000000 \n", "75% 0.014857 32.100000 25.700000 73.400000 46.950000 \n", "max 0.081481 39.700000 37.800000 96.200000 94.000000 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "view_partitions(merged, 'income',\n", " cols=['income','# total covid cases', '# covid cases per 100k', '# covid deaths',\n", " 'population', '# covid deaths per 100k', 'case_fatality_rate',\n", " 'obesity', 'inactivity', 'trump', 'biden'])" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "wU_sFjs5KA0D", "outputId": "93f56cf7-4681-4961-b454-4b7733b15626" }, "outputs": [ { "data": { "text/html": [ "\n", "

\n", "

	trump	income	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	obesity	inactivity
count	995.000000	995.000000	9.950000e+02	995.000000	995.000000	9.950000e+02	995.000000	995.000000	995.000000	995.000000
mean	47.241709	50299.426131	5.845852e+04	23489.186345	646.194975	2.454354e+05	305.234171	0.013401	29.725327	23.284322
std	11.340094	15572.223128	1.435519e+05	5632.102950	1677.245394	5.579268e+05	157.485499	0.007672	5.660119	5.411057
min	4.000000	21658.000000	1.840000e+02	7433.739051	2.000000	7.680000e+02	11.000000	0.001057	11.800000	8.100000
25%	40.250000	39014.500000	5.238500e+03	20108.638558	69.500000	2.243150e+04	190.500000	0.008494	26.200000	19.700000
50%	50.100000	48413.000000	1.591700e+04	23626.943005	169.000000	6.945000e+04	281.000000	0.012284	29.700000	23.000000
75%	56.700000	57390.000000	5.389150e+04	26781.198391	581.500000	2.343480e+05	389.000000	0.016087	33.200000	26.500000
max	61.000000	125635.000000	2.751220e+06	60447.419089	31736.000000	1.003911e+07	1064.000000	0.066603	47.600000	40.800000

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " trump income # total covid cases # covid cases per 100k \\\n", "count 995.000000 995.000000 9.950000e+02 995.000000 \n", "mean 47.241709 50299.426131 5.845852e+04 23489.186345 \n", "std 11.340094 15572.223128 1.435519e+05 5632.102950 \n", "min 4.000000 21658.000000 1.840000e+02 7433.739051 \n", "25% 40.250000 39014.500000 5.238500e+03 20108.638558 \n", "50% 50.100000 48413.000000 1.591700e+04 23626.943005 \n", "75% 56.700000 57390.000000 5.389150e+04 26781.198391 \n", "max 61.000000 125635.000000 2.751220e+06 60447.419089 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 995.000000 9.950000e+02 995.000000 \n", "mean 646.194975 2.454354e+05 305.234171 \n", "std 1677.245394 5.579268e+05 157.485499 \n", "min 2.000000 7.680000e+02 11.000000 \n", "25% 69.500000 2.243150e+04 190.500000 \n", "50% 169.000000 6.945000e+04 281.000000 \n", "75% 581.500000 2.343480e+05 389.000000 \n", "max 31736.000000 1.003911e+07 1064.000000 \n", "\n", " case_fatality_rate obesity inactivity \n", "count 995.000000 995.000000 995.000000 \n", "mean 0.013401 29.725327 23.284322 \n", "std 0.007672 5.660119 5.411057 \n", "min 0.001057 11.800000 8.100000 \n", "25% 0.008494 26.200000 19.700000 \n", "50% 0.012284 29.700000 23.000000 \n", "75% 0.016087 33.200000 26.500000 \n", "max 0.066603 47.600000 40.800000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

\n", "

	trump	income	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	obesity	inactivity
count	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000
mean	68.313831	46840.147264	12720.003980	24710.531247	174.843781	50517.506468	377.168159	0.016190	31.632537	26.274229
std	3.905655	9379.280941	17651.768209	5359.878850	224.233632	66151.583773	144.713271	0.008867	3.556402	4.149821
min	61.000000	25768.000000	85.000000	3274.314819	1.000000	403.000000	14.000000	0.000527	19.500000	14.100000
25%	65.200000	40422.000000	3191.000000	21552.770707	49.000000	13260.000000	279.000000	0.011377	29.400000	23.300000
50%	68.600000	45719.000000	7100.000000	25026.614998	106.000000	28529.000000	371.000000	0.014282	31.700000	26.000000
75%	71.700000	51839.000000	14953.000000	28180.451997	216.000000	60353.000000	456.000000	0.018604	34.000000	29.100000
max	74.500000	97936.000000	202690.000000	55122.917010	2964.000000	636234.000000	1229.000000	0.085749	42.300000	39.900000

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " trump income # total covid cases # covid cases per 100k \\\n", "count 1005.000000 1005.000000 1005.000000 1005.000000 \n", "mean 68.313831 46840.147264 12720.003980 24710.531247 \n", "std 3.905655 9379.280941 17651.768209 5359.878850 \n", "min 61.000000 25768.000000 85.000000 3274.314819 \n", "25% 65.200000 40422.000000 3191.000000 21552.770707 \n", "50% 68.600000 45719.000000 7100.000000 25026.614998 \n", "75% 71.700000 51839.000000 14953.000000 28180.451997 \n", "max 74.500000 97936.000000 202690.000000 55122.917010 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1005.000000 1005.000000 1005.000000 \n", "mean 174.843781 50517.506468 377.168159 \n", "std 224.233632 66151.583773 144.713271 \n", "min 1.000000 403.000000 14.000000 \n", "25% 49.000000 13260.000000 279.000000 \n", "50% 106.000000 28529.000000 371.000000 \n", "75% 216.000000 60353.000000 456.000000 \n", "max 2964.000000 636234.000000 1229.000000 \n", "\n", " case_fatality_rate obesity inactivity \n", "count 1005.000000 1005.000000 1005.000000 \n", "mean 0.016190 31.632537 26.274229 \n", "std 0.008867 3.556402 4.149821 \n", "min 0.000527 19.500000 14.100000 \n", "25% 0.011377 29.400000 23.300000 \n", "50% 0.014282 31.700000 26.000000 \n", "75% 0.018604 34.000000 29.100000 \n", "max 0.085749 42.300000 39.900000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "

\n", "

	trump	income	# total covid cases	# covid cases per 100k	# covid deaths	population	# covid deaths per 100k	case_fatality_rate	obesity	inactivity
count	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000	1005.000000
mean	80.786965	43878.374129	5059.204975	24627.390904	85.427861	20120.834826	422.151244	0.017918	31.537114	28.320597
std	4.289503	9102.326701	6019.201974	6383.615594	103.655065	23113.800617	162.744774	0.008465	3.592986	4.534053
min	74.500000	23047.000000	32.000000	4916.885879	1.000000	168.000000	27.000000	0.001300	19.400000	13.600000
25%	77.500000	37417.000000	1364.000000	20633.467594	22.000000	5925.000000	327.000000	0.012937	28.900000	25.000000
50%	80.100000	42310.000000	3257.000000	24538.953707	54.000000	13287.000000	422.000000	0.016549	31.500000	27.900000
75%	83.600000	49184.000000	6730.000000	28499.655884	113.000000	25924.000000	525.000000	0.021352	33.900000	31.900000
max	96.200000	86354.000000	57505.000000	113017.751479	1435.000000	223233.000000	1212.000000	0.092937	43.200000	41.400000

\n", "

\n", " \n", " \n", " \n", "\n", " \n", "

\n", "

\n", " " ], "text/plain": [ " trump income # total covid cases # covid cases per 100k \\\n", "count 1005.000000 1005.000000 1005.000000 1005.000000 \n", "mean 80.786965 43878.374129 5059.204975 24627.390904 \n", "std 4.289503 9102.326701 6019.201974 6383.615594 \n", "min 74.500000 23047.000000 32.000000 4916.885879 \n", "25% 77.500000 37417.000000 1364.000000 20633.467594 \n", "50% 80.100000 42310.000000 3257.000000 24538.953707 \n", "75% 83.600000 49184.000000 6730.000000 28499.655884 \n", "max 96.200000 86354.000000 57505.000000 113017.751479 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1005.000000 1005.000000 1005.000000 \n", "mean 85.427861 20120.834826 422.151244 \n", "std 103.655065 23113.800617 162.744774 \n", "min 1.000000 168.000000 27.000000 \n", "25% 22.000000 5925.000000 327.000000 \n", "50% 54.000000 13287.000000 422.000000 \n", "75% 113.000000 25924.000000 525.000000 \n", "max 1435.000000 223233.000000 1212.000000 \n", "\n", " case_fatality_rate obesity inactivity \n", "count 1005.000000 1005.000000 1005.000000 \n", "mean 0.017918 31.537114 28.320597 \n", "std 0.008465 3.592986 4.534053 \n", "min 0.001300 19.400000 13.600000 \n", "25% 0.012937 28.900000 25.000000 \n", "50% 0.016549 31.500000 27.900000 \n", "75% 0.021352 33.900000 31.900000 \n", "max 0.092937 43.200000 41.400000 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "view_partitions(merged, 'trump',\n", " cols=['trump','income','# total covid cases', '# covid cases per 100k', '# covid deaths',\n", " 'population', '# covid deaths per 100k', 'case_fatality_rate',\n", " 'obesity', 'inactivity',])" ] }, { "cell_type": "markdown", "metadata": { "id": "8kcwLx7LKIhU" }, "source": [ "We split the data into equal lower, middle, and upper quantiles based on first obeity and then inactivity. We can see that the the average death rates of counties in these partitions is positivly correlated with both of these features. This was expected as preexisting health conditions (obescity) and heath risks (inactivity) increase all cause mortality but also have a strong effect on how serious a covid infection can be. Finally we see that income has an even stronger relationship with the death rate, though here the correlation is a negative one. Obesity and inactivity are both negatively correlated with income as well. The relationship between voting for Trump and income is not a string one strong, though there is a positive correlation between Trump voting and obesity, inactivity, and covid death rate. " ] }, { "cell_type": "markdown", "metadata": { "id": "onZEIEqYKVAn" }, "source": [ "# 6. Data Weakness\n", "\n", "We can tell from comparing the populations between the groups that this data is not treated granularly as would be ideal. Very small population counties that get weighted the same as very large population counties in regards to the mean. So rural areas get over represented in the averages nationwide. This also explains why the Trump vs Biden is so far skewed from the actual well known national average based on popular votes.\n", "\n", "Also, the difference between income correlates more with population density than it might with an individual socio economic status. First, a higher income might not go as far towards standard of living in the city as it does in rural areas. Second, by using the average income over the whole county, income inequality in that county is not factored in. There could be many low income individuals living with many high income individuals in the same county. " ] }, { "cell_type": "markdown", "metadata": { "id": "oWmVkVxeK3yd" }, "source": [ "# 7. Future Work\n", "\n", "We have done data gathing, parsing, and exploring data. We can continue to predict some behavior of the data (e.g. how a particular county will respond to COVID on a weekly basis).\n", "\n", "Alternatively, we could be interested in inference, whereby we are more concerned with trying to understand why and how a system behaves the way it does. We might wish to understand which factors most correlate and cause a certain event to happen. This could give us insights into where certain inequalities persist." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "project-notebook.ipynb", "provenance": [], "toc_visible": true }, "interpreter": { "hash": "6136f57e9522e1f7c9a1d00a6950a9c42424b667ef418888919cda0b3f236728" }, "kernelspec": { "display_name": "Python 3.8.8 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 0 }