{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "BaVNFpZJgECX" }, "source": [ "# COVID-19 Data Collection and Analysis" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "dLS-vnAJkZRc", "outputId": "59cc19f8-6b60-4e19-8d95-de0aff4428ba" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" ] } ], "source": [ "from google.colab import drive\n", "drive.mount(\"/content/drive\")" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "7PmTSUTFhqg6" }, "outputs": [], "source": [ "import re\n", "import requests\n", "import pandas as pd\n", "import numpy as np\n", "from time import sleep\n", "from bs4 import BeautifulSoup\n", "import pickle \n", "from typing import Optional" ] }, { "cell_type": "markdown", "metadata": { "id": "cmZHCw15fh_O" }, "source": [ "# 1. Data Collection\n", "\n", "We are interested in the degree to which the SARS-CoV-2 virus has affected United States citizens (SARS-CoV-2 is the virus that causes the COVID-19 disease). The Centers for Disease Control and Prevention (CDC) provides relevant data from USAFacts.org that includes the number of confirmed COVID-19 cases on a per-county basis. At the bottom of the web page (https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/), a blue table provides a list of states, each of which has its own web page displaying the reported numbers of cases and deaths.\n", "\n", "We automatically downloaded each state's data with Requests and then parsed it with BeautifulSoup. Specifically, we first fetched the web page located at `base_url` and saved the returned object (a response object) to `home_page`. We then used BeautifulSoup to parse the home page as an HTML document in order to extract the link for every state. 
With these extracted URLs, we populated a `state_urls` dictionary by setting each key to be the state name and the value to be the full URL. To avoid downloading the state web pages repeatedly, we iterated through `state_urls`, made a web request for each URL, and saved the contents to a file on the hard drive." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "28WF-Ye9kzsh" }, "outputs": [], "source": [ "# Every state's url begins with this prefix\n", "base_url = 'https://usafacts.org/visualizations/coronavirus-covid-19-spread-map/'\n", "# Datasets will be saved to this directory\n", "state_dir = \"./drive/MyDrive/state_data/\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "JdtBOR97ffGn", "outputId": "47f3d8e9-c96d-4283-9bff-8dd62f295bee" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "200\n", "US COVID-19 cases and deaths by state | USAFacts Optional[int]:\n", " try:\n", " return population_dict.get(state).get(county)\n", " except AttributeError:\n", " print('incorrect state name!')\n", " return None\n", "\n", "def load_covid_data(state_info):\n", " covid_data = {}\n", " for (state, state_path) in state_info:\n", " covid_data[state] = []\n", " with open(state_path, 'r') as f:\n", " soup = BeautifulSoup(f.read(), 'html.parser')\n", " counties = soup.find_all('a', href=re.compile('county/'))\n", " for c in counties:\n", " row = c.find_parent('tr')\n", " cols = [col.text.replace(',','') for col in row.find_all('td')]\n", "\n", " county_name = c.text\n", " pop = get_pop(state, county_name)\n", " if (pop is None) or (pop == 0):\n", " continue\n", " covid_data[state].append({'county_name': county_name,\n", " 'population': pop,\n", " '7_day_avg_cases': float(cols[0]),\n", " '7_date_ave_deaths': float(cols[1]),\n", " 'cases': int(cols[2]),\n", " 'deaths': int(cols[3])})\n", " return covid_data" ] }, { 
"cell_type": "code", "execution_count": 11, "metadata": { "id": "Ckoiwt0m2KKQ" }, "outputs": [], "source": [ "state_info = [(state, state_dir + state) for state in state_urls.keys()]\n", "covid_data = load_covid_data(state_info)" ] }, { "cell_type": "markdown", "metadata": { "id": "WSEPIePT6U62" }, "source": [ "# 3. Exploratory Data Analysis (EDA) I \n", "\n", "We first observed the single-most extreme counties and states, then inspected all states, after having sorted the data by some features." ] }, { "cell_type": "markdown", "metadata": { "id": "d3KPJed-3XX4" }, "source": [ "\n", "We computed \n", "1. The single county (and the state to which it belongs) that has the lowest rate of COVID cases per 100k people.\n", "1. The single county (and the state to which it belongs) that has the highest rate of COVID cases per 100k people." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "8IOTzSlb3bw3", "outputId": "87f9c267-73fd-4b7b-d979-e782a4b31653" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hoonah-Angoon Census Area (Alaska) has the lowest COVID cases per 100k: 0.0\n", "Loving County (Texas) has the highest COVID cases per 100k: 115976.33\n" ] } ], "source": [ "def calculate_county_stats(covid_data):\n", " \n", " min_county_count = 999999\n", " min_county_name = \"\"\n", " max_county_count = -1\n", " max_county_name = \"\"\n", " \n", " # looks through every county in every state, while checking if we have a new low or high\n", " for state in covid_data.keys():\n", " for county in covid_data[state]:\n", " pop = county['population']\n", " if (pop is None) or (pop == 0):\n", " continue\n", " covid_rate = round(county['cases'] / (pop/100000),2)\n", " if covid_rate < min_county_count:\n", " min_county_count = covid_rate\n", " min_county_name = county['county_name'] + \" (\" + state + \")\"\n", " if covid_rate > max_county_count:\n", " max_county_count = covid_rate\n", " 
max_county_name = county['county_name'] + \" (\" + state + \")\"\n", "\n", " print(min_county_name + \" has the lowest COVID cases per 100k: \" + str(float(min_county_count)))\n", " print(max_county_name + \" has the highest COVID cases per 100k: \" + str(float(max_county_count))) \n", "\n", "calculate_county_stats(covid_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "uMLxRaVT5WQf" }, "source": [ "We calculated\n", "1. The state that has the lowest number of deaths\n", "1. The state that has the highest number of deaths\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "1PrUM8B05crZ", "outputId": "0a8b05fd-cec5-46b4-c8f6-a7296efcfa1b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Vermont has the fewest COVID deaths: 646\n", "California has the most COVID deaths: 89931\n" ] } ], "source": [ "def calculate_state_deaths(covid_data):\n", " \n", " min_state_deaths = 999999\n", " min_state_name = \"\"\n", " max_state_deaths = -1\n", " max_state_name = \"\"\n", " for state in covid_data.keys():\n", " cur_state_count = 0\n", " for county in covid_data[state]:\n", " cur_state_count += county['deaths']\n", " \n", " if cur_state_count < min_state_deaths:\n", " min_state_deaths = cur_state_count\n", " min_state_name = state\n", " if cur_state_count > max_state_deaths:\n", " max_state_deaths = cur_state_count\n", " max_state_name = state\n", "\n", " print(min_state_name + \" has the fewest COVID deaths: \" + str(min_state_deaths))\n", " print(max_state_name + \" has the most COVID deaths: \" + str(max_state_deaths)) \n", "\n", "calculate_state_deaths(covid_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "Trsm-caH59xZ" }, "source": [ "We calculated\n", "1. The state that has the lowest rate of deaths based on its entire population\n", "1. 
The state that has the highest rate of deaths based on its entire population\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7uxLpphX59jM", "outputId": "7ce6b554-af5c-4794-8f2d-3e62fec13975" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hawaii has the lowest COVID death rate; 1 out of every 996 people has died\n", "Mississippi has the highest COVID death rate; 1 out of every 239 people has died\n" ] } ], "source": [ "def calculate_state_deathrate(covid_data):\n", " \n", " min_state_death_rate = -1\n", " min_state_name = \"\"\n", " max_state_death_rate = 9999999\n", " max_state_name = \"\"\n", " \n", " for state in covid_data.keys():\n", " cur_state_deaths = 0\n", " cur_state_population = 0\n", " for county in covid_data[state]:\n", " pop = county['population']\n", " if (county['cases'] > 0) and (pop is not None):\n", " cur_state_population += pop\n", " cur_state_deaths += county['deaths']\n", " \n", " cur_state_deathrate = float(cur_state_population) / cur_state_deaths\n", " \n", " if cur_state_deathrate > min_state_death_rate:\n", " min_state_death_rate = cur_state_deathrate\n", " min_state_name = state\n", " if cur_state_deathrate < max_state_death_rate:\n", " max_state_death_rate = cur_state_deathrate\n", " max_state_name = state\n", " \n", " print(min_state_name + \" has the lowest COVID death rate; 1 out of every \" + str(round(min_state_death_rate)) + \" people has died\")\n", " print(max_state_name + \" has the highest COVID death rate; 1 out of every \" + str(round(max_state_death_rate)) + \" people has died\")\n", "\n", "calculate_state_deathrate(covid_data)" ] }, { "cell_type": "markdown", "metadata": { "id": "x7Qo8Ozv7eXz" }, "source": [ "Complicated analysis requires a better data structure like pandas dataframe. We now convert the previous dictionary of lists of dictionaries to a pandas dataframe. Each row corresponds to a unique county. 
The five columns are county, state, # total covid cases (integer), # covid cases per 100k (float), and # covid deaths (integer)." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 224 }, "id": "52E_9h4Z7SMv", "outputId": "2df05e69-e74c-4351-9d50-a4110a5e31fd" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3118, 5)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " county state # total covid cases # covid cases per 100k \\\n", "0 Autauga County Alabama 15863 28393.205534 \n", "1 Baldwin County Alabama 55862 25023.965883 \n", "2 Barbour County Alabama 5681 23013.043831 \n", "3 Bibb County Alabama 6457 28833.616147 \n", "4 Blount County Alabama 15005 25948.535261 \n", "\n", " # covid deaths \n", "0 216 \n", "1 681 \n", "2 98 \n", "3 105 \n", "4 243 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def convert_to_pandas(covid_data):\n", " \n", " covid_data_flipped = []\n", " for state, counties in covid_data.items():\n", " for county in counties: \n", " pop = county['population']\n", " if (pop is None) or (pop == 0):\n", " continue\n", " cases = county['cases']\n", " cur_dict = {\"county\":county['county_name'], \"state\":state,\n", " \"# total covid cases\": cases,\n", " \"# covid cases per 100k\": cases/(pop/100000),\n", " \"# covid deaths\": county['deaths']}\n", " covid_data_flipped.append(cur_dict)\n", " covid_df = pd.json_normalize(covid_data_flipped)\n", " return covid_df\n", "\n", "covid_df = convert_to_pandas(covid_data)\n", "print(covid_df.shape)\n", "covid_df.head()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "covid_df.to_csv('./combined_data/covid_df.csv', index=False)" ] }, { "cell_type": "markdown", "metadata": { "id": "vMVBrhvS8qE0" }, "source": [ "We can use this dataframe to compute same quantities as done above more easily\n", "\n", "1. the single county (and the state to which it belongs) that has the lowest rate of COVID cases per 100k people\n", "1. the single county (and the state to which it belongs) that has the highest rate of COVID cases per 100k people\n", "\n", "\n", "1. the state that has the lowest number of deaths\n", "1. the state that has the highest number of deaths\n", "\n", "\n", "1. The state that has the lowest rate of deaths based on its entire population\n", "1. 
The state that has the highest rate of deaths based on its entire population\n", "\n" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "BEsrXQAU8pP-", "outputId": "1e97dd6f-5195-43cd-8b01-a5f79ae017de" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Kalawao County (Hawaii) has the lowest rate of confirmed COVID cases per 100k: 0.00\n", "Loving County (Texas) has the highest rate of confirmed COVID cases per 100k: 113,017.75\n" ] } ], "source": [ "def calculate_county_stats2(covid_df):\n", "\n", " sorted_df = covid_df.sort_values(by=['# covid cases per 100k'])\n", " lowest = sorted_df.iloc[0]\n", " highest = sorted_df.iloc[-1]\n", "\n", " print(f\"{lowest['county']} ({lowest['state']}) has the lowest rate of confirmed COVID cases per 100k: {lowest['# covid cases per 100k']:,.2f}\")\n", " print(f\"{highest['county']} ({highest['state']}) has the highest rate of confirmed COVID cases per 100k: {highest['# covid cases per 100k']:,.2f}\")\n", " \n", "calculate_county_stats2(covid_df)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Nhh5OeuB9Fao", "outputId": "3d19ce1c-7892-4c5a-c876-30386aaf95f4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Vermont has the fewest COVID deaths: 640.0\n", "California has the most COVID deaths: 89667.0\n" ] } ], "source": [ "def calculate_state_deaths2(covid_df):\n", " \n", " state_deaths = covid_df.groupby('state').sum().sort_values(by=['# covid deaths'])\n", " lowest = state_deaths.iloc[0]\n", " highest = state_deaths.iloc[-1]\n", "\n", " print(lowest.name + \" has the fewest COVID deaths: \" + str(lowest['# covid deaths']))\n", " print(highest.name + \" has the most COVID deaths: \" + str(highest['# covid deaths']))\n", "\n", "calculate_state_deaths2(covid_df)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "colab": { 
"base_uri": "https://localhost:8080/" }, "id": "PukG7pDl9Uw_", "outputId": "1c0bcc88-abfa-41c2-9a5e-547fc53f9206" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hawaii has the lowest COVID death rate; 1 out of every 995 people has died\n", "Mississippi has the highest COVID death rate; 1 out of every 238 people has died\n" ] } ], "source": [ "def calculate_state_deathrate2(covid_df):\n", " \n", " # work on a copy so the caller's dataframe is not modified\n", " covid_df2 = covid_df.copy()\n", " covid_df2['population'] = 100000*covid_df2['# total covid cases'] / covid_df2['# covid cases per 100k']\n", " covid_df2 = covid_df2.groupby('state').sum()\n", " covid_df2['death_rate'] = covid_df2['population'] / covid_df2['# covid deaths']\n", " covid_df2 = covid_df2.sort_values(by=['death_rate'])\n", "\n", " print(covid_df2.iloc[-1].name + \" has the lowest COVID death rate; 1 out of every \" + str(int(covid_df2.iloc[-1]['death_rate'])) + \" people has died\")\n", " print(covid_df2.iloc[0].name + \" has the highest COVID death rate; 1 out of every \" + str(int(covid_df2.iloc[0]['death_rate'])) + \" people has died\")\n", "\n", "calculate_state_deathrate2(covid_df)" ] }, { "cell_type": "markdown", "metadata": { "id": "Y9KrpyIp8_Ze" }, "source": [ "Furthermore, because the data is messy and some of it is unreliable, we attempted to understand some of the uncertainty around COVID data. We assume that false negatives among COVID-19 deaths are minimal. Every disease has a mortality rate, and we can treat it as constant across all people in the US. Although some people are at higher risk (e.g., older people and people with pre-existing conditions), we can assume that this variation is fairly uniform throughout the USA. Therefore, if all counties tested equally thoroughly, we would expect to see a consistent ratio between the number of people who died from COVID and the number of people who tested positive for COVID; this ratio is called the 'case fatality rate'." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 502 }, "id": "de9sEr-hBqN1", "outputId": "0bca88da-5f15-4d33-aa49-e1016b452bd2" }, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " county state # total covid cases \\\n", "1702 Loup County Nebraska 87 \n", "1696 Keya Paha County Nebraska 118 \n", "544 Kalawao County Hawaii 0 \n", "87 Skagway Municipality Alaska 30 \n", "269 Jackson County Colorado 166 \n", "... ... ... ... \n", "2715 Sabine County Texas 1254 \n", "427 Dodge County Georgia 2110 \n", "1752 Storey County Nevada 133 \n", "538 Wilcox County Georgia 828 \n", "444 Glascock County Georgia 269 \n", "\n", " # covid cases per 100k # covid deaths population \\\n", "1702 13102.409639 0 663 \n", "1696 14640.198511 0 805 \n", "544 0.000000 0 0 \n", "87 2535.925613 0 1182 \n", "269 11925.287356 0 1391 \n", "... ... ... ... \n", "2715 11895.276039 89 10541 \n", "427 10240.232953 154 20604 \n", "1752 3225.806452 11 4122 \n", "538 9588.882455 71 8634 \n", "444 9054.190508 25 2970 \n", "\n", " # covid deaths per 100k case_fatality_rate \n", "1702 0 0.000000 \n", "1696 0 0.000000 \n", "544 0 0.000000 \n", "87 0 0.000000 \n", "269 0 0.000000 \n", "... ... ... 
\n", "2715 844 0.070973 \n", "427 747 0.072986 \n", "1752 266 0.082707 \n", "538 822 0.085749 \n", "444 841 0.092937 \n", "\n", "[3118 rows x 8 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def add_death_stats(covid_df):\n", " \n", " # add a small constant to each denominator (or use fillna afterwards) to avoid NaNs from division by 0.\n", " \n", " covid_df['population'] = 100000*covid_df['# total covid cases'] / (covid_df['# covid cases per 100k']+0.0001)\n", "# covid_df.fillna(0, inplace=True)\n", " covid_df[\"population\"] = covid_df[\"population\"].astype('int32')\n", " \n", " covid_df['# covid deaths per 100k'] = 100000*covid_df['# covid deaths'] / (covid_df['population']+0.0001)\n", "# covid_df.fillna(0, inplace=True)\n", " covid_df[\"# covid deaths per 100k\"] = covid_df[\"# covid deaths per 100k\"].astype('int32')\n", " \n", " covid_df['case_fatality_rate'] = covid_df['# covid deaths'] / (covid_df['# total covid cases']+0.0001)\n", "# covid_df.fillna(0, inplace=True)\n", " covid_df = covid_df.sort_values(by=['case_fatality_rate'])\n", "\n", " return covid_df\n", "\n", "covid_updated = add_death_stats(covid_df)\n", "covid_updated" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "covid_updated.to_csv('./combined_data/covid_updated.csv', index = False)" ] }, { "cell_type": "markdown", "metadata": { "id": "2gVw9sYECdR1" }, "source": [ "From the analyses above, we learned that states vary widely in their death rates (e.g., the number of deaths in New Jersey or California is orders of magnitude higher than in Hawaii or Alaska) and in COVID testing. Counties also vary a lot within a state, as some counties with very bad statistics lie within states with good statistics. \n", "\n", "When it comes to data reliability, some states and counties are probably more proactive about testing, so they could have higher case counts. 
Other counties might have a similar or higher number of cases that are simply not represented in the data because of lower testing. Deaths, by contrast, are harder to overlook, so states with lax testing policies may have inflated deaths-per-case metrics. Perhaps we could supplement the data with some measure of testing rates in the county or state." ] }, { "cell_type": "markdown", "metadata": { "id": "RbR70HJmEm6a" }, "source": [ "# 4. Incorporate More Data\n", "\n", "We are also interested in how COVID has impacted our world. We can better understand this by looking at how it relates to demographics, income, education, health, and political voting. \n", "\n", "Our `case_fatality_rate` column can be viewed as an approximation of how effective and thorough COVID testing is for a given county. Our `# covid deaths` column can be viewed as an extreme indication of how severely COVID has impacted a given county. Our `# covid cases per 100k` column can be viewed as a middle ground between the two aforementioned features. That is, it measures the impact of the disease and is influenced by the thoroughness of COVID testing. \n", "\n", "Using these three informative features, we can inspect how impacted each county is, while correlating this with other features of each county, such as income-level, health metrics, demographics, etc. \n", "\n", "In this project, we merged our COVID case data with the 'election2020_by_county.csv' dataset. We only care about 15 columns, which are hispanic, minority, female, unemployed, income, nodegree, bachelor, inactivity, obesity, density, cancer, voter_turnout, voter_gap, trump, biden. 
We dropped the fipscode and population columns.\n", "\n", "A data description is as follows:\n", "\n", "- state: the state in which the county lies\n", "- fipscode: an ID to identify each county\n", "- county: the name of each county\n", "- population: total population\n", "- hispanic: percent of adults that are hispanic\n", "- minority: percent of adults that are nonwhite\n", "- female: percent of adults that are female\n", "- unemployed: unemployment rate, as a percent\n", "- income: median income\n", "- nodegree: percent of adults who have not completed high school\n", "- bachelor: percent of adults with a bachelor’s degree\n", "- inactivity: percent of adults who do not exercise in their leisure time\n", "- obesity: percent of adults with BMI > 30\n", "- density: population density, persons per square mile of land\n", "- cancer: prevalence of cancer per 100,000 individuals\n", "- voter_turnout: percentage of voting age population that voted\n", "- voter_gap: percentage point gap in 2020 presidential voting: trump - biden\n", "\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 473 }, "id": "Za799tkjB4Hs", "outputId": "3fba53e1-a3a9-4f28-a50d-f86f68e811ca" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(3044, 23)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " county state # total covid cases # covid cases per 100k \\\n", "0 Loup County Nebraska 87 13102.409639 \n", "1 Keya Paha County Nebraska 118 14640.198511 \n", "2 Jackson County Colorado 166 11925.287356 \n", "3 Daggett County Utah 37 3894.736842 \n", "4 Hinsdale County Colorado 131 15975.609756 \n", "\n", " # covid deaths population # covid deaths per 100k case_fatality_rate \\\n", "0 0 663 0 0.0 \n", "1 0 805 0 0.0 \n", "2 0 1391 0 0.0 \n", "3 0 949 0 0.0 \n", "4 0 819 0 0.0 \n", "\n", " hispanic minority ... nodegree bachelor inactivity obesity density \\\n", "0 0.0 0.4 ... 6.2 14.4 33.0 30.7 1.3 \n", "1 1.1 2.2 ... 8.0 15.8 30.2 29.6 7.9 \n", "2 24.4 25.1 ... 15.5 17.5 20.5 21.4 0.9 \n", "3 3.3 5.1 ... 12.4 19.3 20.3 26.1 14.5 \n", "4 6.0 9.1 ... 5.1 41.3 14.8 20.5 0.8 \n", "\n", " cancer voter_turnout voter_gap trump biden \n", "0 NaN -4.849885 65.0 81.5 16.5 \n", "1 256.3 8.000000 80.7 90.0 9.3 \n", "2 207.1 12.874251 58.1 77.9 19.8 \n", "3 185.3 -9.461967 62.4 80.2 17.8 \n", "4 NaN 5.952381 15.6 55.9 40.3 \n", "\n", "[5 rows x 23 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def merge_data(covid_updated, filepath):\n", " \n", " data2020 = pd.read_csv(filepath).drop(columns=['fipscode', 'population'])\n", " return pd.merge(covid_updated, data2020, on=['state', 'county'])\n", "\n", "merged = merge_data(covid_updated, './drive/MyDrive/election2020_by_county.csv')\n", "print(merged.shape)\n", "merged.head()" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "merged.to_csv('./combined_data/merged.csv', index = False)" ] }, { "cell_type": "markdown", "metadata": { "id": "FC2onjm5Guqg" }, "source": [ "Because some counties did not match between the two datasets during merging, we lost some rows. 
" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zBf42-18B4B8", "outputId": "d76f87c3-6597-453f-dcd2-1e601c2bfdc6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "74\n" ] } ], "source": [ "print(len(covid_updated) - len(merged))" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 520 }, "id": "uJ6uylCdGen9", "outputId": "fb258a24-d339-4092-f243-c5256dc4e127" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " county state # total covid cases \\\n", "87 Skagway Municipality Alaska 30 \n", "90 Wrangell City and Borough Alaska 71 \n", "75 Hoonah-Angoon Census Area Alaska 0 \n", "544 Kalawao County Hawaii 0 \n", "68 Aleutians West Census Area Alaska 863 \n", "... ... ... ... \n", "1127 East Feliciana Parish Louisiana 7320 \n", "1122 Claiborne Parish Louisiana 3280 \n", "1129 Franklin Parish Louisiana 7347 \n", "1148 Red River Parish Louisiana 2110 \n", "1115 Bienville Parish Louisiana 3990 \n", "\n", " # covid cases per 100k # covid deaths population \\\n", "87 2535.925613 0 1182 \n", "90 2837.729816 0 2501 \n", "75 0.000000 0 0 \n", "544 0.000000 0 0 \n", "68 15317.713880 2 5633 \n", "... ... ... ... \n", "1127 38254.507447 170 19134 \n", "1122 20931.716656 77 15669 \n", "1129 36707.469398 178 20014 \n", "1148 24994.077233 53 8441 \n", "1115 30133.675704 110 13240 \n", "\n", " # covid deaths per 100k case_fatality_rate \n", "87 0 0.000000 \n", "90 0 0.000000 \n", "75 0 0.000000 \n", "544 0 0.000000 \n", "68 35 0.002317 \n", "... ... ... 
\n", "1127 888 0.023224 \n", "1122 491 0.023476 \n", "1129 889 0.024228 \n", "1148 627 0.025118 \n", "1115 830 0.027569 \n", "\n", "[74 rows x 8 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "missing_counties = set()\n", "merged_counties = set()\n", "for index, row in merged.iterrows():\n", " merged_counties.add(row['county'].lower() + \"_\" + row['state'].lower())\n", "\n", "missing_idxs = []\n", "for index, row in covid_updated.iterrows():\n", " cur_county = row['county'].lower() + \"_\" + row['state'].lower()\n", " if cur_county not in merged_counties:\n", " # print(\"missing\",cur_county)\n", " missing_idxs.append(index)\n", " missing_counties.add(cur_county)\n", "\n", "covid_updated.loc[missing_idxs]" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 428 }, "id": "LXr84099G3Ui", "outputId": "b7656532-271a-4974-85c2-11ca6ad9d901" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 3.021000e+03 3021.000000 3021.000000 \n", "mean 2.546425e+04 24314.359415 302.067196 \n", "std 8.662408e+04 6015.982401 1006.517139 \n", "min 3.200000e+01 3274.314819 1.000000 \n", "25% 2.560000e+03 20683.907225 41.000000 \n", "50% 6.401000e+03 24365.817878 96.000000 \n", "75% 1.705000e+04 27802.251591 229.000000 \n", "max 2.751220e+06 113017.751479 31736.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 3.021000e+03 3021.000000 3021.000000 3021.000000 \n", "mean 1.055975e+05 366.179742 0.015760 9.259484 \n", "std 3.391059e+05 162.864254 0.008579 13.881480 \n", "min 1.680000e+02 11.000000 0.000527 0.000000 \n", "25% 1.106500e+04 249.000000 0.010637 2.000000 \n", "50% 2.584300e+04 358.000000 0.014321 4.000000 \n", "75% 6.784200e+04 468.000000 0.018885 9.500000 \n", "max 1.003911e+07 1229.000000 0.092937 99.200000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 3021.000000 3021.000000 3021.000000 ... 3021.000000 3021.000000 \n", "mean 22.694009 49.932903 5.495035 ... 14.970275 20.066137 \n", "std 19.920798 2.375417 1.969195 ... 6.740886 8.846873 \n", "min 0.000000 19.166215 1.800000 ... 1.900000 2.600000 \n", "25% 6.900000 49.462074 4.100000 ... 9.900000 14.000000 \n", "50% 15.200000 50.392410 5.300000 ... 13.500000 17.900000 \n", "75% 33.900000 51.080252 6.500000 ... 19.200000 23.600000 \n", "max 99.400000 58.100420 24.000000 ... 
53.300000 75.100000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 3021.000000 3021.000000 3021.000000 2979.000000 2984.000000 \n", "mean 25.910824 30.947104 250.792122 228.507486 35.459084 \n", "std 5.161410 4.467600 1741.445511 56.125890 13.868524 \n", "min 8.100000 11.800000 0.100000 46.200000 -168.323353 \n", "25% 22.600000 28.300000 17.000000 193.300000 27.675757 \n", "50% 25.800000 31.100000 45.300000 230.300000 35.055497 \n", "75% 29.400000 33.700000 112.900000 265.100000 42.473125 \n", "max 41.400000 47.600000 69468.400000 458.300000 100.000000 \n", "\n", " voter_gap trump biden \n", "count 2983.000000 2983.000000 2983.000000 \n", "mean 32.702749 65.487026 32.784278 \n", "std 31.279827 15.699166 15.590596 \n", "min -90.000000 4.000000 3.100000 \n", "25% 15.050000 56.700000 20.800000 \n", "50% 39.100000 68.700000 29.600000 \n", "75% 56.800000 77.500000 41.650000 \n", "max 93.100000 96.200000 94.000000 \n", "\n", "[8 rows x 21 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Remove rows with 0 deaths\n", "merged = merged.loc[merged['# covid deaths'] != 0]\n", "\n", "# Summary statistics\n", "merged.describe()" ] }, { "cell_type": "markdown", "metadata": { "id": "3jmCzIx7IRiQ" }, "source": [ "# 5. Exploratory Data Analysis (EDA) II\n", "\n", "We can partition the counties on any quantitative feature by using quantiles. With arbitrarily chosen `minv` and `maxv` bounds, we can partition a feature of interest several times and observe interesting relationships."
] }, { "cell_type": "code", "execution_count": 22, "metadata": { "id": "2RzKgq32IHaD" }, "outputs": [], "source": [ "# Given minv and maxv, this function returns the subset of df whose values in column_name lie between minv and maxv, inclusive.\n", "def partition_df(df, column_name, minv, maxv):\n", "    # Filter on the dataframe passed in, not on the global `merged`\n", "    return df.loc[(df[column_name] >= minv) & (df[column_name] <= maxv)]" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 428 }, "id": "ErmhoSBhJnPV", "outputId": "fc28aeea-e89c-4fe7-d2f7-1d5fe0bcf66f" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 117.000000 117.000000 117.000000 \n", "mean 4634.606838 26336.198665 85.504274 \n", "std 5658.986674 7096.649003 87.830030 \n", "min 184.000000 7931.904161 3.000000 \n", "25% 2003.000000 22647.596516 43.000000 \n", "50% 2889.000000 27116.874368 56.000000 \n", "75% 5247.000000 31600.407747 96.000000 \n", "max 40588.000000 39821.693908 589.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 117.000000 117.000000 117.000000 117.000000 \n", "mean 16558.410256 520.239316 0.021598 8.474359 \n", "std 17899.159118 152.402917 0.010778 21.306839 \n", "min 1536.000000 174.000000 0.004765 0.000000 \n", "25% 8110.000000 434.000000 0.015873 0.700000 \n", "50% 11839.000000 515.000000 0.018861 1.600000 \n", "75% 18067.000000 599.000000 0.022789 3.400000 \n", "max 130624.000000 1064.000000 0.067568 99.200000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 117.000000 117.000000 117.000000 ... 117.000000 117.000000 \n", "mean 43.990598 49.624858 9.065812 ... 26.895726 11.892308 \n", "std 32.372695 3.836346 2.228389 ... 5.736489 3.710822 \n", "min 0.400000 35.462777 4.400000 ... 8.000000 5.800000 \n", "25% 5.800000 48.747893 7.700000 ... 23.400000 9.200000 \n", "50% 49.700000 50.691193 8.700000 ... 26.900000 11.300000 \n", "75% 73.100000 51.953125 10.100000 ... 29.800000 13.800000 \n", "max 99.400000 56.526573 17.600000 ... 
53.300000 27.900000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 117.000000 117.000000 117.000000 117.000000 117.000000 \n", "mean 31.676923 36.230769 84.973504 244.556410 44.983493 \n", "std 4.358590 5.410948 176.212760 47.754773 10.879040 \n", "min 19.700000 21.000000 1.400000 99.200000 -10.411765 \n", "25% 29.000000 32.600000 17.500000 215.200000 39.175362 \n", "50% 32.000000 36.400000 36.800000 243.900000 44.536665 \n", "75% 35.000000 40.300000 74.100000 272.700000 51.138995 \n", "max 41.300000 47.600000 1261.500000 380.000000 70.003709 \n", "\n", " voter_gap trump biden \n", "count 117.000000 117.000000 117.000000 \n", "mean 13.635043 56.181197 42.546154 \n", "std 45.685134 22.877081 22.815828 \n", "min -71.600000 13.500000 9.300000 \n", "25% -27.900000 35.100000 19.900000 \n", "50% 14.200000 54.900000 40.800000 \n", "75% 58.500000 78.900000 63.000000 \n", "max 80.400000 89.700000 85.100000 \n", "\n", "[8 rows x 21 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "partition_df(merged, 'income', 21000, 31000).describe()" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "gTw1U7oEJwxU", "outputId": "ac425324-7f5a-49e9-878a-bad283604ada" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1.033000e+03 1033.000000 1033.000000 \n", "mean 4.635578e+04 22771.754766 493.433688 \n", "std 1.397306e+05 6629.900988 1602.636109 \n", "min 3.500000e+01 3274.314819 1.000000 \n", "25% 2.310000e+03 18617.709313 33.000000 \n", "50% 7.027000e+03 22735.283123 93.000000 \n", "75% 3.332500e+04 26132.178794 335.000000 \n", "max 2.751220e+06 113017.751479 31736.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.033000e+03 1033.000000 1033.000000 1033.000000 \n", "mean 1.961860e+05 313.152953 0.014420 14.784705 \n", "std 5.424150e+05 172.554693 0.009160 17.944395 \n", "min 1.680000e+02 11.000000 0.001057 0.000000 \n", "25% 1.072400e+04 180.000000 0.008413 3.100000 \n", "50% 3.256400e+04 285.000000 0.012746 7.400000 \n", "75% 1.529390e+05 414.000000 0.017883 19.200000 \n", "max 1.003911e+07 1229.000000 0.092937 95.300000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1033.000000 1033.000000 1033.000000 ... 1033.000000 1033.000000 \n", "mean 24.974831 49.784908 5.065247 ... 13.033495 25.626718 \n", "std 19.940681 2.537714 1.874395 ... 6.728381 10.933433 \n", "min 0.000000 31.955024 1.800000 ... 1.900000 2.600000 \n", "25% 9.800000 49.321586 3.800000 ... 8.300000 17.300000 \n", "50% 18.600000 50.331365 4.800000 ... 11.300000 23.300000 \n", "75% 35.200000 51.018405 5.900000 ... 16.300000 32.200000 \n", "max 97.400000 58.100420 24.000000 ... 
46.800000 75.100000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1033.000000 1033.000000 1033.000000 1013.000000 1014.000000 \n", "mean 22.030591 26.236883 365.470474 213.825469 32.293654 \n", "std 4.181953 2.969353 1641.553548 61.119227 17.665726 \n", "min 8.100000 11.800000 0.100000 46.200000 -21.417069 \n", "25% 19.200000 25.000000 11.000000 169.700000 21.126119 \n", "50% 22.400000 27.100000 45.800000 212.600000 29.870988 \n", "75% 25.000000 28.400000 162.300000 251.600000 40.219742 \n", "max 38.800000 29.400000 32903.300000 445.400000 99.552895 \n", "\n", " voter_gap trump biden \n", "count 1014.000000 1014.000000 1014.000000 \n", "mean 24.426331 61.280375 36.854043 \n", "std 36.403793 18.306117 18.108292 \n", "min -90.000000 4.000000 3.100000 \n", "25% -2.200000 47.850000 21.900000 \n", "50% 27.500000 62.800000 35.200000 \n", "75% 54.300000 76.275000 49.975000 \n", "max 93.100000 96.200000 94.000000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1015.000000 1015.000000 1015.000000 \n", "mean 17413.966502 24372.422566 226.448276 \n", "std 35239.630504 5506.207523 453.917323 \n", "min 32.000000 6911.447084 1.000000 \n", "25% 2331.500000 21097.270656 37.000000 \n", "50% 6279.000000 24563.536449 98.000000 \n", "75% 15557.000000 27653.785878 219.500000 \n", "max 402352.000000 61934.129379 7728.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.015000e+03 1015.000000 1015.000000 1015.000000 \n", "mean 7.111427e+04 373.295567 0.016179 7.553990 \n", "std 1.464475e+05 157.158042 0.008917 11.862685 \n", "min 4.620000e+02 27.000000 0.000809 0.000000 \n", "25% 1.039850e+04 262.000000 0.010903 2.000000 \n", "50% 2.640300e+04 359.000000 0.014236 3.500000 \n", "75% 6.302800e+04 468.000000 0.019337 7.300000 \n", "max 1.584063e+06 1212.000000 0.085749 99.200000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1015.000000 1015.000000 1015.000000 ... 1015.000000 1015.000000 \n", "mean 18.795961 49.890443 5.326305 ... 14.513399 18.695074 \n", "std 17.305463 2.238655 1.872417 ... 6.318933 6.224281 \n", "min 1.500000 19.166215 1.800000 ... 3.800000 4.400000 \n", "25% 6.000000 49.403040 4.100000 ... 9.700000 14.300000 \n", "50% 12.200000 50.314825 5.100000 ... 12.900000 17.600000 \n", "75% 26.850000 50.970982 6.300000 ... 18.200000 21.700000 \n", "max 99.400000 56.633907 21.800000 ... 
53.300000 44.600000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1015.000000 1015.000000 1015.000000 1002.000000 1004.000000 \n", "mean 26.218128 31.132217 205.667586 234.567764 34.878074 \n", "std 3.983797 1.018496 2212.807074 54.672491 12.708142 \n", "min 15.800000 29.400000 0.100000 82.500000 -168.323353 \n", "25% 23.300000 30.200000 16.700000 199.825000 28.279638 \n", "50% 26.100000 31.100000 42.900000 236.200000 34.537246 \n", "75% 28.800000 32.000000 98.600000 269.675000 40.807564 \n", "max 41.400000 32.800000 69468.400000 425.400000 100.000000 \n", "\n", " voter_gap trump biden \n", "count 1003.000000 1003.000000 1003.000000 \n", "mean 38.696510 68.491426 29.794915 \n", "std 24.848158 12.469813 12.388141 \n", "min -65.600000 16.900000 5.600000 \n", "25% 23.300000 60.750000 20.300000 \n", "50% 42.100000 70.100000 28.000000 \n", "75% 57.600000 78.050000 37.500000 \n", "max 86.000000 92.000000 82.500000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1036.000000 1036.000000 1036.000000 \n", "mean 11955.729730 25787.480672 178.888031 \n", "std 23933.119885 5361.167248 374.326997 \n", "min 96.000000 9330.346798 1.000000 \n", "25% 3003.250000 22698.614312 49.000000 \n", "50% 6047.000000 26021.133583 98.000000 \n", "75% 11853.500000 28860.094198 180.000000 \n", "max 410889.000000 72727.272727 7953.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.036000e+03 1036.000000 1036.000000 1036.000000 \n", "mean 4.675872e+04 411.768340 0.016674 5.380309 \n", "std 9.641718e+04 141.464044 0.007432 8.434501 \n", "min 7.210000e+02 14.000000 0.000527 0.000000 \n", "25% 1.222000e+04 321.750000 0.012394 1.600000 \n", "50% 2.313750e+04 411.000000 0.015669 2.800000 \n", "75% 4.566200e+04 494.000000 0.019305 5.400000 \n", "max 1.749342e+06 996.000000 0.066603 91.800000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1036.000000 1036.000000 1036.000000 ... 1036.000000 1036.000000 \n", "mean 23.938707 50.117482 6.053378 ... 17.302220 15.871815 \n", "std 21.605540 2.295568 2.021584 ... 6.487572 4.850790 \n", "min 0.400000 34.868478 1.800000 ... 3.100000 5.800000 \n", "25% 6.000000 49.643073 4.800000 ... 12.275000 12.400000 \n", "50% 15.300000 50.510986 5.900000 ... 16.400000 15.000000 \n", "75% 37.925000 51.268131 7.200000 ... 22.200000 18.600000 \n", "max 93.400000 56.526573 16.900000 ... 
40.500000 42.600000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1036.000000 1036.000000 1036.000000 1026.000000 1029.000000 \n", "mean 29.476834 35.479826 174.186969 237.092982 39.012722 \n", "std 4.223773 2.422581 1164.339521 48.778309 8.717395 \n", "min 15.600000 32.800000 0.100000 61.800000 12.181303 \n", "25% 26.500000 33.700000 22.950000 206.650000 33.091367 \n", "50% 29.500000 34.800000 46.350000 237.000000 39.270875 \n", "75% 32.400000 36.700000 102.325000 267.000000 44.732536 \n", "max 41.300000 47.600000 35369.200000 458.300000 70.003709 \n", "\n", " voter_gap trump biden \n", "count 1029.000000 1029.000000 1029.000000 \n", "mean 35.537026 66.960641 31.423615 \n", "std 29.339460 14.662722 14.686429 \n", "min -80.800000 8.800000 5.400000 \n", "25% 21.900000 60.000000 20.900000 \n", "50% 42.800000 70.300000 27.800000 \n", "75% 56.900000 77.500000 38.100000 \n", "max 87.900000 93.300000 89.600000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def view_partitions(df, feature, n_partitions=3, cols=None):\n", "    # Show summary statistics for each quantile-based partition of `feature`.\n", "    # Bounds are inclusive, so a boundary value can appear in two adjacent partitions.\n", "    if cols is None:\n", "        cols = df.columns\n", "    for i in range(n_partitions):\n", "        # Derive both bounds from i so floating-point error does not accumulate\n", "        start = i / n_partitions\n", "        stop = (i + 1) / n_partitions\n", "        display(partition_df(df, feature,\n", "                             df[feature].quantile(start),\n", "                             df[feature].quantile(stop))[cols].describe())\n", "\n", "view_partitions(merged, 'obesity')" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "dK1SwKuQJ7v9", "outputId": "d8612ca4-1607-4777-cecd-2a46370e4500" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1.007000e+03 1007.000000 1007.000000 \n", "mean 4.741475e+04 23140.636087 483.670308 \n", "std 1.367664e+05 6443.122064 1512.796366 \n", "min 6.300000e+01 3274.314819 1.000000 \n", "25% 3.455000e+03 19408.061608 38.000000 \n", "50% 1.100700e+04 22993.187599 120.000000 \n", "75% 3.888200e+04 26299.994779 379.000000 \n", "max 2.751220e+06 113017.751479 31736.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.007000e+03 1007.000000 1007.000000 1007.000000 \n", "mean 2.030346e+05 271.937438 0.012093 12.365045 \n", "std 5.351729e+05 139.972161 0.006343 15.732390 \n", "min 1.680000e+02 11.000000 0.001057 0.000000 \n", "25% 1.509950e+04 173.000000 0.007854 3.000000 \n", "50% 4.758000e+04 255.000000 0.011327 5.900000 \n", "75% 1.662710e+05 342.000000 0.014652 14.350000 \n", "max 1.003911e+07 994.000000 0.081481 94.100000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1007.000000 1007.000000 1007.000000 ... 1007.000000 1007.000000 \n", "mean 22.890566 49.707254 4.982423 ... 11.348163 26.617577 \n", "std 18.824834 2.499203 1.795947 ... 5.740513 10.472965 \n", "min 0.000000 19.166215 1.900000 ... 1.900000 2.600000 \n", "25% 8.600000 49.336073 3.800000 ... 7.700000 18.700000 \n", "50% 16.200000 50.208257 4.700000 ... 10.100000 24.800000 \n", "75% 31.900000 50.863073 5.800000 ... 13.100000 32.450000 \n", "max 94.800000 58.100420 24.000000 ... 
43.900000 75.100000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1007.000000 1007.000000 1007.000000 992.000000 982.000000 \n", "mean 20.354816 27.668818 461.912115 213.861895 31.293156 \n", "std 2.836158 4.213967 2770.745246 61.473242 17.404163 \n", "min 8.100000 11.800000 0.100000 46.200000 -21.417069 \n", "25% 18.700000 25.300000 12.250000 166.950000 21.203168 \n", "50% 21.200000 28.100000 46.900000 211.450000 28.190647 \n", "75% 22.600000 30.600000 177.150000 254.500000 37.751744 \n", "max 23.700000 39.700000 69468.400000 445.400000 99.552895 \n", "\n", " voter_gap trump biden \n", "count 982.000000 982.000000 982.000000 \n", "mean 17.149898 57.540224 40.390326 \n", "std 32.866155 16.459077 16.417551 \n", "min -90.000000 4.000000 3.100000 \n", "25% -5.675000 46.025000 28.400000 \n", "50% 19.750000 59.000000 39.200000 \n", "75% 40.775000 69.275000 51.800000 \n", "max 93.100000 96.200000 94.000000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1028.000000 1028.000000 1028.000000 \n", "mean 19504.956226 23995.600128 270.469844 \n", "std 51624.583537 5760.949741 767.045525 \n", "min 35.000000 6103.922159 1.000000 \n", "25% 2028.000000 20493.710413 34.000000 \n", "50% 5230.000000 24010.922870 87.000000 \n", "75% 14111.500000 27195.090081 202.250000 \n", "max 714053.000000 61934.129379 12823.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.028000e+03 1028.000000 1028.000000 1028.000000 \n", "mean 7.753021e+04 392.975681 0.017394 10.354864 \n", "std 1.961051e+05 165.631212 0.009503 15.641803 \n", "min 4.860000e+02 14.000000 0.000809 0.000000 \n", "25% 8.923250e+03 283.000000 0.011700 2.100000 \n", "50% 2.177150e+04 378.000000 0.015793 4.200000 \n", "75% 5.723600e+04 480.000000 0.020755 10.600000 \n", "max 2.559902e+06 1229.000000 0.092937 99.200000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1028.000000 1028.000000 1028.000000 ... 1028.000000 1028.000000 \n", "mean 22.804086 49.936301 5.255837 ... 14.964105 18.713132 \n", "std 20.733552 2.292643 1.910234 ... 6.407062 5.868454 \n", "min 0.200000 32.813627 1.800000 ... 3.100000 4.400000 \n", "25% 6.400000 49.342871 4.100000 ... 10.500000 14.500000 \n", "50% 14.500000 50.324591 5.000000 ... 13.400000 17.900000 \n", "75% 35.025000 51.076514 6.200000 ... 18.100000 21.600000 \n", "max 99.400000 56.633907 21.800000 ... 
53.300000 46.300000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1028.000000 1028.000000 1028.000000 1009.000000 1017.000000 \n", "mean 25.842802 31.003794 183.769747 229.305055 34.853118 \n", "std 1.219701 3.033814 1134.003874 54.053794 12.759806 \n", "min 23.800000 22.400000 0.100000 70.000000 -168.323353 \n", "25% 24.900000 28.700000 13.350000 194.300000 28.566943 \n", "50% 25.800000 31.000000 40.100000 230.700000 34.537328 \n", "75% 26.800000 33.100000 107.425000 263.200000 40.992547 \n", "max 28.000000 42.000000 32903.300000 433.900000 100.000000 \n", "\n", " voter_gap trump biden \n", "count 1016.000000 1016.000000 1016.000000 \n", "mean 37.605906 67.993012 30.387106 \n", "std 27.366819 13.721102 13.655311 \n", "min -76.800000 11.200000 3.900000 \n", "25% 22.500000 60.175000 20.300000 \n", "50% 41.200000 69.800000 28.450000 \n", "75% 58.100000 78.200000 38.025000 \n", "max 92.000000 95.900000 88.000000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " # total covid cases # covid cases per 100k # covid deaths \\\n", "count 1018.000000 1018.000000 1018.000000 \n", "mean 10061.785855 25805.500407 163.643418 \n", "std 27440.411056 5519.297680 481.803986 \n", "min 32.000000 6911.447084 1.000000 \n", "25% 2675.000000 22639.004381 48.000000 \n", "50% 5405.000000 26231.941999 92.000000 \n", "75% 10497.250000 29331.867235 169.000000 \n", "max 661165.000000 53241.683639 11854.000000 \n", "\n", " population # covid deaths per 100k case_fatality_rate hispanic \\\n", "count 1.018000e+03 1018.000000 1018.000000 1018.000000 \n", "mean 3.817520e+04 433.435167 0.017819 4.975049 \n", "std 9.681484e+04 134.701725 0.008467 7.262961 \n", "min 4.620000e+02 14.000000 0.000527 0.000000 \n", "25% 1.110300e+04 353.000000 0.013173 1.500000 \n", "50% 2.070000e+04 431.000000 0.016078 2.600000 \n", "75% 4.047700e+04 521.000000 0.020233 5.100000 \n", "max 2.253857e+06 996.000000 0.072477 63.500000 \n", "\n", " minority female unemployed ... nodegree bachelor \\\n", "count 1018.000000 1018.000000 1018.000000 ... 1018.000000 1018.000000 \n", "mean 22.328193 50.173025 6.245678 ... 18.565914 14.874361 \n", "std 20.043183 2.303834 1.958725 ... 5.977301 4.333156 \n", "min 0.400000 35.150327 1.800000 ... 3.900000 5.800000 \n", "25% 5.800000 49.796348 5.000000 ... 14.200000 11.800000 \n", "50% 15.000000 50.616007 6.100000 ... 18.300000 14.300000 \n", "75% 34.900000 51.302674 7.400000 ... 22.600000 17.300000 \n", "max 91.200000 56.633907 16.900000 ... 
40.500000 36.900000 \n", "\n", " inactivity obesity density cancer voter_turnout \\\n", "count 1018.000000 1018.000000 1018.000000 1010.000000 1017.000000 \n", "mean 31.541159 34.180157 105.106189 242.664257 40.216906 \n", "std 2.735776 3.406723 212.876177 48.768550 8.482994 \n", "min 28.000000 23.800000 0.300000 94.400000 1.730104 \n", "25% 29.300000 31.900000 24.525000 212.425000 34.466113 \n", "50% 31.000000 33.800000 46.450000 242.650000 40.067710 \n", "75% 33.375000 36.300000 93.525000 272.075000 45.901233 \n", "max 41.400000 47.600000 2800.000000 458.300000 70.003709 \n", "\n", " voter_gap trump biden \n", "count 1017.000000 1017.000000 1017.000000 \n", "mean 43.029695 70.764503 27.734808 \n", "std 27.054607 13.510946 13.552863 \n", "min -71.600000 13.500000 5.000000 \n", "25% 33.900000 65.900000 18.800000 \n", "50% 50.200000 74.200000 24.100000 \n", "75% 60.900000 79.800000 32.400000 \n", "max 88.300000 93.300000 85.100000 \n", "\n", "[8 rows x 21 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "view_partitions(merged, 'inactivity')" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "nyexzlhQJ-7K", "outputId": "3ddab391-c0b1-42f0-b209-e423d2121da9" }, "outputs": [ { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " income # total covid cases # covid cases per 100k \\\n", "count 1007.000000 1007.000000 1007.000000 \n", "mean 35886.695134 8288.693148 25014.583343 \n", "std 3799.138075 20595.465992 6509.586741 \n", "min 21658.000000 35.000000 7186.858316 \n", "25% 33575.000000 2237.500000 20856.173791 \n", "50% 36566.000000 4471.000000 25428.263215 \n", "75% 38891.000000 8407.500000 29159.082843 \n", "max 41150.000000 402352.000000 56864.875543 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 142.903674 3.293902e+04 464.244290 \n", "std 360.681418 8.483606e+04 157.840313 \n", "min 1.000000 4.030000e+02 48.000000 \n", "25% 45.000000 9.733000e+03 367.500000 \n", "50% 81.000000 1.778100e+04 458.000000 \n", "75% 143.500000 3.193600e+04 551.500000 \n", "max 7728.000000 1.584063e+06 1229.000000 \n", "\n", " case_fatality_rate obesity inactivity trump biden \n", "count 1007.000000 1007.000000 1007.000000 1002.000000 1002.000000 \n", "mean 0.020079 32.942304 29.309037 67.414072 31.157685 \n", "std 0.010388 4.450842 4.676727 15.976406 15.928395 \n", "min 0.002333 17.400000 13.600000 11.200000 7.100000 \n", "25% 0.014051 30.100000 25.900000 58.600000 18.925000 \n", "50% 0.017502 32.900000 29.600000 72.200000 26.300000 \n", "75% 0.022740 35.900000 32.800000 79.600000 39.975000 \n", "max 0.092937 47.600000 41.400000 92.600000 88.000000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " income # total covid cases # covid cases per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 45378.542205 2.117232e+04 24204.352233 \n", "std 2620.626459 6.098195e+04 5334.500991 \n", "min 41161.000000 3.200000e+01 6911.447084 \n", "25% 43107.500000 2.489500e+03 20834.832707 \n", "50% 45207.000000 6.889000e+03 24120.603015 \n", "75% 47539.500000 1.574600e+04 27296.568873 \n", "max 50134.000000 1.209302e+06 61745.301879 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 261.272095 8.303669e+04 356.008937 \n", "std 680.338463 2.086682e+05 139.711464 \n", "min 1.000000 4.620000e+02 14.000000 \n", "25% 36.000000 1.075750e+04 259.500000 \n", "50% 100.000000 2.887900e+04 359.000000 \n", "75% 229.000000 6.443850e+04 447.000000 \n", "max 12823.000000 2.716939e+06 994.000000 \n", "\n", " case_fatality_rate obesity inactivity trump biden \n", "count 1007.000000 1007.000000 1007.000000 998.000000 998.000000 \n", "mean 0.015158 30.918272 25.678153 67.789178 30.421844 \n", "std 0.006599 3.699632 4.176813 13.072600 12.934935 \n", "min 0.000809 14.800000 11.200000 14.100000 5.000000 \n", "25% 0.010992 28.700000 23.000000 60.525000 21.000000 \n", "50% 0.014220 31.100000 25.700000 70.300000 27.900000 \n", "75% 0.018293 33.400000 28.300000 77.300000 37.600000 \n", "max 0.048193 41.800000 37.800000 94.000000 85.000000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", " " ], "text/plain": [ " income # total covid cases # covid cases per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 60131.244290 4.693174e+04 23724.142668 \n", "std 10901.921415 1.326955e+05 6079.872377 \n", "min 50154.000000 6.300000e+01 3274.314819 \n", "25% 52596.000000 3.365500e+03 20339.539455 \n", "50% 56270.000000 1.146300e+04 23930.162156 \n", "75% 62830.500000 4.015200e+04 26817.374363 \n", "max 125635.000000 2.751220e+06 113017.751479 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1007.000000 1.007000e+03 1007.000000 \n", "mean 502.025819 2.008166e+05 278.285998 \n", "std 1543.128483 5.287700e+05 133.159375 \n", "min 1.000000 1.680000e+02 11.000000 \n", "25% 40.000000 1.461700e+04 182.500000 \n", "50% 127.000000 4.890400e+04 270.000000 \n", "75% 414.000000 1.692480e+05 351.000000 \n", "max 31736.000000 1.003911e+07 1212.000000 \n", "\n", " case_fatality_rate obesity inactivity trump biden \n", "count 1007.000000 1007.000000 1007.000000 983.000000 983.000000 \n", "mean 0.012043 28.980735 22.745283 61.185453 36.840793 \n", "std 0.006049 4.308677 4.360248 16.942304 16.875312 \n", "min 0.000527 11.800000 8.100000 4.000000 3.100000 \n", "25% 0.008142 26.800000 20.100000 50.700000 24.650000 \n", "50% 0.011547 29.400000 22.900000 63.100000 35.000000 \n", "75% 0.014857 32.100000 25.700000 73.400000 46.950000 \n", "max 0.081481 39.700000 37.800000 96.200000 94.000000 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "view_partitions(merged, 'income',\n", " cols=['income','# total covid cases', '# covid cases per 100k', '# covid deaths',\n", " 'population', '# covid deaths per 100k', 'case_fatality_rate',\n", " 'obesity', 'inactivity', 'trump', 'biden'])" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "wU_sFjs5KA0D", "outputId": "93f56cf7-4681-4961-b454-4b7733b15626" }, "outputs": [ { "data": { 
"text/html": [ "\n", "
\n", " " ], "text/plain": [ " trump income # total covid cases # covid cases per 100k \\\n", "count 995.000000 995.000000 9.950000e+02 995.000000 \n", "mean 47.241709 50299.426131 5.845852e+04 23489.186345 \n", "std 11.340094 15572.223128 1.435519e+05 5632.102950 \n", "min 4.000000 21658.000000 1.840000e+02 7433.739051 \n", "25% 40.250000 39014.500000 5.238500e+03 20108.638558 \n", "50% 50.100000 48413.000000 1.591700e+04 23626.943005 \n", "75% 56.700000 57390.000000 5.389150e+04 26781.198391 \n", "max 61.000000 125635.000000 2.751220e+06 60447.419089 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 995.000000 9.950000e+02 995.000000 \n", "mean 646.194975 2.454354e+05 305.234171 \n", "std 1677.245394 5.579268e+05 157.485499 \n", "min 2.000000 7.680000e+02 11.000000 \n", "25% 69.500000 2.243150e+04 190.500000 \n", "50% 169.000000 6.945000e+04 281.000000 \n", "75% 581.500000 2.343480e+05 389.000000 \n", "max 31736.000000 1.003911e+07 1064.000000 \n", "\n", " case_fatality_rate obesity inactivity \n", "count 995.000000 995.000000 995.000000 \n", "mean 0.013401 29.725327 23.284322 \n", "std 0.007672 5.660119 5.411057 \n", "min 0.001057 11.800000 8.100000 \n", "25% 0.008494 26.200000 19.700000 \n", "50% 0.012284 29.700000 23.000000 \n", "75% 0.016087 33.200000 26.500000 \n", "max 0.066603 47.600000 40.800000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " trump income # total covid cases # covid cases per 100k \\\n", "count 1005.000000 1005.000000 1005.000000 1005.000000 \n", "mean 68.313831 46840.147264 12720.003980 24710.531247 \n", "std 3.905655 9379.280941 17651.768209 5359.878850 \n", "min 61.000000 25768.000000 85.000000 3274.314819 \n", "25% 65.200000 40422.000000 3191.000000 21552.770707 \n", "50% 68.600000 45719.000000 7100.000000 25026.614998 \n", "75% 71.700000 51839.000000 14953.000000 28180.451997 \n", "max 74.500000 97936.000000 202690.000000 55122.917010 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1005.000000 1005.000000 1005.000000 \n", "mean 174.843781 50517.506468 377.168159 \n", "std 224.233632 66151.583773 144.713271 \n", "min 1.000000 403.000000 14.000000 \n", "25% 49.000000 13260.000000 279.000000 \n", "50% 106.000000 28529.000000 371.000000 \n", "75% 216.000000 60353.000000 456.000000 \n", "max 2964.000000 636234.000000 1229.000000 \n", "\n", " case_fatality_rate obesity inactivity \n", "count 1005.000000 1005.000000 1005.000000 \n", "mean 0.016190 31.632537 26.274229 \n", "std 0.008867 3.556402 4.149821 \n", "min 0.000527 19.500000 14.100000 \n", "25% 0.011377 29.400000 23.300000 \n", "50% 0.014282 31.700000 26.000000 \n", "75% 0.018604 34.000000 29.100000 \n", "max 0.085749 42.300000 39.900000 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ], "text/plain": [ " trump income # total covid cases # covid cases per 100k \\\n", "count 1005.000000 1005.000000 1005.000000 1005.000000 \n", "mean 80.786965 43878.374129 5059.204975 24627.390904 \n", "std 4.289503 9102.326701 6019.201974 6383.615594 \n", "min 74.500000 23047.000000 32.000000 4916.885879 \n", "25% 77.500000 37417.000000 1364.000000 20633.467594 \n", "50% 80.100000 42310.000000 3257.000000 24538.953707 \n", "75% 83.600000 49184.000000 6730.000000 28499.655884 \n", "max 96.200000 86354.000000 57505.000000 113017.751479 \n", "\n", " # covid deaths population # covid deaths per 100k \\\n", "count 1005.000000 1005.000000 1005.000000 \n", "mean 85.427861 20120.834826 422.151244 \n", "std 103.655065 23113.800617 162.744774 \n", "min 1.000000 168.000000 27.000000 \n", "25% 22.000000 5925.000000 327.000000 \n", "50% 54.000000 13287.000000 422.000000 \n", "75% 113.000000 25924.000000 525.000000 \n", "max 1435.000000 223233.000000 1212.000000 \n", "\n", " case_fatality_rate obesity inactivity \n", "count 1005.000000 1005.000000 1005.000000 \n", "mean 0.017918 31.537114 28.320597 \n", "std 0.008465 3.592986 4.534053 \n", "min 0.001300 19.400000 13.600000 \n", "25% 0.012937 28.900000 25.000000 \n", "50% 0.016549 31.500000 27.900000 \n", "75% 0.021352 33.900000 31.900000 \n", "max 0.092937 43.200000 41.400000 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "view_partitions(merged, 'trump',\n", " cols=['trump','income','# total covid cases', '# covid cases per 100k', '# covid deaths',\n", " 'population', '# covid deaths per 100k', 'case_fatality_rate',\n", " 'obesity', 'inactivity',])" ] }, { "cell_type": "markdown", "metadata": { "id": "8kcwLx7LKIhU" }, "source": [ "We split the data into equal lower, middle, and upper quantiles, first by obesity and then by inactivity. The average death rates of the counties in these partitions are positively correlated with both of these features. 
This was expected, as preexisting health conditions (obesity) and health risks (inactivity) increase all-cause mortality and also have a strong effect on how serious a COVID-19 infection can be. Finally, we see that income has an even stronger relationship with the death rate, though here the correlation is negative. Obesity and inactivity are both negatively correlated with income as well. The relationship between voting for Trump and income is not a strong one, though there is a positive correlation between Trump voting and obesity, inactivity, and the COVID-19 death rate. " ] }, { "cell_type": "markdown", "metadata": { "id": "onZEIEqYKVAn" }, "source": [ "# 6. Data Weakness\n", "\n", "Comparing the populations between the groups shows that this data is not treated as granularly as would be ideal. Counties with very small populations are weighted the same as counties with very large populations when computing the mean, so rural areas are overrepresented in the nationwide averages. This also explains why the Trump vs. Biden vote share is skewed so far from the well-known national figure based on the popular vote.\n", "\n", "Also, differences in county-level income may correlate more with population density than with an individual's socioeconomic status. First, a higher income might not go as far towards standard of living in the city as it does in rural areas. Second, by using the average income over the whole county, income inequality within that county is not factored in. There could be many low-income individuals living alongside many high-income individuals in the same county. " ] }, { "cell_type": "markdown", "metadata": { "id": "oWmVkVxeK3yd" }, "source": [ "# 7. Future Work\n", "\n", "We have gathered, parsed, and explored the data. A natural next step is to predict some behavior of the data (e.g. 
how a particular county will respond to COVID-19 on a weekly basis).\n", "\n", "Alternatively, we could be interested in inference, whereby we are more concerned with trying to understand why and how a system behaves the way it does. We might wish to understand which factors correlate most strongly with a certain event and which may contribute to causing it. This could give us insights into where certain inequalities persist." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "project-notebook.ipynb", "provenance": [], "toc_visible": true }, "interpreter": { "hash": "6136f57e9522e1f7c9a1d00a6950a9c42424b667ef418888919cda0b3f236728" }, "kernelspec": { "display_name": "Python 3.8.8 ('base')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.8" } }, "nbformat": 4, "nbformat_minor": 0 }