<h1>COVID-19</h1>

Active reporting on COVID-19 deaths can help inform civil and government measures. We pull the CDC's provisional COVID-19 death counts, which are reported by state and age group, and build an interactive nationwide map of deaths by state.

**Steps**

**1. Import Libraries**
**2. Process Data**
**3. Map Data**
**4. Feedback**

<h2>CDC Data</h2>

https://data.cdc.gov/NCHS/Provisional-COVID-19-Death-Counts-by-Sex-Age-and-S/9bhg-hcku

In [1]:
# Install dependencies first so the imports below can find them.
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes
!pip install lxml
!pip install beautifulsoup4
!pip install geocoder

import numpy as np
import pandas as pd
import requests
import folium
from geopy.geocoders import Nominatim

# Show every column and row when displaying DataFrames.
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)


Pull the data and check the column types.

In [145]:
df = pd.read_csv('https://data.cdc.gov/api/views/9bhg-hcku/rows.csv?accessType=DOWNLOAD')
df.dtypes

Data as of                                   object
Start week                                   object
End Week                                     object
State                                        object
Sex                                          object
Age group                                    object
COVID-19 Deaths                             float64
Total Deaths                                float64
Pneumonia Deaths                            float64
Pneumonia and COVID-19 Deaths               float64
Influenza Deaths                            float64
Pneumonia, Influenza, or COVID-19 Deaths    float64
Footnote                                     object
dtype: object

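Trim the data: keep only the state, age group, and COVID-19 death columns, total the deaths per state, and abbreviate the state names.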

In [146]:
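# Drop every column except State, Age group, and COVID-19 Deaths.
df1 = df.drop(columns=['Data as of', 'Start week', 'End Week', 'Sex', 'Total Deaths', 'Pneumonia Deaths', 
      'Pneumonia and COVID-19 Deaths', 'Influenza Deaths', 'Pneumonia, Influenza, or COVID-19 Deaths', 'Footnote'])
df1 = df1.rename(columns={"COVID-19 Deaths": "Deaths", "Age group": "Age"})
# Assign the result back instead of calling replace(inplace=True) on a column selection.
df1["Age"] = df1["Age"].replace({"Under 1 year": "0-1 years"})
df1["State"] = df1["State"].replace({"New York City": "New York"})
# Remove the national and all-age rows before totaling.
df1 = df1[(df1.State != 'United States') & (df1.Age != 'All ages') & (df1.Age != 'All Ages')]
# Sum deaths per state; selecting 'Deaths' keeps the text column Age out of the sum.
df1 = df1.groupby('State', as_index=False)['Deaths'].sum()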
df1['State'] = df1['State'].replace({
    'Alabama': 'AL',
    'Alaska': 'AK',
    'American Samoa': 'AS',
    'Arizona': 'AZ',
    'Arkansas': 'AR',
    'California': 'CA',
    'Colorado': 'CO',
    'Connecticut': 'CT',
    'Delaware': 'DE',
    'District of Columbia': 'DC',
    'Florida': 'FL',
    'Georgia': 'GA',
    'Guam': 'GU',
    'Hawaii': 'HI',
    'Idaho': 'ID',
    'Illinois': 'IL',
    'Indiana': 'IN',
    'Iowa': 'IA',
    'Kansas': 'KS',
    'Kentucky': 'KY',
    'Louisiana': 'LA',
    'Maine': 'ME',
    'Maryland': 'MD',
    'Massachusetts': 'MA',
    'Michigan': 'MI',
    'Minnesota': 'MN',
    'Mississippi': 'MS',
    'Missouri': 'MO',
    'Montana': 'MT',
    'Nebraska': 'NE',
    'Nevada': 'NV',
    'New Hampshire': 'NH',
    'New Jersey': 'NJ',
    'New Mexico': 'NM',
    'New York': 'NY',
    'North Carolina': 'NC',
    'North Dakota': 'ND',
    'Northern Mariana Islands':'MP',
    'Ohio': 'OH',
    'Oklahoma': 'OK',
    'Oregon': 'OR',
    'Pennsylvania': 'PA',
    'Puerto Rico': 'PR',
    'Rhode Island': 'RI',
    'South Carolina': 'SC',
    'South Dakota': 'SD',
    'Tennessee': 'TN',
    'Texas': 'TX',
    'Utah': 'UT',
    'Vermont': 'VT',
    'Virgin Islands': 'VI',
    'Virginia': 'VA',
    'Washington': 'WA',
    'West Virginia': 'WV',
    'Wisconsin': 'WI',
    'Wyoming': 'WY'
})
df1

Unnamed: 0,State,Deaths
0,AL,638.0
1,AK,0.0
2,AZ,799.0
3,AR,82.0
4,CA,3793.0
5,CO,1310.0
6,CT,2155.0
7,DE,346.0
8,DC,373.0
9,FL,2297.0
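
One thing worth checking in the per-state totals is how missing counts behave. The sketch below uses toy values (not CDC figures) and hypothetical state labels `A` and `B` to show that pandas' default `sum()` skips `NaN`, so a state whose rows are all missing is reported as `0.0`. Passing `min_count=1` keeps such a total as `NaN` instead. Whether any `0.0` in the table above arises this way is an assumption that would need to be checked against the raw rows.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): NaN stands for a count that is missing in the source.
toy = pd.DataFrame({
    "State": ["A", "A", "B", "B"],
    "Deaths": [10.0, np.nan, np.nan, np.nan],
})

# Default behaviour: NaN is skipped, so state B (all missing) sums to 0.0.
default_sum = toy.groupby("State")["Deaths"].sum()

# min_count=1 requires at least one non-missing value; otherwise the total stays NaN.
strict_sum = toy.groupby("State")["Deaths"].sum(min_count=1)

print(default_sum)
print(strict_sum)
```

With `min_count=1`, a state with no usable rows can be told apart from a state with a true total of zero.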


In [147]:
# Repeat the cleanup, keeping full state names for the coordinate lookup.
df2 = df.drop(columns=['Data as of', 'Start week', 'End Week', 'Sex', 'Total Deaths', 'Pneumonia Deaths', 
      'Pneumonia and COVID-19 Deaths', 'Influenza Deaths', 'Pneumonia, Influenza, or COVID-19 Deaths', 'Footnote'])
df2 = df2.rename(columns={"COVID-19 Deaths": "Deaths", "Age group": "Age"})
df2["Age"] = df2["Age"].replace({"Under 1 year": "0-1 years"})
df2["State"] = df2["State"].replace({"New York City": "New York"})
df2 = df2[(df2.State != 'United States') & (df2.Age != 'All ages') & (df2.Age != 'All Ages')]
df2 = df2.groupby('State', as_index=False)['Deaths'].sum()
# Load state coordinates; selecting three columns also drops the 'state' code column.
df3 = pd.read_html('https://developers.google.com/public-data/docs/canonical/states_csv')[0]
df3 = df3.rename(columns={"latitude": "Latitude", "longitude": "Longitude", "name": "State"})
df3 = df3[['State', 'Latitude', 'Longitude']]
# Inner merge: a state missing from either table is dropped from the result.
df5 = pd.merge(df2, df3, on='State')
df5

Unnamed: 0,State,Deaths,Latitude,Longitude
0,Alabama,638.0,32.318231,-86.902298
1,Alaska,0.0,63.588753,-154.493062
2,Arizona,799.0,34.048928,-111.093731
3,Arkansas,82.0,35.20105,-91.831833
4,California,3793.0,36.778261,-119.417932
5,Colorado,1310.0,39.550051,-105.782067
6,Connecticut,2155.0,41.603221,-73.087749
7,Delaware,346.0,38.910832,-75.52767
8,District of Columbia,373.0,38.905985,-77.033418
9,Florida,2297.0,27.664827,-81.515754
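
Because the merge on `State` is an inner join, any state name that has no row in the coordinates table disappears without warning. A minimal sketch of one way to surface such rows, using toy frames and a hypothetical `Example Territory` entry, is a left merge with `indicator=True`:

```python
import pandas as pd

# Toy frames; "Example Territory" is hypothetical and deliberately has no coordinates.
deaths = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Example Territory"],
    "Deaths": [638.0, 0.0, 5.0],
})
coords = pd.DataFrame({
    "State": ["Alabama", "Alaska"],
    "Latitude": [32.318231, 63.588753],
    "Longitude": [-86.902298, -154.493062],
})

# A left merge keeps every row of `deaths`; indicator=True adds a `_merge` column
# whose value is "left_only" for rows that found no match in `coords`.
checked = pd.merge(deaths, coords, on="State", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "State"].tolist()

print(unmatched)
```

Any names listed in `unmatched` would be absent from the marker layer unless coordinates are supplied for them.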


In [148]:
url = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data'
state_geo = f'{url}/us-states.json'

In [149]:
Covid_19 = folium.Map(location=[37.0902, -95.7129], tiles='Stamen Toner', zoom_start=3,)
Covid_19.choropleth(geo_data=state_geo,
    data=df1,
    columns=['State', 'Deaths'],
    key_on='feature.id',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='CDC COVID 19 Deaths'
)

folium.LayerControl().add_to(Covid_19)

# Add a clickable circle marker showing the death count at each state's coordinates.
for lat, lng, state, deaths in zip(df5['Latitude'], df5['Longitude'], df5['State'], df5['Deaths']):
    label = 'State: {} Deaths: {}'.format(state, deaths)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_opacity=0.7).add_to(Covid_19)

Covid_19
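
The choropleth joins the `State` column to the GeoJSON through `key_on='feature.id'`, so a state code with no matching feature `id` has no shape to color. The sketch below checks for such codes with plain Python. The `sample_geo` dictionary is a tiny stand-in, assuming the real `us-states.json` follows the same `features[].id` layout, and `codes_missing_from_geo` is a hypothetical helper, not part of folium.

```python
# Tiny stand-in for us-states.json (assumed layout: each feature carries an `id`).
sample_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "AL", "properties": {"name": "Alabama"}},
        {"type": "Feature", "id": "AK", "properties": {"name": "Alaska"}},
    ],
}

# Codes as they might appear in the State column; "PR" is absent from the sample above.
state_codes = ["AL", "AK", "PR"]


def codes_missing_from_geo(geo, codes):
    """Return the codes that have no feature with a matching `id`."""
    feature_ids = {feature["id"] for feature in geo["features"]}
    return [code for code in codes if code not in feature_ids]


missing = codes_missing_from_geo(sample_geo, state_codes)
print(missing)
```

Running the same check against the downloaded GeoJSON and `df1['State']` would show which rows cannot appear on the shaded layer.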

**Feedback: The CDC data is limited. Some counts are suppressed, and the dataset's own footnote says so: "..suppressed in accordance with NCHS confidentiality standards." The data used here was published May 7, around the time this notebook was written.**
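
To put a number on how limited the data is, one option is to count the rows whose footnote mentions suppression. This is a rough sketch on toy rows. It assumes the suppression notice appears in the `Footnote` column and contains the word "suppressed", which should be verified against the actual file.

```python
import numpy as np
import pandas as pd

notice = "..suppressed in accordance with NCHS confidentiality standards."

# Toy rows (hypothetical values); a missing Footnote means nothing was flagged.
toy = pd.DataFrame({
    "State": ["A", "A", "B"],
    "Deaths": [10.0, np.nan, np.nan],
    "Footnote": [np.nan, notice, notice],
})

# Flag rows whose footnote mentions suppression; na=False treats empty footnotes as not flagged.
is_suppressed = toy["Footnote"].str.contains("suppressed", case=False, na=False)

suppressed_share = is_suppressed.mean()                # fraction of all rows flagged
per_state = is_suppressed.groupby(toy["State"]).sum()  # flagged rows per state

print(suppressed_share)
print(per_state)
```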