{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Final assignment\n", "\n", "- The final assignment is a problem set in which I try to expose you to some issues you might experience in the real world. The goal is to compile data and understand any relationship between population, population density and the spread of COVID-19." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Load in basic packages and import the covid data.\n", "- Import the `NumPy` and `pandas` packages.\n", "- Import the COVID-19 data from the following link: \n", " - Import the COVID-19 data from the following link: https://raw.githubusercontent.com/nytimes/covid-19-data/master/live/us-counties.csv\n", " - This data was compiled by the [New York Times](https://github.com/nytimes).\n", "- Import this file as a variable named `covid`" ] }, { "cell_type": "code", "execution_count": 93, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Explore the data.\n", "- Display the columns in the dataframe as a list.\n", "- Make a variable `tableCases` in which you use the `pandas` attribute `pivot_table` to display the total number of cases in each state using the additional argument `aggfunc = np.sum`.\n", "- Make a variable `tableDeaths` in which you use the `pandas` attribute `pivot_table` to display the average number of deaths by state using the additional argument `aggfunc = np.sum`.\n", "- use `print()` to display both of these tables.\n" ] }, { "cell_type": "code", "execution_count": 94, "metadata": {}, "outputs": [], "source": [ "# Display the columns below this line:" ] }, { "cell_type": "code", "execution_count": 95, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Understanding `pivot_table` more.\n", "- Run the function `.columns` on your two tables.\n", " - Notice how 'cases' and 'deaths' are the only columns displayed. This is because 'state' is used as index.\n", "- Run the function `.reset_index()` on both `tableCases` and `tableDeaths` and update these variables.\n", "- Now look at your tables again." ] }, { "cell_type": "code", "execution_count": 96, "metadata": {}, "outputs": [], "source": [ "# Review the columns below this line:\n" ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Merge your two dataframes.\n", "- Create a new variable `casesDeaths` which merges the two tables you have created.\n", "\n", "- Display this new dataframe." ] }, { "cell_type": "code", "execution_count": 98, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. Plot some data.\n", "\n", "- Load in the `matplotlib.pyplot as plt`.\n", "- Conduct the following steps in order:\n", " - Using the `casesDeaths` variable run `plot.bar(x= ,y=)`. \n", " - Put `state` in `x` and `['cases','deaths']` in `y`.\n", " - Now from `plt` change the title of the table to 'COVID-19 Deaths and Cases by State' and the y-axis title to 'Total' using the attributes `.title` and `.ylabel`." ] }, { "cell_type": "code", "execution_count": 99, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 6. Plot a subset.\n", "- Now subset the `casesDeaths` variable to a new variable `myFive` and select five states you are interested in and plot those as you plotted `casesDeaths` above.\n", " - You can use brackets to subset standard and within the brackets use the `.isin()` function of the `.state` column." ] }, { "cell_type": "code", "execution_count": 100, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 7. Load the `mo_dict` data below.\n", "- Run the code below to create the `mo_dict` variable." ] }, { "cell_type": "code", "execution_count": 101, "metadata": {}, "outputs": [], "source": [ "mo_dict = {'001': {'county': 'Adair County', 'population': 25607, 'area': 568.0}, '003': {'county': 'Andrew County', 'population': 17291, 'area': 435.0}, '005': {'county': 'Atchison County', 'population': 5685, 'area': 545.0},\n", " '007': {'county': 'Audrain County', 'population': 25529, 'area': 693.0}, '009': {'county': 'Barry County', 'population': 35597, 'area': 779.0}, '011': {'county': 'Barton County', 'population': 12402, 'area': 594.0},\n", " '013': {'county': 'Bates County', 'population': 17049, 'area': 848.0}, '015': {'county': 'Benton County', 'population': 19056, 'area': 706.0}, '017': {'county': 'Bollinger County', 'population': 12363, 'area': 621.0},\n", " '019': {'county': 'Boone County', 'population': 162642, 'area': 685.0}, '021': {'county': 'Buchanan County', 'population': 89201, 'area': 410.0}, '023': {'county': 'Butler County', 'population': 42794, 'area': 698.0},\n", " '025': {'county': 'Caldwell County', 'population': 9424, 'area': 429.0}, '027': {'county': 'Callaway County', 'population': 44332, 'area': 839.0}, '029': {'county': 'Camden County', 'population': 44002, 'area': 655.0},\n", " '031': {'county': 'Cape Girardeau County', 'population': 75674, 'area': 579.0}, '033': {'county': 'Carroll County', 'population': 9295, 'area': 695.0}, '035': {'county': 'Carter County', 'population': 6265, 'area': 508.0},\n", " '037': {'county': 'Cass County', 'population': 99478, 'area': 699.0}, '039': {'county': 'Cedar County', 'population': 13982, 'area': 476.0}, '041': {'county': 'Chariton County', 'population': 7831, 'area': 756.0},\n", " '043': {'county': 'Christian County', 'population': 77422, 'area': 563.0}, '045': {'county': 'Clark County', 'population': 7139, 'area': 507.0}, '047': {'county': 'Clay County', 'population': 221939, 'area': 396.0},\n", " '049': {'county': 'Clinton County', 'population': 20743, 'area': 419.0}, '051': {'county': 'Cole County', 'population': 75990, 'area': 392.0}, '053': {'county': 'Cooper County', 'population': 17601, 'area': 565.0},\n", " '055': {'county': 'Crawford County', 'population': 24696, 'area': 743.0}, '057': {'county': 'Dade County', 'population': 7883, 'area': 490.0}, '059': {'county': 'Dallas County', 'population': 16777, 'area': 542.0},\n", " '061': {'county': 'Daviess County', 'population': 8433, 'area': 567.0}, '063': {'county': 'DeKalb County', 'population': 12892, 'area': 424.0}, '065': {'county': 'Dent County', 'population': 15657, 'area': 754.0},\n", " '067': {'county': 'Douglas County', 'population': 13684, 'area': 815.0}, '069': {'county': 'Dunklin County', 'population': 31953, 'area': 546.0}, '071': {'county': 'Franklin County', 'population': 101492, 'area': 922.0},\n", " '073': {'county': 'Gasconade County', 'population': 15222, 'area': 520.0}, '075': {'county': 'Gentry County', 'population': 6738, 'area': 492.0}, '077': {'county': 'Greene County', 'population': 275174, 'area': 675.0},\n", " '079': {'county': 'Grundy County', 'population': 10261, 'area': 436.0}, '081': {'county': 'Harrison County', 'population': 8957, 'area': 725.0}, '083': {'county': 'Henry County', 'population': 22272, 'area': 702.0},\n", " '085': {'county': 'Hickory County', 'population': 9627, 'area': 399.0}, '087': {'county': 'Holt County', 'population': 4912, 'area': 462.0}, '089': {'county': 'Howard County', 'population': 10144, 'area': 466.0},\n", " '091': {'county': 'Howell County', 'population': 40400, 'area': 928.0}, '093': {'county': 'Iron County', 'population': 10630, 'area': 551.0}, '095': {'county': 'Jackson County', 'population': 674158, 'area': 605.0},\n", " '097': {'county': 'Jasper County', 'population': 117404, 'area': 640.0}, '099': {'county': 'Jefferson County', 'population': 218733, 'area': 657.0}, '101': {'county': 'Johnson County', 'population': 52595, 'area': 831.0},\n", " '103': {'county': 'Knox County', 'population': 4131, 'area': 506.0}, '105': {'county': 'Laclede County', 'population': 35571, 'area': 766.0}, '107': {'county': 'Lafayette County', 'population': 33381, 'area': 629.0},\n", " '109': {'county': 'Lawrence County', 'population': 38634, 'area': 613.0}, '111': {'county': 'Lewis County', 'population': 10211, 'area': 505.0}, '113': {'county': 'Lincoln County', 'population': 52566, 'area': 630.0},\n", " '115': {'county': 'Linn County', 'population': 12761, 'area': 620.0}, '117': {'county': 'Livingston County', 'population': 15195, 'area': 535.0}, '121': {'county': 'Macon County', 'population': 15566, 'area': 804.0},\n", " '123': {'county': 'Madison County', 'population': 12226, 'area': 497.0}, '125': {'county': 'Maries County', 'population': 9176, 'area': 528.0}, '127': {'county': 'Marion County', 'population': 28781, 'area': 438.0},\n", " '119': {'county': 'McDonald County', 'population': 23083, 'area': 540.0}, '129': {'county': 'Mercer County', 'population': 3785, 'area': 454.0}, '131': {'county': 'Miller County', 'population': 24748, 'area': 592.0},\n", " '133': {'county': 'Mississippi County', 'population': 14358, 'area': 413.0}, '135': {'county': 'Moniteau County', 'population': 15607, 'area': 417.0}, '137': {'county': 'Monroe County', 'population': 8840, 'area': 646.0},\n", " '139': {'county': 'Montgomery County', 'population': 12236, 'area': 539.0}, '141': {'county': 'Morgan County', 'population': 20565, 'area': 598.0}, '143': {'county': 'New Madrid County', 'population': 18956, 'area': 678.0},\n", " '145': {'county': 'Newton County', 'population': 58114, 'area': 626.0}, '147': {'county': 'Nodaway County', 'population': 23370, 'area': 877.0}, '149': {'county': 'Oregon County', 'population': 10881, 'area': 792.0},\n", " '151': {'county': 'Osage County', 'population': 13878, 'area': 606.0}, '153': {'county': 'Ozark County', 'population': 9723, 'area': 747.0}, '155': {'county': 'Pemiscot County', 'population': 18296, 'area': 493.0},\n", " '157': {'county': 'Perry County', 'population': 18971, 'area': 475.0}, '159': {'county': 'Pettis County', 'population': 42201, 'area': 685.0}, '161': {'county': 'Phelps County', 'population': 45156, 'area': 673.0},\n", " '163': {'county': 'Pike County', 'population': 18516, 'area': 673.0}, '165': {'county': 'Platte County', 'population': 89322, 'area': 420.0}, '167': {'county': 'Polk County', 'population': 31137, 'area': 637.0},\n", " '169': {'county': 'Pulaski County', 'population': 52274, 'area': 547.0}, '171': {'county': 'Putnam County', 'population': 4979, 'area': 518.0}, '173': {'county': 'Ralls County', 'population': 10167, 'area': 471.0},\n", " '175': {'county': 'Randolph County', 'population': 25414, 'area': 482.0}, '177': {'county': 'Ray County', 'population': 23494, 'area': 570.0}, '179': {'county': 'Reynolds County', 'population': 6696, 'area': 811.0},\n", " '181': {'county': 'Ripley County', 'population': 14100, 'area': 630.0}, '183': {'county': 'St. Charles County', 'population': 360485, 'area': 561.0}, '185': {'county': 'St. Clair County', 'population': 9805, 'area': 677.0},\n", " '187': {'county': 'St. Francois County', 'population': 65359, 'area': 450.0}, '189': {'county': 'St. Louis County', 'population': 1000438, 'area': 508.0}, '510': {'county': 'St. Louis city County', 'population': 319294, 'area': 61.9},\n", " '186': {'county': 'St. Genevieve County', 'population': 18145, 'area': 502.0}, '195': {'county': 'Saline County', 'population': 23370, 'area': 756.0}, '197': {'county': 'Schuyler County', 'population': 4431, 'area': 308.0},\n", " '199': {'county': 'Scotland County', 'population': 4843, 'area': 438.0}, '201': {'county': 'Scott County', 'population': 39191, 'area': 421.0}, '203': {'county': 'Shannon County', 'population': 8441, 'area': 1004.0},\n", " '205': {'county': 'Shelby County', 'population': 6373, 'area': 501.0}, '207': {'county': 'Stoddard County', 'population': 29968, 'area': 827.0}, '209': {'county': 'Stone County', 'population': 32202, 'area': 463.0},\n", " '211': {'county': 'Sullivan County', 'population': 6714, 'area': 651.0}, '213': {'county': 'Taney County', 'population': 51675, 'area': 632.0}, '215': {'county': 'Texas County', 'population': 26008, 'area': 1179.0},\n", " '217': {'county': 'Vernon County', 'population': 21159, 'area': 834.0}, '219': {'county': 'Warren County', 'population': 32513, 'area': 432.0}, '221': {'county': 'Washington County', 'population': 25195, 'area': 760.0},\n", " '223': {'county': 'Wayne County', 'population': 13521, 'area': 761.0}, '225': {'county': 'Webster County', 'population': 36202, 'area': 593.0}, '227': {'county': 'Worth County', 'population': 2171, 'area': 266.0},\n", " '229': {'county': 'Wright County', 'population': 18815, 'area': 682.0}}\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 8. Change dictionary to panda\n", "- Take the `mo_dict` file and change it to a panda object named `mo_counties`\n", "- Transpose the data so that the `county population` and `area` are the columns and each county is a row.\n", "- Be sure to overwrite your panda file and display your updated (transposed) dataframe." ] }, { "cell_type": "code", "execution_count": 102, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 9. Look at the unique counties.\n", "\n", "- Look at the counties in your `mo_counties` data.\n", "- Display the number of unique counties in Missouri." ] }, { "cell_type": "code", "execution_count": 103, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 10. Edit data for merge.\n", "\n", "- Notice that the `covid` data does not include the word 'county' after the county name.\n", "- Edit the data such that the word ' County' is added to each county in the `covid` data.\n", "- Display your updated data frame.\n" ] }, { "cell_type": "code", "execution_count": 104, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 11. Merge dataframes.\n", "- Merge the `covid` dataframe and the `mo_counties` dataframe.\n", "- Be sure to subset the `covid` dataframe to only counties in Missouri because multiple states have the same county names.\n", " - You can do this without making a new variable by performing the subset in the merge call." ] }, { "cell_type": "code", "execution_count": 105, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 12. Make a new variable.\n", "\n", "- Add a new column to your `covidMO` dataframe called `density`.\n", "- This variable is calculated by dividing the `population` variable by the `area` variable\n", " - *You can do division element wise and just divide `.population` by `.area`.*" ] }, { "cell_type": "code", "execution_count": 106, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 13. Create a scatter plot.\n", "- Make a scatter plot using `panda` attributes where you observe the relationship between `density` (x) and `cases` (y).\n", "- Add a title to the plot and include useful axis names.\n", "- Make the clustering of the points on the plot more visible by changing their transparency." ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 14. Look at a subset of the data.\n", "- Load in the `seaborn` packages `as sns`.\n", "- Use the `.scatterplot` attribute of seaborn to make a density/cases plot similar to the one above.\n", "- Subset down to the following counties: 'St. Charles County' , 'St. Louis County'. 'St. Louis city County, 'Webster County', 'Bollinger County','Cole County','Madison County'.\n", " - *Make sure to specify the data you are using in your call (you can subset the data here).*\n", "- Add the argument `hue = 'county'` to your `scatterplot` call.\n", "- Make sure the legend is outside of the plot.\n", " - This plot is larger than our previous exposure to the `.legend` attribute. You can change the value to (1.5, 1) instead of (1,1).\n", "- **Give the plot and axes useful names**." ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 15. Run the same plot with deaths instead of cases.\n", "- Change the axis and title names to reflect these changes." ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 16. Change the type of a column.\n", "- Change the `density` column to interger values (currently float) using the `.astype` attribute of `pandas`." ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 17. Run a regression.\n", "- Using the `statsmodels.api as smf` package run and display the regression 'cases ~ density' on the `covidMO` dataset." ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 18. Run the same regression with deaths as the outcome variable." ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 19. Save the `covidMO` dataset.\n", "- Save the `covidMO` dataset to a csv file named `covidMO.csv`." ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 20. Save your regression tables.\n", "- Import `from stargazer.stargazer import Stargazer` to use the `stargazer` package to export your table.\n", "- Print out the HTML code using the `.render_html()` attribute of `Stargazer`.\n", " - *Make sure to put brackets around your lm object for stargazer to process correctly*.\n", "- Put this code into 2 variables 'regresion1' for the 'cases\\~density' table and 'regression2' for the 'deaths\\~density' tables.\n", "- Save these two objects of HTML string code into 2 text files with the same name as the variables." ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [], "source": [ "# Type your solution below this line:\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Thank you for taking my course!\n", "- Push this problem set to your 'Final Assignment' GitHub repository.\n", "- If you have the time please provide an evaluation for this course." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.2" } }, "nbformat": 4, "nbformat_minor": 4 }