Due before class on April 9th.
Now that you’ve demonstrated your software is set up, the goal of this assignment is to practice transforming and exploring data.
`hw02` repository
Go here to fork the repo for homework 02.
FiveThirtyEight, a data journalism site devoted to politics, sports, science, economics, and culture, recently published a series of articles on gun deaths in America. Gun violence in the United States is a significant political issue, and while reducing gun deaths is a noble goal, we must first understand the causes and patterns in gun violence in order to craft appropriate policies. As part of the project, FiveThirtyEight collected data from the Centers for Disease Control and Prevention, as well as other governmental agencies and non-profits, on all gun deaths in the United States from 2012-2014.
I have included this dataset in the `rcfss` library on GitHub. To install the package, use the command `devtools::install_github("uc-cfss/rcfss")` in R. If you don’t already have the `devtools` library installed, you will get an error. Go back and install this first using `install.packages()`, then install `rcfss`. The gun deaths dataset can be loaded using `data("gun_deaths")`. Use the help function in R (`?gun_deaths`) to get detailed information on the variables and coding information.
Using your knowledge of `dplyr` and `ggplot2`, use summary statistics and graphs to answer the following questions:
While you are practicing exploratory data analysis, your final graphs should be appropriate for sharing with outsiders. That means your graphs should have clear labels, such as a title and axis labels (see `?labs` for details).
This is just a starting point. Consider adopting your own color scales, taking control of your legends (if any), playing around with themes, etc.
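As a rough illustration of a labeled graph, here is a minimal sketch. To stay runnable without `rcfss`, it rebuilds a small data frame from five of the location counts shown in the table later in this assignment. The names `place_counts` and `deaths_by_place` are made up for this example, and the choice of a flipped bar chart with `theme_minimal()` is just one option, not a required style.

```r
library(ggplot2)

# Assumption: a hand-typed subset of the location counts printed in this
# assignment, used here so the example does not depend on rcfss.
place_counts <- data.frame(
  place = c("Home", "Other specified", "Street",
            "Other unspecified", "Trade/service area"),
  n = c(60486, 13751, 11151, 8867, 3439)
)

# Bar chart ordered by count, with a title and axis labels set through labs()
deaths_by_place <- ggplot(place_counts, aes(x = reorder(place, n), y = n)) +
  geom_col() +
  coord_flip() +  # horizontal bars keep long location names readable
  labs(
    title = "Gun deaths in the United States (2012-2014), by location",
    x = "Location",
    y = "Number of deaths"
  ) +
  theme_minimal()

deaths_by_place
```

In your own work you would build the plot from `gun_deaths` directly rather than from hand-typed counts.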
When presenting tabular data (e.g. the output of `dplyr::summarize()`), make sure you format it correctly. Use the `kable()` function from the `knitr` package to format the table for the final document. For instance, this is a poorly presented table summarizing where gun deaths occurred:
library(tidyverse)
library(knitr)
library(rcfss)
# calculate total gun deaths by location
count(gun_deaths, place)
## # A tibble: 11 x 2
## place n
## <chr> <int>
## 1 Farm 470
## 2 Home 60486
## 3 Industrial/construction 248
## 4 Other specified 13751
## 5 Other unspecified 8867
## 6 Residential institution 203
## 7 School/instiution 671
## 8 Sports 128
## 9 Street 11151
## 10 Trade/service area 3439
## 11 <NA> 1384
Instead, use `kable()` to format the table, add a caption, and label the columns:
count(gun_deaths, place) %>%
  kable(caption = "Gun deaths in the United States (2012-2014), by location",
        col.names = c("Location", "Number of deaths"))
Location | Number of deaths |
---|---|
Farm | 470 |
Home | 60486 |
Industrial/construction | 248 |
Other specified | 13751 |
Other unspecified | 8867 |
Residential institution | 203 |
School/instiution | 671 |
Sports | 128 |
Street | 11151 |
Trade/service area | 3439 |
NA | 1384 |
Run `?kable` in the console to see additional options.
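For example, two of those options are `align` and `format.args`. The sketch below is only an illustration: it uses a hand-typed subset of the counts from the table above instead of loading `rcfss`, and `place_counts` and `place_table` are names invented for this example.

```r
library(knitr)

# Assumption: a hand-typed subset of the location counts shown above,
# used here so the example does not depend on rcfss.
place_counts <- data.frame(
  place = c("Home", "Other specified", "Street",
            "Other unspecified", "Trade/service area"),
  n = c(60486, 13751, 11151, 8867, 3439)
)

place_table <- kable(
  place_counts,
  caption = "Gun deaths in the United States (2012-2014), by location",
  col.names = c("Location", "Number of deaths"),
  align = c("l", "r"),                # left-align names, right-align counts
  format.args = list(big.mark = ",")  # passed to format(): thousands separator
)

place_table
```

Right-aligning the counts and adding a thousands separator makes large numbers easier to compare at a glance.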
Note that when viewed on GitHub, table captions will not show up. Just a (missing) feature of Markdown on GitHub 😢
Your assignment should be submitted as an R Markdown document. Don’t know what an R Markdown document is? Read this! Or this! I have included starter files for you to modify to complete the assignment, so you are not beginning completely from scratch.
Follow instructions on homework workflow. As part of the pull request, you’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc.
Check minus: Displays minimal effort. Doesn’t complete all components. Code is poorly written and not documented. Uses the same type of plot for each graph, or doesn’t use plots appropriate for the variables being analyzed. No record of commits other than the final push to GitHub.
Check: Solid effort. Hits all the elements. No clear mistakes. Easy to follow (both the code and the output). Nothing spectacular, either bad or good.
Check plus: Finished all components of the assignment correctly. Code is well-documented (both self-documented and with additional comments as necessary). Graphs and tables are properly labeled. Uses multiple commits to back up and show a progression in the work. Analysis is clear and easy to follow, either because graphs are labeled clearly or you’ve written additional text to describe how you interpret the output.
This work is licensed under the CC BY-NC 4.0 Creative Commons License.