--- title: "Homework 1: Example key" author: "Rebecca Ferrell" date: "April 6, 2016" output: html_document: theme: paper --- ```{r load_libraries, echo=FALSE, warning=FALSE, message=FALSE} library(pander) ``` ## Data description The `swiss` data contain several variables pertaining to socioeconomic information of `r nrow(swiss)` French-speaking provinces of Switzerland in 1888. These include: * **Fertility**: "a common standardized fertility measure" * **Agriculture**: percent of males involved in agriculture as occupation * **Examination** : percent of draftees receiving highest mark on army examination * **Education**: percent of draftees having education beyond primary school * **Catholic**: percent of the population that is Catholic (as opposed to Protestant) * **Infant Mortality**: percentage of live births who live less than 1 year. ```{r, echo=FALSE} pander(head(swiss), split.tables = 120, caption = "Example observations within the Swiss data") ``` We can observe joint relationships between these variables in the figure below: ```{r make_pairs_plot, echo=FALSE, fig.width=8, fig.height=8} pairs(swiss, main = "Variables in swiss data", cex = 0.5, pch = 16, col = "dodgerblue1") ``` ## Examination Among other things, we observe that the `Examination` variable --- the percent of draftees receiving the highest mark on their army exams --- has strong negative associations with fertility (Pearson correlation = `r round(cor(swiss$Examination, swiss$Fertility), 2)`), agriculture (correlation = `r round(cor(swiss$Examination, swiss$Agriculture), 2)`), and percent Catholic (correlation = `r round(cor(swiss$Examination, swiss$Catholic), 2)`), and a strong positive association with post-primary draftee education rates (correlation = `r round(cor(swiss$Examination, swiss$Education), 2)`). It is weakly associated with infant mortality rates (correlation = `r round(cor(swiss$Examination, swiss$Infant.Mortality), 2)`). Looking more closely into this variable, we see that the mean percentage of draftees achieving the top mark is `r round(mean(swiss$Examination), 0)`% averaging over provinces, and the standard deviation is `r round(sd(swiss$Examination), 1)`%, indicating a fair amount of dispersion in this percentage across provinces. The values range from `r round(min(swiss$Examination), 0)`% to `r round(max(swiss$Examination), 0)`%. We can observe this spread in the figure below, which also illustrates the unimodality of the distribution. ```{r plot_exam, echo=FALSE, fig.width=6, fig.height=4} hist(swiss$Examination, xlab = "% of draftees getting highest exam score", main = "Distribution of army exam high-scoring rates", col = "dodgerblue1") ```