--- title: "Live coding 5" author: "Andrew Irwin" date: "2021-02-04 8:35" output: html_document: toc: true toc_float: true --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(tidyverse) library(kableExtra) ``` # Agenda * Questions arising from lessons, tasks * Examples from lessons 12, 13, 14: reshaping data, displaying tables, and getting help * Live coding examples of pivoting tools (`pivot_longer`, `pivot_wider`, `unite` and `separate`), formatting tables (`kable`, `kableExtra`) -- material for tasks 7 and 8 * Strategies for getting help and learning independently (for Assignment 2) The plan for the live coding is for us to work together to solve problems using tools from this week's lessons. As with last week, we'll start with some repeated examples from lessons to remind ourselves of the basics, then try to use the tools in new ways, and finally switch perspectives to try to solve a real problem with each tool. ## Reshaping data Let's use the `mpg` data from the tidyverse. It's always good to have a variety of data to work with to test your understanding of new functions, so I've collected the datasets we've been using (and a few more) in their own chapter called [Data sources](#data-sources). This brief way of summarizing the data is great for seeing as much of the data as possible in a small space. ```{r} glimpse(mpg) ``` The lower-density table version is probably a better way to look a table if we are going to think about reshaping from long to wide format. ```{r} mpg %>% head() ``` Let's make a table that lists manufacturer, model, and then in two columns the average city fuel consumption for each year in the data. First we compute average fuel consumption for each combination of manufacturer, model, and year. Then pivot the data to make a wider table (it will be 38 rows by 4 columns when we are done). Store the result in your environment by giving the new table a name, so that we can use formatting tools in the next section to display it nicely. ```{r} table1 <- mpg %>% group_by(model, manufacturer, year) %>% summarise(avg_mpg_city = mean(cty)) %>% pivot_wider(names_from = year, values_from = avg_mpg_city) table1 ``` Clean up the "trans"mission column to put the parenthetical codes (av, l5, m6, etc) in their own column using separate. ```{r} mpg %>% separate(trans, into = c("trans_type", "trans_code"), sep ="\\(", remove = FALSE) %>% mutate(trans_code = str_remove(trans_code, "\\)")) ``` To practice make wide data longer, we should start with a different dataset. The `co2` data has monthly observations of atmospheric CO2 from 1959 to 1997. ```{r} co2 ``` This is a time series object, which is displayed as a rectangle, but does not have variable names and doesn't work like a data frame (or tibble). Suppose instead, someone gave you this data formatted as a data frame with each month in its own column: ```{r} co2M <- matrix(co2, 39, 12, byrow = TRUE) colnames(co2M) <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec") co2DF <- as_tibble(co2M) %>% mutate(Year = 1959:1997) %>% select(Year, everything()) co2DF ``` Now, use `pivot_longer` to reshape this into a "tidy" data frame with a column for year, a column for month, and a column for CO2 concentration. ```{r} co2DF %>% select(Jun:Aug, Oct:Dec, everything()) co2Y <- co2DF %>% # select(Year, Jun:Aug, Oct:Dec) %>% pivot_longer(cols = Jan:Dec, names_to= "month", values_to = "pCO2") co2Y ``` ## Formatting tables Compute the average annual pCO2 for each year, then show this as a table. Add a caption. ```{r} co2Y %>% group_by(Year) %>% summarise(mean_pCO2 = mean(pCO2)) %>% kable(digits = 1, caption = "Annual average pCO2.") %>% kable_styling(bootstrap = "striped", full_width = FALSE) ``` Compute the average by decade, discarding partial decades (1950s and 1990s). ```{r} co2Y %>% filter(Year >= 1960, Year < 1990) %>% mutate(decade = floor(Year/10)*10) %>% group_by(decade) %>% summarize(mean_pco2 = mean(pCO2)) %>% kable(digits = 2, caption = "Decadal average pCO2 at Mauna Loa") %>% kable_styling(full_width = FALSE, bootstrap_options = "striped") ``` Find the lowest and highest pCO2 for each year and show that in a table. ```{r} co2Y %>% group_by(Year) %>% summarize(minimum = min(pCO2), maximum = max(pCO2)) %>% kable() %>% kable_styling(bootstrap_options = "basic", full_width = FALSE) ``` Return to the `mpg` dataset and make some nice looking tables. ```{r} ``` Look at the examples [here](https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html) and experiment with some of the formatting options. Use `mpg` instead of `mtcars`. ```{r} ``` ## Getting help Pick a task related to something you know how to do and try to learn how to do this. Use the following methods: ### Read a help page in Rstudio A good idea here is to think of a function you know a bit about and read its help page. ```{r} round(0.5) signif(pi, 4) ``` ### Read the documentation for ggplot at the tidyverse Try geom_quantile - from the cheatsheet. ```{r} mpg %>% ggplot(aes(cty, hwy)) + geom_point() + geom_quantile() ``` ### Look at a figure in the R Graph Gallery And try to reconstruct it. ```{r} data=head(mtcars, 20) ggplot(data, aes(x=wt, y=mpg)) + geom_point() + # Show dots geom_text( label=rownames(data), nudge_x = 0.25, nudge_y = 0.25, check_overlap = FALSE ) ``` ```{r} library(gapminder) gapminder %>% filter(country %in% c("Canada", "Brazil", "China")) gapminder %>% filter(country == "Canada" | country == "Brazil" | country == "China") # same result ```