---
title: "Solution _Using the tidyverse_"
author: "Maximilian H.K. Hesselbarth"
date: 2022/10/24
editor_options:
  chunk_output_type: console
---

```{r, setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE
)
library(downloadthis)
```

```{r tidyfigure, echo = FALSE, fig.align = "center", out.width = '65%'}
knitr::include_graphics("img/tidy_tools.png", auto_pdf = FALSE)
```

```{r download, echo = FALSE}
download_link(link = "https://raw.githubusercontent.com/mhesselbarth/advanced-r-workshop/main/solution-tidyverse.Rmd",
              button_label = "Download .Rmd file",
              button_type = "danger")
```

First, make sure you have installed the `tidyverse` and are able to load it. You also need to install the `palmerpenguins` package to access the data sets. Load both packages and check whether you have any `NAMESPACE` conflicts.

```{r load_libs}
library(tidyverse)
library(palmerpenguins)
```

My conflicts are `stats::filter()` and `stats::lag()`, which should not be an issue.

Have a look at the class and structure of the `penguins_raw` data set. Also, familiarize yourself with the columns. Since the data set is part of a package, you can also use the corresponding help page (`?` or F1).

```{r class}
class(penguins_raw)
str(penguins_raw)
names(penguins_raw)
?penguins_raw
```

Next, for easier data handling, clean the column names by removing all special characters (e.g., brackets, units, ...) and replacing all white spaces with an underscore. Last, make sure all column names are either all lower case or all upper case.

```{r names}
col_names <- names(penguins_raw) |>
  # remove a leading space plus anything in parentheses, e.g. " (mm)"
  stringr::str_remove_all(pattern = " \\([^()]+\\)") |>
  # replace the remaining white spaces with underscores
  stringr::str_replace_all(pattern = " ", replacement = "_") |>
  stringr::str_to_lower()

names(penguins_raw) <- col_names
```

Now, remove all rows that lack a stable isotope measurement (either Delta 15 N or Delta 13 C). Save the result into a new `tibble`.

```{r dropna}
penguins_cln <- tidyr::drop_na(penguins_raw, delta_15_n, delta_13_c)
```

Filter the data set to include only the 50% smallest individuals in terms of body mass. Select the individual id, species, the culmen dimensions, and the sex columns. Save this into a new `tibble` called `penguins_small` (or something similar).
```{r filter}
# keep individuals at or below the median body mass;
# na.rm = TRUE guards against missing body mass values
penguins_small <- dplyr::filter(penguins_cln,
                                body_mass <= quantile(body_mass, 0.5, na.rm = TRUE)) |>
  dplyr::select(individual_id, species, tidyselect::starts_with("culmen"), sex)
```

Create a new column (`culmen_class`) in which each male individual with a culmen length larger than 50 mm is identified by `1`, each female individual with a culmen length larger than 45 mm is identified by `2`, and all other individuals are identified by `0`.

```{r casewhen}
penguins_small <- dplyr::mutate(penguins_small,
                                culmen_class = dplyr::case_when(culmen_length > 50 & sex == "MALE" ~ 1,
                                                                culmen_length > 45 & sex == "FEMALE" ~ 2,
                                                                TRUE ~ 0))
```

Calculate the relative number (%) of individuals within each `culmen_class` group, the ratio between the minimum culmen length and depth, and the ratio between the maximum culmen length and depth. Then add a `sex_new` column (`culmen_class 1 = "male"`, `culmen_class 2 = "female"`, `culmen_class 0 = "mixed"`). Save the result as `penguins_sum`.

```{r groupby}
penguins_sum <- dplyr::group_by(penguins_small, culmen_class) |>
  dplyr::summarise(n_rel = dplyr::n() / nrow(penguins_small) * 100,
                   ratio_min = min(culmen_length) / min(culmen_depth),
                   ratio_max = max(culmen_length) / max(culmen_depth)) |>
  dplyr::mutate(sex_new = dplyr::case_when(culmen_class == 0 ~ "mixed",
                                           culmen_class == 1 ~ "male",
                                           culmen_class == 2 ~ "female"))
```

Now, combine `penguins` and `penguins_sum` into one `tibble` using `sex` and `sex_new` as key columns.

```{r join}
# left join keeps all rows of penguins; right join keeps all rows of penguins_sum
dplyr::left_join(penguins, penguins_sum, by = c("sex" = "sex_new"))

dplyr::right_join(penguins, penguins_sum, by = c("sex" = "sex_new"))
```

Reshape the `penguins_small` `tibble` from wide to long so that the culmen length and depth columns are tidy. The new column specifying the information should be named `fun`, and the new column containing the values should be named `measurements`. Save the result as the `penguins_small_long` `tibble`.
```{r pivot}
penguins_small_long <- tidyr::pivot_longer(penguins_small,
                                           c(culmen_length, culmen_depth),
                                           values_to = "measurements",
                                           names_to = "fun")
```

Use the `map` function to fit a linear model (`flipper_length_mm ~ body_mass_g`) to the `penguins` data set, separately for each species. Extract the R squared and p value and save the results in a `data.frame` that additionally includes the species. (Tip: Have a look at `broom::glance`; however, there are many ways to achieve this.)

```{r map}
library(broom)

dplyr::group_by(penguins, species) |>
  dplyr::group_split() |>
  purrr::map_dfr(function(i) {
    # fit one model per species
    lm_model <- lm(flipper_length_mm ~ body_mass_g, data = i)
    # glance() returns one row of model statistics, including r.squared and p.value
    cbind(species = unique(i$species), broom::glance(lm_model))
  })
```

```{r end_meme, echo = FALSE, fig.align = "center", out.width = '50%'}
knitr::include_graphics("img/meme_google.png", auto_pdf = FALSE)
```
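
As the tip in the last exercise says, there are many ways to fit one model per species. The following is a minimal sketch of one alternative, not the workshop's reference solution. It nests the data by species, maps `lm()` and `broom::glance()` over the nested list column, and keeps only the R squared and p value. The object name `penguins_lm` and the chunk label are made up for illustration, and the sketch assumes that `dplyr`, `tidyr`, `purrr`, `broom`, and `palmerpenguins` are installed.

```{r map_nest}
library(palmerpenguins)

# Assumption: penguins_lm is a hypothetical name, not part of the exercise
penguins_lm <- penguins |>
  # one row per species; all other columns go into the list column "data"
  tidyr::nest(data = !species) |>
  dplyr::mutate(
    # fit the linear model on each nested data set
    model = purrr::map(data, \(d) lm(flipper_length_mm ~ body_mass_g, data = d)),
    # one-row summary of model statistics per model
    stats = purrr::map(model, broom::glance)
  ) |>
  tidyr::unnest(stats) |>
  dplyr::select(species, r.squared, p.value) |>
  as.data.frame()

penguins_lm
```

Compared with `group_split()`, nesting keeps the species label attached to each model, so no `cbind()` is needed to add it back.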