--- title: "A penguin story" date: "2024-07-08" format: html: toc: true theme: litera code-annotations: hover --- ## Data For this analysis, we'll use the penguin dataset in the R package `palmerpenguins` (). ```{r} #| label: load-packages #| message: false library(tidyverse) library(palmerpenguins) library(gt) ``` The `penguins` dataset contains the following elements: ```{r} glimpse(penguins) ``` ## Species We have `r length(unique(penguins$species))` species in this dataset: `r knitr::combine_words(unique(penguins$species))` that are not evenly distributed. ```{r} #| column: margin penguins |> count(species) %>% knitr::kable() ``` ```{r} #| label: fig-culmen #| echo: false #| fig-align: center #| fig-alt: Illustration of penguin culmen length and depth #| fig-cap: Illustration of penguin culmen length and depth #| column: page knitr::include_graphics("https://allisonhorst.github.io/palmerpenguins/reference/figures/culmen_depth.png") ``` The @fig-species shows the differences in culmen by species. ```{r} #| label: fig-species #| warning: false #| fig-with: 5.0 #| fig-cap: Categorizing species by culmen size #| fig-cap-location: margin ggplot( penguins, aes( x = bill_length_mm, y = bill_depth_mm, color = species, shape = species ) ) + geom_point() + labs(x = "Culmen length (mm)", y = "Culmen depth (mm)") ``` ## Manchots The table below (numbered [-@tbl-top10]) below shows the top 10 penguins in the dataset. ```{r} #| label: tbl-top10 #| tbl-cap: The first 10 elements of the dataset #| tbl-cap-location: bottom penguins |> slice_head(n = 10) |> select(species, island, bill_length_mm, bill_depth_mm) |> gt() # <1> ``` 1. We use **gt** here, but any table package can be used, for example **flextable** or **tinytable**. Let's get a summary of all our data set in a fancy table[^fancy_table] ```{r} #| echo: false spanners_and_header <- function(gt_tbl) { gt_tbl |> tab_spanner( label = md("**Adelie**"), columns = 3:4 ) |> tab_spanner( label = md("**Chinstrap**"), columns = c("Chinstrap_female", "Chinstrap_male") ) |> tab_spanner( label = md("**Gentoo**"), columns = contains("Gentoo") ) |> tab_header( title = "Penguins in the Palmer Archipelago", subtitle = "Data is courtesy of the {palmerpenguins} R package" ) } penguin_counts <- penguins |> filter(!is.na(sex)) |> mutate(year = as.character(year)) |> group_by(species, island, sex, year) |> summarise(n = n(), .groups = "drop") penguin_counts_wider <- penguin_counts |> pivot_wider( names_from = c(species, sex), values_from = n ) |> # Make missing numbers (NAs) into zero mutate(across(.cols = -(1:2), .fns = ~ replace_na(., replace = 0))) |> arrange(island, year) desired_colnames <- colnames(penguin_counts_wider) |> str_remove('(Adelie|Gentoo|Chinstrap)_') |> str_to_title() |> set_names(nm = colnames(penguin_counts_wider)) penguin_counts_wider |> mutate(across(.cols = -(1:2), ~ if_else(. == 0, NA_integer_, .))) |> mutate( island = as.character(island), year = as.numeric(year), island = paste0("Island: ", island) ) |> gt(groupname_col = "island", rowname_col = "year") |> cols_label(.list = desired_colnames) |> spanners_and_header() |> sub_missing(missing_text = "-") |> summary_rows( groups = TRUE, fns = list( "Maximum" = ~ max(.), "Total" = ~ sum(.) ), formatter = fmt_number, decimals = 0, missing_text = "-" ) |> tab_options( data_row.padding = px(2), summary_row.padding = px(3), # A bit more padding for summaries row_group.padding = px(4) # And even more for our groups ) |> opt_stylize(style = 6, color = "gray") ``` [^fancy_table]: Thanks to [\@AlbertRapp](https://github.com/AlbertRapp/gt_book/commits?author=AlbertRapp) for its great example of ["creating beautiful tables in R with {gt}"](https://gt.albert-rapp.de/getting_started) ::: {.callout-note collapse=true} ## To find out more about this dataset See the R **palmerpenguins** package vignette available at :::