--- title: "Simpson's Paradox in Palmer Penguins" format: html: toc: true execute: echo: false --- ```{r} #| label: load #| message: false targets::tar_load(model_predictions) targets::tar_load(model_summaries) library(tidyverse) ``` The goal of this analysis is to determine how bill length and depth are related in three species of penguins from Antarctica. ## Methods We analyzed the relationship between bill length and depth using linear models. One model did not take into account species (single slope and intercept; "combined model"), one model added a parameter for species (same slope, different intercepts; "species model"), and another model added both a parameter for species and the interaction between species and bill length (different slopes, different intercepts; "interaction model"). ## Results ```{r} #| label: results-stats mod_stats <- model_summaries %>% mutate(across(c(r.squared, adj.r.squared), ~round(.x, 3))) %>% split(.$model_name) ``` The model that did not account for species ("combined model") shows a negative relationship between bill length and depth overall (@fig-combined-plot). The models that did account for species show the opposite result: a positive relationship between bill length and depth for each species (@fig-species-plot; @fig-interaction-plot). Furthermore, the combined model explained less of the variation in the data (adjusted *R* squared `r mod_stats$combined_model$r.squared`) than models accounting for species (adjusted *R* squared `r mod_stats$species_model$r.squared` and `r mod_stats$interaction_model$r.squared`, species model and interaction model, respectively ). In the models accounting for species, the relationship between bill depth and length (the slope) appears very similar for each species (@fig-species-plot; @fig-interaction-plot). ## Conclusion The penguins data is an example of "Simpson's Paradox," whereby aggregation of data can hide trends that are happening at more nested levels. We need to be careful of this when analyzing data. ## Figures ```{r} #| label: fig-combined-plot #| fig-cap: "Combined model" ggplot( filter(model_predictions, model_name == "combined_model"), aes(x = bill_length_mm) ) + geom_point(aes(y = bill_depth_mm)) + geom_line(aes(y = .fitted)) + labs( y = "Bill length (mm)", x = "Bill depth (mm)" ) ``` ```{r} #| label: fig-species-plot #| fig-cap: "Species model" ggplot( filter(model_predictions, model_name == "species_model"), aes(x = bill_length_mm, color = species) ) + geom_point(aes(y = bill_depth_mm)) + geom_line(aes(y = .fitted)) + labs( y = "Bill length (mm)", x = "Bill depth (mm)" ) + theme(legend.position = "bottom") ``` ```{r} #| label: fig-interaction-plot #| fig-cap: "Interaction model" ggplot( filter(model_predictions, model_name == "interaction_model"), aes(x = bill_length_mm, color = species) ) + geom_point(aes(y = bill_depth_mm)) + geom_line(aes(y = .fitted)) + labs( y = "Bill length (mm)", x = "Bill depth (mm)" ) + theme(legend.position = "bottom") ```