--- title: "Palmer Penguins — canonical pipeline (three models, three metrics)" author: "Aparna Pandey and Stephan Peischl" format: html: toc: true code-tools: true engine: knitr --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE) suppressPackageStartupMessages({ library(tidymodels) library(palmerpenguins) library(dplyr) library(tidyr) library(ggplot2) library(purrr) }) ``` # Overview Minimal **canonical `tidymodels` pipeline** on Palmer Penguins (**Adelie vs Gentoo**): 1. **Recipe** → preprocess inside resampling 2. **Three model specs** → same folds, fair comparison 3. **`fit_resamples()`** → **accuracy**, **kappa**, **ROC AUC** 4. **One summary figure** (mean ± SE across folds) Companion: [Module 04 — canonical pipeline](../modules/module-04-pipeline.qmd#canonical-pipeline-tuesday), [Module 07](../modules/module-07-penguins-choose-metrics.qmd), [Module 08](../modules/module-08-penguins-compare-models.qmd). ## Data, recipe, and folds ```{r} peng <- penguins |> filter(species %in% c("Adelie", "Gentoo")) |> mutate( y = factor(species, levels = c("Adelie", "Gentoo")), year = as.numeric(year) ) |> select(-species, -flipper_length_mm, -body_mass_g) |> drop_na() rec <- recipe(y ~ ., data = peng) |> step_zv(all_predictors()) |> step_dummy(all_nominal_predictors()) |> step_normalize(all_numeric_predictors()) folds <- vfold_cv(peng, v = 5, strata = y) metrics <- metric_set(accuracy, kap, roc_auc) ``` ## Three models (fixed settings) ```{r} tree_spec <- decision_tree(tree_depth = 4, min_n = 10) |> set_engine("rpart") |> set_mode("classification") glm_spec <- logistic_reg() |> set_engine("glm") |> set_mode("classification") rf_spec <- rand_forest(mtry = 4, trees = 300, min_n = 2) |> set_engine("ranger") |> set_mode("classification") workflows <- list( decision_tree = workflow() |> add_recipe(rec) |> add_model(tree_spec), logistic_glm = workflow() |> add_recipe(rec) |> add_model(glm_spec), random_forest = workflow() |> add_recipe(rec) |> add_model(rf_spec) ) ``` ## Cross-validation (`fit_resamples`) ```{r} set.seed(7) cv_results <- workflows |> imap(\(wf, name) fit_resamples(wf, folds, metrics = metrics)) |> set_names(names(workflows)) ``` ## Metrics table ```{r} cmp <- imap_dfr( cv_results, \(rs, name) collect_metrics(rs) |> mutate(model = name) ) cmp |> select(model, .metric, mean, std_err) |> mutate( mean = round(mean, 3), std_err = round(std_err, 3) ) |> arrange(.metric, desc(mean)) |> knitr::kable(col.names = c("Model", "Metric", "Mean", "Std err")) ``` ## Performance summary (final figure) Same recipe and folds for every model — differences reflect **model family** only. ```{r fig.width=9, fig.height=4.5} plot_df <- cmp |> mutate( model = recode( model, decision_tree = "Decision tree", logistic_glm = "Logistic regression", random_forest = "Random forest" ), metric = recode( .metric, accuracy = "Accuracy", kap = "Kappa", roc_auc = "ROC AUC" ) ) ggplot(plot_df, aes(model, mean, fill = model)) + geom_col(show.legend = FALSE, width = 0.72) + geom_errorbar( aes(ymin = mean - std_err, ymax = mean + std_err), width = 0.18, linewidth = 0.5 ) + facet_wrap(~metric, scales = "free_y", ncol = 3) + coord_flip() + scale_fill_brewer(palette = "Set2") + labs( title = "Model comparison (5-fold CV, same recipe)", subtitle = "Bars = mean across folds; error bars = standard error", x = NULL, y = "Score (higher is better)" ) + theme_minimal(base_size = 13) + theme(strip.text = element_text(face = "bold")) ``` **Takeaway:** pick the **metric** that matches your goal (e.g. ROC AUC for ranking, kappa when classes are imbalanced), then compare models on that score — small gaps within the error bars are often noise on this \(n\).