---
name: r-code
trigger: writing R functions / API design / error handling
description: Guide for writing R code. Use when writing new functions, designing APIs, or reviewing/modifying existing R code.
---

# R code

This skill covers how to design and write R functions, including naming conventions, signatures, API conventions, input validation, error handling, and common pitfalls. For documenting functions, use the `document` skill. For tests, use the `tdd-workflow` skill.

## Naming conventions

### Functions

Functions use `snake_case` and should be **verbs or verb phrases** that describe what the function does:

```r
fetch_records()
build_summary()
validate_input()
```

A function name should be descriptive enough to make its purpose clear without a comment. Prefer clarity over brevity: don't abbreviate unless there is a widely understood convention (e.g. `df` for data frame, `dir` for directory).

Internal helpers use a dot prefix:

```r
.parse_response()
.validate_columns()
```

### Parameters

Parameters use `snake_case` and should generally be **nouns**, occasionally adjectives. The same rule applies: clarity over brevity.

```r
# Good
fetch_records(file_path, page_size, overwrite)

# Bad: unclear abbreviations
fetch_records(fp, ps, ow)
```

## File organization

One exported function per file: `R/{function_name}.R` (e.g. `fetch_records()` → `R/fetch_records.R`). Internal helpers used exclusively by that function live in the same file. Shared helpers go in `R/utils.R` or `R/utils-{topic}.R` (e.g. `R/utils-parsing.R`).

## Coding style

- Always run `air format .` after generating code.
- Use the base pipe operator (`|>`) not the magrittr pipe (`%>%`).
- Use `\() ...` for single-line anonymous functions. For all other cases, use `function() {...}`.

## Function design

**Functional core, imperative shell**: pure, testable functions that accept data and return data form the core. The imperative shell orchestrates program flow, manages state, and calls the functional core.
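
As a rough illustration of the functional core / imperative shell split, here is a minimal base R sketch. The names `compute_value_summary()` and `write_value_summary()` and the one-number-per-line file format are hypothetical, chosen only for this example; they are not part of any package described here.

```r
# Functional core (hypothetical example): takes data, returns data.
# No file access and no global state, so it is trivial to test.
compute_value_summary <- function(values) {
  values <- values[!is.na(values)]
  list(n = length(values), mean = mean(values))
}

# Imperative shell (hypothetical example): owns the side effects
# (reading and writing files) and delegates all computation to the core.
write_value_summary <- function(input_path, output_path) {
  values <- scan(input_path, quiet = TRUE)
  result <- compute_value_summary(values)
  writeLines(sprintf("n=%d mean=%.2f", result$n, result$mean), output_path)
  invisible(output_path)
}
```

Under this split, most tests can target `compute_value_summary()` with plain vectors, and only a thin test needs temporary files for the shell.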

Functions should be **small and single-purpose**. Each function should operate at a **single level of abstraction**: it either orchestrates calls to other functions, or performs a direct operation on data, but does not mix the two.

```r
# Orchestrator: delegates to focused helpers
build_report <- function(data, output_path) {
  data <- .clean_data(data)
  summary <- .compute_summary(data)
  .write_report(summary, output_path)
}

# Worker: performs one direct operation
.clean_data <- function(data) {
  data |>
    dplyr::filter(!is.na(value)) |>
    dplyr::mutate(value = round(value, 2))
}
```

Name functions well enough that their purpose is obvious from the call site. When reading the orchestrator above, each step is self-documenting, so no comments are needed.

**Simplify control flow**: prefer guard clauses and returning early over complex if/else structures.

**Pure conditionals**: the expression inside a conditional check should not cause side effects. Extract the pure check from the impure action into separate functions if needed.

## General API design patterns

**Enum-like arguments**: declare choices as the default vector; resolve with `rlang::arg_match()` at the top of the function:

```r
summarize_data <- function(x, method = c("mean", "median")) {
  method <- rlang::arg_match(method)
  # method is now guaranteed to be "mean" or "median"
}
```

**`NULL` as "not provided"**: use `NULL` as the default for optional arguments where there is no sensible scalar fallback; check with `is.null()`:

```r
fetch_records <- function(x, output_column = NULL) {
  if (!is.null(output_column)) {
    ...
  }
}
```

**S3 object construction**: build as a named list, set class explicitly:

```r
.new_summary <- function(values, method) {
  out <- list(values = values, method = method)
  class(out) <- c(paste0("summary_", method), "data_summary")
  out
}
```

**`call` propagation in internal validators**: helpers that validate arguments and may throw errors should accept and forward `call`:

```r
.check_non_empty <- function(x, call = rlang::caller_env()) {
  if (length(x) == 0L) {
    .pkg_abort("Input {.arg x} cannot be empty.", "empty_input", call = call)
  }
}

process_data <- function(x, call = rlang::caller_env()) {
  .check_non_empty(x, call = call)
  ...
}
```

**Return tibbles, not data frames:**

```r
summarize_data <- function(x) {
  result |> tibble::as_tibble()
}
```

## Input validation

Use `stbl::to_*()` and `stbl::stabilize_*()` to validate parameters. These functions coerce when safe and fail with clear error messages when not.

- **`to_*()`**: simple type coercion. Use when you need to ensure a parameter is the right type but don't need additional constraints.
- **`stabilize_*()`**: coercion plus content validation (regex, ranges, etc.). Use when simple type coercion isn't enough.

**Validate in the function that uses the parameter**, not in a caller that passes it through. This preserves R's lazy evaluation: if a parameter is never used on a code path, it is never evaluated or validated.

```r
# Good: validation happens where the parameter is used
build_report <- function(data, title, page_size) {
  data <- .clean_data(data)
  summary <- .compute_summary(data, page_size)
  .write_report(summary, title)
}

.compute_summary <- function(data, page_size, call = rlang::caller_env()) {
  page_size <- stbl::to_int_scalar(page_size, call = call)
  ...
}

.write_report <- function(summary, title, call = rlang::caller_env()) {
  title <- stbl::to_chr_scalar(title, call = call)
  ...
}
```

```r
# Bad: validates everything eagerly, breaking lazy evaluation
build_report <- function(data, title, page_size) {
  title <- stbl::to_chr_scalar(title)
  page_size <- stbl::to_int_scalar(page_size)
  ...
}
```

When `call` is available (because the function accepts it), always pass it to `stbl` calls so error messages point to the user's call frame.

## Internal vs. exported functions

Export a function when:

- Users will call it directly
- Other packages may want to extend it
- It is a stable, intentional part of the API

Keep a function internal when:

- It is an implementation detail that may change
- It is only used within the package
- Exporting it would clutter the user-facing API

Internal helpers use a dot prefix (e.g. `.parse_response()`).

## Error handling

Use `.pkg_abort()` (defined in `R/aaa-conditions.R`) rather than calling `cli::cli_abort()` directly. This wraps `stbl::pkg_abort()` and ensures consistent error class formatting:

```r
.pkg_abort(
  "Column {.field {name}} not found in {.arg data}.",
  "column_not_found",
  call = call
)
```

Always pass `call = call` (or `call = rlang::caller_env()`) so errors point to the user's call frame, not an internal helper.

## Common package mistakes

```r
# Never use library() inside package code
library(dplyr)      # Wrong
dplyr::filter(...)  # Right
# or `@importFrom dplyr filter` if used extensively

# Never modify global state without restoring it
options(my_option = TRUE)                     # Wrong
withr::local_options(list(my_option = TRUE))  # Right

# Use system.file() for package data, not hardcoded paths
read.csv("/home/user/data.csv")                                  # Wrong
read.csv(system.file("extdata", "data.csv", package = "mypkg"))  # Right
```

## Dependencies

### Use existing imports first

Packages already in `Imports` in `DESCRIPTION` should be preferred over base R equivalents: `purrr::map()` over `lapply()`, `rlang::is_*()` predicates over `is.*()`, and `withr::local_*()` over manual `on.exit()` state management.
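
To make the preference concrete, the sketch below contrasts a manual base R approach with the equivalent using existing imports. It assumes `withr` and `purrr` are already listed in `Imports`; the function names (`format_number_base()`, `format_number()`, `count_characters()`) are hypothetical and exist only for illustration.

```r
# Base R version: works, but the restore step must be written by hand.
format_number_base <- function(x) {
  old <- options(digits = 3)
  on.exit(options(old), add = TRUE)
  format(x)
}

# Preferred when withr is already imported: the option is reset
# automatically when the function exits.
format_number <- function(x) {
  withr::local_options(list(digits = 3))
  format(x)
}

# Preferred when purrr is already imported: map_int() returns an integer
# vector (or fails), rather than the list that lapply() would return.
count_characters <- function(strings) {
  purrr::map_int(strings, nchar)
}
```

Both `format_number_base()` and `format_number()` leave `getOption("digits")` unchanged after they return; the `withr` version just has less to get wrong.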

### When to add a new dependency

Add a dependency when it provides significant functionality that would be complex or brittle to reimplement, such as date parsing, web requests, or complex string manipulation. Stick with base R or existing imports when the solution is straightforward.

**Adding a new dependency requires explicit discussion with the developer.**