--- name: stan-development description: Expert guidance for Stan probabilistic programming language development, including modern syntax, cmdstanr/cmdstanpy integration, and testing patterns --- # Stan Development Use this skill when working with Stan models for Bayesian inference and probabilistic programming, particularly when integrating with R (cmdstanr) or Python (cmdstanpy). ## Modern Stan Syntax ### Array Declarations (IMPORTANT) **Use new array syntax introduced in Stan 2.26+:** ```stan // Good - Modern syntax (Stan 2.26+) array[10] int x; // Array of 10 integers array[N] real y; // Array of N reals array[N, M] real z; // 2D array array[N] vector[K] v; // Array of vectors // Avoid - Old syntax (deprecated) int x[10]; // Old style real y[N]; // Old style real z[N, M]; // Old style vector[K] v[N]; // Old style ``` **Why the change?** - More consistent and readable - Clearer distinction between arrays and matrix/vector types - Aligns with modern Stan style ### Other Key Syntax **Data types:** ```stan // Scalar types int n; real x; // Vector/Matrix types vector[N] v; // Column vector row_vector[N] rv; // Row vector matrix[N, M] A; // Matrix // Arrays of vectors/matrices array[K] vector[N] vectors; array[K] matrix[N, M] matrices; // Constrained types real sigma; // Lower bound real theta; // Bounded simplex[K] pi; // Simplex constraint ordered[N] sorted_values; // Ordered constraint ``` ## Stan File Organization ### Standard Structure ``` project/ ├── inst/stan/ # Stan models (R packages) │ ├── model.stan # Main model file │ ├── functions/ # Reusable Stan functions │ │ ├── likelihood.stan │ │ ├── priors.stan │ │ └── utils.stan │ └── chunks/ # Reusable code chunks (optional) └── stan/ # Alternative location (Python projects) ``` ### Modular Stan Code **Main model includes functions:** ```stan functions { #include functions/likelihood.stan #include functions/priors.stan #include functions/utils.stan } data { int N; array[N] real y; } parameters { real mu; real sigma; } model { // Use functions from included files y ~ custom_likelihood(mu, sigma); mu ~ custom_prior(); } ``` **Benefits of modular approach:** - Reusable functions across models - Easier testing of individual components - Better organization for complex models - Cleaner main model files ## Stan Integration with R (cmdstanr) ### Basic Workflow ```r library(cmdstanr) # Compile Stan model model <- cmdstan_model("path/to/model.stan") # Prepare data (list with names matching Stan data block) stan_data <- list( N = 100, y = rnorm(100) ) # Fit model fit <- model$sample( data = stan_data, chains = 4, parallel_chains = 4, iter_warmup = 1000, iter_sampling = 1000 ) # Extract samples draws <- fit$draws() summary <- fit$summary() ``` ### Model Compilation and Caching ```r # Get model path from package model_path <- system.file("stan", "model.stan", package = "mypackage") # Compile with cmdstanr model <- cmdstan_model(model_path) # Package-specific model functions # Many packages provide wrapper functions: my_package_model <- get_model() # Returns compiled model ``` ### Inference Algorithms ```r # NUTS sampling (default, most robust) fit <- model$sample(data = stan_data) # Variational inference (faster, approximate) fit <- model$variational(data = stan_data) # Pathfinder (fast, approximate) fit <- model$pathfinder(data = stan_data) # Laplace approximation (very fast, approximate) fit <- model$laplace(data = stan_data) # Optimization (MAP estimate) fit <- model$optimize(data = stan_data) ``` ## Testing Stan Functions ### Exposing Stan Functions to R **In R packages with cmdstanr:** ```r # Expose Stan functions for testing # Typically in a package function expose_stan_functions <- function() { model_path <- system.file("stan", "functions.stan", package = "mypackage") # Note: This typically works on Linux only cmdstanr::cmdstan_model(model_path, compile_standalone = TRUE) } # Then in tests test_that("Stan function works correctly", { # Exposed functions are now available in R result <- stan_function_name(args) expect_equal(result, expected) }) ``` **Common pattern in test setup:** ```r # tests/testthat/setup.R if (on_linux()) { expose_stan_functions() } # tests/testthat/test-stan-functions.R test_that("likelihood function works", { skip_if_not(on_linux()) # Stan function exposure often Linux-only result <- stan_likelihood(data, params) expect_gt(result, -Inf) }) ``` ## Stan-R Interface Patterns ### Data Preparation ```r # Convert R data structures to Stan format stan_data_list <- list( N = nrow(data), K = ncol(X), y = data$outcome, X = as.matrix(X), # Arrays use R vectors/matrices directly group = as.integer(data$group) ) # For complex models, packages often provide conversion functions stan_data <- prepare_stan_data(preprocessed_data, model_spec) ``` ### Prior Specification as Data **Empirical Bayes approach - priors as data:** ```stan data { // Data int N; array[N] real y; // Priors as data (empirical Bayes) real prior_mu_mean; real prior_mu_sd; real prior_sigma_alpha; real prior_sigma_beta; } parameters { real mu; real sigma; } model { // Use data-specified priors mu ~ normal(prior_mu_mean, prior_mu_sd); sigma ~ gamma(prior_sigma_alpha, prior_sigma_beta); y ~ normal(mu, sigma); } ``` **Benefits:** - Priors can be updated without recompiling - Easy to specify different priors for different models - Enables automated prior selection ### Distribution Lookup Systems For packages with multiple distribution options: ```r # R side - get distribution ID dist_id <- get_distribution_id("lognormal") # Stan side - use distribution stan_data <- list( distribution = dist_id, # Integer ID # ... other data ) ``` ```stan // Stan functions for distribution dispatch real compute_lpdf(real y, int distribution, vector params) { if (distribution == 1) { // Normal return normal_lpdf(y | params[1], params[2]); } else if (distribution == 2) { // Lognormal return lognormal_lpdf(y | params[1], params[2]); } // ... other distributions } ``` ## Common Stan Patterns ### Vectorization ```stan // Good - vectorized (much faster) y ~ normal(mu, sigma); // Avoid - loop (slower) for (n in 1:N) { y[n] ~ normal(mu, sigma); } ``` ### Efficient Matrix Operations ```stan // Use built-in matrix operations vector[N] mu = X * beta; // Matrix-vector multiplication // Avoid loops when vectorization possible ``` ### Handling Missing Data ```stan data { int N; int N_obs; array[N_obs] int obs_idx; array[N_obs] real y_obs; } parameters { array[N] real y_latent; real mu; real sigma; } model { // Likelihood for observed data y_obs ~ normal(mu, sigma); // Prior/model for all data y_latent ~ normal(mu, sigma); // Constrain observed values y_latent[obs_idx] ~ normal(y_obs, 0.001); // Small noise } ``` ## Debugging Stan Models ### Common Issues **Divergences:** ```r # Increase adapt_delta fit <- model$sample( data = stan_data, adapt_delta = 0.95 # Default 0.8 ) ``` **Slow mixing:** ```r # Use non-centered parameterization # Reparameterize hierarchical models ``` **Model diagnostics:** ```r # Check convergence fit$diagnostic_summary() # Check R-hat, ESS fit$summary(c("Rhat", "ess_bulk", "ess_tail")) # Pairs plot for problematic parameters bayesplot::mcmc_pairs(fit$draws(), pars = c("mu", "sigma")) ``` ## When to Use This Skill Activate this skill when: - Writing Stan models - Integrating Stan with R packages (cmdstanr) - Testing Stan functions - Debugging Stan models - Working with Bayesian hierarchical models - Implementing custom likelihoods or priors in Stan This skill provides Stan-specific development patterns. Project-specific model architecture and domain knowledge should remain in project CLAUDE.md files.