---
name: stan-development
description: Expert guidance for Stan probabilistic programming language development, including modern syntax, cmdstanr/cmdstanpy integration, and testing patterns
---

# Stan Development

Use this skill when working with Stan models for Bayesian inference and probabilistic programming, particularly when integrating with R (cmdstanr) or Python (cmdstanpy).

## Modern Stan Syntax

### Array Declarations (IMPORTANT)

**Use new array syntax introduced in Stan 2.26+:**

```stan
// Good - Modern syntax (Stan 2.26+)
array[10] int x;              // Array of 10 integers
array[N] real y;              // Array of N reals
array[N, M] real z;           // 2D array
array[N] vector[K] v;         // Array of vectors

// Avoid - Old syntax (deprecated)
int x[10];                    // Old style
real y[N];                    // Old style
real z[N, M];                 // Old style
vector[K] v[N];               // Old style
```

**Why the change?**
- More consistent and readable
- Clearer distinction between arrays and matrix/vector types
- Aligns with modern Stan style

### Other Key Syntax

**Data types:**
```stan
// Scalar types
int n;
real x;

// Vector/Matrix types
vector[N] v;           // Column vector
row_vector[N] rv;      // Row vector
matrix[N, M] A;        // Matrix

// Arrays of vectors/matrices
array[K] vector[N] vectors;
array[K] matrix[N, M] matrices;

// Constrained types
real<lower=0> sigma;                    // Lower bound
real<lower=0, upper=1> theta;          // Bounded
simplex[K] pi;                         // Simplex constraint
ordered[N] sorted_values;              // Ordered constraint
```

## Stan File Organization

### Standard Structure

```
project/
├── inst/stan/                 # Stan models (R packages)
│   ├── model.stan            # Main model file
│   ├── functions/            # Reusable Stan functions
│   │   ├── likelihood.stan
│   │   ├── priors.stan
│   │   └── utils.stan
│   └── chunks/               # Reusable code chunks (optional)
└── stan/                      # Alternative location (Python projects)
```

### Modular Stan Code

**Main model includes functions:**

```stan
functions {
  #include functions/likelihood.stan
  #include functions/priors.stan
  #include functions/utils.stan
}

data {
  int<lower=0> N;
  array[N] real y;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  // Use functions from included files
  y ~ custom_likelihood(mu, sigma);
  mu ~ custom_prior();
}
```

**Benefits of modular approach:**
- Reusable functions across models
- Easier testing of individual components
- Better organization for complex models
- Cleaner main model files

## Stan Integration with R (cmdstanr)

### Basic Workflow

```r
library(cmdstanr)

# Compile Stan model
model <- cmdstan_model("path/to/model.stan")

# Prepare data (list with names matching Stan data block)
stan_data <- list(
  N = 100,
  y = rnorm(100)
)

# Fit model
fit <- model$sample(
  data = stan_data,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,
  iter_sampling = 1000
)

# Extract samples
draws <- fit$draws()
summary <- fit$summary()
```

### Model Compilation and Caching

```r
# Get model path from package
model_path <- system.file("stan", "model.stan", package = "mypackage")

# Compile with cmdstanr
model <- cmdstan_model(model_path)

# Package-specific model functions
# Many packages provide wrapper functions:
my_package_model <- get_model()  # Returns compiled model
```

### Inference Algorithms

```r
# NUTS sampling (default, most robust)
fit <- model$sample(data = stan_data)

# Variational inference (faster, approximate)
fit <- model$variational(data = stan_data)

# Pathfinder (fast, approximate)
fit <- model$pathfinder(data = stan_data)

# Laplace approximation (very fast, approximate)
fit <- model$laplace(data = stan_data)

# Optimization (MAP estimate)
fit <- model$optimize(data = stan_data)
```

## Testing Stan Functions

### Exposing Stan Functions to R

**In R packages with cmdstanr:**

```r
# Expose Stan functions for testing
# Typically in a package function
expose_stan_functions <- function() {
  model_path <- system.file("stan", "functions.stan", package = "mypackage")
  # Note: This typically works on Linux only
  cmdstanr::cmdstan_model(model_path, compile_standalone = TRUE)
}

# Then in tests
test_that("Stan function works correctly", {
  # Exposed functions are now available in R
  result <- stan_function_name(args)
  expect_equal(result, expected)
})
```

**Common pattern in test setup:**

```r
# tests/testthat/setup.R
if (on_linux()) {
  expose_stan_functions()
}

# tests/testthat/test-stan-functions.R
test_that("likelihood function works", {
  skip_if_not(on_linux())  # Stan function exposure often Linux-only

  result <- stan_likelihood(data, params)
  expect_gt(result, -Inf)
})
```

## Stan-R Interface Patterns

### Data Preparation

```r
# Convert R data structures to Stan format
stan_data_list <- list(
  N = nrow(data),
  K = ncol(X),
  y = data$outcome,
  X = as.matrix(X),
  # Arrays use R vectors/matrices directly
  group = as.integer(data$group)
)

# For complex models, packages often provide conversion functions
stan_data <- prepare_stan_data(preprocessed_data, model_spec)
```

### Prior Specification as Data

**Empirical Bayes approach - priors as data:**

```stan
data {
  // Data
  int<lower=0> N;
  array[N] real y;

  // Priors as data (empirical Bayes)
  real prior_mu_mean;
  real<lower=0> prior_mu_sd;
  real<lower=0> prior_sigma_alpha;
  real<lower=0> prior_sigma_beta;
}

parameters {
  real mu;
  real<lower=0> sigma;
}

model {
  // Use data-specified priors
  mu ~ normal(prior_mu_mean, prior_mu_sd);
  sigma ~ gamma(prior_sigma_alpha, prior_sigma_beta);
  y ~ normal(mu, sigma);
}
```

**Benefits:**
- Priors can be updated without recompiling
- Easy to specify different priors for different models
- Enables automated prior selection

### Distribution Lookup Systems

For packages with multiple distribution options:

```r
# R side - get distribution ID
dist_id <- get_distribution_id("lognormal")

# Stan side - use distribution
stan_data <- list(
  distribution = dist_id,  # Integer ID
  # ... other data
)
```

```stan
// Stan functions for distribution dispatch
real compute_lpdf(real y, int distribution, vector params) {
  if (distribution == 1) {  // Normal
    return normal_lpdf(y | params[1], params[2]);
  } else if (distribution == 2) {  // Lognormal
    return lognormal_lpdf(y | params[1], params[2]);
  }
  // ... other distributions
}
```

## Common Stan Patterns

### Vectorization

```stan
// Good - vectorized (much faster)
y ~ normal(mu, sigma);

// Avoid - loop (slower)
for (n in 1:N) {
  y[n] ~ normal(mu, sigma);
}
```

### Efficient Matrix Operations

```stan
// Use built-in matrix operations
vector[N] mu = X * beta;  // Matrix-vector multiplication

// Avoid loops when vectorization possible
```

### Handling Missing Data

```stan
data {
  int<lower=0> N;
  int<lower=0> N_obs;
  array[N_obs] int<lower=1, upper=N> obs_idx;
  array[N_obs] real y_obs;
}

parameters {
  array[N] real y_latent;
  real mu;
  real<lower=0> sigma;
}

model {
  // Likelihood for observed data
  y_obs ~ normal(mu, sigma);

  // Prior/model for all data
  y_latent ~ normal(mu, sigma);

  // Constrain observed values
  y_latent[obs_idx] ~ normal(y_obs, 0.001);  // Small noise
}
```

## Debugging Stan Models

### Common Issues

**Divergences:**
```r
# Increase adapt_delta
fit <- model$sample(
  data = stan_data,
  adapt_delta = 0.95  # Default 0.8
)
```

**Slow mixing:**
```r
# Use non-centered parameterization
# Reparameterize hierarchical models
```

**Model diagnostics:**
```r
# Check convergence
fit$diagnostic_summary()

# Check R-hat, ESS
fit$summary(c("Rhat", "ess_bulk", "ess_tail"))

# Pairs plot for problematic parameters
bayesplot::mcmc_pairs(fit$draws(), pars = c("mu", "sigma"))
```

## When to Use This Skill

Activate this skill when:
- Writing Stan models
- Integrating Stan with R packages (cmdstanr)
- Testing Stan functions
- Debugging Stan models
- Working with Bayesian hierarchical models
- Implementing custom likelihoods or priors in Stan

This skill provides Stan-specific development patterns.
Project-specific model architecture and domain knowledge should remain in project CLAUDE.md files.