---
name: adaptive-rejection-sampler
description: Guidance for implementing adaptive rejection sampling (ARS) algorithms for generating random samples from log-concave probability distributions. This skill should be used when tasks involve implementing ARS, rejection sampling, or Monte Carlo methods that require sampling from custom probability distributions, particularly in R or other statistical computing languages.
---

# Adaptive Rejection Sampler

## Overview

This skill provides guidance for implementing adaptive rejection sampling (ARS) as described by Gilks et al. (1992). ARS is a method for generating random samples from log-concave probability distributions by constructing piecewise linear upper and lower bounds (envelopes) that adapt as samples are drawn.

## Algorithm Foundation

Before writing any code, document the mathematical foundations:

### Core Components to Understand

1. **Log-concavity requirement**: The target density f(x) must be log-concave, meaning log(f(x)) is concave
2. **Upper hull construction**: Piecewise linear envelope above log(f(x)) using tangent lines at abscissae
3. **Lower hull (squeeze function)**: Piecewise linear envelope below log(f(x)) connecting abscissae
4. **Rejection sampling**: Sample from exponential of upper hull, accept based on squeeze/density comparison
5. **Adaptive updating**: Add rejected points as new abscissae to tighten the envelope

### Algorithm Pseudocode

Document these steps before implementing:

```
1. Initialize with k ≥ 2 abscissae where log-density is finite
2. Construct upper hull from tangent lines at abscissae
3. Construct lower hull connecting abscissae points
4. Loop:
   a. Sample x* from exponential of upper hull
   b. Sample u ~ Uniform(0,1)
   c. If u ≤ exp(lower_hull(x*) - upper_hull(x*)):
      Accept x* (squeeze test)
   d. Else if u ≤ exp(log_f(x*) - upper_hull(x*)):
      Accept x* (rejection test)
      Add x* to abscissae
   e. Else:
      Reject x*
      Add x* to abscissae
5. Return accepted samples
```

## Implementation Approach

### Step 1: Verify File Writes

After every file write operation, read back the file to confirm content is complete. Truncated writes are a common failure mode that can leave syntax errors or incomplete functions.

```r
# After writing, always verify:
source("ars.R")  # Will error on syntax issues
# Or read and check line count matches expected
```

### Step 2: Implement Log-Concavity Check

The log-concavity check is critical and error-prone:

```r
# Common mistake: Using strict inequality
# WRONG: if (diff2 < 0) { not_concave }
# CORRECT: if (diff2 <= tol) { is_concave }  # with small tolerance

# For numerical stability, use tolerance:
tol <- 1e-8  # Document why this tolerance was chosen
is_log_concave <- all(second_differences <= tol)
```

**Rationale for tolerance**: Floating-point arithmetic can produce small positive values (e.g., 1e-15) for theoretically zero second differences. A tolerance of 1e-8 accommodates numerical noise while still detecting true non-concavity.

### Step 3: Handle Domain Boundaries

Implement explicit handling for:

- **Unbounded domains** (-∞, +∞): Initialize with points spanning the mode
- **Lower-bounded domains** [a, +∞): Include point near lower bound
- **Upper-bounded domains** (-∞, b]: Include point near upper bound
- **Bounded domains** [a, b]: Include points near both bounds

```r
# Example initialization strategy for different domains:
if (is.infinite(lower) && is.infinite(upper)) {
  # Unbounded: span around mode
  x_init <- c(mode - 2*sd, mode, mode + 2*sd)
} else if (!is.infinite(lower) && is.infinite(upper)) {
  # Lower-bounded: start from boundary
  x_init <- c(lower + 0.01, lower + 1, lower + 5)
}
```

### Step 4: Initialization Point Selection

Poor initialization is a major source of failures:

1. **Find the mode first**: Use optimization to locate the density mode
2. **Ensure points span the mode**: Initial abscissae should be on both sides of the mode
3. **Verify log-density is finite**: All initial points must have finite log(f(x))
4. **Use domain-appropriate spacing**: Wider for heavy-tailed distributions

```r
# Verify initialization before sampling
for (x in x_init) {
  log_val <- log_f(x)
  if (!is.finite(log_val)) {
    stop(paste("Initial point", x, "has non-finite log-density"))
  }
}
```

### Step 5: Numerical Stability

Guard against common numerical issues:

```r
# Prevent underflow in exponentials
log_envelope <- pmin(log_envelope, 700)  # exp(700) ≈ 1e304

# Prevent division by zero in hull construction
if (abs(x[i+1] - x[i]) < 1e-10) {
  # Handle degenerate case
}

# Use log-sum-exp trick for probability calculations
log_p <- log_envelope - log_sum_exp(log_envelope)
```

### Step 6: Convergence Safeguards

Implement maximum iteration limits and diagnostics:

```r
max_iter <- n_samples * 100  # Reasonable limit
iter <- 0
while (length(samples) < n_samples && iter < max_iter) {
  iter <- iter + 1
  # ... sampling logic
}
if (iter >= max_iter) {
  warning("Maximum iterations reached; returning partial sample")
}
```

## Verification Strategies

### Statistical Testing

Use proper statistical tests, not just moment comparisons:

```r
# Kolmogorov-Smirnov test (preferred)
ks_test <- ks.test(samples, reference_cdf)
if (ks_test$p.value < 0.01) {
  warning("KS test suggests samples don't match target distribution")
}

# Chi-square goodness of fit for discrete comparisons
# Quantile-quantile plots for visual verification
```

### Moment-Based Checks (with proper tolerances)

If using moment comparisons, justify tolerances:

```r
# Tolerance should scale with sample size
# Standard error of mean ≈ sd / sqrt(n)
n <- length(samples)
se_mean <- sd(samples) / sqrt(n)
tolerance_mean <- 4 * se_mean  # ~99.99% CI

if (abs(mean(samples) - expected_mean) > tolerance_mean) {
  warning("Sample mean outside expected range")
}
```

### Test Cases to Include

1. **Standard normal**: Mean=0, SD=1 (baseline)
2. **Shifted normal**: Mean≠0 to test mode-finding
3. **Exponential**: Tests handling of semi-bounded domain
4. **Truncated distributions**: Tests boundary handling
5. **Non-log-concave (should reject)**: Mixture distributions to verify detection

### Modular Testing

Test components in isolation before integration:

```r
# Test hull construction separately
test_upper_hull <- function() {
  # Verify hull is above log-density at test points
}

# Test sampling from envelope separately
test_envelope_sampling <- function() {
  # Verify samples follow envelope distribution
}
```

## Common Pitfalls

### 1. Incomplete File Writes
Always verify file content after writing. Truncated writes produce syntax errors that may not surface until runtime.

### 2. Strict vs. Non-Strict Inequality in Log-Concavity
Using `<` instead of `<=` (with tolerance) causes rejection of valid distributions like the exponential where second derivatives are exactly zero.

### 3. Initialization Outside Mode Region
When initial points don't span the mode, the envelope may be poor and sampling becomes extremely inefficient or biased.

### 4. Missing Boundary Handling
Failing to handle bounded domains causes samples to violate support constraints.

### 5. Insufficient Error Messages
Generic errors make debugging difficult. Include context in all error messages:

```r
# BAD
stop("Invalid input")

# GOOD
stop(paste("Log-density is -Inf at x =", x,
           "; ensure initial points are in distribution support"))
```

### 6. No Convergence Limits
Infinite loops can occur with poorly-initialized samplers. Always include maximum iteration counts.

### 7. Masking Root Causes
When tests fail, understand why before fixing. Expanding initialization points may work but could mask deeper algorithmic issues.

## Performance Considerations

- Envelope updates have O(k) complexity where k is number of abscissae
- Consider limiting maximum number of abscissae to prevent O(n²) behavior
- Pre-allocate output vectors for large sample sizes
- Profile hot paths if performance is critical

## References

When implementing, verify against the original paper:
- Gilks, W.R. and Wild, P. (1992). "Adaptive Rejection Sampling for Gibbs Sampling." Applied Statistics, 41(2), 337-348.