--- name: adaptive-rejection-sampler description: Guidance for implementing adaptive rejection sampling (ARS) algorithms for generating random samples from log-concave probability distributions. This skill should be used when tasks involve implementing ARS, rejection sampling, or Monte Carlo methods that require sampling from custom probability distributions, particularly in R or other statistical computing languages. --- # Adaptive Rejection Sampler ## Overview This skill provides guidance for implementing adaptive rejection sampling (ARS) as described by Gilks et al. (1992). ARS is a method for generating random samples from log-concave probability distributions by constructing piecewise linear upper and lower bounds (envelopes) that adapt as samples are drawn. ## Algorithm Foundation Before writing any code, document the mathematical foundations: ### Core Components to Understand 1. **Log-concavity requirement**: The target density f(x) must be log-concave, meaning log(f(x)) is concave 2. **Upper hull construction**: Piecewise linear envelope above log(f(x)) using tangent lines at abscissae 3. **Lower hull (squeeze function)**: Piecewise linear envelope below log(f(x)) connecting abscissae 4. **Rejection sampling**: Sample from exponential of upper hull, accept based on squeeze/density comparison 5. **Adaptive updating**: Add rejected points as new abscissae to tighten the envelope ### Algorithm Pseudocode Document these steps before implementing: ``` 1. Initialize with k ≥ 2 abscissae where log-density is finite 2. Construct upper hull from tangent lines at abscissae 3. Construct lower hull connecting abscissae points 4. Loop: a. Sample x* from exponential of upper hull b. Sample u ~ Uniform(0,1) c. If u ≤ exp(lower_hull(x*) - upper_hull(x*)): Accept x* (squeeze test) d. Else if u ≤ exp(log_f(x*) - upper_hull(x*)): Accept x* (rejection test) Add x* to abscissae e. Else: Reject x* Add x* to abscissae 5. Return accepted samples ``` ## Implementation Approach ### Step 1: Verify File Writes After every file write operation, read back the file to confirm content is complete. Truncated writes are a common failure mode that can leave syntax errors or incomplete functions. ```r # After writing, always verify: source("ars.R") # Will error on syntax issues # Or read and check line count matches expected ``` ### Step 2: Implement Log-Concavity Check The log-concavity check is critical and error-prone: ```r # Common mistake: Using strict inequality # WRONG: if (diff2 < 0) { not_concave } # CORRECT: if (diff2 <= tol) { is_concave } # with small tolerance # For numerical stability, use tolerance: tol <- 1e-8 # Document why this tolerance was chosen is_log_concave <- all(second_differences <= tol) ``` **Rationale for tolerance**: Floating-point arithmetic can produce small positive values (e.g., 1e-15) for theoretically zero second differences. A tolerance of 1e-8 accommodates numerical noise while still detecting true non-concavity. ### Step 3: Handle Domain Boundaries Implement explicit handling for: - **Unbounded domains** (-∞, +∞): Initialize with points spanning the mode - **Lower-bounded domains** [a, +∞): Include point near lower bound - **Upper-bounded domains** (-∞, b]: Include point near upper bound - **Bounded domains** [a, b]: Include points near both bounds ```r # Example initialization strategy for different domains: if (is.infinite(lower) && is.infinite(upper)) { # Unbounded: span around mode x_init <- c(mode - 2*sd, mode, mode + 2*sd) } else if (!is.infinite(lower) && is.infinite(upper)) { # Lower-bounded: start from boundary x_init <- c(lower + 0.01, lower + 1, lower + 5) } ``` ### Step 4: Initialization Point Selection Poor initialization is a major source of failures: 1. **Find the mode first**: Use optimization to locate the density mode 2. **Ensure points span the mode**: Initial abscissae should be on both sides of the mode 3. **Verify log-density is finite**: All initial points must have finite log(f(x)) 4. **Use domain-appropriate spacing**: Wider for heavy-tailed distributions ```r # Verify initialization before sampling for (x in x_init) { log_val <- log_f(x) if (!is.finite(log_val)) { stop(paste("Initial point", x, "has non-finite log-density")) } } ``` ### Step 5: Numerical Stability Guard against common numerical issues: ```r # Prevent underflow in exponentials log_envelope <- pmin(log_envelope, 700) # exp(700) ≈ 1e304 # Prevent division by zero in hull construction if (abs(x[i+1] - x[i]) < 1e-10) { # Handle degenerate case } # Use log-sum-exp trick for probability calculations log_p <- log_envelope - log_sum_exp(log_envelope) ``` ### Step 6: Convergence Safeguards Implement maximum iteration limits and diagnostics: ```r max_iter <- n_samples * 100 # Reasonable limit iter <- 0 while (length(samples) < n_samples && iter < max_iter) { iter <- iter + 1 # ... sampling logic } if (iter >= max_iter) { warning("Maximum iterations reached; returning partial sample") } ``` ## Verification Strategies ### Statistical Testing Use proper statistical tests, not just moment comparisons: ```r # Kolmogorov-Smirnov test (preferred) ks_test <- ks.test(samples, reference_cdf) if (ks_test$p.value < 0.01) { warning("KS test suggests samples don't match target distribution") } # Chi-square goodness of fit for discrete comparisons # Quantile-quantile plots for visual verification ``` ### Moment-Based Checks (with proper tolerances) If using moment comparisons, justify tolerances: ```r # Tolerance should scale with sample size # Standard error of mean ≈ sd / sqrt(n) n <- length(samples) se_mean <- sd(samples) / sqrt(n) tolerance_mean <- 4 * se_mean # ~99.99% CI if (abs(mean(samples) - expected_mean) > tolerance_mean) { warning("Sample mean outside expected range") } ``` ### Test Cases to Include 1. **Standard normal**: Mean=0, SD=1 (baseline) 2. **Shifted normal**: Mean≠0 to test mode-finding 3. **Exponential**: Tests handling of semi-bounded domain 4. **Truncated distributions**: Tests boundary handling 5. **Non-log-concave (should reject)**: Mixture distributions to verify detection ### Modular Testing Test components in isolation before integration: ```r # Test hull construction separately test_upper_hull <- function() { # Verify hull is above log-density at test points } # Test sampling from envelope separately test_envelope_sampling <- function() { # Verify samples follow envelope distribution } ``` ## Common Pitfalls ### 1. Incomplete File Writes Always verify file content after writing. Truncated writes produce syntax errors that may not surface until runtime. ### 2. Strict vs. Non-Strict Inequality in Log-Concavity Using `<` instead of `<=` (with tolerance) causes rejection of valid distributions like the exponential where second derivatives are exactly zero. ### 3. Initialization Outside Mode Region When initial points don't span the mode, the envelope may be poor and sampling becomes extremely inefficient or biased. ### 4. Missing Boundary Handling Failing to handle bounded domains causes samples to violate support constraints. ### 5. Insufficient Error Messages Generic errors make debugging difficult. Include context in all error messages: ```r # BAD stop("Invalid input") # GOOD stop(paste("Log-density is -Inf at x =", x, "; ensure initial points are in distribution support")) ``` ### 6. No Convergence Limits Infinite loops can occur with poorly-initialized samplers. Always include maximum iteration counts. ### 7. Masking Root Causes When tests fail, understand why before fixing. Expanding initialization points may work but could mask deeper algorithmic issues. ## Performance Considerations - Envelope updates have O(k) complexity where k is number of abscissae - Consider limiting maximum number of abscissae to prevent O(n²) behavior - Pre-allocate output vectors for large sample sizes - Profile hot paths if performance is critical ## References When implementing, verify against the original paper: - Gilks, W.R. and Wild, P. (1992). "Adaptive Rejection Sampling for Gibbs Sampling." Applied Statistics, 41(2), 337-348.