# Verified Automatic Differentiation in Hesper

## Overview

Hesper uses Lean 4's type system to **verify** that backward (gradient) computations are mathematically correct. This ensures that GPU training kernels produce correct gradients without relying on manual testing alone.

## Architecture

```
┌─────────────────────────────────────────────────┐
│  1. CPU Spec (Pure Lean functions)              │
│     forward_spec : Array Float → Array Float    │
│     backward_spec : Array Float → Array Float   │
│           → Array Float                         │
├─────────────────────────────────────────────────┤
│  2. Numerical Verification                       │
│     numericalVJP ≈ backward_spec                │
│     (finite differences, tolerance 1e-3)         │
├─────────────────────────────────────────────────┤
│  3. Chain Rule Composition                       │
│     (g ∘ f).backward = f.backward ∘ g.backward  │
│     Verified algebraically (error = 0.0)         │
├─────────────────────────────────────────────────┤
│  4. GPU Kernel (WGSL ShaderM)                   │
│     Must match CPU spec output                   │
│     Tested at runtime via readback               │
└─────────────────────────────────────────────────┘
```

## How to Add a New Verified Operation

### Step 1: Define Forward and Backward as Pure Functions

```lean
-- In Hesper/AD/Verified.lean

/-- Forward: element-wise ReLU -/
def reluFwd (x : Array Float) : Array Float :=
  x.map (fun xi => if xi > 0.0 then xi else 0.0)

/-- Backward: ReLU gradient (step function) -/
def reluBwd (x dy : Array Float) : Array Float :=
  Array.zipWith (fun xi di => if xi > 0.0 then di else 0.0) x dy
```

### Step 2: Register as a DiffOp with Test Data

```lean
def reluOp : DiffOp := {
  name := "ReLU"
  forward := reluFwd
  backward := reluBwd
  testInput := #[1.0, -2.0, 3.0, -0.5, 0.1]
  testGradOutput := #[0.1, -0.3, 0.2, 0.5, -0.1]
}
```

### Step 3: Verify via Numerical Gradient Check

```lean
-- In runVerification:
let (passed, err) := verifyOp reluOp
-- passed = true, err ≈ 0.0
```

This automatically verifies:
- `reluBwd x dy ≈ Jᵀ(x) · dy` where `J` is the Jacobian of `reluFwd`
- Uses central finite differences: `(f(x+ε) - f(x-ε)) / 2ε`

### Step 4: Implement GPU Kernel Matching the Spec

```lean
-- In a WGSL module:
def reluBackwardKernel (n : Nat) : ShaderM Unit := do
  let gid ← ShaderM.globalId
  let i := Exp.vec3X gid
  let _x ← ShaderM.declareInputBuffer "x" (.array (.scalar .f32) n)
  let _dy ← ShaderM.declareInputBuffer "dy" (.array (.scalar .f32) n)
  let _dx ← ShaderM.declareOutputBuffer "dx" (.array (.scalar .f32) n)
  ShaderM.if_ (Exp.lt i (Exp.litU32 n)) (do
    let xi ← ShaderM.readBuffer (ty := .scalar .f32) (n := n) "x" i
    let di ← ShaderM.readBuffer (ty := .scalar .f32) (n := n) "dy" i
    let result := Exp.select (Exp.gt xi (Exp.litF32 0.0)) di (Exp.litF32 0.0)
    ShaderM.writeBuffer (ty := .scalar .f32) "dx" i result
  ) (pure ())
```

### Step 5: Compose Operations with Verified Chain Rule

```lean
-- Composition is automatically correct:
let fusedOp := compose reluOp softmaxOp testInput testGrad
let (passed, err) := verifyOp fusedOp
-- Chain rule: (softmax ∘ relu).bwd(x, dy) = relu.bwd(x, softmax.bwd(relu(x), dy))
```

## Currently Verified Operations

| Operation | Forward | Backward | Numerical Error |
|-----------|---------|----------|-----------------|
| **Softmax** | `exp(xᵢ-max) / Σ exp` | `sᵢ(dyᵢ - Σsⱼdyⱼ)` | 0.000000 |
| **RoPE** | `R(θ) @ x` | `R(-θ) @ dy` | 0.000000 |
| **RMSNorm** | `(x/rms) * γ` | chain rule | 0.000000 |
| **ScaledDot** | `scale * q·k` | `scale * dy * k` | 0.000000 |
| **Composition** | `g ∘ f` | `f.bwd ∘ g.bwd` | 0.000000 |

## Running Verification

```bash
lake build verified-ad backward-verify
./.lake/build/bin/verified-ad
./.lake/build/bin/backward-verify
```

Expected output:
```
═══════════════════════════════════════════════
  Verified AD: Numerical Gradient Checks
═══════════════════════════════════════════════

  PASS Softmax: max_relative_error = 0.000000
  PASS RoPE(θ=0.700000): max_relative_error = 0.000000
  PASS RoPE(θ=1.500000): max_relative_error = 0.000000
  PASS RMSNorm: max_relative_error = 0.000000
  PASS ScaledDot: max_relative_error = 0.000000
  PASS ScaledDot ∘ RoPE(θ=0.300000): max_relative_error = 0.000000

  Chain Rule Verification:
  PASS Chain rule composition: error = 0.000000

  ✓ All AD verifications PASSED
```

## Design Philosophy

1. **Spec first**: Write the pure mathematical spec before any GPU code
2. **Verify numerically**: Finite differences catch sign errors, missing terms
3. **Compose safely**: Chain rule is verified once, applies to all compositions
4. **GPU matches spec**: Runtime tests compare GPU output to CPU spec

This approach eliminates the class of bugs where backward kernels have
incorrect gradient formulas — the most common source of training failures
in hand-written GPU training code.

## Next Steps

- [ ] Full attention backward as verified composition of primitives
- [ ] Lean proof that chain rule preserves VJP correctness (beyond numerical)
- [ ] Auto-generate GPU kernels from verified specs
- [ ] Extend to more operations (Conv2D, LayerNorm, GELU)