# Chapter 03 — Automatic Differentiation & Verified Ops Hesper ships a reverse-mode autodiff layer that's *verified*: every forward primitive comes with a backward primitive, and the equivalence between fused and unfused kernels is checked at the type level. You get gradients without writing them by hand. For the full design notes see [`docs/VERIFIED_AD.md`](../../VERIFIED_AD.md). This chapter is the hands-on tour. ## The `Differentiable` typeclass ```lean import Hesper.Core.Differentiable open Hesper.Core -- The shape of the interface — every differentiable op has a forward -- function and a vector-Jacobian-product backward function. #check @Differentiable.forward -- Differentiable.forward : ∀ {Op I O} [self : Differentiable Op I O], Op → I → O #check @Differentiable.backward -- Differentiable.backward : ∀ {Op I O} [self : Differentiable Op I O], Op → I → O → I ``` The three type parameters are: - `Op` — a marker / configuration type for the operation (`AddOp`, `MatMulOp`, `ReLUOp`, …). - `I` — the input shape: `Float` for scalars, `TensorData` for tensors, product types for multi-input ops. - `O` — the output shape. `backward op x dy` is the vector-Jacobian product: given a cotangent `dy` at the output, produce a cotangent `dx` at the input. ## A trivial scalar instance ```lean -- A marker for "multiply by a constant scalar": structure ScaleByTwo where deriving Repr instance : Differentiable ScaleByTwo Float Float where forward _ x := 2.0 * x backward _ _ dy := 2.0 * dy -- d/dx (2x) · dy = 2·dy ``` ```lean def s : ScaleByTwo := {} #eval (Differentiable.forward s (3.0 : Float) : Float) -- 6.0 #eval (Differentiable.backward s (3.0 : Float) (1.0 : Float)) -- 2.0 ``` That's the whole interface. Composing op instances (Chain rule, etc.) lives in `Hesper.AD.Chain`; ready-made instances for the tensor primitives live in `Hesper.AD.BackwardOps`. ## Where the verification lives Two kinds of proofs travel with the AD code: 1. **Per-primitive correctness.** `Hesper/AD/Verified.lean` carries the theorem that, for each shipped op, `backward` is the transpose of the linearization of `forward`. 2. **Fusion equivalence.** When a kernel-fusion pass merges two ops into one shader, it discharges a proof that the fused kernel computes the same forward and the same backward as the unfused chain (see `Hesper/Tensor/VerifiedOpFusion.lean`). If you add a new primitive, the elaborator forces you to supply the proof — there's no "trust me" escape hatch. ## Running the AD demos ```bash lake exe ad-demo # walk through a 4-layer MLP forward + backward lake exe verified-op-demo # show a fused op matching its unfused spec lake exe unified-ad-demo # one big example using all features ``` `Examples/MachineLearning/` contains a small Adam optimizer driver and a LoRA-style fine-tuning loop that all build on this layer. ## What you can rely on - Gradients of any registered primitive are correct by construction. - Composite functions inherit correctness from their primitives. - Fusing two primitives doesn't change the gradient — Lean checks. - New primitives can't be merged into the AD layer without a proof. ## What this *doesn't* give you - It doesn't prove numerical stability — `1/x` near zero still explodes. Use safe variants where it matters. - It doesn't replace tests — proofs are about mathematics, not hardware. Bit-parity tests against a CPU reference still live in `Tests/`. ## What's next - [Chapter 04 — High-Level API & Tensors](Ch04_HighLevelApi.md): build models out of differentiable layers. - [Chapter 06 — Proofs](Ch06_Proofs.md): how the equivalence and invariant proofs in this chapter are actually written.