# How to Read the Proofs This guide is for ML theory people who know the theorems but not Lean 4. ## Lean 4 in 60 seconds Lean 4 is a dependently typed programming language and interactive theorem prover. A "proof" in Lean is a program whose type is a proposition. If the program compiles, the proposition is true (relative to the axioms). ```lean -- This declares a theorem named `my_theorem`. -- The type signature IS the statement. -- Everything after `:= by` is the proof. theorem my_theorem (n : ℕ) (h : 0 < n) : n ≠ 0 := by omega ``` ## Reading a theorem statement Look at the type signature. Everything before the colon (`:`) is a hypothesis; everything after is the conclusion. ```lean theorem genGap_highProb_rademacher {ι : Type*} [Fintype ι] [Nonempty ι] -- finite nonempty hypothesis class [MeasurableSpace Z] {μ : Measure Z} -- probability space [IsProbabilityMeasure μ] {ℓ : ι → Z → ℝ} -- loss function {B : ℝ} (hB : 0 < B) -- bound B > 0 (hℓ_bdd : ∀ i z, |ℓ i z| ≤ B) -- loss bounded by B {n : ℕ} (hn : 0 < n) -- sample size n > 0 {ε : ℝ} (hε : 0 ≤ ε) : -- tolerance ε ≥ 0 -- CONCLUSION: (piMeasure μ n).real {S | genGap μ ℓ S ≥ 2 * E[Rad] + ε} ≤ Real.exp (-(ε ^ 2 * n) / (8 * B ^ 2)) := by ... ``` Read this as: "For any finite nonempty hypothesis class, probability measure, B-bounded loss, sample size n > 0, and ε ≥ 0, the probability that the generalization gap exceeds 2·E[Rad] + ε is at most exp(-ε²n/(8B²))." ## Typeclass brackets | Syntax | Meaning | |--------|---------| | `(n : ℕ)` | Explicit argument | | `{n : ℕ}` | Implicit argument (Lean infers it) | | `[Fintype ι]` | Typeclass instance (ι has finitely many elements) | | `[Nonempty ι]` | Typeclass instance (ι has at least one element) | | `[IsProbabilityMeasure μ]` | μ is a probability measure | | `[MeasurableSpace Z]` | Z has a sigma-algebra | ## Common proof tactics | Tactic | What it does | |--------|-------------| | `exact h` | Close the goal using hypothesis `h` | | `apply f` | Reduce the goal by applying `f` | | `rw [lemma]` | Rewrite using `lemma` | | `simp` | Simplify using built-in rules | | `linarith` | Solve linear arithmetic | | `ring` | Solve ring equalities | | `omega` | Solve natural number arithmetic | | `intro x` | Introduce a universally quantified variable | | `cases h` | Case split on hypothesis `h` | | `calc` | Chain of equalities/inequalities | | `have h : P := ...` | Prove an intermediate fact | | `ext k` | Prove function equality pointwise | | `funext` | Function extensionality | ## Where to look in this repo **Start with definitions.** Read `FormalSLT/Risk.lean` and `FormalSLT/GhostSample.lean` first. These define `populationRisk`, `empiricalRisk`, and `genGap`. The theorem statements in later files reference these. **Then read the main results.** Each module's `/-! ... -/` docstring explains the theorem, the proof strategy, and the scope. The docstring is the best entry point for each file. **Skip the Azuma internals.** The `FormalSLT/Azuma/` directory has 9 files building up the exposure martingale and bounded-differences machinery. This is infrastructure. Read `GenGapTail.lean` for the final result and skip the intermediate files unless you want the measure-theory details. ## Mathlib This repo depends on [Mathlib](https://github.com/leanprover-community/mathlib4), the standard mathematics library for Lean 4. Mathlib provides: - `MeasureTheory.Measure`, `MeasureTheory.integral` — measure spaces and integration - `Finset.sup'`, `Finset.sum` — finite suprema and sums - `Finset.vcDim`, `Finset.card_shatterer_le_sum_vcDim` — VC dimension and Sauer-Shelah - `Real.exp`, `Real.log`, `Real.sqrt` — standard functions When you see a lemma name like `MeasureTheory.integral_mono` or `Finset.card_image_of_injOn`, it comes from Mathlib. You can search for it at [leanprover-community.github.io/mathlib4_docs](https://leanprover-community.github.io/mathlib4_docs/).