# JZ — design & architecture The decided architecture and design rationale behind JZ: representation, allocator, type inference, the native pipeline, and the principles that fix them. Status markers in headings: **[x]** = decided / reflected in the code, **[ ]** = designed or deferred. Audience & persona material lives in [`.work/marketing.md`](../.work/marketing.md) (canonical) and the expansion map in [`.work/ecosystem.md`](../.work/ecosystem.md); this document is the technical record. --- ## [x] Vision > **JZ = JS as it should have been → WASM** > Crockford's Good Parts realized. Explicit > implicit. Functional > OOP. Compile-time > runtime. Native speed. Compact, clean, tight functional JS subset, compiling to minimal, meaningful, optimal WASM. Design incorporates clever, elegant and innovative, but reliable solutions. It fits in a browser and enables highly performant live compilation. It blocks garbage and bs from JS - historical artifacts and regrets, bad practices. It enforces good JS style by its own design, so that linters are not needed. Any JZ code is automatically good JS code. It encourages best practices by design. Error messages are very user friendly and guiding, failing at proper times. The internal implementation is clever, clean, elegant, innovative. The language brings feeling of performance in timeless manner. It enables easy gateway from JS to low-level world, not simply isolate WASM. Anyone who uses JZ gets access to world of low-level machinery (gateway through C or WASI I suppose?) **What would be paradigm shift that would unlock a new value?** Functional JS subset → minimal WASM. Fits in a browser, compiles in real-time. Excludes JS misfeatures (coercions, hoisting, `this`, classes). Valid JZ = valid JS. No linter needed — bad patterns don't parse. Errors fail early with actionable messages. Gateway from JS to low-level: WASM, WASI, native via wasm2c. ## [x] Uniqueness > type inference from plain JS (no annotations — AS can't say this), auto-SIMD, single-digit-kB output, a pure synchronous compiler that runs in the browser in milliseconds (Porffor fundamentally can't compile-on-the-fly like this), self-hosting, and the native pipeline that beats V8. Nobody else has that combination. ## [x] Mission **Purpose**: Give JS developers direct access to native-speed computation without leaving their language. **Activity**: Compile a functional JS subset to minimal WASM — statically, in real-time, with zero runtime. **Values**: Correctness by design, transparency of execution, zero overhead. > JS developer writes functions → gets native-speed WASM. No new language, no toolchain, no runtime. ## [x] Principles (basis of reasoning) 1. **Compile-time over runtime** — resolve everything statically. No runtime dispatch, no type checks, no GC. What can be known at compile time must be. 2. **Explicit over implicit** — no coercions, no hoisting, no magic. Code means what it says. 3. **Functional over OOP** — functions are the unit of composition. No classes, no `this`, no inheritance. Data is plain, behavior is functions. 4. **Constraint enables performance** — every limitation unlocks a zero-cost guarantee. Document the tradeoff. 5. **Uniform representation** — one convention (f64 everywhere, NaN-boxing) beats type-specific optimizations. Simplicity at boundary > micro-optimization inside. 6. **Minimal core, extensible surface** — core compiles pure compute. Everything else (arrays, strings, objects) is a module. Capabilities grow without core growth. 7. **Host resolves, compiler transforms** — no I/O in compilation. Resolution is the host's job. Compilation is a pure function. 8. JS compat - reduced boundary friction. ## [x] Values (what matters most) 1. **Predictable proven performance** — native speed from plain JS knowledge. No annotations, no toolchains. Any case is guaranteed worst case be faster than V8/JSC. *(Split this: "predictable" is provable structurally; "faster than V8/JSC on every case" is evidenced, not provable — the provable surrogate is "no waste". See §"Proof architecture".)* 2. **Correctness by design** — bad patterns don't compile. The language is the linter. 3. **Transparency** — no hidden allocations, no implicit copies. What you write is what runs. 4. **Immediacy** — compilation is interactive, not a build step. 5. **Tiny footprint** — kilobytes, not megabytes. No runtime, no wrappers. 6. **Elegance** — the compiler stays clean and layered: a pure core that reads pre-computed facts, with type/stdlib breadth pushed to swappable modules. Dependency-light (two first-party libs: subscript, watr). ## [x] Goals / claims * Produced WASM is faster conceptually and practically on all possible bench classes than the other WASM targets: if other compilers can optimize random source that way, it means they have a class of optimization, not specific case; * Produced WASM is faster than V8, JSC on all bench cases: if JIT understands source well to be able to reduce it to efficient bytecode, so should JZ * Produced WASM size is smaller than AssemblyScript * Compiled wasm2c or w2c2 is faster than any of the alternatives > **Status — the enforced bar, ratcheted, not yet universally met.** CI-gates the claim in > [`test/bench.js`](../test/bench.js): `WASM_TODO` is the explicit shrinking list of cases where a > wasm rival still leads (a non-listed regression fails the gate; closing a case deletes its entry). > Current gaps against the live corpus: 3/31 cases trail V8 (mandelbrot, wav, jessie); 17/31 are > larger than AssemblyScript (lz, qoi worst). Treat these as targets to ratchet, not facts to assert > in public copy — the README already hedges to "on par with AssemblyScript". ## [x] Paradigm shift -> WASM as live medium, not build artifact Current WASM workflow: write Rust/C → compile offline → load binary → deploy. JZ workflow: write JS → compile in browser → instant native code. * WASM as interaction medium, not deployment format * Live-coding native audio/visuals in JS * User-generated native compute (sandboxed) * Hot-swappable compute kernels (no reload) * WASM as REPL target * Scripting = compiling (same act) ## [x] Anti-goals (what JZ refuses to be) * Not a general-purpose language — no DOM, no async, no event loop * Not a JS runtime — no eval, no dynamic import, no reflection * Not aiming for 100% JS compat — subset by design, divergences documented * Not a build tool — no bundling, no tree-shaking, no source maps * Not a speculative optimizer — sound static lowering only (narrowing, SIMD, CSE, escape analysis, const-fold); no profile-guided or runtime reopt, the engine handles the rest * Not a type system — types inferred from usage, never annotated ## [x] Success criteria (how we know it works) * Compilation < 1ms in browser for typical module * Output smaller than equivalent C via emscripten * Compiler stays small and dependency-light (two first-party deps: subscript, watr) * Any JZ program runs identically as JS (within documented divergences) * Audio worklet: zero GC pauses, stable real-time output * Cold start: parse + compile + instantiate < 5ms ## [x] Positioning (why JZ, not alternatives) Others compile JS (or JS-like) to WASM. JZ is different in kind, not degree: * **vs porffor/jawsm**: they target full JS semantics → runtime overhead, GC. JZ targets a subset → zero runtime. * **vs assemblyscript**: separate language with JS-like syntax. JZ code IS valid JS. * **vs javy**: embeds QuickJS interpreter in WASM. Interpreter overhead. JZ compiles to native WASM ops. * **vs emscripten**: C/C++ toolchain. Different language, massive output. JZ is JS-native. * The argument: JZ trades JS completeness for something no alternative offers — zero-overhead WASM from JS syntax, compilable in the browser, in real-time. ## [x] Name -> JZ * JZ + java zcript + js zero + jazz ## [x] Qualities * _Lightweight_ – embed anywhere, from websites to microcontrollers. * _Fast_ – compiles to WASM faster than `eval` parses. * _Tiny output_ – no runtime, no heap, no wrappers. * JZ output ≤ hand-written. Pure scalar = identical. Loops = ≤5% overhead. * _Zero overhead_ – no runtime type checks, monomorphized per call-site. * _JS interop_ – export/import, preserve func signatures at WASM boundary. * _JS compat_ – any JZ is valid js with limitations. * Simple, but extensible (like subscript) * Lightweight, but versatile * Transparent, but clever * Uncompromised performance. ## [x] Applications -> Audio/DSP, real-time compute * Digital filter DSP (array processing, in-place mutation) * Web-audio-api worklets (latency-critical, no GC pauses) * Floatbeats/bytebeat generators * Color-space conversions (scalar math + tuples) * Game physics/math kernels * Embedded scripting (IoT, microcontrollers) * Plugin systems (safe sandboxed compute) ## [x] Alternatives | Project | Approach | Interop | |---------|----------|---------| | porffor | AOT JS→WASM | Custom, also has C target | | jawsm | JS→WASM GC | WASIp2, requires Node v23+ | | assemblyscript | TS-like→WASM | wasm-bindgen style | | javy | QuickJS embedded | WASI fd_read/write | | emscripten | C/C++→WASM | JS glue | | grain/kotlin/moonbit | Lang→WASM GC | Native GC interop | JZ differentiator: minimal core (<2K lines), zero runtime, pure functional subset, module-extensible. ## [x] Closures -> Capture by value + explicit env param * Capture by value: zero runtime cost for immutable captures * Mutable captures **supported** — the captured cell lives in the closure env (verified: an in-body `() => n = n + 1` mutating a captured `n` compiles and runs). (Was once disallowed; no longer.) * Implementation: funcIdx + env pointer (call_indirect with env as first param) * Boundary caveat: a closure *returned to JS* comes back as a boxed value (callable via the interop layer), not as a bare JS function * Sufficient for functional patterns (currying, callbacks) ## [x] Floating point precision -> Compile-time rational simplification * Zero runtime cost * Exact arithmetic for constant expressions (`1/3 * 3 = 1`, `1/10 + 2/10 = 0.3`) * Falls back to f64 for dynamic values * Overflow falls back to f64 ## [x] Data representation -> NaN-boxed f64 everywhere ### Decision: NaN-boxing for all pointers, internal and external Everything is f64. Scalars are regular f64/i32. Pointers are NaN-encoded f64. No wrapping layers, no export adapters, no mixed signatures. Simplest design. | Data | Representation | |------|---------------| | Scalars | f64 or i32 (type-coerced by operator) | | Pointers (arrays, objects, strings) | NaN-boxed f64 (type+aux+offset in quiet NaN) | | Tuple returns | Multi-value `(result f64 f64 f64)` | **Cost**: extracting i32 offset from NaN = 3 register ops (~1 cycle), once per function entry. Cached in i32 local — loop body is pure i32 arithmetic. Negligible. **Benefit**: uniform f64 signatures everywhere. No wrapper generation. No param type analysis. JS passes/receives plain numbers. Polymorphism for free (param can be number or pointer). Both sides of the boundary (JS and WASM) follow the same convention: read/write memory at the offset encoded in the NaN payload. JS uses typed array views on exported memory. **Boundary wire type — i64, not f64.** Internally a box is an f64 NaN. *Crossing* JS↔wasm it rides as an **i64** (a `BigInt` on the JS side): JSC/Safari canonicalizes f64 NaN payloads at the boundary, erasing the box. Plain numbers stay f64 (free); only boxed values pay the i64 carrier. A per-export `jz:i64exp` custom section records which params/results ride i64. See [`layout.js`](../layout.js) and [`interop.js`](../interop.js). So "f64 everywhere" holds for *internal* representation; the *boundary* is the one mixed signature. ### WASM GC: not viable for JS boundary Tested: GC structs and arrays are **opaque from JS** — no field access, no indexing. `p[0]` → undefined. Only accessor functions work. The `gc-js-customization` proposal exists but no engine implements it. GC types only useful for WASM↔WASM. ### Return convention: multi-value vs pointer **Array literal return** → multi-value (tuple). Compile-time known length. ```js return [a, b, c] // → (result f64 f64 f64), JS gets real Array ``` **Variable/dynamic array return** → NaN-boxed pointer to memory. ```js return arr // → (result f64), NaN-boxed pointer ``` Heuristic: `return [expr, expr, ...]` with literal brackets = multi-value. Everything else = single f64 return (scalar or pointer). ### NaN-boxing pointer layout Quiet NaN format: `0x7FF8_xxxx_xxxx_xxxx` — 51-bit payload. Layout: `[type:4][aux:15][offset:32]`. 16 types, each with ONE layout (no flags). Type dispatch handles everything — no extra branches, no conditional interpretation. Principle: aux holds IMMUTABLE metadata only. Mutable state (length, size) in memory. Aliases see mutations. C-style: header + data contiguous. | Type | Name | aux (15 bits) | offset → | Memory layout | |------|------|---------------|----------|---------------| | 0 | ATOM | kind | id | none | | 1 | ARRAY | 0 | data start | `[-8:len(i32)][-4:cap(i32)][elem0:f64, ...]` | | 2 | BUFFER | 0 | bytes start | `[-8:byteLen(i32)][-4:byteCap(i32)][bytes...]` (ArrayBuffer / DataView passthrough) | | 3 | TYPED | elemType:3 ∣ view:1 | data start / descriptor | **Owned** (`aux & 8 == 0`): `[-8:byteLen(i32)][-4:byteCap(i32)][bytes...]` — shares BUFFER header; `__len = byteLen >> log2(stride)`. Reinterpret `new T(buf)` is a zero-copy view (same offset, shared header). **Subview** (`aux & 8 == 8`, i.e. `new T(buf, off, len)`): offset points to 16-byte descriptor `[0:byteLen(i32)][4:dataOff(i32)][8:parentOff(i32)][12:pad]`. Reads/writes alias the parent; `.buffer = BUFFER@parentOff`, `.byteOffset = dataOff - parentOff`. | | 4 | STRING | 0 | data start | `[-4:len(i32)][chars:u8...]` | | 5 | STRING_SSO | len | packed chars | none (≤4 ASCII inline) | | 6 | OBJECT | schemaId | data start | `[prop0:f64, prop1:f64, ...]` | | 7 | HASH | 0 | table start | `[-8:size(i32)][-4:cap(i32)][entries...]` (string-keyed, FNV-1a) | | 8 | SET | 0 | table start | `[-8:size(i32)][-4:cap(i32)][entries...]` | | 9 | MAP | 0 | table start | `[-8:size(i32)][-4:cap(i32)][entries...]` | | 10 | CLOSURE | funcIdx | env start | `[env0:f64, env1:f64, ...]` | | 11 | EXTERNAL | 0 | extMap idx | none (host JS ref table) | | 12-15 | (free) | | | | Key properties: - 4GB addressable (32-bit offset), type extractable with 3 bit ops - **One layout per type** — no flags, no subtypes. "Parse, don't validate" for pointers. - **Heap length** — mutable len/cap in memory header. Aliases see mutations. C-style. - ATOM/STRING_SSO need zero memory allocation - 4 free slots remaining for future (Promise, Iterator, BigInt, etc) **vs Go/Rust**: Go/Rust are statically typed — no runtime type bits needed. JZ needs them because a single f64 param could be number/array/string/object (JS polymorphism). NaN-boxing is the cheapest way to pay it. ## [x] Allocator -> for linear memory, pluggable | Strategy | Alloc | Free | Best for | |----------|-------|------|----------| | **Bump** (default — **the only one shipped**) | Increment pointer | `_clear()` resets the arena | DSP, batch processing | | **Free list** *(designed, deferred)* | malloc | free(ptr) | Mixed lifetimes | | **Refcount** *(designed, deferred)* | alloc | auto on rc=0 | Shared structures | | **External** *(designed, deferred)* | Host provides | Host frees | Embedded, plugins | Contract: `_alloc(bytes) → i32` and `_clear()` (arena reset); a `_free(ptr)` hook is reserved for the deferred free-list/refcount strategies. Implementation swappable — today only bump ships. ## [x] Imports -> Pre-bundled source + primitives-only linking ### Resolution - **Resolution** = host responsibility (JS/Node/WASI) - **Compilation** = JZ responsibility (pure transform, no I/O) - CLI: fs + importmap.json - API: `modules` option (pre-resolved sources) - WASM API: pre-bundled source format (single string with `//!jz:module` markers) ### Multi-module - Primary: bundle into single WASM (shared memory, full types) - Optional: primitives-only linking (for numeric leaf modules like DSP kernels) - Circular imports: prohibited (Jessie-style) - Exports: named + re-export; `export default …` also compiles (yields a `default` export), though named exports are the convention - Bare specifiers: importmap (CLI), relative paths required in source ## [x] Types -> i32/f64 by operator, monomorphic * `1` → i32, `1.0` → f64. Operators preserve i32 when both operands i32. * `/`, `**` always f64. Bitwise always i32. Comparisons always i32. * Variables typed by pre-analysis: if any assignment is f64, local is f64. * All types resolved at compile-time. No runtime dispatch. ## [x] Pointers -> i32 internal, boundary wraps (see Data representation above) ## [x] Host APIs -> WASI + shim | JS API | WASI Function | |--------|---------------| | console.log | fd_write(1, ...) | | Date.now() | clock_time_get(realtime) | | performance.now() | clock_time_get(monotonic) | ## [x] Native binary -> WASM is the IR ``` JS → jz → .wasm → wasm2c/w2c2 → .c → gcc/clang → native ``` No custom C backend needed. WASM IS the portable IR. Our i32/f64 type system directly improves native perf (wasm2c translates instruction-by-instruction). | Tool | Pipeline | Notes | |------|----------|-------| | **w2c2** | WASM → C89 | Smallest (150KB), C89 compat | | **wasm2c** (WABT) | WASM → C99 | Official, well-tested | | **wasmer create-exe** | WASM → native | One command, cross-compile | ## [x] Metacircular (jz.wasm) -> WASI Shipped: JZ compiles its own entire source to `dist/jz.wasm` — the whole pipeline runs inside wasm. * `dist/jz.wasm` *is* JZ compiled by JZ (built via [`scripts/selfhost-build.mjs`](../scripts/selfhost-build.mjs)) * Uses WASI for I/O (fd_read/write for source, fd_write for output) * CI-gated: `npm run test:self` round-trips real programs through the in-wasm compiler and runs the output ## [x] Pluggable architecture -> Modules extending ctx.emit Modules register on ctx: `ctx.core.emit[name]` (emitters), `ctx.core.stdlib[name]` (WAT), `ctx.core.includes` (lazy inclusion). Core stays minimal, capabilities grow through modules. ## [x] Representation -> per-site, inferred (not a user ABI knob) The compiler picks the carrier for each value the way a human reader infers type from JS: from name, operators, member access, `typeof`, assignments, JSDoc, optional declarations. Default cast is nanbox; analysis specializes to flat/i32/SSO/externref/packed/etc. per site. No `opts.abi`. Only the **boundary protocol** (how exports cross JS↔wasm) is a user concern, and that's `opts.host` (`'js'` | `'wasi'` | `'gc'`, autodetect). | | Pro | Con | |---|---|---| | Per-site inference | Hot path goes fast without ceremony; same binary mixes flat + nanbox where each fits; user writes plain JS | Wins only as strong as the analysis; bail-to-nanbox sites silently lose perf | | Nanbox as default cast | Polymorphism free, JS numbers passthrough, uniform slot, simplest codegen | 3–5 instr overhead on pointer ops, 8B per slot regardless of value | | Vs shipping `flat` preset | No transient API; analysis grows under one binary | Tempting shortcut — papers over weak analysis with user opt-in that can't be removed later | ### Type evidence (in increasing strength) - **Name**: `i`, `n`, `len`, `count`, `idx` → integer; `s`, `str`, `name` → string; `is*`, `has*` → bool. Lowest-confidence; suggestive only. - **Literals**: `[1,2,3]` → int array; `"abc"` → SSO string. - **Operators**: `x | 0`, `x >>> 0` → i32; `+x` → f64; `"" + x` → string; `x & mask` → i32. - **Member access**: `.length` → string/array/typed; `.charCodeAt` → string; `.byteLength` → buffer; `.then` → promise. - **`typeof` / `instanceof` guards**: narrow refinement in then-branch. - **Assignment flow**: if every reaching def is i32, local is i32. - **JSDoc** when present (`@type {number}`). - **Optional declarations** (future): explicit annotations sharpen inference but are never required. Anything a human reading the code would conclude, the compiler should conclude. Anything ambiguous falls back to nanbox. ### Already adaptive Typed-element arrays, `intCertain`/`intConst`/`intRange`, object schemas, val-type propagation, jzify `typeof`-narrowing, SSO at literal time. ### Next wins (same direction, deeper) SSO flow through concat results when inputs prove short. Schema field packing (`{x:i32,y:i32}` → 8B not 16B). Closure capture narrowing. Cross-call propagation through monomorphic-ish sites. Specialization on observed arg types when the export's callers are in-module. ### Implications Per-type rep modules survive as **internal** codegen modules: one file per type under `src/abi/` (`number.js`, `string.js`, `array.js`, `object.js`), each holding every carrier the narrower may pick for that type, dispatched per call site via analysis (`ctx.abi..ops.`). `opts.abi` drops; `opts.host` (`js` / `wasi` / `gc`) takes its place. jsstring becomes per-site externref specialization for `host: 'js'`, not a preset. The earlier `src/abi//.js` split would have been preset-thinking smuggled in by file layout — separate files imply separate testing units imply user-pickable presets, the surface being removed. The `jz:abi` custom section, if kept, is a feature-detection version stamp (e.g. "ref-types required"), never a preset name. ### Open policy questions (deferred until first non-default rep emits at scale) 1. **JSDoc strength.** `@type` as a hint (overridable by stronger evidence) or a contract (refuse to widen)? Hint matches implicit-inference; contract gives a cross-module escape hatch. Default: hint. 2. **Null/undefined under flat slots.** A flat `i32`/`f64` can't carry them; the narrower must prove non-null at the binding or widen back to tagged. 3. **Compound lifetime.** `__alloc` for a flat string passed to a host call — when freed? Today's `_clear`-reset arena suits short-lived; long-running needs a hook. Defer until a real long-running program forces it. 4. **Cross-module ABI freezing.** Exported flat-slot signatures are public contract even though `opts.host` is what users picked. Export signatures derive from proven types of exports' params/returns; write exports so their types are obvious (or annotate). No promise of stable *internal* rep across versions for the same source. ## [x] Inference -> collect before compile All shape/flow facts are produced by **analysis passes that run before emit**, never by ad-hoc inference inside the emit path. The emit phase **reads** facts (`repOf`, `lookupValType`, `lookupNotString`, `paramReps[k]`, `f.valResult`, etc.) but never derives them. Why this matters: - **Proofs.** Every dispatch-elision (`__length` → `__len`, `__to_num` → `asF64`, `__ptr_type` → branch fold, polymorphic `[]` → direct typed read) has a *named upstream fact* and a *named upstream pass*. When a regression appears, the chain is: `wat shows __length` ← `notString missing on param k` ← `notStringEvidence didn't fire` ← `body shape didn't match`. Every link is a deterministic AST walk over fully-prepared source. - **Editor-hint consumability.** The same facts are emitted as custom WASM sections (`runtime.rests`, `runtime.i64`, etc.) for the JS boundary wrapper to read. Nothing prevents an editor host from consuming `ctx.func.repByLocal`, `ctx.func.paramReps`, and the per- function `f.valResult` / `f.arrayElemValType` to: - render param shapes as inlay hints (`(x: STRING, k: i32)`), - flag "this branch is suboptimal" when a call site forces a `paramReps[k]` field to sticky-null (the lattice already records *which* site disagreed — see narrow.js' D-phase), - surface `notString` / `intConst` / `arrayElemSchema` as optimization badges next to the source location. - **Auditability.** "Why did this function specialize?" is answered by one record (`f.sig`, `f.valResult`, the rep map). "Why did this branch *not* specialize?" is answered by the absence of a fact at a known location, never by tracing the emit walk. The phase chronology in `src/infer.js` (above the paramReps lattice primitives, ~line 84) is the canonical reference for *what's valid when*. Read it before adding a new consumer. ## [x] Stdlib sources * [Metallic](https://github.com/jdh8/metallic), [Piezo](https://github.com/dy/piezo/blob/main/src/stdlib.js), [AS musl](https://github.com/AssemblyScript/musl/tree/master) ## Backlog (old arch, archived) * Boxed primitives (Object.assign pattern) * TypedArray pointer-embedded metadata * Ring arrays (auto-promote on shift/unshift) * NaN-boxing pointer kinds (7 types) * Compile-time rational simplification ## [x] Optimization principle — minimal theoretical WASM, or no value > *"Nothing takes place in the universe in which some rule of maximum or minimum does not appear."* — Euler > *"Premature optimization is the root of all evil."* — Knuth **The bet.** JZ's whole value is the *guarantee* that, for every JS syntax construct / pattern / design case, it emits the **simplest, most minimal theoretical WASM** for it — wasm a careful hand-writer would produce, or better (branch hints, SIMD, whole-program devirt). If JZ only *ties* V8, there is no reason to choose it: a tie is a loss. So the metric is concrete and binary: **jessie.wasm (parser only) must beat jessie.js under warmed-up V8.** **Per-construct, not per-program.** The unit of optimization is the *construct* (a loop, an indexed read, a tail call, an allocation), never a benchmark. Fixing the minimal lowering of a construct helps every program. Tuning one benchmark's shape helps nothing. Never overburden emitted wasm out of complacency — every extra instruction on a hot path must justify itself. **Method discipline (this is where we kept failing).** Do NOT grind by statically reading the WAT and fixing what is *easy to see*. That is premature optimization — it aims at the visible, not the hot. **Profile first** (`node --prof` attributes ticks to wasm functions by name; map via `.work/funcmap.json`), find the hot 20%, then audit *those* functions' WAT against "minimal theoretical wasm" with numbers. A fix that can't be tied to a profiled cost is speculation. **Allowed levers** (all must stay sound — guarantees, never speculation; the developer adjusts nothing): - safe jzify transforms that simplify code shape before lowering - WASM/watr branch hints, native tail calls, SIMD where provable - whole-program devirtualization, SRoA, const-fold, i32/f64 narrowing - README *performance hints* — documentation only, advisory, never required ## [x] Proof architecture — what's provable vs what's evidenced "Predictable proven performance" (Values §1) fuses two claims that need different machinery. Conflating them is the standing risk — and the reason a 39-case bench can never close the goal. **Predictable is provable, structurally, today.** AOT + no deopt + no tiering + no GC ⇒ bounded variance: no warmup cliff, no deopt cliff, no GC pause. This follows from the architecture, not a benchmark, and it is the guarantee V8/JSC *cannot* make. Lead with variance, not mean. **Fast — "faster than V8/JSC on every case" — is NOT provable**, only *evidenced*: induction over a sample, against a moving, adaptive, machine-dependent JIT. Asserting it as a guarantee is the overclaim (and already locally false: size-vs-AS in aggregate; crc32/mandelbrot/AoS floors). The provable surrogate is **no waste** — the output contains none of the overhead the optimizer claims to remove. A negative, statically checkable property of the WAT, independent of any competitor or stopwatch. You cannot prove "fast"; you can prove "no waste", and that is stronger than any benchmark win because it does not erode when V8 ships a new tier next quarter. **The guarantee ladder** (increasing strength, decreasing reach): 0. **Predictable** — structural, proven by construction. 1. **Waste-free** — a machine-checked verifier asserts each absence-of-overhead invariant over emitted WAT, swept over the fuzzer's generated *sublanguage* (a proof over the language, not a sample). This is the engineering target. 2. **Locally optimal** — output is a fixpoint of the peephole+watr rewrite system (no missed local rewrite) = "minimal theoretical WASM per construct", checkably. Audited by `scripts/audit-fixpoint.mjs` (`npm run audit:fixpoint`): re-run jz's OWN optimizer (watr) on the finished output; a drop = a rewrite the pipeline left on the table. Today 7/10 corpus kernels are fixpoints; the non-convergent cluster is reductions / conditional maps (dot, sum, clamp-map: a 2nd watr pass removes 15–18 ops). NB: a FOREIGN optimizer (wasm-opt -O3) is the wrong oracle — it unrolls/re-vectorizes, RAISING static op count on biquad/matmul/crc, so its delta conflates "jz left slack" with "different size↔speed trade". watr (jz's own system) is unconfounded. 3. **Globally optimal** — undecidable in general; reachable per-construct only via a bounded superoptimizer, offline, as a one-shot proof artifact. Never a live gate. **The speed claim becomes a partition, not a universal:** { cases the verifier proves minimal } ∪ { cases at a documented floor }. The documented-floor half is already a STRUCTURED, machine-gated registry — `WASM_TODO` in `test/bench.js`, behind the `fastest-wasm` gate (jz leads every wasm rival on every case UNLESS listed, and each entry carries a MEASURED root cause; closing one = deleting it). Two reference classes, kept distinct: - **floor vs wasm rivals** (`WASM_TODO`: fft FMA-parity, raytrace/mat4/nqueens/qoi/dict scalar-codegen-quality) — jz trails another wasm producer for an LLVM-grade scalar-isel/regalloc reason V8's wasm tier caps anyway, or an FMA parity class that would break valid-jz=valid-JS. No jz pass closes these; each was verified against the WAT and the disproven hypotheses recorded so they aren't re-chased. - **floor vs native** (crc32 `clmul`/SSE4.2, AoS AVX2-4-lane) — jz LEADS the wasm rivals but trails NATIVE because the hardware instruction is unreachable in wasm-v1. This bounds every wasm producer, not just jz. A documented floor is a *stronger* statement than a benchmark win: "no wasm producer beats this on this target" outlives the next JIT; "we won by 1.05× today" does not. **The infra, mapped to the ladder** (the OUTPUT-side twin of `src/infer.js`'s named-fact/named-pass discipline — every opt now also has a named output invariant and a check with teeth): - **absence-of-overhead invariants** — `test/wat-invariants.js`: per-pass *ablation* (overhead present with the pass off, absent with it on — proves the check sees the pattern AND that the named pass removes it) + per-sublanguage *population sweep* over the fuzzer generators (hard-zero where clean; ratcheted where a gap is documented). The structural twin of the value-correctness fuzzer. - **machine-independent regression gate** — `test/perf-ratchet.js`: loop-body op count over the shared corpus (`scripts/perf-corpus.mjs`), now including adversarial whitelist-defeating shapes (`cond` nested-conditional, `buf` param-array, `nest` nested-loop). The portable "did codegen get wasteful" signal; no stopwatch. - **selection-bias audit** — `scripts/audit-assemblyscript.mjs` (`npm run audit:as`): the AssemblyScript canon, annotation-free, checked for the two *sound* waste signals (counter-compared-in-f64, per-iteration pointer decode). Third-party shapes the in-house corpus never tuned for. (It deliberately does NOT flag body f64 from bare `*`/`+`/`%` — that is the integer-overflow contract, not a deopt.) **Surfaced by building this** (now ratcheted, can't worsen, shrink when fixed; see `.work/todo.md`): nested-conditional int → f64 round-trip under the vectorizer; param typed-array base re-decoded per iteration (the flagship DSP shape); narrowLoopBound handling only `i < n` (not `<=`/`>`/`>=` or non-const counters). None were visible to the net-output bench — exactly why the proof lives at the construct level, not the program level. **Restated headline (honest, and the real moat):** JZ emits provably waste-free, predictable WASM from plain JS — no warmup, no deopt, no GC — and on every case is either proven minimal or at a floor no wasm producer can beat. Benchmarks are *evidence* the contract correlates with wall-clock wins; the verifier is the *proof* the contract holds; the perf-fuzzer and bias audit find the invariants not yet formalized. "We win benchmarks" is induction against a moving target; "we emit waste-free locally-optimal code with bounded variance" is deduction.