# Acceptance-Gate Provenance

## Core Principle

**An autonomous loop's STOP/ACCEPT gate determines whether the loop is
same-family-safe. The thing being judged at that gate — not the loop's
subject matter, not how many agents ran — decides whether Claude may
judge it.**

ARIS has loops that keep working until a condition is met: `/auto-review-loop`,
`/dse-loop`, the `/experiment-bridge` auto-debug cycle, the
`/auto-paper-improvement-loop`, and any future "keep going until X"
skill. Every such loop terminates on a gate it evaluates each iteration:
"are we done yet?" That gate is where same-family self-acquittal sneaks
in. The loop body can be all Claude; the **gate** is what this contract
governs.

This is `reviewer-independence.md` and `experiment-integrity.md` applied
to the temporal/iterative case: those two cover single-shot review and
single-shot experiment judging; this one covers the *recurring verdict*
a loop makes on itself, round after round, with no human in between.

One-liner, and the whole doc in seven words:

> **A goal/loop can DRIVE; it cannot ACQUIT.**

The loop may freely *drive* itself toward a target — schedule the next
config, recompile, re-run the failed job, spawn ten search branches.
What it may not do is *acquit* its own work — declare the paper good,
the proof valid, the claim supported, the idea novel, the review
satisfied. Acquittal is a cross-model act.

## The two gate types

Classify **every** stop/accept gate of a loop as exactly one of these.
There is no third bucket; if a gate seems to be both, it is two gates
and you split it (see "Compound gates" below).

### Type-A — EXECUTION / OBJECTIVE gate

A machine-checkable or externally-observable signal of *what happened*,
with no judgment of *merit*. Claude **MAY** self-judge Type-A gates —
it is execution bookkeeping, not a verdict.

A gate is Type-A iff a non-LLM process (a shell exit code, a stat on the
filesystem, a counter, a parser reading a benchmark's own output) could
in principle answer it with the same answer Claude gives.

- ✅ exit code == 0
- ✅ `figures/result.png` exists / `paper/main.pdf` compiled (LaTeX returned 0)
- ✅ N/N jobs finished (queue drained)
- ✅ test suite passed (pytest exit 0)
- ✅ the reviewer **was invoked** (a `codex` thread returned, a JSON verdict file exists)
- ✅ all checklist items were **attempted** (each row touched)
- ✅ no `NaN` in the loss log / training reached `max_steps`
- ✅ the benchmark harness emitted a number and it parsed
- ✅ PATIENCE/TIMEOUT/MAX_ROUNDS budget exhausted (a counter hit its bound)

Type-A gates are *coverage and completion* facts. Claude self-judging
"did the audit run?" is fine; Claude self-judging "did the audit pass?"
is not (that's Type-B).
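
As a rough illustration, the checks listed above can each be answered by a few lines of non-LLM code. This is a minimal sketch, not an ARIS implementation: the function names are hypothetical, and it assumes the loop has the command line, artifact path, parsed loss values, and round counter at hand.

```python
import math
import subprocess
from pathlib import Path


def command_succeeded(argv: list[str]) -> bool:
    """Type-A: did the process exit 0? Says nothing about whether its output is right."""
    return subprocess.run(argv, capture_output=True).returncode == 0


def artifact_exists(path: str) -> bool:
    """Type-A: is there a non-empty file at this path?"""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0


def losses_are_finite(losses: list[float]) -> bool:
    """Type-A: no NaN or inf anywhere in the parsed loss values."""
    return all(math.isfinite(x) for x in losses)


def budget_exhausted(round_index: int, max_rounds: int) -> bool:
    """Type-A: has the round counter hit its bound?"""
    return round_index >= max_rounds
```

None of these functions can say whether the figure is informative or the trained model is any good; that is what keeps them on the Type-A side.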

### Type-B — QUALITY / CORRECTNESS / ACCEPTANCE gate

A judgment of *merit, correctness, or sufficiency*. Claude must
**NEVER** self-judge a Type-B gate — it requires a **different model
family** (per `reviewer-routing.md`: `codex` default, `oracle-pro` on
request, or `manual` **only when** the human routes the prompt to a
genuinely non-Claude model and records which one). This is the
cross-model invariant, applied to the loop's terminating verdict.

- ❌ "the paper is good" / "submission-ready"
- ❌ "the proof is valid" / "the gap is closed"
- ❌ "the claim is supported by the results"
- ❌ "the idea is novel"
- ❌ "the review is satisfied" / "the weaknesses are addressed"
- ❌ "score >= 6" — when *Claude* assigned the score
- ❌ "this config is good enough to publish" / "the result is strong"
- ❌ "the rebuttal answers the reviewer"
- ❌ "the fix is correct" (as opposed to "the fix made the test pass" — that's Type-A)

A Type-B gate, left to the executor, is the loop quietly grading its own
homework every round and stopping the moment it likes the grade. The
fact that it ran a hundred iterations does not launder the verdict: a
hundred rounds of Claude-judging-Claude is still one model family.
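
One way to keep the executor out of a Type-B decision is to have the loop read only the reviewer's saved verdict and fail closed on anything else. The sketch below assumes a hypothetical JSON schema (`reviewer_family`, `score`, `verdict`) and a hypothetical accept vocabulary; neither is taken from an ARIS skill. The threshold of 6 mirrors the `/auto-review-loop` condition.

```python
import json
from pathlib import Path

EXECUTOR_FAMILY = "claude"    # assumption: the family running the loop body
ACCEPT_VERDICTS = {"ready"}   # hypothetical closed vocabulary


def external_accept(verdict_path: str, min_score: float = 6.0) -> bool:
    """Return True only if a different family's saved verdict accepts."""
    path = Path(verdict_path)
    if not path.is_file():
        return False  # no artifact, no acquittal
    try:
        record = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False  # unreadable verdicts fail closed
    if not isinstance(record, dict):
        return False
    family = str(record.get("reviewer_family", "")).lower()
    if not family or family == EXECUTOR_FAMILY:
        return False  # missing or same-family verdicts never acquit
    score = record.get("score")
    verdict = str(record.get("verdict", "")).strip().lower()
    # Exact match, so a verdict of "not ready" cannot pass as "ready".
    return (
        isinstance(score, (int, float))
        and score >= min_score
        and verdict in ACCEPT_VERDICTS
    )
```

The function compares fields and nothing more. A script could run it, so the loop's own step stays bookkeeping while the merit judgment stays with whoever wrote the file.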

### The dividing question

> *Could a dumb script with no taste answer this gate?*
>
> **Yes → Type-A** (Claude may self-judge — it's bookkeeping).
> **No, it needs taste / correctness / domain judgment → Type-B** (route to a different model family).

"The PDF compiled" needs no taste — Type-A. "The PDF is a good paper"
is nothing *but* taste — Type-B. "The job exited 0" — Type-A. "The job's
output is the right answer" — Type-B.

## Compound gates: split, don't average

Many natural-language stop conditions secretly bundle an A-part and a
B-part. `/auto-review-loop`'s real condition is *"score >= 6 AND verdict
contains 'ready'"* evaluated each round — but the **score and the
verdict both come from the cross-model reviewer**, so the A-part Claude
owns is only "did round N's reviewer return?" and "is round < MAX_ROUNDS?".

When you meet a compound gate, decompose it:

```
STOP when "the paper is submission-ready"
  ├─ A: all 3 audits were invoked and emitted JSON   → Claude self-checks
  ├─ A: verify_paper_audits.sh exit code == 0         → external process, Claude reads it
  └─ B: "the paper is actually good enough to submit" → cross-model verdict
```

Never collapse a compound gate to its A-part and call the loop safe. The
B-part doesn't disappear because it's inconvenient; it gets *routed*.
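
The decomposition can also be written down as data. The sketch below is one possible encoding, with hypothetical type and function names; it assumes each part records who judged it, and it treats an accept gate with no Type-B part as having been collapsed to its A-part.

```python
from dataclasses import dataclass
from enum import Enum


class GateType(Enum):
    A = "execution"   # machine-checkable fact
    B = "acceptance"  # merit verdict, cross-model only


@dataclass(frozen=True)
class GatePart:
    name: str
    gate_type: GateType
    passed: bool
    judged_by: str  # e.g. "script", "claude", "codex"


def accept_gate_passes(parts: list[GatePart], executor_family: str = "claude") -> bool:
    """All parts must pass, and the merit part must come from another family."""
    b_parts = [p for p in parts if p.gate_type is GateType.B]
    if not b_parts:
        return False  # collapsed to its A-part: nothing acquits the work
    if any(p.judged_by == executor_family for p in b_parts):
        return False  # same-family verdict on a merit question
    return all(p.passed for p in parts)
```

Passing only the two A-parts of the "submission-ready" example returns `False`, which is the "never collapse" rule expressed as a check.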

## Decision procedure (for any new autonomous loop)

When you author or review a "keep working until X" skill:

1. **Enumerate every stop/accept gate.** Not just the headline one —
   the early-exit on convergence, the PATIENCE bail-out, the
   per-iteration "is this round done?" check, the final "are we
   finished?" check. Write them down.

2. **Classify each gate A or B** using the dividing question. If it's
   compound, split it (above) and classify the parts.

3. **For every Type-A gate:** Claude may self-judge. Prefer an
   *external* check where one exists (read an exit code, stat a file,
   read a counter) over an LLM "I believe it finished" — Type-A is
   exactly the place where a cheap deterministic check beats a vibe.

4. **For every Type-B gate:** route it to a cross-model verdict per
   `reviewer-routing.md` (default `mcp__codex__codex` at
   `reasoning_effort: xhigh`; `oracle-pro` on request; `manual` only if
   the routed model is verifiably non-Claude and recorded — otherwise it
   is same-family self-acquittal in disguise). Pass file paths, not
   summaries (`reviewer-independence.md`).
   The loop **continues or stops on the reviewer's verdict as returned**,
   not on Claude's paraphrase or re-judgment of it. Save the verdict as
   an artifact (`integration-contract.md` §3) so a third party can
   confirm the acquittal was external.

5. **State the provenance in the SKILL.** One line: "STOP gate = Type-B,
   routed to codex." A reviewer of the SKILL should be able to find,
   for each terminating condition, which model family signs off.

6. **Refuse the anti-pattern:** a loop whose continue/stop decision reads
   an LLM-produced quality verdict that the **same** model family
   (Claude) produced. That is self-acquittal regardless of how the
   prompt is phrased.

Rule of thumb: **if removing the cross-model reviewer would still let
the loop stop with an accept, the loop is self-acquitting.** (Stopping on
an exhausted budget without the reviewer is fine; that is a Type-A stop,
not an acquittal.) A safe Type-B loop is *designed* (by this contract) so
that removing the external family's verdict leaves it unable to
terminate-accept. That is a design rule the skill author enforces, not an
automatic structural property.
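
The rule of thumb can be seen in a loop skeleton. In this minimal sketch, `drive` and `review` are hypothetical callables supplied by the skill: `drive` does the in-family work for one round, and `review` returns the external reviewer's accept decision, or `None` when no external verdict is available.

```python
from typing import Callable, Optional


def run_loop(
    drive: Callable[[int], None],
    review: Callable[[int], Optional[bool]],
    max_rounds: int,
) -> str:
    """Drive each round in-family; accept only on an explicit external verdict."""
    for round_index in range(1, max_rounds + 1):
        drive(round_index)              # in-family work: edit, rerun, recompile
        accepted = review(round_index)  # external verdict, or None if absent
        if accepted is True:
            return "accepted"
    return "budget_exhausted"           # Type-A stop: a counter hit its bound
```

With the reviewer removed (`review` always returning `None`), the only reachable outcome is `"budget_exhausted"`. The loop can still stop, but it can no longer accept.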

## ARIS loops mapped to the taxonomy

The codebase **already** follows this rule. This section makes the
implicit pattern explicit and operational for the next loop someone
writes.

| Loop | Headline stop gate | Type | Who acquits | Status |
|---|---|---|---|---|
| `/dse-loop` | objective metric converged / TIMEOUT / PATIENCE | A | benchmark harness emits the number; Claude reads & compares to budget | ✅ safe same-model |
| `/experiment-bridge` auto-debug | "did it run / did it converge" (exit 0, no NaN, training started) | A | exit codes, log parse | ✅ safe same-model |
| `/run-experiment`, `/experiment-queue` retry | job finished / OOM-retry exhausted / N jobs done | A | scheduler + exit codes | ✅ safe same-model |
| `/auto-review-loop` | score >= 6 AND verdict "ready", per round | B | **codex** assigns score & verdict | ✅ already cross-model |
| `/auto-paper-improvement-loop` | "review satisfied" (2 rounds) | B | **codex (GPT xhigh)** review | ✅ already cross-model |
| `/result-to-claim` | `claim_supported ∈ {yes,partial,no}` + `integrity_status` | B | **codex** judges results vs claims | ✅ cross-model |
| `/kill-argument` | rejection memo → defense, residual issues | B | two fresh **codex** threads | ✅ cross-model |
| `/proof-checker` | each gap closed, per round | B | **codex** re-reviews each round | ✅ cross-model |
| `/experiment-audit` | integrity verdict (fake GT, normalization fraud) | B | **codex** audits the eval code | ✅ cross-model |
| `/paper-claim-audit` | every number matches result files | B | fresh zero-context **cross-model** reviewer | ✅ cross-model |
| `/citation-audit` | every entry real & in-context | B | fresh **cross-model** reviewer | ✅ cross-model |
| `/paper-writing` Phase 6 (submission) | `verify_paper_audits.sh` exit 0 | A (gate) **wrapping** B (the audits) | external verifier reads cross-model JSON | ✅ A-gate over B-verdicts |

> 📌 The `/auto-review-loop` row reflects the skill's stop logic: `score >= 6`
> AND verdict contains "ready"/"almost", evaluated each round. (Its `Constants`
> block previously stated this with `OR` and a stale verdict vocabulary — an
> internal inconsistency now reconciled to the `AND` form the Phase-E stop
> check actually uses, in `auto-review-loop` and its `-llm`/`-minimax`
> siblings.) The acquittal is **codex's** score+verdict, so the Type-B
> classification is unchanged.

Two patterns to notice:

- **The execution loops (dse, auto-debug, queue) are Type-A all the way
  down**: "did it run / did it converge" is a fact a harness reports.
  They are *correctly* allowed to self-judge their stop, because there is
  nothing of merit being judged: a converged number from a real simulator
  is an observation, not an opinion. (The moment someone adds *"...and the
  result is good enough to claim"* to a dse stop condition, that clause
  is Type-B and must route out; see the dse caveat below.)

- **Every quality/correctness loop already routes its acquittal to
  codex.** Nothing here is new behavior; the doc names the rule the
  codebase converged on so the next author doesn't have to rediscover it
  by getting reviewed.

### The dse-loop caveat (objective ≠ acceptance)

`/dse-loop` optimizes a metric the benchmark *itself* produces (cycles,
area, coverage). "Config B beats config A on the harness's own number"
is Type-A: a parser, not Claude, owns it. But two adjacent judgments are
Type-B and must NOT be folded into the loop's self-judged stop:

- "this config is **good enough to ship/publish**" — sufficiency verdict.
- "the benchmark/metric **is the right thing to optimize** / the result
  **generalizes**" — correctness-of-framing verdict.

So dse may self-terminate on *"best config found within budget"* (A), but
the claim *"and this is a publishable result"* leaves the loop and goes
through `/result-to-claim` (B). Driving the search is in-family; acquitting
the science is not.
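
A sketch of what "best config found within budget" looks like as a pure Type-A stop. It assumes a lower-is-better metric such as cycles, and a hypothetical `measure` callable that returns the harness's own parsed number. It is not the `/dse-loop` implementation.

```python
from typing import Callable, Iterable, Optional


def search(
    candidates: Iterable[str],
    measure: Callable[[str], float],
    patience: int,
) -> tuple[Optional[str], Optional[float], str]:
    """Return (best config, its metric, stop reason); lower metric is better."""
    best_config: Optional[str] = None
    best_value: Optional[float] = None
    stale = 0  # consecutive candidates that did not improve on the best
    for config in candidates:
        value = measure(config)  # the harness's number, not an opinion
        if best_value is None or value < best_value:
            best_config, best_value, stale = config, value, 0
        else:
            stale += 1
            if stale >= patience:
                return best_config, best_value, "patience"
    return best_config, best_value, "candidates_exhausted"
```

The return value names a best config and why the search stopped. Nothing in it says whether that config is publishable, which is why that claim has to leave the loop.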

## Tie to fan-out: breadth is same-family; the jury is not

`fan-out-pattern.md` describes skill-layer fan-out — spawning multiple
agents for breadth (parallel search branches, per-section drafting,
per-entry citation checks). Fan-out interacts with this contract in
exactly one dangerous way:

**Same-family breadth is fine for Type-A coverage. It is NEVER a Type-B
jury.**

- ✅ Ten Claude branches each *attempting* a different search query, then
  unioning hits: Type-A coverage (did we look broadly?). Self-judging is
  fine.
- ✅ N Claudes each drafting a section, followed by a Type-A "all
  sections drafted" completion check.
- ❌ N Claude reviewers each scoring the paper, then taking the
  **majority/average as the accept verdict.** This *feels* like a jury
  — independent voters! — but it is correlated same-family blindness
  wearing a jury costume. N agreeing Claudes share the same training
  priors and the same blind spots; their agreement is evidence of
  shared bias, not of correctness. A Type-B verdict needs a **different
  family**, not a *bigger N of the same one*.

> **Known failure mode:** "We ran the review 5× and all 5 said accept,
> so it's robust." Five draws from one distribution are one opinion with
> error bars, not five opinions. The cross-model invariant is about
> *family diversity*, not *sample count*. Fan-out scales breadth and
> Type-A coverage; it can never substitute for the one cross-family
> acquittal a Type-B gate requires.

Fan-out and this contract compose cleanly: fan-out (same family) does
the broad *driving*; the loop always funnels into the same single
cross-model *acquittal* at the Type-B gate. Breadth degrades gracefully
across runtimes (fewer parallel agents = slower, not unsafe); the
acquittal does not degrade: it is always the cross-family verdict, or
the loop is unsafe.
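
The jury rule reduces to a small check: votes from the executor's own family are ignored for acceptance, however many there are. This is a minimal sketch with a hypothetical vote format of `(family, accept)` pairs; requiring every external vote to accept is a design choice of the sketch, not something this contract specifies.

```python
def type_b_accept(
    votes: list[tuple[str, bool]],
    executor_family: str = "claude",
) -> bool:
    """Accept only on external-family votes; same-family votes carry no weight."""
    external = [
        accept for family, accept in votes
        if family.lower() != executor_family
    ]
    # No external vote means no acquittal, regardless of same-family agreement.
    return bool(external) and all(external)
```

Five same-family accepts yield `False`; adding one accepting vote from a different family yields `True`. Sample count never enters the decision.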

## Required components (for a loop to claim same-family-safe)

A loop is same-family-safe iff **all** hold:

1. **Every stop/accept gate is classified** A or B in the SKILL (compound
   gates split).
2. **Every Type-B gate routes to a cross-model verdict** per
   `reviewer-routing.md`; the loop's continue/stop reads *that* verdict,
   not a Claude re-judgment of it.
3. **The cross-model verdict is an artifact** (`integration-contract.md`
   §3) — a JSON/file a third party can inspect to confirm the acquittal
   was external.
4. **No same-family majority is treated as a Type-B jury** — fan-out
   breadth never substitutes for cross-family acquittal.
5. **Type-A self-judgment prefers an external check** (exit code, stat,
   counter) over an LLM "I think it's done" wherever one exists.

If any fails, the loop can self-acquit and is **not** same-family-safe —
regardless of how many rounds it runs or how confident it sounds.
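
Components 1 to 3 lend themselves to a mechanical lint over a loop's declared gates. The sketch below assumes a hypothetical `GateDecl` record that a SKILL author fills in per gate; it does not cover component 5, which is a preference rather than a pass/fail property, and it sees component 4 only through the `judged_by` field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateDecl:
    name: str
    gate_type: Optional[str]        # "A", "B", or None if never classified
    judged_by: str                  # "script", "claude", "codex", ...
    artifact: Optional[str] = None  # path of the saved verdict, if any


def violations(gates: list[GateDecl], executor_family: str = "claude") -> list[str]:
    """List the ways a declared set of gates fails components 1 to 3."""
    problems = []
    for g in gates:
        if g.gate_type not in ("A", "B"):
            problems.append(f"{g.name}: not classified A or B")
        elif g.gate_type == "B":
            if g.judged_by == executor_family:
                problems.append(f"{g.name}: Type-B gate judged by {executor_family}")
            if not g.artifact:
                problems.append(f"{g.name}: Type-B verdict leaves no artifact")
    return problems
```

An empty list is necessary but not sufficient: the lint trusts the declaration, so a reviewer still has to confirm the SKILL actually behaves as declared.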

## Anti-patterns to refuse in review

- **"The loop decides when it's good enough."** Good-enough is Type-B;
  the loop may decide when it's *done running*, not when it's *good*.
- **"We re-review until it passes."** Fine — but *who* says it passed? If
  the answer is Claude, the loop is self-acquitting.
- **"N agreeing agents = consensus."** Same-family agreement is correlated
  blindness, not a jury (see fan-out section).
- **"It converged, so it's correct."** Convergence is Type-A
  (it stopped moving); correctness/sufficiency is Type-B.
- **"Score >= 6, so stop."** Only safe if a *different family* assigned
  the score. Claude scoring Claude and stopping at 6 is self-acquittal.
- **`/loop` wrapping an internal semantic loop.** External cadence
  (`/loop`) is additive only for external-world waits (GPU done?
  overnight heartbeat?). Wrapping ARIS's internal semantic loops with a
  timer breaks `threadId` continuity and re-runs Type-B verdicts on a
  clock instead of on the reviewer's turn — noise at best, a corrupted
  acquittal at worst. Keep external cadence outside the acceptance gate.

## See Also

- `reviewer-independence.md` — the single-shot form: executor never
  filters the reviewer's inputs. Type-B gates inherit this in full.
- `experiment-integrity.md` — the experiment form: the model that writes
  experiment code must not judge its integrity. `/experiment-audit`'s
  Type-B verdict is the loop instance of this rule.
- `reviewer-routing.md` — where Type-B gates send their verdict (codex
  default, oracle-pro on request, manual only with a verified non-Claude
  target).
- `fan-out-pattern.md` — breadth via same-family spawn; this doc's
  fan-out section is the guardrail that keeps breadth out of the jury box.
- `integration-contract.md` §3 — the cross-model verdict must leave an
  inspectable artifact.