# Assurance Contract

ARIS audits emit machine-readable verdicts. The `assurance` axis decides whether
those verdicts are advisory (draft mode) or load-bearing gates (submission mode).
This contract is referenced by `paper-writing`, `paper-claim-audit`, `citation-audit`,
`proof-checker`, and the external verifier `tools/verify_paper_audits.sh`.

## Why a separate axis from `effort`

Historically `effort` (lite/balanced/max/beast) was conflated with audit strictness.
The result: `effort: beast` did not guarantee mandatory audits ran — phases were
gated by content detectors (e.g. `if \begin{theorem} exists`) and could silently
skip. A user reported `effort: beast` produced a "draft-quality" paper with all
three submission-gate audits skipped.

The fix is to split the concerns:

| Axis | Controls | Default |
|------|----------|---------|
| `effort` | depth/cost (papers, rounds, ideation) | `balanced` |
| `assurance` | audit strictness — silent-skip-allowed vs verdict-required | derived from `effort` (see mapping) |

Override either independently: `— effort: balanced, assurance: submission` is
legal and means "normal depth, but every audit must emit a verdict before
finalization."

## Assurance Levels

### `draft` — current behavior, no breakage
- Audits run only if their content detector matches.
- Silent skip allowed.
- `paper-writing` Phase 6 produces a final report regardless.
- For: rapid iteration, exploratory drafts, early-stage research.

### `submission` — load-bearing audits
- All mandatory audits **must** emit a verdict (one of the 6 below).
- Silent skip is **forbidden**.
- `paper-writing` Phase 6 invokes `tools/verify_paper_audits.sh`; non-zero exit
  blocks Final Report.
- The Final Report tags itself `submission-ready: yes/no` based on verifier output.
- For: conference / journal submission, anything you'd put your name on.

## Default Mapping (derived if `assurance` not given)

| `effort` | implied `assurance` |
|----------|---------------------|
| `lite` | `draft` |
| `balanced` | `draft` |
| `max` | `submission` |
| `beast` | `submission` |

This means a user passing only `— effort: beast` automatically gets full audit
enforcement — matching their intent ("turn everything up"). Users wanting
strict audits at lower depth pass `— assurance: submission` explicitly.

## Verdict State Machine

Every mandatory audit must emit exactly one of these — never silent skip:

| Verdict | Meaning | Audit ran? | Submission-blocking? |
|---------|---------|-----------|----------------------|
| `PASS` | All checks passed | Yes | No |
| `WARN` | Issues found, none disqualifying | Yes | No |
| `FAIL` | Disqualifying issues found | Yes | **Yes** |
| `NOT_APPLICABLE` | Detector negative; nothing to audit (e.g., no theorems in paper, no `\cite`s, no numeric claims) | Audit phase ran, child audit invocation may have been skipped | No |
| `BLOCKED` | Audit should apply but prerequisites are missing or unsupported (e.g., paper has numeric claims but no `results/` directory; paper cites references but `.bib` missing) | Could not complete | **Yes** |
| `ERROR` | Audit invocation failed (network, timeout, malformed reviewer output) | Attempted but errored | **Yes** at submission |

### Why `NOT_APPLICABLE` is not the same as `SKIP`

`NOT_APPLICABLE` means **the audit phase ran**, the detector returned negative,
and a verdict artifact was written documenting "we checked, there's nothing to
verify." This is verifiable from outside the LLM — the artifact file exists.

A silent skip leaves no record. There's no way to distinguish "we checked and
there was nothing" from "we forgot." This contract makes that distinction
mandatory.

### Why `BLOCKED` is more dangerous than `NOT_APPLICABLE`

`BLOCKED` means the audit *should* have run but cannot. Example: a paper claims
`accuracy = 89.2%` but has no `results/` directory to verify against. That's not
"nothing to audit" — that's "we cannot verify a load-bearing claim." Treating
this as `SKIP` masks the danger; `BLOCKED` surfaces it and blocks submission.

## Required Audit Artifact Schema

Every mandatory audit must write a JSON artifact (and may also write a
human-readable Markdown sibling). The JSON must contain at minimum:

```json
{
  "audit_skill": "paper-claim-audit",       // citation-audit, proof-checker, etc.
  "verdict": "PASS",                         // one of the 6 above
  "reason_code": "all_numbers_match",        // skill-specific short string
  "summary": "Verified 23 numeric claims against 4 result files; no mismatches.",
  "audited_input_hashes": {
    "main.tex":                          "sha256:a3f8...",
    "sections/5.evidence.tex":           "sha256:b2d1...",
    "/Users/me/project/results/run_2026_04_19.json": "sha256:c9e4..."
  },
  "trace_path": ".aris/traces/paper-claim-audit/2026-04-21_run01/",
  "thread_id":  "019dae73-fc12-4ab8-...",
  "reviewer_model": "gpt-5.4",
  "reviewer_reasoning": "xhigh",
  "generated_at": "2026-04-21T14:23:01Z",
  "details": {
    // skill-specific structured data
  }
}
```

Field semantics:

- **`audited_input_hashes`** — SHA256 of every file the audit consumed.
  - Keys are **paths relative to the paper directory** (the argument
    passed to `verify_paper_audits.sh`) for files inside it, or
    **absolute paths** for files outside it (e.g. `../results/run.json`
    is legal but `/Users/me/project/results/run.json` is more portable).
    Do NOT prefix in-paper files with `paper/` — the verifier already
    resolves relative to the paper dir and `paper/paper/main.tex` will
    false-fail. The verifier rehashes the current files and flags `STALE`
    if any hash changed since the audit ran. (User edited `main.tex`
    after running `paper-claim-audit`? The next verifier run will catch it.)
- **`trace_path`** — directory containing the full reviewer prompt + response
  pair, per `review-tracing.md`. Required for mandatory audits — not optional.
- **`thread_id`** — Codex MCP thread ID, for forensic traceability.
- **`reviewer_model`** + **`reviewer_reasoning`** — proves cross-family review
  invariant was honored.
- **`generated_at`** — UTC ISO-8601 timestamp.

## Verifier Contract

`tools/verify_paper_audits.sh <paper-dir>` is the single source of truth for
"are mandatory audits complete and current?" It must:

1. Locate the paper-writing manifest (which mandatory audits applied this run).
2. For each, check artifact JSON exists at expected path.
3. Validate artifact JSON against required-fields schema (above).
4. Verify `verdict` is one of the 6 allowed values.
5. Recompute SHA256 of every file in `audited_input_hashes`; flag `STALE` if any
   mismatches.
6. Verify `trace_path` exists and is non-empty.
7. Output a structured JSON report and exit 0 (all green) or 1 (any FAIL /
   BLOCKED / ERROR / STALE / missing artifact).

Phase 6 of `paper-writing` invokes the verifier; at `assurance: submission`,
non-zero exit blocks Final Report generation.

## Subskill Contract: "Always Emit, Never Block"

Child audit skills (`paper-claim-audit`, `citation-audit`, `proof-checker`)
follow this contract:

- **Always emit a verdict artifact**, even on detector-negative or error paths.
- **Never block** the parent's flow themselves — they only emit verdicts.
- **The parent skill** (`paper-writing` Phase 6 + verifier) decides whether a
  given verdict blocks finalization. This decision lives in *one* place
  (`assurance` axis + verifier), not duplicated across child skills.

Earlier wording in `paper-claim-audit` and `citation-audit` (e.g., "audit is
advisory, never blocking") referred to this division of labor — but conflicted
with `paper-writing`'s declaration that they were "mandatory submission gates."
This contract resolves the conflict: child = always emit; parent = decides
blocking based on assurance level.

## Examples

### Theory paper, beast effort
```
— effort: beast    (implies assurance: submission)
```
- `proof-checker` runs, audits theorems → `PASS` or `WARN` or `FAIL`
- `paper-claim-audit` runs, finds numbers → `PASS`
- `citation-audit` runs, audits refs → `PASS`
- Verifier: all green
- Final Report: `submission-ready: yes`

### Position paper (no theorems, no numbers, no experiments), beast effort
```
— effort: beast    (implies assurance: submission)
```
- `proof-checker` invoked → no theorems found → emits `NOT_APPLICABLE`
- `paper-claim-audit` invoked → no numeric claims → emits `NOT_APPLICABLE`
- `citation-audit` invoked → audits refs → `PASS`
- Verifier: all green (NOT_APPLICABLE is not blocking)
- Final Report: `submission-ready: yes` with note "no theorems / no numeric claims to audit"

### Empirical paper missing raw results, beast effort
```
— effort: beast
```
- `proof-checker` → `NOT_APPLICABLE`
- `paper-claim-audit` invoked → finds claims like `accuracy = 89.2%` but
  `results/` is empty → emits `BLOCKED` with reason_code `no_raw_evidence`
- `citation-audit` → `PASS`
- Verifier: exit 1 (BLOCKED is submission-blocking)
- Final Report: **refuses to finalize**; surfaces "Mandatory audit BLOCKED:
  paper-claim-audit cannot verify numeric claims — no raw result files found.
  Add results/ or downgrade to `— assurance: draft`."

### Stale audit (user edited paper after running audits)
- User runs `/paper-writing` at beast → all audits PASS, files written
- User edits `sec/5.evidence.tex` to change a number
- User reruns the verifier (or re-finalizes)
- Verifier rehashes → `audited_input_hashes` mismatch → `STALE` flag → exit 1
- Final Report: refuses; instructs user to rerun `paper-claim-audit` and
  `citation-audit` before re-finalizing.

## Backward Compatibility

- Users on `effort: balanced` (default) get `assurance: draft` — **identical
  current behavior, no breakage**.
- Users explicitly using `effort: max` or `effort: beast` automatically get
  `assurance: submission` — matching their intent.
- Users wanting the old "beast = depth only, no audit enforcement" can pass
  `— effort: beast, assurance: draft` (explicit override). This combination is
  legal but discouraged for actual submissions.

## See Also

- `effort-contract.md` — depth/cost axis (separate concern)
- `review-tracing.md` — trace artifact protocol (referenced by `trace_path`)
- `reviewer-independence.md` — cross-model review invariant
- `tools/verify_paper_audits.sh` — external verifier implementation