# ADR-058: Runner Pool Segmentation and Verification Capacity Governance

| Field | Value |
|-------|-------|
| **Decision ID** | ADR-058 |
| **Initiative** | unsorry CI scalability and fork-safe verification |
| **Proposed By** | unsorry maintainers |
| **Date** | 2026-06-16 |
| **Status** | Proposed |

## Context

ADR-049 makes unsorry safe for untrusted contributors by keeping the
load-bearing soundness verdict on a project-controlled central re-check.
Issue #1206 exposes the next constraint: non-contributor fork support and
larger agent fleets can make verification capacity the bottleneck even when
soundness itself is already protected.

The current Gate A workflow already has a useful split:

- cheap detection and aggregation jobs run on GitHub-hosted `ubuntu-latest`,
- trusted Lean verification jobs run on namespace.so profiles,
- prepare/build work routes to `namespace-profile-unsorry-prepare`,
- axiom audit work routes to `namespace-profile-unsorry-audit`,
- kernel replay work routes to `namespace-profile-unsorry-replay`.

That split should become an explicit policy. GitHub-hosted runners are useful
as cheap elastic intake capacity, but they give less control and operational
visibility than the namespace lane. Namespace runners provide the better
surface for trusted Lean verification because the project controls the profile,
cache volume, sizing, and logs around the verifier boundary.

As of this decision, the operating model is:

- `namespace-profile-unsorry-prepare`: small prepare/build profile for cache warming, goal builds,
  statement bindings, and archive package validation.
- `namespace-profile-unsorry-audit`: axiom-audit profile sized for one resident mathlib image.
- `namespace-profile-unsorry-replay`: replay profile sized for leanchecker and full-replay
  headroom.
- GitHub-hosted runners: cheap actions, protocol checks, docs checks, PR
  intake, labels, generated-board refreshes, and final aggregation.

## WH(Y) Decision Statement

**In the context of** unsorry scaling from trusted same-repo agents toward fork
contributors, volunteer agents, and more parallel PRs,

**facing** the fact that a single PR can consume several minutes of runner time,
and that letting fork, agent, generated-artifact, and trusted verifier jobs
compete in one capacity pool would create queue starvation and CI
denial-of-service pressure,

**we decided for** explicit **runner pool segmentation and verification
capacity governance**: GitHub-hosted runners are the cheap/intake lane for
low-trust, low-cost, and non-Lean work; namespace.so runners are the trusted
verification lane for Gate A and any future central re-check that can admit
content to the verified library; `namespace-profile-unsorry-prepare`, `namespace-profile-unsorry-audit`, and
`namespace-profile-unsorry-replay` split the trusted verifier work by job role so the operator can
scale prepare/build, axiom audit, and kernel replay capacity independently,

**and neglected** a single shared runner pool (rejected because noisy agent and
fork work can starve merge-blocking verification); GitHub-hosted runners as the
only verifier surface or as a direct replacement for the role-specific
namespace verifier lanes (rejected because the trusted Lean lane needs stronger
profile control, cache-volume control, and visibility; GitHub-hosted
concurrency is useful enough to pilot, but not useful enough to justify making
the protected verifier lane opaque by default); namespace runners for every
cheap job (rejected because that wastes paid verifier capacity); contributor
self-hosted runners as a merge-blocking verifier (rejected by ADR-049); and
scaling first by making all runners larger (rejected because capacity isolation
and queue policy are the first-order problem),

**to achieve** a CI architecture where cheap checks stay cheap, trusted
verification stays protected, fork support can be opened without granting
unbounded access to paid verifier minutes, and operator-visible namespace
capacity is reserved for the work that actually carries soundness,

**accepting that** GitHub-hosted runner concurrency and queue visibility are
plan-dependent, that namespace capacity costs more per trusted minute, that
some comments/specs must be kept in sync with operator-side profile sizing, and
that future fork automation must add identity/quota controls before it can
freely spend namespace verifier capacity.

## Runner Classes

| Class | Runner surface | Trust / cost role | Examples |
|-------|----------------|-------------------|----------|
| Cheap intake | GitHub-hosted `ubuntu-latest` | Low-cost checks before verifier spend | path filters, PR labels, ADR/spec lint, protocol checks |
| Required aggregator | GitHub-hosted `ubuntu-latest` | Stable required context wrapper | final `gate-a`, `gate-b` aggregation |
| Prepare verifier | `namespace-profile-unsorry-prepare` | Trusted build/cache preparation | goal builds, statement bindings, archive package validation |
| Audit verifier | `namespace-profile-unsorry-audit` | Trusted axiom-audit verification | serial `axiom_audit` over changed scope |
| Replay verifier | `namespace-profile-unsorry-replay` | Trusted kernel replay | incremental replay and forced full replay |
| Scheduled backstop | namespace profile selected by verifier policy | Defense-in-depth verification | daily full replay with a small replay chunk |
| Generated artifacts | GitHub-hosted unless verifier evidence is required | Interruptible maintenance | leaderboard, targets board, visualization refresh |
| Agent exploration | contributor/local or separate agent pool | Noisy advisory work | local proving, retries, candidate generation |

## Capacity Rules

- Cheap GitHub-hosted checks should run before namespace verifier jobs wherever
  possible.
- Fork PRs and unknown agents should not directly spend namespace verifier
  minutes without intake checks, maintainer approval, or future ADR-054 quota
  policy.
- Required merge-blocking verifier jobs must have their own namespace lane and
  must not be starved by generated artifacts or agent exploration.
- A workflow that can admit content to `UnsorryLibrary` must use the trusted
  verifier lane defined by ADR-049.
- Gate A jobs should target their role-specific profiles: prepare/archive on
  `namespace-profile-unsorry-prepare`, audit on `namespace-profile-unsorry-audit`, and replay on `namespace-profile-unsorry-replay`.
- Superseded runs should be cancelled by concurrency groups so stale commits do
  not occupy trusted verifier capacity.
- Runner sizing is an operator-controlled capacity property; the repository
  records the current intended size and routing contract, but correctness must
  not depend on a hidden profile size.
- Switching routine Gate A from the namespace verifier lanes to GitHub-hosted
  runners requires a shadow benchmark or pilot PR first. The pilot must compare
  wall time, cache restore behavior, failure modes, queue wait, and verifier log
  quality against the namespace lane before any required check is moved.
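
Two of the rules above lend themselves to a short sketch: the admission rule for fork PRs and unknown agents, and the cancellation of superseded runs. The code below is one possible reading under stated assumptions, not the project's implementation. `PullRequest`, `Run`, their fields, and the group naming are hypothetical, and `quota_granted` is a placeholder for whatever ADR-054 eventually defines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    """Hypothetical view of the facts the admission rule needs."""
    from_fork: bool
    author_is_known: bool
    intake_checks_passed: bool
    maintainer_approved: bool = False
    quota_granted: bool = False  # placeholder for a future ADR-054 policy


def may_spend_verifier_minutes(pr: PullRequest) -> bool:
    """Model only the fork / unknown-agent rule from the list above."""
    low_trust = pr.from_fork or not pr.author_is_known
    if not low_trust:
        return True
    # Low-trust work needs at least one of the three governors.
    return pr.intake_checks_passed or pr.maintainer_approved or pr.quota_granted


@dataclass(frozen=True)
class Run:
    """Hypothetical workflow run keyed by a concurrency group."""
    group: str     # e.g. "gate-a-pr-123" (assumed naming)
    sequence: int  # higher means a newer commit in the same group


def superseded_runs(runs: list[Run]) -> list[Run]:
    """Return every run that has a newer run in the same group."""
    newest: dict[str, Run] = {}
    for run in runs:
        current = newest.get(run.group)
        if current is None or run.sequence > current.sequence:
            newest[run.group] = run
    return [run for run in runs if newest[run.group] is not run]
```

In a real workflow the cancellation half is normally delegated to the CI platform's concurrency groups; the function only makes the intended outcome explicit, namely that at most one run per group keeps trusted verifier capacity.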

## Live Operations Transition

This decision must not assume an empty queue. The repository already
has open proof PRs, queued GitHub-hosted checks, queued namespace Gate A jobs,
and in-progress verifier runs. Runner segmentation must therefore be adopted as
a live operations change.

The transition rules are:

- Do not rename required contexts. `gate-a` and `gate-b` remain the branch
  protection contexts during the transition.
- Do not rename namespace profiles during the transition. `namespace-profile-unsorry-prepare`,
  `namespace-profile-unsorry-audit`, and `namespace-profile-unsorry-replay` remain stable routing labels.
- Do not cancel all existing PR runs as part of the cutover. Superseded runs may
  be cancelled by existing concurrency groups, but active PRs should either
  finish on their current workflow revision or be explicitly rebased/rerun.
- Existing green PRs may merge under the previous workflow revision if they
  already passed the protected checks, because the verifier contract has not
  weakened.
- Existing queued or failed PRs may be re-run under their current revision, or
  rebased onto the new routing docs/policy when a maintainer wants the new
  metadata and comments to apply.
- Keep all three namespace profiles available until the open PR queue has
  drained through at least one normal Gate A cycle and one scheduled backstop
  has completed.
- If queue depth is already high, pause new autonomous proof PR creation before
  changing runner capacity.
- Record the cutover as an operator event: timestamp, open PR count, queued
  Gate A count, namespace profile sizes, and any manual cancellations.

The intended migration is additive: the PR documents and clarifies the routing
contract without changing required-check names or the soundness boundary. Future
workflow rewrites that materially change routing must carry their own live
operations plan.

## Consequences

- **Positive.** GitHub-hosted runners absorb cheap PR and docs traffic without
  spending namespace verifier minutes.
- **Positive.** Namespace runners remain reserved for trusted verifier work,
  where cache volumes, profile sizing, and log visibility matter.
- **Positive.** Fork support has a clear governor: fork work can be cheap until
  it earns or receives access to the trusted verifier lane.
- **Positive.** The existing Gate A split becomes an explicit platform contract
  instead of an implicit workflow detail.
- **Negative.** Operator profile changes must be reflected in docs/specs or the
  repository will mislead maintainers.
- **Negative.** GitHub-hosted runner concurrency is useful but plan-dependent,
  so queue guarantees cannot rely on an undocumented fixed number.
- **Negative.** More lanes mean more policy surface: labels, path filters,
  concurrency groups, and quotas must remain understandable.

## References

| Reference ID | Title | Type | Location |
|--------------|-------|------|----------|
| REF-1 | Runner pool segmentation spec | Specification | specs/SPEC-058-A-Runner-Pool-Segmentation-And-Verification-Capacity.md |
| REF-2 | Decentralised CI Runner Architecture | Decision | ADR-049-Decentralised-CI-Runner-Architecture.md |
| REF-3 | Verify-on-Ingest | Decision | ADR-048-Verify-On-Ingest.md |
| REF-4 | Gate A Workflow | Specification | specs/SPEC-006-B-Gate-A-Workflow.md |
| REF-5 | Volunteer-Scale Claim Substrate | Decision | ADR-053-Volunteer-Scale-Claim-Substrate.md |
| REF-6 | Agent Identity, Quotas, and Reputation | Decision | ADR-054-Agent-Identity-Quotas-And-Reputation.md |
| REF-7 | Non-contributor proof submission via forks | Issue | GitHub issue #1206 |

## Status History

| Status | Approver | Date |
|--------|----------|------|
| Proposed | unsorry maintainers | 2026-06-16 |