# Fan-Out Pattern When a skill needs **breadth** — many candidate ideas, many sources, many attack angles, many proof obligations, many draft sections — it may fan the generation step out across same-family subagents. This document is the canonical convention for doing that **without** weakening the cross-model jury that the entire ARIS design rests on. Rule of thumb: **Fan-out is 火力 (firepower); the jury is 裁判席 (the bench). Subagents GENERATE candidates; they NEVER score them.** Fan-out multiplies how much breadth you can cover per unit time. It does not, and must not, change *who renders the verdict*. The verdict stays a single, heterogeneous, cross-model step — identical whether you fanned out across 8 parallel workers or ran one shard at a time on a slow night. ## Core principle: decouple FAN-OUT from JURY These are two different operations and they are governed by two different rules: | | FAN-OUT (breadth) | JURY (verdict) | |---|---|---| | What it does | Generates N candidate items | Renders the STOP/ACCEPT decision | | Who runs it | Same-family subagents (Claude clones, or codex shards) | A **different** model family (`reviewer-routing.md`) | | Allowed to judge quality? | **No.** Generate only. | **Yes.** That is its only job. | | Failure if violated | None (it's just more candidates) | Invariant breach: model judges its own family's output | | Analogy | 火力 — fire more shots | 裁判席 — the bench that rules | The decoupling is the whole point. A subagent that both generates a candidate *and* decides whether it is good has collapsed the two operations and re-introduced exactly the correlated blind spot that heterogeneous review exists to remove. A Claude subagent generating an idea, then a Claude orchestrator declaring that idea "novel" or "publishable," is a Claude judging Claude — the invariant is dead, no matter how many subagents were involved. So the contract on every shard is narrow and absolute: - ✅ A shard MAY: enumerate, draft, propose, retrieve, hypothesize, decompose, attack — i.e., emit candidate items. - ❌ A shard MUST NOT: rank candidates against each other, declare one "best," assert novelty/soundness/publishability, decide the loop is done, or otherwise render the acceptance verdict. Mechanical operations on the merged candidate set (deduplication, clustering, schema validation, sorting by a declared field) are **not** judgment and are explicitly allowed on the executor — see § Structured-output contract. ## The 3-tier degradation ladder Fan-out is a **skill-prompt pattern, not a harness capability.** ARIS already fans out today on runtimes that have no parallel-orchestration primitive at all (`/kill-argument` runs two sequential fresh codex threads with **no Agent tool**; `/citation-audit` verifies per-entry; `/proof-checker` re-derives per-round). A richer runtime (ultracode / Workflow true parallelism) merely *accelerates* the same pattern. Therefore fan-out must degrade gracefully across runtimes. The three tiers below differ **only** in how the candidate-generation step is dispatched. They terminate in the **identical** cross-model jury step. | Tier | Dispatch mechanism | When available | |---|---|---| | **Tier 1** | ultracode / Workflow true parallel — N shards run concurrently with dynamic orchestration | Runtime exposes a parallel-spawn primitive | | **Tier 2** | Plain `Agent`-tool spawn — N subagents launched, no dynamic orchestration (static fan, collect, merge) | Host has the `Agent` tool but no Workflow engine | | **Tier 3** | Sequential fallback — the same N shards run one-by-one, each in a **fresh context** (context reset between shards) | Any runtime, including codex CLI / bare Claude Code with no Agent tool | ``` ┌─────────────────────────────────────────┐ Tier 1 ──┐ │ │ Tier 2 ──┼──► │ merged union → mechanical dedup (SAFE) │ ──► CROSS-MODEL JURY Tier 3 ──┘ │ (executor-side, NOT judgment) │ (identical step) └─────────────────────────────────────────┘ (dispatch differs) (same) (same — invariant) ``` **The jury invariant is strictly orthogonal to whether subagents exist.** Tier 3 with zero subagents (one fresh-context pass per shard, in series) must produce a verdict from the *same* cross-model jury as Tier 1 with eight parallel workers. If a skill cannot run Tier 1, it drops to Tier 2; if it cannot run Tier 2, it drops to Tier 3. It never drops the jury. Degrading the dispatch is free; degrading the verdict is a breach. Known failure mode: a skill author "optimizes" Tier 3 by letting the single sequential pass *also* pick the winner, because there is no orchestrator to do it. That is self-acquittal smuggled in through the fallback path. Tier 3 still ends at the cross-model jury; the sequential pass only generates. ## Structured-output contract for shards Every shard returns a **structured result set**, not prose, so the merge + dedup + jury steps can operate mechanically. There are two envelope shapes, chosen by what the shard does — but they share one invariant: `shard_id` + a keyed list + a `dedup_key` per item. **Generation fan-out** — the shard *produces* new candidates (idea lenses, attack axes, draft variants). Returns `candidates[]`: ```json { "shard_id": "lens:scaling-regime", "candidates": [ { "kind": "idea | attack | draft_section", "payload": "", "provenance": "", "dedup_key": "" } ] } ``` **Extraction fan-out** — the shard *reads* a fixed input set and reports the units it finds (papers in a verified set, obligations in a proof). Returns `entries[]` with the same per-item keys, except `dedup_key` is the unit's **pre-existing canonical id** (assigned upstream), not a freshly normalized string: ```json { "shard_id": "section:4.2", "entries": [ { "kind": "source | proof_obligation", "payload": "", "dedup_key": "" } ] } ``` The `dedup_key` is what makes mechanical clustering possible without judgment: for generation, normalize titles / claim-stems / obligation-statements to a canonical string and cluster on string match / near-match; for extraction, the canonical id already identifies the unit. No model decides "are these the same?" by *taste* — the key decides by *normalization rule*. Domain-specific fields (an idea's hypothesis, a paper's method) may be inlined alongside these keys rather than buried in an opaque `payload`. ### Dedup discipline Deduplication runs on the merged union, **on the executor (Claude), BEFORE the jury**, and is **SAFE** because it is mechanical, not judgment: - ✅ Cluster candidates by `dedup_key` (exact + near-match on a declared metric). - ✅ Drop exact duplicates; collapse near-duplicates into one representative + a count. - ✅ Sort/limit by a *declared field* (e.g. keep top-K by retrieval score the source already returned). - ❌ Drop a candidate because the executor *thinks* it's weak — that is quality judgment and belongs to the jury. - ❌ Re-rank candidates by the executor's own quality opinion before the jury sees them — that pre-filters the jury's input with same-family judgment. Required ordering: **dedup BEFORE jury, on the merged union.** This is not just hygiene — it is a cost-control invariant. The jury backend (codex GPT-5.5 / Gemini / oracle-pro) is the rate-limited, token-expensive resource. Sending it 40 candidates of which 25 are near-duplicate is a waste of the scarce cross-model budget and invites rate-limit failure mid-verdict. Mechanical dedup on the cheap same-family side, first, keeps the expensive heterogeneous step lean. ``` fan-out (N shards) → merge union → mechanical dedup (Claude, SAFE) → CROSS-MODEL JURY └ cheap, judgment-free, └ expensive, rate-limited, shrinks the jury's input set sees a deduped set only ``` ## When to fan out — and when NOT to Fan out when the task is **breadth-bound**: its quality scales with how much of the candidate space you cover, and coverage is the bottleneck. | Fan out (breadth-bound) | Do NOT fan out (value IS the single jury) | |---|---| | Idea generation across lenses | `/novelty-check` — the verdict IS the product | | Literature retrieval across sources | `/research-review` — single heterogeneous critique | | Attack-angle enumeration | `/experiment-audit` — one cross-model integrity ruling | | Proof-obligation extraction | `/peer-review` meta-review — one external verdict | | Draft-section first passes | Any skill whose output *is* the acceptance decision | Known failure mode (the one to refuse in review): fanning out a **judgment** skill across Claude clones. `/novelty-check`, `/research-review`, `/experiment-audit`, and the `/peer-review` meta-review do not have a breadth bottleneck — their entire value is the *single heterogeneous jury verdict*. Spawning eight Claude subagents to each "assess novelty" and then aggregating their opinions does not give you eight independent reviews; it gives you eight **correlated** Claude opinions (same family, same blind spots) dressed up as a panel. Worse, it dilutes the invariant: the aggregate now *looks* like a review but was never adjudicated by a different model family. If a skill's deliverable is a verdict, you may fan out the *evidence-gathering* that feeds the verdict, but the verdict itself stays a single cross-model call. One-liner to apply at review time: **fan out the search for candidates; never fan out the bench.** ## Worked examples (real ARIS skills) ### `/kill-argument` — Tier 3 sequential fan-out, NO Agent tool `/kill-argument` is the canonical proof that fan-out is a prompt pattern, not a harness feature. It runs **two** fresh `mcp__codex__codex` threads in series — Thread 1 writes the strongest 200-word rejection memo; Thread 2 (independent, no `codex-reply`) decomposes that memo into 3-7 atomic rejection points and adjudicates each. There is **no `Agent` tool** in its `allowed-tools`; the "fan" is the decomposition into per-point obligations, run sequentially with context reset between threads. The jury here is cross-model by construction — both threads are GPT-5.5 adjudicating a Claude-executor's paper, and **the skill code computes the final verdict from per-point counts; the codex thread is forbidden from emitting the top-level verdict** (`Verdict is computed by the skill, not by the adjudicator`). Generation (the attack, the per-point classification) fans out; the ACCEPT/FAIL mapping is mechanical and lives in the skill, not the model. ### `/idea-creator` — Tier-1 parallel lens fan-out → dedup → existing cross-model jury `/idea-creator` fans out idea generation across analytic *lenses* (structural gaps: method-in-A-not-B, contradictory findings, untested assumptions, unexplored scaling regimes — Phase 1). On a Tier-1 runtime these lenses run as parallel shards; on Tier 3 they are enumerated in one pass. After fan-out the merged set should be **mechanically deduped only** (cluster near-identical ideas; never drop one for being "weak"). The **jury** is the already-existing Phase-4 cross-model devil's-advocate pass: GPT-5.5 via Codex MCP surfaces the strongest reviewer objection per idea and ranks for a top venue. `/idea-creator` does **not** currently declare the `Agent` tool — it was stripped in the WB2 least-privilege sweep because its body does not yet fan out. The WB3 fan-out refactor re-grants `Agent` in the *same* change that wires these lens shards and fixes the Phase-3 gap below (per the re-grant rule in **Allowed-tools hygiene**). On a Tier-1 runtime the lenses then run as Workflow shards; on Tier 3 they fall back to sequential enumeration with no grant needed. > ⚠️ **Known gap — idea-creator is an *aspirational* example here, not yet a clean one.** > Today `/idea-creator` Phase 3 (`skills/idea-creator/SKILL.md:159,175`) > does same-family *quick novelty check + feasibility gating* and > **eliminates ideas** before the Phase-4 cross-model jury ever sees them. > That is exactly the ❌ "executor pre-filters the jury's input with > same-family quality judgment" this doc forbids above — a Type-B > novelty/quality verdict made same-family (see > [`acceptance-gate.md`](acceptance-gate.md)). The fan-out refactor must > push all novelty/quality elimination INTO (or after) the Phase-4 > cross-model jury; Phase 3 keeps only mechanical dedup + *objective* > feasibility (compute/time budget), and every non-duplicate idea reaches > the jury. Fixing this is part of fanning the skill out, not a separate > chore. ### `/research-lit` — per-source fan-out, deterministic gate as "jury" `/research-lit` fans out retrieval across sources (arXiv, Semantic Scholar, OpenAlex, Exa, DeepXiv, Zotero, web) under integration-contract **Policy D2** (multi-source aggregate: invoke every resolved source, warn-and-continue on per-source failure, proceed if ≥1 contributed). Here the "jury" is **not** an LLM at all — it is the **deterministic** `verify_papers.py` gate (Policy D1: 3-layer arXiv / CrossRef / S2 cross-check), which decides KEEP / `[UNVERIFIED]` by mechanical cross-reference, not by taste. This is the **near-zero-risk** corner of the design space: the candidate generators are same-family (or just API fetchers), but the acceptance gate is a deterministic external verifier, so there is no same-family-self-judgment risk to begin with. When the "jury" is a deterministic check rather than a model verdict, the cross-model-family rule is automatically satisfied (a process is not a model family). Fan out freely. ## Cross-references - **`reviewer-routing.md`** — jury backend selection. The cross-model jury step routes through Codex MCP (`gpt-5.5`, `xhigh`) by default, or Oracle MCP (`gpt-5.5-pro`) under `— reviewer: oracle-pro`. Fan-out tier never changes the jury backend. - **`reviewer-independence.md`** — the jury call receives **file paths only**, in a **fresh thread**, with no executor summary/interpretation. This applies to the post-fan-out jury exactly as to any other review: the deduped candidate set is handed over as artifacts the reviewer reads itself, not as the executor's pre-digested ranking. - **`acceptance-gate.md`** — when self-judgment is allowed. Self-judging EXECUTION-completeness (exit code, files exist, N shards returned, PDF compiled) is SAFE same-model; self-judging QUALITY/CORRECTNESS (idea novel, proof valid, claim supported, review satisfied) MUST be cross-model. A fan-out loop may self-verify *that all N shards ran*; it may not self-verify *that the candidates are good*. The loop can DRIVE; it cannot ACQUIT. - **`integration-contract.md`** — fan-out across sources/helpers uses the §2 resolver chain and the Policy D1/D2 failure policies; the jury step, when load-bearing, needs an artifact + verdict schema like any audit. ## Required components for a fan-out skill A SKILL that fans out must specify all of: 1. **Tier-portable dispatch.** State the Tier-1 parallel form AND the Tier-3 sequential fallback. Never assume `Agent` or Workflow exists. 2. **Per-shard structured output.** Each shard returns a structured object keyed by `shard_id`, never prose. A *generation* fan-out (e.g. idea-creator's lenses) returns `candidates[]`, each item carrying a `dedup_key`. An *extraction* fan-out over a fixed input set (e.g. research-lit per-paper, proof-checker per-section) returns `entries[]`, each item carrying its canonical id as the `dedup_key`. Either shape: `shard_id` + a keyed list + a dedup/identity key per item. 3. **Mechanical dedup before the jury.** On the merged union, on the executor, judgment-free, declared metric — to control jury cost and rate-limit exposure. 4. **A single cross-model jury step** (per `reviewer-routing.md` + `reviewer-independence.md`) — OR a deterministic verifier gate — that is **identical** across all three tiers. 5. **A breadth-bound justification.** State why this task benefits from breadth. If the deliverable IS a verdict, do not fan out the verdict; fan out only the evidence that feeds it. ## Allowed-tools hygiene — the `Agent` grant policy `Agent` in a skill's `allowed-tools` frontmatter is the capability gate for **Tier-2** dispatch (spawning Claude subagents via the Agent tool). It is **granted only to skills whose body actually fans out** — i.e. whose prose instructs the model to spawn parallel Claude subagents. It is **not** boilerplate to be copied across skills. This matters because the other two tiers need no per-skill grant: - **Tier-1** (ultracode / Workflow) is a *harness* capability, not a tool a skill lists. A skill cannot "grant itself" Workflow; the runtime provides it. So fanning out at Tier-1 requires no `Agent` in `allowed-tools`. - **Tier-3** (sequential fallback) spawns nothing — e.g. `/kill-argument` runs its two passes as fresh `mcp__codex__codex` threads, no Agent tool. Correctly, `kill-argument` does **not** grant `Agent`. So `Agent` is needed *only* for the Tier-2 form, *only* in skills that genuinely fan out. As of the WB2 least-privilege sweep, **no mainline skill spawns Claude subagents in its body**, so no mainline skill grants `Agent`. (48 vestigial grants — pure copied boilerplate, never invoked — were removed.) Note that "reviewer **sub-agent**" in several skills refers to the cross-model *codex/GPT reviewer*, not the Agent tool, and never implied a real grant need. **Re-granting rule.** A skill that adds genuine fan-out re-introduces `Agent` to its `allowed-tools` **in the same change that adds the fan-out prose**, and that prose must cite this document (`fan-out-pattern.md`) so the grant is self-justifying. Grant tracks usage; never the reverse. **Enforcement.** `tools/check_skills_inventory.py` fails the drift check if any mainline skill grants `Agent` without citing `fan-out-pattern.md` in its body. This keeps vestigial grants from creeping back and guarantees every real grant is traceable to the convention it follows.