# External Cadence

External schedulers — `/loop`, `/schedule`, `CronCreate`, and any
wall-clock "wake me every N minutes" mechanism — decide **WHEN** an
agent wakes up. They do not, and must not, decide **WHO** judges the
work or **WHETHER** a result is accepted.

## Core Principle

**External cadence is pure fire-control. It is never a jury.**

A scheduler picks the firing moment. It points the agent at a task at a
chosen time. It has no opinion on correctness, quality, novelty, or
publishability, and it must never silently re-spawn an agent or drop a
verdict step in order to stay cheap or finish faster.

Rule of thumb: **cadence can DRIVE; it cannot ACQUIT.** This is the
fire-control corollary of the acceptance-gate rule
(`acceptance-gate.md`): a goal/loop may keep an agent going, but the
STOP/ACCEPT decision still belongs to whoever the acceptance gate
assigns it to — for quality/correctness verdicts, that is always a
different model family (`reviewer-independence.md`).

## Known failure mode (why this doc exists)

External cadence is genuinely useful for one shape of work — waiting on
the external world — and genuinely harmful for another — wrapping
ARIS's own internal semantic loops. The two look superficially similar
("run this skill again later"), so people reach for `/loop` on both. The
harmful case has a specific pathology:

- **Verdict re-run on a wall-clock timer.** Wrapping
  `/auto-review-loop` in `/loop 30m` does not produce 30-minutes-better
  review. It re-runs a verdict-bearing skill on a clock that has nothing
  to do with whether the artifact changed. Zero new signal, full token
  cost.
- **Thread discontinuity.** ARIS's multi-round review skills carry state
  across rounds in the reviewer's own thread: `codex-reply` reuses the
  round-1 `threadId` and the accumulated `REVIEWER_MEMORY` so the
  reviewer can check resolution against its *own* prior critique
  (`reviewer-independence.md`, Exception). An external `/loop` re-enters
  the skill from the top each tick, starting a *fresh* `threadId`. The
  reviewer loses its memory of what it already flagged; "did you fix
  round 1's gap?" becomes unanswerable.
- **Duplicated scheduling.** `/experiment-queue` already runs a
  detached server-side scheduler that polls job status every 60s and
  enforces `depends_on`. Wrapping the queue skill in an external poll
  loop duplicates that scheduler on a second, uncoordinated clock and
  invites wave-transition races the queue was built to prevent.

The fix is a clean split: external cadence for **external-world-wait**,
never for **internal semantic loops**.

## The distinction

| | External-world-wait (ADDITIVE) | Internal semantic loop (HARMFUL to wrap) |
|---|---|---|
| What it waits on | A fact in the outside world: job done, metric logged, file landed | A judgment the agent itself produces |
| What advances it | Reality changing (GPU frees, epoch logs, PDF compiles) | A model emitting a verdict |
| Owns its own loop? | No — without cadence a Claude session blocks on `sleep` | Yes — the skill already iterates internally, carrying its own round-to-round state (a reviewer thread, or fed-forward summaries) |
| Cadence replaces | A blocking session burning context on a wait | Nothing — it only re-spawns and re-judges |
| Acceptance gate | Machine-checkable existence/completion (safe same-model) | Quality/correctness (must be cross-model) |

One-liner: **schedule the wait, never the verdict.**

## ADDITIVE cases (external-world-wait shape)

These replace a Claude session that would otherwise sit `sleep`-ing on
an external event. The cadence is the *only* thing the agent is waiting
for; no semantic judgment is being re-run. ARIS already validated this
pattern in production.

- **GPU / experiment job completion polling.**
  `/monitor-experiment` + `/check-gpu` on a cadence: "is the job done?
  are the GPUs still busy?" The agent wakes, reads status, and either
  reports done or sleeps again. The thing it waits on (job exit, GPU
  free) is external and machine-checkable.
- **WandB anomaly checks.** `/training-check` is *already* cron-wired:
  its SKILL.md sets itself up via `CronCreate` ("do not ask the user
  whether to set it up — just set it") to read WandB metrics every N
  minutes and catch NaN / divergence / idle GPUs early. The cadence
  exists so the agent does not have to hold a session open for the whole
  training run.
- **Experiment-queue progression visibility.** Periodically surfacing
  *where the queue is* (N done / N running / N pending) so a human can
  watch overnight progress. Read-only visibility — see the fence below
  on not re-polling the queue's own scheduler.
- **Overnight `research-pipeline` heartbeat.** A non-judgmental wake
  that checks whether the current phase is still advancing and, if a
  phase has stalled, nudges it forward. Heartbeat only — see the
  overnight-pipeline rule below.
- **Daily literature watch.** A once-a-day `/research-lit` or
  `/deepxiv` sweep for new arXiv papers in a tracked direction. The
  external fact is "the world published something new today"; the
  cadence just sets the polling rhythm.

ARIS's own `tools/watchdog.py` makes the additive shape explicit: it
aggregates per-task status into a `summary.txt` whose header documents
it as a "one-line-per-task summary for CronCreate polling." The
artifact is built *so that* an external low-frequency poller can read
completion state cheaply, without holding a session open.

### Why these are safe same-model

In every additive case the acceptance gate is **execution-completeness**
— exit code, file exists, N jobs ran, metric logged, PDF compiled. Those
are machine-checkable, so the polling agent may judge them itself
(`acceptance-gate.md`: "self-judging EXECUTION-completeness is safe
same-model"). The cadence never touches a quality/correctness verdict.

## NOISE / HARMFUL cases (wrapping internal semantic loops)

- **`/loop` around `/auto-review-loop`.** The auto-review loop *is*
  already a loop: review → implement fix → re-review, with the reviewer
  holding round-to-round memory in one `threadId`. Wrapping it in an
  external timer breaks that continuity (a fresh `threadId` per tick,
  `REVIEWER_MEMORY` reset) and fires a verdict on wall-clock time
  instead of on artifact change. Pure noise.
- **Polling `/experiment-queue` on a timer.** Duplicates the queue's
  own 60s server-side scheduler on a second clock, racing its
  wave-transition logic. Use the queue's status output for visibility;
  do not run a competing poll loop.
- **Re-asking an agent to "improve the paper" on a timer.** Quality
  does not improve on a schedule. A timed "improve again" with no new
  review signal is token burn — and if the loop also *accepts* its own
  output to decide whether to stop, it has crossed from fire-control
  into self-acquittal, which the acceptance gate forbids.

## The fence: do NOT wrap these in external cadence

Any **verdict-bearing** skill — one whose output is a judgment of
quality, correctness, support, novelty, or satisfaction — must run on
its own internal cadence with its own round-to-round state (a persistent
reviewer thread, or prior-round summaries fed forward — whichever the skill
uses), and must terminate in the cross-model jury. Never put one inside
`/loop`, `/schedule`, or `CronCreate`:

- `/auto-review-loop` — already loops internally; reviewer carries
  round-to-round memory in one `threadId` (`codex-reply`)
- `/auto-review-loop-llm`, `/auto-review-loop-minimax` — same loop, alternate
  reviewer backend; same internal round cadence (each round's prior-round
  summary is fed into the next prompt — a stateless per-round API call, not a
  shared thread, but still verdict-bearing and self-iterating)
- `/auto-paper-improvement-loop` — review → fix → recompile loop with its own
  round structure and a fresh-reviewer bias guard each round (no `codex-reply`)
- `/research-review` — produces a cross-model review verdict
- `/result-to-claim` — judges whether results support a claim
- `/experiment-audit` — judges experiment integrity
- `/paper-claim-audit` — judges paper-to-evidence fidelity
- `/citation-audit` — judges bibliographic correctness
- `/proof-checker` — judges proof validity across rounds
- `/kill-argument` — adversarial accept/reject verdict

If you find yourself wanting to schedule one of these, the thing you
actually want to schedule is the *external wait that precedes it* (job
done → then audit once), not the verdict itself.

> **Adjacent but distinct — `/dse-loop`.** It also loops internally, so do
> not wrap it in external cadence either, but for a *different* reason: its
> stop gate is an **objective machine-checkable metric** ("objective met or
> timeout"), which is Type-A, not a quality verdict — so it is not a
> self-acquittal hazard. The reason not to wrap it is **scheduler
> duplication** (component #4 below), the same reason as `/experiment-queue`,
> not the verdict fence. Its own objective gate is a safe same-model
> self-termination (`acceptance-gate.md`).

## The affordance: natural external-wait surfaces

These are the surfaces external cadence is *for*. They wait on the
outside world and self-judge only machine-checkable completion:

- `/monitor-experiment` — poll for job completion / progress
- `/check-gpu` — poll for GPU availability and running processes
- `/experiment-queue` — **visibility only** (report position); never a
  re-poll that competes with its own scheduler
- overnight `/research-pipeline` — a **non-judgmental heartbeat + nudge**
  (see next), never a quality gate

## The overnight-pipeline rule

An overnight `research-pipeline` heartbeat may wake on a cadence,
detect that a phase has **stalled** (no progress since last tick,
process died, waiting on a freed resource), and **nudge** it forward —
unblock a stuck step, restart a dropped job, prod a phase to continue
("搞快点"). That is fire-control: it changes *when/whether work
resumes*, not *whether work is good*.

The heartbeat must **NEVER** become a quality gate. It may not decide
that a paper is good enough, that a proof holds, that a claim is
supported, or that a review is satisfied. Every such verdict stays on
its skill's own internal cadence and terminates in the cross-model jury
(`acceptance-gate.md`). The nudge keeps the pipeline moving; it does not
acquit the work the pipeline produces.

One-liner: **a heartbeat may say "keep going," never "good enough."**

## Required components (when you add external cadence to a skill)

1. **Waits on an external fact, not a self-verdict.** State the fact in
   one observable line: "job exit code present," "epoch logged to
   WandB," "PDF exists." If the thing being waited on is a model's
   judgment, cadence is the wrong tool.
2. **No verdict in the loop body.** The scheduled body may *report*
   status and *trigger the next external step*; it may not run a
   verdict-bearing skill (see the fence) as part of deciding whether to
   continue.
3. **Self-judges only machine-checkable completion.** The wake's
   accept/sleep decision must rest on exit code / file existence / count
   — never on quality or correctness (`acceptance-gate.md`).
4. **Does not duplicate an existing internal scheduler.** If the target
   already runs its own loop or server-side poller (auto-review-loop,
   experiment-queue), do not wrap it — use its status output.
5. **Preserves thread continuity for any judgment it precedes.** If the
   external wait ends in a verdict step, that verdict step runs *once*,
   in its own thread, after the wait clears — not re-entered per tick.
6. **Degrades gracefully when no scheduler exists.** External cadence is
   additive runtime sugar, never load-bearing. On a runtime with no
   `/loop` / `CronCreate`, the same work still terminates correctly via
   a blocking poll or a manual re-invocation; the cross-model jury at
   the end is identical either way (`fan-out-pattern.md`).

## Cross-references

- `acceptance-gate.md` — who is allowed to ACCEPT. Cadence drives;
  it does not acquit. The overnight nudge is bound by this rule.
- `fan-out-pattern.md` — fan-out (and cadence) are runtime accelerants
  for a prompt-level pattern; both must degrade gracefully and always
  terminate in the identical cross-model jury.
- `reviewer-independence.md` — why wrapping a multi-round review in an
  external timer breaks reviewer thread/memory continuity.
- `experiment-integrity.md` — the executor never judges its own
  experiment; a scheduled poll never upgrades to an integrity verdict.