# ADR-075: Harness Self-Evolution (Darwin) and Agent-Config Provenance (Witness Manifest) **Status**: Accepted — implemented in 2.1.0. Track A (harness MCP tools: `harness_repair` / `harness_manifest` / `harness_verify`) and Track B (Ed25519 provenance) shipped; the `metaharness` scaffolder/host-adapters remain out of scope as stated below. **Date**: 2026-06-23 **Decision Makers**: RUV, Claude Flow Team **Related**: ADR-073 (Cost-Optimal Router), ADR-074 (Darwin TDR), ADR-076 (Meta-Harness Repositioning) **Affected packages**: `agentic-flow` (`src/harness/`, `src/mcp/tools/`) **Implementation**: `src/harness/provenance.ts`, `src/mcp/tools/harness-tools.ts` (registered in `src/mcp/standalone-stdio.ts`), `tests/harness/provenance.test.ts` ## Context Two lower-frequency but strategically useful capabilities from the `metaharness` ecosystem are not yet present in `agentic-flow`: 1. **Harness self-evolution** — `@metaharness/darwin`'s core loop ("freeze the model, evolve the harness") mutates one of **seven policy surfaces** (`planner`, `contextBuilder`, `reviewer`, `retryPolicy`, `toolPolicy`, `memoryPolicy`, `scorePolicy`), tests each in a sandbox, and keeps only what _measurably_ improves — building an archive (a tree, not a single best branch). Reported lifts on a frozen model (e.g. `finalScore 0.765 → 0.985`, ADR-103) come from evolving policy, not swapping models. 2. **Provenance / integrity** — the `metaharness` scaffolder ships `harness sign / verify / doctor`: an Ed25519 **witness manifest** over a harness's agent/skill/command files. ### Prior art on the orchestration side `claude-flow` **already exposes `metaharness_*` MCP tools** (`metaharness_evolve`, `_bench`, `_score`, `_genome`, `_threat_model`, `_security_bench`, `_drift_from_history`, `_similarity`, `_mcp_scan`, `_oia_audit`, `_audit_list`, `_audit_trend`). The wiring pattern for surfacing Darwin through MCP is therefore proven; `agentic-flow` can mirror it. `agentic-flow`'s seven Darwin surfaces map conceptually onto its existing agent prompt/policy configuration (planner, context selection, reviewer, retry/tool/memory policy), and it already has a benchmark suite (`bench/`, `benchmarks/`, `benchmark-results/`) to evolve against. ## Decision Adopt both, in two tracks: ### Track A — Harness self-evolution as MCP tools (mirror claude-flow) Expose Darwin's `evolve()` and scoring through `agentic-flow`'s MCP server in `src/mcp/`, mirroring claude-flow's `metaharness_*` naming. Evolve agent harness policy against the existing benchmark suite, persisting the archive under `.metaharness/` per Darwin's convention. All evolutionary mechanisms stay **opt-in and additive** (Darwin's default-path runs are byte-reproducible). ### Track B — Agent-config provenance Adopt `harness sign / verify` to produce a signed witness manifest over agent/skill/command configs. Run `verify` in CI and optionally as a pre-publish gate, complementing the security work in PR #170 (CWE-78). `harness doctor` becomes a smoke-check for generated/edited harness configs. ## Consequences **Positive** - Self-improving harness: measurable gains without changing the model or paying for a bigger one. - Provenance/integrity for agent configs — tamper-evidence and supply-chain assurance. - Reuses a proven MCP integration pattern (claude-flow) and Darwin's reproducible, sandboxed core. **Negative / risks** - Evolution requires a trustworthy benchmark to score against; a weak benchmark evolves toward the wrong objective. Gate promotions on the frozen kernel scorer + safety clauses Darwin already enforces. - MCP surface area grows; keep tools opt-in and documented. - Signing introduces key management (Ed25519) — must define where keys live and CI verification policy. **Neutral** - Both tracks are opt-in; no change to default agent behavior. ## Scope note The full `metaharness` **scaffolder + host adapters** (claude-code / codex / hermes / rvm harness emission) is **deliberately out of scope** here — it overlaps `agentic-flow`'s own `init`/agent system and is a generator, not a runtime library. Revisit only if emitting portable, host-targeted harnesses becomes a product goal. This ADR adopts only the runtime-relevant pieces: `evolve` (Track A) and `sign/verify` (Track B).