---
status: active
purpose: Four-layer harness model for harness-monitor, keeping run-centric domain records while collapsing operator concepts into a clearer architecture and package map.
---
# Harness Monitor Four-Layer Model
## Review Judgment
The current semantics are directionally correct, but the concept count is too high when presented as a flat list of peer planes.
The most readable model is a four-layer loop:
```mermaid
flowchart LR
C[Context
Rules and context
AGENTS.md / architecture / task]
R[Run
Execution and constraints
Task / Run / Workspace / Policy]
O[Observe
Observation and attribution
hooks / process / git / attribution]
G[Govern
Evaluation and delivery
Entrix / gate / evidence / operate]
C --> R
R --> O
O --> G
G --> C
```
In plain words:
- `Context`: decides what the agent should know
- `Run`: decides what the agent may do
- `Observe`: records what the agent actually did
- `Govern`: decides whether the result may move forward
## Stable Domain Records
This simplification does not change the stable first-class objects. Those remain:
- `Task`
- `Run`
- `Workspace`
- `EvalSnapshot`
- `PolicyDecision`
- `Evidence`
- related domain events
The loop is about how to explain behavior around those records, not about replacing them with more runtime entities.
## 3+1 Overview
For slides and overview pages, the most compact version is the implementation-biased `3+1` loop:
```mermaid
flowchart LR
O[Observe
hook events / process scan / dirty git / session matching]
A[Attribute
file ownership / session-agent link / exact-inferred-unknown]
E[Evaluate
Entrix fast-full / hard gate / coverage / gate visibility]
X[Expand
Task-Run-Workspace / policy / evidence / operate / reflect]
O --> A --> E --> X
X --> O
```
This is the shortest accurate story for the current codebase: `Observe -> Attribute -> Evaluate`, then expand that loop back into the full harness surface.
## Package Structure
The current package structure already fits the four-layer model without forcing a crate split per concept.
```text
Context
AGENTS.md
docs/ARCHITECTURE.md
docs/fitness/README.md
crates/harness-monitor/templates/
crates/harness-monitor/scripts/
Run
crates/harness-monitor/src/domain/
crates/harness-monitor/src/application/run_assessment.rs
crates/harness-monitor/src/operator_guardrails.rs
crates/harness-monitor/src/repo.rs
Observe
crates/harness-monitor/src/observe.rs
crates/harness-monitor/src/detect.rs
crates/harness-monitor/src/hooks.rs
crates/harness-monitor/src/ipc.rs
crates/harness-monitor/src/state_events.rs
Govern
crates/harness-monitor/src/domain/evaluator.rs
crates/harness-monitor/src/state_fitness.rs
crates/harness-monitor/src/tui_fitness.rs
entrix-driven gate, evidence, and readiness status consumed through run assessment
Surfaces
crates/harness-monitor/src/main.rs
crates/harness-monitor/src/cli_operator.rs
crates/harness-monitor/src/state*.rs
crates/harness-monitor/src/tui*.rs
packages/harness-monitor/bin/harness-monitor.js
```
`Surfaces` are not a fifth semantic layer. They are entrypoints and renderers over the same four-layer loop.
## Plane Mapping
The older plane vocabulary is still useful, but it now maps into the four-layer model instead of competing with it:
- `Context` owns `Contextualize`
- `Run` owns `Orchestrate` and `Constrain`
- `Observe` owns `Observe` and `Attribute`
- `Govern` owns `Evaluate`, `Validate`, `Evidence`, and `Operate`
- `Reflect` is the feedback edge from `Govern` back into `Context`
## Code Boundary
The current shared semantic path remains:
- `RunAssessmentInput` collects raw run, workspace, and evaluation facts
- `assess_run(...)` derives operator meaning, policy and evidence state, next action, and summarized plane status
- CLI and TUI render from that shared assessment instead of reconstructing semantics independently
In concrete code:
- `crates/harness-monitor/src/application/run_assessment.rs` is the semantic aggregation layer
- `crates/harness-monitor/src/operator_guardrails.rs` remains the lower-level run/govern constraint engine
- `crates/harness-monitor/src/cli_operator.rs` and the TUI modules stay as surfaces over the same assessment path
## Scope Boundaries
This model deliberately does not claim that all four layers are equally mature.
Today the strongest implemented loop is:
- observe signals
- attribute ownership
- evaluate readiness
The remaining context, operate, and reflection capabilities should grow from that loop instead of becoming separate top-level architectures too early.
## Verification Criteria
This model is successful when:
- public docs explain `harness-monitor` with one four-layer story
- the slide-friendly `3+1` view stays consistent with the implementation
- stable domain records remain `Task / Run / Workspace / EvalSnapshot / PolicyDecision / Evidence`
- CLI and TUI still share one run assessment path
- attribution ambiguity and gate blocking remain explicit rather than hidden