---
name: evaluator
description: Independent assessment without debate context — adversarial loop prevents gaming
allowed-tools:
  - read_file
  - write_file
tier: 1
protocol: INDEPENDENT-EVALUATOR
tags: [moollm, evaluation, adversarial, scoring, review]
credits: "Mike Gallaher — independent evaluator pattern"
related: [adversarial-committee, rubric, roberts-rules, room]
---

# Evaluator

> *"Fresh eyes, no bias, just the rubric."*

Committee output goes to a separate model instance with NO debate context.

## The Separation

```yaml
evaluation:
  principle: "Evaluator has NO access to:"
    - debate_transcript
    - speaker_identities
    - amendment_history
    - voting_patterns
    - minority_dissents
    
  evaluator_sees_only:
    - final_output
    - rubric_criteria
    - subject_matter_context
```

## Room Architecture

```yaml
# committee-room/
#   ROOM.yml
#   debate.yml
#   output.yml
#   outbox/
#     evaluation-request-001.yml

# evaluation-room/
#   ROOM.yml
#   rubric.yml
#   inbox/
#     evaluation-request-001.yml  # Landed here
#   evaluations/
#     eval-001.yml
```

## Evaluation Request

```yaml
# Thrown from committee to evaluator
evaluation_request:
  id: eval-req-001
  from: committee-room
  timestamp: "2026-01-05T15:00:00Z"
  
  subject: "Client X Engagement Decision"
  
  output_only: |
    Recommendation: Accept Client X with:
    - Explicit scope boundaries
    - Milestone-based billing
    - Quarterly scope review
    
    Confidence: 0.65
    
    Key considerations:
    - Revenue opportunity aligns with growth goals
    - Risk mitigated by contractual protections
    - Capacity impact manageable
    
  rubric: client-evaluation-v1
  
  # Note: NO debate context included
```

## Evaluation Process

```yaml
evaluation:
  request: eval-req-001
  evaluator: "fresh model instance"
  context_loaded: false  # Critical!
  
  steps:
    1. load_rubric: client-evaluation-v1
    2. read_output: evaluation_request.output_only
    3. score_each_criterion: independently
    4. calculate_weighted_total: true
    5. generate_critique: if score < threshold
```

## Evaluation Output

```yaml
# evaluation-room/evaluations/eval-001.yml
evaluation:
  id: eval-001
  request: eval-req-001
  timestamp: "2026-01-05T15:05:00Z"
  
  rubric: client-evaluation-v1
  
  scores:
    resource_efficiency:
      score: 4
      rationale: "Output indicates capacity is manageable"
      
    risk_level:
      score: 3
      rationale: "Mitigations proposed but not detailed"
      confidence: "Would score higher with specific terms"
      
    strategic_alignment:
      score: 4
      rationale: "Growth goals mentioned, seems aligned"
      
    stakeholder_impact:
      score: 3
      rationale: "Not explicitly addressed in output"
      flag: "Committee should consider stakeholder effects"
      
  weighted_total: 3.45
  threshold: 3.5
  
  result: REVIEW  # Just below accept
  
  critique:
    summary: "Close to acceptance threshold"
    
    gaps:
      - "Risk mitigation lacks specifics"
      - "Stakeholder impact not addressed"
      - "Confidence of 0.65 seems low for recommendation"
      
    suggestions:
      - "Detail the milestone structure"
      - "Explain how scope boundaries will be enforced"
      - "Address impact on existing clients and team"
      
    if_addressed: "Score could reach 3.7+ (accept)"
```

## Revision Loop

```yaml
revision_loop:
  max_iterations: 3
  
  flow:
    1. committee_outputs: recommendation
    2. evaluator_scores: against rubric
    3. if score >= threshold: ACCEPT
    4. if score < threshold:
         - evaluator_generates: critique
         - critique_thrown_to: committee inbox
         - committee_revises: based on critique
         - goto: step 1
    5. if max_iterations reached: ESCALATE to human
```

## Adversarial Properties

```yaml
adversarial_separation:
  why: "Prevents committee gaming metrics"
  
  committee_cannot:
    - see_evaluator_reasoning
    - predict_exact_scores
    - optimize_for_rubric_loopholes
    
  evaluator_cannot:
    - be_influenced_by_debate_dynamics
    - favor_particular_speakers
    - weight_majority_over_minority
    
  result: "Genuine quality signal"
```

## Commands

| Command | Action |
|---------|--------|
| `EVALUATE [output]` | Send to independent evaluator |
| `APPLY RUBRIC [name]` | Score against criteria |
| `CRITIQUE` | Generate improvement suggestions |
| `REVISE` | Committee addresses critique |
| `ESCALATE` | Send to human decision maker |

## Integration

```mermaid
graph TD
    C[Committee] -->|output| T[THROW to outbox]
    T -->|lands in| I[Evaluator inbox]
    I --> E[Evaluate]
    R[RUBRIC.yml] --> E
    E --> S{Score}
    S -->|≥ threshold| A[✅ Accept]
    S -->|< threshold| CR[Generate Critique]
    CR -->|THROW back| CI[Committee inbox]
    CI --> REV[Revise]
    REV --> C
    
    subgraph "No Context Crossing"
    I
    E
    R
    end
```

## Model Instance Separation

For true independence:

```yaml
implementation:
  option_1:
    name: "Fresh conversation"
    method: "New chat with no history"
    
  option_2:
    name: "Separate model"
    method: "Different API call, different instance"
    
  option_3:
    name: "System prompt separation"
    method: "Explicit instruction: 'You have no prior context'"
    
  key_principle: |
    The evaluator must NOT have access to:
    - How the committee reached the conclusion
    - Who said what
    - What alternatives were considered
    - Why certain risks were dismissed
    
    Only the final output matters.
```