---
name: justify
description: 'Post-implementation audit against the minimal-intervention standard. Detects additive bias and test tampering. Use after work, before commit or PR.'
version: 1.9.3
alwaysApply: false
category: workflow-methodology
tags:
  - justification
  - proof-of-work
  - anti-additive-bias
  - code-review
  - iron-law
dependencies:
  - imbue:proof-of-work
  - leyline:additive-bias-defense
tools: []
usage_patterns:
  - post-implementation-review
  - pre-commit-audit
  - change-justification
complexity: intermediate
model_hint: standard
estimated_tokens: 2800
role: entrypoint
---

> The simplest change that fixes the problem is the
> safest change to merge.
> Adding code is easy. Removing the need for code is
> engineering.

# Justify

## The Additive Bias Problem

AI models are trained to be helpful, which creates a systematic bias toward *adding* code rather than *fixing* root causes:

| AI Default Behavior | Correct Behavior |
|---------------------|------------------|
| Add a workaround | Fix the root cause |
| Modify test expectations | Fix the implementation |
| Create a new helper | Use an existing one |
| Add error handling | Prevent the error |
| Add a compatibility shim | Remove the old code |
| Wrap in try/catch | Fix the exception source |

This skill audits changes for these patterns and requires explicit justification for each.

## When to Use

- After completing implementation work
- Before committing or creating PRs
- When reviewing your own changes for quality
- When scope-guard flags RED/YELLOW zone

## Audit Protocol

### Step 1: Gather the Delta

```bash
# Determine the base branch
base=$(git merge-base master HEAD 2>/dev/null \
  || git merge-base main HEAD 2>/dev/null)

# Get change statistics
git diff "$base" --stat
git diff "$base" --shortstat
git diff "$base" --diff-filter=A --name-only  # new files
git diff "$base" --diff-filter=M --name-only  # modified files
git diff "$base" --diff-filter=D --name-only  # deleted files
```

### Step 2: Compute Additive Bias Score

Score each dimension 0-3 (0 = clean, 3 = high bias):

| Signal | Weight | How to Measure |
|--------|--------|----------------|
| Line ratio | 2x | `additions / max(deletions, 1)` |
| New files | 2x | Count of `--diff-filter=A` |
| Test logic changes | 3x | Test assertion/expectation diffs |
| New abstractions | 1x | New classes, functions, modules |
| Workaround patterns | 2x | Try/catch, if/else guards added |

**Line Ratio Scoring:**

| Ratio | Score | Interpretation |
|-------|-------|----------------|
| < 2:1 | 0 | Balanced change |
| 2:1 to 5:1 | 1 | Mildly additive |
| 5:1 to 10:1 | 2 | Additive bias likely |
| > 10:1 | 3 | Strong additive bias |

**Aggregate Score:**

```
bias_score = sum(signal_score * weight) / sum(weights)
```

| Aggregate | Zone | Action |
|-----------|------|--------|
| < 0.5 | GREEN | Proceed |
| 0.5 - 1.5 | YELLOW | Justify each signal |
| 1.5 - 2.5 | RED | Rethink approach |
| >= 2.5 | STOP | Likely wrong approach |

Treat boundary values as the higher zone: a score of exactly 1.5 is RED, not YELLOW. The same rule applies to the line-ratio bands.
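The weighted average is quick to compute by hand; here is a minimal sketch in shell, assuming the five signal scores (0-3 each) have already been assigned from the tables above. The `bias_score` helper name is illustrative, not part of this skill's contract:

```bash
# Hypothetical helper: aggregate five manually assigned signal
# scores (0-3 each) into the weighted bias score.
# Args: line_ratio new_files test_logic abstractions workarounds
bias_score() {
  awk -v lr="$1" -v nf="$2" -v tl="$3" -v na="$4" -v wa="$5" 'BEGIN {
    weighted = lr*2 + nf*2 + tl*3 + na*1 + wa*2  # signal_score * weight
    printf "%.2f\n", weighted / 10               # sum of weights = 10
  }'
}

bias_score 2 1 0 1 2   # prints 1.10 -> YELLOW: justify each signal
```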
### Step 3: Iron Law Compliance Check

The Iron Law states: tests drive implementation, not the other way around. Check for violations:

```bash
# Find the test files this change touches
# (grep -E shown for portability; substitute rg if you prefer)
git diff "$base" --name-only | grep -E 'test_|_test\.|spec\.'

# Inspect assertion/expectation lines that changed in test code
# ('*test*' and '*spec*' are git pathspec globs)
git diff "$base" -- '*test*' '*spec*' \
  | grep -E '^[-+].*(assert|expect|should)'
```

**Violation patterns (test logic was tampered with):**

- Assertion values changed (expected output modified)
- Test cases removed or commented out
- `@skip` or `@pytest.mark.skip` added
- Error expectations weakened (e.g., broadened to a generic exception type)
- Mock return values changed to match new behavior
- Test renamed so it no longer describes the original behavior

**Each violation requires explicit justification:**

> "I changed this test assertion because the
> *requirement* changed, not because my implementation
> couldn't meet the original requirement."

If the requirement didn't change, the test should not change. Fix the implementation instead.

### Step 4: Minimal Intervention Analysis

For each changed file, answer:

1. **Was this change necessary?** Could the goal be achieved without touching this file?
2. **Was this the minimal change?** Could fewer lines achieve the same result?
3. **Did this change add or remove complexity?** New functions, classes, or control flow = added complexity that needs justification.
4. **Is there a subtraction-first alternative?** Could removing code fix the problem instead of adding code?

### Step 4.5: Invariant Impact Analysis

Changes can be minimal and still catastrophically wrong if they silently revise a load-bearing design decision. For each changed file, check whether it touches a design invariant:

**What counts as an invariant:**

- Architectural patterns (module boundaries, layer separation, data flow direction)
- Data structure choices (why a map vs a list, why normalized vs denormalized)
- API contracts (public interfaces, protocol formats)
- Error handling strategies (fail-fast vs recovery)
- Concurrency models (single-threaded assumption, actor model, shared-nothing)

**Detection heuristic:**

```bash
# Structural changes: new modules, moved boundaries,
# changed interfaces
git diff "$base" --name-only \
  | grep -E '(interface|abstract|base|core|types|schema|model)'

# Pattern-breaking changes
git diff "$base" -U5 | grep -E '(TODO.*refactor|HACK|WORKAROUND|XXX)'
```

**When an invariant conflict is detected:**

Do NOT silently pick a resolution. Present the three options to the human:

| Option | Description | When Right |
|--------|-------------|------------|
| **Preserve** | Don't add the feature; the invariant pays dividends | Invariant simplifies many things; feature is marginal |
| **Layer** | Add the feature inelegantly on top | Feature is needed; invariant is still valuable; imperfection is acceptable |
| **Revise** | Change the invariant itself | Genuine new learning invalidates the original decision |

**Add to the Justification Report:**

```markdown
### Invariant Impact: NONE / DETECTED

[If DETECTED:]
- **Invariant**: [name the design decision]
- **Conflict**: [what change clashes with it]
- **Option chosen**: Preserve / Layer / Revise
- **Justification**: [why this option, not the others]
- **Human reviewed**: YES / NO (if NO, flag as requiring review before merge)
```

**Compounding risk warning:** Bad invariant decisions accumulate. If this branch has multiple invariant revisions, flag the entire branch for architectural review. Each silent invariant change multiplies the probability of ending up with an unsalvageable codebase.
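A minimal sketch of that branch-level check, assuming the team marks acknowledged invariant revisions with a `revise-invariant` token in the commit message (a convention proposed here, not an existing standard):

```bash
# Count commits on this branch whose messages acknowledge an
# invariant revision. The 'revise-invariant' token is a suggested
# team convention, not something git or this skill enforces.
revisions=$(git log "$base"..HEAD --grep='revise-invariant' --oneline | wc -l)
if [ "$revisions" -gt 1 ]; then
  echo "WARN: $revisions invariant revisions on this branch;"
  echo "flag the entire branch for architectural review."
fi
```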
### Step 5: Generate Justification Report

Output a structured report:

```markdown
## Justification Report

**Branch**: feature/xyz
**Base**: master
**Delta**: +N/-M lines, X files changed

### Additive Bias Score: X.X (ZONE)

| Signal | Score | Detail |
|--------|-------|--------|
| Line ratio | N | +A/-D = R:1 |
| New files | N | [list] |
| Test changes | N | [list] |
| New abstractions | N | [list] |
| Workarounds | N | [list] |

### Iron Law Compliance: PASS/FAIL

[List any test logic modifications with justification]

### Change-by-Change Justification

#### file.py (+N/-M)
- **What**: [description]
- **Why**: [root cause this addresses]
- **Alternatives considered**: [what else could work]
- **Why this is minimal**: [why fewer changes won't work]

#### test_file.py (+N/-M)
- **What**: [description]
- **Justification**: [why test logic changed, if it did]
- **Iron Law status**: PASS/VIOLATION

### Risk Assessment

| Factor | Rating |
|--------|--------|
| Lines changed | LOW/MED/HIGH |
| Files touched | LOW/MED/HIGH |
| Test modifications | NONE/JUSTIFIED/VIOLATION |
| New abstractions | NONE/JUSTIFIED/UNNECESSARY |
| Overall merge risk | LOW/MED/HIGH |

### Recommendations

[List any changes that should be reconsidered, simpler alternatives, or unnecessary additions]
```

## Decision Weights

When evaluating competing approaches, weight these factors:

| Factor | Weight | Rationale |
|--------|--------|-----------|
| Fewer lines changed | HIGH | Less risk, easier review |
| No new files | HIGH | No new maintenance burden |
| No test logic changes | HIGH | Iron Law compliance |
| Root cause fix | HIGH | Prevents recurrence |
| Removes code | BONUS | Reduces maintenance surface |
| Adds abstraction | PENALTY | Only justified at the 3rd use |
| Adds error handling | NEUTRAL | Only at system boundaries |

**The Subtraction Test**: Before accepting any change, ask: "Could I achieve this by *removing* code instead of *adding* it?" If yes, prefer the subtractive approach.

## Integration with Proof of Work

Justify extends proof-of-work with change-level accountability:

- **proof-of-work**: "Did it work?" (evidence)
- **justify**: "Was this the right way?" (reasoning)

Both are required before claiming work is complete. Run proof-of-work first, then justify.

## Anti-Patterns to Flag

### 1. Test Mutation
Changing test expectations to match broken code.
**Fix**: Revert the test change, fix the implementation.

### 2. Shotgun Addition
Adding code in many files for a single-concern fix.
**Fix**: Find the single point of change.

### 3. Defensive Overengineering
Adding try/catch, null checks, or validation for scenarios that can't happen in practice.
**Fix**: Trust internal code. Only validate at boundaries.

### 4. Premature Abstraction
Creating a helper/utility/base class for one use case.
**Fix**: Inline the code. Abstract at the 3rd use.

### 5. Compatibility Shim
Adding backward-compatibility code instead of updating callers.
**Fix**: Update callers directly. Delete dead paths.

### 6. Silent Invariant Revision
Changing an architectural pattern, data structure choice, or API contract without acknowledging that a design invariant is being revised.
**Fix**: Name the invariant. Present the three options (preserve, layer, revise) to a human. Do not make the judgment call yourself: models default to the "average" of training data, and wrong invariant decisions compound into unsalvageable codebases.
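Several of these patterns leave textual fingerprints that a rough scan can surface before review. A minimal sketch; the regexes are illustrative and should be tuned to your languages, and `/tmp/added.diff` is just a scratch path:

```bash
# Collect only the lines this diff adds (skip the +++ header)
git diff "$base" | grep -E '^\+[^+]' > /tmp/added.diff

# Defensive constructs (anti-pattern 3) added by the change
echo "new try/except/catch blocks:"
grep -cE '(try[[:space:]]*\{|except[ :]|catch[[:space:]]*\()' /tmp/added.diff || true

# Disabled tests (anti-pattern 1 / Iron Law violations)
echo "skipped or disabled tests:"
grep -cE '(@skip|mark\.skip|\.skip\(|xit\(|xdescribe\()' /tmp/added.diff || true
```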
## Scrutiny Questions (from leyline:additive-bias-defense)

Before justifying any change, apply these questions. If the answers to questions 4 and 5 are not concrete evidence, the change is unjustified.

1. **Priority alignment**: Is this a deviation from the current priority?
2. **Criticality**: Is it critical to implement at this juncture?
3. **Simplicity**: Does a simpler or more elegant solution exist?
4. **Evidence**: What evidence proves this is needed (not assumed)?
5. **Consequence**: What breaks if we do not add this?

## Burden of Proof Inversion

The default stance is: **this addition should not exist.** The burden falls on the change to prove its necessity, not on the reviewer to prove it unnecessary.

When generating the Justification Report (Step 5), add a `Burden of Proof` section:

| Change | Scrutiny Q4 Answer | Scrutiny Q5 Answer | Verdict |
|--------|--------------------|--------------------|---------|
| file.py | [evidence] | [consequence] | justified / needs_evidence / unjustified |

Changes with an `unjustified` verdict MUST be removed or reworked before the report passes.

## The Wise Counsel

> Is what you are doing a deviation from your priority?
> Is it critical to implement at this juncture?
> Rely less on AI and initial lines of thinking.
> Challenge yourself to be better, to think of a more
> elegant implementation or a simpler solution.

## Related Skills

- `imbue:karpathy-principles` - the "Surgical Changes" and "Goal-Driven Execution" principles invoke this audit from a higher-level synthesis
- `leyline:additive-bias-defense` - the contract this audit enforces in detail
- `imbue:proof-of-work` - the validation layer this audit complements (proof-of-work asks "did it work?", justify asks "did it need to exist?")
- See `docs/quality-gates.md#skill-level-quality-gate-composition` for the full gate-skill federation graph