---
name: git-workflow
description: Use when running claudikins-kernel:execute, decomposing plans into tasks, setting up two-stage review, deciding batch sizes, or handling stuck agents — enforces isolation, verification, and human checkpoints; prevents runaway parallelization and context death
allowed-tools:
  - Read
  - Grep
  - Glob
  - Bash
  - Edit
  - Write
  - TodoWrite
  - Skill
  - mcp__plugin_claudikins-tool-executor_tool-executor__search_tools
  - mcp__plugin_claudikins-tool-executor_tool-executor__get_tool_schema
  - mcp__plugin_claudikins-tool-executor_tool-executor__execute_code
---

# Git Workflow Methodology

## When to use this skill

Use this skill when you need to:

- Run the `claudikins-kernel:execute` command
- Decompose plans into executable tasks
- Set up two-stage code review
- Decide batch sizes and checkpoints
- Handle stuck agents or failed tasks

## Core Philosophy

> "I'd use 5-7 agents per SESSION, not 30 per batch." - Boris

Execution is about isolation, verification, and human checkpoints. Not speed.

### The Six Principles

1. **One task = one branch** - Isolation prevents pollution
2. **Fresh context per task** - `context: fork` gives a clean slate
3. **Two-stage review** - Spec compliance first, then code quality
4. **Human checkpoints between batches** - Not between individual tasks
5. **Commands own git** - Agents never checkout/merge/push
6. **Features are the unit** - Batch at feature level, not task level

### Batch Size Guidance (GOSPEL)

- **Wrong:** 30 agents for 10 tasks (3 per task micro-management)
- **Right:** 5-7 agents total (feature-level batches)

| Scenario             | Wrong                      | Right              |
| -------------------- | -------------------------- | ------------------ |
| 10 tasks, 5 features | 30 micro-task agents       | 5-7 feature agents |
| Simple refactor      | 10 agents for tiny changes | 1-2 feature agents |

Default `--batch 1` is correct. Features are the unit of work.

## Task Decomposition

From a plan, extract tasks that are:

| Quality         | Definition                                     | Example                                       |
| --------------- | ---------------------------------------------- | --------------------------------------------- |
| **Atomic**      | Completable in one agent session               | "Add auth middleware" not "Build auth system" |
| **Testable**    | Has measurable acceptance criteria             | "Returns 401 for invalid token"               |
| **Independent** | Minimal dependencies on other tasks            | Can be reviewed in isolation                  |
| **Right-sized** | Not too small (noise) or large (context death) | 50-200 lines of changes                       |

See [task-decomposition.md](references/task-decomposition.md) for patterns.
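
The "right-sized" row can be checked mechanically once a task branch exists. The sketch below is only an illustration: the function names `changed_lines` and `size_verdict` and the labels they print are hypothetical, and the 50-200 range is taken from the example column above rather than from any enforced limit.

```bash
# Sum added + deleted lines from `git diff --numstat` output on stdin.
# Binary files report "-" in both columns, which awk treats as 0.
changed_lines() {
  awk '{ total += $1 + $2 } END { print total + 0 }'
}

# Classify a change count against the 50-200 line guideline.
# The labels are hypothetical, chosen for this sketch.
size_verdict() {
  local lines="$1"
  if [ "$lines" -lt 50 ]; then
    echo "too-small"      # likely noise; consider folding into a sibling task
  elif [ "$lines" -le 200 ]; then
    echo "right-sized"
  else
    echo "too-large"      # risk of context death; consider splitting
  fi
}
```

A read-only call such as `size_verdict "$(git diff --numstat main...execute/task-1-auth | changed_lines)"` (branch name made up for the example) would classify a task branch without checking anything out.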

## Review Stages

Two reviewers with different jobs. Never skip either.

### Stage 1: Spec Compliance (spec-reviewer, opus)

**Question:** "Did it do what was asked?"

Checks:

- All acceptance criteria addressed?
- Any scope creep (features not in spec)?
- Any missing requirements?

Output: `PASS` or `FAIL` with line references.

### Stage 2: Code Quality (code-reviewer, opus)

**Question:** "Is it well-written?"

Checks:

- Consistency with codebase style
- Error handling
- Edge cases
- Naming clarity
- Unnecessary complexity

Output: `PASS` or `CONCERNS` with confidence scores.

See [review-criteria.md](references/review-criteria.md) for detailed checklists.

## Review Enforcement (MANDATORY)

**This is non-negotiable. Violations here break the entire workflow.**

### The Iron Rule

After EVERY task completes, you MUST spawn BOTH reviewer agents:

1. **spec-reviewer** - Spawned via `Task(spec-reviewer, {...})`
2. **code-reviewer** - Spawned via `Task(code-reviewer, {...})` (if spec passes)

### What "MUST spawn" Means

| Allowed | NOT Allowed |
|---------|-------------|
| `Task(spec-reviewer, { prompt: "...", context: "fork" })` | Inline spec check by orchestrator |
| `Task(code-reviewer, { prompt: "...", context: "fork" })` | "I'll just verify the code looks good" |
| Waiting for agent output JSON | Making your own compliance table |
| Reading from `.claude/reviews/spec/` | Skipping because "it's a simple task" |

### Inline Reviews Are VIOLATIONS

If you find yourself doing ANY of these, you are VIOLATING the methodology:

- Creating a "Spec Compliance Check" table yourself
- Writing "Verdict: PASS" without spawning an agent
- Saying "Let me verify the implementation meets criteria"
- Checking acceptance criteria in a loop instead of delegating

**The orchestrator does NOT review. The orchestrator SPAWNS reviewers.**

### Pre-Merge Checklist (HARD GATE)

Before ANY merge decision can be offered to the user:

```
□ .claude/reviews/spec/{task_id}.json EXISTS for each task
□ .claude/reviews/code/{task_id}.json EXISTS for each task
□ Both files contain valid JSON with "verdict" field
□ spec-reviewer verdict is PASS (or user override documented)
```

If ANY file is missing: **DO NOT proceed to merge. You skipped the review.**
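
The checklist above can be expressed as a gate script. This is a minimal sketch, not the kernel's actual implementation: `premerge_gate` is a hypothetical name, it assumes `jq` is installed, and it does not model the "user override documented" exception, which would need its own record.

```bash
# Hypothetical helper: succeeds only if every task passes the pre-merge checklist.
# Usage: premerge_gate <reviews_dir> <task_id>...
premerge_gate() {
  local reviews_dir="$1"; shift
  local task_id stage file
  for task_id in "$@"; do
    for stage in spec code; do
      file="$reviews_dir/$stage/$task_id.json"
      # The review artifact must exist
      if [ ! -f "$file" ]; then
        echo "MISSING: $file" >&2; return 1
      fi
      # It must be valid JSON (an object) with a "verdict" field
      if ! jq -e 'has("verdict")' "$file" >/dev/null 2>&1; then
        echo "INVALID: $file" >&2; return 1
      fi
    done
    # The spec-reviewer verdict must be PASS
    if ! jq -e '.verdict == "PASS"' "$reviews_dir/spec/$task_id.json" >/dev/null; then
      echo "SPEC NOT PASS: $task_id" >&2; return 1
    fi
  done
  return 0
}
```

For example, `premerge_gate .claude/reviews task-1 task-2 || echo "Do not offer merge"` would block the merge offer when any artifact is missing or malformed.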

### Why This Matters

1. **Consistency** - Every task gets the same rigor, not "looks simple, I'll check it"
2. **Auditability** - Review outputs are artifacts, not orchestrator judgments
3. **Separation of concerns** - Orchestrator orchestrates, reviewers review
4. **No rationalization** - You can't convince yourself your inline check is "good enough"

## Verdict Matrix

What happens when reviewers return their verdicts:

| Spec Result | Code Result | Action                                           |
| ----------- | ----------- | ------------------------------------------------ |
| PASS        | PASS        | Offer [Accept] [Revise anyway]                   |
| PASS        | CONCERNS    | Offer [Accept with caveats] [Fix] [Klaus review] |
| FAIL        | \*          | Always [Revise] or [Retry]                       |

See [review-conflict-matrix.md](references/review-conflict-matrix.md) for edge cases.
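
As a rough illustration, the matrix reduces to a lookup on the verdict pair. The function name `offer_actions` is hypothetical, and any pair outside the three rows above is treated as an error rather than guessed at.

```bash
# Hypothetical helper: print the options to offer for a (spec, code) verdict pair.
# Usage: offer_actions <spec_verdict> <code_verdict>
offer_actions() {
  local spec="$1" code="$2"
  case "$spec:$code" in
    FAIL:*)        echo "[Revise] [Retry]" ;;                            # code verdict is irrelevant
    PASS:PASS)     echo "[Accept] [Revise anyway]" ;;
    PASS:CONCERNS) echo "[Accept with caveats] [Fix] [Klaus review]" ;;
    *)             echo "unknown verdict pair: $spec/$code" >&2; return 1 ;;
  esac
}
```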

## Batch Checkpoint Flow

```
All tasks in batch complete?
├── No → Wait for remaining
└── Yes →
    All reviews pass?
    ├── No →
    │   Retry count < 3?
    │   ├── Yes → Retry failed tasks
    │   └── No → Escalate to Klaus or human
    └── Yes →
        Present results to human
        └── Human decides: [Accept] [Revise] [Retry]
```

See [batch-patterns.md](references/batch-patterns.md) for decision trees.
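
The flow above can be sketched as a single decision function. The name `batch_next_step`, its arguments, and the step labels are hypothetical; the retry limit mirrors the `Retry count < 3` branch in the diagram.

```bash
# Hypothetical helper mirroring the checkpoint flow.
# Usage: batch_next_step <pending_tasks> <failed_reviews> <retry_count>
batch_next_step() {
  local pending="$1" failed="$2" retries="$3"
  if [ "$pending" -gt 0 ]; then
    echo "wait"        # tasks still running
  elif [ "$failed" -eq 0 ]; then
    echo "present"     # human decides: [Accept] [Revise] [Retry]
  elif [ "$retries" -lt 3 ]; then
    echo "retry"       # retry only the failed tasks
  else
    echo "escalate"    # Klaus or human
  fi
}
```

Note that a passing batch ends at "present", never at an automatic merge: the human checkpoint is the only exit from the success path.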

## Rationalizations to Resist

Agents under pressure find excuses. These are all violations:

| Excuse                                     | Reality                                                             |
| ------------------------------------------ | ------------------------------------------------------------------- |
| "30 agents is fine, tasks are independent" | More agents = more chaos. 5-7 per session, features as units.       |
| "I'll just checkout main to compare"       | Agents don't own git. Use `git show main:file` instead.             |
| "Skip spec review, code looks correct"     | Spec review catches scope creep. Never skip.                        |
| "I'll do the review myself, it's simple"   | Spawn the reviewer agents. Inline reviews are VIOLATIONS.           |
| "Both passed, auto-merge is safe"          | Human checkpoint required. Always.                                  |
| "Context is fine, I'll continue"           | ACM at 60% = checkpoint offer. 75% = mandatory stop.                |
| "This tiny task doesn't need a branch"     | One task = one branch. No exceptions. Isolation prevents pollution. |
| "Retry limit is just a guideline"          | 2 retries then escalate. Infinite retry = infinite waste.           |
| "I'll merge my changes when done"          | Commands own merge. You own implementation. Stay in your lane.      |

**All of these mean: Follow the methodology. Speed is not the goal.**

## Red Flags — STOP and Reassess

If you're thinking any of these, you're about to violate the methodology:

- "Let me just run git checkout..."
- "30 tasks, 30 agents, maximum parallelism"
- "Review passed, no need for human checkpoint"
- "Context is getting tight but I can finish"
- "This is simple, don't need isolation"
- "I'll merge it myself"
- "Retry limit doesn't apply here"
- "Spec review is redundant if code review passes"
- "Let me verify the implementation meets criteria" (SPAWN THE AGENT)
- "I'll create a quick compliance table" (SPAWN THE AGENT)

**All of these mean: STOP. Commands own git. Humans own checkpoints. Reviewers own reviews. You own orchestration.**

## Robustness Patterns

Things go wrong. Here's how to handle them.

### SubagentStop Hook Failure (A-6)

If the capture hook fails, agent output is lost.

**Pattern:** Write to backup location first, then move to primary.

```bash
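# Always write to the backup location first ($OUTPUT holds the agent's output text)
BACKUP_FILE="$BACKUP_DIR/agent-$(date +%s).json"
printf '%s\n' "$OUTPUT" > "$BACKUP_FILE"
# Then move that same file to primary
mv "$BACKUP_FILE" "$PRIMARY"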
```

### Malformed JSON Output (A-7)

Agents sometimes produce invalid JSON.

**Pattern:** Validate required fields before accepting.

```bash
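REQUIRED='["task_id", "status"]'
# $OUTPUT is the output file path; fail if it is invalid JSON or a required key is missing
jq -e --argjson required "$REQUIRED" \
  '. as $o | all($required[]; . as $k | $o | has($k))' "$OUTPUT" >/dev/null || exit 2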
```

### Task Branch Directory Export (A-8)

Agents need to know where to work.

**Pattern:** Export directory as environment variable in SubagentStart hook.

```bash
export TASK_BRANCH_DIR="$PROJECT_DIR"
export TASK_BRANCH_NAME="execute/task-${TASK_ID}-${SLUG}"
```

### Model Rate Limiting (A-10)

Opus hits rate limits more often than Sonnet.

**Pattern:** Offer fallback options to human.

1. Notify: "Opus rate limited. Options:"
2. Offer: [Wait 60s] [Use Sonnet fallback] [Abort]
3. If fallback, add caveat to review output

### Context Exhaustion Mid-Task (A-11)

Agent runs out of context before finishing.

**Pattern:** Output partial state and mark as resumable.

```json
{
  "status": "partial",
  "files_changed": ["completed work..."],
  "next_steps": ["what remains..."],
  "checkpoint_hash": "sha256:..."
}
```

### Dependency Failure Chains (S-7)

Task Y depends on Task X. Task X fails. What happens to Y?

See [dependency-failure-chains.md](references/dependency-failure-chains.md).

### Branch Collision (S-8)

Two tasks accidentally get the same branch name.

See [branch-collision-detection.md](references/branch-collision-detection.md).

### Branch Guard Recovery (S-9)

The git-branch-guard hook blocks something it shouldn't.

See [branch-guard-recovery.md](references/branch-guard-recovery.md).

### Batch Size Verification (S-10)

Validating batch sizes before execution starts.

See [batch-size-verification.md](references/batch-size-verification.md).

### Task Branch Recovery (S-12)

Recovering orphaned branches from crashed sessions.

See [task-branch-recovery.md](references/task-branch-recovery.md).

### Circuit Breakers (S-13)

Preventing cascading failures when operations fail repeatedly.

**Pattern:** Track failure rate. If threshold exceeded, "open" the circuit - fail fast without attempting.

```
Circuit: agent_spawn
State: OPEN (3 failures in 60s)
Reset in: 30 seconds

[Wait for reset] [Force close] [Skip operation]
```

See [circuit-breakers.md](references/circuit-breakers.md).
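
A minimal in-memory sketch of the pattern follows, assuming the thresholds shown in the example above (3 failures in 60s, 30s reset). All names here (`cb_record_failure`, `cb_allow`, the `CB_*` variables) are hypothetical and are not taken from circuit-breakers.md. Timestamps are passed as arguments so the logic can be exercised without waiting.

```bash
CB_MAX_FAILURES=3   # failures that trip the breaker
CB_WINDOW=60        # seconds over which failures are counted
CB_RESET=30         # seconds the circuit stays open
cb_failures=""      # space-separated epoch timestamps of recent failures
cb_opened_at=""     # empty while the circuit is closed

# Record a failure at time $1 (epoch seconds; defaults to now).
cb_record_failure() {
  local now="${1:-$(date +%s)}" t kept="" count=0
  # Keep only failures inside the window, plus the new one
  for t in $cb_failures $now; do
    if [ $((now - t)) -lt "$CB_WINDOW" ]; then
      kept="$kept $t"
      count=$((count + 1))
    fi
  done
  cb_failures="$kept"
  if [ "$count" -ge "$CB_MAX_FAILURES" ]; then
    cb_opened_at="$now"   # threshold exceeded: open the circuit
  fi
}

# Succeeds if an operation may be attempted at time $1 (defaults to now).
cb_allow() {
  local now="${1:-$(date +%s)}"
  if [ -z "$cb_opened_at" ]; then
    return 0
  fi
  if [ $((now - cb_opened_at)) -ge "$CB_RESET" ]; then
    cb_opened_at=""; cb_failures=""   # reset period elapsed: close the circuit
    return 0
  fi
  return 1                            # still open: fail fast without attempting
}
```

A caller would wrap each spawn as `cb_allow || offer_options`, and call `cb_record_failure` whenever the spawn itself fails, where `offer_options` stands in for the [Wait for reset] [Force close] [Skip operation] prompt.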

### Execution Tracing (S-14)

Debugging execution graphs and understanding what happened.

**Pattern:** Record spans for each operation. Visualise as waterfall or dependency graph.

```
Trace: exec-session-xyz
├── batch_1 (45s)
│   ├── task-1 (20s) ✓
│   └── task-2 (25s) ✓
└── batch_2 (60s)
    └── task-3 (60s) ✓

Critical path: batch_1 → batch_2
```

See [execution-tracing.md](references/execution-tracing.md).

## Stuck Detection

| Signal                | Threshold                     | Response            |
| --------------------- | ----------------------------- | ------------------- |
| Tool call flooding    | 20 calls without file changes | Warning, then Klaus |
| Time without progress | 10 minutes                    | Warning, then Klaus |
| Repeated failures     | Same error 3x                 | Pause, offer Klaus  |
| Context burn rate     | ACM at 60%                    | Checkpoint offer    |
| Review timeout        | 5 minutes per reviewer        | Offer [Wait] [Skip] |
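
The first four rows of the table can be sketched as a threshold check. This is illustrative only: `stuck_response`, its arguments, and the labels it prints are hypothetical, and the order in which signals are checked (repeated failures first, context last) is an assumption, since the table does not rank them. Review timeouts are left out because they apply to reviewers rather than task agents.

```bash
# Hypothetical helper: map stuck signals to a response label.
# Usage: stuck_response <calls_without_file_changes> <idle_minutes> <same_error_count> <context_pct>
stuck_response() {
  local calls="$1" idle_min="$2" same_errors="$3" context_pct="$4"
  if [ "$same_errors" -ge 3 ]; then
    echo "pause-offer-klaus"     # same error 3x
  elif [ "$calls" -ge 20 ] || [ "$idle_min" -ge 10 ]; then
    echo "warn-then-klaus"       # tool call flooding or no progress
  elif [ "$context_pct" -ge 60 ]; then
    echo "checkpoint-offer"      # ACM at 60%
  else
    echo "ok"
  fi
}
```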

## Anti-Patterns

**Don't do these:**

- Running git checkout/merge/push from agents
- Batching 30+ tasks in parallel
- Skipping spec review because "code looks fine"
- Auto-merging without human checkpoint
- Ignoring stuck signals
- Continuing after context warnings

## References

Full documentation in this skill's `references/` folder:

- [task-decomposition.md](references/task-decomposition.md) - How to break down plans
- [review-criteria.md](references/review-criteria.md) - What reviewers check (400 LOC threshold, attack surface tracing)
- [batch-patterns.md](references/batch-patterns.md) - Checkpoint decision patterns (coordinated checkpoints, load shedding, deadline propagation)
- [dependency-failure-chains.md](references/dependency-failure-chains.md) - When dependent tasks fail
- [branch-collision-detection.md](references/branch-collision-detection.md) - Preventing duplicate branches
- [branch-guard-recovery.md](references/branch-guard-recovery.md) - Recovering from guard failures
- [batch-size-verification.md](references/batch-size-verification.md) - Validating batch sizes
- [review-conflict-matrix.md](references/review-conflict-matrix.md) - Handling reviewer disagreements (RESOLVE framework)
- [task-branch-recovery.md](references/task-branch-recovery.md) - Recovering orphaned branches
- [circuit-breakers.md](references/circuit-breakers.md) - Preventing cascading failures (circuit breaker pattern, timeout strategies)
- [execution-tracing.md](references/execution-tracing.md) - Debugging execution graphs (spans, traces, critical path analysis)