git merge --no-ff feat/pln_xxxx -m "merge: "
```

**Prevention**: every dispatch brief targeting agents prone to this pattern (notably codex) should include explicit commit instructions at the end, e.g. *"When done editing, stage your changes and create a commit with a clear message referencing the plan id (e.g. `feat(scope): summary (pln#XXX)`). Do not stop until the commit exists."*

---

## MCP runtime corrupted (mcp-worker.js missing)

**Symptom**: `MCP error -32603: Cannot find module 'mcp-worker.js'` or the server logs `MCP runtime corrupted (mcp-worker.js missing)` on startup.

**Why**: `dist/` was wiped or partially deleted. Common causes: a `git merge` that triggered worktree cleanup before pln#477 landed, an `npm run clean:dist` followed by an interrupted build, or filesystem-level corruption.

**Fix**:

```bash
brainclaw doctor --repair
```

This rebuilds `dist/` from `src/` (TypeScript compile + copy default profiles) and validates the result by running `node dist/cli.js --version`. The repair also writes `dist/.brainclaw-build.json` so subsequent runs can do a stale-check (compare `src_hash` vs `dist_hash`).

**If `--repair` fails**: it usually means `node_modules` is also damaged. Run a clean `npm install` first, then re-run `brainclaw doctor --repair`.

**Note**: read-only MCP handlers stay available in-process even when the worker is missing (since pln#478), so basic `bclaw_context` and `bclaw_find` calls still respond, but anything requiring the worker (most write operations) returns `runtime_corrupted` with a repair pointer.

---

## Octopus merge fails on parallel lanes

**Symptom**: after a sequenced parallel dispatch finishes, you run `git merge --no-ff lane1 lane2 lane3 -m "merge: …"` and git refuses with conflict markers.

**Why**: octopus merges refuse any merge that needs manual conflict resolution, so they reliably succeed only when the lanes touch disjoint files. If two lanes made conflicting changes to the same file, octopus aborts and you must merge them sequentially.
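Before attempting an octopus merge, it can help to check whether any file was changed by more than one lane. The following is a minimal sketch, not part of brainclaw: the function names `duplicated_paths` and `lane_overlap` are hypothetical, and it assumes each lane is a local branch that forked from a common base branch (called `main` in the usage note).

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are illustrative and not shipped with brainclaw.

# Print every path that appears more than once on stdin.
duplicated_paths() {
  sort | uniq -d
}

# Usage: lane_overlap <base> <lane>...
# `git diff --name-only base...lane` lists the files changed on the lane since
# its merge base with <base>. Each lane lists a given file at most once, so a
# repeated path means at least two lanes modified it.
lane_overlap() {
  local base="$1"
  shift
  local lane
  for lane in "$@"; do
    git diff --name-only "$base...$lane"
  done | duplicated_paths
}
```

For example, `lane_overlap main lane1 lane2 lane3` prints nothing when the lanes are file-disjoint; any path it does print is a candidate for a sequential merge. Two lanes that edit the same file do not always conflict, so treat the output as a warning rather than a verdict.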
**Fix**:

```bash
# Cancel the failed octopus
git merge --abort

# Merge lanes one at a time, resolving conflicts as needed
git merge --no-ff lane1
# (resolve any conflicts, commit)
git merge --no-ff lane2
# (resolve any conflicts, commit)
git merge --no-ff lane3
```

**Prevention**: when defining a sequence, choose lane scopes that minimize file overlap. Use `hard_after` dependencies for lanes that genuinely need to land in order. The dispatcher does not itself enforce disjoint scopes; that is the caller's responsibility when designing the sequence.

---

## `.brainclaw/` looks corrupted (schema drift, malformed JSON)

**Symptom**: `bclaw_doctor` reports `state is invalid: ` or files in `.brainclaw/memory/` fail to parse.

**Why**: usually a half-written file from an interrupted write (process killed mid-write), a migration that didn't complete, or a manual edit that introduced syntax errors. `brainclaw upgrade --rollback` exists precisely for this case.

**Fix**:

```bash
# 1. Inspect what's wrong
brainclaw doctor --after-migration

# 2. If the most recent migration is the cause, roll back
brainclaw upgrade --rollback
# This restores the last backup at .bak-/ and parks the
# current corrupted store at .rollback-/ for inspection.

# 3. If a single file is corrupted (and rollback is too aggressive),
# inspect the parked rollback dir and copy individual files back manually.
```

**Prevention**: brainclaw takes a backup before every `upgrade` run (see `docs/concepts/upgrade-cli.md`). For non-upgrade scenarios, rely on git: `.brainclaw/` is git-versioned by default, so `git log` and `git checkout ` recover any committed state.

---

## Plan stuck `in_progress`

**Symptom**: a plan has been marked `in_progress` for days with no commits or claim activity.

**Why**: the agent that started it crashed, was rerouted, or simply forgot to transition to `done` / `blocked` / `dropped`.
**Fix**:

```bash
# Survey
brainclaw stale list # plan_in_progress flagged after 7 days by default

# Decide based on context
brainclaw stale resolve # → dropped (default for stale)
# or, via canonical grammar, transition to a different terminal state:
# bclaw_transition(entity="plan", id="", to="done")
# bclaw_transition(entity="plan", id="", to="blocked")
```

**Threshold tuning**: defaults live in `src/core/staleness.ts`. A config-driven override is on the roadmap (open follow-up); for now you adjust the source file if 7 days is too aggressive for your project.

---

## Inbox messages stuck / brief-ack never arrived

**Symptom**: a dispatched assignment shows `running` indefinitely, and `bclaw_assignment_events` shows `run_running` but no further progress.

**Why**: the spawned worker process either (a) crashed before reading its inbox, (b) read the inbox but couldn't acknowledge (e.g., MCP unavailable inside the spawned sandbox, which is common with codex `--sandbox workspace-write`), or (c) is genuinely still working but slow.

**Diagnostic order**:

```bash
# 1. Is the worker process still alive?
ps -ef | grep # codex, claude, copilot, …
# Windows: Get-Process -Id # or `tasklist /FI "PID eq "`

# 2. Did the brief-ack file land?
ls .brainclaw/coordination/runtime/ack/.ack
# If yes → spawn started, worker is somewhere in its loop
# If no → spawn never started or died before the wrap shell ran touch

# 3. (pln#504) What did the worker actually say? stdout/stderr capture
# Spawned workers now route their streams to per-assignment log files. If the
# worker died silently, the error usually shows up here.
cat .brainclaw/coordination/runtime/log/.stdout.log
cat .brainclaw/coordination/runtime/log/.stderr.log

# 4. Inspect the worktree for activity
git -C log --oneline -5
git -C status

# 5. Check the run log
brainclaw inbox list --agent
# or via MCP: bclaw_assignment_events(assignmentId="")
```

**Fix paths**:

- Worker dead, no ack → reroute via `bclaw_coordinate(intent="reroute", …)` to another agent
- Worker dead, ack present, work uncommitted → manual harvest (see "Dispatched worker finished without committing" above)
- Worker still alive but slow → wait, or `kill` and reroute

**Brief-ack TTL** is configurable via `BRAINCLAW_HANDSHAKE_TIMEOUT_MS` (default 30s since pln#475+#476). Past that, the dispatcher times the spawn out and surfaces the failure in the assignment events log.

---

## See also

- [`docs/concepts/dispatch-lifecycle.md`](dispatch-lifecycle.md): the entity model + FSMs + observability decision tree underlying every diagnostic step on this page
- [`docs/concepts/memory-staleness.md`](memory-staleness.md): staleness signals and resolve flow in depth
- [`docs/concepts/loop-engine.md`](loop-engine.md): multi-turn loops (review-fix), recovery semantics for in-flight loops
- [`docs/concepts/upgrade-cli.md`](upgrade-cli.md): `brainclaw upgrade` design + rollback path
- [`docs/cli.md`](../cli.md): full command reference for `doctor`, `stale`, `claim`, `upgrade`, `inbox`, `worktree`
- [`docs/concepts/multi-agent-workflows.md`](multi-agent-workflows.md): happy-path coordination patterns (the inverse of this page)