# MISHKAN — Token Optimisation & Context Management How the harness manages context and token usage **on top of Claude** — by leveraging the platform's native primitives, not by replacing them. MISHKAN never intercepts or rewrites a Claude request. It is a discipline of **input composition**: arrange what enters each call so the Claude model's own cost and context behaviour falls in the engineer's favour. This is the operational detail behind §11 of `MISHKAN_harness_design.md`. --- ## 1. The cost model being optimised Every agent run is one Claude API request. The model bills input tokens at two rates and output at one: ``` cost ≈ (uncached_input × 1.0) + (cached_input × ~0.1) + (output × 1.0) ``` - **Cached input** — a prefix byte-identical to a recent call is read from cache at roughly a tenth of the price (ephemeral cache, a few minutes' TTL). - **Uncached input** — everything new or changed, at full price. - **Output** — always full price. Three levers follow directly, and every MISHKAN mechanism pulls one or more: **(a) grow cached_input · (b) shrink uncached_input · (c) keep the whole window small** so nothing resident wasn't needed. --- ## 2. Anatomy of one agent call Claude Code assembles each request from fixed parts, in order. MISHKAN's arrangement splits them into a stable, cacheable prefix and a variable suffix: ``` ┌───────────────────────────────────────────────┐ ┐ │ system prompt (Claude Code core) │ │ │ agent definition (.md role block + tools) │ │ STABLE PREFIX │ ~/.claude/CLAUDE.md (engineer identity) │ │ byte-identical │ y4nn-standards + engineer-standards │ │ call-to-call │ matched path-scoped rules │ │ → CACHEABLE (~0.1x) ├───────────────────────────────────────────────┤ ┘ │ project ./CLAUDE.md (sprint state) │ ┐ │ task / conversation so far │ │ VARIABLE SUFFIX │ tool results · Cognee summaries │ │ → full price └───────────────────────────────────────────────┘ ┘ ``` Every agent file enforces this with a trailing `## Dynamic Context Injection Point` marker: everything above it is stable and caches; the changing sprint state and task sit below. --- ## 3. The five mechanisms — native primitive × MISHKAN formulation | # | Claude primitive (native) | MISHKAN formulation (input-shaping) | Lever | |---|---|---|---| | 1 | **Prompt caching** of a stable prefix | "static first, dynamic last" ordering in every agent; standards/rules above the injection point stay byte-stable | grow cached | | 2 | **Subagent isolation** — each Task subagent has its own context window | the 45-agent org; heavy work runs in disposable child windows, only summaries return to the parent | shrink window | | 3 | **Per-agent tool grants** + deferred tool schemas | tight `tools:` per agent (QA has no Write, reporters no Bash, Jakin only Read) | shrink uncached | | 4 | **Rule loading by glob match** | path-scoped team rules (33–39 globs each); only 3 `common/` rules are always-on | shrink uncached | | 5 | **MCP external store** | Cognee offloading — full artifacts to the graph, summary + node-id in context (`context-compress` skill) | shrink window | ### 3.1 Prompt caching — formulated by ordering Caching fires only on a byte-identical prefix. Putting the role, standards, and matched rules first makes them a stable prefix that caches; the variable task sits last. Two calls to the same agent in a session pay full price only for what differs: ``` call #1 → Hizkiah: [role+standards+rules][task A] prefix WRITE (~1.25x once) call #2 → Hizkiah: [role+standards+rules][task B] prefix HIT (~0.1x); only task B full price ``` ### 3.2 Subagent isolation — formulated by decomposition (the biggest lever) When an orchestrator routes via the Task tool, the specialist runs in its **own** window. Its file reads, research, and scans never enter the parent's context; only the final message returns. The main conversation stays lean regardless of how much work happens beneath it. ``` MAIN THREAD (lean) SUBAGENT WINDOWS (isolated, discarded) Nehemiah ─Task─► Hizkiah ───────► [reads/edits ≈30k tokens of work] returns ◄──────── "done: 3 files, contract held" (~200 tok) ─Task─► Caleb ───────► [web research ≈25k tokens] returns ◄──────── "PKCE S256; sources […]" (~300 tok) ``` This is **why disabling auto-compaction is low-risk in MISHKAN**: the token-heavy history is never on the main thread to compact — it was spent and dropped in child windows. (See §5.) ### 3.3 Tool grants — formulated by frontmatter Each tool's JSON schema costs input tokens. Tight per-agent `tools:` lists mean an agent never carries schemas for tools it cannot use. ### 3.4 Rule scoping — formulated by globs A rule's body is tokens. `backend/yasad.md` contributes **zero** while editing a `.css`; it loads only when a matching path is touched. The rule budget tracks the file in hand. ### 3.5 Cognee offloading — formulated by externalising state Context is not history. Instead of carrying a 3,000-token result forward, write it to the graph and keep a ~150-token summary + node id; query for detail on demand. This is the **deliberate, into-a-store** counterpart to letting the model auto-summarise the conversation. --- ## 4. What "on top of Claude" means ``` Claude model (caching · context window · tool-use) ← unchanged ▲ │ MISHKAN shapes INPUTS only: ┌───────────────────┴─────────────────────────────────────────┐ │ file ORDER → what caches (static-first) │ │ agent BOUNDARIES → what's resident (subagent fan-out) │ │ tool GRANTS → schema weight (tight tools:) │ │ glob SCOPES → rule weight (path-matched) │ │ store CHOICE → context vs graph (Cognee offload) │ └───────────────────────────────────────────────────────────────┘ ``` No request interception, no rewriting. The platform's native behaviour does the work; MISHKAN composes the inputs so it works in the engineer's favour. --- ## 5. Interaction with auto-compaction Auto-compaction (Claude Code native) summarises older context when the window fills, so long sessions don't hit a wall. The engineer runs with `autoCompactEnabled: false` — preferring exact, un-rewritten context (consistent with verify-before-fix and no-fabrication). MISHKAN makes that safe in the common path because: - **Subagent isolation** keeps the heavy history off the main thread. - **Cognee offloading** is the manual, reviewed substitute for auto-summarisation. - **CLAUDE.md re-injection** reloads sprint state from file, not from a summary. **Where compaction-off still bites:** a long exploration chat with Nehemiah/Bezalel that never spawns a subagent accumulates on the main thread with no auto-rescue. Mitigation is manual: `/context-compress`, or capture intent into `/mishkan-init` and start a fresh session. --- ## 6. Model tiering (cost, complementary to tokens) Tokens and model choice are separate cost axes. `config/model-routing.yaml` sets a Claude tier per agent role: **Opus** for orchestration/leads, **Sonnet** for any agent that writes code/config, **Haiku** for evaluate/collect/advise roles. Tiering cuts cost without touching the token mechanics above. --- ## 7. Targets and honest gaps - **Budget target:** under ~10 active MCPs and ~80 loaded tools at any time. Currently discipline + per-agent `tools:` lists — **not an enforced gate**. - **Cache-hit measurement** is aspirational: the PostToolUse hook captures tool/outcome but not token/cache fields (the hook payload doesn't expose them), so cache-hit-rate must be derived Cognee-side once enriched. - **Auto-compaction is off** by setting; §5 covers the implications. --- *Operational detail for `MISHKAN_harness_design.md` §11. Mechanisms are native to the Claude model and Claude Code; MISHKAN's contribution is the input composition that makes them pay off.*