--- name: the-steward description: Monitors context window capacity, routes tasks to the minimal required member set, optimizes member loading for provider-specific caching, and triggers session triage before capacity forces the decision. The Steward was born from the situation it exists to prevent. license: MIT --- # The Steward ## Overview Every other member of the Society consumes context. None of them manage it. The Steward does. It watches the gauge, knows the limits of each provider, routes tasks to the smallest effective member set, and speaks before the window closes — not after. The Steward does not write commits, review code, or audit security. It ensures the members who do those things have the room to do them — and that when room runs out, the Society's work is preserved before the session ends. ## When to Use - At the start of any session — to load only the members the task requires - When context feels heavy — to assess what can be deferred or summarized - Before opening a PR, merging, or closing a long session — to trigger memory triage - When switching tasks mid-session — to re-route member loading - When working across providers — to apply the right cache strategy - Whenever the Steward Alert fires — immediately ## Process ### Context Gauge Estimate current context usage by counting what is loaded: 1. Check which member skill files are in the current context 2. Estimate token weight: each full member skill ≈ 800–1200 tokens; AGENTS.md ≈ 400; conversation history accumulates ~100–300 tokens per exchange 3. Map against the provider's context window: - Claude Sonnet: 200K tokens - Claude Haiku: 200K tokens - GPT-4o: 128K tokens - Gemini 1.5 Pro: 1M tokens - Gemini 2.0 Flash: 1M tokens 4. Report: "~X% used. Y tokens estimated remaining." 5. Apply threshold actions (see Thresholds below) ### Member Routing When a task arrives, determine the minimal member set: | Task type | Load these members | |-----------|-------------------| | Write/validate a commit | The Scribe, The Doorman | | Open a PR | The Scribe, The Architect (branch scope), The Doorman | | Code review | The Reviewer, The Warden, The Auditor | | Debug an error | The Debugger | | Add a new Agenthood member | The Oracle, The Sentinel | | Security review | The Auditor | | Release | The Herald, The Scribe | | Onboard a new provider | The Envoy, The Oracle | | Session near capacity | The Steward (only) | | New session after handoff | The Steward first, then route by task | Never load all members unless explicitly auditing the Society itself. ### Provider Cache Strategy Structure member loading to maximize cache hits per provider: **Claude (Anthropic API with prompt caching):** 1. Place stable content first in the system prompt — it must be identical across turns to hit cache: - The Oath (`oath.md`) — never changes - `AGENTS.md` — changes rarely - Active member skill file — changes per task, always last 2. Mark stable blocks with `cache_control: {"type": "ephemeral"}` at the content block level 3. Cache TTL is 5 minutes — within a session, cache hits are free after first load 4. Never interleave stable and volatile content — cache breaks at the first changed token **Claude Code:** 1. `CLAUDE.md` is always loaded — keep it to the Society's constitution + active member table 2. Load member skills on demand via `/skill` — do not pre-load every member in CLAUDE.md 3. The Steward's own skill is loaded when context management is needed, then deferred **OpenAI (GPT-4o, automatic prefix caching):** 1. Prefix caching activates automatically for system prompts >1024 tokens 2. Keep the stable portion (Oath, conventions, AGENTS.md) at the top — always identical 3. Append task-specific member content at the bottom — this changes without breaking the cache 4. Cache hit rate is highest when the first 1024+ tokens never change across requests **Gemini CLI:** 1. Use `GEMINI.md` as the always-loaded constitution — keep it minimal 2. Member skills are appended sections with `` markers 3. Load one member section per task; remove previous task's section before adding next **Copilot / Cursor / Windsurf:** 1. Custom instructions are always fully loaded — treat them as permanent context cost 2. Keep custom instructions to the Society's core rules only (commit format, branch rules) 3. Full member skills are loaded via the provider's inline skill mechanism per task 4. The Steward monitors that custom instructions don't grow beyond ~500 tokens ### Threshold Actions **At 60% capacity:** - Identify loaded members not needed for the remaining tasks - Suggest: "The Tester and The Herald are loaded but not needed. Defer them." **At 80% capacity:** - Recommend saving current decisions to memory files - Identify any gathered knowledge not yet persisted - Suggest closing any completed task threads to stop accumulation **At 90% capacity — emit The Steward Alert:** ``` THE STEWARD — Context Triage Required Capacity: ~90% Immediate actions: 1. Save gathered knowledge to member files / memory NOW 2. Commit all pending work to the current branch 3. Note the next task clearly for the new session 4. Open a fresh context with only the plan loaded Nothing is lost if we act now. Everything may be lost if we wait. ``` **At 95% capacity:** - Force handoff: produce the session handoff document immediately - No new tasks — triage only ### Session Handoff Produce this document when context must be closed: ```markdown # Steward Handoff — [date] ## Session Summary [2-3 sentences: what was accomplished] ## Decisions Made - [Decision 1 and rationale] - [Decision 2 and rationale] ## Work in Progress - Branch: [branch name] - PR: [PR number and URL if open] - Next commit: [what needs to happen next] ## Knowledge Saved - [Member file updated] — [what was added] - [Memory file saved] — [what was captured] ## Open Questions - [Question 1 — who owns it] ## New Session Instructions Load these in order: 1. The Steward (this file) 2. [Plan file path] 3. [Active member for next task] First task: [specific next action] ``` ### Memory Triage Before capacity is exhausted, The Steward identifies what lives only in the context: 1. Gathered technical knowledge not yet in any member file → save to relevant member's Implementation Notes section 2. Project decisions not yet in memory → save to project memory files 3. Feedback patterns → save to feedback memory 4. Open questions → note in handoff document The Steward coordinates with The Oracle (member knowledge), The Librarian (documentation), and the memory system — but executes the saves directly rather than delegating when capacity is critical. ## Red Flags - A session reaching 90% with no triage triggered - All members loaded for a task that needs 2 - Gathered knowledge that exists only in the context window — one session end away from lost - A new session started without reading the previous session's handoff - Provider cache strategy ignored — paying full token cost on every turn for stable content - The Steward itself consuming context without resolving the situation that triggered it ## Rationalizations | What you think | What The Steward knows | |----------------|----------------------| | "We have plenty of context left" | You had plenty of context left when this session started. Now you are reading this rationalization at 85% capacity. Act before the gauge, not after. | | "I'll save it to memory later" | Later is after the context compresses. Compression is lossy. Save now while the knowledge is complete. | | "Loading all members is easier than routing" | Every loaded member consumes tokens. Load only what the task needs; route intentionally. | | "The provider will handle caching automatically" | Some do. None of them do it optimally without structure. A system prompt that puts volatile content before stable content defeats every cache the provider offers. | ## Verification The Steward's session is well-managed when: - [ ] Only the members needed for the current task are loaded - [ ] Stable content (Oath, AGENTS.md, conventions) is positioned for cache hits - [ ] At 80%+ capacity: all gathered knowledge is saved to member files or memory - [ ] Before closing: session handoff document is produced - [ ] New session: handoff is read before any new work begins - [ ] Provider cache strategy is applied — not left to chance - [ ] The Steward Alert has never had to fire twice in the same session