---
name: context-engineering
description: "Manage AI agent context effectively: what to include, what to exclude, compression strategies, and context hierarchy for optimal performance."
version: "1.0.0"
last-updated: "2026-04-17"
model_tested: "claude-sonnet-4-6"
category: meta
platforms: [claude-code, codex, gemini-cli, cursor, copilot, windsurf, cline]
language: en
geo_relevance: [global]
priority: high
dependencies:
  mcp: []
  skills: []
  apis: []
  data: []
update_sources:
  - url: "https://www.augmentcode.com/guides/how-to-build-agents-md"
    check_frequency: "quarterly"
    last_checked: "2026-04-17"
license: MIT
---

# Context Engineering

Based on ETH Zurich research: overly detailed instructions reduce task success by 3%, increase token cost by 20%, and add 2-4 reasoning steps.

## When to Use

- Writing SKILL.md, AGENTS.md, or system prompts
- Debugging poor agent performance
- Optimizing token costs
- Designing multi-agent workflows
- Reducing context window pressure

## Context Hierarchy (5 Levels)

Most persistent → most transient:

| Level | Content | Persistence | Example |
|-------|---------|-------------|---------|
| 1. Rules | Project-wide standards | Always loaded | CLAUDE.md, AGENTS.md |
| 2. Spec | Feature/session scope | Per feature | PRD, architecture docs |
| 3. Source | Code relevant to the current task | Per task | Relevant source files |
| 4. Errors | Feedback from the current iteration | Per attempt | Test failures, stack traces |
| 5. History | Accumulated conversation turns | Per session | Conversation history |

**Principle**: Levels 1-2 are curated (high leverage). Levels 3-5 are rebuilt on every call, so keep them minimal.
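
As a rough illustration of the hierarchy above, the sketch below assembles context in level order under a token budget, so the persistent levels get first claim on the space. The names `ContextItem`, `estimate_tokens`, and `assemble_context` are hypothetical, and the 4-characters-per-token estimate is an assumption, not a real tokenizer.

```python
from dataclasses import dataclass

# Hypothetical level numbers mirroring the hierarchy table (1 = most persistent).
RULES, SPEC, SOURCE, ERRORS, HISTORY = 1, 2, 3, 4, 5


@dataclass
class ContextItem:
    level: int
    text: str


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def assemble_context(items: list[ContextItem], budget: int) -> list[ContextItem]:
    """Admit items from level 1 to level 5 until the budget is used up.

    An item that does not fit in the remaining budget is skipped, so the
    persistent levels are considered before the transient ones.
    """
    kept: list[ContextItem] = []
    used = 0
    for item in sorted(items, key=lambda i: i.level):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            kept.append(item)
            used += cost
    return kept


items = [
    ContextItem(HISTORY, "h" * 400),  # ~100 tokens
    ContextItem(RULES, "r" * 40),     # ~10 tokens
    ContextItem(SOURCE, "s" * 200),   # ~50 tokens
]
kept = assemble_context(items, budget=70)
print([item.level for item in kept])  # prints [1, 3]
```

With a budget of 70 estimated tokens, the rules and source items fit and the conversation history is left out. A real agent would likely summarize history rather than drop it outright; that choice is outside this sketch.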

## What to Include

Include ONLY what the agent cannot discover independently:

- Non-obvious conventions ("we use snake_case for DB columns")
- Project-specific constraints ("never modify the auth module")
- Architectural decisions not in code ("we chose Drizzle over Prisma because...")
- External dependencies not discoverable ("deploy via internal CI, not GitHub Actions")

## What NOT to Include

The agent can discover these itself; including them wastes tokens:

- Tech stack (visible in package.json / requirements.txt)
- File structure (visible via ls / find)
- Key files (visible via search)
- Build commands (visible in scripts / Makefile)
- Standard patterns (the model already knows React, Express, etc.)

## Sizing Guidelines

| Context Type | Target Size | Rationale |
|-------------|----------|-----------|
| AGENTS.md | 500-1000 tokens | ETH Zurich: more = worse |
| SKILL.md (core) | 1000-2500 tokens | Balance detail vs overhead |
| references/ per skill | 500-1000 tokens | Supporting data; do not duplicate SKILL.md |
| System prompt total | < 5K tokens | Beyond this: diminishing returns |

## Compression Strategies

1. **Remove examples when the pattern is clear**: one example beats three redundant ones
2. **Use tables over prose**: 50% fewer tokens for structured info
3. **Remove "obvious" instructions**: "write clean code" is noise
4. **Use references for static data**: move schemas/checklists to files
5. **Lazy-load context**: only load what's needed for the current task

## Anti-Patterns

| Anti-Pattern | Problem | Fix |
|-------------|---------|-----|
| "Always be thorough" | Forces effort=high, +35% tokens | Remove; the model handles this |
| "Think step by step" | Redundant with adaptive thinking | Remove on modern models |
| Repeating the same rule 3x | Token waste, no benefit | State once, clearly |
| Including full API docs | Context overflow | Link to docs, summarize key parts |
| "You are a helpful assistant" | Generic, no value | Use specific task context |

## What This Skill Does NOT Do

- Does not manage conversation memory (different problem)
- Does not optimize the model itself (skill ≠ fine-tuning)
- Does not handle multi-agent coordination (orchestration concern)
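
Returning to the Sizing Guidelines table: the upper bounds there can be checked mechanically. The sketch below is one possible way to do that. `MAX_TOKENS`, `estimate_tokens`, and `check_size` are hypothetical names, and the 4-characters-per-token estimate is an assumption; swap in a real tokenizer for accurate counts.

```python
# Upper bounds (in tokens) copied from the Sizing Guidelines table.
# The table's "< 5K" is treated as an inclusive 5000 bound for simplicity.
MAX_TOKENS = {
    "AGENTS.md": 1000,
    "SKILL.md": 2500,
    "reference": 1000,
    "system_prompt": 5000,
}


def estimate_tokens(text: str) -> int:
    """Rough heuristic (an assumption): about 4 characters per token."""
    return len(text) // 4


def check_size(kind: str, text: str) -> tuple[bool, int]:
    """Return (within_budget, estimated_tokens) for one context file."""
    tokens = estimate_tokens(text)
    return tokens <= MAX_TOKENS[kind], tokens


ok, tokens = check_size("AGENTS.md", "x" * 4400)
print(ok, tokens)  # prints: False 1100
```

A check like this could run in CI or a pre-commit hook so that context files do not silently grow past their budget.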