---
name: context-engineering
description: Optimize Claude Code context usage through monitoring, reduction strategies, progressive disclosure, planning/execution separation, and file-based optimization. Task-based operations for context window management, token efficiency, and maintaining conversation quality. Use when managing token costs, optimizing context usage, preventing context overflow, or improving multi-turn conversation quality.
allowed-tools: Read, Write, Edit, Glob, Grep, Bash, WebSearch, WebFetch
---

# Context Engineering

## Overview

context-engineering provides systematic strategies for optimizing Claude Code context window usage. It helps you monitor token consumption, reduce context load, design context-efficient skills, and apply proven optimization patterns.

**Purpose**: Maximize Claude Code effectiveness while managing token costs and maintaining conversation quality

**The 5 Context Optimization Operations**:

1. **Monitor Context Usage** - Track token consumption, identify heavy consumers
2. **Reduce Context Load** - Remove stale content, minimize loaded files
3. **Optimize Skill Design** - Progressive disclosure, efficient reference loading
4. **Separate Planning/Execution** - Keep execution context clean
5. **File-Based Strategies** - Externalize large data (95% token savings)

**Key Benefits**:

- **29-39% performance improvement** with context editing strategies
- **95% token savings** using file-based approaches for large data
- **Sustained quality** in multi-turn conversations
- **Cost reduction** through efficient token usage
- **Prevention** of context overflow and conversation drift

**Context Window Sizes** (2025):

- Sonnet 4/4.5: 200k tokens (standard), 500k-1M (beta for Tier 4)
- Auto-compaction: Triggers around 80% usage (~160k for 200k window)

## When to Use

Use context-engineering when:

1. **Approaching Context Limits** - Context usage >60-70%, need to optimize before hitting limits
2. **Building Large Skills** - Creating skills with extensive documentation, need efficient loading strategies
3. **Token Cost Management** - Reducing API costs through optimization
4. **Multi-Turn Conversations** - Maintaining coherence across extended sessions
5. **Skill Design Phase** - Planning context-efficient architecture from the start
6. **Performance Optimization** - Improving response quality and latency
7. **Conversation Quality** - Preventing drift and maintaining focus
8. **MCP Integration** - Managing context from Model Context Protocol servers
9. **Large Data Handling** - Working with extensive datasets or outputs

## Prerequisites

- Understanding of Claude context windows
- Access to context monitoring (Claude Code `/context` command)
- Familiarity with the progressive disclosure pattern
- Skills under development or optimization

## Operations

### Operation 1: Monitor Context Usage

**Purpose**: Track token consumption, identify context-heavy elements, and detect optimization opportunities

**When to Use This Operation**:

- Beginning of optimization effort (baseline measurement)
- During development (continuous monitoring)
- When approaching context limits (>60-70% usage)
- Investigating performance issues

**Process**:

1. **Check Current Context Usage**

   ```
   Use /context command in Claude Code to view:
   - Total tokens used
   - Percentage of context window
   - Files loaded
   - Recent tool calls
   ```

2. **Identify Heavy Consumers**
   - Large files loaded (>5,000 tokens each)
   - Extensive conversation history
   - Many tool call results
   - Large CLAUDE.md files

3. **Analyze Usage Patterns**
   - Which files are loaded but rarely referenced?
   - Are all tool results still relevant?
   - Is conversation history necessary?
   - Are there duplicate or redundant contexts?

4. **Document Baseline**
   - Record current token usage
   - Note context-heavy elements
   - Identify optimization targets

5. **Set Optimization Goals**
   - Target token reduction (e.g., reduce by 20%)
   - Performance improvement targets
   - Quality maintenance requirements

**Validation Checklist**:

- [ ] Current context usage measured (tokens and percentage)
- [ ] Context-heavy elements identified (files, history, tool results)
- [ ] Baseline documented for comparison
- [ ] Optimization targets set
- [ ] High-impact optimization opportunities noted

**Outputs**:

- Current context usage metrics
- List of context-heavy elements
- Baseline measurement
- Optimization targets
- Priority optimization opportunities

**Time Estimate**: 10-15 minutes

**Example**:

```
Context Usage Analysis
======================

Current Usage: 145,000 tokens (72% of 200k window)

Heavy Consumers:
1. CLAUDE.md: 25,000 tokens (17%)
2. Large skill files: 40,000 tokens (28%)
   - planning-architect/SKILL.md: 15,000 tokens
   - development-workflow/common-patterns.md: 12,000 tokens
   - review-multi/scoring-rubric.md: 8,000 tokens
3. Conversation history: 30,000 tokens (21%)
4. Tool call results: 20,000 tokens (14%)

Optimization Opportunities:
- Split large CLAUDE.md (25k → 10k target)
- Use references/ loading instead of full files (40k → 15k)
- Clear old tool results (20k → 5k)

Target: Reduce to ~100k tokens (50% of window, 31% reduction)
```

---

### Operation 2: Reduce Context Load

**Purpose**: Remove stale content, minimize loaded files, and reduce token consumption

**When to Use This Operation**:

- Context usage >70% (approaching limits)
- Performance degradation noticed
- Before major operations (clear space)
- Periodic maintenance (every few hours)

**Process**:

1. **Remove Stale Tool Results**
   - Identify old tool call results no longer needed
   - Tool results from exploratory work
   - Superseded information

2. **Minimize File Loading**
   - Only load files actively needed
   - Use Grep instead of Read for searching
   - Load specific sections, not entire files
   - Unload files when done with them

3. **Optimize Conversation History**
   - Context editing auto-clears stale content
   - Summarize long conversations if needed
   - Start a fresh session for new major tasks

4. **Reduce CLAUDE.md Size**
   - Keep only essential long-term instructions
   - Move project-specific details to separate files
   - Use CLAUDE.local.md for temporary preferences
   - Target: <5,000 tokens for CLAUDE.md

5. **Apply Progressive Loading**
   - Load overview/index files first
   - Load detailed references only when needed
   - Use skill references/ on-demand loading

**Validation Checklist**:

- [ ] Stale tool results cleared
- [ ] Only necessary files loaded
- [ ] CLAUDE.md size optimized (<5,000 tokens if possible)
- [ ] Progressive loading applied where relevant
- [ ] Context usage reduced measurably

**Outputs**:

- Reduced token count
- Cleaner context window
- List of removed/optimized elements
- New context usage measurement

**Time Estimate**: 15-30 minutes

**Example Reduction**:

```
Before Optimization: 145,000 tokens (72%)

Actions Taken:
1. Cleared 50 old tool results: -15,000 tokens
2. Unloaded 3 large files no longer needed: -18,000 tokens
3. Optimized CLAUDE.md (split to CLAUDE.local.md): -12,000 tokens
4. Used references/ loading instead of full files: -25,000 tokens

After Optimization: 75,000 tokens (37%)
Reduction: 70,000 tokens (48% reduction)

Quality Impact: None - relevant context maintained
```

---

### Operation 3: Optimize Skill Design

**Purpose**: Design context-efficient skills using progressive disclosure, lazy loading, and token-aware architecture

**When to Use This Operation**:

- Planning new skills (design for efficiency from the start)
- Refactoring existing skills (improve context efficiency)
- Building large/complex skills (manage context proactively)
- Creating skill ecosystems (coordinate context usage)

**Process**:

1. **Apply Progressive Disclosure**
   - **SKILL.md**: Overview + essentials only (<1,200 lines, ~3,000-5,000 tokens)
   - **references/**: Detailed guides loaded on-demand (300-600 lines each, ~1,000-2,000 tokens)
   - **scripts/**: Automation loaded when needed

   **Token Impact**: 70-80% reduction vs monolithic (5k vs 20k+ tokens)

2. **Design for Lazy Loading**
   - Separate content into focused reference files
   - Each reference file covers one topic
   - Load the specific reference, not the entire skill
   - Example: Load `references/structure-review-guide.md`, not all of review-multi

3. **Optimize File Sizes**
   - SKILL.md target: 800-1,200 lines (2,500-4,000 tokens)
   - Reference files: 300-600 lines (1,000-2,000 tokens each)
   - Keep files focused and concise

4. **Use Token-Efficient Formats**
   - Tables instead of prose (higher information density)
   - Lists instead of paragraphs (more scannable)
   - Code blocks for examples (clear and concise)
   - Quick Reference sections (high-density lookup)

5. **Consider Context Budget**
   - Estimate token usage for the skill
   - Simple skill: 3,000-8,000 tokens total
   - Medium skill: 10,000-20,000 tokens total
   - Complex skill: 25,000-40,000 tokens total
   - Design within budget

**Validation Checklist**:

- [ ] Progressive disclosure applied (SKILL.md + references/)
- [ ] SKILL.md <1,200 lines (~4,000 tokens or less)
- [ ] Reference files 300-600 lines each
- [ ] Files focused on single topics (lazy loadable)
- [ ] Token-efficient formats used (tables, lists, code blocks)
- [ ] Estimated total tokens within budget
- [ ] Skill can be partially loaded (references/ on-demand)

**Outputs**:

- Context-efficient skill design
- Token budget estimate
- Progressive disclosure plan
- Reference file organization

**Time Estimate**: 20-40 minutes (during planning phase)

**Example**:

```
Skill Design: api-integration

Token Budget Analysis:
- SKILL.md: 900 lines → ~3,000 tokens
- references/ (3 files):
  - api-guide.md: 400 lines → ~1,300 tokens
  - auth-patterns.md: 350 lines → ~1,200 tokens
  - examples.md: 300 lines → ~1,000 tokens
- README.md: 300 lines → ~1,000 tokens

Total if all loaded: ~7,500 tokens
Typical usage: SKILL.md only → 3,000 tokens (60% savings)
With 1 reference: 3,000 + 1,200 → 4,200 tokens (44% savings)

Progressive Disclosure Impact:
- Monolithic (all in SKILL.md): ~7,500 tokens always loaded
- Progressive (SKILL.md + on-demand refs): 3,000-4,500 tokens typical
- Token Savings: 40-60% depending on usage

Design: ✅ Context-efficient with progressive disclosure
```

---

### Operation 4: Separate Planning from Execution

**Purpose**: Keep execution context clean by separating exploratory planning from focused implementation

**When to Use This Operation**:

- Starting complex development work
- Context getting cluttered with exploration
- Need clean context for implementation
- Before critical/focused work

**Process**:

1. **Planning Phase** (Separate Session)
   - Broad codebase exploration
   - Research and pattern discovery
   - Architecture decisions
   - Task breakdown
   - Output: Plan documents, task lists

   **Characteristics**: High context usage, exploratory, broad

2. **Execution Phase** (Fresh Session)
   - Load plan documents (not exploration history)
   - Focused implementation
   - Specific file operations
   - Minimal context bloat

   **Characteristics**: Clean context, focused, efficient

3. **Session Transition**
   - End the planning session when the plan is complete
   - Save planning artifacts (plans, task lists, decisions)
   - Start a new session for execution
   - Load only: plan docs, CLAUDE.md, immediate dependencies

4. **Maintain Clean Execution Context**
   - Don't re-explore during execution
   - Follow the plan, don't re-research
   - Load files as needed, unload when done
   - Keep focus on implementation

**Validation Checklist**:

- [ ] Planning and execution in separate sessions (when appropriate)
- [ ] Planning artifacts saved and documented
- [ ] Execution session starts with clean context
- [ ] Only plan docs and essentials loaded for execution
- [ ] No re-exploration during execution
- [ ] Context stays focused on current task

**Outputs**:

- Clean execution context
- Focused implementation
- Reduced context bloat
- Better performance and quality

**Time Estimate**: Planning decision (0-5 min), session management (as needed)

**Example**:

```
Planning Session (Context: 150k tokens, 75% usage):
- Explored 20 files for research
- Analyzed patterns across codebase
- Made architecture decisions
- Created detailed plan
- Output: skill-plan.md (comprehensive)

[End session, save plan]

Execution Session (Context: 30k tokens, 15% usage):
- Load: CLAUDE.md + skill-plan.md + development-workflow
- Context: Clean, focused, 30k tokens
- Implementation: Follow plan, build skill
- Load references as needed (not all at once)

Result: 80% context reduction (150k → 30k)
Quality: Higher (clean context, focused work)
```

**Research Finding**: *"Separating planning from execution keeps implementation context clean"* - confirmed by 2025 best practices

---

### Operation 5: File-Based Optimization

**Purpose**: Externalize large data to temporary files for on-demand analysis, achieving 95% token savings

**When to Use This Operation**:

- Handling large data sets (>5,000 tokens)
- Processing extensive outputs (logs, reports)
- Working with large MCP responses
- Managing generated content

**Process**:

1. **Identify Large Data**
   - Data >5,000 tokens (roughly >20,000 characters, assuming ~4 characters per token)
   - Repeated reference to the same large content
   - Extensive generated outputs
   - Large MCP server responses

2. **Externalize to Files**

   ```bash
   # Save large data to temp file
   echo "large data here" > /tmp/large-data.txt
   ```

   This keeps the data out of the conversation context.

3. **Reference File Instead of Content**

   ```markdown
   Large dataset saved to: /tmp/analysis-data.json (50,000 tokens)
   To analyze: Read /tmp/analysis-data.json when needed
   ```

   **Token Impact**: 50,000 tokens → ~500 tokens for the file reference itself (99% reduction before any on-demand reads; typically around 95% once targeted reads are included)

4. **Load On-Demand**
   - Read the file only when a specific analysis is needed
   - Process in chunks if necessary
   - Don't keep full content in context

5. **Clean Up Temp Files**
   - Remove temp files when no longer needed
   - Don't accumulate unused files

**Validation Checklist**:

- [ ] Large data identified (>5,000 tokens)
- [ ] Data externalized to files
- [ ] File paths documented (reference in conversation)
- [ ] On-demand loading used (not full content in context)
- [ ] Token savings measured (before/after)
- [ ] Quality maintained (can access data when needed)
- [ ] Temp files cleaned up when done

**Outputs**:

- Externalized data files
- File path references
- Significant token savings (often 90-95%)
- Maintained data accessibility

**Time Estimate**: 10-20 minutes (setup and management)

**Example**:

```
Scenario: Analyzing large log file (100,000 tokens)

Before Optimization:
- Full log in conversation context: 100,000 tokens
- Context usage: 50% just for log data

After Optimization:
1. Save log to /tmp/app-log.txt
2. Reference in context: "Log saved to /tmp/app-log.txt (100k tokens)"
3. Read specific sections when needed:
   - Read first 50 lines for overview
   - Grep for errors
   - Read relevant sections on-demand

Token Usage:
- Before: 100,000 tokens in context
- After: ~500 tokens (file reference) + ~2,000 tokens (specific reads)
- Savings: 97,500 tokens (97.5% reduction)

Quality: Maintained - can still analyze log on-demand
Access: Full log available when needed
```

**Research Finding**: *"File-based approach achieves 95% token savings"* - proven in 2025 optimization studies

---

## Best Practices

### 1. Quality Over Quantity

**Practice**: Focus on relevant, high-quality context rather than loading everything

**Rationale**: Every piece of context should be current, accurate, and directly relevant to the task

**Application**: Before loading a file, ask: "Do I need this right now for the current task?"

### 2. Progressive Disclosure Always

**Practice**: Design all skills with the SKILL.md + references/ pattern

**Rationale**: 70-80% token reduction vs monolithic design

**Application**: SKILL.md <1,200 lines, details in references/ loaded on-demand

### 3. Monitor Regularly

**Practice**: Check context usage periodically, especially in long sessions

**Rationale**: Auto-compaction triggers at ~80%, but proactive monitoring prevents drift

**Application**: Check `/context` every 30-60 minutes in active development

### 4. File-Based for Large Data

**Practice**: Externalize data >5,000 tokens to files

**Rationale**: 95% token savings while maintaining accessibility

**Application**: Save to /tmp/, reference file path, load on-demand

### 5. Separate Planning from Execution

**Practice**: Plan in one session, execute in a clean session with plan artifacts

**Rationale**: Keeps execution context focused, prevents exploratory noise

**Application**: When planning takes >1 hour, start a fresh session for implementation

### 6. Clean Context Before Critical Work

**Practice**: Reduce context load before important/complex operations

**Rationale**: Clean context improves quality and performance

**Application**: Before complex implementation, clear unnecessary context

### 7. Use Appropriate Tools

**Practice**: Choose tools that minimize context usage

**Rationale**: Some tools add more context than others

**Application**:

- Grep instead of Read for searching (doesn't load full file)
- Glob for finding files (doesn't load content)
- Task agents for exploration (separate context)

### 8. Optimize CLAUDE.md

**Practice**: Keep CLAUDE.md concise (<5,000 tokens), split if needed

**Rationale**: CLAUDE.md is loaded every session, so a large file wastes context

**Application**:

- Essential standards in CLAUDE.md (project level)
- Temporary preferences in CLAUDE.local.md
- Specific domain knowledge in separate files loaded as needed

---

## Context Budgets for Skills

### Simple Skills (Total: 5,000-10,000 tokens)

**Structure**:

- SKILL.md only or SKILL.md + 1-2 small references
- No scripts or minimal automation
- ~400-800 lines total

**Token Breakdown**:

- SKILL.md: 3,000-4,000 tokens
- References (if any): 1,000-2,000 tokens each
- README: 1,000 tokens

**Example**: format-validator, simple helpers

---

### Medium Skills (Total: 15,000-30,000 tokens)

**Structure**:

- SKILL.md + 3-5 references + scripts
- ~1,500-3,000 lines total

**Token Breakdown**:

- SKILL.md: 3,000-5,000 tokens
- References: 4-6 files × 1,500 tokens = 6,000-9,000 tokens
- Scripts: 2-3 files × 1,000 tokens = 2,000-3,000 tokens
- README: 1,000-1,500 tokens

**Progressive Loading**: Load SKILL.md (3-5k) + specific reference when needed (+1.5k) = 4.5-6.5k typical

**Example**: prompt-builder, skill-researcher

---

### Complex Skills (Total: 40,000-60,000 tokens)

**Structure**:

- SKILL.md + 7-10 references + 4+ scripts
- ~4,000-7,000 lines total

**Token Breakdown**:

- SKILL.md: 4,000-6,000 tokens
- References: 7-10 files × 2,000 tokens = 14,000-20,000 tokens
- Scripts: 4-6 files × 1,500 tokens = 6,000-9,000 tokens
- README: 1,500-2,000 tokens

**Progressive Loading**: Load SKILL.md (4-6k) + 1-2 references as needed (+2-4k) = 6-10k typical

**Example**: review-multi, testing-validator

**Key**: Even complex skills typically load only 6-10k tokens (not the full 40-60k)

---

## Common Mistakes

### Mistake 1: Loading Everything Upfront

**Symptom**: Context quickly fills with all skill files

**Cause**: Reading all files instead of progressive loading

**Fix**: Load SKILL.md first, load references/ only when needed for specific operations

**Prevention**: Follow the progressive disclosure pattern

### Mistake 2: Not Monitoring Context

**Symptom**: Unexpected context overflow, performance degradation

**Cause**: No visibility into token usage

**Fix**: Check `/context` regularly, monitor usage patterns

**Prevention**: Check context every 30-60 min in active sessions

### Mistake 3: Large Monolithic CLAUDE.md

**Symptom**: Every session starts with 20k-50k tokens used

**Cause**: Putting everything in CLAUDE.md

**Fix**: Split CLAUDE.md - essentials only, separate files for detailed knowledge

**Prevention**: Keep CLAUDE.md <5,000 tokens, use multiple files

### Mistake 4: Keeping Stale Tool Results

**Symptom**: Context bloated with old exploratory results

**Cause**: Not clearing old tool calls

**Fix**: Context editing auto-clears stale results; you can also manage this manually by starting fresh sessions

**Prevention**: Fresh session for major transitions (planning → execution)

### Mistake 5: Not Using File-Based Strategies

**Symptom**: Large data sets consuming 30-50% of context

**Cause**: Keeping large outputs in conversation

**Fix**: Save to /tmp/ files, reference file path, load on-demand

**Prevention**: Any data >5,000 tokens → externalize to file

### Mistake 6: Ignoring Progressive Disclosure

**Symptom**: Skills with 2,000+ line SKILL.md files

**Cause**: Not using references/ for detailed content

**Fix**: Extract detailed content to references/, keep SKILL.md as overview

**Prevention**: Design with progressive disclosure from the start (use planning-architect)

---

## Quick Reference

### Context Window Limits (2025)

| Model | Standard | Beta (Tier 4) | Auto-Compact |
|-------|----------|---------------|--------------|
| Sonnet 4/4.5 | 200k tokens | 500k-1M tokens | ~80% (~160k for 200k) |

### Token Savings Strategies

| Strategy | Token Savings | Application |
|----------|---------------|-------------|
| Progressive Disclosure | 70-80% | SKILL.md + references/ vs monolithic |
| File-Based Externalization | 95% | Large data >5k tokens to /tmp/ files |
| Context Editing | 29-39% | Auto-clears stale content |
| Lazy Loading | 60-70% | Load references on-demand vs all upfront |
| Optimized CLAUDE.md | Variable | Keep <5k tokens vs 20-50k bloat |

### Skill Token Budgets

| Skill Complexity | Total Tokens | Typical Load | Progressive Load |
|------------------|--------------|--------------|------------------|
| Simple | 5k-10k | 3k-4k | SKILL.md only |
| Medium | 15k-30k | 4k-6k | SKILL.md + 1 reference |
| Complex | 40k-60k | 6k-10k | SKILL.md + 2 references |

### Context Usage Guidelines

| Usage % | Status | Action |
|---------|--------|--------|
| <50% | ✅ Healthy | Normal operation |
| 50-70% | ⚠️ Monitor | Check periodically, plan optimization |
| 70-80% | ⚠️ Optimize | Reduce context load soon |
| >80% | ❌ Critical | Immediate optimization needed (auto-compact triggers) |

### Optimization Decision Tree

```
Is context >70%?
├─ Yes → Reduce immediately (Operation 2)
│   ├─ Clear stale tool results
│   ├─ Unload unnecessary files
│   └─ Start fresh session if needed
│
└─ No → Preventive optimization
    ├─ Is data >5k tokens? → File-based (Operation 5)
    ├─ Building skills? → Progressive disclosure (Operation 3)
    └─ Long session? → Consider planning/execution split (Operation 4)
```

### Quick Optimization Actions

**Immediate** (When context >80%):

```
1. Check usage: /context command
2. Clear old tool results (context editing helps)
3. Start fresh session with essentials only
4. Load plan docs, not exploration history
```

**Preventive** (During development):

```
1. Design skills with progressive disclosure
2. Use file-based for large data (>5k tokens)
3. Monitor context every 30-60 min
4. Separate planning from execution (for complex work)
```

### Common Commands

```
# Claude Code slash command and tool shorthand (not shell syntax)

# Monitor context usage
/context

# Read specific lines (not full file)
Read file_path --limit 50

# Search without loading (uses Grep, more efficient)
Grep "pattern" path/

# Find files without loading content
Glob "*.py" path/

# Externalize large data
Bash: command > /tmp/output.txt
# Then reference: "See /tmp/output.txt for results"
```

### Token Estimation

**Quick Estimation**:

- 1 line of code/text ≈ 3-4 tokens (average)
- 1,000 lines ≈ 3,000-4,000 tokens
- Dense prose: 3-3.5 tokens/line
- Code with comments: 2.5-3 tokens/line
- Tables/lists: 2-2.5 tokens/line (more efficient)

### For More Information

- **Context monitoring**: references/context-monitoring-guide.md
- **Reduction strategies**: references/reduction-strategies.md
- **Optimization patterns**: references/optimization-patterns.md
- **Analysis script**: scripts/analyze-context-usage.py

---

**context-engineering** helps you maximize Claude Code effectiveness through strategic context management, ensuring optimal performance, quality, and cost-efficiency throughout development.
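
The Token Estimation heuristics and the Context Usage Guidelines table in the Quick Reference can be combined into a small helper for quick budgeting. The sketch below is illustrative only: the function names `estimate_tokens` and `usage_status` are hypothetical (they are not part of this skill's scripts), the tokens-per-line figures are the rough averages quoted in this document rather than real tokenizer output, and treating exactly 70% as "optimize" is an assumption, since the table leaves that boundary ambiguous.

```python
# Rough tokens-per-line ranges copied from the Token Estimation list.
# These are heuristics, not measurements from an actual tokenizer.
TOKENS_PER_LINE = {
    "prose": (3.0, 3.5),
    "code": (2.5, 3.0),
    "table": (2.0, 2.5),
}

# Standard context window size quoted in this document.
CONTEXT_WINDOW = 200_000


def estimate_tokens(line_count, kind="prose"):
    """Return a (low, high) token estimate for a file of `line_count` lines."""
    low, high = TOKENS_PER_LINE[kind]
    return round(line_count * low), round(line_count * high)


def usage_status(tokens_used, window=CONTEXT_WINDOW):
    """Map token usage onto the statuses in the Context Usage Guidelines table."""
    pct = 100 * tokens_used / window
    if pct < 50:
        return "healthy"
    if pct < 70:  # assumption: exactly 70% is treated as "optimize"
        return "monitor"
    if pct <= 80:
        return "optimize"
    return "critical"


if __name__ == "__main__":
    # Baseline from the Operation 1 example: 145,000 tokens in a 200k window.
    print(usage_status(145_000))
    # Estimated token range for a 900-line, prose-heavy SKILL.md.
    print(estimate_tokens(900, "prose"))
```

For real measurements, the `/context` command remains the source of truth; a helper like this is only useful for planning a skill's budget before any files exist.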