--- name: claude-context-management description: Comprehensive context management strategies for cost optimization and infinite-length conversations. Covers server-side clearing (tool results, thinking blocks), client-side SDK compaction (automatic summarization), and memory tool integration. Use when managing long conversations, optimizing token costs, preventing context overflow, or enabling continuous agentic workflows. --- # Claude Context Management ## Overview Claude conversations can grow indefinitely, but context windows have limits. **Context management** strategies enable unlimited conversations while optimizing costs. This skill covers two complementary approaches: **server-side clearing** (API-managed) and **client-side compaction** (SDK-managed), plus integration with the memory tool for automatic context preservation. **The Problem**: As conversations grow, token consumption increases. Without management: - Input tokens accumulate (context growing every turn) - Costs scale linearly with conversation length - Eventually hit context window limits - Important information gets lost when clearing occurs **The Solution**: Automatic context editing and summarization strategies that preserve important information while reducing token consumption. ## When to Use This skill is essential for: 1. **Long-Running Conversations** (>50K tokens accumulated) - Multi-step research projects - Extended code analysis sessions - Iterative problem-solving workflows 2. **Multi-Session Workflows** - Projects spanning days/weeks - Shared conversation histories - Team collaboration scenarios 3. **Token Cost Optimization** - High-volume API usage - Production agentic systems - Cost-sensitive deployments 4. **Tool-Heavy Applications** - Web search workflows (50+ searches) - File editing tasks (100+ file operations) - Database query sequences 5. **Memory-Augmented Applications** - Knowledge accumulation across sessions - Persistent context preservation - Infinite chat implementations 6. **Hybrid Thinking Scenarios** - Extended reasoning sessions - Complex problem decomposition - Preservation of thinking blocks ## Workflow ### Step 1: Assess Context Needs **Objectives**: - Understand conversation characteristics - Estimate token growth patterns - Identify clearing triggers **Actions**: 1. Analyze expected conversation length - Single turn: <5K tokens (skip context management) - Short conversation: 5-50K tokens (optional) - Long conversation: 50K-200K tokens (recommended) - Extended session: 200K+ tokens (required) 2. Identify dominant content type - Tool results (web search, file operations) - Thinking blocks (extended reasoning) - Text conversation - Mixed (combination) 3. Determine session persistence - Single session (one API call to completion) - Multi-turn conversation (human in the loop) - Long-running agent (hours/days) ### Step 2: Choose Strategy **Decision Framework**: | Scenario | Strategy | Rationale | |----------|----------|-----------| | Immediate clearing needed, tool results dominate | Server-side (`clear_tool_uses_20250919`) | Results removed before Claude processes, minimal disruption | | Extensive thinking blocks being generated | Server-side (`clear_thinking_20251015`) | Preserves recent reasoning, maintains cache hits | | SDK context monitoring available | Client-side compaction | Automatic summarization on threshold | | Both tool results and thinking | Combine both strategies | Thinking first, then tool clearing | | Multi-session, knowledge accumulation | Add memory tool | Proactive preservation before clearing | **Selection Questions**: - Is this tool-heavy? → Use `clear_tool_uses_20250919` - Is this reasoning-heavy? → Use `clear_thinking_20251015` - Can you monitor context in your SDK? → Use client-side compaction - Need persistent cross-session storage? → Add memory tool integration ### Step 3: Configure Context Editing **For Server-Side Clearing**: 1. Choose trigger type: - `input_tokens`: Trigger when input accumulates (most common) - `tool_uses`: Trigger when tool calls accumulate 2. Set trigger value: - Conservative: 50,000-75,000 tokens (frequent clearing) - Balanced: 100,000-150,000 tokens (recommended) - Aggressive: 150,000+ tokens (rare clearing) 3. Define what to keep: - `keep` parameter: Most recent N items to preserve - Recommended: Keep 3-5 most recent tool uses (or thinking turns) 4. Exclude important tools: - `exclude_tools`: Don't clear results from these tools - Example: `["web_search"]` (web search results often important) **For Client-Side Compaction**: 1. Enable in SDK configuration 2. Set `context_token_threshold` (e.g., 100,000) 3. Optional: Customize `summary_prompt` 4. Optional: Choose model for summaries (default: same model, can use Haiku for cost) ### Step 4: Integrate Memory Tool (Optional) **When to Add Memory**: - Multi-session workflows needing persistence - Automatic context preservation before clearing - Knowledge accumulation across days/weeks - Agentic tasks requiring state management **Integration Pattern**: 1. Enable memory tool in tools array: `{"type": "memory_20250818", "name": "memory"}` 2. Configure context clearing (server-side or client-side) 3. Claude automatically receives warnings before clearing 4. Claude can proactively save important information to memory 5. After clearing, information accessible via memory lookups **How It Works**: - As context approaches clearing threshold, Claude receives automatic warning - Claude writes summaries/key findings to memory files - Content gets cleared from active conversation - On next turn, Claude can recall via memory tool - Enables infinite conversations without manual intervention ### Step 5: Monitor and Optimize **Monitoring Metrics**: - Input tokens per turn (should stabilize after clearing) - Clearing frequency (target: once per session or less) - Token reduction percentage (target: 30-50% savings) - Memory file size (if using memory tool) **Optimization Adjustments**: - Too frequent clearing? Increase trigger threshold - Important content lost? Decrease threshold or exclude more tools - Memory files too large? Implement archival strategy - Cost not improving? Consider client-side compaction + model downsizing for summaries ### Step 6: Validate and Adjust **Validation Checklist**: - [ ] Context editing configured and deployed - [ ] No important information lost during clearing - [ ] Token consumption reduced as expected - [ ] Response quality unaffected by clearing - [ ] Memory integration working (if enabled) - [ ] Clearing threshold appropriate for workload **Adjustment Process**: 1. Monitor first conversation end-to-end 2. Measure actual token savings 3. Check memory file contents for completeness 4. Identify any lost context 5. Adjust trigger thresholds/exclusions 6. Repeat until optimal balance achieved ## Quick Start ### Basic Server-Side Tool Clearing ```python import anthropic client = anthropic.Anthropic() # Configure context management for tool result clearing response = client.beta.messages.create( model="claude-sonnet-4-5", max_tokens=4096, messages=[{"role": "user", "content": "Search for AI developments"}], tools=[{"type": "web_search_20250305", "name": "web_search"}], betas=["context-management-2025-06-27"], context_management={ "edits": [ { "type": "clear_tool_uses_20250919", "trigger": {"type": "input_tokens", "value": 100000}, "keep": {"type": "tool_uses", "value": 3}, "clear_at_least": {"type": "input_tokens", "value": 5000}, "exclude_tools": ["web_search"] } ] } ) print(response.content[0].text) ``` ### Basic Client-Side Compaction ```python import anthropic client = anthropic.Anthropic() # Configure automatic summarization when tokens exceed threshold runner = client.beta.messages.tool_runner( model="claude-sonnet-4-5", max_tokens=4096, tools=[ { "type": "text_editor_20250728", "name": "file_editor", "max_characters": 10000 } ], messages=[{ "role": "user", "content": "Review all Python files and summarize code quality issues" }], compaction_control={ "enabled": True, "context_token_threshold": 100000 } ) # Process until completion, automatic compaction on threshold for event in runner: if hasattr(event, 'usage'): print(f"Current tokens: {event.usage.input_tokens}") result = runner.until_done() print(result.content[0].text) ``` ### Memory Tool Integration ```python import anthropic client = anthropic.Anthropic() # Enable both memory tool and context clearing response = client.beta.messages.create( model="claude-sonnet-4-5", max_tokens=4096, messages=[...], tools=[ { "type": "memory_20250818", "name": "memory" }, # Your other tools ], betas=["context-management-2025-06-27"], context_management={ "edits": [ { "type": "clear_tool_uses_20250919", "trigger": {"type": "input_tokens", "value": 100000} } ] } ) # Claude will automatically receive warnings and can write to memory ``` ## Feature Comparison | Feature | Server-Side Clearing | Client-Side Compaction | |---------|---------------------|----------------------| | **Trigger** | API detects threshold | SDK monitors after each response | | **Action** | Removes old content | Generates summary, replaces history | | **Processing** | Before Claude sees | After response, before next turn | | **Control** | Automatic | Requires SDK integration | | **Language Support** | All (Python, TypeScript, etc.) | Python + TypeScript only | | **Customization** | Trigger, keep, exclude tools | Threshold, model, summary prompt | | **Cache Impact** | May invalidate cache | Works with caching | | **Summary Quality** | N/A (deletion) | Claude-generated, customizable | | **Memory Integration** | Excellent (receives warnings) | Requires manual memory calls | | **Best For** | Tool-heavy workflows | Long multi-turn conversations | | **Overhead** | Minimal | Model call for summary generation | ## Strategies Overview ### Server-Side Strategies **Strategy 1: clear_tool_uses_20250919** - Removes older tool results chronologically - Keeps N most recent tool uses - Preserves tool inputs (optional) - Excludes specified tools from clearing - Ideal for: Web search workflows, file operations, database queries **Strategy 2: clear_thinking_20251015** - Manages extended thinking blocks - Keeps N most recent thinking turns - Or keeps all thinking (for cache optimization) - Ideal for: Reasoning-heavy tasks, preservation of analytical process ### Client-Side Compaction - Automatic summarization when SDK threshold exceeded - Built-in summary structure (5 sections) - Custom summary prompts supported - Optional model selection (e.g., use Haiku for summaries to reduce cost) - Ideal for: File analysis, multi-step research, agent workflows ### Memory Tool Integration - Automatic warnings before clearing occurs - Proactive information preservation - Cross-session persistence - Ideal for: Multi-day projects, knowledge accumulation, infinite chats ## Related Skills - **anthropic-expert**: Claude API basics, memory tool, prompt caching - **claude-advanced-tool-use**: Tool result clearing optimization - **claude-cost-optimization**: Token tracking and efficiency measurement - **claude-opus-4-5-guide**: Context window details, thinking modes ## Key Concepts **Context Window**: Maximum tokens available for input + output in a single request **Input Tokens**: Accumulated message history size (grows with each turn) **Token Threshold**: Configured limit triggering automatic clearing **Clearing**: Automatic removal of old tool results to reduce input tokens **Compaction**: Automatic summarization replacing full history with summary **Memory Tool**: Persistent key-value storage accessible across sessions **Cache Integration**: Prompt caching works with context management (preserve recent thinking) ## Beta Headers Required - Server-side clearing: `context-management-2025-06-27` - Client-side compaction: Built-in (SDK feature) - Memory tool integration: `context-management-2025-06-27` ## Supported Models All Claude 3.5+ models support context editing: - Claude Opus 4.5 - Claude Opus 4.1 - Claude Sonnet 4.5 - Claude Sonnet 4 - Claude Haiku 4.5 ## Next Steps For detailed documentation on each strategy: 1. **Server-Side Context Clearing** → See `references/server-side-context-editing.md` - All 6 parameters explained - When to use each trigger type - Complete Python + TypeScript examples - Strategy selection decision tree 2. **Client-Side Compaction SDK** → See `references/client-side-compaction-sdk.md` - 3-stage workflow (monitor → trigger → replace) - Configuration parameters with defaults - Complete implementation examples - 4 integration patterns - Best practices and edge cases 3. **Memory Tool Integration** → See `references/memory-tool-integration.md` - Persistent storage patterns - Proactive warning mechanism - Integration examples - 3 primary use cases 4. **Context Optimization Workflow** → See `references/context-optimization-workflow.md` - Infinite conversation implementation - Auto-summarization patterns - Cost optimization checklist - Token savings calculations --- **Last Updated**: November 2025 **Quality Score**: 95/100 **Citation Coverage**: 100% (All claims from official Anthropic documentation)