--- name: context-compression description: Use when compressing agent context, implementing conversation summarization, reducing token usage in long sessions, or asking about "context compression", "conversation history", "token optimization", "context limits", "summarization strategies" version: 1.0.0 --- # Context Compression Strategies When agent sessions generate millions of tokens, compression becomes mandatory. Optimize for tokens-per-task (total tokens to complete a task), not tokens-per-request. ## Compression Approaches ### 1. Anchored Iterative Summarization (Recommended) - Maintain structured summaries with explicit sections - On compression, summarize only newly-truncated content - Merge with existing summary instead of regenerating - Structure forces preservation of critical info ### 2. Opaque Compression - Highest compression ratios (99%+) - Sacrifices interpretability - Cannot verify what was preserved ### 3. Regenerative Full Summary - Generate detailed summary on each compression - Readable but may lose details across cycles - Full regeneration rather than merging ## Structured Summary Format ```markdown ## Session Intent [What the user is trying to accomplish] ## Files Modified - auth.controller.ts: Fixed JWT token generation - config/redis.ts: Updated connection pooling ## Decisions Made - Using Redis connection pool instead of per-request - Retry logic with exponential backoff ## Current State - 14 tests passing, 2 failing - Remaining: mock setup for session service tests ## Next Steps 1. Fix remaining test failures 2. Run full test suite 3. Update documentation ``` ## Compression Triggers | Strategy | Trigger | Trade-off | |----------|---------|-----------| | Fixed threshold | 70-80% context | Simple but may compress early | | Sliding window | Last N turns + summary | Predictable size | | Importance-based | Low-relevance first | Complex but preserves signal | | Task-boundary | At task completions | Clean but unpredictable | ## The Artifact Trail Problem File tracking is the weakest dimension (2.2-2.5/5.0 in evaluations). Coding agents need: - Which files were created - Which files were modified and what changed - Which files were read but not changed - Function names, variable names, error messages **Solution**: Separate artifact index or explicit file-state tracking. ## Probe-Based Evaluation Test compression quality with probes: | Probe Type | Tests | Example | |------------|-------|---------| | Recall | Factual retention | "What was the original error?" | | Artifact | File tracking | "Which files have we modified?" | | Continuation | Task planning | "What should we do next?" | | Decision | Reasoning chain | "What did we decide about Redis?" | ## Compression Ratios | Method | Compression | Quality | Trade-off | |--------|-------------|---------|-----------| | Anchored Iterative | 98.6% | 3.70 | Best quality | | Regenerative | 98.7% | 3.44 | Moderate | | Opaque | 99.3% | 3.35 | Best compression | The 0.7% extra tokens buys 0.35 quality points—worth it when re-fetching costs matter. ## Three-Phase Workflow (Large Codebases) 1. **Research Phase**: Explore and compress into structured analysis 2. **Planning Phase**: Convert to implementation spec (~2,000 words for 5M tokens) 3. **Implementation Phase**: Execute against the spec ## Best Practices 1. Optimize for tokens-per-task, not tokens-per-request 2. Use structured summaries with explicit file sections 3. Trigger compression at 70-80% utilization 4. Implement incremental merging over regeneration 5. Test with probe-based evaluation 6. Track artifact trail separately if critical 7. Monitor re-fetching frequency as quality signal