---
name: context-engineering-advisor
description: Diagnose context stuffing vs. context engineering. Use when an AI workflow feels bloated, brittle, or hard to steer reliably.
intent: >-
  Guide product managers through diagnosing whether they're doing **context stuffing**
  (jamming volume without intent) or **context engineering** (shaping structure for
  attention). Use this to identify context boundaries, fix "Context Hoarding Disorder,"
  and implement tactical practices like bounded domains, episodic retrieval, and the
  Research→Plan→Reset→Implement cycle.
type: interactive
theme: ai-agents
best_for:
  - "Diagnosing context stuffing vs. context engineering in your AI workflows"
  - "Building better memory and retrieval architecture for AI agents"
  - "Improving AI output quality through structured context design"
scenarios:
  - "My AI outputs are mediocre even though I'm giving it lots of information — diagnose what's wrong"
  - "I want to architect context properly for a multi-step AI workflow in my product team"
estimated_time: "15-20 min"
---

## Purpose

Guide product managers through diagnosing whether they're doing **context stuffing** (jamming volume without intent) or **context engineering** (shaping structure for attention). Use this to identify context boundaries, fix "Context Hoarding Disorder," and implement tactical practices like bounded domains, episodic retrieval, and the Research→Plan→Reset→Implement cycle.

**Key Distinction:** Context stuffing assumes volume = quality ("paste the entire PRD"). Context engineering treats AI attention as a scarce resource and allocates it deliberately.

This is not about prompt writing—it's about **designing the information architecture** that grounds AI in reality without overwhelming it with noise.
## Key Concepts

### The Paradigm Shift: Parametric → Contextual Intelligence

**The Fundamental Problem:**
- LLMs have **parametric knowledge** (encoded during training) = static, outdated, non-attributable
- When asked about proprietary data, real-time info, or user preferences → forced to hallucinate or admit ignorance
- **Context engineering** bridges the gap between static training and dynamic reality

**PM's Role Shift:** From feature builder → **architect of informational ecosystems** that ground AI in reality

---

### Context Stuffing vs. Context Engineering

| Dimension | Context Stuffing | Context Engineering |
|-----------|------------------|---------------------|
| **Mindset** | Volume = quality | Structure = quality |
| **Approach** | "Add everything just in case" | "What decision am I making?" |
| **Persistence** | Persist all context | Retrieve with intent |
| **Agent Chains** | Share everything between agents | Bounded context per agent |
| **Failure Response** | Retry until it works | Fix the structure |
| **Economic Model** | Context as storage | Context as attention (scarce resource) |

**Critical Metaphor:** Context stuffing is like bringing your entire file cabinet to a meeting. Context engineering is bringing only the 3 documents relevant to today's decision.

---

### The Anti-Pattern: Context Stuffing

**Five Markers of Context Stuffing:**
1. **Reflexively expanding context windows** — "Just add more tokens!"
2. **Persisting everything "just in case"** — No clear retention criteria
3. **Chaining agents without boundaries** — Agent A passes everything to Agent B to Agent C
4. **Adding evaluations to mask inconsistency** — "We'll just retry until it's right"
5.
**Normalized retries** — "It works if you run it 3 times" becomes acceptable

**Why It Fails:**
- **Reasoning Noise:** Thousands of irrelevant files compete for attention, degrading multi-hop logic
- **Context Rot:** Dead ends, past errors, irrelevant data accumulate → goal drift
- **Lost in the Middle:** Models prioritize beginning (primacy) and end (recency), ignore middle
- **Economic Waste:** Every query becomes expensive without accuracy gains
- **Quantitative Degradation:** Accuracy drops below 20% when context exceeds ~32k tokens

**The Hidden Costs:**
- Escalating token consumption
- Diluted attention across irrelevant material
- Reduced output confidence
- Cascading retries that waste time and money

---

### Real Context Engineering: Core Principles

**Five Foundational Principles:**
1. **Context without shape becomes noise**
2. **Structure > Volume**
3. **Retrieve with intent, not completeness**
4. **Small working contexts** (like short-term memory)
5. **Context Compaction:** Maximize density of relevant information per token

**Quantitative Framework:**

```
Efficiency = (Accuracy × Coherence) / (Tokens × Latency)
```

**Key Finding:** Using RAG with 25% of available tokens preserves 95% accuracy while significantly reducing latency and cost.

---

### The 5 Diagnostic Questions (Detect Context Hoarding Disorder)

Ask these to identify context stuffing:
1. **What specific decision does this support?** — If you can't answer, you don't need it
2. **Can retrieval replace persistence?** — Just-in-time beats always-available
3. **Who owns the context boundary?** — If no one, it'll grow forever
4. **What fails if we exclude this?** — If nothing breaks, delete it
5.
**Are we fixing structure or avoiding it?** — Stuffing context often masks bad information architecture

---

### Memory Architecture: Two-Layer System

**Short-Term (Conversational) Memory:**
- Immediate interaction history for follow-up questions
- Challenge: Space management → older parts summarized or truncated
- Lifespan: Single session

**Long-Term (Persistent) Memory:**
- User preferences, key facts across sessions → deep personalization
- Implemented via vector database (semantic retrieval)
- Two types:
  - **Declarative Memory:** Facts ("I'm vegan")
  - **Procedural Memory:** Behavioral patterns ("I debug by checking logs first")
- Lifespan: Persistent across sessions

**LLM-Powered ETL:** Models generate their own memories by identifying signals, consolidating with existing data, and updating the database automatically.

---

### The Research → Plan → Reset → Implement Cycle

**The Context Rot Solution:**
1. **Research:** Agent gathers data → large, chaotic context window (noise + dead ends)
2. **Plan:** Agent synthesizes into a high-density SPEC.md or PLAN.md (Source of Truth)
3. **Reset:** **Clear the entire context window** (prevents context rot)
4. **Implement:** Fresh session using **only** the high-density plan as context

**Why This Works:** Context rot is eliminated; the agent starts clean with compressed, high-signal context.

---

### Anti-Patterns (What This Is NOT)

- **Not about choosing AI tools** — Claude vs.
ChatGPT doesn't matter; architecture matters
- **Not about writing better prompts** — This is systems design, not copywriting
- **Not about adding more tokens** — "Infinite context" narratives are marketing, not engineering reality
- **Not about replacing human judgment** — Context engineering amplifies judgment, doesn't eliminate it

---

### When to Use This Skill

✅ **Use this when:**
- You're pasting entire PRDs/codebases into AI and getting vague responses
- AI outputs are inconsistent ("works sometimes, not others")
- You're burning tokens without seeing accuracy improvements
- You suspect you're "context stuffing" but don't know how to fix it
- You need to design context architecture for an AI product feature

❌ **Don't use this when:**
- You're just getting started with AI (start with basic prompts first)
- You're looking for tool recommendations (this is about architecture, not tooling)
- Your AI usage is working well (if it ain't broke, don't fix it)

---

### Facilitation Source of Truth

Use [`workshop-facilitation`](../workshop-facilitation/SKILL.md) as the default interaction protocol for this skill. It defines:
- session heads-up + entry mode (Guided, Context dump, Best guess)
- one-question turns with plain-language prompts
- progress labels (for example, Context Qx/8 and Scoring Qx/5)
- interruption handling and pause/resume behavior
- numbered recommendations at decision points
- quick-select numbered response options for regular questions (include `Other (specify)` when useful)

This file defines the domain-specific assessment content. If there is a conflict, follow this file's domain logic.

## Application

This interactive skill uses **adaptive questioning** to diagnose context stuffing, identify boundaries, and provide tactical implementation guidance.

---

### Step 0: Gather Context

**Agent asks:**

Before we diagnose your context practices, let's gather information:

**Current AI Usage:**
- What AI tools/systems do you use?
(ChatGPT, Claude, custom agents, etc.)
- What PM tasks do you use AI for? (PRD writing, user research synthesis, discovery, etc.)
- How do you provide context? (paste docs, reference files, use projects/memory)

**Symptoms:**
- Are AI outputs inconsistent? (works sometimes, not others)
- Are you retrying prompts multiple times to get good results?
- Are responses vague or hedged despite providing "all the context"?
- Are token costs escalating without accuracy improvements?

**System Architecture (if applicable):**
- Do you have custom AI agents or workflows?
- How is context shared between agents?
- Do you use RAG, vector databases, or memory systems?

**You can describe briefly or paste examples.**

---

### Step 1: Diagnose Context Stuffing Symptoms

**Agent asks:**

Let's assess whether you're experiencing **context stuffing**. Which of these symptoms do you recognize?

**Select all that apply:**
1. **"I paste entire documents into AI"** — Full PRDs, complete user interview transcripts, entire codebases
2. **"AI gives vague, hedged responses despite having 'all the context'"** — Responses like "it depends," "consider these options," non-committal
3. **"I have to retry prompts 3+ times to get usable output"** — Inconsistency is normalized
4. **"Token costs are escalating but accuracy isn't improving"** — Spending more, getting same or worse results
5. **"I keep adding more context hoping it'll help"** — Reflexive expansion without strategy
6. **"My agents pass everything to each other"** — Agent A → Agent B → Agent C with full context chain
7. **"I don't have clear criteria for what to include/exclude"** — No context boundary definitions
8.
**"None of these—my AI usage is working well"** — Skip to advanced optimization

**User response:** [Select symptoms]

**Agent analyzes:**

Based on your selections:
- **0-1 symptoms:** Healthy context practices; proceed to optimization
- **2-3 symptoms:** Early context stuffing; address before it scales
- **4+ symptoms:** Active Context Hoarding Disorder; immediate intervention needed

**Agent proceeds to diagnostic questions.**

---

### Step 2: Diagnostic Question 1 — What Specific Decision Does This Support?

**Agent asks:**

Let's start with the most critical question: **For each piece of context you're including, what specific decision does it support?**

**Example Context Analysis:**

Imagine you're asking AI to "help with discovery planning." You're providing:
- Entire PRD (20 pages)
- 50 user interview transcripts (full)
- Competitive analysis doc (15 pages)
- Team meeting notes from last 3 months

**Question:** What decision are you making right now?

**Offer 3 scenarios:**
1. **"I'm deciding which user segment to interview first"**
   - **Context needed:** User segments from PRD (2 paragraphs), prior interview themes (1-page synthesis), not full transcripts
   - **Context NOT needed:** Meeting notes, full competitive analysis, full PRD
2. **"I'm deciding which discovery questions to ask in interviews"**
   - **Context needed:** Research objectives (from PRD), past interview insights (synthesis), Jobs-to-be-Done framework
   - **Context NOT needed:** Full competitive analysis, full meeting notes
3. **"I'm not sure what decision I'm making—I just want AI to 'understand my product'"**
   - **Problem:** No specific decision = context stuffing trap
   - **Fix:** Define the decision first, then select context

**Agent recommends:**

**Best Practice:** Before adding context, complete this sentence:
> "I need this context because I'm deciding [specific decision], and without [specific information], I can't make that decision."

If you can't complete that sentence, you don't need the context.
**User response:** [Describe their decision + context]

**Agent validates:** Does the context directly support the stated decision? If not, recommend trimming.

---

### Step 3: Diagnostic Question 2 — Can Retrieval Replace Persistence?

**Agent asks:**

**Second question: Is this information you always need, or something you can retrieve just-in-time?**

**The Distinction:**

**Always-Needed (Persist):**
- Core product constraints (technical, regulatory, strategic)
- User preferences that apply to every interaction
- Critical definitions (operational glossary)
- Non-negotiable rules

**Episodic (Retrieve on-demand):**
- Project-specific details (this epic, this sprint)
- Historical data (past PRDs, old interview transcripts)
- Contextual facts (competitive analysis, market research)
- Temporary decisions

**Key Insight:** Just-in-time retrieval beats always-available. Don't persist what you can retrieve.

**Offer 3 options:**
1. **"Most of my context is always-needed (core constraints, user prefs)"**
   - **Assessment:** Good instinct; verify with Question 4 (what fails if excluded?)
   - **Recommendation:** Build a constraints registry and operational glossary (persist these)
2. **"Most of my context is episodic (project details, historical data)"**
   - **Assessment:** Perfect candidate for RAG or retrieval
   - **Recommendation:** Implement semantic search; retrieve only relevant chunks for each query
3. **"I'm not sure which is which—I persist everything to be safe"**
   - **Assessment:** Classic Context Hoarding Disorder symptom
   - **Fix:** Apply the Question 4 test to each piece of context

**Agent recommends:**

**Rule of Thumb:**
- **Persist:** Information referenced in 80%+ of interactions
- **Retrieve:** Information referenced in <20% of interactions
- **Gray zone (20-80%):** Depends on retrieval latency vs. context window cost

**User response:** [Categorize their context]

**Agent provides:** Specific recommendations on what to persist vs. retrieve.
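The persist-vs-retrieve rule of thumb above can be sketched as a simple triage function. This is a minimal illustration of the 80%/20% heuristic, not any framework's API; the element names and reference rates are hypothetical:

```python
def placement(reference_rate: float) -> str:
    """Suggest where a context element belongs, given the fraction
    of interactions (0.0-1.0) that actually reference it."""
    if reference_rate >= 0.8:
        return "persist"    # core constraints, glossary, user prefs
    if reference_rate <= 0.2:
        return "retrieve"   # episodic: fetch just-in-time via semantic search
    return "gray-zone"      # weigh retrieval latency vs. context-window cost

# Triage a few hypothetical context elements
elements = {
    "product constraints": 0.95,
    "operational glossary": 0.85,
    "last quarter's PRD": 0.05,
    "competitive analysis": 0.35,
}
for name, rate in elements.items():
    print(f"{name}: {placement(rate)}")
```

The gray zone is deliberately left as a judgment call: a cheap, fast retrieval layer shifts it toward "retrieve," while a latency-sensitive workflow shifts it toward "persist."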
---

### Step 4: Diagnostic Question 3 — Who Owns the Context Boundary?

**Agent asks:**

**Third question: Who is responsible for defining what belongs in vs. out of your AI's context?**

**The Ownership Problem:**

If **no one** owns the context boundary, it will grow indefinitely. Every PM will add "just one more thing," and six months later, you're stuffing 100k tokens per query.

**Offer 3 options:**
1. **"I own the boundary (solo PM or small team)"**
   - **Assessment:** Good—you can make fast decisions
   - **Recommendation:** Document your boundary criteria (use Questions 1-5 as framework)
2. **"My team shares ownership (collaborative boundary definition)"**
   - **Assessment:** Can work if formalized
   - **Recommendation:** Create a "Context Manifest" doc: what's always included, what's retrieved, what's excluded (and why)
3. **"No one owns it—it's ad-hoc / implicit"**
   - **Assessment:** Critical risk; the boundary will expand uncontrollably
   - **Fix:** Assign explicit ownership; schedule quarterly context audits

**Agent recommends:**

**Best Practice: Create a Context Manifest**

```markdown
# Context Manifest: [Product/Feature Name]

## Always Persisted (Core Context)
- Product constraints (technical, regulatory)
- User preferences (role, permissions, preferences)
- Operational glossary (20 key terms)

## Retrieved On-Demand (Episodic Context)
- Historical PRDs (retrieve via semantic search)
- User interview transcripts (retrieve relevant quotes)
- Competitive analysis (retrieve when explicitly needed)

## Excluded (Out of Scope)
- Meeting notes older than 30 days (no longer relevant)
- Full codebase (use code search instead)
- Marketing materials (not decision-relevant)

## Boundary Owner: [Name]
## Last Reviewed: [Date]
## Next Review: [Date + 90 days]
```

**User response:** [Describe current ownership model]

**Agent provides:** Recommendation on formalizing ownership + template for Context Manifest.

---

### Step 5: Diagnostic Question 4 — What Fails if We Exclude This?
**Agent asks:**

**Fourth question: For each piece of context, what specific failure mode occurs if you exclude it?**

This is the **falsification test**. If you can't identify a concrete failure, you don't need the context.

**Offer 3 scenarios:**
1. **"If I exclude product constraints, AI will recommend infeasible solutions"**
   - **Failure Mode:** Clear and concrete
   - **Assessment:** Valid reason to persist constraints
2. **"If I exclude historical PRDs, AI won't understand our product evolution"**
   - **Failure Mode:** Vague and hypothetical
   - **Assessment:** Historical context is rarely needed for current decisions
   - **Fix:** Retrieve PRDs only when explicitly referencing past decisions
3. **"If I exclude this, I'm not sure anything would break—I just include it to be thorough"**
   - **Failure Mode:** None identified
   - **Assessment:** Context stuffing; delete immediately

**Agent recommends:**

**The Falsification Protocol:** For each context element, complete this statement:
> "If I exclude [context element], then [specific failure] will occur in [specific scenario]."

**Examples:**
- ✅ Good: "If I exclude GDPR constraints, AI will recommend features that violate EU privacy law."
- ❌ Bad: "If I exclude this PRD, AI might not fully understand the product." (Vague)

**User response:** [Apply the falsification test to their context]

**Agent provides:** List of context elements to delete (no concrete failure identified).

---

### Step 6: Diagnostic Question 5 — Are We Fixing Structure or Avoiding It?

**Agent asks:**

**Fifth question: Is adding more context solving a problem, or masking a deeper structural issue?**

**The Root Cause Question:**

Context stuffing often hides bad information architecture. Instead of fixing messy, ambiguous documents, teams add more documents hoping AI will "figure it out."

**Offer 3 options:**
1.
**"I'm adding context because our docs are poorly structured/ambiguous"**
   - **Assessment:** You're masking a structural problem
   - **Fix:** Clean up the docs first (remove ambiguity, add constraints, define terms)
   - **Example:** Instead of pasting 5 conflicting PRDs, reconcile them into 1 Source of Truth
2. **"I'm adding context because we don't have a shared operational glossary"**
   - **Assessment:** You're compensating for missing foundations
   - **Fix:** Build the glossary (20-30 key terms); AI can reference it reliably
   - **Example:** Define "active user," "churn," "engagement" unambiguously
3. **"I'm adding context because our constraints aren't documented"**
   - **Assessment:** You're avoiding constraint engineering
   - **Fix:** Create a constraints registry (technical, regulatory, strategic)
   - **Example:** Document "We won't build mobile apps" vs. explaining it in every prompt

**Agent recommends:**

**The Structural Health Test:** If you're adding context to compensate for:
- **Ambiguous documentation** → Fix the docs, don't add more
- **Undefined terms** → Build an operational glossary
- **Undocumented constraints** → Create a constraints registry
- **Conflicting information** → Reconcile into a Source of Truth

**User response:** [Identify structural issues]

**Agent provides:** Prioritized list of structural fixes before adding more context.

---

### Step 7: Define Memory Architecture

**Agent asks:**

Based on your context needs, let's design a **two-layer memory architecture**.
**Your Context Profile (from previous steps):**
- Always-needed context: [Summary from Q2]
- Episodic context: [Summary from Q2]
- Boundary owner: [From Q3]
- Validated essentials: [From Q4]
- Structural fixes needed: [From Q5]

**Recommended Architecture:**

**Short-Term (Conversational) Memory:**
- **What it stores:** Immediate interaction history for follow-up questions
- **Lifespan:** Single session
- **Management:** Summarize or truncate older parts to avoid crowding
- **Your specific needs:** [Agent customizes based on user's workflow]

**Long-Term (Persistent) Memory:**
- **What it stores:** User preferences, core constraints, operational glossary
- **Lifespan:** Persistent across sessions
- **Implementation:** Vector database (semantic retrieval)
- **Two types:**
  - **Declarative Memory:** Facts (e.g., "We follow HIPAA regulations")
  - **Procedural Memory:** Behavioral patterns (e.g., "Always validate feasibility before usability")
- **Your specific needs:** [Agent customizes]

**Retrieval Strategy (Episodic Context):**
- **What it retrieves:** Historical PRDs, user interviews, competitive analysis
- **Method:** Semantic search triggered by query intent
- **Optimization:** Contextual Retrieval (Anthropic) — prepend explanatory context to each chunk before embedding
- **Your specific needs:** [Agent customizes]

**Agent offers:**

Would you like me to:
1. **Generate a Context Architecture Blueprint** for your specific use case?
2. **Provide implementation guidance** (tools, techniques, best practices)?
3. **Design a retrieval strategy** for your episodic context?

**User response:** [Selection]

---

### Step 8: Implement Research → Plan → Reset → Implement Cycle

**Agent asks:**

Now let's implement the **Research → Plan → Reset → Implement** cycle to prevent context rot.

**The Problem:** As agents research, context windows grow chaotic—filled with dead ends, errors, and noise. This dilutes attention and causes goal drift.
**The Solution:** Compress research into a high-density plan, then **clear the context window** before implementing.

**The Four-Phase Cycle:**

**Phase 1: Research (Chaotic Context Allowed)**
- Agent gathers data from multiple sources
- Context window grows large and messy (this is expected)
- Dead ends, failed hypotheses, and noise accumulate
- **Goal:** Comprehensive information gathering

**Phase 2: Plan (Synthesis)**
- Agent synthesizes research into a high-density SPEC.md or PLAN.md
- This becomes the **Source of Truth** for implementation
- **Key elements:**
  - Decision made
  - Evidence supporting the decision
  - Constraints applied
  - Next steps (sequenced)
- **Format:** Structured, concise, unambiguous

**Phase 3: Reset (Clear Context Window)**
- **Critical step:** Clear the entire context window
- Delete all research artifacts, dead ends, errors
- This prevents context rot from poisoning implementation

**Phase 4: Implement (Fresh Session with Plan Only)**
- Start a new session with **only the high-density plan** as context
- Agent has clean, focused attention on execution
- No noise from the research phase

**Agent offers 3 options:**
1. **"I want a template for the PLAN.md format"**
   - Agent provides a structured template for high-density plans
2. **"I want to see an example of this cycle in action"**
   - Agent walks through a concrete PM use case (e.g., discovery planning)
3. **"I'm ready to implement this in my workflow"**
   - Agent provides a step-by-step implementation guide

**User response:** [Selection]

**Agent provides:** Tailored guidance based on selection.

---

### Step 9: Action Plan & Next Steps

**Agent synthesizes:**

Based on your context engineering assessment, here's your action plan:

**Immediate Fixes (This Week):**
1. [Delete context with no falsifiable failure mode from Q4]
2. [Apply Research→Plan→Reset→Implement to your next AI task]
3. [Document context boundary in Context Manifest]

**Foundation Building (Next 2 Weeks):**
1.
[Build constraints registry with 20+ entries]
2. [Create operational glossary with 20-30 key terms]
3. [Implement two-layer memory architecture]

**Long-Term Optimization (Next Month):**
1. [Set up semantic retrieval for episodic context]
2. [Assign context boundary owner + quarterly audit schedule]
3. [Implement Contextual Retrieval (Anthropic) for RAG]

**Success Metrics:**
- Token usage down 50%+ (less context stuffing)
- Output consistency up (less retry/regeneration)
- Response quality up (sharper, less hedged answers)
- Context window stable (no unbounded growth)

**Agent offers:**

Would you like me to:
1. **Generate specific implementation docs** (Context Manifest, PLAN.md template, etc.)?
2. **Provide advanced techniques** (Contextual Retrieval, LLM-powered ETL)?
3. **Review your current context setup** (provide feedback on specific prompts/workflows)?

---

## Examples

### Example 1: Solo PM Context Stuffing → Engineering

**Context:**
- Solo PM at early-stage startup
- Using Claude Projects for PRD writing
- Pasting entire PRDs (20 pages) + all user interviews (50 transcripts) every time
- Getting vague, inconsistent responses

**Assessment:**
- Symptoms: Hedged responses, normalized retries (4+ symptoms)
- Q1 (Decision): "I just want AI to understand my product" (no specific decision)
- Q2 (Persist/Retrieve): Persisting everything (no retrieval strategy)
- Q3 (Ownership): No formal owner (solo PM, ad-hoc)
- Q4 (Failure): Can't identify concrete failures for most context
- Q5 (Structure): Avoiding constraint documentation

**Diagnosis:** Active Context Hoarding Disorder

**Intervention:**
1. **Immediate:** Delete all context that fails the Q4 test → keeps 20% of the original
2. **Week 1:** Build constraints registry (10 technical constraints, 5 strategic)
3. **Week 2:** Create operational glossary (25 terms)
4.
**Week 3:** Implement Research→Plan→Reset→Implement for the next PRD

**Outcome:** Token usage down 70%, output quality up significantly, responses crisp and actionable.

---

### Example 2: Growth-Stage Team with Agent Chains

**Context:**
- Product team with 5 PMs
- Custom AI agents for discovery synthesis
- Agent A (research) → Agent B (synthesis) → Agent C (recommendations)
- Each agent passes full context to the next → context window explodes to 100k+ tokens

**Assessment:**
- Symptoms: Escalating token costs, inconsistent outputs (3 symptoms)
- Q1 (Decision): Each agent has a clear decision, but passes unnecessary context
- Q2 (Persist/Retrieve): Mixing persistent and episodic without a strategy
- Q3 (Ownership): No explicit owner; each PM adds context
- Q4 (Failure): Agents pass "just in case" context with no falsifiable failure
- Q5 (Structure): Missing Context Manifest

**Diagnosis:** Agent orchestration without boundaries

**Intervention:**
1. **Immediate:** Define bounded context per agent (Agent A outputs only a 2-page synthesis to Agent B, not the full research)
2. **Week 1:** Assign context boundary owner (Lead PM)
3. **Week 2:** Create Context Manifest (what persists, what's retrieved, what's excluded)
4. **Week 3:** Implement Research→Plan→Reset→Implement between Agent B and Agent C

**Outcome:** Token usage down 60%, agent chain reliability up, costs reduced by 50%.
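A bounded context per agent can be enforced mechanically rather than by convention. A minimal sketch, assuming a fixed token budget per handoff and a crude word-based token estimate (a real system would use the model's own tokenizer); `MAX_HANDOFF_TOKENS` and its value are illustrative:

```python
MAX_HANDOFF_TOKENS = 1500  # illustrative budget: roughly a 2-page synthesis

def rough_token_count(text: str) -> int:
    # crude approximation: ~0.75 words per token; real systems use a tokenizer
    return int(len(text.split()) / 0.75)

def handoff(synthesis: str) -> str:
    """Enforce the context boundary between agents: refuse to forward
    anything larger than the agreed synthesis budget."""
    tokens = rough_token_count(synthesis)
    if tokens > MAX_HANDOFF_TOKENS:
        raise ValueError(
            f"Handoff is ~{tokens} tokens; boundary allows {MAX_HANDOFF_TOKENS}. "
            "Compress to a synthesis instead of forwarding raw research."
        )
    return synthesis

# Agent A forwards a short synthesis, not its full research context
plan = handoff("Top three themes from 12 interviews: onboarding friction, ...")
```

Failing loudly at the boundary turns "Agent A passed everything again" from a silent cost leak into an immediate, debuggable error.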
---

### Example 3: Enterprise with RAG but No Context Engineering

**Context:**
- Large enterprise with a vector database RAG system
- "Stuff the whole knowledge base" approach (10,000+ documents)
- Retrieval returns 50+ chunks per query → floods the context window
- Accuracy declining as the knowledge base grows

**Assessment:**
- Symptoms: Vague responses despite "complete knowledge," normalized retries (2 symptoms)
- Q1 (Decision): Decisions are clear, but retrieval has no intent (returns everything)
- Q2 (Persist/Retrieve): Good instinct to retrieve, but no filtering
- Q3 (Ownership): Engineering owns RAG; Product doesn't own context boundaries
- Q4 (Failure): Can't identify why 50 chunks are needed vs. 5
- Q5 (Structure): Knowledge base has no structure (flat documents, no metadata)

**Diagnosis:** Retrieval without intent (RAG as context stuffing)

**Intervention:**
1. **Immediate:** Limit retrieval to the top 5 chunks per query (down from 50)
2. **Week 1:** Implement Contextual Retrieval (Anthropic) — prepend explanatory context to each chunk during indexing
3. **Week 2:** Add metadata to documents (category, recency, authority)
4. **Week 3:** Product team defines retrieval intent per query type (discovery = customer insights, feasibility = technical constraints)

**Outcome:** Retrieval failure rate down 35% (per Anthropic's Contextual Retrieval benchmark), latency down 60%, token usage down 80%.

---

## Common Pitfalls

### 1. **"Infinite Context" Marketing vs. Engineering Reality**

**Failure Mode:** Believing "1 million token context windows" means you should use all of them.

**Consequence:** Reasoning Noise degrades performance; accuracy drops below 20% past ~32k tokens.

**Fix:** Context windows are not free. Treat tokens as scarce; optimize for density, not volume.

---

### 2. **Retrying Instead of Restructuring**

**Failure Mode:** "It works if I run it 3 times" → normalizing retries instead of fixing structure.

**Consequence:** Wastes time and money; masks deeper context rot issues.
**Fix:** If retries are common, your context structure is broken. Apply Q5 (fix structure, don't add volume).

---

### 3. **No Context Boundary Owner**

**Failure Mode:** Ad-hoc, implicit context decisions → unbounded growth.

**Consequence:** Six months later, every query stuffs 100k tokens per interaction.

**Fix:** Assign explicit ownership; create a Context Manifest; schedule quarterly audits.

---

### 4. **Mixing Always-Needed with Episodic**

**Failure Mode:** Persisting historical data that should be retrieved on-demand.

**Consequence:** Context window crowded with irrelevant information; attention diluted.

**Fix:** Apply the Q2 test: persist only what's needed in 80%+ of interactions; retrieve the rest.

---

### 5. **Skipping the Reset Phase**

**Failure Mode:** Never clearing the context window during the Research→Plan→Reset→Implement cycle.

**Consequence:** Context rot accumulates; goal drift; dead ends poison implementation.

**Fix:** Make the Reset phase mandatory after Plan; start implementation with only the high-density plan as context.
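The Reset phase is easiest to enforce when it is structural rather than a matter of discipline: the implement call simply never receives the research context. A hedged sketch of the full cycle, where `agent` is a stand-in for whatever LLM call your stack provides (the prompts and `Agent` type are illustrative, not a real API):

```python
from typing import Callable

Agent = Callable[[str], str]  # stand-in for an LLM call: prompt in, text out

def research_plan_reset_implement(topic: str, agent: Agent) -> str:
    # Phase 1 — Research: the working context may grow messy (dead ends included)
    findings = agent(f"Research {topic}. Note sources, dead ends, open questions.")

    # Phase 2 — Plan: compress findings into a high-density Source of Truth
    plan_md = agent(
        "Write PLAN.md (decision, evidence, constraints, sequenced next steps) "
        f"from these findings:\n{findings}"
    )

    # Phase 3 — Reset: drop the research context entirely; only the plan survives
    del findings

    # Phase 4 — Implement: a fresh call whose only context is the plan
    return agent(f"Execute exactly this plan:\n{plan_md}")
```

Because `findings` is a local variable that never reaches the final call, context rot from the research phase cannot leak into implementation even if someone forgets to "clear the window" by hand.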
---

## References

### Related Skills

- **[ai-shaped-readiness-advisor](../ai-shaped-readiness-advisor/SKILL.md)** (Interactive) — Context Design is Competency #1 of AI-shaped work
- **[problem-statement](../problem-statement/SKILL.md)** (Component) — Evidence-based framing requires context engineering
- **[epic-hypothesis](../epic-hypothesis/SKILL.md)** (Component) — Testable hypotheses depend on clear constraints (part of context)
- **[pol-probe-advisor](../pol-probe-advisor/SKILL.md)** (Interactive) — Validation experiments benefit from context engineering (define what AI needs to know)

### External Frameworks

- **Dean Peters** — [*Context Stuffing Is Not Context Engineering*](https://deanpeters.substack.com/p/context-stuffing-is-not-context-engineering) (Dean Peters' Substack, 2026)
- **Teresa Torres** — *Continuous Discovery Habits* (Context Engineering as one of 5 new AI PM disciplines)
- **Marty Cagan** — *Empowered* (Feasibility risk in the AI era includes understanding the "physics of AI")
- **Anthropic** — [Contextual Retrieval whitepaper](https://www.anthropic.com/news/contextual-retrieval) (35% failure rate reduction)
- **Google** — Context engineering whitepaper on LLM-powered memory systems

### Technical References

- **RAG (Retrieval-Augmented Generation)** — Standard technique for episodic context retrieval
- **Vector Databases** — Semantic search for long-term memory (Pinecone, Weaviate, Chroma)
- **Contextual Retrieval (Anthropic)** — Prepend explanatory context to chunks before embedding
- **LLM-as-Judge** — Automated evaluation of context quality