---
name: prompt-assemble
description: "Token-safe prompt assembly with memory orchestration. Use for any agent that needs to construct LLM prompts with memory retrieval. Guarantees no API failure due to token overflow. Implements two-phase context construction, memory safety valve, and hard limits on memory injection."
---

# Prompt Assemble

## Overview

A standardized, token-safe prompt assembly framework designed so that memory injection never pushes a prompt past the model's context window. It implements **Two-Phase Context Construction** and a **Memory Safety Valve** to prevent token overflow while maximizing relevant context.

**Design Goals:**

- ✅ Never fail due to memory-related token overflow
- ✅ Memory is always a discardable enhancement, never a rigid dependency
- ✅ Token budget decisions are centralized at the prompt assembly layer

## When to Use

Use this skill when:

1. Building or modifying any agent that constructs prompts
2. Implementing memory retrieval systems
3. Adding new prompt-related logic to existing agents
4. Working in any scenario where token budget safety is required

## Core Workflow

```
User Input
    ↓
Need-Memory Decision
    ↓
Minimal Context Build
    ↓
Memory Retrieval (Optional)
    ↓
Memory Summarization
    ↓
Token Estimation
    ↓
Safety Valve Decision
    ↓
Final Prompt → LLM Call
```

## Phase Details

### Phase 0: Base Configuration

```python
# Model Context Windows (2026-02-04)
# - MiniMax-M2.1: 204,000 tokens (default)
# - Claude 3.5 Sonnet: 200,000 tokens
# - GPT-4o: 128,000 tokens
MAX_TOKENS = 204000  # Set to your model's context limit
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)  # Conservative: 75% threshold = 153,000 tokens
MEMORY_TOP_K = 3  # Retrieve at most 3 memories
MEMORY_SUMMARY_MAX = 3  # At most 3 lines per memory summary
```

**Design Philosophy**:

- Leave a 25% buffer for safety (model overhead, estimation errors, spikes)
- Better to underutilize capacity than to overflow

### Phase 1: Minimal Context

- System prompt
- Recent N messages (N=3, trimmed)
- Current user input
- **No memory by default**

### Phase 2: Memory Need Decision

```python
def need_memory(user_input):
    """Return True if the input contains a phrase that refers back to earlier context."""
    triggers = [
        "previously",
        "earlier we discussed",
        "do you remember",
        "as I mentioned before",
        "continuing from",
        "before we",
        "last time",
        "previously mentioned",  # already matched by "previously"
    ]
    text = user_input.lower()
    # Case-insensitive substring match against every trigger phrase
    return any(trigger.lower() in text for trigger in triggers)
```

### Phase 3: Memory Retrieval (Optional)

```python
# memory_search and summarize are supplied by the host agent
summarized_memories = []
memories = memory_search(query=user_input, top_k=MEMORY_TOP_K)
for mem in memories:
    summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))
```

### Phase 4: Token Estimation

Calculate the estimated token count for `base_context + summarized_memories`.

### Phase 5: Safety Valve (Critical)

```python
# Fragment from inside the prompt-building function:
# if the estimate exceeds the threshold, drop all memory and leave a notice.
if estimated_tokens > SAFETY_MARGIN:
    base_context.append("[System Notice] Relevant memory skipped due to token budget.")
    return assemble(base_context)
```

**Hard Rules:**

- ❌ Never downgrade the system prompt
- ❌ Never truncate user input
- ❌ No "lucky splicing" (concatenating content and hoping it fits)
- ✅ Only the memory layer is expendable

### Phase 6: Final Assembly

```python
# Reached only when the estimate is within SAFETY_MARGIN
final_prompt = assemble(base_context + summarized_memories)
return final_prompt
```

## Memory Data Standards

### Allowed in Long-Term Memory

- ✅ User preferences / identity / long-term goals
- ✅ Confirmed important conclusions
- ✅ System-level settings and rules

### Forbidden in Long-Term Memory

- ❌ Raw conversation logs
- ❌ Reasoning traces
- ❌ Temporary discussions
- ❌ Information recoverable from chat history

## Quick Start

Copy `scripts/prompt_assemble.py` to your agent and use:

```python
from prompt_assemble import build_prompt

# In your agent's prompt construction:
final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn)
```

## Resources

### scripts/

- `prompt_assemble.py` - Complete implementation with all phases (PromptAssembler class)

### references/

- `memory_standards.md` - Detailed memory content guidelines
- `token_estimation.md` - Token counting strategies
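
## Example: Token Estimation and Safety Valve

Phase 4 describes token estimation only in prose, so the following is a minimal sketch of how Phases 4 and 5 might fit together. It assumes a rough four-characters-per-token heuristic rather than a real tokenizer, and it returns a new list instead of mutating `base_context`. The names `estimate_tokens`, `apply_safety_valve`, `chars_per_token`, and `NOTICE` are hypothetical and are not taken from `scripts/prompt_assemble.py`.

```python
import math

MAX_TOKENS = 204000
SAFETY_MARGIN = int(0.75 * MAX_TOKENS)
NOTICE = "[System Notice] Relevant memory skipped due to token budget."


def estimate_tokens(parts, chars_per_token=4):
    """Rough estimate: assumes ~4 characters per token (a heuristic, not a tokenizer)."""
    total_chars = sum(len(part) for part in parts)
    return math.ceil(total_chars / chars_per_token)


def apply_safety_valve(base_context, summarized_memories, limit=SAFETY_MARGIN):
    """Return the parts to assemble.

    If base context plus memories fits under the limit, keep both.
    Otherwise drop every memory and append a notice; the base context
    (system prompt, recent messages, user input) is never touched.
    """
    estimated = estimate_tokens(base_context + summarized_memories)
    if estimated > limit:
        return base_context + [NOTICE]
    return base_context + summarized_memories


base = ["You are a helpful assistant.", "User: what did we decide last time?"]
memories = ["User prefers concise answers."]

# Well under the default threshold, so memories are kept.
fits = apply_safety_valve(base, memories)

# An artificially small limit forces the valve to drop the memory layer.
skipped = apply_safety_valve(base, memories, limit=5)
```

A character-based estimate tends to be least reliable for code and non-English text, which is one reason the 25% buffer in Phase 0 matters. Swapping in a model-specific tokenizer would only require replacing `estimate_tokens`.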