---
name: prompt-assemble
description: "Token-safe prompt assembly with memory orchestration. Use for any agent that needs to construct LLM prompts with memory retrieval. Guarantees no API failure due to token overflow. Implements two-phase context construction, memory safety valve, and hard limits on memory injection."
---

# Prompt Assemble

## Overview

A standardized, token-safe prompt assembly framework that guarantees API stability. Implements **Two-Phase Context Construction** and **Memory Safety Valve** to prevent token overflow while maximizing relevant context.

**Design Goals:**
- ✅ Never fail due to memory-related token overflow
- ✅ Memory is always discardable enhancement, never rigid dependency
- ✅ Token budget decisions centralized at prompt assemble layer

## When to Use

Use this skill when:
1. Building or modifying any agent that constructs prompts
2. Implementing memory retrieval systems
3. Adding new prompt-related logic to existing agents
4. Any scenario where token budget safety is required

## Core Workflow

```
User Input
    ↓
Need-Memory Decision
    ↓
Minimal Context Build
    ↓
Memory Retrieval (Optional)
    ↓
Memory Summarization
    ↓
Token Estimation
    ↓
Safety Valve Decision
    ↓
Final Prompt → LLM Call
```

## Phase Details

### Phase 0: Base Configuration
```python
# Model Context Windows (2026-02-04)
# - MiniMax-M2.1: 204,000 tokens (default)
# - Claude 3.5 Sonnet: 200,000 tokens
# - GPT-4o: 128,000 tokens

MAX_TOKENS = 204000  # Set to your model's context limit
SAFETY_MARGIN = 0.75 * MAX_TOKENS  # Conservative: 75% threshold = 153,000 tokens
MEMORY_TOP_K = 3                     # Max 3 memories
MEMORY_SUMMARY_MAX = 3 lines        # Max 3 lines per memory
```

**Design Philosophy**:
- Leave 25% buffer for safety (model overhead, estimation errors, spikes)
- Better to underutilize capacity than to overflow

### Phase 1: Minimal Context
- System prompt
- Recent N messages (N=3, trimmed)
- Current user input
- **No memory by default**

### Phase 2: Memory Need Decision
```python
def need_memory(user_input):
    triggers = [
        "previously",
        "earlier we discussed",
        "do you remember",
        "as I mentioned before",
        "continuing from",
        "before we",
        "last time",
        "previously mentioned"
    ]
    for trigger in triggers:
        if trigger.lower() in user_input.lower():
            return True
    return False
```

### Phase 3: Memory Retrieval (Optional)
```python
memories = memory_search(query=user_input, top_k=MEMORY_TOP_K)
for mem in memories:
    summarized_memories.append(summarize(mem, max_lines=MEMORY_SUMMARY_MAX))
```

### Phase 4: Token Estimation
Calculate estimated tokens for base_context + summarized_memories.

### Phase 5: Safety Valve (Critical)
```python
if estimated_tokens > SAFETY_MARGIN:
    base_context.append("[System Notice] Relevant memory skipped due to token budget.")
    return assemble(base_context)
```

**Hard Rules:**
- ❌ Never downgrade system prompt
- ❌ Never truncate user input
- ❌ No "lucky splicing"
- ✅ Only memory layer is expendable

### Phase 6: Final Assembly
```python
final_prompt = assemble(base_context + summarized_memories)
return final_prompt
```

## Memory Data Standards

### Allowed in Long-Term Memory
- ✅ User preferences / identity / long-term goals
- ✅ Confirmed important conclusions
- ✅ System-level settings and rules

### Forbidden in Long-Term Memory
- ❌ Raw conversation logs
- ❌ Reasoning traces
- ❌ Temporary discussions
- ❌ Information recoverable from chat history

## Quick Start

Copy `scripts/prompt_assemble.py` to your agent and use:

```python
from prompt_assemble import build_prompt

# In your agent's prompt construction:
final_prompt = build_prompt(user_input, memory_search_fn, get_recent_dialog_fn)
```

## Resources

### scripts/
- `prompt_assemble.py` - Complete implementation with all phases (PromptAssembler class)

### references/
- `memory_standards.md` - Detailed memory content guidelines
- `token_estimation.md` - Token counting strategies