--- name: letta-development-guide description: Comprehensive guide for developing Letta agents, including architecture selection, memory design, model selection, and tool configuration. Use when building or troubleshooting Letta agents. license: MIT --- # Letta Development Guide Comprehensive guide for designing and building effective Letta agents with appropriate architectures, memory configurations, model selection, and tool setups. ## When to Use This Skill Use this skill when: - Starting a new Letta agent project - Choosing between agent architectures (letta_v1_agent vs memgpt_v2_agent) - Designing memory block structure and architecture - Selecting appropriate models for your use case - Planning tool configurations - Optimizing memory management and performance - Implementing shared memory between agents - Debugging memory-related issues ## Quick Start Guide ### Minimal Working Example ```python from letta_client import Letta client = Letta() agent = client.agents.create( name="my-assistant", model="openai/gpt-4o", embedding="openai/text-embedding-3-small", memory_blocks=[ {"label": "persona", "value": "You are a helpful assistant."}, {"label": "human", "value": "The user's name and preferences."}, ], ) # Send a message response = client.agents.messages.create( agent_id=agent.id, messages=[{"role": "user", "content": "Hello!"}], ) print(response.messages[-1].content) ``` ### 1. Architecture Selection **Use letta_v1_agent when:** - Building new agents (recommended default) - Need compatibility with reasoning models (GPT-4o, Claude Sonnet 4) - Want simpler system prompts and direct message generation **Use memgpt_v2_agent when:** - Maintaining legacy agents - Require specific tool patterns not yet supported in v1 For detailed comparison, see `references/architectures.md`. ### 2. Memory Architecture Design Memory is the foundation of effective agents. Letta provides three memory types: **Core Memory (in-context):** - Always accessible in agent's context window - Use for: current state, active context, frequently referenced information - Limit: Keep total core memory under 80% of context window **Archival Memory (out-of-context):** - Semantic search over vector database - Use for: historical records, large knowledge bases, past interactions - Access: Agent must explicitly call archival_memory_search - Note: NOT automatically populated from context overflow **Conversation History:** - Past messages from current conversation - Retrieved via conversation_search tool - Use for: referencing earlier discussion, tracking conversation flow See `references/memory-architecture.md` for detailed guidance. ### 3. Memory Block Design **Core principle:** One block per distinct functional unit. **Essential blocks:** - `persona`: Agent identity, behavioral guidelines, capabilities - `human`: User information, preferences, context **Add domain-specific blocks based on use case:** - Customer support: `company_policies`, `product_knowledge`, `customer` - Coding assistant: `project_context`, `coding_standards`, `current_task` - Personal assistant: `schedule`, `preferences`, `contacts` **Memory block guidelines:** - Keep blocks focused and purpose-specific - Use clear, instructional descriptions - Monitor size limits (typically 2000-5000 characters per block) - Design for append operations when sharing memory between agents See `references/memory-patterns.md` for domain examples and `references/description-patterns.md` for writing effective descriptions. ### 4. Model Selection Match model capabilities to agent requirements: **For production agents:** - GPT-4o or Claude Sonnet 4 for complex reasoning - GPT-4o-mini for cost-efficient general tasks - Claude Haiku 3.5 for fast, lightweight operations - Gemini 2.0 Flash for balanced speed/capability **Avoid for production:** - Small Ollama models (<7B parameters) - poor tool calling - Models without reliable function calling support See `references/model-recommendations.md` for detailed guidance. ### 5. Tool Configuration **Start minimal:** Attach only tools the agent will actively use. **Common starting points:** - **Memory tools** (memory_insert, memory_replace, memory_rethink): Core for most agents - **File system tools**: Auto-attached when folders are connected - **Custom tools**: For domain-specific operations (databases, APIs, etc.) **Tool Rules:** Use to enforce sequencing when needed (e.g., "always call search before answer") Consult `references/tool-patterns.md` for common configurations. ## Advanced Topics ### Memory Size Management **When approaching character limits:** 1. **Split by topic:** `customer_profile` → `customer_business`, `customer_preferences` 2. **Split by time:** `interaction_history` → `recent_interactions`, archive older to archival memory 3. **Archive historical data:** Move old information to archival memory 4. **Consolidate with memory_rethink:** Summarize and rewrite block See `references/size-management.md` for strategies. ### Concurrency Patterns When multiple agents share memory blocks or an agent processes concurrent requests: **Safest operations:** - `memory_insert`: Append-only, minimal race conditions - Database uses PostgreSQL row-level locking **Risk of race conditions:** - `memory_replace`: Target string may change before write - `memory_rethink`: Last-writer-wins, no merge **Best practices:** - Design for append operations when possible - Use memory_insert for concurrent writes - Reserve memory_rethink for single-agent exclusive access Consult `references/concurrency.md` for detailed patterns. ## Validation Checklist Before finalizing your agent design: **Architecture:** - [ ] Does the architecture match the model's capabilities? - [ ] Is the model appropriate for expected workload and latency requirements? **Memory:** - [ ] Is core memory total under 80% of context window? - [ ] Is each block focused on one functional area? - [ ] Are descriptions clear about when to read/write? - [ ] Have you planned for size growth and overflow? - [ ] If multi-agent, are concurrency patterns considered? **Tools:** - [ ] Are tools necessary and properly configured? - [ ] Are memory blocks granular enough for effective updates? ## Common Antipatterns **Too few memory blocks:** ```yaml # Bad: Everything in one block agent_memory: "Agent is helpful. User is John..." ``` Split into focused blocks instead. **Too many memory blocks:** Creating 10+ blocks when 3-4 would suffice. Start minimal, expand as needed. **Poor descriptions:** ```yaml # Bad data: "Contains data" ``` Provide actionable guidance instead. See `references/description-patterns.md`. **Ignoring size limits:** Letting blocks grow indefinitely until they hit limits. Monitor and manage proactively. ## Implementation Steps ### 1. Design Phase - Choose architecture based on requirements - Design memory block structure - Select appropriate model - Plan tool configuration ### 2. Creation Phase (SDK) **Python:** ```python from letta_client import Letta client = Letta() # Uses LETTA_API_KEY env var # Create agent with custom memory blocks agent = client.agents.create( name="my-agent", model="openai/gpt-4o", # or "anthropic/claude-sonnet-4-20250514" embedding="openai/text-embedding-3-small", memory_blocks=[ {"label": "persona", "value": "You are a helpful assistant..."}, {"label": "human", "value": "User preferences and context..."}, {"label": "project", "value": "Current project details..."}, ], description="Agent for helping with X", ) print(f"Created agent: {agent.id}") ``` **TypeScript:** ```typescript import Letta from "letta-client"; const client = new Letta(); const agent = await client.agents.create({ name: "my-agent", model: "openai/gpt-4o", embedding: "openai/text-embedding-3-small", memoryBlocks: [ { label: "persona", value: "You are a helpful assistant..." }, { label: "human", value: "User preferences and context..." }, { label: "project", value: "Current project details..." }, ], description: "Agent for helping with X", }); console.log(`Created agent: ${agent.id}`); ``` **Note:** Letta Code CLI (`letta` command) creates agents interactively. Use `letta --new-agent` to start fresh, then `/rename` and `/description` to configure. ### 3. Testing Phase - Test with representative queries - Monitor memory tool usage patterns - Verify tool calling behavior ### 4. Iteration Phase - Refine memory block structure based on actual usage - Optimize system instructions - Adjust tool configurations ## References For detailed information on specific topics, consult the reference materials: - `references/architectures.md` - Architecture comparison and selection - `references/memory-architecture.md` - Memory types and when to use them - `references/memory-patterns.md` - Domain-specific memory block examples - `references/description-patterns.md` - Writing effective block descriptions - `references/size-management.md` - Managing memory block size limits - `references/concurrency.md` - Multi-agent memory sharing patterns - `references/model-recommendations.md` - Model selection guidance - `references/tool-patterns.md` - Common tool configurations