---
name: auto-claude-optimization
description: Auto-Claude performance optimization and cost management. Use when optimizing token usage, reducing API costs, improving build speed, or tuning agent performance.
version: 1.0.0
auto-claude-version: 2.7.2
---

# Auto-Claude Optimization

Performance tuning, cost reduction, and efficiency improvements.

## Performance Overview

### Key Metrics

| Metric | Impact | Optimization |
|--------|--------|--------------|
| API latency | Build speed | Model selection, caching |
| Token usage | Cost | Prompt efficiency, context limits |
| Memory queries | Speed | Embedding model, index tuning |
| Build iterations | Time | Spec quality, QA settings |

## Model Optimization

### Model Selection

| Model | Speed | Cost | Quality | Use Case |
|-------|-------|------|---------|----------|
| claude-opus-4-5-20251101 | Slow | High | Best | Complex features |
| claude-sonnet-4-5-20250929 | Fast | Medium | Good | Standard features |

```bash
# Override model in .env
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
```

### Extended Thinking Tokens

Configure the thinking budget per agent:

| Agent | Default | Recommended |
|-------|---------|-------------|
| Spec creation | 16000 | Keep default for quality |
| Planning | 5000 | Reduce to 3000 for speed |
| Coding | 0 | Keep disabled |
| QA Review | 10000 | Reduce to 5000 for speed |

```python
# In agent configuration
max_thinking_tokens=5000  # or None to disable
```

## Token Optimization

### Reduce Context Size

1. **Smaller spec files**
   ```bash
   # Keep specs concise
   # Bad: 5000 word spec
   # Good: 500 word spec with clear criteria
   ```

2. **Limit codebase scanning**
   ```python
   # In context/builder.py
   MAX_CONTEXT_FILES = 50  # Reduce from 100
   ```

3. **Use targeted searches**
   ```bash
   # Instead of full codebase scan
   # Focus on relevant directories
   ```

### Efficient Prompts

Optimize system prompts in `apps/backend/prompts/`. Shorter prompts cost fewer tokens on every request:

```markdown
<!-- Verbose (avoid) -->
You are an expert software developer who specializes in building
high-quality, production-ready applications. You have extensive
experience with many programming languages and frameworks...

<!-- Concise (prefer) -->
Expert full-stack developer. Build production-quality code.
Follow existing patterns. Test thoroughly.
```

### Memory Optimization

```bash
# Use efficient embedding model
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Or offline with smaller model
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
```

## Speed Optimization

### Parallel Execution

```bash
# Enable more parallel agents (default: 4)
MAX_PARALLEL_AGENTS=8
```

### Reduce QA Iterations

```bash
# Limit QA loop iterations
MAX_QA_ITERATIONS=10  # Default: 50

# Skip QA for quick iterations
python run.py --spec 001 --skip-qa
```

### Faster Spec Creation

```bash
# Force simple complexity for quick tasks
python spec_runner.py --task "Fix typo" --complexity simple

# Skip research phase
SKIP_RESEARCH_PHASE=true python spec_runner.py --task "..."
```

### API Timeout Tuning

```bash
# Reduce timeout for faster failure detection
API_TIMEOUT_MS=120000  # 2 minutes (default: 10 minutes)
```

## Cost Management

### Monitor Token Usage

```bash
# Enable cost tracking
ENABLE_COST_TRACKING=true

# View usage report
python usage_report.py --spec 001
```

### Cost Reduction Strategies

1. **Use cheaper models for simple tasks**
   ```bash
   # For simple specs
   AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python spec_runner.py --task "..."
   ```

2. **Limit context window**
   ```bash
   MAX_CONTEXT_TOKENS=50000  # Reduce from 100000
   ```

3. **Batch similar tasks**
   ```bash
   # Create specs together, run together
   python spec_runner.py --task "Add feature A"
   python spec_runner.py --task "Add feature B"
   python run.py --spec 001
   python run.py --spec 002
   ```

4. **Use local models for memory**
   ```bash
   # Ollama for memory (free)
   GRAPHITI_LLM_PROVIDER=ollama
   GRAPHITI_EMBEDDER_PROVIDER=ollama
   ```

### Cost Estimation

| Operation | Estimated Tokens | Cost (Opus) | Cost (Sonnet) |
|-----------|-----------------|-------------|---------------|
| Simple spec | 10k | ~$0.30 | ~$0.06 |
| Standard spec | 50k | ~$1.50 | ~$0.30 |
| Complex spec | 200k | ~$6.00 | ~$1.20 |
| Build (simple) | 50k | ~$1.50 | ~$0.30 |
| Build (standard) | 200k | ~$6.00 | ~$1.20 |
| Build (complex) | 500k | ~$15.00 | ~$3.00 |

## Memory System Optimization

### Embedding Performance

```bash
# Faster embeddings
OPENAI_EMBEDDING_MODEL=text-embedding-3-small  # 1536 dim, fast

# Higher quality (slower)
OPENAI_EMBEDDING_MODEL=text-embedding-3-large  # 3072 dim

# Offline (fastest, free)
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384
```

### Query Optimization

```python
# Limit search results
memory.search("query", limit=10)  # Instead of 100

# Use semantic caching (environment variable, set in .env):
# ENABLE_MEMORY_CACHE=true
```

### Database Maintenance

```bash
# Compact database periodically
python -c "from integrations.graphiti.memory import compact_database; compact_database()"

# Clear old episodes
python query_memory.py --cleanup --older-than 30d
```

## Build Efficiency

### Spec Quality = Build Speed

High-quality specs reduce iterations:

```markdown
# Good spec (fewer iterations)
## Acceptance Criteria
- [ ] User can log in with email/password
- [ ] Invalid credentials show error message
- [ ] Successful login redirects to /dashboard
- [ ] Session persists for 24 hours

# Bad spec (more iterations)
## Acceptance Criteria
- [ ] Login works
```

### Subtask Granularity

Optimal subtask size:

- **Too large**: Agent gets stuck, needs recovery
- **Too small**: Overhead per subtask
- **Optimal**: 30-60 minutes of work each

### Parallel Work

Let agents spawn subagents for parallel execution:

```
Main Coder
├── Subagent 1: Frontend (parallel)
├── Subagent 2: Backend (parallel)
└── Subagent 3: Tests (parallel)
```

## Environment Tuning

### Optimal .env Configuration

```bash
# Performance-focused configuration
AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
API_TIMEOUT_MS=180000
MAX_PARALLEL_AGENTS=6

# Memory optimization
GRAPHITI_LLM_PROVIDER=ollama
GRAPHITI_EMBEDDER_PROVIDER=ollama
OLLAMA_LLM_MODEL=llama3.2:3b
OLLAMA_EMBEDDING_MODEL=all-minilm
OLLAMA_EMBEDDING_DIM=384

# Reduce verbosity
DEBUG=false
ENABLE_FANCY_UI=false
```

### Resource Limits

```bash
# Use the C library allocator instead of pymalloc
# (may lower retained memory for some workloads; it does not cap usage)
export PYTHONMALLOC=malloc

# Set max file descriptors
ulimit -n 4096
```

## Benchmarking

### Measure Build Time

```bash
# Time a build
time python run.py --spec 001

# Compare models
time AUTO_BUILD_MODEL=claude-opus-4-5-20251101 python run.py --spec 001
time AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929 python run.py --spec 001
```

### Profile Memory Usage

```bash
# Monitor memory
watch -n 1 'ps aux | grep python | head -5'

# Profile script (cProfile reports time per function, not memory)
python -m cProfile -o profile.stats run.py --spec 001
python -c "import pstats; p = pstats.Stats('profile.stats'); p.sort_stats('cumulative').print_stats(20)"
```

## Quick Wins

### Immediate Optimizations

1. **Switch to Sonnet for most tasks**
   ```bash
   AUTO_BUILD_MODEL=claude-sonnet-4-5-20250929
   ```

2. **Use Ollama for memory**
   ```bash
   GRAPHITI_LLM_PROVIDER=ollama
   GRAPHITI_EMBEDDER_PROVIDER=ollama
   ```

3. **Skip QA for prototypes**
   ```bash
   python run.py --spec 001 --skip-qa
   ```

4. **Force simple complexity for small tasks**
   ```bash
   python spec_runner.py --task "..." --complexity simple
   ```

### Medium-Term Improvements

1. Optimize prompts in `apps/backend/prompts/`
2. Configure project-specific security allowlist
3. Set up memory caching
4. Tune parallel agent count

### Long-Term Strategies

1. Self-hosted LLM for memory (Ollama)
2. Caching layer for common operations
3. Incremental context building
4. Project-specific prompt optimization

## Related Skills

- **auto-claude-memory**: Memory configuration
- **auto-claude-build**: Build process
- **auto-claude-troubleshooting**: Debugging
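
## Cost Estimation Sketch

The figures in the Cost Estimation table scale linearly with token count, so they can be reproduced with a small calculator. The sketch below is illustrative only: the helper names (`estimate_cost_usd`, `estimate_savings_usd`) are hypothetical and not part of Auto-Claude, and the per-million-token rates are back-derived from the table (10k tokens at ~$0.30 for Opus and ~$0.06 for Sonnet), not taken from any official price list.

```python
# Hypothetical helper, not part of Auto-Claude.
# Blended rates are ASSUMPTIONS back-derived from the Cost Estimation table:
#   Opus:   10k tokens ~ $0.30  ->  ~$30 per million tokens
#   Sonnet: 10k tokens ~ $0.06  ->  ~$6 per million tokens
BLENDED_USD_PER_MILLION_TOKENS = {
    "opus": 30.0,
    "sonnet": 6.0,
}


def estimate_cost_usd(tokens: int, model: str) -> float:
    """Return a rough cost estimate in USD for a given token count."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    if model not in BLENDED_USD_PER_MILLION_TOKENS:
        raise ValueError(f"unknown model tier: {model!r}")
    rate = BLENDED_USD_PER_MILLION_TOKENS[model]
    return round(tokens * rate / 1_000_000, 2)


def estimate_savings_usd(tokens: int) -> float:
    """Return the estimated saving from running on Sonnet instead of Opus."""
    opus = estimate_cost_usd(tokens, "opus")
    sonnet = estimate_cost_usd(tokens, "sonnet")
    return round(opus - sonnet, 2)


if __name__ == "__main__":
    # Token counts taken from the table rows.
    for label, tokens in [("Simple spec", 10_000),
                          ("Build (standard)", 200_000),
                          ("Build (complex)", 500_000)]:
        print(f"{label}: opus ~${estimate_cost_usd(tokens, 'opus'):.2f}, "
              f"sonnet ~${estimate_cost_usd(tokens, 'sonnet'):.2f}, "
              f"saving ~${estimate_savings_usd(tokens):.2f}")
```

Treat the output as an order-of-magnitude guide. Real costs depend on the split between input and output tokens, thinking budgets, and current pricing, so compare against the report from `usage_report.py` when cost tracking is enabled.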