--- name: clade-architecture-variants description: "Build different types of Claude-powered applications \u2014 chatbots,\ \ RAG systems,\nUse when working with architecture-variants patterns.\nagents, content\ \ pipelines, and code generation tools.\nTrigger with \"claude architecture\", \"\ anthropic rag\", \"build with claude\",\n\"claude agent pattern\", \"anthropic app\ \ design\".\n" allowed-tools: Read, Write, Edit version: 1.0.0 license: MIT author: Jeremy Longshore tags: - saas - anthropic - claude - architecture - rag - agents compatibility: Designed for Claude Code --- # Claude Architecture Variants ## Overview Five architecture patterns for Claude-powered applications: Chatbot (stateless API wrapper), RAG (retrieval-augmented generation with vector search), Agent (tool use loop), Content Pipeline (batch processing), and Evaluation (using Claude as a judge). Each includes complete code and a comparison table. ## 1. Chatbot (Stateless API Wrapper) Simplest pattern — proxy Claude with a system prompt. ```typescript // api/chat.ts export async function POST(req: Request) { const { messages } = await req.json(); const response = await client.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 2048, system: 'You are a helpful assistant for our SaaS product.', messages, stream: true, }); return new Response(response.toReadableStream()); } ``` **Best for:** Customer support, Q&A, simple conversational interfaces. ## 2. RAG (Retrieval-Augmented Generation) Fetch relevant context, inject into prompt, generate grounded answer. ```typescript async function ragQuery(question: string) { // 1. Embed the question (use Voyage, OpenAI, or Cohere — not Anthropic) const embedding = await embeddingClient.embed(question); // 2. Search vector DB for relevant chunks const chunks = await vectorDb.query(embedding, { topK: 5 }); // 3. Send to Claude with context const message = await client.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 2048, system: `Answer based on the provided context. If the context doesn't contain the answer, say so.`, messages: [{ role: 'user', content: `Context:\n${chunks.map(c => c.text).join('\n---\n')}\n\nQuestion: ${question}`, }], }); return message.content[0].text; } ``` **Best for:** Documentation Q&A, knowledge bases, support with source citations. ## 3. Agent (Tool Use Loop) Claude decides which tools to call, you execute them, loop until done. ```typescript async function agentLoop(userInput: string, tools: Anthropic.Tool[]) { let messages: MessageParam[] = [{ role: 'user', content: userInput }]; while (true) { const response = await client.messages.create({ model: 'claude-sonnet-4-20250514', max_tokens: 4096, tools, messages, }); messages.push({ role: 'assistant', content: response.content }); if (response.stop_reason === 'end_turn') { return response.content.find(b => b.type === 'text')?.text; } // Execute tools const results = []; for (const block of response.content) { if (block.type === 'tool_use') { const result = await executeTool(block.name, block.input); results.push({ type: 'tool_result', tool_use_id: block.id, content: JSON.stringify(result) }); } } messages.push({ role: 'user', content: results }); } } ``` **Best for:** Data analysis, code generation, multi-step workflows. ## 4. Content Pipeline (Batch Processing) Process thousands of documents through Claude asynchronously. ```typescript const batch = await client.messages.batches.create({ requests: documents.map((doc, i) => ({ custom_id: doc.id, params: { model: 'claude-haiku-4-5-20251001', // Cheap for bulk max_tokens: 512, messages: [{ role: 'user', content: `Extract entities: ${doc.text}` }], }, })), }); // 50% cheaper, processes within 24h ``` **Best for:** Summarization, classification, extraction at scale. ## 5. Evaluation / Grading Use Claude to evaluate other AI outputs or human content. ```typescript const evaluation = await client.messages.create({ model: 'claude-opus-4-20250514', // Best judgment max_tokens: 1024, system: `You are an expert evaluator. Score the response 1-5 on accuracy, relevance, and completeness. Return JSON: { "accuracy": N, "relevance": N, "completeness": N, "reasoning": "..." }`, messages: [{ role: 'user', content: `Question: ${question}\nResponse to evaluate: ${candidateResponse}`, }], }); ``` **Best for:** AI output quality, content moderation, automated grading. ## Choosing a Pattern | Pattern | Latency | Cost | Complexity | |---------|---------|------|------------| | Chatbot | Low (streaming) | Low | Simple | | RAG | Medium (embed + search + generate) | Medium | Medium | | Agent | High (multi-turn) | High | Complex | | Pipeline | High (async batch) | Low (50% off) | Simple | | Evaluation | Medium | Varies | Simple | ## Output - Architecture pattern selected based on requirements - Implementation code for chosen pattern - Cost and latency characteristics understood - Scaling strategy identified (streaming for chatbots, batches for pipelines) ## Error Handling | Error | Cause | Solution | |-------|-------|----------| | API Error | Check error type and status code | See `clade-common-errors` | ## Examples See five numbered pattern sections with complete TypeScript code, and the Choosing a Pattern comparison table with latency, cost, and complexity ratings. ## Resources - [Tool Use Guide](https://docs.anthropic.com/en/docs/build-with-claude/tool-use) - [Prompt Engineering](https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering) ## Next Steps See `clade-known-pitfalls` for common mistakes. ## Prerequisites - Completed `clade-install-auth` and `clade-model-inference` - Understanding of your use case requirements (latency, cost, complexity) - For RAG: vector database and embedding model (Voyage, OpenAI, or Cohere) ## Instructions ### Step 1: Review the patterns below Each section contains production-ready code examples. Copy and adapt them to your use case. ### Step 2: Apply to your codebase Integrate the patterns that match your requirements. Test each change individually. ### Step 3: Verify Run your test suite to confirm the integration works correctly.