--- name: prompt-engineering-suite description: Comprehensive prompt engineering with Chain-of-Thought, few-shot learning, prompt versioning, and optimization. Use when designing prompts, improving accuracy, managing prompt lifecycle. version: 1.0.0 tags: [prompts, cot, few-shot, versioning, optimization, langfuse, dspy, 2026] context: fork agent: prompt-engineer author: OrchestKit user-invocable: false --- # Prompt Engineering Suite Design, version, and optimize prompts for production LLM applications. ## Overview - Designing prompts for new LLM features - Improving accuracy with Chain-of-Thought reasoning - Few-shot learning with example selection - Managing prompts in production (versioning, A/B testing) - Automatic prompt optimization with DSPy ## Quick Reference ### Chain-of-Thought Pattern ```python from langchain_core.prompts import ChatPromptTemplate COT_SYSTEM = """You are a helpful assistant that solves problems step-by-step. When solving problems: 1. Break down the problem into clear steps 2. Show your reasoning for each step 3. Verify your answer before responding 4. If uncertain, acknowledge limitations Format your response as: STEP 1: [description] Reasoning: [your thought process] STEP 2: [description] Reasoning: [your thought process] ... FINAL ANSWER: [your conclusion]""" cot_prompt = ChatPromptTemplate.from_messages([ ("system", COT_SYSTEM), ("human", "Problem: {problem}\n\nThink through this step-by-step."), ]) ``` ### Few-Shot with Dynamic Examples ```python from langchain_core.prompts import FewShotChatMessagePromptTemplate examples = [ {"input": "What is 2+2?", "output": "4"}, {"input": "What is the capital of France?", "output": "Paris"}, ] few_shot = FewShotChatMessagePromptTemplate( examples=examples, example_prompt=ChatPromptTemplate.from_messages([ ("human", "{input}"), ("ai", "{output}"), ]), ) final_prompt = ChatPromptTemplate.from_messages([ ("system", "You are a helpful assistant. Answer concisely."), few_shot, ("human", "{input}"), ]) ``` ### Prompt Versioning with Langfuse SDK v3 ```python from langfuse import Langfuse # Note: Langfuse SDK v3 is OTEL-native (acquired by ClickHouse Jan 2026) langfuse = Langfuse() # Get versioned prompt with label prompt = langfuse.get_prompt( name="customer-support-v2", label="production", # production, staging, canary cache_ttl_seconds=300, ) # Compile with variables compiled = prompt.compile( customer_name="John", issue="billing question" ) ``` ### DSPy 3.1.0 Automatic Optimization ```python import dspy class OptimizedQA(dspy.Module): def __init__(self): self.generate = dspy.Predict("question -> answer") def forward(self, question): return self.generate(question=question) # Optimize with MIPROv2 (recommended) or BootstrapFewShot optimizer = dspy.MIPROv2(metric=answer_match) # Data+demo-aware Bayesian optimization optimized = optimizer.compile(OptimizedQA(), trainset=examples) # Alternative: GEPA (July 2025) - Reflective Prompt Evolution # Uses model introspection to analyze failures and propose better prompts ``` ## Pattern Selection Guide | Pattern | When to Use | Example Use Case | |---------|-------------|------------------| | Zero-shot | Simple, well-defined tasks | Classification, extraction | | Few-shot | Complex tasks needing examples | Format conversion, style matching | | CoT | Reasoning, math, logic | Problem solving, analysis | | Zero-shot CoT | Quick reasoning boost | Add "Let's think step by step" | | ReAct | Tool use, multi-step | Agent tasks, API calls | | Structured | JSON/schema output | Data extraction, API responses | ## Key Decisions | Decision | Recommendation | |----------|----------------| | Few-shot examples | 3-5 diverse, representative examples | | Example ordering | Most similar examples last (recency bias) | | CoT trigger | "Let's think step by step" or explicit format | | Prompt versioning | Langfuse with labels (production/staging) | | A/B testing | 50+ samples, track via trace metadata | | Auto-optimization | DSPy BootstrapFewShot for few-shot tuning | ## Anti-Patterns (FORBIDDEN) ```python # NEVER hardcode prompts without versioning PROMPT = "You are a helpful assistant..." # No version control! # NEVER use single example for few-shot examples = [{"input": "x", "output": "y"}] # Too few! # NEVER skip CoT for complex reasoning response = llm.complete("Solve: 15% of 240") # No reasoning! # ALWAYS version prompts prompt = langfuse.get_prompt("assistant", label="production") # ALWAYS use 3-5 diverse examples examples = [ex1, ex2, ex3, ex4, ex5] # ALWAYS use CoT for math/logic response = llm.complete("Solve: 15% of 240. Think step by step.") ``` ## Detailed Documentation | Resource | Description | |----------|-------------| | [references/chain-of-thought.md](references/chain-of-thought.md) | CoT patterns, zero-shot CoT, self-consistency | | [references/few-shot-patterns.md](references/few-shot-patterns.md) | Example selection, ordering, formatting | | [references/prompt-versioning.md](references/prompt-versioning.md) | Langfuse integration, A/B testing | | [references/prompt-optimization.md](references/prompt-optimization.md) | DSPy, automatic tuning, evaluation | | [scripts/cot-template.py](scripts/cot-template.py) | Full Chain-of-Thought implementation | | [scripts/few-shot-template.py](scripts/few-shot-template.py) | Few-shot with dynamic example selection | | [scripts/jinja2-prompts.py](scripts/jinja2-prompts.py) | Jinja2 templates (2026): async, caching, LLM filters, Anthropic format | ## Related Skills - `langfuse-observability` - Prompt management and A/B testing tracking - `llm-evaluation` - Evaluating prompt effectiveness - `function-calling` - Structured output patterns - `llm-testing` - Testing prompt variations ## Capability Details ### chain-of-thought **Keywords:** CoT, step by step, reasoning, think, chain of thought **Solves:** - Improve accuracy on complex reasoning tasks - Debug LLM reasoning process - Implement self-consistency with multiple CoT paths ### few-shot-learning **Keywords:** few-shot, examples, in-context learning, demonstrations **Solves:** - Format LLM output with examples - Handle complex tasks without fine-tuning - Select optimal examples for task ### prompt-versioning **Keywords:** version, prompt management, A/B test, production prompt **Solves:** - Manage prompts in production - A/B test prompt variations - Roll back to previous versions ### prompt-optimization **Keywords:** DSPy, optimize, tune, automatic prompt, OPRO **Solves:** - Automatically optimize prompts - Find best few-shot examples - Improve accuracy without manual tuning ### zero-shot-cot **Keywords:** zero-shot CoT, think step by step, reasoning trigger **Solves:** - Quick reasoning boost without examples - Add "Let's think step by step" trigger - Improve accuracy on math/logic ### self-consistency **Keywords:** self-consistency, multiple paths, voting, ensemble **Solves:** - Generate multiple reasoning paths - Vote on most common answer - Improve reliability on hard problems