---
name: prompt-engineering
description: >
  Provides workflows to write, debug, and optimize prompts for LLMs, including
  few-shot example selection, chain-of-thought structuring, system prompt
  design, and template composition. Use when the user asks to write or improve
  a prompt, wants help with few-shot examples, chain-of-thought, system prompts,
  prompt templates, or asks how to get better results from an LLM.
allowed-tools: Read, Write, Edit, Glob, Grep, Bash
---

# Prompt Engineering

## Overview

Use this skill to design prompt systems that are clear, testable, and reusable.
It covers prompt drafting, optimization, evaluation, and production-oriented
patterns for few-shot prompting, reasoning workflows, templates, and system
prompts.

Keep the main workflow in this file and load the targeted reference files only
for the pattern you are applying.

## When to Use

Use this skill when:

- A user asks to write, rewrite, or improve a prompt
- A prompt needs better structure, reliability, or output formatting
- Few-shot examples or reasoning scaffolds are needed
- A system prompt or reusable prompt template must be created
- An existing prompt needs measurable optimization and testing

Read the relevant files in `references/` when you need deeper guidance on a
specific pattern.

## Core Patterns

### 1. Few-Shot Learning

#### Example Selection Strategy
- Use `references/few-shot-patterns.md` for comprehensive selection frameworks
- Balance example count (3-5 is a practical default) against context window limits
- Include edge cases and boundary conditions in example sets
- Prioritize diverse examples that cover problem space variations
- Order examples from simple to complex for progressive learning

#### Few-Shot Example (Sentiment Classification)
```
Classify the sentiment as Positive, Negative, or Neutral.

Text: "I love this product! It exceeded my expectations."
Sentiment: Positive
Reasoning: Enthusiastic language, positive adjectives, satisfaction

Text: "The app keeps crashing when I upload large files."
Sentiment: Negative
Reasoning: Complaint about functionality, frustration indicator

Text: "It arrived on time, as described."
Sentiment: Neutral
Reasoning: Factual statement, no strong emotion either way

Text: "{user_input}"
Sentiment:
Reasoning:
```
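
The prompt above can also be assembled programmatically, so the example set is swappable per task. The following is a minimal Python sketch; the `Example` type and `build_few_shot_prompt` function are hypothetical names introduced for illustration, not part of this skill's reference files.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One labeled demonstration (hypothetical helper type)."""
    text: str
    sentiment: str
    reasoning: str


def build_few_shot_prompt(examples: list[Example], user_input: str) -> str:
    """Render the instruction, the demonstrations, and an open slot for the new input."""
    lines = ["Classify the sentiment as Positive, Negative, or Neutral.", ""]
    for ex in examples:  # the caller controls ordering, e.g. simple to complex
        lines += [
            f'Text: "{ex.text}"',
            f"Sentiment: {ex.sentiment}",
            f"Reasoning: {ex.reasoning}",
            "",
        ]
    # Leave the label and reasoning empty so the model completes them.
    lines += [f'Text: "{user_input}"', "Sentiment:", "Reasoning:"]
    return "\n".join(lines)
```

Keeping examples as data rather than inline text makes it easier to add an edge-case example or reorder the set without touching the rest of the prompt.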

### 2. Chain-of-Thought Reasoning

#### Implementation Patterns
- Reference `references/cot-patterns.md` for detailed reasoning frameworks
- Use "Let's think step by step" for zero-shot CoT initiation
- Provide complete reasoning traces for few-shot CoT demonstrations
- Implement self-consistency by sampling multiple reasoning paths
- Include verification and validation steps in reasoning chains
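
Self-consistency from the list above can be sketched as a majority vote over several sampled completions. This is an illustrative Python sketch under stated assumptions: `sample` is a hypothetical callable that sends the prompt to a model with non-zero temperature and returns the completion text, and completions are assumed to end with a `Final Answer:` line.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Optional


def extract_final_answer(completion: str) -> Optional[str]:
    """Return the first line after the last 'Final Answer:' marker, or None if absent."""
    _, found, tail = completion.rpartition("Final Answer:")
    tail = tail.strip()
    if not found or not tail:
        return None
    return tail.splitlines()[0].strip()


def self_consistent_answer(
    sample: Callable[[str], str], prompt: str, n: int = 5
) -> tuple[Optional[str], float]:
    """Sample n reasoning paths; return the majority answer and its share of all samples."""
    answers = [extract_final_answer(sample(prompt)) for _ in range(n)]
    votes = Counter(a for a in answers if a is not None)  # ignore unparseable paths
    if not votes:
        return None, 0.0
    answer, count = votes.most_common(1)[0]
    return answer, count / n
```

A low vote share is a useful signal in its own right: it suggests the prompt leaves the task ambiguous and is worth revisiting before adding more samples.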

#### CoT Template Structure
```
Let's approach this step-by-step:

Step 1: {break_down_the_problem}
Analysis: {detailed_reasoning}

Step 2: {identify_key_components}
Analysis: {component_analysis}

Step 3: {synthesize_solution}
Analysis: {solution_justification}

Final Answer: {conclusion_with_confidence}
```

### 3. Prompt Optimization

#### Optimization Process
- Use `references/optimization-frameworks.md` for comprehensive optimization strategies
- Measure baseline performance before optimization attempts
- Implement single-variable changes for accurate attribution
- Track metrics: accuracy, consistency, latency, token efficiency, robustness, and safety
- Use statistical significance testing for A/B validation
- Document optimization iterations and their impacts

See `references/optimization-frameworks.md` for measurement utilities.
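
One way to run the significance test for an A/B comparison is an exact McNemar test, which fits the case where both prompts are scored pass/fail on the same test cases. The sketch below uses only the standard library; the choice of test and the function name `mcnemar_exact_p` are assumptions for illustration, not something prescribed by this skill.

```python
from __future__ import annotations

from math import comb


def mcnemar_exact_p(baseline: list[bool], candidate: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired pass/fail results."""
    if len(baseline) != len(candidate):
        raise ValueError("both prompts must be scored on the same test cases")
    # Discordant pairs: cases where exactly one of the two prompts passed.
    only_candidate = sum(c and not b for b, c in zip(baseline, candidate))
    only_baseline = sum(b and not c for b, c in zip(baseline, candidate))
    n = only_candidate + only_baseline
    if n == 0:
        return 1.0  # the prompts agree on every case
    k = min(only_candidate, only_baseline)
    # Under the null hypothesis each discordant pair favors either prompt with p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)
```

With small test sets only a lopsided split of discordant cases yields a small p-value, which is a reminder that a one- or two-case improvement on 10 inputs is weak evidence.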

### 4. Template Systems

#### Template Design Principles
- Reference `references/template-systems.md` for modular template frameworks
- Use clear variable naming conventions (e.g., `{user_input}`, `{context}`)
- Implement conditional sections for different scenario handling
- Design role-based templates for specific use cases
- Create hierarchical template composition patterns

#### Template Structure Example
```
# System Context
You are a {role} with {expertise_level} expertise in {domain}.

# Task Context
{if background_information}
Background: {background_information}
{endif}

# Instructions
{task_instructions}

# Examples
{examples}

# Output Format
{output_specification}

# Input
{user_query}
```
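
The conditional section in the template can be handled in plain Python without a template engine. The sketch below is one possible rendering function, not a fixed API: `render_prompt` and its `examples` parameter are hypothetical names, and it drops an empty section together with its heading, which is slightly stricter than the `{if ...}` block shown above.

```python
from __future__ import annotations


def render_prompt(
    *,
    role: str,
    expertise_level: str,
    domain: str,
    task_instructions: str,
    output_specification: str,
    user_query: str,
    examples: str = "",
    background_information: str | None = None,
) -> str:
    """Assemble the sections in template order, skipping any section with no content."""
    background = (
        f"Background: {background_information}" if background_information else ""
    )
    sections = [
        ("System Context", f"You are a {role} with {expertise_level} expertise in {domain}."),
        ("Task Context", background),
        ("Instructions", task_instructions),
        ("Examples", examples),
        ("Output Format", output_specification),
        ("Input", user_query),
    ]
    return "\n\n".join(f"# {title}\n{body}" for title, body in sections if body)
```

Keyword-only arguments make a missing required variable fail loudly at call time instead of leaving an unfilled `{placeholder}` in the prompt.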

### 5. System Prompt Design

#### System Prompt Components
- Use `references/system-prompt-design.md` for detailed design guidelines
- Define clear role specification and expertise boundaries
- Establish output format requirements and structural constraints
- Include safety guidelines and content policy adherence
- Set context for background information and domain knowledge

#### System Prompt Framework
```
You are an expert {role} specializing in {domain} with {experience_level} of experience.

## Core Capabilities
- List specific capabilities and expertise areas
- Define scope of knowledge and limitations

## Behavioral Guidelines
- Specify interaction style and communication approach
- Define error handling and uncertainty protocols
- Establish quality standards and verification requirements

## Output Requirements
- Specify format expectations and structural requirements
- Define content inclusion and exclusion criteria
- Establish consistency and validation requirements

## Safety and Ethics
- Include content policy adherence
- Specify bias mitigation requirements
- Define harm prevention protocols
```

## Implementation Workflows

### Workflow 1: Create New Prompt from Requirements

1. **Analyze Requirements**
   - Identify task complexity and reasoning requirements
   - Determine target model capabilities and limitations
   - Define success criteria and evaluation metrics
   - Assess need for few-shot learning or CoT reasoning

2. **Select Pattern Strategy**
   - Use few-shot learning for classification or transformation tasks
   - Apply CoT for complex reasoning or multi-step problems
   - Implement template systems for reusable prompt architecture
   - Design system prompts for consistent behavior requirements

3. **Draft Initial Prompt**
   - Structure prompt with clear sections and logical flow
   - Include relevant examples or reasoning demonstrations
   - Specify output format and quality requirements
   - Incorporate safety guidelines and constraints

4. **Validate and Test**
   - Test with at least 3 inputs: one happy path, one edge case, one adversarial
   - Measure accuracy and token usage against defined success criteria
   - Change one variable at a time, re-test, keep only what improves metrics
   - Document optimization decisions and their rationale
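
Step 4 can be backed by a small harness that refuses to run unless all three input kinds are covered. This is a sketch under assumptions: `run_prompt` is a hypothetical callable wrapping the model call, `PromptCase` is an illustrative type, and scoring is exact-match on the stripped output, which only suits tasks with a single correct label.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

REQUIRED_KINDS = {"happy", "edge", "adversarial"}


@dataclass(frozen=True)
class PromptCase:
    """One test input with its expected output (hypothetical helper type)."""
    kind: str  # "happy", "edge", or "adversarial"
    user_input: str
    expected: str


def evaluate(run_prompt: Callable[[str], str], cases: list[PromptCase]) -> dict:
    """Run every case and report accuracy plus the failing cases."""
    missing = REQUIRED_KINDS - {case.kind for case in cases}
    if missing:
        raise ValueError(f"missing test case kinds: {sorted(missing)}")
    failures = [c for c in cases if run_prompt(c.user_input).strip() != c.expected]
    return {"accuracy": 1 - len(failures) / len(cases), "failures": failures}
```

Returning the failing cases, not only the score, gives the next iteration a concrete starting point for a single-variable change.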

### Workflow 2: Optimize Existing Prompt

1. **Performance Analysis**
   - Measure current prompt performance metrics
   - Identify failure modes and error patterns
   - Analyze token efficiency and response latency
   - Assess consistency across multiple runs

2. **Optimization Strategy**
   - Apply systematic A/B testing with single-variable changes
   - Use few-shot learning to improve task adherence
   - Implement CoT reasoning for complex task components
   - Refine template structure for better clarity

3. **Implementation and Testing**
   - Re-run the same test cases from step 1 against the optimized prompt
   - If accuracy < baseline, revert the change and try a different hypothesis
   - If accuracy >= baseline but < 90%, return to step 2 with a new strategy
   - Document the winning change and its measured impact
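
The decision rule in step 3 is small enough to encode directly, which keeps the loop consistent across iterations. A minimal sketch follows; the function name and the returned labels are illustrative assumptions.

```python
def next_action(
    baseline_accuracy: float, candidate_accuracy: float, target: float = 0.90
) -> str:
    """Map measured accuracies to the next step of the optimization loop."""
    if candidate_accuracy < baseline_accuracy:
        return "revert"    # the change hurt: undo it and try another hypothesis
    if candidate_accuracy < target:
        return "iterate"   # no regression, but still below target: back to step 2
    return "document"      # target met: record the winning change and its impact
```

For example, `next_action(0.70, 0.80)` yields `"iterate"`, since the change helped but the prompt is still below the 90% target.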

### Workflow 3: Scale Prompt Systems

1. **Modular Architecture Design**
   - Decompose complex prompts into reusable components
   - Create template inheritance hierarchies
   - Implement dynamic example selection systems
   - Build automated quality assurance frameworks

2. **Production Integration**
   - Implement prompt versioning and rollback capabilities
   - Create performance monitoring and alerting systems
   - Build automated testing frameworks for prompt validation
   - Establish update and deployment workflows
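
Versioning with rollback can start as a very small abstraction before any infrastructure is involved. The class below is an in-memory sketch with hypothetical names; a production system would persist versions and record who published them, which is out of scope here.

```python
from __future__ import annotations


class PromptRegistry:
    """In-memory version history for one prompt (hypothetical helper)."""

    def __init__(self) -> None:
        self._versions: list[str] = []
        self._active = 0  # 1-based version number; 0 means nothing published

    def publish(self, text: str) -> int:
        """Store a new version, make it active, and return its version number."""
        self._versions.append(text)
        self._active = len(self._versions)
        return self._active

    def rollback(self) -> int:
        """Re-activate the previous version and return its number."""
        if self._active <= 1:
            raise RuntimeError("no earlier version to roll back to")
        self._active -= 1
        return self._active

    def active(self) -> str:
        """Return the text of the active version."""
        if self._active == 0:
            raise RuntimeError("no prompt has been published")
        return self._versions[self._active - 1]
```

Versions are append-only in this sketch, so a rollback never destroys the text of the version that was rolled back.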

## Quality Gates

- Accuracy >90% on 10+ diverse test cases before shipping
- Accuracy spread under 5 percentage points across 3+ repeated runs
- All edge cases and adversarial inputs handled gracefully
- Output format matches spec on every test case
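
The measurable gates above can be checked automatically before shipping. The sketch below makes two assumptions that should be adjusted to the project: "variance" is read as the spread (max minus min) of accuracy across runs, and accuracy is taken as the mean over runs. Graceful handling of edge cases still needs human review and is not covered. The function name is hypothetical.

```python
from __future__ import annotations

from statistics import fmean


def check_quality_gates(
    run_accuracies: list[float], n_cases: int, format_ok: list[bool]
) -> dict[str, bool]:
    """Evaluate the numeric quality gates; every value must be True to ship."""
    enough_runs = len(run_accuracies) >= 3
    return {
        "enough_cases": n_cases >= 10,
        # Mean accuracy over the repeated runs must exceed 90%.
        "accuracy": bool(run_accuracies) and fmean(run_accuracies) > 0.90,
        # Assumption: "variance" means max-min spread, in accuracy points.
        "stability": enough_runs and max(run_accuracies) - min(run_accuracies) < 0.05,
        # One flag per test case: did the output match the format spec?
        "format": bool(format_ok) and all(format_ok),
    }
```

Returning one flag per gate rather than a single boolean shows which gate blocked a release.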

## Best Practices

- Optimize one variable at a time so results stay attributable
- Keep prompts explicit about task, context, constraints, and output format
- Prefer a small number of strong examples over many repetitive ones
- Test prompts against happy-path, edge-case, and adversarial inputs
- Move long pattern details to `references/` instead of bloating `SKILL.md`

## Constraints and Warnings

- Do not assume longer prompts are better; extra detail often adds ambiguity
- Do not require full step-by-step reasoning in the output when a concise rationale is enough
- Validate prompts on representative inputs before claiming improvement
- Keep model-specific assumptions explicit because behavior varies across models

## Integration with Other Skills

This skill works alongside:
- **langchain4j-ai-services-patterns**: Interface-based prompt design
- **langchain4j-rag-implementation-patterns**: Context-enhanced prompting
- **langchain4j-testing-strategies**: Prompt validation frameworks
- **unit-test-parameterized**: Systematic prompt testing approaches

## Resources and References

- `references/few-shot-patterns.md`: Comprehensive few-shot learning frameworks
- `references/cot-patterns.md`: Chain-of-thought reasoning patterns and examples
- `references/optimization-frameworks.md`: Systematic prompt optimization methodologies
- `references/template-systems.md`: Modular template design and implementation
- `references/system-prompt-design.md`: System prompt architecture and best practices

## Common Pitfalls and Solutions

| Pitfall | Fix |
|---|---|
| Wrong output format | Add a concrete output example at the end of the prompt |
| Inconsistent answers | Add 2-3 few-shot examples showing expected reasoning |
| Hallucination | Add "If unsure, say 'I don't know'" + constrain the answer domain |
| Too verbose | Add explicit word/sentence limit + "Be concise" instruction |
| Missed edge cases | Add an edge-case few-shot example |

## Constraints

- Test across target models, since capabilities and token limits vary
- Keep few-shot examples to 3-5 to manage context usage
- Validate with domain-specific test cases before production