---
name: dspy-expert
description: Builds and optimizes DSPy (dspy) programs end-to-end: signatures, modules, compilation/optimization, evaluation, and debugging. Use when the user mentions dspy/DSPy, Signature, Module, teleprompter/optimizer, compile, evaluate, few-shot, RAG, tool use, or local LLM endpoints (Ollama/vLLM/LM Studio).
---

# DSPy Expert

## Operating principles

- Default to **simple DSPy programs**: one clear `Signature`, one/few modules, tight eval loop.
- Prefer **measurable iteration**: dataset → metric → compile/optimize → evaluate → inspect failures → repeat.
- Assume **local OpenAI-compatible endpoints** by default (Ollama/vLLM/LM Studio). If not available, adapt.
- If an API detail is uncertain, **check the installed DSPy version** and confirm via docs/examples before coding.

## Quick workflow (copy/paste checklist)

### 1) Frame the task

- What is the **input**? What is the **output**?
- What are 10–200 **representative examples** (or how do we generate them)?
- What metric defines “good”? (exact match, F1, rubric judge, retrieval hit-rate, latency, cost)

### 2) Create a minimal baseline

- Define a `Signature` with the smallest useful fields.
- Implement the simplest module that can work (often `Predict`, `ChainOfThought`, or a tiny custom `Module`).
- Add deterministic pre/post-processing outside DSPy when helpful (parsing, normalization, schema validation).

### 3) Build an evaluation harness

- Create `train` / `dev` / `test` splits (even if small).
- Implement `metric(example, pred, trace=None) -> float | bool` (optimizers pass the extra `trace` argument).
- Run the baseline; save failure cases (inputs + model outputs + expected).

### 4) Compile/optimize

- Choose one optimizer/teleprompter and a small search budget first.
- Compile on `train`, select by `dev`, report final numbers on `test`.
- Keep prompt/program changes attributable (one change at a time; log configs and seeds when possible).

### 5) Debug systematically

- Classify errors: schema/formatting, missing context, wrong reasoning, hallucination, retrieval, tool failures.
- Add constraints: structured outputs, validation + retry, better instructions, or tighter signatures.
- Only scale complexity (multi-stage, RAG, tools) after the baseline is measurable.

## Local LLM defaults (OpenAI-compatible)

Use a local OpenAI-compatible base URL when available. Prefer configuring via **environment variables** or a single “LM factory” in code.

Minimal pattern (adjust to your DSPy version):

```python
import os
import dspy

# Example OpenAI-compatible local endpoint (adjust as needed)
os.environ.setdefault("OPENAI_API_BASE", "http://localhost:11434/v1")
os.environ.setdefault("OPENAI_API_KEY", "ollama")  # placeholder for local gateways

# Model name depends on your gateway (e.g., "llama3.1", "qwen2.5", etc.).
# DSPy >= 2.5 routes through LiteLLM, so prefix OpenAI-compatible models with "openai/".
lm = dspy.LM(
    "openai/" + os.environ.get("DSPY_MODEL", "qwen3:latest"),
    api_base=os.environ["OPENAI_API_BASE"],
    api_key=os.environ["OPENAI_API_KEY"],
)
dspy.settings.configure(lm=lm)
```

If the repo already has a working local-LLM helper, **reuse it** instead of re-inventing configuration.

## DSPy patterns (keep it simple)

### Classification / extraction

- Use a `Signature` with explicit output fields (and constraints such as allowed labels).
- Add lightweight normalization (strip, lowercase, JSON parsing) and validate outputs.

### RAG (retrieval-augmented generation)

- Start with: retrieve top-k → single generate step referencing the retrieved passages.
- Evaluate retrieval quality (e.g., recall@k) separately from generation quality.

### Tool use

- Keep tool schemas strict (inputs/outputs), validate tool results, and handle retries/timeouts.
- Prefer separating: “decide tool call” → “execute” → “final answer” (see the sketch below).
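A minimal sketch of that split using `dspy.ReAct` (DSPy >= 2.5; verify against the installed version). The `lookup_order` tool is a hypothetical placeholder, not a real API:

```python
import dspy

def lookup_order(order_id: str) -> str:
    """Return the shipping status for an order id."""
    # Hypothetical tool: swap in the real call, validate inputs, and handle
    # timeouts/retries here rather than inside the LM step.
    return f"Order {order_id}: shipped"

# ReAct decides whether to call the tool, executes it, then produces the final answer.
agent = dspy.ReAct("question -> answer", tools=[lookup_order])
pred = agent(question="Where is order 1234?")
print(pred.answer)
```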
## When asked to “learn DSPy and build X”

Follow this order:

1. Inspect the repo’s current DSPy usage (existing modules, eval scripts, LM config).
2. Identify the installed DSPy version (from `pyproject.toml`, lockfile, or import behavior).
3. Build the smallest working baseline and an eval harness (see the end-to-end sketch below).
4. Only then introduce compilation/optimization and extra components (retrieval, tools, multi-step).
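## Minimal end-to-end sketch (baseline → eval → optimize)

A hedged sketch of steps 2–4 of the quick workflow, assuming DSPy >= 2.5 and an LM already configured as in the snippet above. The signature, toy train/dev examples, and exact-match metric are illustrative placeholders; swap in the real task, data splits, and metric:

```python
import dspy

class AnswerQuestion(dspy.Signature):
    """Answer the question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

baseline = dspy.ChainOfThought(AnswerQuestion)

# Toy splits for illustration only; real projects load train/dev/test from files.
trainset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]
devset = [dspy.Example(question="What is 3 + 3?", answer="6").with_inputs("question")]

def exact_match(example, pred, trace=None):
    # Normalize deterministically outside the LM call before comparing.
    return example.answer.strip().lower() == pred.answer.strip().lower()

# Score the baseline on dev first, so every later change is attributable.
evaluate = dspy.Evaluate(devset=devset, metric=exact_match, display_progress=True)
evaluate(baseline)

# One optimizer, small budget: compile on train, select by dev, report final on test.
optimizer = dspy.BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
optimized = optimizer.compile(baseline, trainset=trainset)
evaluate(optimized)
```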