---
name: dspy-expert
description: Builds and optimizes DSPy (dspy) programs end-to-end: signatures, modules, compilation/optimization, evaluation, and debugging. Use when the user mentions dspy/DSPy, Signature, Module, teleprompter/optimizer, compile, evaluate, few-shot, RAG, tool use, or local LLM endpoints (Ollama/vLLM/LM Studio).
---

# DSPy Expert

## Operating principles

- Default to **simple DSPy programs**: one clear `Signature`, one/few modules, tight eval loop.
- Prefer **measurable iteration**: dataset → metric → compile/optimize → evaluate → inspect failures → repeat.
- Assume **local OpenAI-compatible endpoints** by default (Ollama/vLLM/LM Studio). If not available, adapt.
- If an API detail is uncertain, **check the installed DSPy version** and confirm via docs/examples before coding.

## Quick workflow (copy/paste checklist)

### 1) Frame the task

- What is the **input**? What is the **output**?
- What are 10–200 **representative examples** (or how do we generate them)?
- What metric defines “good”? (exact match, F1, rubric judge, retrieval hit-rate, latency, cost)

### 2) Create a minimal baseline

- Define a `Signature` with the smallest useful fields.
- Implement the simplest module that can work (often `Predict`, `ChainOfThought`, or a tiny custom `Module`).
- Add deterministic pre/post-processing outside DSPy when helpful (parsing, normalization, schema validation).

### 3) Build an evaluation harness

- Create `train` / `dev` / `test` splits (even if small).
- Implement `metric(example, pred, trace=None) -> float | bool` (optimizers pass the extra `trace` argument).
- Run the baseline; save failure cases (inputs + model outputs + expected).

### 4) Compile/optimize

- Choose one optimizer/teleprompter and a small search budget first.
- Compile on `train`, select by `dev`, report final numbers on `test`.
- Keep prompt/program changes attributable (one change at a time; log configs and seeds when possible).

### 5) Debug systematically

- Classify errors: schema/formatting, missing context, wrong reasoning, hallucination, retrieval, tool failures.
- Add constraints: structured outputs, validation + retry, better instructions, or tighter signatures.
- Only scale complexity (multi-stage, RAG, tools) after the baseline is measurable.

## Local LLM defaults (OpenAI-compatible)

Use a local OpenAI-compatible base URL when available. Prefer configuring via **environment variables** or a single “LM factory” in code.

Minimal pattern (adjust to your DSPy version):

```python
import os
import dspy

# Example OpenAI-compatible local endpoint (adjust as needed)
os.environ.setdefault("OPENAI_API_BASE", "http://localhost:11434/v1")
os.environ.setdefault("OPENAI_API_KEY", "ollama")  # placeholder for local gateways

# Model name depends on your gateway (e.g., "llama3.1", "qwen2.5", etc.).
# DSPy >= 2.5 routes through LiteLLM, so prefix OpenAI-compatible models with "openai/".
lm = dspy.LM(
    "openai/" + os.environ.get("DSPY_MODEL", "qwen3:latest"),
    api_base=os.environ["OPENAI_API_BASE"],
    api_key=os.environ["OPENAI_API_KEY"],
)
dspy.settings.configure(lm=lm)
```

If the repo already has a working local-LLM helper, **reuse it** instead of re-inventing configuration.

## DSPy patterns (keep it simple)

### Classification / extraction

- Use a `Signature` with explicit output fields (and constraints such as allowed labels).
- Add lightweight normalization (strip, lowercase, JSON parsing) and validate outputs.

### RAG (retrieval-augmented generation)

- Start with: retrieve top-k → single generate step referencing the retrieved passages.
- Evaluate retrieval quality (e.g., recall@k) separately from generation quality.

### Tool use

- Keep tool schemas strict (inputs/outputs), validate tool results, and handle retries/timeouts.
- Prefer separating: “decide tool call” → “execute” → “final answer” (see the sketch below).
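A minimal sketch of that split using `dspy.ReAct` (DSPy >= 2.5; verify against the installed version). The `lookup_order` tool is a hypothetical placeholder, not a real API:

```python
import dspy

def lookup_order(order_id: str) -> str:
    """Return the shipping status for an order id."""
    # Hypothetical tool: swap in the real call, validate inputs, and handle
    # timeouts/retries here rather than inside the LM step.
    return f"Order {order_id}: shipped"

# ReAct decides whether to call the tool, executes it, then produces the final answer.
agent = dspy.ReAct("question -> answer", tools=[lookup_order])
pred = agent(question="Where is order 1234?")
print(pred.answer)
```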
## When asked to “learn DSPy and build X”

Follow this order:

1. Inspect the repo’s current DSPy usage (existing modules, eval scripts, LM config).
2. Identify the installed DSPy version (from `pyproject.toml`, lockfile, or import behavior).
3. Build the smallest working baseline and an eval harness (see the end-to-end sketch below).
4. Only then introduce compilation/optimization and extra components (retrieval, tools, multi-step).
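## Minimal end-to-end sketch (baseline → eval → optimize)

A hedged sketch of steps 2–4 of the quick workflow, assuming DSPy >= 2.5 and an LM already configured as in the snippet above. The signature, toy train/dev examples, and exact-match metric are illustrative placeholders; swap in the real task, data splits, and metric:

```python
import dspy

class AnswerQuestion(dspy.Signature):
    """Answer the question concisely."""
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()

baseline = dspy.ChainOfThought(AnswerQuestion)

# Toy splits for illustration only; real projects load train/dev/test from files.
trainset = [dspy.Example(question="What is 2 + 2?", answer="4").with_inputs("question")]
devset = [dspy.Example(question="What is 3 + 3?", answer="6").with_inputs("question")]

def exact_match(example, pred, trace=None):
    # Normalize deterministically outside the LM call before comparing.
    return example.answer.strip().lower() == pred.answer.strip().lower()

# Score the baseline on dev first, so every later change is attributable.
evaluate = dspy.Evaluate(devset=devset, metric=exact_match, display_progress=True)
evaluate(baseline)

# One optimizer, small budget: compile on train, select by dev, report final on test.
optimizer = dspy.BootstrapFewShot(metric=exact_match, max_bootstrapped_demos=4)
optimized = optimizer.compile(baseline, trainset=trainset)
evaluate(optimized)
```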