---
name: libeval
description: >
  libeval - RAG evaluation system. Evaluator orchestrates quality assessment
  using LLM-as-judge patterns. CriteriaEvaluator scores responses against
  rubrics. RecallEvaluator measures retrieval performance. TraceEvaluator
  analyzes execution traces. EvalStore persists results. Use for automated
  quality testing, RAG pipeline evaluation, and agent performance testing.
---

# libeval Skill

## When to Use

- Evaluating RAG agent response quality
- Measuring retrieval recall and precision
- Running automated quality assessments
- Benchmarking agent performance over time

## Key Concepts

**Evaluator**: Main orchestrator that runs test cases through the agent and collects metrics.

**CriteriaEvaluator**: Uses LLM-as-judge to score responses against defined criteria and rubrics.

**RecallEvaluator**: Measures how well the retrieval system returns relevant documents.

**TraceEvaluator**: Analyzes execution traces for performance and correctness.

**EvalStore**: Persists evaluation results.

## Usage Patterns

### Pattern 1: Run evaluation suite

```javascript
import { Evaluator } from "@copilot-ld/libeval";

const evaluator = new Evaluator(config);
const results = await evaluator.run(testCases);
console.log(results.summary);
```

### Pattern 2: Criteria-based evaluation

```javascript
import { CriteriaEvaluator } from "@copilot-ld/libeval";

const criteria = new CriteriaEvaluator(llmClient);
const score = await criteria.evaluate(response, rubric);
```

## Integration

Configured via `config/eval.yml`. Run via `make eval`. Uses libllm for LLM-as-judge scoring.
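
The schema of `config/eval.yml` is not spelled out in this skill, so the following is only a minimal sketch; every key shown (`model`, `criteria`, `testCases`) is an assumption, not the documented format:

```yaml
# Hypothetical sketch of config/eval.yml -- the real schema may differ.
model: gpt-4o # judge model handed to libllm (assumed key)
criteria: # rubric definitions for CriteriaEvaluator (assumed key)
  - name: groundedness
    description: Response is supported by the retrieved documents
  - name: completeness
    description: Response addresses every part of the question
testCases: evaluations/cases.jsonl # path to test cases (assumed key)
```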
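
RecallEvaluator is described above but never shown in use. A minimal sketch, assuming a hypothetical `evaluate(retrieved, relevant)` method that compares retrieved document IDs against a human-labeled relevant set:

```javascript
import { RecallEvaluator } from "@copilot-ld/libeval";

// Hypothetical usage -- evaluate() and its signature are assumptions.
const recall = new RecallEvaluator();

// IDs returned by the retrieval pipeline for one query.
const retrieved = ["doc-3", "doc-7", "doc-9"];

// IDs a human labeled as relevant for the same query.
const relevant = ["doc-3", "doc-9", "doc-12"];

// 2 of the 3 relevant documents were retrieved -> recall ≈ 0.67.
const score = await recall.evaluate(retrieved, relevant);
console.log(score);
```

Precision would divide by the number of retrieved documents instead; the "recall and precision" bullet under When to Use suggests both metrics are in scope.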
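
Likewise, EvalStore is only mentioned in passing. A hedged sketch of persisting a run's results, where the constructor and `save()` method are assumptions:

```javascript
import { Evaluator, EvalStore } from "@copilot-ld/libeval";

// Hypothetical wiring -- EvalStore's API is assumed, not documented here.
const evaluator = new Evaluator(config); // config as in Pattern 1
const store = new EvalStore();

const results = await evaluator.run(testCases);

// Persist the run so later runs can be compared against it,
// supporting the "benchmarking agent performance over time" use case.
await store.save(results);
```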