---
name: llm-evaluation
author: JM Labs (Javier Montaño)
version: 1.0.0
description: >
  Model output quality assessment, hallucination detection, benchmark
  suites. [EXPLICIT] Trigger: "llm evaluation"
allowed-tools:
  - Read
  - Write
  - Glob
  - Grep
  - Bash
---

# LLM Evaluation

> "Method over hacks."

## TL;DR

Model output quality assessment, hallucination detection, benchmark suites. [EXPLICIT]

## Procedure

### Step 1: Discover
- Gather the model outputs, reference data, and evaluation requirements

### Step 2: Analyze
- Select metrics and benchmarks; evaluate options per Constitution XIII/XIV

### Step 3: Execute
- Run the evaluation and tag findings with evidence (see the example sketches at the end of this card)

### Step 4: Validate
- Verify that the quality criteria below are met

## Quality Criteria

- [ ] Evidence tags applied
- [ ] Constitution-compliant
- [ ] Actionable output

## Usage

Example invocations:
- `/llm-evaluation`: run the full LLM evaluation workflow
- "llm evaluation on this project": apply the skill to the current context

## Assumptions & Limits

- Assumes access to project artifacts (code, docs, configs) [EXPLICIT]
- Requires English-language output unless otherwise specified [EXPLICIT]
- Does not replace domain expert judgment for final decisions [EXPLICIT]

## Edge Cases

| Scenario | Handling |
|----------|----------|
| Empty or minimal input | Request clarification before proceeding |
| Conflicting requirements | Flag conflicts explicitly, propose resolution |
| Out-of-scope request | Redirect to appropriate skill or escalate |
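
## Example Sketch: Benchmark Scoring

A minimal sketch of the Execute step for the benchmark-suite case. This card does not prescribe a harness, so everything here is an assumption: the JSONL file layout (`prompt` and `expected` fields) and the `run_model` stub are hypothetical placeholders for the model under test.

```python
"""Minimal benchmark scoring loop (illustrative sketch, not a prescribed harness)."""
import json
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and surrounding whitespace."""
    text = text.lower().strip()
    return text.translate(str.maketrans("", "", string.punctuation))


def exact_match(prediction: str, reference: str) -> bool:
    """Score 1 if the normalized strings are identical, else 0."""
    return normalize(prediction) == normalize(reference)


def run_model(prompt: str) -> str:
    """Hypothetical stub; wire in the actual model client here."""
    raise NotImplementedError


def score_benchmark(path: str) -> float:
    """Return exact-match accuracy over a JSONL file of test cases.

    Each line is assumed to be an object like:
    {"prompt": "...", "expected": "..."}
    """
    correct, total = 0, 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            case = json.loads(line)
            prediction = run_model(case["prompt"])
            correct += exact_match(prediction, case["expected"])
            total += 1
    return correct / total if total else 0.0
```

Exact match is deliberately the simplest possible metric; swap in task-appropriate scoring (F1, rubric grading, model-as-judge) as the Analyze step dictates.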
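
## Example Sketch: Hallucination Flagging

A naive sketch for the hallucination-detection case: flag output sentences whose content words are poorly grounded in the source context. The tokenization, stopword list, and 0.5 threshold are assumptions, not part of this card; a word-overlap heuristic only surfaces candidates and does not replace stronger methods (NLI checks, citation verification, expert review per the limits above).

```python
"""Naive unsupported-claim flag via content-word overlap (illustrative sketch)."""
import re

# Small illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "of",
             "to", "in", "and", "or", "that", "this", "it", "on"}


def content_words(text: str) -> set[str]:
    """Extract lowercase alphanumeric tokens, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def flag_unsupported(output: str, context: str,
                     threshold: float = 0.5) -> list[str]:
    """Return output sentences whose content-word overlap with the
    context falls below the (assumed) threshold."""
    grounding = content_words(context)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", output.strip()):
        words = content_words(sentence)
        if not words:
            continue
        overlap = len(words & grounding) / len(words)
        if overlap < threshold:
            flagged.append(sentence)
    return flagged
```

Usage: `flag_unsupported(model_answer, source_document)` returns the sentences to review; tag each flagged sentence with evidence before reporting it, per the Quality Criteria.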