---
name: llm-evaluation
author: JM Labs (Javier Montaño)
version: 1.0.0
description: >
  Model output quality assessment, hallucination detection, benchmark
  suites. [EXPLICIT] Trigger: "llm evaluation"
allowed-tools:
  - Read
  - Write
  - Glob
  - Grep
  - Bash
---

# LLM Evaluation

> "Method over hacks."

## TL;DR

Model output quality assessment, hallucination detection, benchmark suites. [EXPLICIT]

## Procedure

### Step 1: Discover
- Gather the model outputs, reference data, and evaluation requirements

### Step 2: Analyze
- Select metrics and benchmarks; evaluate options per Constitution XIII/XIV

### Step 3: Execute
- Run the evaluation and tag findings with evidence (see the example sketches at the end of this card)

### Step 4: Validate
- Verify that the quality criteria below are met

## Quality Criteria

- [ ] Evidence tags applied
- [ ] Constitution-compliant
- [ ] Actionable output

## Usage

Example invocations:
- `/llm-evaluation`: run the full LLM evaluation workflow
- "llm evaluation on this project": apply the skill to the current context

## Assumptions & Limits

- Assumes access to project artifacts (code, docs, configs) [EXPLICIT]
- Requires English-language output unless otherwise specified [EXPLICIT]
- Does not replace domain expert judgment for final decisions [EXPLICIT]

## Edge Cases

| Scenario | Handling |
|----------|----------|
| Empty or minimal input | Request clarification before proceeding |
| Conflicting requirements | Flag conflicts explicitly, propose resolution |
| Out-of-scope request | Redirect to appropriate skill or escalate |
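
## Example Sketch: Benchmark Scoring

A minimal sketch of the Execute step for the benchmark-suite case. This card does not prescribe a harness, so everything here is an assumption: the JSONL file layout (`prompt` and `expected` fields) and the `run_model` stub are hypothetical placeholders for the model under test.

```python
"""Minimal benchmark scoring loop (illustrative sketch, not a prescribed harness)."""
import json
import string


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and surrounding whitespace."""
    text = text.lower().strip()
    return text.translate(str.maketrans("", "", string.punctuation))


def exact_match(prediction: str, reference: str) -> bool:
    """Score 1 if the normalized strings are identical, else 0."""
    return normalize(prediction) == normalize(reference)


def run_model(prompt: str) -> str:
    """Hypothetical stub; wire in the actual model client here."""
    raise NotImplementedError


def score_benchmark(path: str) -> float:
    """Return exact-match accuracy over a JSONL file of test cases.

    Each line is assumed to be an object like:
    {"prompt": "...", "expected": "..."}
    """
    correct, total = 0, 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            case = json.loads(line)
            prediction = run_model(case["prompt"])
            correct += exact_match(prediction, case["expected"])
            total += 1
    return correct / total if total else 0.0
```

Exact match is deliberately the simplest possible metric; swap in task-appropriate scoring (F1, rubric grading, model-as-judge) as the Analyze step dictates.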
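
## Example Sketch: Hallucination Flagging

A naive sketch for the hallucination-detection case: flag output sentences whose content words are poorly grounded in the source context. The tokenization, stopword list, and 0.5 threshold are assumptions, not part of this card; a word-overlap heuristic only surfaces candidates and does not replace stronger methods (NLI checks, citation verification, expert review per the limits above).

```python
"""Naive unsupported-claim flag via content-word overlap (illustrative sketch)."""
import re

# Small illustrative stopword list; a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "of",
             "to", "in", "and", "or", "that", "this", "it", "on"}


def content_words(text: str) -> set[str]:
    """Extract lowercase alphanumeric tokens, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def flag_unsupported(output: str, context: str,
                     threshold: float = 0.5) -> list[str]:
    """Return output sentences whose content-word overlap with the
    context falls below the (assumed) threshold."""
    grounding = content_words(context)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", output.strip()):
        words = content_words(sentence)
        if not words:
            continue
        overlap = len(words & grounding) / len(words)
        if overlap < threshold:
            flagged.append(sentence)
    return flagged
```

Usage: `flag_unsupported(model_answer, source_document)` returns the sentences to review; tag each flagged sentence with evidence before reporting it, per the Quality Criteria.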