---
name: ml-model-eval-benchmark
description: "Compare model candidates using weighted metrics and deterministic ranking outputs. Use for benchmark leaderboards and model promotion decisions."
---

# ML Model Eval Benchmark

## Overview

Produce consistent model ranking outputs from metric-weighted evaluation inputs.

## Workflow

1. Define metric weights and accepted metric ranges.
2. Ingest model metrics for each candidate.
3. Compute weighted score and ranking.
4. Export leaderboard and promotion recommendation.

## Use Bundled Resources

- Run `scripts/benchmark_models.py` to generate benchmark outputs.
- Read `references/benchmarking-guide.md` for weighting and tie-break guidance.

## Guardrails

- Keep metric names and scales consistent across candidates.
- Record weighting assumptions in the output.