---
name: evaluate-model
description: "Measure model performance on test datasets. Use when assessing accuracy, precision, recall, and other metrics."
mcp_fallback: none
category: ml
tier: 2
user-invocable: false
---

# Evaluate Model

Measure machine learning model performance using appropriate metrics for the task (classification, regression, etc.).

## When to Use

- Comparing different model architectures
- Assessing performance on test/validation datasets
- Detecting overfitting or underfitting
- Reporting model accuracy for papers and documentation

## Quick Reference

```mojo
# Mojo model evaluation pattern
struct ModelEvaluator:
    fn evaluate_classification(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32, Float32]:
        # Returns accuracy, precision, recall
        ...

    fn evaluate_regression(
        mut self,
        predictions: ExTensor,
        ground_truth: ExTensor
    ) -> Tuple[Float32, Float32]:
        # Returns MSE, MAE
        ...
```

## Workflow

1. **Load test data**: Prepare the test/validation dataset
2. **Generate predictions**: Run model inference on the test set
3. **Select metrics**: Choose metrics appropriate to the task (accuracy, precision, recall, F1, AUC, MSE, etc.)
4. **Calculate metrics**: Compute the selected performance metrics
5. **Analyze results**: Compare against the baseline and identify strengths and weaknesses

## Output Format

Evaluation report:

- Task type (classification, regression, etc.)
- Metrics (accuracy, precision, recall, F1, AUC, etc.)
- Per-class breakdown (if applicable)
- Comparison to baseline model
- Confusion matrix (classification)
- Error analysis

## References

- See CLAUDE.md > Language Preference (Mojo for ML models)
- See `train-model` skill for model training
- See `/notes/review/mojo-ml-patterns.md` for Mojo tensor operations
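
## Illustrative Metric Sketches

The `ModelEvaluator` pattern in the Quick Reference leaves the metric math to the implementation. As a rough sketch of what `evaluate_classification` computes, the standalone function below derives accuracy, precision, and recall for binary {0, 1} labels. This is a minimal illustration under stated assumptions, not this project's code: it uses plain `List[Int]` values in place of `ExTensor`, and the names `classification_metrics`, `preds`, and `truth` are hypothetical.

```mojo
from collections import List


fn classification_metrics(
    predictions: List[Int], ground_truth: List[Int]
) -> Tuple[Float32, Float32, Float32]:
    # Count confusion-matrix cells for the positive class (label 1).
    var true_pos = 0
    var false_pos = 0
    var false_neg = 0
    var correct = 0
    for i in range(len(predictions)):
        if predictions[i] == ground_truth[i]:
            correct += 1
        if predictions[i] == 1 and ground_truth[i] == 1:
            true_pos += 1
        elif predictions[i] == 1 and ground_truth[i] == 0:
            false_pos += 1
        elif predictions[i] == 0 and ground_truth[i] == 1:
            false_neg += 1

    var accuracy = Float32(correct) / Float32(len(predictions))
    # Guard against division by zero when a class never appears.
    var precision: Float32 = 0.0
    if true_pos + false_pos > 0:
        precision = Float32(true_pos) / Float32(true_pos + false_pos)
    var recall: Float32 = 0.0
    if true_pos + false_neg > 0:
        recall = Float32(true_pos) / Float32(true_pos + false_neg)
    return (accuracy, precision, recall)


fn main():
    # Toy labels: predictions [1, 0, 1, 1] vs. ground truth [1, 0, 0, 1].
    var preds = List[Int]()
    preds.append(1)
    preds.append(0)
    preds.append(1)
    preds.append(1)
    var truth = List[Int]()
    truth.append(1)
    truth.append(0)
    truth.append(0)
    truth.append(1)
    var metrics = classification_metrics(preds, truth)
    print("accuracy:", metrics[0])   # 0.75
    print("precision:", metrics[1])  # ~0.667
    print("recall:", metrics[2])     # 1.0
```

The division-by-zero guards matter in practice: on a test split where the model never predicts the positive class, precision is undefined, and reporting 0 (or skipping the metric) is a deliberate choice that the evaluation report should note.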
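
A similar illustrative sketch for `evaluate_regression`, again assuming plain `List[Float32]` values in place of `ExTensor`; the function name `regression_metrics` is hypothetical. It returns the MSE and MAE named in the Quick Reference.

```mojo
from collections import List


fn regression_metrics(
    predictions: List[Float32], ground_truth: List[Float32]
) -> Tuple[Float32, Float32]:
    # Accumulate squared and absolute errors, then average.
    var squared_sum: Float32 = 0.0
    var absolute_sum: Float32 = 0.0
    var n = len(predictions)
    for i in range(n):
        var error = predictions[i] - ground_truth[i]
        squared_sum += error * error
        if error < 0.0:
            absolute_sum -= error
        else:
            absolute_sum += error
    return (squared_sum / Float32(n), absolute_sum / Float32(n))


fn main():
    # Toy values: predictions [2.5, 0.0, 2.0] vs. targets [3.0, -0.5, 2.0].
    var preds = List[Float32]()
    preds.append(2.5)
    preds.append(0.0)
    preds.append(2.0)
    var targets = List[Float32]()
    targets.append(3.0)
    targets.append(-0.5)
    targets.append(2.0)
    var metrics = regression_metrics(preds, targets)
    print("MSE:", metrics[0])  # (0.25 + 0.25 + 0.0) / 3 ≈ 0.167
    print("MAE:", metrics[1])  # (0.5 + 0.5 + 0.0) / 3 ≈ 0.333
```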