# Classifier-Style Agents

Use this reference for agents that extract structure, rank candidates, classify actions, route work, moderate content, or gate automation.

## Review Sequence

1. Load the domain contract before reviewing prompt wording.
2. Separate deterministic stages from model-driven stages.
3. Check retrieval or candidate quality before blaming the model.
4. Check thresholds, post-model validation, and human-review boundaries.
5. Require slice-based evals before recommending larger architecture changes.

If the system has no written schema, taxonomy, or action set, flag that gap first.

## Common Failure Types

- false-positive final action
- false negative or missed match
- wrong abstain or over-escalation
- bad candidate set or missing evidence
- schema drift or normalization mismatch
- overconfident outputs on weak evidence
- threshold regression after prompt changes
- cost or latency regressions from unnecessary tool use
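Labeling eval results against the failure types above works better as a mechanical rule than as a hand judgment. Below is a minimal sketch that maps each (predicted, gold) pair to an outcome. It assumes a hypothetical three-way action set (`match`, `no_match`, `manual_review`). The `EvalCase` shape and `labelOutcome` helper are invented for illustration.

```typescript
// Hypothetical action set for illustration; substitute your system's real actions.
type Action = "match" | "no_match" | "manual_review";

type Outcome =
  | "correct"
  | "false_positive"   // matched when it should not have, or matched the wrong item
  | "false_negative"   // returned no_match when a match existed
  | "wrong_abstain"    // returned no_match when the case needed human review
  | "over_escalation"; // sent a case to review that had a clear answer

interface EvalCase {
  predicted: Action;
  predictedId: string | null;
  gold: Action;
  goldId: string | null;
}

function labelOutcome(c: EvalCase): Outcome {
  // Ids only matter for matches; the other actions carry no target item.
  const sameTarget = c.predicted !== "match" || c.predictedId === c.goldId;
  if (c.predicted === c.gold && sameTarget) return "correct";
  switch (c.predicted) {
    case "match":
      return "false_positive";
    case "no_match":
      return c.gold === "match" ? "false_negative" : "wrong_abstain";
    case "manual_review":
      return "over_escalation";
  }
}
```

Counting these outcomes per slice shows a threshold regression as a shift between `false_positive` and `over_escalation`, even when overall accuracy stays flat.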

## Design Rules

- Keep the action set explicit and mutually exclusive.
- Add an abstain, `no_match`, or manual-review path when evidence can be weak.
- Keep candidate sets structured and comparable on decisive fields.
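- Bias runtime policy against costly false positives: when evidence is weak, abstain or escalate instead of auto-acting.
- Include a confidence field only when it drives thresholds or downstream policy, and define what its values mean.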
- Validate model outputs after generation, not only in the prompt.
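Several of these rules can be enforced in code after the model responds. The sketch below does three things:

- validates a raw model output against an explicit action set;
- rejects any candidate id that was not in the candidate set the model received;
- routes sub-threshold matches to manual review.

The `Decision` shape, the `validateDecision` and `review` helpers, and the `0.9` threshold are hypothetical. They illustrate the pattern and are not recommended values.

```typescript
// Hypothetical action set: explicit, mutually exclusive, with abstain and review paths.
type Decision =
  | { action: "match"; candidateId: string; confidence: number }
  | { action: "no_match"; confidence: number }
  | { action: "manual_review"; reason: string };

// Hypothetical policy: auto-act on a match only at or above this confidence.
const AUTO_MATCH_THRESHOLD = 0.9;

function review(reason: string): Decision {
  return { action: "manual_review", reason };
}

function validateDecision(raw: unknown, candidateIds: ReadonlySet<string>): Decision {
  if (typeof raw !== "object" || raw === null) return review("unparseable output");
  const r = raw as Record<string, unknown>;
  const action = r.action;
  const confidence = r.confidence;
  const candidateId = r.candidateId;

  // Confidence must be a number in [0, 1] (NaN and Infinity fail this check).
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    return review("missing or out-of-range confidence");
  }
  if (action === "no_match") return { action: "no_match", confidence };
  if (action !== "match") return review("unknown action");

  // Never trust an id the model produced unless it came from the candidate set.
  if (typeof candidateId !== "string" || !candidateIds.has(candidateId)) {
    return review("candidate id not in candidate set");
  }
  // Bias against costly false positives: weak matches go to a human.
  if (confidence < AUTO_MATCH_THRESHOLD) return review("match below auto-action threshold");
  return { action: "match", candidateId, confidence };
}
```

Because every validation failure falls back to `manual_review`, malformed output never reaches an automated action. The cost is more queue load, so track that trade-off in evals.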

## Bottleneck Clues

| Symptom | Likely bottleneck |
| --- | --- |
| The model chooses the wrong item from a good candidate set | Decision policy, prompt clarity, tool descriptions, or calibration |
| The right item is not present when the model decides | Retrieval, normalization, or candidate generation |
| Model output looks good but persisted state is wrong | Output schema, validation, or integration bugs |
| Confidence is high on weak evidence | Thresholding, confidence semantics, or missing abstain rules |
| Queue load spikes after an "accuracy" change | Threshold regression, auto-action policy, or precision/recall imbalance |
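The first two rows of the table can be told apart mechanically from traces. If the gold item never appeared in the candidate set, the decision step could not have succeeded. Below is a minimal sketch. It assumes each eval row records three things: the gold id, the candidate ids shown to the model, and the id the model picked. `TraceRow`, `attribute`, and `summarize` are hypothetical names.

```typescript
// Hypothetical eval row; predictedId is null when the model returned no match.
interface TraceRow {
  goldId: string | null;
  candidateIds: string[];
  predictedId: string | null;
}

type Bottleneck = "ok" | "retrieval" | "decision";

function attribute(row: TraceRow): Bottleneck {
  if (row.predictedId === row.goldId) return "ok";
  // The right item never reached the model: blame retrieval or candidate generation.
  if (row.goldId !== null && !row.candidateIds.includes(row.goldId)) return "retrieval";
  // The right item was available (or nothing should match): blame the decision step.
  return "decision";
}

function summarize(rows: TraceRow[]): Record<Bottleneck, number> {
  const counts: Record<Bottleneck, number> = { ok: 0, retrieval: 0, decision: 0 };
  for (const row of rows) counts[attribute(row)] += 1;
  return counts;
}
```

A high `retrieval` count is a reason to fix candidate generation before touching prompt wording.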

## Repo Anchor: Peated Bottle Matcher

When the task is about the current bottle matcher or label extractor, read:

- `docs/development/schema-conventions.md`
- `apps/server/src/agents/whisky/guidance.ts`
- `apps/server/src/agents/priceMatch/classifyStorePriceMatch.ts`
- `apps/server/src/lib/priceMatchingProposals.ts`
- `apps/server/src/schemas/priceMatches.ts`
- `apps/server/src/lib/priceMatching.test.ts`
- `apps/server/src/schemas/priceMatches.test.ts`

Then check:

- extraction conservatism: prefer `null` or `[]` over guessing
- decisive identity fields: producer, distillery, expression, series, edition, age, cask flags, ABV, and years
- candidate generation before web search
- action set boundaries: `match_existing`, `correction`, `create_new`, `no_match`
- confidence normalization and automation thresholds
- server-side sanitization of ids and proposed entities
- non-whisky rejection and human-review boundaries
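The sketch below illustrates two of these checks:

- server-side sanitization of extracted fields, where blank or out-of-range values become `null` and lists are deduplicated, so nothing is guessed;
- confidence normalization.

The field names and numeric bounds are hypothetical. So is the assumption that a model might report confidence on either a 0 to 1 or a 0 to 100 scale. The actual schema and rules live in the files listed above, so check those before reusing any of this.

```typescript
// Hypothetical extracted-label shape; field names are illustrative only.
interface ExtractedLabel {
  producer: string | null;
  expression: string | null;
  age: number | null;
  abv: number | null;
  caskTypes: string[];
}

// Blank or non-string values become null instead of a guess.
function cleanString(v: unknown): string | null {
  if (typeof v !== "string") return null;
  const t = v.trim();
  return t.length > 0 ? t : null;
}

// Numbers outside the given range become null rather than being clamped.
function cleanNumber(v: unknown, min: number, max: number, integer = false): number | null {
  if (typeof v !== "number" || !Number.isFinite(v)) return null;
  if (v < min || v > max) return null;
  if (integer && !Number.isInteger(v)) return null;
  return v;
}

function sanitizeLabel(raw: Record<string, unknown>): ExtractedLabel {
  const rawCasks = raw.caskTypes;
  const casks: unknown[] = Array.isArray(rawCasks) ? rawCasks : [];
  const cleanedCasks = casks
    .map((c) => cleanString(c))
    .filter((c): c is string => c !== null);
  return {
    producer: cleanString(raw.producer),
    expression: cleanString(raw.expression),
    age: cleanNumber(raw.age, 0, 100, true), // hypothetical bounds
    abv: cleanNumber(raw.abv, 0, 100), // ABV is a percentage
    caskTypes: Array.from(new Set(cleanedCasks)), // deduplicated; [] if nothing valid
  };
}

// Hypothetical: accept 0..1 or 0..100 and map both to [0, 1].
// A raw value of exactly 1 is read as 1.0, not 1%.
function normalizeConfidence(v: unknown): number | null {
  if (typeof v !== "number" || !Number.isFinite(v) || v < 0) return null;
  const scaled = v > 1 ? v / 100 : v;
  return scaled <= 1 ? scaled : null;
}
```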

## Eval Minimum

Require:

- confusion-style breakdown by action or class
- hard-slice examples for decisive error modes
- trace or tool-call review
- cost, latency, and tool-usage metrics
- before vs after comparison for proposed changes
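A confusion-style breakdown takes little code once each eval case records a gold and a predicted action. The minimal sketch below uses the bottle matcher's action labels. The helper names `confusion` and `precisionRecall` are hypothetical. Run it on the same eval set before and after a change and compare per-action precision and recall. Precision on automated actions deserves the closest look, because false positives are the costly errors.

```typescript
const ACTIONS = ["match_existing", "correction", "create_new", "no_match"] as const;
type Action = (typeof ACTIONS)[number];

// Rows are gold labels, columns are predicted labels.
type Confusion = Record<Action, Record<Action, number>>;

function confusion(pairs: Array<{ gold: Action; predicted: Action }>): Confusion {
  const m = {} as Confusion;
  for (const g of ACTIONS) {
    const row = {} as Record<Action, number>;
    for (const p of ACTIONS) row[p] = 0;
    m[g] = row;
  }
  for (const { gold, predicted } of pairs) m[gold][predicted] += 1;
  return m;
}

// Reports 0 when a metric is undefined (no predictions or no gold cases for `a`).
function precisionRecall(m: Confusion, a: Action): { precision: number; recall: number } {
  const tp = m[a][a];
  let predictedA = 0;
  let goldA = 0;
  for (const x of ACTIONS) {
    predictedA += m[x][a]; // column: everything predicted as `a`
    goldA += m[a][x]; // row: everything whose gold label is `a`
  }
  return {
    precision: predictedA === 0 ? 0 : tp / predictedA,
    recall: goldA === 0 ? 0 : tp / goldA,
  };
}
```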