## Architecture rules (non-negotiable)

- **Typed contracts are non-negotiable.** Never widen a contract field to `dict[str, Any]` or `Any`. Every module output is a typed Pydantic `BaseModel`.
- **No fallback evaluators.** Unmapped criteria surface as `UNMAPPED` in `CriterionEvaluation.result`. Do not add a default/catch-all evaluator.
- **Each review-data field is written from exactly one source.** The assembler maps one module → one field. No merging, no fan-in.
- **Evaluator technique is private.** Rule, LLM, or hybrid — all implement the same `BaseEvaluator.evaluate()` interface. The tree evaluator does not care how the decision was made.
- **`required_fields` is a contract.** List the field path(s) on `BaseReviewData` that your evaluator needs. The tree evaluator enforces non-None before calling `evaluate()`. Do not add None-guards inside evaluators for required fields.

## Key architectural boundaries

- **`BaseExtractionModule[TOutput]`** — typed generic; `output_schema` is resolved at class-definition time via `__init_subclass__`. Always parametrize with a concrete `BaseModel` subclass.
- **`@register_evaluator(*criterion_codes)`** — exact-match registry; one evaluator per criterion code; raises on duplicate. Registration triggers at Django app-ready time via `evaluation/evaluators/__init__.py`.
- **`required_fields` contract** — declared as `ClassVar[list[str]]` on `BaseEvaluator`. The tree evaluator checks each dotted path against `data` before calling the evaluator. `None` at any point in the path = `INSUFFICIENT_INFO`.
- **No `dict[str, Any]`** — enforced everywhere: module contracts, review data, assembler, evaluator inputs. Use `model_dump()` / `model_validate()` at DB boundaries only.

## LLM determinism

`temperature=0` helps but does not guarantee identical output across API versions. Cached fixtures in `fixtures/cached_llm_responses/` paper over this for tests.
Do not delete or hand-edit those files — regenerate them via `scripts/generate_fixtures.py`. The provider is env-driven (`LLM_PROVIDER`): anthropic (default), gemini, openai, groq, openrouter. Prompts must produce strict JSON that parses cleanly across providers — the contract is the Pydantic schema, not any one provider's quirks. All provider paths pass `temperature=0`.

## Cache key lifecycle

Cache keys are derived from the *data*, not the prompt. The extraction cache key now includes a prompt hash so prompt edits miss the cache cleanly — but evaluator cache keys do NOT include the evaluator version. If you change an evaluator's system prompt, bump the `cache_key_prefix` OR delete the cached file by hand OR re-run `scripts/generate_fixtures.py`.

### Provider-aware cache layout

Cache files live under `fixtures/cached_llm_responses/`. When `LLM_PROVIDER` is set, the client scopes reads/writes into a provider subdirectory:

- Read order (`LLM_MODE=cache`): `fixtures/cached_llm_responses/<provider>/<cache_key>.json`, falling back to `fixtures/cached_llm_responses/<cache_key>.json` (the shared baseline shipped in the repo). A total miss raises `LLMCacheMiss` naming both paths.
- Write location (`LLM_MODE=record`): `fixtures/cached_llm_responses/<provider>/<cache_key>.json` when `LLM_PROVIDER` is set, else the shared top-level path. This keeps provider-specific recordings from clobbering the shared baseline.
- If `LLM_PROVIDER` is unset, behavior matches the pre-existing shared layout exactly.

## Running tests

```bash
make test
```

Tests run with `LLM_MODE=cache` by default (set in `tests/conftest.py`). They never hit the network.