# DEC-20260106-02 - Data integrity verify

Status: Accepted

Date: 2026-01-06

## Goal

Add automated verification that catches alias/taxonomy drift and encoding-related read issues before merges.

## Intended outcome

- `tools/verify-breed-data.ps1` and `tools/verify-breed-data.sh` return exit code 1 if any of the following are true:
  - Any canonical key in `data/breed_aliases.json` (excluding keys starting with `_`) does not exist as a top-level key in `data/breed_taxonomy.json`.
  - Any alias string (case-insensitive, trimmed) appears under more than one canonical key.
- Windows PowerShell reads JSON as UTF-8 explicitly to prevent false failures caused by encoding issues (e.g., `"Vend\u00e9en"` becoming `"Vend\u00c3\u00a9en"`).

## Non-goals

- Fixing historical data problems inside the JSON files as part of this decision.
- Adding a full schema validator for every data file.

## Constraints

- Must run on Windows (PowerShell 7) without requiring a separate runtime.
- Must not rely on network access.

## Acceptance criteria

- The verification scripts fail fast with a clear error message and non-zero exit code.
- The verification scripts pass on a clean repo checkout on Windows and Linux/macOS.

## Rollback plan

- Remove the verification scripts and any workflow hooks that call them.
- Revert the commit(s) that introduced the checks.

## Notes

This is intentionally minimal: it protects core cross-file relationships and prevents silent drift.
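
The two failure conditions and the encoding pitfall described under "Intended outcome" can be illustrated with a short sketch. It is written in Python purely to show the logic and is not the content of `tools/verify-breed-data.ps1` or `tools/verify-breed-data.sh`. It assumes that `data/breed_aliases.json` maps each canonical key to a list of alias strings; this decision record does not specify that shape. The function names (`load_json_utf8`, `find_problems`, `exit_code_for`, `misread_utf8_as_latin1`) are hypothetical.

```python
import json
from pathlib import Path


def load_json_utf8(path):
    """Read a JSON file, decoding it explicitly as UTF-8 instead of a platform default."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def find_problems(aliases, taxonomy):
    """Return a list of problem messages; an empty list means both checks passed.

    Assumed shape (not confirmed by the decision record):
    aliases  = {"Canonical Key": ["alias one", "alias two"], "_meta": ...}
    taxonomy = {"Canonical Key": {...}, ...}
    """
    problems = []
    first_owner = {}  # normalized alias -> canonical key that first listed it

    for canonical, alias_list in aliases.items():
        if canonical.startswith("_"):
            continue  # keys starting with "_" are excluded from the checks

        # Check 1: every canonical key must be a top-level taxonomy key.
        if canonical not in taxonomy:
            problems.append(f"canonical key missing from taxonomy: {canonical}")

        # Check 2: an alias (trimmed, case-insensitive) may belong to only one key.
        for alias in alias_list:
            normalized = alias.strip().casefold()
            owner = first_owner.setdefault(normalized, canonical)
            if owner != canonical:
                problems.append(
                    f"alias '{normalized}' appears under both '{owner}' and '{canonical}'"
                )
    return problems


def exit_code_for(problems):
    """Map the check result to the exit code described above: 1 on any problem, else 0."""
    return 1 if problems else 0


def misread_utf8_as_latin1(text):
    """Reproduce the mojibake that appears when UTF-8 bytes are decoded as Latin-1."""
    return text.encode("utf-8").decode("latin-1")
```

A caller would load both files with `load_json_utf8`, print each message returned by `find_problems`, and exit with `exit_code_for(problems)`. The `misread_utf8_as_latin1` helper only demonstrates why the explicit UTF-8 read matters: a wrongly decoded alias no longer matches its correctly decoded counterpart, which can surface as a false failure.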