# Agents.md ― spec _version: **0.3.3** (2025-07-21) > **TL;DR** > `anml-exp` is a *pip-installable* anomaly-detection playground with locked > dependencies (`uv.lock`), artefact registry, strict dataset hashing, and a > structured hardware descriptor in benchmark outputs. --- ## 0 · Purpose & Non-Goals This repository remains a **rapid-prototyping and benchmarking framework for anomaly-scoring / detection algorithms**. Out of scope: * Streaming / online detection * Production serving or long-horizon monitoring * Adversarial-attack tooling, drift dashboards, model-card generation --- ## 1 · Big Picture LLM-powered *agents* collaborate with human maintainers to 1. Implement a diverse zoo of anomaly-detection models. 2. Expose a **uniform API** and artefact registry for seamless benchmarking. 3. Automate dataset ingestion (SHA-256 verified), metric computation, and result logging. 4. Guard code quality with linting, typing, tests, docs, and a locked dependency graph (`uv.lock`). --- ## 2 · Canonical Folder Layout . ├── src/ │ └── anml_exp/ │ ├── init.py │ ├── models/ │ ├── data/ │ ├── benchmarks/ │ ├── registry/ # NEW: model artefact versioning (#43) │ ├── resources/ │ └── cli.py ├── tests/ ├── docs/ ├── pyproject.toml ├── uv.lock # NEW: reproducible dependency lockfile (#42) └── README.md > *Historic note* – the hidden `.agents/` folder mentioned in spec v0.2 has been > retired; helpers live in `anml_exp/benchmarks/` and `anml_exp/registry/`. --- ## 3 · Common Schema & Interfaces ### 3.1 Base Model API Every model **must** subclass `anml_exp.models.base.BaseAnomalyModel` and implement: | Method / Property | Signature | Notes | |-------------------|-----------|-------| | `fit` | `def fit(self, X, y=None) -> Self` | | | `score_samples` | `def score_samples(self, X) -> NDArray[float]` | Higher ⇒ more anomalous. | | `predict` | `def predict(self, X, *, threshold=None) -> NDArray[int]` | | | `decision_threshold` | `@property def decision_threshold(self) -> float` | | | `save` / `load` | optional | Use `anml_exp.registry` for artefact versioning. | `anml_exp.registry` stores model binaries and metadata under a semantic version (`MAJOR.MINOR.PATCH`) with SHA-256 digests. --- ### 3.2 Dataset Registry ```python from anml_exp.data import load_dataset X_train, y_train = load_dataset("kddcup99", split="train") • Each dataset module must declare SHA256 hashes for every file. • load_dataset verifies each hash before extraction; mismatch ⇒ HashError (#44). • Deterministic splits (seed = 42). ⸻ 3.3 Metrics & Result Schema Benchmarks report: • ROC-AUC • PR-AUC (Average Precision) • F1 @ best Youden threshold • Mean wall-time per 1 000 samples Each run is saved to results/{exp_name}/{model_name}.json and must validate against anml_exp/resources/results-schema.json. Structured hardware descriptor (#45) "hardware": { "device_type": "GPU", "vendor": "NVIDIA", "model": "RTX A6000", "driver": "535.104", "num_devices": 1, "notes": "desktop workstation" } Minimal example { "$schema": "./results-schema.json", "dataset": "kddcup99", "model": "isolation_forest", "model_version": "0.1.0", "n_samples": 145586, "seed": 42, "hardware": { "device_type": "CPU", "vendor": "Intel", "model": "i7-1185G7", "driver": "N/A", "num_devices": 1, "notes": "laptop" }, "roc_auc": 0.921, "pr_auc": 0.604, "f1": 0.432, "threshold": 0.79, "fit_time": 1.23, "score_time": 0.02, "params": {"n_estimators": 100, "max_samples": "auto"}, "artefact_digest": "sha256:13f0…" } ⸻ 4 · Agent Roles Agent Intent Success Criteria Builder Generate / extend code (models, loaders, registry). API compliance, passes tests, artefact registered. Evaluator Run benchmarks & aggregate metrics. JSON validates, hardware descriptor correct. Reviewer Static analysis, typing, docs, tests, perf. CI green (ruff, mypy, pytest, hash check, lock diff). ⸻ 5 · Contribution Workflow flowchart TD draft["Builder → Draft PR"] review["Reviewer → CI checks"] maintainer["Human → Merge / Request changes"] draft --> review --> maintainer CI additionally ensures: • uv sync --frozen produces identical env (#42). • Dataset SHA-256s match declared values (#44). ⸻ 6 · Coding Standards • Dependency lock: uv.lock is the single source of truth. • PEP 8 via ruff; PEP 561 typing (mypy --strict). • Speed up mypy in CI by caching `.mypy_cache` and installing `mypy[faster-cache]` via `uv pip` to ensure the local environment runs the optimized wheels. • NumPy-style docstrings. • pyproject.toml + uv.lock define mandatory and optional extras. ⸻ 7 · Testing Strategy • Unit + property tests. • Hash-verification tests for every dataset file. • CI fails if uv lock --check detects drift. • Perf suite (tests/perf/) skipped in CI. ⸻ 8 · Installation & Quick-Start # Reproducible dev install uv sync --frozen pip install -e ".[torch,plot]" # After release: pip install anml-exp[torch,plot] CLI: anml-exp benchmark --dataset toy-blobs \ --model isolation_forest \ --output results/demo.json ⸻ 9 · Road-Map Milestone Owner Exit Criteria M0 – Skeleton Builder Base class, dataset registry (SHA-256), artefact registry, CI, uv.lock. M1 – Classical Benchmark Evaluator 3 tabular datasets; JSON outputs pass new schema. M2 – Deep Models Builder AutoEncoder, DeepSVDD, USAD registered & versioned. M3 – Time-Series Support Builder + Evaluator Loader + STOMP baseline + benchmarks. ⸻ 10 · Open Questions 1. Unified config system (omegaconf) – still pending. 2. Preferred experiment tracker (mlflow, wandb, plain JSON). 3. CPU vs GPU determinism in CI. 4. Sandboxing policy for code-gen agents. ⸻ 11 · Meta • _spec_version bumped → 0.3.3 (adds #42–#45). • See CONTRIBUTING.md for human-targeted guidelines. • results-schema.json is the machine-readable contract. Last updated – 2025-07-20 @ 20:55 AEST