# Repository Guidelines ## Project Structure & Module Organization Keep the layout aligned with `PRD.md`. Library-style pipeline logic belongs in `pipelines/`, while thin CLI wrappers live in `scripts/`. Harvested inputs flow from `data/legacy_ro/` (read-only) into `data/intake/` with hashes recorded in `manifests/harvest_manifest.csv`; intermediate and blessed outputs go to `data/interim/` and `data/processed/`. QC panels land under `qc/panels/`, and reproducibility checks plus fixtures live in `tests/`. Update `RUNBOOK.md` whenever you add or retire a blessed command. ## Build, Test, and Development Commands Set up the environment via `requirements.txt` (`make env && source .venv/bin/activate`). The Makefile is the authoritative interface for working commands (`make env`, `make validate`, `make golden`, `make test-lidar`, `make thermal`). For iterative debugging you may invoke stage scripts directly, e.g. ```bash .venv/bin/python scripts/run_lidar_hag.py \ --data-root data/legacy_ro/penguin-2.0/data/raw/LiDAR/sample \ --out data/interim/lidar_test.json \ --emit-geojson --crs-epsg 32720 --plots --strict-outputs ``` Always rerun the golden AOI guardrail with `.venv/bin/python -m pytest -q tests/test_golden_aoi.py` before shipping changes. ## Coding Style & Naming Conventions Target Python 3.12.x, four-space indentation, and snake_case for modules, files, functions, and variables; reserve PascalCase for dataclasses or typed containers. Keep stage parameters and IO contracts encapsulated inside `pipelines/.py`, exposing a single `run()` entry point that the script imports. Run `ruff check` and `ruff format`; add type hints and short docstrings noting inputs, side effects, and outputs. ## Testing Guidelines Place unit and integration tests beside the stage they exercise; name files `test_.py` and keep fixtures in helper modules suffixed `_fixtures.py`. Golden AOI tests must assert the presence, schema, and hashes (when available) of outputs such as `candidates.gpkg`, `rollup_counts.json`, and QC PNGs. Add regression coverage whenever parameters, schemas, or filenames change, and mirror new QC metrics in `manifests/qc_report.md`. ## Commit & Pull Request Guidelines Write concise, imperative commit subjects (`stage: action`, e.g. `lidar: clamp hag thresholds`) and include provenance notes (legacy source path, SHA) in the body when harvesting. Every PR should: (1) link the tracked task or issue, (2) summarize parameter or schema changes, (3) attach before/after QC thumbnails when relevant, and (4) confirm `make golden` plus pytest ran cleanly. Never commit harvested data—reference manifest entries instead. ## Security & Data Handling Treat `data/legacy_ro/` as immutable. Copy artifacts through harvesting scripts only, populate `manifests/harvest_manifest.csv` with SHA256 and size, and keep credentials or API tokens out of the repo. Scrub logs before sharing externally and note any sensitive paths in PR descriptions. ## Cursor IDE Project Rules This repo uses Cursor **Project Rules** under `.cursor/rules/*.mdc` to keep AI-assisted edits aligned with project constraints. - The rule `00-core-invariants.mdc` is **always applied** and encodes non-negotiables like `data/legacy_ro/` immutability, determinism, and the single sources of truth (`RUNBOOK.md`, `docs/reports/STATUS.md`, `notes/pipeline_todo.md`). - Other rules are scoped by file globs (Python quality, geospatial CRS correctness, testing guardrails, git hygiene) and are attached automatically when working in matching files.