# Public Evidence Harness

STRIX public validation should produce repeatable evidence, not one-off terminal claims. The public test matrix is a small, software-only harness for running approved checks and recording the exact commit, branch, selected tags, command results, and report timestamps. This is an upstream-maintainer advantage: downstream forks can still run or modify the code, but they do not inherit official release provenance, maintained test matrices, or the project-specific validation discipline unless they keep that evidence layer current themselves.

## Scope

The public matrix is intentionally conservative. It covers:

- public surface hygiene;
- release manifest generation;
- harness self-tests;
- scenario contract validation;
- software-only scenario replay generation;
- public scenario envelope checks;
- Python regression tests;
- targeted Rust contract tests.

Program-specific scenarios, customer data, internal review ledgers, and non-public benchmark data do not belong in this public matrix.

## Usage

List the available checks:

```bash
python scripts/strix_test_matrix.py --list
```

Run the fast smoke layer:

```bash
python scripts/strix_test_matrix.py --select smoke
```

Run all non-manual checks and write an explicit report path:

```bash
python scripts/strix_test_matrix.py \
  --matrix Project_Docs/testing/public_test_matrix.json \
  --output target/strix-test-reports/public.json
```

Each run writes both JSON and Markdown reports. The default output directory is under `target/`, so generated evidence stays out of source control unless a maintainer intentionally promotes a report into release notes.

Generate a deterministic scenario replay plus a browser-viewable HTML canvas:

```bash
python scripts/strix_sim_replay.py \
  --scenario sim/scenarios/gps_denied_recon.yaml \
  --output target/strix-replays/gps_denied_recon.json \
  --html target/strix-replays/gps_denied_recon.html
```

The replay harness is intentionally scoped as a deterministic, kinematic, public replay. It is useful for visual inspection, regression evidence, seeded event playback, and pre-field behavior review. It is not a hardware, RF, sensor-fidelity, or field-readiness simulator.

## Next Capabilities

The next useful expansion is scenario-family batch replay: every public scenario already declares a seed, metric set, and `pass_envelope`, so the next step is to run every scenario through replay and compare observed metrics against that envelope. After that, add statistical Monte Carlo sweeps, richer trace exports, and integration checks for criticality, contagion, and quorum-style confirmation loops.
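As a concrete starting point, here is a minimal sketch of that batch envelope check. Everything format-specific in it is an assumption, not a confirmed harness contract: the per-metric `min`/`max` layout of `pass_envelope`, the presence of a top-level `metrics` mapping in the replay JSON, and the `check_scenario` helper are all illustrative.

```python
#!/usr/bin/env python3
"""Hypothetical batch-replay envelope check (sketch only).

Assumptions, not confirmed formats: scenarios declare a
``pass_envelope`` mapping of metric name -> {"min": ..., "max": ...},
and replay reports expose a top-level ``metrics`` mapping.
"""
import json
import pathlib
import subprocess
import sys

import yaml  # pip install pyyaml

SCENARIO_DIR = pathlib.Path("sim/scenarios")
REPLAY_DIR = pathlib.Path("target/strix-replays")


def check_scenario(scenario_path: pathlib.Path) -> bool:
    """Replay one scenario and compare observed metrics to its envelope."""
    spec = yaml.safe_load(scenario_path.read_text())
    envelope = spec.get("pass_envelope", {})  # assumed key name

    # Run the existing replay script; deterministic output per declared seed.
    out_path = REPLAY_DIR / (scenario_path.stem + ".json")
    subprocess.run(
        [sys.executable, "scripts/strix_sim_replay.py",
         "--scenario", str(scenario_path),
         "--output", str(out_path)],
        check=True,
    )

    observed = json.loads(out_path.read_text()).get("metrics", {})  # assumed shape
    ok = True
    for metric, bounds in envelope.items():
        value = observed.get(metric)
        # Assumed envelope convention: optional "min"/"max" per metric.
        lo = bounds.get("min", float("-inf"))
        hi = bounds.get("max", float("inf"))
        if value is None or not (lo <= value <= hi):
            print(f"FAIL {scenario_path.name}: {metric}={value} outside {bounds}")
            ok = False
    return ok


if __name__ == "__main__":
    REPLAY_DIR.mkdir(parents=True, exist_ok=True)
    results = [check_scenario(p) for p in sorted(SCENARIO_DIR.glob("*.yaml"))]
    sys.exit(0 if all(results) else 1)
```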
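Run from the repository root, a script along these lines would exit nonzero on any out-of-envelope metric, which makes it straightforward to register as another non-manual check in the public matrix once the real report and envelope formats are pinned down.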