# Contributing

This repository is a Lean competition judge plus an LLM-driven evaluation
pipeline. Maintenance priorities, in order, are:

1. **Correctness** of contestant verdicts.
2. **Determinism** of the judge across repeated runs.
3. **Sandbox and secret isolation** in the solver runner.
4. **Public-contract consistency** (status mapping, JSON schemas, CLI
   surface, README, rules under `rules/`).
5. **Harness coverage** — every behavioral guarantee must be locally
   testable via `python3 scripts/run_harness.py`.

Contributions that strengthen these are welcome. Contributions that
compromise them, or that cannot be evaluated against them, are not.

## TL;DR

**Open an issue first. Do not open a pull request before the issue is
filed and acknowledged**, with the narrow exceptions listed under
[Trivial fixes](#trivial-fixes). This project aligns on problem
definition, semantic edges, and compatibility impact *before* code is
written — not after.

"Write the code first, discuss later" is not how this repository
operates.

## Issue-first policy

All non-trivial changes must start as a GitHub issue. The following
categories require an issue regardless of patch size:

- Correctness bugs in the judge, harness, pipeline, or sandbox.
- Changes to judge verdict semantics or the public five-status mapping
  (`accepted` / `unparsed` / `malformed` / `incomplete_proof` /
  `incorrect`).
- Changes to the pipeline protocol — stdin/stdout JSON contract,
  `pipeline/config.json` schema, environment-variable surface, LLM
  routing behavior.
- Sandbox or security changes — process isolation, filesystem access,
  network policy, secret handling, dependency-allowlist rules.
- Harness coverage changes — new tests, removed tests, or fixture
  edits that alter what is asserted.
- Documentation changes that alter a documented behavior, contract, or
  example contestants rely on.
- Feature requests of any size.
- Performance changes touching hot paths (judge run loop, Lean
  elaboration, sandbox startup, LLM proxy).
- Architectural changes — new modules, new dependencies, new
  submission shapes, new certificate branches.

See [Trivial fixes](#trivial-fixes) for the exceptions.

## Bar for filing an issue

Before opening an issue:

1. **Search.** Check open issues, closed issues, and Discussions.
   Comment on the existing thread instead of duplicating.
2. **Reproduce on a current build.** Use the latest `main` or latest
   tagged release. Reports against unmerged branches will not be
   investigated.
3. **Minimize.** Strip the reproduction to the smallest input that
   still triggers the behavior — problem JSON, answer JSON, `solver.py`,
   and any config override needed. Inline the inputs or attach as a
   gist; do not link to private workspaces.
4. **State the contract.** For judge / pipeline / harness / sandbox
   reports, name the specific contract being violated: a clause in the
   README, a row in the status-mapping table, an explicit harness
   expectation, or a documented invariant. "It feels wrong" is not a
   contract.
5. **State expected vs. actual concretely.** Both must be verifiable.
   "It should work" is not an expected behavior.

Issues without reproduction, evidence, or contract justification will
be marked `needs-repro` / `needs-contract`, moved to Discussions, or
closed.

## Required fields for bug reports

Every bug report must include:

- **Commit, branch, or release tag** the bug reproduces on.
- **Affected module:** `judge`, `pipeline`, `docs`, or `harness`.
- **Issue type:** `correctness`, `determinism`, `sandbox`, `security`,
  `docs`, `coverage`, or `performance`.
- **Severity:** does this affect the public verdict path, harness exit
  code, or only an internal/dev surface?
- **Reproduction steps**, runnable end-to-end without manual edits.
- **Minimal repro inputs:** problem JSON, answer JSON, `solver.py` if
  relevant, and any config override.
- **Actual output:** exact text, including the `status` field, exit
  code, and any stderr.
- **Expected output:** exact text, or — for under-specified cases —
  the contract that determines the expectation.
- **Logs / stderr / traceback** for the failing run.
- **Why this is a bug, not intended behavior.** Cite the README, the
  status-mapping table, an existing harness expectation, or a
  comparable accepted issue.
- **Suggested regression test**, if you can identify where it would
  slot into the harness.

A report missing the inputs, the actual output, or the contract
justification is not actionable and will be returned with
`needs-repro` or `needs-contract`.

## Feature and enhancement requests

Feature issues must answer:

- **User value.** Who benefits, and in what concrete workflow?
- **Scope.** What is in, what is out. A bounded feature is
  reviewable; a vision statement is not.
- **Public-contract impact.** Does it change status semantics, JSON
  schemas, CLI flags, environment variables, or harness expectations?
- **Backward compatibility.** Does it break existing solvers,
  contestant submissions, or the documented status mapping?
- **Ownership.** Are you proposing to implement, co-implement, or only
  to request?

Open-ended ideas, "wouldn't it be nice if…" suggestions, and unscoped
wishlists belong in Discussions. Maintainers will move them.

## Pull requests

- **Every non-trivial PR must reference a prior issue.** PRs without
  one may be closed with a request to file an issue first.
- **Correctness changes require tests.** Add or update fixtures under
  `tests/` and ensure `python3 scripts/run_harness.py` exits `0`.
- **Behavior-changing PRs update documentation in the same change
  set.** README, the per-track tutorials
  (`examples/solo/TUTORIAL.md`, `examples/marathon/TUTORIAL.md`),
  the per-track specs under `docs/` (`docs/solo_mode.md`,
  `docs/marathon_mode.md`), status-mapping tables, and any rule
  files under `rules/` must stay aligned with the code.
- **One topic per PR.** Do not bundle unrelated refactors with a fix.
  Drive-by cleanup belongs in its own issue and PR.
- **Keep PRs small.** Large rewrites without prior issue alignment
  will be returned.
- **Compatibility impact statement.** PRs touching judge verdict
  semantics, the pipeline protocol, sandbox behavior, or any
  documented public contract must state, in the PR description:
  - what changes for existing solvers,
  - what migration (if any) is required,
  - which fixtures or harness tests demonstrate the new behavior.
- **No skipping hooks or signing.** Do not push with `--no-verify`,
  `--no-gpg-sign`, or equivalent. If a hook fails, fix the cause.
- **Determinism.** PRs that introduce non-determinism — random seeds,
  time-dependent logic, network access in tests — will be rejected.

## Triage and closing

Maintainers use the following labels and actions:

- `needs-repro` — no reproduction, or non-minimal reproduction.
  Author has 14 days to respond before the issue is closed.
- `needs-contract` — no statement of which documented contract is
  violated. Same 14-day clock.
- `discussion` — open-ended; moved to Discussions.
- `duplicate` — closed with a pointer to the canonical issue.
- `stale` — no maintainer-actionable update for 30 days; closed.

Maintainers reserve the right to close issues that are low-quality,
unscoped, or not actionable, regardless of label. Reopen requests are
welcome once new evidence is provided.

## Project-specific emphasis

This project holds a stronger line than typical open-source on:

- **Determinism.** A change that produces different outputs across
  repeated runs of the same input is a regression, even if both
  outputs look "correct".
- **Exact status semantics.** The five public statuses are a contract.
  Do not collapse infrastructure failures into `incorrect`. Do not
  introduce new statuses without an issue and explicit acceptance.
- **Harness-backed regressions.** Every fix needs a test that would
  have caught the bug. "I tested it manually" is not coverage.
- **Sandbox and secret isolation.** Solver code runs untrusted.
  Changes that loosen process isolation, expand filesystem access, or
  expose environment variables to user-supplied code require explicit
  security review.
- **Docs-vs-behavior consistency.** When code and docs disagree, one
  of them is wrong. Both are fixed in the same PR.
- **Skepticism toward unsubstantiated correctness claims.** "I think
  there might be a bug in X" without a reproduction is held until
  evidence appears. Maintainers will not pre-investigate hunches; the
  issue bar applies.

## Security issues

Security-relevant findings — sandbox escapes, secret leaks, judge
bypasses that smuggle accepted verdicts past the contract, anything
that could affect contestant fairness — are welcome as **regular
public issues**, same bar as any other bug report (reproducer,
expected vs. actual, version / commit, normalized status if
applicable). Use the issue type label `security`.

If for some reason you'd rather report privately first, you can email
the maintainer at the address on the GitHub profile, but it is not
required and there is no embargo: filing publicly is the default and
fine.

## Trivial fixes

The following may go directly to a PR without a prior issue:

- Typo fixes in comments, docs, or strings.
- Pure copy-edits to README, the per-track tutorials
  (`examples/solo/TUTORIAL.md`, `examples/marathon/TUTORIAL.md`), or
  files under `rules/` that do not change a documented contract.
- Markdown or formatting cleanup with no semantic change.
- Test-only cleanup that does not change what is asserted.

If you are unsure whether your change qualifies, open an issue first.
The cost of one extra issue is much lower than the cost of an
unwanted PR.