# Local CLI: `tools/checkyourself.py`

CheckYourself is still a folder-first audit system, but the CLI is now the deterministic engine an agent can drive. It uses only the Python standard library, sends nothing over the network, and never prints secret values.

The AI still supplies judgment. The CLI supplies repeatable receipts: discovery, schemas, coverage checks, scoring, backlog ranking, validation, and the thin MCP wrapper.

## Fast Start

```bash
# Backward-compatible scan path
python3 tools/checkyourself.py /path/to/your/project

# Explicit scan subcommand
python3 tools/checkyourself.py scan /path/to/your/project

# Diagnostic alias for agent workflows that use that word
python3 tools/checkyourself.py diagnostic /path/to/your/project

# Machine-readable scan
python3 tools/checkyourself.py scan . --format json --no-write

# Discover every command and schema
python3 tools/checkyourself.py describe --format json
```

## Command Map

| Command | Purpose |
| --- | --- |
| `describe` | Emits the full machine-readable capability manifest. |
| `scan` | Detects stack signals and deterministic local findings. |
| `diagnostic` | Alias for `scan`, kept so docs and agents can use the natural workflow word. |
| `scan --deep` | Adds slower validation checks for detected surfaces, including mutable GitHub Actions and dependency-update coverage. |
| `coverage --emit` | Writes the 20-surface coverage skeleton for an agent to fill. Use `--format json` for stdout. |
| `coverage --check FILE` | Checks a filled coverage artifact for completeness. |
| `score --findings FILE [--coverage FILE]` | Computes the deterministic Production Reality Score or a low-confidence scan-derived estimate. |
| `backlog --findings FILE` | Ranks the complete remediation backlog. |
| `next --findings FILE` | Returns the next safest unresolved approval batch. |
| `diff --old FILE --new FILE` | Compares two findings artifacts and reports added, resolved, and regressed findings. |
| `validate --kind KIND FILE` | Validates JSON against bundled schema contracts. |
| `schema NAME` | Prints a bundled JSON schema. |
| `init [PROJECT]` | Creates starter generated context and coverage files. |
| `mcp` | Runs the stdio MCP server over the same functions. |

## Typical Agent Pipeline

```bash
python3 tools/checkyourself.py describe --format json > CHECKYOURSELF_CAPABILITIES.generated.json
python3 tools/checkyourself.py scan . --format json --no-write > CHECKYOURSELF_SCAN.generated.json
python3 tools/checkyourself.py coverage --emit
```

The agent then fills `CHECKYOURSELF_COVERAGE.generated.json` with evidence from the full diagnostic and can run:

```bash
python3 tools/checkyourself.py coverage --check CHECKYOURSELF_COVERAGE.generated.json
python3 tools/checkyourself.py score --findings CHECKYOURSELF_SCAN.generated.json --coverage CHECKYOURSELF_COVERAGE.generated.json --format json
python3 tools/checkyourself.py backlog --findings CHECKYOURSELF_SCAN.generated.json --format json
python3 tools/checkyourself.py next --findings CHECKYOURSELF_SCAN.generated.json --format json
```

That makes the score and first batch reproducible. Same evidence, same score. No vibes with a clipboard.

See the field notes behind this remediation in [`docs/postmortems/checkyourself-field-postmortem-2026-05-29.md`](postmortems/checkyourself-field-postmortem-2026-05-29.md).

## Scan

```bash
python3 tools/checkyourself.py scan /path/to/project
python3 tools/checkyourself.py scan . --json
python3 tools/checkyourself.py scan . --json -
python3 tools/checkyourself.py scan . --format json --no-write
python3 tools/checkyourself.py scan . --ci
python3 tools/checkyourself.py scan . --deep --format json --no-write
```

`scan` detects stack signals, dependencies, scripts, env files, tests, CI, risk-surface path hints, and obvious deterministic risks.
Each finding has a **stable, semantic rule ID** (for example `CY-SECRET-001`, `CY-CONFIG-001`) that stays the same across runs and releases, so you can suppress, diff, and cite findings reliably:

| Rule ID | Severity | What it catches |
| --- | --- | --- |
| `CY-SECRET-001` | P0 | High-confidence credential-shaped values in source. |
| `CY-SECRET-002` | P2 | Secret-like assignments without a known credential shape. |
| `CY-SECRET-003` | P3 | Sensitive file patterns missing from `.gitignore` (deep). |
| `CY-ENV-001` | P0 | A real `.env` file that may be committed and is not gitignored. |
| `CY-ENV-002` | P2 | A local `.env` present; verify it was never committed. |
| `CY-ENV-003` | P1 | No `.env.example` for required configuration. |
| `CY-CONFIG-001` | P2 | Debug mode enabled in committed configuration. |
| `CY-CONFIG-002` | P1 | Default or weak credentials in committed configuration. |
| `CY-API-001` | P2 | CORS configured to allow any origin. |
| `CY-CODE-001` | P2 | Dangerous code patterns (eval, unsafe deserialization, disabled TLS, raw HTML injection). |
| `CY-WEB-001` | P3 | Production source maps enabled. |
| `CY-TEST-001` | P1 | No automated tests detected. |
| `CY-CI-001` | P2 | No CI pipeline detected. |
| `CY-PAY-001` | P1 | Payments dependency present but no tests. |
| `CY-AI-001` | P2 | LLM dependency present but no tests. |
| `CY-SUPPLY-001` | P2 | Mutable GitHub Action references (deep). |
| `CY-SUPPLY-002` | P2 | `package.json` with no committed lockfile. |
| `CY-SUPPLY-003` | P3 | No dependency update automation (deep). |
| `CY-SUPPLY-004` | P3 | CI uses `npm install` instead of `npm ci` (deep). |

Secrets are scanned in every file, including tests and docs, because real credentials get committed in both. The lower-noise heuristic detectors (debug flags, CORS, dangerous sinks, default credentials) skip test, fixture, doc, and example paths to keep the false-positive rate low.
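
The split described above can be sketched as a per-file routing function: secret detectors run everywhere, and heuristic detectors run only outside test, fixture, doc, and example paths. This is a minimal sketch, not the CLI's implementation. The detector family names, the helper names, and the path markers that classify a file as low-signal are all hypothetical assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical detector families; the names are illustrative, not the CLI's internals.
SECRET_DETECTORS = {"secrets"}
HEURISTIC_DETECTORS = {"debug-flags", "cors", "dangerous-sinks", "default-credentials"}

# Assumed markers for "test, fixture, doc, and example" paths.
LOW_SIGNAL_DIRS = {"test", "tests", "__tests__", "fixtures", "doc", "docs", "example", "examples"}
LOW_SIGNAL_SUFFIXES = (".md", ".rst")


def is_low_signal_path(path: str) -> bool:
    """Return True for test, fixture, doc, or example paths (assumed heuristics)."""
    p = PurePosixPath(path)
    # Any parent directory named like a test/doc/example folder.
    if any(part.lower() in LOW_SIGNAL_DIRS for part in p.parts[:-1]):
        return True
    name = p.name.lower()
    # Common test-file naming conventions.
    if name.startswith("test_") or ".test." in name or ".spec." in name:
        return True
    return name.endswith(LOW_SIGNAL_SUFFIXES)


def detectors_for_path(path: str) -> set[str]:
    """Secret detectors always run; heuristic detectors skip low-signal paths."""
    if is_low_signal_path(path):
        return set(SECRET_DETECTORS)
    return SECRET_DETECTORS | HEURISTIC_DETECTORS
```

With per-file routing, the secret detector stays unconditional. The noisier heuristics give up recall on paths that rarely ship to production in exchange for fewer false positives.
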

Package scripts and all evidence context are redacted before they appear in JSON or Markdown output: credential-shaped values are replaced with `[REDACTED]`. Evidence includes file, line, matched pattern type, confidence, and redacted context.

Name-only or assignment-only signals stay lower severity, so schema fields like `feedbackToken` do not wreck the score just for having an unfortunate name. Env example variants such as `.env.dogfood.example`, commented placeholders, and obvious placeholder values are treated as setup documentation instead of real local secrets.

The scan never follows symlinks and never reads files outside the scanned tree. Skipped symlinks and unreadable files are reported in `scan_limits`. Large projects are scanned up to `--max-files` (default 6000). When that cap is hit, `scan_limits.truncated` is `true` and the count of skipped files is disclosed, so a partial scan is never mistaken for a clean one.

Projects can suppress reviewed false positives with `.checkyourself.yml`. Use the stable rule ID and always scope it to the relevant path:

```yaml
version: 1
suppress:
  - id: CY-SECRET-002
    reason: "feedbackToken is a UUID reference, not a credential"
    files: ["src/dispatcher/tool-registry.ts"]
    reviewed_by: simon
    reviewed_at: "2026-06-12"
```

Suppressed findings remain in JSON with `status: "suppressed"` and a `suppression` note, but they do not count toward severity totals or score caps.

`scan --deep` is still intentionally conservative. It validates a few detected surfaces instead of pretending to be a full SAST platform: mutable GitHub Action refs, missing dependency update automation, `npm install` in CI, and missing sensitive-file `.gitignore` patterns.

The scan is not a clean bill of health. It is cheap evidence for the full CheckYourself diagnostic.
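
The suppression contract described in this section can be sketched in a few lines of Python: a rule matches on the stable rule ID and a listed path, the finding stays in the output, and it is excluded from totals. Only `status: "suppressed"` and the `suppression` note come from the description. Several parts are assumptions of this sketch: the finding keys `id`, `file`, and `severity`, the helper names `apply_suppressions` and `severity_totals`, glob-style path matching, and the rule that a suppression without `files` matches nothing.

```python
from collections import Counter
from fnmatch import fnmatchcase


def apply_suppressions(findings: list[dict], suppressions: list[dict]) -> list[dict]:
    """Return copies of findings, marking those matched by a path-scoped suppression.

    Assumed finding shape: {"id": ..., "file": ..., "severity": ...}.
    Assumed suppression shape mirrors a `.checkyourself.yml` entry.
    """
    result = []
    for original in findings:
        finding = dict(original)  # copy so the caller's data is not mutated
        for rule in suppressions:
            patterns = rule.get("files") or []  # unscoped rules match nothing here
            same_rule = rule.get("id") == finding["id"]
            in_scope = any(fnmatchcase(finding["file"], p) for p in patterns)
            if same_rule and in_scope:
                finding["status"] = "suppressed"
                finding["suppression"] = {
                    "reason": rule.get("reason"),
                    "reviewed_by": rule.get("reviewed_by"),
                    "reviewed_at": rule.get("reviewed_at"),
                }
                break
        else:
            # No suppression matched, so the finding stays open.
            finding.setdefault("status", "open")
        result.append(finding)
    return result


def severity_totals(findings: list[dict]) -> Counter:
    """Count findings per severity, skipping suppressed ones."""
    return Counter(f["severity"] for f in findings if f.get("status") != "suppressed")


# Example data; the second and third paths are made up for illustration.
findings = [
    {"id": "CY-SECRET-002", "file": "src/dispatcher/tool-registry.ts", "severity": "P2"},
    {"id": "CY-SECRET-002", "file": "src/api/client.ts", "severity": "P2"},
    {"id": "CY-CONFIG-001", "file": "config/settings.py", "severity": "P2"},
]
suppressions = [
    {
        "id": "CY-SECRET-002",
        "reason": "feedbackToken is a UUID reference, not a credential",
        "files": ["src/dispatcher/tool-registry.ts"],
        "reviewed_by": "simon",
        "reviewed_at": "2026-06-12",
    }
]
marked = apply_suppressions(findings, suppressions)
print([f["status"] for f in marked])  # ['suppressed', 'open', 'open']
print(severity_totals(marked))        # Counter({'P2': 2})
```

The same rule ID in a different file stays open. That is the point of scoping every suppression to a path.
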

## Diff

```bash
python3 tools/checkyourself.py diff --old baseline.json --new current.json
python3 tools/checkyourself.py diff --old baseline.json --new current.json --ci
python3 tools/checkyourself.py diff --old baseline.json --new current.json --format json
```

`diff` compares two findings artifacts (scan output, reports, or finding lists) and reports which findings were **added**, **resolved**, and **unchanged**, plus evidence-level changes on findings that persisted. Because rule IDs are stable, the delta reflects real changes rather than ID-shuffle noise.

The result includes a `regression` flag that is `true` when the open P0 or P1 count increased against the baseline. With `--ci`, `diff` exits non-zero on a regression, so CI can gate on *new* risk instead of only the absolute P0 count. That is the right control for a project that already has a known backlog.

## Coverage

```bash
python3 tools/checkyourself.py coverage --emit
python3 tools/checkyourself.py coverage --emit --format json > CHECKYOURSELF_COVERAGE.generated.json
python3 tools/checkyourself.py coverage --check CHECKYOURSELF_COVERAGE.generated.json
```

In text mode, `coverage --emit` writes `CHECKYOURSELF_COVERAGE.generated.json` in the current directory. Use `--out PATH` to choose a path, or `--format json` when another tool wants stdout.

Coverage has 20 surfaces. Each surface must be marked as one of:

- `Pass`;
- `Finding`;
- `Unknown`;
- `NotApplicable`.

`Pass` needs evidence. `Unknown` needs a note on which evidence is missing. `NotApplicable` needs a reason.
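
The marking rules above can be sketched as a small completeness check, roughly what `coverage --check` has to verify. The entry keys `surface`, `status`, `missing_evidence`, and `reason` are assumptions about the artifact's shape. Only `evidence_reviewed` is named elsewhere in this document. The real 20 surface IDs are not listed here, so the sketch only counts distinct surface names. The helper names are hypothetical.

```python
VALID_STATUSES = {"Pass", "Finding", "Unknown", "NotApplicable"}
REQUIRED_SURFACES = 20


def entry_problems(entry: dict) -> list[str]:
    """Return rule violations for one coverage entry (assumed key names)."""
    name = entry.get("surface") or "<unnamed surface>"
    status = entry.get("status")
    if status not in VALID_STATUSES:
        return [f"{name}: status must be one of {sorted(VALID_STATUSES)}, got {status!r}"]
    if status == "Pass" and not entry.get("evidence_reviewed"):
        return [f"{name}: Pass needs evidence"]
    if status == "Unknown" and not entry.get("missing_evidence"):
        return [f"{name}: Unknown needs a note on which evidence is missing"]
    if status == "NotApplicable" and not entry.get("reason"):
        return [f"{name}: NotApplicable needs a reason"]
    return []


def coverage_problems(entries: list[dict]) -> list[str]:
    """Check every entry, then check that all required surfaces are present."""
    problems = [p for entry in entries for p in entry_problems(entry)]
    surfaces = {entry.get("surface") for entry in entries if entry.get("surface")}
    if len(surfaces) < REQUIRED_SURFACES:
        problems.append(f"only {len(surfaces)} of {REQUIRED_SURFACES} surfaces present")
    return problems


# Placeholder surface names; every entry is an honest Unknown except the first.
entries = [
    {"surface": f"surface-{i}", "status": "Unknown", "missing_evidence": "not reviewed yet"}
    for i in range(REQUIRED_SURFACES)
]
entries[0] = {"surface": "surface-0", "status": "Pass"}  # Pass with no evidence_reviewed
print(coverage_problems(entries))  # ['surface-0: Pass needs evidence']
```

A non-empty problem list corresponds to the "incomplete coverage" case that exits with code `1`.
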

## Scoring

```bash
python3 tools/checkyourself.py score --findings findings.json --coverage coverage.json --format json
python3 tools/checkyourself.py score --findings CHECKYOURSELF_SCAN.generated.json --format json
```

The score uses the weights and caps from [`02_RUN_DIAGNOSTIC/scoring-method.md`](../02_RUN_DIAGNOSTIC/scoring-method.md):

- unresolved P0 caps the score at `49`;
- unresolved P1 caps the score at `74`;
- missing critical evidence caps the score at `84`;
- scores above `90` require evidence for tests, secrets, deploy/rollback, observability, auth, and data boundaries.

The evidence caps (`84` and `90`) apply in **every** score mode, including estimates, so an estimate without coverage evidence can never report a launch-ready number. Absence of findings is treated as absence of evidence, not proof of safety: a scan that finds no secrets leaves the secrets surface `Unknown`, not `Pass`.

The result includes `per_category` penalties, caps applied, confidence, and the finding IDs scored.

With `--coverage`, the result is `score_mode: "coverage-backed"`. A coverage entry marked `Pass` without `evidence_reviewed`, or `NotApplicable` without a reason, is downgraded to `Unknown`, and any surface omitted from the artifact counts as `Unknown`. Omitting or hand-waving a surface therefore never scores better than honestly reporting it, and `confidence: "high"` requires all 20 required surfaces to be present with real evidence.

Without `--coverage`, the CLI produces a `scan-derived-estimate` when the findings file is scan JSON, or a `finding-only-estimate` otherwise. Both keep confidence `low` and return `manual_evidence_needed`, so nobody mistakes the estimate for launch permission.

Every score appends a receipt to `.checkyourself-score-history.json` beside the findings file by default. History timestamps are UTC so receipts from laptops and CI stay comparable, and a corrupt history file is preserved as `.corrupt.bak` rather than overwritten.
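
The cap rules listed at the top of this section can be sketched as a post-processing step on a raw, already-weighted score. This is a minimal sketch under stated assumptions. The penalty weights from `scoring-method.md` are not modeled. `missing_critical_evidence` is a flag the caller is assumed to compute. The sketch reads the above-90 rule as "cap at 90 until all six surfaces have evidence", which is its own interpretation. `apply_caps` is a hypothetical name.

```python
P0_CAP = 49
P1_CAP = 74
MISSING_CRITICAL_EVIDENCE_CAP = 84
NO_HIGH_SCORE_EVIDENCE_CAP = 90

# Surfaces that need evidence before a score may exceed 90.
HIGH_SCORE_SURFACES = (
    "tests", "secrets", "deploy/rollback", "observability", "auth", "data boundaries",
)


def apply_caps(
    raw_score: int,
    open_p0: int,
    open_p1: int,
    surfaces_with_evidence: set[str],
    missing_critical_evidence: bool,
) -> tuple[int, list[str]]:
    """Clamp a raw score with every applicable cap; return the score and cap labels."""
    caps: list[tuple[str, int]] = []
    if open_p0 > 0:
        caps.append(("unresolved P0", P0_CAP))
    if open_p1 > 0:
        caps.append(("unresolved P1", P1_CAP))
    # Evidence caps apply in every mode, estimates included.
    if missing_critical_evidence:
        caps.append(("missing critical evidence", MISSING_CRITICAL_EVIDENCE_CAP))
    if any(s not in surfaces_with_evidence for s in HIGH_SCORE_SURFACES):
        caps.append(("above-90 evidence incomplete", NO_HIGH_SCORE_EVIDENCE_CAP))
    # The strictest applicable cap wins.
    score = min([raw_score] + [cap for _, cap in caps])
    return score, [label for label, _ in caps]


# A scan-only estimate: no findings, but also no reviewed evidence.
print(apply_caps(95, 0, 0, set(), missing_critical_evidence=True))
# (84, ['missing critical evidence', 'above-90 evidence incomplete'])
```

Every applicable cap is applied with `min`, so the strictest one wins. That is why a clean-looking estimate still tops out below launch-ready when evidence is missing.
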
Score history is configurable: use `--history PATH` to choose a file, `--note` to add context, or `--no-history` for disposable runs. Scoring from stdin (`--findings -`) does not write history unless `--history` is explicit.

## Validation

```bash
python3 tools/checkyourself.py schema scan
python3 tools/checkyourself.py validate --kind scan CHECKYOURSELF_SCAN.generated.json
python3 tools/checkyourself.py validate --kind score CHECKYOURSELF_SCORE.generated.json
```

Supported schema kinds:

- `capabilities`;
- `scan`;
- `coverage`;
- `score`;
- `backlog`;
- `next`;
- `diff`;
- `report`;
- `dashboard`;
- `dashboard-data`;
- `learning-plan`.

Validation implements a small subset of JSON Schema using only the standard library: `required`, `type`, `enum`, `minimum`, `maximum`, `properties`, and `items`.

## Exit Codes

| Code | Meaning |
| ---: | --- |
| `0` | Success; no gating condition. |
| `1` | Gating condition: `--ci` P0, invalid artifact, or incomplete coverage. |
| `2` | Usage/input error. |

## GitHub Action

This repo includes a composite action for projects that vendor or reference CheckYourself in CI:

```yaml
name: Production Readiness Check
on: [pull_request]
jobs:
  checkyourself:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@
      - uses: KyaniteLabs/checkyourself/.github/actions/checkyourself@main
        with:
          fail-on-p0: "true"
          deep: "true"
```

The action writes scan JSON, validates it, and fails the job when `fail-on-p0` is enabled and unresolved P0 findings remain. Pin the action ref for production use.

## MCP

The MCP wrapper is local stdio and thin by design:

```bash
python3 tools/checkyourself.py mcp
```

It exposes native tools for `describe`, `scan`, `coverage_emit`, `coverage_check`, `score`, `backlog`, `next`, `diff`, `validate`, and `schema`.

MCP-initiated scans are confined to `CHECKYOURSELF_SCAN_ROOT` (defaulting to the server process working directory).
A `scan` request for a path outside that root is rejected, and unknown or misspelled tool arguments are rejected rather than silently ignored, so an agent cannot accidentally scan the wrong directory. See [`mcp.md`](mcp.md).

## API Decision

There is no hosted API in this repo. The CLI is the canonical engine. MCP is a local convenience wrapper over that engine.

A hosted API only makes sense if CheckYourself becomes a SaaS/team product with accounts, hosted runs, shared history, billing, or browser-only usage.