# Agents Shipgate · Long-Form Agent Reference (llms-full.txt)
> Single-fetch concatenation of the canonical agent-facing reference
> material. AI search engines and coding agents that prefer one document
> over chasing links should fetch this file. The short index is at
> [`llms.txt`](llms.txt); machine-readable triggers are at
> [`docs/triggers.json`](docs/triggers.json).
>
> Generated by `scripts/build-llms-full.py` from the source files below.
> Do not edit by hand — re-run the script to update.
## Sources (in order)
- [`AGENTS.md`](AGENTS.md)
- [`docs/agent-recipes.md`](docs/agent-recipes.md)
- [`docs/agent-contract-current.md`](docs/agent-contract-current.md)
- [`docs/checks.md`](docs/checks.md)
- [`docs/concepts.md`](docs/concepts.md)
- [`docs/autofix-policy.md`](docs/autofix-policy.md)
---
# Agents Shipgate · Agent Instructions
Authoritative instructions for AI coding agents (Claude Code, Codex, Cursor, Aider) working **with** this repository or a project that uses Agents Shipgate.
> If you are a human, the README and the [wiki](https://github.com/ThreeMoonsLab/agents-shipgate/wiki) are the right places to start. This file is optimized for agent ingest: short, copy-pasteable, machine-friendly.
---
## What this project is
Static release-readiness gate for AI agent tool surfaces. Reads `shipgate.yaml` plus tool sources (MCP exports, OpenAPI specs, OpenAI Agents SDK Python files, Anthropic Messages API tool/prompt artifacts, Google ADK Python/config files, LangChain/LangGraph Python files, CrewAI Python files, n8n workflow JSON/stubs) and produces deterministic findings.
- **Inputs:** MCP · OpenAPI · OpenAI Agents SDK · Anthropic Messages API · Google ADK · LangChain/LangGraph · CrewAI · n8n
- **Outputs:** Markdown · JSON · SARIF
- **Trust:** Static-by-default. No agent execution, tool calls, LLM calls, or network access.
- **Marketing site:** [threemoonslab.com](https://threemoonslab.com/) — canonical brand URL with human-readable companion pages: [/quickstart/](https://threemoonslab.com/quickstart/), [/glossary/](https://threemoonslab.com/glossary/), [/checks/](https://threemoonslab.com/checks/), [/design-partners/](https://threemoonslab.com/design-partners/). The site also serves a [/.well-known/agents-shipgate.json](https://threemoonslab.com/.well-known/agents-shipgate.json) discovery file **pinned to the latest released tag** for external consumers and AI search. **If you are an agent working inside this repo, use the in-tree [`.well-known/agents-shipgate.json`](.well-known/agents-shipgate.json) (current `main` contract, may be ahead of the released file) for schema-version and gating-signal decisions.**
---
## Naming (canonical)
Use exactly one form depending on context. Mixing them in user-visible copy is an adoption cost.
| Form | When to use |
|---|---|
| **Agents Shipgate** | Display name. Prose, headings, marketing copy, social cards, slide titles, blog posts. |
| **`agents-shipgate`** | Package, CLI binary, repo, GitHub Action, PyPI distribution name, env-var prefix (`AGENTS_SHIPGATE_*`), import path (`agents_shipgate`). Always lowercase, kebab-case. |
| **`shipgate`** | Short alias for the CLI binary only. Acceptable in shell snippets where brevity helps; never as the project name. |
Do **not** use any of: `Agent Shipgate` (singular), `Agent Shipcheck`, `agents shipgate` (display lowercase), `Agents-Shipgate` (display kebab). When in doubt: prose → `Agents Shipgate`; code → `agents-shipgate`.
The canonical tagline is:
> Static release-readiness gate for AI agent tool surfaces.
This single sentence is the source of truth for the GitHub repo description, [README.md](README.md), the [wiki Home page](https://github.com/ThreeMoonsLab/agents-shipgate/wiki/Home), and the [marketing site](https://threemoonslab.com/). Keep them in sync.
---
## Install (canonical)
```bash
pipx install agents-shipgate
```
Alternatives if `pipx` is unavailable:
```bash
python -m pip install agents-shipgate # global pip
uv tool install agents-shipgate # via uv
python -m agents_shipgate --help # run from a pip install without PATH
```
The CLI binary is `agents-shipgate`. A short alias `shipgate` is also installed.
---
## Run (canonical)
In a repo that contains an agent and its tools:
```bash
agents-shipgate init --workspace . --write
agents-shipgate scan -c shipgate.yaml
```
Reports land at `agents-shipgate-reports/report.{md,json}`.
To verify your install on a known fixture without writing any YAML:
```bash
agents-shipgate fixture run support_refund_agent
```
---
## Single-turn agent flow (v0.6+)
For coding agents adopting Shipgate end-to-end in one turn:
```bash
agents-shipgate detect --json
agents-shipgate init --write --ci --json
agents-shipgate scan -c shipgate.yaml --suggest-patches --format json
agents-shipgate apply-patches --from agents-shipgate-reports/report.json \
--confidence high --apply
```
Or chain all four in one call:
```bash
agents-shipgate bootstrap --json
```
`bootstrap` runs `detect → init --write --ci → scan --suggest-patches → apply-patches --confidence high` against the current workspace, stopping on the first non-recoverable error and emitting a structured per-step summary. Use it for first-time adoption; for ongoing CI keep using the GitHub Action. Flags: `--workspace`, `--confidence`, `--no-ci`, `--no-apply`, `--json`.
- **`detect`** — read-only; classifies the workspace. `is_agent_project: false`
means stop early.
- **`init`** — auto-detects by default. `--ci` writes
`.github/workflows/agents-shipgate.yml`; orthogonal to `--write`. Use
`--minimal` for the pre-v0.6 CHANGE_ME-heavy template.
`--agent-instructions=all` (or a comma-separated subset of
`agents-md,claude-md,cursor,pr-template`) renders agent-facing snippets to
stdout; combined with `--write` it commits them to the target repo via
managed markers (idempotent, so safe to
rerun). Strict CI and baselines remain opt-in human decisions; the flag
emits advisory guidance only.
- **`scan --suggest-patches`** — attaches Patch objects to every active
finding. `Finding.patches` is absent without the flag.
- **`apply-patches`** — file-grouped, dry-run by default.
Containment-checked against `report.manifest_dir`. v0.6 default `--confidence high`
applies only manifest stale-removals; scope-coverage appends require
`--confidence medium`. Trace approval/confirmation findings are
always `ManualPatch` — never auto-applied (flipping the trace patches
the evidence, not the agent's runtime gate).
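Because `Finding.patches` is absent unless the scan ran with `--suggest-patches`, a reader of `report.json` should probably treat the key as optional. The following is a minimal Python sketch under that assumption. It relies only on the `findings[].patches[]` and `findings[].suppressed` fields named in this document; the helper names `findings_with_patches` and `load_report` are hypothetical and not part of the CLI.

```python
import json
from pathlib import Path


def findings_with_patches(report: dict) -> list[dict]:
    """Return active findings that carry at least one suggested patch.

    `patches` is only present when the scan ran with --suggest-patches,
    so a missing key is treated the same as an empty list.
    """
    return [
        finding
        for finding in report.get("findings", [])
        if not finding.get("suppressed", False) and finding.get("patches")
    ]


def load_report(path: str = "agents-shipgate-reports/report.json") -> dict:
    """Load a report written by `agents-shipgate scan` (default path from this document)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

An empty result can mean either "no patchable findings" or "the scan ran without `--suggest-patches`", so it is worth checking how the report was produced before drawing conclusions.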
---
## Agent mode
Every command supports JSON output for programmatic consumption:
```bash
agents-shipgate detect --workspace . --json
agents-shipgate init --workspace . --write --json
agents-shipgate scan -c shipgate.yaml # already produces report.json
agents-shipgate apply-patches --from agents-shipgate-reports/report.json --json
agents-shipgate doctor --json
agents-shipgate contract --json
agents-shipgate explain SHIP-POLICY-APPROVAL-MISSING --json
agents-shipgate list-checks --json
agents-shipgate self-check --json
agents-shipgate fixture list --json
```
Errors carry a structured `next_action` (single string, back-compat) and `next_actions` (ranked list) when run with `AGENTS_SHIPGATE_AGENT_MODE=1`:
```bash
$ AGENTS_SHIPGATE_AGENT_MODE=1 agents-shipgate scan -c missing.yaml
Config error: Config file not found: missing.yaml
{"error": "config_error", "message": "...", "next_action": "agents-shipgate detect --workspace . --json", "next_actions": [{"kind": "command", "command": "agents-shipgate detect --workspace . --json", "why": "..."}, {"kind": "command", "command": "agents-shipgate init --workspace . --write", "why": "..."}]}
```
The full set of error kinds emitted in agent mode: `config_error`, `config_already_exists`, `input_parse_error`, `unknown_check_id`, `unknown_fingerprint`, `other_error`, `internal_error`, `malformed_patch`. `unknown_fingerprint` is emitted by `explain-finding` when the fingerprint doesn't match any entry in the supplied report; the payload includes `suggestion` (a close-match fingerprint, when one exists) and `source_report`.
The machine-readable catalog of error kinds — exit codes, typical causes, additional fields per kind, recovery hints — lives at [`docs/errors.json`](docs/errors.json). Pre-fetch it once and pattern-match the `error` field instead of re-deriving the recovery vocabulary from this prose.
`detect --json` and each `doctor --json` payload also carry `diagnostics: [...]` and `next_actions: [...]` fields. `next_action` (single string) remains the rank-1 action projected to a string; `next_actions` is the ranked list with `kind`, `command|path`, `why`, and `expects` fields. See [docs/diagnostics.md](docs/diagnostics.md) for the full catalog and schema.
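One way an agent might consume these payloads is sketched below. It assumes the JSON object arrives on its own line in the captured output, as in the example above, and that command actions carry `kind: "command"` plus a `command` string. The function names `parse_error_payload` and `first_command` are illustrative, not part of the CLI.

```python
import json
from typing import Optional


def parse_error_payload(output: str) -> Optional[dict]:
    """Return the first line of `output` that parses as a JSON object with an `error` key."""
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip the human-readable error line
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict) and "error" in payload:
            return payload
    return None


def first_command(payload: dict) -> Optional[str]:
    """Prefer the ranked `next_actions` list; fall back to the `next_action` string."""
    for action in payload.get("next_actions", []):
        if action.get("kind") == "command" and action.get("command"):
            return action["command"]
    return payload.get("next_action")
```

Actions that are not commands (for example, ones that carry a `path`) are skipped by `first_command`; a real consumer would likely surface those to the user instead of executing anything.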
### Doctor behavior change for unresolved tool_sources
When a required `tool_sources[].path` does not resolve under the manifest directory (file missing OR resolves outside the manifest dir):
- `agents-shipgate doctor --json` exits **0** with a `SHIP-DIAG-MISSING-SOURCE-FILE` diagnostic and an `unresolved_sources: [{id, declared_path, line, reason}]` field in the payload, so an agent can route to a fix without parsing the error message. `reason` is `"missing"` or `"outside_manifest_dir"`.
- `agents-shipgate doctor` (no `--json`) prints the same `unresolved_sources` + diagnostic block in human-readable form and **exits 3**, preserving the pre-feature loud failure for interactive users.
- `agents-shipgate scan` is unchanged — it still raises `InputParseError(3)` regardless of `--json`. Once you're past doctor, missing sources are real scan failures.
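As a rough illustration of routing on `unresolved_sources` without parsing error text, the sketch below turns each entry into a fix hint. It assumes only the `id`, `declared_path`, and `reason` fields described above; the function name and the hint wording are hypothetical.

```python
def describe_unresolved_sources(doctor_payload: dict) -> list[str]:
    """Turn `unresolved_sources` entries into one fix hint per source."""
    hints = []
    for source in doctor_payload.get("unresolved_sources", []):
        source_id = source.get("id")
        path = source.get("declared_path")
        reason = source.get("reason")
        if reason == "missing":
            hint = f"{source_id}: create or correct {path} (file not found)"
        elif reason == "outside_manifest_dir":
            hint = f"{source_id}: move {path} under the manifest directory"
        else:
            # Unknown reasons are surfaced as-is rather than guessed at.
            hint = f"{source_id}: unresolved ({reason})"
        hints.append(hint)
    return hints
```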
### Missing vs invalid manifests
`config_error` covers two distinct shapes — agent-mode emits a different rank-1 action for each:
- **Missing**: file does not exist → `SHIP-DIAG-MISSING-MANIFEST`, rank-1 is `agents-shipgate detect --workspace
--json` (then `init --write`).
- **Invalid**: file exists but the loader rejected it (invalid YAML, schema validation, unsupported version) → `SHIP-DIAG-INVALID-MANIFEST`, rank-1 is `edit ` with the loader error in `why`. Do **not** re-run `init` — it refuses to overwrite an existing file.
---
## When to use this tool
Use it when the user is:
- Writing or modifying an AI agent that calls tools (MCP servers, OpenAPI APIs, `@function_tool`-decorated Python, Google ADK, LangChain/LangGraph, CrewAI, n8n)
- Publishing or modifying a Codex plugin package or marketplace entry that declares skills, apps, MCP servers, or hooks
- Adding a new tool, scope, or policy to an existing agent
- Preparing an agent for production promotion
- Wiring a release gate into PR CI
Do NOT use it for:
- General code linting (ruff, eslint, etc.)
- Runtime monitoring (use observability/tracing)
- Behavior testing (use evals)
- Stopping a tool call in flight (use a gateway or guardrail)
### Should I run Shipgate on this PR?
| Trigger in this PR | Run Shipgate? |
|---|---|
| Adds/changes MCP exports, OpenAPI specs, or `tools/*openai*tools*.json` | Yes |
| Adds/changes Codex plugin manifests, marketplace files, `.app.json`, `.mcp.json`, or `SKILL.md` files | Yes |
| Adds/changes `@function_tool`/`@tool` decorators (LangChain, CrewAI, OpenAI Agents SDK) | Yes |
| Adds/changes n8n workflow JSON, credential stubs, or n8n tool inventories | Yes |
| Edits `prompts/`, `policies/`, or `permissions.scopes` in `shipgate.yaml` | Yes |
| Adds/edits `.github/workflows/agents-shipgate.yml` or related CI | Yes |
| Pure read-only doc/test changes with no manifest impact | Skip |
| Refactor with no behavior change to tools or policies | Skip (or dry-run only) |
Two implicit triggers also fire even when no row above matches:
- **Repo already opted in (shipgate.yaml present in the workspace)** — run on every PR; the manifest's existence is the opt-in.
- **(Optional) Refactor or framework upgrade that may shift the extracted tool surface** — dry-run only; bumping `openai-agents`, `langchain`, `crewai`, or `google-adk` can change AST extraction even without app-code edits.
A machine-readable mirror of these triggers lives at [`docs/triggers.json`](docs/triggers.json). Coding agents that have not yet adopted Shipgate can fetch the file (raw URL: `https://raw.githubusercontent.com/ThreeMoonsLab/agents-shipgate/main/docs/triggers.json`), apply the rules to a PR diff, and decide whether to propose `agents-shipgate detect`. The catalog is stable for `0.x` and pinned by the public-surface contract test against this prose table — if you change a row above, update `triggers.json` in the same commit. To verify a path list locally:
```bash
python -m agents_shipgate.triggers shipgate.yaml prompts/refund.md
python -m agents_shipgate.triggers --list-rules --json
```
**Stop conditions.** Stop and do not run `init` only when **all** of these hold:
- `agents-shipgate detect --json` returns `is_agent_project: false`, AND
- `suggested_sources` is empty (no MCP/OpenAPI hits flowing in as `mcp` or `openapi`), AND
- `codex_plugin_candidates` is empty (no Codex plugin package or marketplace hits), AND
- no `shipgate.yaml` already exists in the workspace, AND
- the user did not explicitly request a scan.
Otherwise proceed to `init`. MCP/OpenAPI tool-surface repos and Codex plugin package repos register as `is_agent_project: false` because they have no Python framework imports — but they are valid Shipgate targets. MCP/OpenAPI hits surface as `suggested_sources`; Codex plugin hits surface as `codex_plugin_candidates`. The trigger table above is the authoritative go/no-go.
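The stop conditions above can be expressed as a single predicate. This is a minimal sketch, assuming the `detect --json` payload has already been parsed into a dict; `manifest_exists` and `user_requested_scan` are inputs the calling agent would supply, and the function name is hypothetical.

```python
def should_skip_init(detect: dict, manifest_exists: bool, user_requested_scan: bool) -> bool:
    """Return True only when every stop condition holds.

    A missing `is_agent_project` key is treated as "unknown", which does
    not satisfy the first condition, so the agent proceeds to `init`.
    """
    return (
        detect.get("is_agent_project") is False
        and not detect.get("suggested_sources")
        and not detect.get("codex_plugin_candidates")
        and not manifest_exists
        and not user_requested_scan
    )
```

For example, an MCP-only repo with a non-empty `suggested_sources` list returns `False` here, which matches the guidance that such repos are still valid targets.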
---
## Five common agent tasks
### Task 1 · Add the gate to an existing repo
```bash
pipx install agents-shipgate
agents-shipgate init --workspace . --write
# edit shipgate.yaml to replace any CHANGE_ME values
agents-shipgate scan -c shipgate.yaml
```
`init` writes a manifest with `CHANGE_ME` placeholders for `agent.name` and `agent.declared_purpose`. Replace them by reading the agent's prompt or main file.
### Task 2 · Read findings programmatically
Always parse `agents-shipgate-reports/report.json`, not the markdown.
The canonical field list — `release_decision`, `capability_facts` / `declared_intentions` / `misalignments` / `release_consequence` / `suggested_scenarios`, `tool_surface_facts` / `tool_surface_diff`, and `action_surface_facts` / `action_surface_diff` — lives in [`docs/agent-contract-current.md`](docs/agent-contract-current.md#read-these-first-for-release-gating). It updates first when the contract bumps; this file links to it instead of restating the field set.
Other stable top-level fields:
- `summary.{critical_count, high_count, medium_count, status}` (status preserved for v0.7 compat — see note below)
- `findings[].{id, fingerprint, check_id, severity, tool_name, evidence, recommendation, suppressed}`
- `findings[].{autofix_safe, requires_human_review, suggested_patch_kind, docs_url}` (v0.7+)
- `findings[].patches[]` (v0.6+, only when scan ran with `--suggest-patches`)
- `baseline.{matched_count, new_count, resolved_count}`
- `tool_inventory[]`
- `codex_plugin_surface` (v0.13+, static Codex plugin package/marketplace facts)
- `findings[].provenance_kind` (v0.15+, per-finding rule provenance — `static_declaration | ast_extraction | keyword_heuristic | regex_heuristic | policy_pack`; independent of `confidence`, useful for filtering heuristic-only findings)
- `findings[].blocks_release` (v0.16+, explicit release-policy blockers from Action Surface Diff policies)
- `action_surface_facts` / `action_surface_diff` (v0.16+, deterministic action snapshot and base/head action delta)
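To illustrate the filtering use case mentioned for `provenance_kind`, here is a small sketch that separates heuristic-only findings from the rest. It assumes the enum values listed above; findings without the field (for instance, from reports older than v0.15) are placed in the non-heuristic group, which is a choice made for this example rather than documented behavior.

```python
HEURISTIC_KINDS = {"keyword_heuristic", "regex_heuristic"}


def split_by_provenance(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split findings into (heuristic_only, everything_else) by `provenance_kind`."""
    heuristic, other = [], []
    for finding in findings:
        if finding.get("provenance_kind") in HEURISTIC_KINDS:
            heuristic.append(finding)
        else:
            other.append(finding)
    return heuristic, other
```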
The full schema is at [`docs/report-schema.v0.16.json`](docs/report-schema.v0.16.json) (current; emitted reports carry `report_schema_version: "0.16"`). v0.16 adds first-class Action Surface Diff fields, on top of v0.15's per-finding `provenance_kind` enum, v0.14's `insufficient_evidence` value in the `release_decision.decision`/`agent_summary.verdict` enums, and v0.13's `codex_plugin_surface` block. Older reports validate against [`docs/report-schema.v0.15.json`](docs/report-schema.v0.15.json) (frozen reference). What's-stable is documented in [STABILITY.md](STABILITY.md).
**Release gating signal**: prefer `release_decision.decision` (`"blocked" | "review_required" | "insufficient_evidence" | "passed"`) over `summary.status`. The new field is **baseline-aware** — a baseline-matched critical surfaces in `release_decision.review_items` (accepted debt), not `release_decision.blockers`. `summary.status` stays baseline-blind for v0.7 compatibility, so a baseline-matched-only critical produces both `summary.status = "release_blockers_detected"` AND `release_decision.decision = "review_required"` (intentional divergence — see [STABILITY.md](STABILITY.md#release_decisiondecision-vs-summarystatus)). `insufficient_evidence` (added v0.14) signals that the scan saw too many low-confidence tools or source-loader warnings to be trustworthy; consumers that switch on the enum must fall back to `review_required` for unknown future values.
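A consumer that switches on the enum could look roughly like the sketch below. The four known values and the `review_required` fallback for unknown values come from the paragraph above; treating a missing `release_decision` block the same way is an assumption of this example, and the function name is hypothetical.

```python
KNOWN_DECISIONS = {"blocked", "review_required", "insufficient_evidence", "passed"}


def gating_decision(report: dict) -> str:
    """Read `release_decision.decision`, falling back to `review_required`.

    Unknown future enum values map to `review_required` as the contract
    asks. A missing block is handled the same way (assumption).
    """
    decision = (report.get("release_decision") or {}).get("decision")
    if decision in KNOWN_DECISIONS:
        return decision
    return "review_required"
```

Note that this deliberately ignores `summary.status`, so a baseline-matched critical yields `review_required` here even though `summary.status` reports `release_blockers_detected`.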
For a step-by-step reader's primer with anti-patterns and concrete code rewrites, see [`docs/report-reading-for-agents.md`](docs/report-reading-for-agents.md).
### Task 3 · Suppress a finding with a reason
```yaml
# shipgate.yaml
checks:
ignore:
- check_id: SHIP-DOC-MISSING-DESCRIPTION
tool: legacy_search
reason: tool deprecated 2026-Q2
```
`reason` is required and non-empty; the manifest fails validation otherwise.
### Task 4 · Save a baseline before enabling strict CI
```bash
agents-shipgate baseline save -c shipgate.yaml --out .agents-shipgate/baseline.json
```
Then in CI:
```bash
agents-shipgate scan -c shipgate.yaml \
--baseline .agents-shipgate/baseline.json \
--ci-mode strict --fail-on critical,high
```
With a baseline supplied, strict mode fails CI only on **new** findings (those not in the baseline).
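Conceptually, "new" means a finding that the baseline does not already account for. The sketch below illustrates that idea only: it assumes findings are matched by `findings[].fingerprint`, takes the baseline as a plain set of fingerprints (the layout of `baseline.json` is not described here), and uses a hypothetical function name. It is not the CLI's actual matching logic.

```python
def new_blocking_findings(
    findings: list[dict],
    baseline_fingerprints: set[str],
    fail_on: tuple[str, ...] = ("critical", "high"),
) -> list[dict]:
    """Return unsuppressed findings at a failing severity that the baseline lacks."""
    return [
        finding
        for finding in findings
        if not finding.get("suppressed", False)
        and finding.get("severity") in fail_on
        and finding.get("fingerprint") not in baseline_fingerprints
    ]
```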
### Task 5 · Explain a check or a specific finding
For static catalog metadata about a check ID (rationale, fires-when, recommendation):
```bash
agents-shipgate explain SHIP-POLICY-APPROVAL-MISSING --json
```
Returns the full `CheckMetadata` with `id`, `category`, `default_severity`, `description`, `rationale`, `fires_when`, `evidence_fields`, `recommendation`.
For a contextual explanation tied to a specific finding from a real scan (catalog metadata + the finding's evidence + a 3–5 sentence templated prose summary):
```bash
agents-shipgate explain-finding fp_ \
--from agents-shipgate-reports/report.json --json
```
Returns the canonical Finding fields plus `metadata` (CheckMetadata for the check_id) and `explanation` — a deterministic prose summary suitable for direct quotation in a PR comment or chat reply. The companion prompt is [`prompts/explain-finding-to-user.md`](prompts/explain-finding-to-user.md).
---
## Agent FAQ
### Where is the manifest schema?
Use [`docs/manifest-v0.1.json`](docs/manifest-v0.1.json) for machine
validation and [`docs/manifest-v0.1.md`](docs/manifest-v0.1.md) for prose.
### Where is the report schema?
Parse `agents-shipgate-reports/report.json` and validate against
[`docs/report-schema.v0.16.json`](docs/report-schema.v0.16.json) (current).
Older reports (`report_schema_version: "0.10"`) validate against the
frozen [`docs/report-schema.v0.10.json`](docs/report-schema.v0.10.json).
Do not scrape Markdown when JSON is available.
### How do I add a new check?
Follow [`docs/architecture.md`](docs/architecture.md) and update the check
registry, tests, [`docs/checks.md`](docs/checks.md), and
[`docs/checks.json`](docs/checks.json). Check IDs must not change after
publication.
### How do I add a new framework adapter?
Start with [`docs/framework-adapter-checklist.md`](docs/framework-adapter-checklist.md).
Adapters must be static by default: no user-code import, no network access, no
agent execution.
### Where are runnable examples?
Use [`samples/README.md`](samples/README.md) for sample agents and
[`docs/examples.md`](docs/examples.md) for a narrative overview. The fastest
fixture is `agents-shipgate fixture run support_refund_agent`.
### What vocabulary should I use in user-facing copy?
Use the [canonical names](#naming-canonical) table above and the website
glossary: https://threemoonslab.com/glossary/.
---
## Schemas
For the short, current statement of "which fields to read", see [`docs/agent-contract-current.md`](docs/agent-contract-current.md). It is the single file that updates first when the contract bumps; the table below lists the underlying schemas.
| What | Path | Stable |
|---|---|---|
| Manifest schema | [`docs/manifest-v0.1.json`](docs/manifest-v0.1.json) | `0.1` |
| Report schema (current) | [`docs/report-schema.v0.16.json`](docs/report-schema.v0.16.json) | `0.16` |
| Report schema (v0.15 frozen reference) | [`docs/report-schema.v0.15.json`](docs/report-schema.v0.15.json) | `0.15` |
| Report schema (v0.14 frozen reference) | [`docs/report-schema.v0.14.json`](docs/report-schema.v0.14.json) | `0.14` |
| Report schema (v0.13 frozen reference) | [`docs/report-schema.v0.13.json`](docs/report-schema.v0.13.json) | `0.13` |
| Report schema (v0.12 frozen reference) | [`docs/report-schema.v0.12.json`](docs/report-schema.v0.12.json) | `0.12` |
| Report schema (v0.11 frozen reference) | [`docs/report-schema.v0.11.json`](docs/report-schema.v0.11.json) | `0.11` |
| Report schema (v0.10 frozen reference) | [`docs/report-schema.v0.10.json`](docs/report-schema.v0.10.json) | `0.10` |
| Report schema (v0.9 frozen reference) | [`docs/report-schema.v0.9.json`](docs/report-schema.v0.9.json) | `0.9` |
| Report schema (v0.8 frozen reference) | [`docs/report-schema.v0.8.json`](docs/report-schema.v0.8.json) | `0.8` |
| Report schema (v0.7 frozen reference) | [`docs/report-schema.v0.7.json`](docs/report-schema.v0.7.json) | `0.7` |
| Report schema (v0.6 frozen reference) | [`docs/report-schema.v0.6.json`](docs/report-schema.v0.6.json) | `0.6` |
| Packet schema (Release Evidence Packet) | [`docs/packet-schema.v0.5.json`](docs/packet-schema.v0.5.json) | `0.5` |
| Check catalog | [`docs/checks.json`](docs/checks.json) | regenerated each release |
| Anti-patterns (what NOT to write) | [`samples/_anti_patterns/`](samples/_anti_patterns/) | reference |
| Minimal manifest example | [`docs/manifest-v0.1.example.minimal.yaml`](docs/manifest-v0.1.example.minimal.yaml) | reference |
For VS Code / Cursor live YAML validation, every manifest produced by `init` includes:
```yaml
# yaml-language-server: $schema=https://raw.githubusercontent.com/ThreeMoonsLab/agents-shipgate/main/docs/manifest-v0.1.json
```
---
## Stable command surface
These commands and flags are promised not to break across `0.x` minor versions. See [STABILITY.md](STABILITY.md) for the full contract.
| Command | Stable flags |
|---|---|
| `agents-shipgate scan` | `-c`, `--out`, `--format`, `--ci-mode`, `--fail-on`, `--baseline`, `--diff-from`, `--no-plugins`, `--verbose`, `--packet`/`--no-packet`, `--packet-format` |
| `agents-shipgate evidence-packet` | `--from`, `--out`, `--format`, `--json` |
| `agents-shipgate init` | `--workspace`, `--write`, `--json` |
| `agents-shipgate doctor` | `-c`, `--workspace`, `--json`, `--verbose` |
| `agents-shipgate contract` | `--json` |
| `agents-shipgate explain` | positional check ID, `--no-plugins`, `--json` |
| `agents-shipgate explain-finding` | positional fingerprint, `--from`, `--no-plugins`, `--json` |
| `agents-shipgate bootstrap` | `--workspace`, `--confidence`, `--no-ci`, `--no-apply`, `--json` |
| `agents-shipgate list-checks` | `--json`, `--no-plugins` |
| `agents-shipgate baseline save` | `-c`, `--out` |
| `agents-shipgate fixture` | `list`, `run`, `copy`, `verify` |
| `agents-shipgate self-check` | `--json` |
### Release Evidence Packet (v0.5)
`scan` emits a reviewer-shaped Release Evidence Packet alongside `report.{md,json}` by default. The packet is a curated synthesis with fixed reviewer sections derived from the in-memory scan; outputs land at `agents-shipgate-reports/packet.{md,json,html}` (and `packet.pdf` when the optional `[pdf]` extras are installed). For the field-level packet contract, see [`docs/agent-contract-current.md`](docs/agent-contract-current.md#read-these-for-release-review) and [STABILITY.md §Release Evidence Packet](STABILITY.md#release-evidence-packet-v05).
```bash
pipx install agents-shipgate # md, json, html packet outputs
pipx install 'agents-shipgate[pdf]' # adds packet.pdf via weasyprint
agents-shipgate scan -c shipgate.yaml # default: emit packet
agents-shipgate scan -c shipgate.yaml --no-packet # skip
agents-shipgate scan -c shipgate.yaml --packet-format md,json,html,pdf
# Re-render from the existing packet (full fidelity):
agents-shipgate evidence-packet --from agents-shipgate-reports/packet.json --format html,pdf
# Or rebuild from a CI-archived report.json (degraded — see §10 of the output):
agents-shipgate evidence-packet --from agents-shipgate-reports/report.json --format md,html
```
Rules of the packet contract (do not break in 0.x):
- The packet is **derived from JSON** (the in-memory scan) and is a **local artifact only** — no hosted/SaaS view.
- §10 ("What this packet did NOT prove") **always** lists the four canonical disclaimers verbatim — prompt robustness, runtime behavior, model correctness, adversarial resistance — regardless of run state.
- All reviewer sections are **always present** in `packet.json`, including `tool_surface_diff`. Sections that have no evidence render with `status: "not_declared"` (or `"informational"`) and refer the reviewer to §10.
- §8 (`human_in_the_loop`) always carries `runtime_control_disclaimer`. When local validation artifacts are available, `source_provenance[]` traces approval traces, override logs, high-risk exclusions, promotion criteria, and manifest requirements.
- §1 verdict (`PASSED` / `REVIEW REQUIRED` / `INSUFFICIENT EVIDENCE` / `BLOCKED`) derives from `release_decision.decision` only (with `INSUFFICIENT EVIDENCE` mirroring the v0.14 `insufficient_evidence` decision value). CI behavior (`fail_policy`) is rendered separately as metadata, not as the verdict source.
- The current manifest schema does **not** model `agent.memory`. §7 always renders "not declared, see §10" until a future schema bump adds the field.
Exit codes (stable):
| Code | Meaning |
|---|---|
| `0` | Pass (advisory or strict-no-blockers) |
| `2` | Manifest config error |
| `3` | Input parse error (file missing, malformed, path traversal blocked, file too large) |
| `4` | Other Agents Shipgate error |
| `20` | Strict-mode gate failure |
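A wrapper script might translate these codes into messages along the following lines. The mapping mirrors the table above; `describe_exit_code` and `run_scan` are hypothetical helper names, and `run_scan` assumes `agents-shipgate` is on `PATH`.

```python
import subprocess

# Meanings copied from the stable exit-code table.
EXIT_CODE_MEANINGS = {
    0: "Pass (advisory or strict-no-blockers)",
    2: "Manifest config error",
    3: "Input parse error",
    4: "Other Agents Shipgate error",
    20: "Strict-mode gate failure",
}


def describe_exit_code(code: int) -> str:
    """Map a CLI exit code to its documented meaning."""
    return EXIT_CODE_MEANINGS.get(code, f"Unrecognized exit code {code}")


def run_scan(config: str = "shipgate.yaml") -> tuple[int, str]:
    """Run `agents-shipgate scan -c <config>` and return (exit code, meaning)."""
    result = subprocess.run(["agents-shipgate", "scan", "-c", config], check=False)
    return result.returncode, describe_exit_code(result.returncode)
```

Keeping `check=False` lets the caller distinguish a strict-mode gate failure (`20`) from a configuration problem (`2`) instead of collapsing both into one exception.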
---
## What you can't do (intentionally)
This section lists the **CLI's** invariants. For the **agent's** behavioral boundary (what an agent driving Shipgate may assert in PR comments and review summaries), see [`docs/agent-autofix-boundary.md`](docs/agent-autofix-boundary.md).
- The CLI does not modify user code; it only reads.
- The CLI does not connect to MCP servers; it reads exported JSON only.
- Tool sources outside the manifest directory are rejected (path traversal containment).
- Files larger than 10 MB are rejected.
- Plugins are off by default (`AGENTS_SHIPGATE_ENABLE_PLUGINS=1` to enable; `--no-plugins` to force off).
---
## When you make changes to this repo
- Run `python -m ruff check .` and `python -m pytest` before committing.
- Changing a check's behavior requires updating the test suite and any golden fixtures under `samples/*/expected/`.
- New checks must include: code in `src/agents_shipgate/checks/`, metadata in `checks/registry.py:CHECK_METADATA`, a test in `tests/`, and a row in `docs/checks.md`.
- Do not change check IDs in published versions; always add new ones.
- To regenerate the JSON schemas, run `python scripts/generate_schemas.py`, then commit `docs/manifest-v0.1.json` and `docs/checks.json`.
---
## Reusable prompts
Prebuilt prompts for common workflows live in [`prompts/`](prompts/):
- [`decide-shipgate-relevance.md`](prompts/decide-shipgate-relevance.md) — apply [`docs/triggers.json`](docs/triggers.json) to decide whether Shipgate should run at all
- [`add-shipgate-to-repo.md`](prompts/add-shipgate-to-repo.md) — bootstrap a repo
- [`fix-top-finding.md`](prompts/fix-top-finding.md) — iterate on a single finding
- [`recommend-fixes.md`](prompts/recommend-fixes.md) — walk all active findings and surface targeted fix recommendations across the four autofix-policy classes
- [`explain-finding-to-user.md`](prompts/explain-finding-to-user.md) — translate one finding into 3–5 sentences of user-facing prose; companion to `agents-shipgate explain-finding`
- [`stabilize-strict-mode.md`](prompts/stabilize-strict-mode.md) — tune → baseline → promote
- [`triage-false-positive.md`](prompts/triage-false-positive.md) — override vs suppress decision
- [`upgrade-shipgate-version.md`](prompts/upgrade-shipgate-version.md) — bump agents-shipgate version safely (regenerate baseline if needed)
For downstream repos, use [`docs/target-repo-agent-snippets.md`](docs/target-repo-agent-snippets.md)
to copy Shipgate trigger rules into `AGENTS.md`, `CLAUDE.md`, Cursor rules,
PR templates, and advisory CI. Use
[`docs/agent-adoption-harness.md`](docs/agent-adoption-harness.md) to evaluate
whether coding agents discover and use Shipgate without being prompted by name.
### Editor / agent integrations
Per-agent install guides for dropping Shipgate into your own agent project:
- [`docs/agents/use-with-claude-code.md`](docs/agents/use-with-claude-code.md) — install the `/shipgate` slash command and `agents-shipgate` auto-discoverable skill. Source surfaces ship at [`.claude/commands/shipgate.md`](.claude/commands/shipgate.md) and [`skills/agents-shipgate/`](skills/agents-shipgate/) (named `agents-shipgate` to avoid colliding with the slash command — Claude Code lets a same-named skill preempt a command). The skill bundles the recipes in [`skills/agents-shipgate/prompts/`](skills/agents-shipgate/prompts/) and a starter advisory CI workflow at [`skills/agents-shipgate/ci-recipes/advisory-pr-comment.yml`](skills/agents-shipgate/ci-recipes/advisory-pr-comment.yml); when you change anything in [`prompts/`](prompts/) or `examples/github-actions/01-advisory-pr-comment.yml`, sync the bundled copy.
- [`docs/agents/use-with-codex.md`](docs/agents/use-with-codex.md) — drop the canonical `AGENTS.md` snippet (from [`docs/target-repo-agent-snippets.md`](docs/target-repo-agent-snippets.md)) into your repo. Codex reads `AGENTS.md` natively. Codex Skills (`.agents/skills//SKILL.md` repo-scoped or `$HOME/.agents/skills//SKILL.md` user-scoped; invoked with `/skills` or `$`) are also supported, but this repo does not currently ship a Codex skill bundle — the parallel to [`skills/agents-shipgate/`](skills/agents-shipgate/) has not been authored. The `AGENTS.md` snippet is the minimal on-ramp that works today.
- [`docs/agents/use-with-cursor.md`](docs/agents/use-with-cursor.md) — drop the canonical `.cursor/rules/agents-shipgate.mdc` auto-attach rule (from [`docs/target-repo-agent-snippets.md`](docs/target-repo-agent-snippets.md)) into your repo. The rule fires whenever a chat touches `shipgate.yaml`, an MCP/OpenAPI spec, a tool JSON, or a `.py` file.
---
## Verification
After you (the agent) complete a task involving Agents Shipgate, verify:
1. `agents-shipgate self-check --json` returns `"ready": true`.
2. `agents-shipgate contract --json` matches the installed CLI contract you expect.
3. The user's `shipgate.yaml` has no `CHANGE_ME` placeholders.
4. A scan completes with exit code 0 (advisory mode) and writes `report.json`.
5. The user's repo `.gitignore` includes `agents-shipgate-reports/` (do not commit reports).
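Items 3 and 5 of this checklist are simple text checks that an agent could automate. The sketch below works on file contents that the caller has already read; both function names are hypothetical, and the `.gitignore` check is a plain line match that ignores negation rules and glob patterns.

```python
def has_placeholders(manifest_text: str) -> bool:
    """Return True if the manifest still contains a CHANGE_ME placeholder."""
    return "CHANGE_ME" in manifest_text


def gitignore_covers_reports(gitignore_text: str) -> bool:
    """Return True if a non-comment line ignores the agents-shipgate-reports directory."""
    for raw_line in gitignore_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Accept "agents-shipgate-reports", with or without leading/trailing slashes.
        if line.strip("/") == "agents-shipgate-reports":
            return True
    return False
```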
# Agent recipes
Copy-pasteable workflows for AI coding agents (Claude Code, Codex, Cursor,
Aider) that need to drive `agents-shipgate` end-to-end without prompting
the user. Every command is read-only or schema-validated; no agent
execution, no LLM calls, no network access.
> If you are a human, [`quickstart.md`](quickstart.md) is the friendlier
> entry point. This page is structured for agents that consume `--json`.
---
## Recipe 1 · Single-turn adoption (the canonical 4-call flow)
Use this when a repo doesn't yet have `shipgate.yaml`. Four calls in
one user turn take it from "looks like an agent project" to "Shipgate
is integrated, scan green or with safe trivial findings auto-applied,
CI workflow optionally drafted."
```bash
agents-shipgate detect --json
agents-shipgate init --write --ci --json
agents-shipgate scan -c shipgate.yaml --suggest-patches --format json
agents-shipgate apply-patches \
--from agents-shipgate-reports/report.json \
--confidence high --apply
```
### Step 1 · `detect --json` (read-only)
Consume the response to decide whether to proceed. Key fields:
- `is_agent_project` — `true` when at least one Python framework
scored ≥ 2.0 with a strong signal.
- `frameworks[]` — per-framework scores + evidence + candidate file
paths.
- `agent_name_candidates[]` — ranked `{value, source}`. Source values:
`Agent_name_literal` (highest), `ADK_name_field`, `workspace_dir`
(lowest).
- `project_name_candidates[]` — same shape; `pyproject` source seeds
`project.name` only.
- `suggested_sources[]` — MCP/OpenAPI files matched by glob. These do
NOT bump `is_agent_project` on their own.
- `codex_plugin_candidates[]` — Codex plugin package or marketplace
artifacts matched by convention. These also do NOT bump
`is_agent_project` on their own.
**Stop condition.** Stop and skip `init` only when ALL of:
- `is_agent_project` is `false`, AND
- `suggested_sources` is empty, AND
- `codex_plugin_candidates` is empty, AND
- no `shipgate.yaml` already exists, AND
- the user did not explicitly request a scan.
Otherwise proceed. MCP/OpenAPI-only tool-surface repos and Codex plugin
package repos surface as `is_agent_project: false` but should still be
onboarded — their sources will land in `tool_sources` during `init`.
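The stop condition is a conjunction of five checks, which can be sketched as a single predicate. The function name and its two boolean parameters are hypothetical; the dictionary keys are the `detect --json` fields listed above.

```python
def should_skip_init(detect: dict, manifest_exists: bool,
                     user_requested_scan: bool) -> bool:
    """Return True only when every stop condition holds at once."""
    return (
        not detect.get("is_agent_project", False)
        and not detect.get("suggested_sources")
        and not detect.get("codex_plugin_candidates")
        and not manifest_exists
        and not user_requested_scan
    )
```

Any single signal (a matched source, an existing manifest, an explicit user request) is enough to proceed to `init`.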
### Step 2 · `init --write --ci --json`
Auto-detection runs again inside `init` and writes:
- `shipgate.yaml` with `tool_sources` populated per detected framework
candidate file.
- `.github/workflows/agents-shipgate.yml` (if `--ci` is set; refuses
to overwrite an existing workflow file or one that already calls
`ThreeMoonsLab/agents-shipgate@*` from a sibling workflow).
Key response fields:
- `manifest_status`: `"written"` | `"skipped_existing"` | `"not_attempted"`.
- `workflow.status` (when `--ci`): `"written"` | `"skipped_existing_target"`
| `"skipped_cross_reference"`.
- `placeholders[]` — entries the template intentionally leaves as
`CHANGE_ME` because no high-confidence signal was available. Each has
a `path` (YAML-pointer-ish location) and `current` value. Replace
these before scanning.
- `auto_detected.agent_name` — the value the manifest carries
(`null` when the template fell back to `CHANGE_ME`; matches the YAML
exactly).
`--ci` is orthogonal to `--write`: each has its own overwrite refusal.
The exit code is the maximum of the per-action outcomes; a manifest
error and a workflow skip can co-occur in the same run.
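A minimal sketch of turning the `init` response into a follow-up list. It assumes `workflow.status` is serialized as a nested `workflow` object, and the function name and message wording are hypothetical.

```python
def init_follow_ups(init_response: dict) -> list[str]:
    """List what still needs attention after `init --write --ci --json`."""
    todo = []
    # Every placeholder must be replaced before scanning.
    for entry in init_response.get("placeholders", []):
        todo.append(f"replace {entry['current']} at {entry['path']}")
    if init_response.get("manifest_status") == "skipped_existing":
        todo.append("manifest already existed; review it instead of overwriting")
    # Assumption: `workflow.status` is a nested object in the JSON payload.
    workflow = init_response.get("workflow") or {}
    status = workflow.get("status")
    if status in ("skipped_existing_target", "skipped_cross_reference"):
        todo.append(f"CI workflow not written ({status})")
    return todo
```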
### Step 3 · `scan -c shipgate.yaml --suggest-patches --format json`
Writes to `agents-shipgate-reports/report.json`. Read it and walk
`findings[]`, skipping entries where `suppressed` is `true`. Per-finding
fields you can rely on today:
- `check_id`, `title`, `severity`, `category`, `evidence`,
`confidence`, `recommendation`.
- `patches[]` (only when `--suggest-patches` is set) — list of
patch objects with `kind` ∈ `{set_pointer, append_pointer,
remove_pointer, manual}`. Non-manual patches additionally carry
`confidence` ∈ `{low, medium, high}`, `target_file`, `pointer`,
`target_format`, `rationale`, `target_sha256`.
- `manifest_dir` (top-level on the report) — absolute path to the
directory containing `shipgate.yaml`. `apply-patches` enforces a
containment check against this.
When `--suggest-patches` is set, every active (unsuppressed) finding
has at least one patch. Manual-only findings (e.g. trace approval
flips, per-check policy decisions) carry a single `ManualPatch` with
`instructions` instead of a machine-applicable patch.
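To preview which patches are machine-applicable before calling `apply-patches`, the report can be walked as in the sketch below. It treats the confidence level as a floor, which is an assumption about how the filter behaves; the function name is hypothetical.

```python
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def applicable_patches(report: dict, min_confidence: str = "high") -> list[dict]:
    """Collect non-manual patches on active findings at or above a level."""
    floor = CONFIDENCE_RANK[min_confidence]
    selected = []
    for finding in report.get("findings", []):
        if finding.get("suppressed"):
            continue  # suppressed findings are not active
        for patch in finding.get("patches", []):
            if patch.get("kind") == "manual":
                continue  # manual patches carry instructions only
            if CONFIDENCE_RANK.get(patch.get("confidence"), -1) >= floor:
                selected.append(patch)
    return selected
```

This is a read-only preview; the CLI remains the only thing that should mutate files.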
Optional dynamic-validation handoff:
```bash
agents-shipgate scenario suggest \
--from agents-shipgate-reports/report.json \
--out agents-shipgate-reports/suggested-scenarios.yaml
```
This YAML is a concrete per-finding/per-tool fan-out of
`report.json.suggested_scenarios[]`, not a separate scenario engine.
Suppressed findings are omitted; baseline-matched findings remain because
they are accepted debt, not resolved risk.
### Step 4 · `apply-patches --confidence high --apply`
The default, `--confidence high`, auto-applies only patches whose
`confidence` field is `"high"`. Today that covers the three
stale-manifest removals
(`SHIP-MANIFEST-STALE-{SUPPRESSION,POLICY,RISK-OVERRIDE}`). Scope
coverage appends ship at `medium` and require an explicit
`--confidence medium` to apply.
`apply-patches` is dry-run by default; `--apply` is required to
mutate files. It is also containment-checked: any `target_file` outside
`report.manifest_dir` aborts with exit code 5 before SHA verification.
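The idea behind a containment check can be illustrated with `pathlib`. This is a sketch of the general technique, not the CLI's actual implementation, and the function name is hypothetical.

```python
from pathlib import Path


def is_contained(target_file: str, manifest_dir: str) -> bool:
    """Return True when target_file resolves to a path inside manifest_dir."""
    base = Path(manifest_dir).resolve()
    # Joining with an absolute target_file discards `base`, so absolute
    # paths are checked as-is; `resolve()` collapses any `..` segments.
    target = (base / target_file).resolve()
    return target.is_relative_to(base)
```

Resolving both sides before comparing is what defeats `../` traversal; a plain string prefix comparison would not.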
### Step 5 (optional) · Summarize for the user
When the flow completes, summarize `report.json`:
- `release_decision.decision` (`"blocked" | "review_required" | "insufficient_evidence" | "passed"`)
— the v0.8+ release-gate signal (`insufficient_evidence` added v0.14).
Prefer this over `summary.status`, which stays baseline-blind for
backwards compat. Switch on the value with a `review_required`
fallback for unknown future values.
- `release_decision.reason` (one-sentence explanation).
- Top 3 active critical/high findings with their `check_id`,
`tool_name` (when present), and `recommendation`.
- Whether any patches were applied (count from
`apply-patches --json` output's `files`).
Link findings back to [`docs/checks.md#`](checks.md) so the user
can read full check rationale.
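The summary step can be sketched as two small helpers: one that switches on the decision with a `review_required` fallback, and one that picks the top active critical/high findings. Function names are hypothetical; field names are the ones listed above.

```python
KNOWN_DECISIONS = {"blocked", "review_required", "insufficient_evidence", "passed"}


def normalize_decision(report: dict) -> str:
    """Map unknown or missing decisions to the conservative fallback."""
    decision = report.get("release_decision", {}).get("decision")
    return decision if decision in KNOWN_DECISIONS else "review_required"


def top_findings(report: dict, limit: int = 3) -> list[dict]:
    """Return up to `limit` active critical/high findings, critical first."""
    rank = {"critical": 0, "high": 1}
    active = [
        f for f in report.get("findings", [])
        if not f.get("suppressed") and f.get("severity") in rank
    ]
    # sorted() is stable, so report order is kept within each severity.
    active = sorted(active, key=lambda f: rank[f["severity"]])
    keys = ("check_id", "tool_name", "recommendation")
    return [{k: f.get(k) for k in keys} for f in active[:limit]]
```

`tool_name` comes back as `None` when the finding does not carry one.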
---
## Recipe 2 · Add Shipgate to a repo that already has tool surfaces
Same as Recipe 1, but `detect` may report `is_agent_project: false`
when the repo only ships MCP exports or OpenAPI specs. Per the stop
condition in Recipe 1, proceed anyway when `suggested_sources` is
non-empty. `init` will populate `tool_sources` from those globs. The
rest of the flow (steps 2-5) is identical.
### First-real-repo recovery rules
When the first repo scan does not produce useful tools, follow these
rules before changing code:
- If `detect --json` has MCP/OpenAPI `suggested_sources`, continue to
`init` even when `is_agent_project` is `false`.
- If `doctor` shows zero tools, inspect `tool_sources[].path`, MCP
`tools[]`, OpenAPI `paths`, optional source warnings, and dynamic
ADK/MCP warnings.
- If tools are created by factories, wrappers, runtime imports, or
dynamic ADK/MCP toolsets, provide an explicit MCP export, OpenAPI
spec, or local tool inventory artifact.
- Replace every `CHANGE_ME` value in `shipgate.yaml` before scanning;
use the prompt, main agent file, README, or owner-provided context.
- Agents Shipgate requires Python 3.12+. If the project runtime is
older, install the CLI outside the project env with `pipx` or `uv`.
- Ensure `agents-shipgate-reports/` is listed in `.gitignore`.
---
## Recipe 3 · Re-scan after editing the manifest
When the user has already replaced `CHANGE_ME` placeholders or added
policies:
```bash
agents-shipgate scan -c shipgate.yaml --suggest-patches --format json
agents-shipgate apply-patches \
--from agents-shipgate-reports/report.json \
--confidence high --apply
```
`run_id` is deterministic for the same input — if the report's
`run_id` is unchanged from the previous run, nothing semantic about
the manifest+tool-surface changed.
---
## Recipe 4 · Suppress a check or finding
When a finding is a known false positive, edit `shipgate.yaml`:
```yaml
checks:
ignore:
- check_id: SHIP-DOC-MISSING-DESCRIPTION
tool: support_lookup_v2 # optional; omit to suppress for ALL tools
reason: "Tool description matches the upstream OpenAPI summary."
```
`reason` is required; empty reasons fail manifest validation. Re-run
`scan` to confirm the finding is no longer active: it stays in
`findings[]` with `suppressed: true` rather than disappearing from the
report.
If you suppress a check that no longer fires, the next scan emits
`SHIP-MANIFEST-STALE-SUPPRESSION` — auto-removable via
`apply-patches`.
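Before re-running `scan`, an agent could pre-check its own edit for missing reasons. The sketch below works on an already-parsed manifest dictionary; treating whitespace-only reasons as empty is an assumption, and the function name is hypothetical.

```python
def suppressions_missing_reason(manifest: dict) -> list[dict]:
    """Return `checks.ignore` entries that have no usable `reason`."""
    entries = (manifest.get("checks") or {}).get("ignore") or []
    # Assumption: a whitespace-only reason counts as empty.
    return [e for e in entries if not str(e.get("reason") or "").strip()]
```

Manifest validation in the CLI stays authoritative; this only catches the obvious case early.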
---
## Recipe 5 · Add Shipgate to CI without changing existing workflows
```bash
agents-shipgate init --workspace . --ci # no --write
```
Without `--write`, the manifest is printed to stdout and no
`shipgate.yaml` is written. `--ci` is independent of `--write`, so the
workflow file is still written unless an existing workflow already
references the action. In that case `workflow.status` is
`"skipped_cross_reference"` and the path of the existing workflow is
reported in `cross_reference_path`.
---
## Output handling
- Always pass `--json` (where supported) and parse the result. The
human-readable stdout is unstable; the JSON shape is the contract.
- `scan` does not have `--json`; instead pass `--format json` and read
`agents-shipgate-reports/report.json`.
- Errors emit a structured `next_action` JSON line on stderr when
`AGENTS_SHIPGATE_AGENT_MODE=1` is set. Surface that `next_action` to
the user rather than scraping prose.
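A sketch of pulling the structured line out of captured stderr. The exact envelope is not specified here, so the code assumes the line is a JSON object with a `next_action` key; that shape and the function name are assumptions.

```python
import json


def find_next_action(stderr_text: str):
    """Return the first `next_action` payload found in stderr, else None."""
    for raw in stderr_text.splitlines():
        line = raw.strip()
        if not line.startswith("{"):
            continue  # human-readable prose, not a JSON line
        try:
            payload = json.loads(line)
        except json.JSONDecodeError:
            continue
        # Assumption: the payload is an object keyed by "next_action".
        if isinstance(payload, dict) and "next_action" in payload:
            return payload["next_action"]
    return None
```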
## Pre-flight reminder
`agents-shipgate-reports/` is a local artifact directory. Before
committing, ensure it's listed in `.gitignore`:
```gitignore
agents-shipgate-reports/
```
`init` does not touch `.gitignore` — leave that to the user or follow
up with an explicit edit.
---
## Reference
- [`docs/agent-autofix-boundary.md`](agent-autofix-boundary.md) — what
an agent may do mechanically vs. what must defer to a human reviewer.
- [`docs/report-reading-for-agents.md`](report-reading-for-agents.md) —
reader's primer for `agents-shipgate-reports/report.json`.
- [`docs/checks.md`](checks.md) — full check catalog with rationale
- [`docs/autofix-policy.md`](autofix-policy.md) — which findings are
safe to apply, which need review, and how `apply-patches --confidence`
filters them
- [`docs/minimal-real-configs.md`](minimal-real-configs.md) —
framework-specific minimal manifests
- [`AGENTS.md`](../AGENTS.md) — top-level agent instructions, install,
trigger table
# Current Agent Contract
The single, current statement of what AI coding agents and CI integrations should read from Agents Shipgate output. When the contract changes, update [STABILITY.md](../STABILITY.md) first, then this file. Other agent-facing surfaces (`AGENTS.md`, `llms.txt`, `.well-known/agents-shipgate.json`, the slash command, the skill, the FAQ) link here instead of restating field lists.
## Current versions
Verify the installed CLI contract locally before relying on hard-coded docs:
```bash
agents-shipgate contract --json
```
- Latest release: `v0.10.0` (see [pyproject.toml](../pyproject.toml) for the in-tree version)
- Runtime contract: `1`
- Current report schema: `0.16` — [`docs/report-schema.v0.16.json`](report-schema.v0.16.json)
- Current packet schema: `0.5` — [`docs/packet-schema.v0.5.json`](packet-schema.v0.5.json)
- Frozen-reference report schemas: [`v0.15`](report-schema.v0.15.json), [`v0.14`](report-schema.v0.14.json), [`v0.13`](report-schema.v0.13.json), [`v0.12`](report-schema.v0.12.json), [`v0.11`](report-schema.v0.11.json), [`v0.10`](report-schema.v0.10.json), [`v0.9`](report-schema.v0.9.json), [`v0.8`](report-schema.v0.8.json), [`v0.7`](report-schema.v0.7.json), [`v0.6`](report-schema.v0.6.json), older
## Read these first for release gating
In `agents-shipgate-reports/report.json`:
- `release_decision.decision` — `"blocked"` / `"review_required"` / `"insufficient_evidence"` / `"passed"`. Baseline-aware. **This is the gating signal.** `insufficient_evidence` (added v0.14) fires when evidence coverage is degraded past threshold (at least half of scanned tools are low-confidence — `ceil(N × 0.5)` with a minimum of 1, so 1-of-1 and 1-of-2 trip — or 4+ source-loader warnings); switch on the enum with a `review_required` fallback for unknown future values.
- `release_decision.blockers[]` — items that block release on this run.
- `release_decision.review_items[]` — items the human reviewer should look at; includes baseline-matched accepted debt.
- `release_decision.fail_policy.would_fail_ci` — `true`/`false`. Matches what the CI process will exit with.
- `release_decision.reason` — one-sentence explanation suitable for a PR comment.
The action exposes these as outputs `decision`, `blocker_count`, `review_item_count`, `ci_would_fail` (v0.8+).
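The `insufficient_evidence` threshold arithmetic can be worked through in a few lines. This is an illustration of the stated rule, not the CLI's code; the function and parameter names are hypothetical, and the behavior for zero scanned tools is an assumption.

```python
import math


def evidence_is_insufficient(total_tools: int, low_confidence_tools: int,
                             loader_warnings: int) -> bool:
    """Apply the stated rule: half the tools low-confidence, or 4+ warnings."""
    if loader_warnings >= 4:
        return True
    if total_tools <= 0:
        return False  # assumption: an empty inventory is handled elsewhere
    threshold = max(1, math.ceil(total_tools * 0.5))
    return low_confidence_tools >= threshold
```

With three tools the threshold is `ceil(1.5) = 2`, so one low-confidence tool does not trip the rule, while one of one or one of two does.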
## Read these for release review
`agents-shipgate contract --json` exposes `manual_review_signals[]` as the
installed CLI's stable list of report/packet fields to inspect for human review
work.
The capability/intent diff fields (v0.9+), used by reviewers to spot misalignment between declared agent intent and actual tool surface:
- `capability_facts[]` — every capability surfaced from the tool inventory.
- `declared_intentions[]` — what the manifest says the agent is supposed to do.
- `misalignments[]` — where capabilities exceed (or fall short of) declared intent.
- `release_consequence` — capability-aware roll-up of the release decision.
- `suggested_scenarios[]` — dynamic-validation scenarios derived from misalignments and findings.
The Action Surface Diff fields (v0.16+), reviewer-facing PR/release delta:
- `action_surface_facts.actions[]` — deterministic snapshot of the current agent action surface: action id, operation, effect, normalized risk tags, scopes, approval policy, safeguards, evidence, and hashes.
- `action_surface_diff.{enabled, base, summary, added, removed, modified, notes}` — what changed vs. a base report or v0.4 baseline. Policy findings generated from this diff can set `findings[].blocks_release=true` and appear in `release_decision.blockers`.
- `findings[].blocks_release` and `release_decision.{blockers,review_items}[].blocks_release` — explicit release-policy blockers from Action Surface Diff policies and policy-pack rules with `block: true`. Advisory CI may still exit 0; strict CI exits nonzero when an active unbaselined release blocker is present.
The tool-surface diff fields (v0.10+), lower-level explanatory data:
- `tool_surface_facts.{tools, scopes, controls, policies}` — current static facts about the tool surface.
- `tool_surface_diff.{enabled, base, summary, tools, high_risk_effects, scopes, controls, metadata_changes, policy_drift, finding_deltas, notes}` — what changed vs. a base ref. Disabled diffs render as `enabled: false` with a `notes` reason.
Source provenance fields on `findings[].source` (v0.11+), additive and optional:
- `path`, `start_line`, `end_line`, `start_column`, `pointer` — manifest-relative file path, 1-based line/column, and RFC 6901 JSON pointer for the offending tool. Populated for OpenAPI, MCP, OpenAI tool artifacts, and Anthropic tool artifacts when the source is YAML. JSON inputs carry `path` and `pointer` but no line in v0.11.
Per-finding `agent_action` enum (v0.12+), deterministic projection — read this **first** when deciding what to do with a finding so you don't have to synthesize an action from `patches`/`autofix_safe`/`requires_human_review`/`suggested_patch_kind`:
- `auto_apply` — `apply-patches --confidence high` will resolve cleanly. Every patch is non-manual and high-confidence.
- `propose_patch_for_review` — at least one non-manual patch is attached and machine-applicable, but the full patch set is not auto-safe. Two shapes land here: (a) every non-manual patch is medium- or low-confidence, and (b) a high-confidence non-manual patch sits alongside one or more `ManualPatch` siblings (the non-manual is safe to apply, but the manual instructions still need a human). In both cases the agent should ask the user before `--apply` and surface any manual instructions verbatim.
- `escalate_to_human` — no machine-applicable patch. Either every patch is `ManualPatch`, or `patches` is empty/absent and the check requires human review.
- `suppress_with_reason` — reserved for future check classes that explicitly mark themselves as suppressible. Not emitted by the v0.12 deterministic projection; the schema accepts it so callers can extend.
- `informational` — no action required (suppressed finding or non-actionable advisory).
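A minimal dispatch on `agent_action` might look like the sketch below. Routing unknown or reserved values to escalation is a conservative assumption, and the returned strings are illustrative wording rather than CLI output.

```python
def next_step(finding: dict) -> str:
    """Translate a finding's agent_action into a next step for the agent."""
    action = finding.get("agent_action")
    if action == "auto_apply":
        return "run apply-patches --confidence high --apply"
    if action == "propose_patch_for_review":
        return "show the patches and manual instructions, then ask before --apply"
    if action == "informational":
        return "no action required"
    # escalate_to_human, the reserved suppress_with_reason, and any
    # unknown future value fall through (assumption: escalate is safest).
    return "escalate to a human reviewer"
```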
Top-level `agent_summary` block (v0.12+), one-fetch summary shaped for direct agent consumption — read this when you want the headline numbers without traversing arrays:
- `verdict` — mirrors `release_decision.decision`.
- `headline` — single-sentence verdict + counts; suitable for a PR comment lead. The headline uses `needs_human_review` (action-driven) for "require human review" wording, so a `review_required` verdict with only auto-applicable findings reads honestly as "auto-applicable; none require human input" rather than falsely claiming N findings need review.
- `blocker_count` — mirrors `len(release_decision.blockers)`.
- `review_item_count` — mirrors `len(release_decision.review_items)`; **severity-driven** (medium-and-up severity findings that aren't blockers, plus baseline-matched accepted debt). Use this when reporting release-review debt to the human reviewer.
- `auto_appliable_patches` — number of active findings with `agent_action == "auto_apply"`.
- `needs_human_review` — **action-driven**: number of active findings with `agent_action ∈ {"escalate_to_human", "propose_patch_for_review"}`. Both kinds need explicit human attention before any change applies — full escalations have no machine path, and proposed patches ship at medium/low confidence and require an explicit `--apply` after the user confirms. Use this when reasoning about what work an agent must do.
- **`review_item_count` and `needs_human_review` track different populations and can diverge.** A medium-severity stale-suppression finding lands in `release_decision.review_items` (severity rule) but its `agent_action` is `auto_apply` (high-confidence patch attached), so it's counted in `review_item_count` and `auto_appliable_patches` but **not** in `needs_human_review`.
- `first_recommended_action` — `{kind, command|null, why}`; deterministic next step. `kind: "command"` carries an actual CLI invocation; `kind: "info"` is a "surface this to the user" hint with no command. The agent_summary block is a deterministic projection — same inputs, same output, no agent-side aggregation needed.
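The two action-driven counters can be recomputed from `findings[]` to see how they are defined. Consumers should prefer the values already in `agent_summary`; the function name here is hypothetical.

```python
HUMAN_ACTIONS = {"escalate_to_human", "propose_patch_for_review"}


def recount_actions(report: dict) -> dict:
    """Recompute the action-driven counters from active findings."""
    active = [f for f in report.get("findings", []) if not f.get("suppressed")]
    return {
        "auto_appliable_patches": sum(
            f.get("agent_action") == "auto_apply" for f in active
        ),
        "needs_human_review": sum(
            f.get("agent_action") in HUMAN_ACTIONS for f in active
        ),
    }
```

A medium-severity finding with `agent_action == "auto_apply"` lands in the first counter only, which is why `needs_human_review` can be lower than `review_item_count`.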
Codex plugin surface block (v0.13+), explanatory only — never a release-gate
input by itself:
- `codex_plugin_surface.{plugins, marketplaces, skills, apps, mcp_server_stubs, hook_stubs, mcp_inventory_files, component_path_issues, warnings}` — local static plugin package and marketplace facts.
- Only explicit MCP inventory tools from `codex_plugins.mcp_tool_inventories` appear in `tool_inventory[]`; apps, hooks, skills, and MCP server declarations stay in `codex_plugin_surface`.
Per-finding `provenance_kind` enum (v0.15+), additive classification — read this when you want to filter findings by the kind of rule that fired, independent of `confidence` (sureness):
- `static_declaration` — declared metadata: manifest, MCP export, OpenAPI schema, ADK YAML agent config, LangChain/CrewAI inventory JSON. High-trust structural facts.
- `ast_extraction` — Tool parsed from user Python source by a framework extractor (LangChain function/structured tools, CrewAI function/class tools, ADK Python toolsets). Subject to extraction errors; agents that distrust AST quality may filter these as a class.
- `keyword_heuristic` — matched a keyword list (broad-scope tokens, read-only/approval prompt terms, free-text parameter names). Higher false-positive risk than declarative facts.
- `regex_heuristic` — matched a regex (secret-like values in descriptions, prompt-injection patterns). Highest false-positive risk; pair with the recommendation before acting.
- `policy_pack` — emitted by an external policy pack rule. The rule's own confidence applies — Shipgate does not second-guess the pack.
Provenance generally follows the rule's own trigger (e.g., a rule that checks for a declared manifest field is `static_declaration` even when the underlying Tool was AST-extracted). For framework checks that fire across both AST and declarative tool sources (ADK's per-tool checks against `google_adk_function` AND `google_adk_config` tools), the label tracks the underlying tool's source. Third-party plugin checks that don't yet set the field land at `static_declaration` by default — pre-v0.15 plugins continue to validate against the v0.15 wire schema. Use `findings[].source.type` for the precise underlying tool source.
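Filtering by `provenance_kind` can be sketched as a simple grouping. The `unlabeled` bucket for reports that lack the field is an assumption of this sketch, and the function name is hypothetical.

```python
from collections import defaultdict

HEURISTIC_KINDS = {"keyword_heuristic", "regex_heuristic"}


def group_by_provenance(findings: list[dict]) -> dict[str, list[str]]:
    """Group check ids by provenance_kind, preserving report order."""
    groups = defaultdict(list)
    for finding in findings:
        # Assumption: findings without the field go to "unlabeled".
        kind = finding.get("provenance_kind", "unlabeled")
        groups[kind].append(finding["check_id"])
    return dict(groups)
```

An agent that wants extra caution could route every group whose key is in `HEURISTIC_KINDS` to human review regardless of `confidence`.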
For reviewer-shaped output, also read the **Release Evidence Packet** at `agents-shipgate-reports/packet.{md,json,html}` (and `packet.pdf` when the `[pdf]` extras are installed). The packet has fixed reviewer sections governed by [`docs/packet-schema.v0.5.json`](packet-schema.v0.5.json) — see [STABILITY.md §Release Evidence Packet](../STABILITY.md#release-evidence-packet-v05).
Packet schema `0.5` preserves the v0.4 HITL fields
(`human_in_the_loop.runtime_control_disclaimer` and
`human_in_the_loop.source_provenance[]`) and adds
`action_surface_diff` so packet-only consumers can see release-blocking action
changes. The `release_decision.verdict` label includes
`INSUFFICIENT EVIDENCE` when the report decision is insufficient evidence.
## Don't use for new gating
- `summary.status` — preserved for v0.7 callers, **baseline-blind**. A baseline-matched critical flips this to `release_blockers_detected` even though `release_decision.decision` correctly classifies it as `review_required`. New consumers should not gate on `summary.status`. See [STABILITY.md §`release_decision.decision` vs `summary.status`](../STABILITY.md#release_decisiondecision-vs-summarystatus).
## Per-finding contextual explanation (v0.12+)
For prose summaries of a single finding (PR comments, chat replies, commit messages), use:
```bash
agents-shipgate explain-finding \
--from agents-shipgate-reports/report.json --json
```
The payload is the full `Finding` shape (every field on `findings[]` in `report.json`, including `source`, `patches`, `confidence`, `agent_id`, etc.) overlaid with three derived fields:
- `metadata` — full `CheckMetadata` for the check_id (rationale, fires_when, evidence_fields, docs_url) when the check is in the catalog; null for unknown ids (third-party plugins, future checks).
- `explanation` — a deterministic 3–5 sentence prose summary suitable for direct quotation. Names the affected tool, the severity, the recommended fix, and an action-aware closing sentence keyed to `agent_action`. Same inputs always produce the same output.
- `source_report` — **absolute** path (always; relative `--from` values are resolved before serialization) to the report file the explanation was sourced from; round-trippable for caching and audit.
`explain-finding` requires `report_schema_version >= 0.12` because the action-aware explanation depends on per-finding `agent_action`. Pre-v0.12 reports are rejected with `input_parse_error` and a `next_action` pointing at the canonical scan command. The Pydantic `ReadinessReport` model is intentionally looser than this command's contract (so test fixtures can construct minimal findings); the version gate is what enforces v0.12 semantics on emitted reports.
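When pre-checking the version gate on the agent side, compare version components numerically; a float or string comparison ranks `0.9` above `0.12`. The sketch assumes `report_schema_version` is a dotted string of integers, and the function name is hypothetical.

```python
def schema_at_least(version: str, minimum: str = "0.12") -> bool:
    """Compare dotted versions component by component as integers."""
    def parse(value: str) -> tuple[int, ...]:
        return tuple(int(part) for part in value.split("."))

    return parse(version) >= parse(minimum)
```

For example, `schema_at_least("0.9")` is false even though `float("0.9") > float("0.12")`.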
Companion prompt: [`prompts/explain-finding-to-user.md`](../prompts/explain-finding-to-user.md). Use it when you need to translate a finding for a human who has never read the Shipgate docs. Keep `agents-shipgate explain ` for static catalog metadata (no specific finding); use `explain-finding` whenever you have a fingerprint and want the evidence-tied prose.
## Authoritative references
- [STABILITY.md](../STABILITY.md) — full 0.x stability contract. Source of truth for everything above.
- [AGENTS.md](../AGENTS.md) — agent-facing instructions: install, run, single-turn flow, error semantics.
- [`docs/report-schema.v0.16.json`](report-schema.v0.16.json) — machine-validatable JSON Schema for the current report.
- [`docs/packet-schema.v0.5.json`](packet-schema.v0.5.json) — machine-validatable JSON Schema for the current packet.
- [`docs/checks.json`](checks.json) — check catalog.
## See also
- [`report-reading-for-agents.md`](report-reading-for-agents.md) — reader's primer that walks the JSON in the order a new consumer should read it; complements this field index.
- [`agent-autofix-boundary.md`](agent-autofix-boundary.md) — what an agent may assert mechanically vs. what must defer to a human reviewer when surfacing findings from `report.json`.
# Check Catalog
Agents Shipgate checks are deterministic static checks. They do not certify safety, run agents, call tools, call LLMs, or verify runtime routing.
## Severity Contract
- `critical`: strict CI exits `20` unless the finding is explicitly suppressed with a reason.
- `high`: requires human review but does not fail CI by default.
- `medium`: review during release hardening.
- `low` and `info`: informational.
Only unsuppressed `critical` findings block strict mode. Suppressed findings remain in JSON with `suppressed: true` and are excluded from active severity counts.
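The severity rule for strict mode reduces to a short predicate. This sketch deliberately ignores baselines and explicit `blocks_release` policies, which are described elsewhere in this reference; the function name is hypothetical.

```python
def strict_exit_code(findings: list[dict]) -> int:
    """Return 20 when any unsuppressed critical finding is present, else 0."""
    for finding in findings:
        if finding.get("severity") == "critical" and not finding.get("suppressed"):
            return 20
    return 0
```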
## Evidence Coverage
- `static`: all enumerated tools came from high-confidence static sources.
- `mixed`: at least one enumerated tool came from lower-confidence enrichment, such as SDK AST extraction.
Suppressions do not change evidence coverage.
## Baselines
v0.2 adds local baseline gating. `agents-shipgate baseline save` writes active,
unsuppressed findings to `.agents-shipgate/baseline.json`. A later
`agents-shipgate scan --baseline .agents-shipgate/baseline.json --ci-mode strict`
marks findings as `matched` or `new` and fails only on new findings that match
the active fail policy. Resolved baseline findings are counted in the report
baseline summary and do not fail CI.
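Baseline matching amounts to set arithmetic over stable finding identifiers. The sketch assumes each finding has such an identifier (for example a fingerprint string); the function name and the identifiers in the usage below are hypothetical.

```python
def classify_against_baseline(baseline_ids: set[str],
                              current_ids: set[str]) -> dict[str, list[str]]:
    """Split finding identifiers into matched, new, and resolved."""
    return {
        "matched": sorted(current_ids & baseline_ids),   # accepted debt
        "new": sorted(current_ids - baseline_ids),       # candidates to fail CI
        "resolved": sorted(baseline_ids - current_ids),  # counted, never fails
    }
```

Only the `new` group is then checked against the active fail policy.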
## Checks
| Check ID | Severity | Meaning |
| --- | --- | --- |
| `SHIP-INVENTORY-NOT-ENUMERABLE` | high | No tool surface could be enumerated from the manifest inputs. |
| `SHIP-INVENTORY-WILDCARD-TOOLS` | high | A source exposes wildcard/all tools instead of an explicit allowlist. |
| `SHIP-INVENTORY-TOOL-SURFACE-TOO-LARGE` | medium | The normalized tool count exceeds the MVP review threshold. |
| `SHIP-DOC-MISSING-DESCRIPTION` | medium | A tool has no description or a description too short for reliable review. |
| `SHIP-DOC-INJECTION-RISK` | medium/high | A tool description contains instruction-override style language. High only when multiple patterns match on a write/high-risk tool. |
| `SHIP-DOC-SECRET-IN-DESCRIPTION` | medium/high | A tool description contains a secret-like token or credential value. High only when multiple patterns match on a write/high-risk tool. |
| `SHIP-SCHEMA-BROAD-FREE-TEXT` | high | A write/action-like tool accepts broad `action`, `body`, `command`, `updates`, or similar free-form input. |
| `SHIP-SCHEMA-MISSING-BOUNDS` | high | A risky numeric parameter such as `amount`, `count`, or `quantity` lacks a maximum. |
| `SHIP-SCHEMA-FREEFORM-OUTPUT` | medium | A tool returns free-form string output that may later be placed in model context. |
| `SHIP-AUTH-MISSING-SCOPE` | high | A write-like tool has no declared auth scope metadata. |
| `SHIP-AUTH-MANIFEST-BROAD-SCOPE` | high | The manifest declares broad scopes such as `*`, `admin`, or `service:*`. |
| `SHIP-AUTH-TOOL-BROAD-SCOPE` | high | A tool declares broad scopes such as `*`, `admin`, or `service:*`. |
| `SHIP-AUTH-SCOPE-COVERAGE-MISSING` | high | A tool requires scopes that are not covered by `permissions.scopes`. |
| `SHIP-SCOPE-TOOL-OUTSIDE-PURPOSE` | high | A write-capable tool contradicts a read-only declared purpose. |
| `SHIP-SCOPE-PROHIBITED-TOOL-PRESENT` | high | A tool appears to overlap with a manifest `prohibited_actions` entry. |
| `SHIP-POLICY-APPROVAL-MISSING` | critical | A high-risk tool lacks a manifest approval policy. |
| `SHIP-POLICY-CONFIRMATION-MISSING` | high | A destructive, external-write, or customer-communication tool lacks a confirmation policy. |
| `SHIP-ACTION-UNDECLARED` | high | A loaded tool lacks explicit action-surface metadata when explicit actions are required. |
| `SHIP-ACTION-POLICY-VIOLATION` | high | A user-declared action-surface policy requirement is not satisfied. |
| `SHIP-ACTION-FINANCIAL-WRITE-CONTROL-MISSING` | critical | A newly added financial write action lacks approval, audit, or idempotency controls. |
| `SHIP-ACTION-DESTRUCTIVE-ROLLBACK-MISSING` | critical | A newly added destructive action lacks approval or rollback controls. |
| `SHIP-ACTION-EXTERNAL-COMMUNICATION-AUDIT-MISSING` | high | A newly added external communication action lacks audit evidence. |
| `SHIP-ACTION-WILDCARD-SCOPE` | critical | An action declares or expands into a wildcard/admin-like scope. |
| `SHIP-ACTION-EFFECT-ESCALATED` | critical | An action effect escalated compared with the base surface. |
| `SHIP-ACTION-EFFECT-DOWNGRADE-DECLARED` | high | An action declaration weakens the effect inferred from the loaded tool surface. |
| `SHIP-ACTION-CONTROL-DOWNGRADE` | high | An action declaration weakens an inherited approval or safeguard control. |
| `SHIP-ACTION-APPROVAL-REMOVED` | critical | An existing action approval policy was removed. |
| `SHIP-ACTION-SAFEGUARD-REMOVED` | high | An existing action safeguard was removed. |
| `SHIP-EVIDENCE-APPROVAL-TRACE-MISSING` | high | Local HITL approval trace evidence is missing or incomplete for an approval-required tool. |
| `SHIP-EVIDENCE-OVERRIDE-REASON-MISSING` | high | Local HITL override reason evidence is missing or incomplete. |
| `SHIP-EVIDENCE-HIGH-RISK-EXCLUSION-MISSING` | high | Local high-risk auto-approval exclusion evidence is missing or incomplete. |
| `SHIP-EVIDENCE-HITL-PROMOTION-CRITERIA-MISSING` | high | Local HITL promotion criteria evidence is missing or incomplete. |
| `SHIP-SIDEFX-IDEMPOTENCY-MISSING` | critical/high | A risky write tool lacks idempotency evidence. Critical only when retry behavior is known. |
| `SHIP-API-FUNCTION-SCHEMA-STRICTNESS` | high/medium | An OpenAI API function schema is missing strictness, required fields, or bounded risky fields. |
| `SHIP-API-STRUCTURED-OUTPUT-READINESS` | high/medium | An OpenAI API response format is missing or too broad for downstream decisions. |
| `SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH` | high/medium | Prompt language contradicts the enabled OpenAI API tool surface or lacks approval/confirmation instructions. |
| `SHIP-API-RETRY-POLICY-MISSING` | medium | High-risk OpenAI API tools are enabled without retry policy metadata. |
| `SHIP-API-TIMEOUT-MISSING` | medium | High-risk OpenAI API tools are enabled without timeout metadata. |
| `SHIP-API-TEST-CASES-MISSING` | medium | High-risk OpenAI API tools are enabled without declared test cases. |
| `SHIP-API-TOOL-OUTPUT-SCHEMA-MISSING` | medium | A high-risk OpenAI API tool lacks success/failure output modeling. |
| `SHIP-API-RETRY-WITHOUT-IDEMPOTENCY` | high | A risky OpenAI API write tool may be retried without idempotency evidence. |
| `SHIP-API-TRACE-APPROVAL-MISSING` | medium | A trace sample shows a policy-controlled tool call without approval. |
| `SHIP-API-TRACE-CONFIRMATION-MISSING` | medium | A trace sample shows a policy-controlled tool call without confirmation. |
| `SHIP-API-OPERATIONAL-READINESS` | medium | Deprecated v0.3 compatibility alias for the v0.4 atomic OpenAI API operational readiness checks. |
| `SHIP-ADK-DYNAMIC-TOOLSET-NOT-ENUMERABLE` | high | A Google ADK toolset cannot be statically enumerated and no explicit inventory is declared. |
| `SHIP-ADK-MCP-TOOLSET-UNFILTERED` | high/medium | A Google ADK `McpToolset` has no static `tool_filter`. |
| `SHIP-ADK-FUNCTION-TOOL-METADATA-MISSING` | medium | A Google ADK function/config tool lacks static description or parameter metadata. |
| `SHIP-ADK-LONGRUNNING-CONTRACT-MISSING` | high | A Google ADK long-running tool lacks operation-id and status/progress contract evidence. |
| `SHIP-ADK-GUARDRAIL-EVIDENCE-MISSING` | high | High-risk Google ADK tools lack callback/plugin or policy guardrail evidence. |
| `SHIP-ADK-EVAL-COVERAGE-MISSING` | medium | Production-like Google ADK inputs are present without declared eval files. |
| `SHIP-LANGCHAIN-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE` | high | A LangChain/LangGraph tool surface cannot be statically enumerated and no explicit inventory is declared. |
| `SHIP-LANGCHAIN-FUNCTION-TOOL-METADATA-MISSING` | medium | A LangChain/LangGraph function tool lacks static description or parameter metadata. |
| `SHIP-CREWAI-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE` | high | A CrewAI tool surface cannot be statically enumerated and no explicit inventory is declared. |
| `SHIP-CREWAI-FUNCTION-TOOL-METADATA-MISSING` | medium | A CrewAI function/class tool lacks static description or parameter metadata. |
| `SHIP-CODEX-PLUGIN-METADATA-MISSING` | medium | A Codex plugin package has incomplete or ambiguous identity metadata. |
| `SHIP-CODEX-PLUGIN-COMPONENT-PATH-MISSING` | high | A declared Codex plugin component path is missing or outside the package/workspace. |
| `SHIP-CODEX-PLUGIN-MARKETPLACE-POLICY-MISSING` | medium | A Codex plugin marketplace entry lacks installation/authentication policy metadata. |
| `SHIP-CODEX-PLUGIN-MCP-SERVER-NOT-ENUMERABLE` | high | A Codex plugin MCP server is declared without a local enumerable tool inventory. |
| `SHIP-CODEX-PLUGIN-APP-SURFACE-NOT-ENUMERABLE` | medium | A Codex plugin connector app surface is not statically enumerable from local metadata. |
| `SHIP-CODEX-PLUGIN-SKILL-METADATA-MISSING` | medium | A Codex plugin skill lacks unique name/description frontmatter. |
| `SHIP-N8N-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE` | high | An n8n tool surface uses runtime, unresolved, wildcard, or uninventoried custom exposure. |
| `SHIP-N8N-MCP-CLIENT-TOOLSET-UNFILTERED` | high/medium | An n8n MCP Client Tool exposes `All` or `All Except` tools without an explicit inventory. |
| `SHIP-N8N-AI-TOOL-METADATA-MISSING` | medium | An n8n AI-exposed tool lacks static description or parameter metadata. |
| `SHIP-N8N-CREDENTIAL-EVIDENCE-MISSING` | high | Production-like n8n workflows reference credentials without declared credential stubs. |
| `SHIP-N8N-EVAL-COVERAGE-MISSING` | medium | Production-like n8n workflows are present without declared eval files. |
| `SHIP-N8N-SECRET-IN-WORKFLOW-PARAMETER` | high | n8n workflow JSON contains a secret-like value; evidence is redacted. |
| `SHIP-MANIFEST-STALE-SUPPRESSION` | medium | A suppression references a missing check ID or missing tool. |
| `SHIP-MANIFEST-STALE-POLICY` | medium | An approval, confirmation, or idempotency policy references a missing tool. |
| `SHIP-MANIFEST-STALE-RISK-OVERRIDE` | medium | A risk override references a missing tool. |
| `SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING` | high | A high-risk production or production-like tool lacks owner metadata. |
| `SHIP-MANIFEST-UNUSED-SCOPE` | medium/high | `permissions.scopes` contains a scope unused by any loaded tool; broad unused scopes are high. |
## Check Details
### SHIP-INVENTORY-NOT-ENUMERABLE
The scanner could not enumerate any tools from required manifest inputs. Add a local MCP JSON or OpenAPI source before relying on the report.
### SHIP-INVENTORY-WILDCARD-TOOLS
A source exposes wildcard or all-tools access. Replace it with an explicit allowlist so review can reason about the actual release surface.
### SHIP-INVENTORY-TOOL-SURFACE-TOO-LARGE
The normalized tool count exceeds the MVP review threshold. Split or reduce the surface when the report becomes too broad to review.
### SHIP-INVENTORY-LOW-CONFIDENCE-PRODUCTION-SURFACE
A production target depends on lower-confidence extraction, such as SDK AST enrichment. Declare the tools through manifest, MCP, or OpenAPI inputs.
### SHIP-DOC-MISSING-DESCRIPTION
A tool has no description or a description too short for reliable review. Add a concise capability description.
### SHIP-DOC-INJECTION-RISK
A tool description contains instruction-override-like language. Rewrite it as neutral metadata.
Purely heuristic matches default to `medium`; multiple matches on write/high-risk tools are `high`.
### SHIP-DOC-SECRET-IN-DESCRIPTION
A tool description contains a secret-like token or credential value. Remove it and rotate the exposed secret.
Purely heuristic matches default to `medium`; multiple matches on write/high-risk tools are `high`.
### SHIP-SCHEMA-BROAD-FREE-TEXT
A write/action-like tool accepts broad free-form input. Constrain the field with structured schema or enums.
### SHIP-SCHEMA-MISSING-BOUNDS
A risky numeric parameter lacks a maximum. Add a maximum or equivalent policy limit.
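To make the condition concrete, here is a minimal sketch that walks a JSON Schema `properties` map and reports numeric fields with no upper bound. It is an illustration, not Shipgate's implementation: the function name `unbounded_numeric_fields` and the sample schema are hypothetical, and the sketch ignores the "risky parameter" filtering that the real check applies.

```python
def unbounded_numeric_fields(schema: dict) -> list[str]:
    """Return names of numeric properties that declare no upper bound."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        is_numeric = spec.get("type") in ("integer", "number")
        # Assumption: either standard JSON Schema keyword counts as a bound.
        has_upper_bound = "maximum" in spec or "exclusiveMaximum" in spec
        if is_numeric and not has_upper_bound:
            missing.append(name)
    return missing


# Hypothetical refund tool schema: amount_cents has a minimum but no maximum.
refund_schema = {
    "type": "object",
    "properties": {
        "amount_cents": {"type": "integer", "minimum": 1},
        "retries": {"type": "integer", "maximum": 3},
        "reason": {"type": "string"},
    },
}
print(unbounded_numeric_fields(refund_schema))  # ['amount_cents']
```

Adding `"maximum"` to `amount_cents` would empty the list in this sketch.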
### SHIP-SCHEMA-FREEFORM-OUTPUT
A tool returns free-form string output that may later be placed in model context. Prefer structured output for model-consumed tool results.
### SHIP-AUTH-MISSING-SCOPE
A write or sensitive-data tool has no auth scope metadata. Declare scopes in OpenAPI, MCP, or manifest metadata.
### SHIP-AUTH-MANIFEST-BROAD-SCOPE
The manifest declares broad permission scopes such as wildcard or admin scopes. Replace them with operation-specific scopes.
### SHIP-AUTH-TOOL-BROAD-SCOPE
A tool declares broad auth scopes. Use narrower tool scopes where possible.
### SHIP-AUTH-SCOPE-COVERAGE-MISSING
A tool requires scopes that are not covered by `permissions.scopes`. Reconcile the manifest with the tool requirements.
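The comparison behind this finding is a set difference between what each tool requires and what the manifest declares. The sketch below shows that idea only. The function name, the tool names, and the scope strings are hypothetical, and the inputs are plain Python values rather than anything Shipgate actually loads.

```python
def uncovered_scopes(
    tool_scopes: dict[str, set[str]], manifest_scopes: set[str]
) -> dict[str, set[str]]:
    """Map each tool to the scopes it requires that the manifest does not declare."""
    gaps = {}
    for tool, required in tool_scopes.items():
        missing = required - manifest_scopes
        if missing:
            gaps[tool] = missing
    return gaps


# Hypothetical inputs: one tool needs a scope the manifest never declares.
tool_scopes = {
    "customer.read": {"customers:read"},
    "refund.create": {"payments:write", "customers:read"},
}
manifest_scopes = {"customers:read"}
print(uncovered_scopes(tool_scopes, manifest_scopes))  # {'refund.create': {'payments:write'}}
```

Reconciling then means either adding `payments:write` to `permissions.scopes` or narrowing the tool.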
### SHIP-SCOPE-TOOL-OUTSIDE-PURPOSE
A write-capable tool contradicts a read-only declared purpose. Remove the tool or update the declared release scope.
### SHIP-SCOPE-PROHIBITED-TOOL-PRESENT
A tool appears to overlap with a manifest `prohibited_actions` entry. Remove or narrow the tool, or revise policy/scope text.
### SHIP-POLICY-APPROVAL-MISSING
A high-risk tool lacks a declared approval policy. Add an approval policy or remove the tool from the release.
### SHIP-POLICY-CONFIRMATION-MISSING
A destructive, external-write, or customer-communication tool lacks a confirmation policy. Add confirmation policy or remove the tool.
### SHIP-ACTION-UNDECLARED
`action_surface.require_explicit_actions` is true, but a loaded tool has no
matching `action_surface.actions[]` declaration. Add action metadata for the
tool or disable the explicit-action requirement.
### SHIP-ACTION-POLICY-VIOLATION
A user-declared `action_surface.policies[]` rule matched an action, and one or
more required dot-path values were absent or different. Satisfy the policy
requirements or narrow/remove the action.
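A dot-path requirement such as `approval.required` can be read as a lookup into nested action metadata. The following is a minimal sketch of that lookup and comparison. The function names, the `requirements` mapping, and the sample action are assumptions made for illustration; the actual `action_surface.policies[]` schema is not reproduced here.

```python
_MISSING = object()  # sentinel so a stored None is not confused with "absent"


def resolve_dot_path(data: dict, path: str):
    """Follow a dot-separated path through nested dicts, or return _MISSING."""
    current = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return _MISSING
        current = current[key]
    return current


def policy_violations(action: dict, requirements: dict) -> list[str]:
    """List every required dot-path value that is absent or different."""
    problems = []
    for path, expected in requirements.items():
        actual = resolve_dot_path(action, path)
        if actual is _MISSING:
            problems.append(f"{path}: absent")
        elif actual != expected:
            problems.append(f"{path}: expected {expected!r}, found {actual!r}")
    return problems


# Hypothetical action metadata and policy requirements.
action = {"approval": {"required": False}, "safeguards": {"audit_log": True}}
requirements = {
    "approval.required": True,
    "safeguards.audit_log": True,
    "safeguards.idempotency": True,
}
for problem in policy_violations(action, requirements):
    print(problem)
```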
### SHIP-ACTION-FINANCIAL-WRITE-CONTROL-MISSING
A newly added action is classified as `financial_write` and is missing
`approval.required`, `safeguards.audit_log`, or `safeguards.idempotency`.
Declare the required controls before releasing the action.
### SHIP-ACTION-DESTRUCTIVE-ROLLBACK-MISSING
A newly added destructive action is missing `approval.required` or
`safeguards.rollback`. Declare the approval and rollback controls, or remove
the destructive action from the release surface.
### SHIP-ACTION-EXTERNAL-COMMUNICATION-AUDIT-MISSING
A newly added external communication action lacks `safeguards.audit_log`.
Declare audit evidence so reviewers can trace outbound side effects.
### SHIP-ACTION-WILDCARD-SCOPE
An added action declares a broad scope, or a modified action expands into a
broad scope such as wildcard/admin access. Replace it with operation-specific
scopes.
### SHIP-ACTION-EFFECT-ESCALATED
An action changed to a higher-risk effect, such as read to write or write to
destructive. Add reviewer approval for the escalation or reduce the effect.
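Escalation can be pictured as a comparison on an ordered list of effects. The sketch below assumes a three-step ordering (`read` < `write` < `destructive`) taken from the examples in the paragraph above. The real effect vocabulary may be larger, and the names `EFFECT_RANK` and `effect_change` are hypothetical.

```python
# Assumed ordering, lowest risk first; not the full Shipgate effect vocabulary.
EFFECT_RANK = {"read": 0, "write": 1, "destructive": 2}


def effect_change(base_effect: str, current_effect: str) -> str:
    """Classify a change between two effects as escalated, reduced, or unchanged."""
    delta = EFFECT_RANK[current_effect] - EFFECT_RANK[base_effect]
    if delta > 0:
        return "escalated"
    if delta < 0:
        return "reduced"
    return "unchanged"


print(effect_change("read", "write"))         # escalated
print(effect_change("write", "destructive"))  # escalated
print(effect_change("write", "read"))         # reduced
```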
### SHIP-ACTION-EFFECT-DOWNGRADE-DECLARED
An `action_surface.actions[]` declaration sets a lower-risk effect than
Shipgate inferred from the loaded tool metadata. Align the declared effect
with the inferred operation or remove the weaker declaration.
### SHIP-ACTION-CONTROL-DOWNGRADE
An `action_surface.actions[]` declaration sets an inherited approval or
safeguard control from `true` to `false`. Keep the inherited control enabled
or remove the weakening declaration.
### SHIP-ACTION-APPROVAL-REMOVED
The base action required approval, but the current action no longer does.
Restore `approval.required` or document a reviewed override.
### SHIP-ACTION-SAFEGUARD-REMOVED
An existing action lost a safeguard such as audit logging, idempotency,
rollback, or dry-run support. Restore the safeguard or document a reviewed
override.
### SHIP-EVIDENCE-APPROVAL-TRACE-MISSING
`validation.required_evidence.approval_trace_required` is true, but local
validation evidence does not show `approved: true` for an approval-required
tool. Add local approval trace evidence produced by runtime middleware or
change the declared review posture. Agents Shipgate reads this evidence; it
does not produce or certify it. Missing local evidence does not prove the
runtime approval control is absent.
### SHIP-EVIDENCE-OVERRIDE-REASON-MISSING
`validation.required_evidence.override_reason_required` is true, but override
logs are absent, empty, or include normalized `override`, `bypass`, or
`auto_approve` events without a non-empty `reason`. Record reviewer-visible
reasons in the local override log. Missing local evidence does not prove the
runtime override control is absent.
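The rule above can be sketched as a scan over log events. This is an illustrative sketch only: the on-disk override log format is not shown in this document, so the `event` key and the list-of-dicts shape are assumptions, as is the function name `override_log_problems`.

```python
# Event kinds that need a reviewer-visible reason, per the description above.
REASON_REQUIRED_EVENTS = {"override", "bypass", "auto_approve"}


def override_log_problems(events: list[dict]) -> list[str]:
    """Report an empty log, or events of a reviewed kind with no usable reason."""
    if not events:
        return ["override log is absent or empty"]
    problems = []
    for index, event in enumerate(events):
        kind = event.get("event")  # assumed key name
        reason = event.get("reason")
        has_reason = isinstance(reason, str) and reason.strip() != ""
        if kind in REASON_REQUIRED_EVENTS and not has_reason:
            problems.append(f"event {index} ({kind}) has no reason")
    return problems


# Hypothetical log: the bypass entry carries a whitespace-only reason.
sample_log = [
    {"event": "override", "reason": "Approved by on-call lead"},
    {"event": "bypass", "reason": "   "},
    {"event": "tool_call"},
]
print(override_log_problems(sample_log))  # ['event 1 (bypass) has no reason']
```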
### SHIP-EVIDENCE-HIGH-RISK-EXCLUSION-MISSING
`validation.required_evidence.high_risk_auto_approval_exclusion_required` is
true, and a high-risk tool with declared approval policy is not listed under
`high_risk_auto_approval_exclusions`. This is separate from
`SHIP-POLICY-APPROVAL-MISSING`: it only fires after approval policy is already
declared, because it checks the local evidence that the tool is excluded from
auto-approval review posture. Missing local evidence does not prove the
runtime exclusion control is absent.
### SHIP-EVIDENCE-HITL-PROMOTION-CRITERIA-MISSING
`validation.target_review_posture` is `limited_auto_approval`, but local
promotion criteria evidence is missing or the canonical required-evidence
flags are not true in the manifest and criteria file. Finding evidence includes
`reason: file_missing` or `reason: flags_missing` so reviewers can distinguish
an absent local source from incomplete criteria. Missing local evidence does
not prove runtime controls are absent.
### SHIP-SIDEFX-IDEMPOTENCY-MISSING
A risky write tool lacks idempotency evidence. Add an idempotency key, idempotent annotation, or declared idempotency policy.
### SHIP-API-FUNCTION-SCHEMA-STRICTNESS
An OpenAI API function schema is not strict enough for reliable tool calls. The check flags missing `strict: true`, missing object parameters, `additionalProperties` not set to `false`, properties omitted from `required`, broad free-text action fields, and risky numeric fields without bounds or enums.
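Several of the listed conditions are mechanical and can be illustrated with a short sketch. The one below covers only the first four (strict flag, object parameters, `additionalProperties`, and `required` completeness) on a simplified, flat tool dictionary. The function name `strictness_issues` and the sample tool are hypothetical, and the sketch does not recurse into nested objects.

```python
def strictness_issues(tool: dict) -> list[str]:
    """Return human-readable strictness gaps for one function tool definition."""
    issues = []
    if tool.get("strict") is not True:
        issues.append("strict is not true")
    params = tool.get("parameters")
    if not isinstance(params, dict) or params.get("type") != "object":
        issues.append("parameters is not an object schema")
        return issues  # nothing further to inspect
    if params.get("additionalProperties") is not False:
        issues.append("additionalProperties is not false")
    required = set(params.get("required", []))
    for name in params.get("properties", {}):
        if name not in required:
            issues.append(f"property {name!r} omitted from required")
    return issues


# Hypothetical tool definition with three gaps.
loose_tool = {
    "name": "issue_refund",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}, "note": {"type": "string"}},
        "required": ["order_id"],
    },
}
for issue in strictness_issues(loose_tool):
    print(issue)
```

Setting `strict: true`, `additionalProperties: false`, and listing both properties in `required` clears all three gaps in this sketch.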
### SHIP-API-STRUCTURED-OUTPUT-READINESS
An OpenAI API response format is missing or under-specified. The check flags missing response schemas for high-risk API tools, broad response objects, decision/status fields without enums, missing `refusal` / `needs_review` / `error` modeling, and missing `downstream_critical_fields`.
### SHIP-API-PROMPT-TOOL-SCOPE-MISMATCH
Prompt files contradict the enabled API tool surface. The check flags prompts that say "advise only" or "read-only" while write/high-risk tools are enabled, and high-risk tools whose prompts do not mention approval and confirmation expectations.
### OpenAI API Operational Readiness Checks
v0.4 splits the former `SHIP-API-OPERATIONAL-READINESS` bundle into atomic
check IDs so suppressions, severity overrides, SARIF rules, and baselines can
target one missing contract at a time. The split checks use `model_config`,
`policy_rules`, simple test cases, and trace samples to flag missing retry
policy, missing timeouts, missing test cases, non-idempotent high-risk tools
with retry evidence, missing success/failure tool-output modeling, and trace
samples that show required approval or confirmation missing.
The old bundled check ID remains as a deprecated compatibility alias through at
least one minor release. v0.4 does not emit new findings with
`SHIP-API-OPERATIONAL-READINESS`, but existing suppressions, severity overrides,
baseline entries, `explain`, `list-checks`, and stale-suppression validation
continue to recognize it. New configs should use the specific v0.4 ID that
represents the condition.
### SHIP-API-OPERATIONAL-READINESS
Deprecated compatibility alias for the v0.3 OpenAI API operational readiness
bundle. Migrate suppressions, severity overrides, and baselines to the specific
v0.4 `SHIP-API-*` readiness checks when you touch the config.
### SHIP-API-RETRY-POLICY-MISSING
A high-risk OpenAI API tool flow runs without declared retry policy metadata.
Reviewers cannot reason about duplicate side effects when retry behavior is
unspecified. Declare `retry_policy` in `openai_api.policy_rules` or
`openai_api.model_config`.
### SHIP-API-TIMEOUT-MISSING
A high-risk OpenAI API tool flow runs without declared timeout metadata.
Without an explicit timeout, failure behavior and tool-call continuation
become ambiguous. Declare a tool-call timeout in policy rules or model
config.
### SHIP-API-TEST-CASES-MISSING
High-risk OpenAI API tools exist with no declared test cases. Tool-call flows
that approve refunds, send mail, or modify state should ship with simple test
cases as release evidence. Add cases under `openai_api.test_cases`.
### SHIP-API-TOOL-OUTPUT-SCHEMA-MISSING
A high-risk OpenAI API tool lacks declared success/failure output modeling.
Reviewers depend on `success_fields` and `failure_fields` to reason about
downstream failure handling. Declare them in policy rules.
### SHIP-API-RETRY-WITHOUT-IDEMPOTENCY
A retry policy is declared and a risky write tool lacks idempotency evidence.
Retries against non-idempotent writes can duplicate financial, destructive, or
external side effects. Either add idempotency evidence or remove the retry
policy for this tool.
### SHIP-API-TRACE-APPROVAL-MISSING
A trace sample shows a policy-controlled tool call with `approved: false` for
a tool that has approval policy evidence elsewhere in the manifest. Implement
the runtime approval gate; **do not edit the trace recording** to flip
`approved` — that patches the evidence, not the agent's behavior.
### SHIP-API-TRACE-CONFIRMATION-MISSING
A trace sample shows a policy-controlled tool call with `confirmed: false`
for a tool that has confirmation policy evidence. Implement the runtime
confirmation gate; **do not edit the trace recording** to flip `confirmed`
— same anti-pattern as the approval-missing finding above.
### SHIP-ADK-DYNAMIC-TOOLSET-NOT-ENUMERABLE
A Google ADK `OpenAPIToolset`, `McpToolset`, or dynamic tools expression could
not be enumerated statically. Provide explicit local OpenAPI, MCP, or ADK tool
inventory inputs before relying on the release report.
### SHIP-ADK-MCP-TOOLSET-UNFILTERED
An ADK `McpToolset` has no static `tool_filter`. Add a narrow filter and an
explicit inventory file so reviewers can see the intended runtime surface.
### SHIP-ADK-FUNCTION-TOOL-METADATA-MISSING
An ADK function or Agent Config tool reference lacks description or parameter
metadata. Add docstrings, type annotations, or explicit local inventory
metadata.
### SHIP-ADK-LONGRUNNING-CONTRACT-MISSING
An ADK `LongRunningFunctionTool` lacks static evidence for operation id and
status/progress fields. Google-style `name` plus `done`, `state`, `phase`,
`metadata`, or `result` fields count as contract evidence; tools may also carry
`annotations.long_running_contract: true` in explicit inventory metadata.
Document the handoff and completion contract before promotion.
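Read literally, the evidence rule is "an operation id (`name`) plus at least one status/progress field, or an explicit annotation". A minimal sketch of that reading follows. Where Shipgate actually looks for these fields is not specified here, so the `output_fields` and `annotations` parameters and the function name are assumptions.

```python
# Status/progress field names listed in the description above.
STATUS_FIELDS = {"done", "state", "phase", "metadata", "result"}


def has_long_running_contract(output_fields: set[str], annotations: dict) -> bool:
    """True when static metadata shows an operation id plus a status/progress field."""
    if annotations.get("long_running_contract") is True:
        return True  # explicit inventory annotation
    return "name" in output_fields and bool(STATUS_FIELDS & output_fields)


print(has_long_running_contract({"name", "done"}, {}))                  # True
print(has_long_running_contract({"name"}, {}))                          # False
print(has_long_running_contract(set(), {"long_running_contract": True}))  # True
```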
### SHIP-ADK-GUARDRAIL-EVIDENCE-MISSING
High-risk ADK tools are present without static callback/plugin or manifest
policy evidence. ADK callbacks and plugins count only as static evidence of
intent; they are not proof that runtime enforcement works.
### SHIP-ADK-EVAL-COVERAGE-MISSING
Google ADK inputs target `production_like` or `production` without declared eval
files. Add eval artifacts that cover expected responses and tool-use
trajectories.
### SHIP-LANGCHAIN-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE
A LangChain/LangGraph tool list, binding, or graph node could not be enumerated
statically. Provide an explicit local inventory when tools are produced by
factories, comprehensions, loop-built lists, unresolved imports, or other
runtime-only code. This ID uses `TOOL-SURFACE` instead of ADK's `TOOLSET`
because LangChain exposes ad hoc tool lists and model/graph bindings rather
than a consistent toolset abstraction.
### SHIP-LANGCHAIN-FUNCTION-TOOL-METADATA-MISSING
A LangChain/LangGraph `@tool` function or `StructuredTool.from_function(...)`
surface lacks a static description or parameter metadata. Add docstrings,
function annotations, or same-file Pydantic `args_schema` metadata.
### SHIP-CREWAI-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE
A CrewAI agent or crew tool surface could not be enumerated statically. Provide
an explicit local inventory when tools are produced by factories,
comprehensions, loop-built lists, unresolved imports, or other runtime-only
code. This ID uses `TOOL-SURFACE` instead of ADK's `TOOLSET` because CrewAI
agents bind ad hoc tool lists rather than a consistent toolset abstraction.
### SHIP-CREWAI-FUNCTION-TOOL-METADATA-MISSING
A CrewAI `@tool` function or `BaseTool` subclass lacks a static description or
parameter metadata. Add descriptions, `_run` annotations, or same-file Pydantic
`args_schema` metadata.
### SHIP-CODEX-PLUGIN-METADATA-MISSING
A Codex plugin package has incomplete or ambiguous identity metadata. Fill
`name`, `version`, and `description`; keep the plugin name aligned with the
package root; and avoid duplicate plugin names across scanned package roots.
### SHIP-CODEX-PLUGIN-COMPONENT-PATH-MISSING
A Codex plugin component path for skills, MCP servers, apps, or hooks could not
be loaded. Paths must resolve inside both the plugin package and the manifest
directory.
### SHIP-CODEX-PLUGIN-MARKETPLACE-POLICY-MISSING
A marketplace entry lacks `policy.installation`, `policy.authentication`, or
`category`. Add those fields so coding agents can see installation and
authentication posture before adoption.
### SHIP-CODEX-PLUGIN-MCP-SERVER-NOT-ENUMERABLE
A plugin declares an MCP server in `.mcp.json`, but Agents Shipgate does not
execute MCP commands to discover tools. Provide a local MCP tools inventory via
`codex_plugins.mcp_tool_inventories`.
### SHIP-CODEX-PLUGIN-APP-SURFACE-NOT-ENUMERABLE
A plugin declares a connector app in `.app.json`. Connector-backed capabilities
are externally mediated and are review items unless a local inventory or policy
artifact documents the effective surface.
### SHIP-CODEX-PLUGIN-SKILL-METADATA-MISSING
A `skills/**/SKILL.md` file is missing parseable `name` or `description`
frontmatter, or duplicates another skill name in the same plugin. Give every
skill a unique routing name and clear description.
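The two conditions (missing frontmatter fields, duplicate names) can be sketched with a deliberately small parser. This is not a YAML parser and not Shipgate's implementation: it assumes flat `key: value` frontmatter between `---` lines, and the function names and sample files are hypothetical.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat as unparseable


def skill_problems(skills: dict[str, str]) -> list[str]:
    """Report missing name/description and duplicate names within one plugin."""
    problems, seen = [], {}
    for path, text in skills.items():
        fields = parse_frontmatter(text)
        for key in ("name", "description"):
            if not fields.get(key):
                problems.append(f"{path}: missing {key}")
        name = fields.get("name")
        if name and name in seen:
            problems.append(f"{path}: duplicate name {name!r} (also in {seen[name]})")
        elif name:
            seen[name] = path
    return problems


# Hypothetical plugin contents.
skills = {
    "skills/triage/SKILL.md": "---\nname: triage\ndescription: Route incoming tickets.\n---\nBody",
    "skills/refund/SKILL.md": "---\nname: triage\n---\nBody",
    "skills/notes/SKILL.md": "No frontmatter here",
}
for problem in skill_problems(skills):
    print(problem)
```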
### SHIP-N8N-DYNAMIC-TOOL-SURFACE-NOT-ENUMERABLE
An n8n workflow uses a runtime expression in a tool name, an unresolved
Call-Workflow target, wildcard MCP Server/Client exposure, or an uninventoried
community/custom tool node. Provide a local n8n/MCP inventory or replace the
dynamic exposure with a static allowlist. This is high severity in every
environment because static release evidence cannot prove the actual tool
inventory.
### SHIP-N8N-MCP-CLIENT-TOOLSET-UNFILTERED
An n8n MCP Client Tool exposes `All` or `All Except` tools without a local
inventory. Select explicit MCP tools or provide a local MCP inventory for
release review. The severity is environment-sensitive because the selector is
easy to narrow before production, while production-like use increases blast
radius.
### SHIP-N8N-AI-TOOL-METADATA-MISSING
An n8n AI-exposed tool lacks a static description or parameter metadata. Add
tool descriptions, `$fromAI()` metadata, workflow input schemas, or explicit
inventory metadata.
### SHIP-N8N-CREDENTIAL-EVIDENCE-MISSING
Production-like n8n workflows reference credentials but no local credential
stubs are declared. Declare source-control credential stubs so reviewers can
see credential types without seeing secret values.
### SHIP-N8N-EVAL-COVERAGE-MISSING
n8n workflows target `production_like` or `production` without declared eval
files. Add eval artifacts that cover expected responses and tool-use
trajectories.
### SHIP-N8N-SECRET-IN-WORKFLOW-PARAMETER
An n8n workflow parameter, node note, `pinData` entry, or `staticData` entry
contains a secret-like value. Evidence includes only the source reference,
stable pointer, and secret kind; it never includes the matched secret value or
a verifier hash for that value.
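The redaction contract is easiest to see in code: the finding records where a secret-like value was found and what kind it is, and nothing derived from the value itself. The sketch below illustrates that shape. The two regular expressions, the evidence key names, and the function name are assumptions for illustration, and the pointer builder skips JSON Pointer escaping of `~` and `/`.

```python
import re

# Illustrative patterns only; the real detector's pattern set is not shown here.
SECRET_PATTERNS = {
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._\-]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redacted_evidence(node, source: str, pointer: str = "") -> list[dict]:
    """Walk parsed JSON and emit source, pointer, and kind for secret-like strings."""
    findings = []
    if isinstance(node, dict):
        for key, value in node.items():
            findings += redacted_evidence(value, source, f"{pointer}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            findings += redacted_evidence(value, source, f"{pointer}/{index}")
    elif isinstance(node, str):
        for kind, pattern in SECRET_PATTERNS.items():
            if pattern.search(node):
                # Deliberately no matched text and no hash of it.
                findings.append({"source": source, "pointer": pointer, "kind": kind})
    return findings


# Hypothetical workflow fragment with a placeholder token.
workflow = {"nodes": [{"parameters": {"headerValue": "Bearer " + "a" * 24}}]}
print(redacted_evidence(workflow, "workflow.json"))
```

The printed evidence points at `/nodes/0/parameters/headerValue` and names the kind, which is enough for a reviewer to locate and rotate the value.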
### SHIP-MANIFEST-STALE-SUPPRESSION
A suppression references an unknown check ID or a tool that is not loaded in the
current scan. Remove stale suppressions so reviewers can trust the suppression
list as current release intent.
### SHIP-MANIFEST-STALE-POLICY
A policy entry references a tool that is not loaded. Remove or update stale
approval, confirmation, or idempotency policies so release policy matches the
actual tool surface.
### SHIP-MANIFEST-STALE-RISK-OVERRIDE
`risk_overrides.tools` references a tool that is not loaded. Remove stale
overrides or update them to the current tool names.
### SHIP-MANIFEST-HIGH-RISK-OWNER-MISSING
A high-risk tool in `production_like` or `production` has no owner metadata.
Declare an owner in the tool source or `risk_overrides.tools` so reviewers know
who is accountable for remediation.
### SHIP-MANIFEST-UNUSED-SCOPE
`permissions.scopes` includes a scope not required by any loaded tool. Remove
unused scopes or add tool metadata showing why the permission is needed. Broad
unused write/admin scopes are `high`; other unused scopes are `medium`.
## Risk Tags
Risk tags are hints, not findings by themselves. Checks consume tags with confidence thresholds.
Common tags:
- `read_only`
- `write`
- `destructive`
- `external_write`
- `financial_action`
- `customer_communication`
- `sensitive_data_access`
- `infrastructure_change`
- `code_execution`
Manual `risk_overrides` in `shipgate.yaml` are treated as high-confidence evidence. Use `remove_tags` to subtract heuristic tags that are known to be wrong for a specific tool.
## Listing Checks
Use the CLI to inspect the built-in catalog:
```bash
agents-shipgate list-checks
agents-shipgate list-checks --json
agents-shipgate explain SHIP-POLICY-APPROVAL-MISSING
```
Third-party packages can register checks through the `agents_shipgate.checks` Python entry-point group. Plugins are disabled by default because loading them imports third-party Python modules. Set `AGENTS_SHIPGATE_ENABLE_PLUGINS=1` to opt in, or pass `--no-plugins` to force them off for a scan or catalog command. Reports include `loaded_plugins` provenance for every third-party check entry point that ran.

A plugin check should expose a callable with the same `ScanContext -> list[Finding]` shape as built-ins and may attach `AGENTS_SHIPGATE_METADATA` as either a `CheckMetadata` instance or a compatible dictionary. Adapter artifacts are available through `context.framework_artifacts` or `context.artifact("openai_api", OpenAIApiArtifacts)`. Legacy `context.*_artifacts` read-only properties remain available for v0.11 plugin compatibility, raise `TypeError` on artifact type mismatch, and are scheduled for removal in v0.12.
## Declarative Policy Packs
v0.4 adds local YAML policy packs for organization-specific release rules.
Policy packs are static data and are safe to enable by default when declared in
`checks.policy_packs` or passed with `scan --policy-pack`. External rule IDs
must use a non-`SHIP-*` namespace such as `ORG-*`; `SHIP-*` is reserved for
built-in checks. Pack findings behave like built-ins for suppressions, severity
overrides, baselines, Markdown, JSON, and SARIF. Python plugins remain a
separate opt-in extension mechanism.
## OpenAI Agents SDK Static Extraction
SDK extraction is optional enrichment. Agents Shipgate detects Python functions decorated directly with `@function_tool`, `@function_tool(...)`, `@agents.function_tool`, `@openai_agents.function_tool`, or simple import aliases such as `from agents import function_tool as ft`, for example:
```python
from agents import function_tool

@function_tool
def search_customer(customer_id: str) -> str:
    """Look up a customer by ID."""
    ...
```
The static extractor does not execute user code and intentionally does not detect dynamic wrappers, factory-created tools, `Tool.from_fn()` style objects, runtime imports, or dynamic tool lists. Declare those tools through MCP/OpenAPI inputs or manifest metadata.
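Decorator detection of this kind can be done with Python's standard `ast` module, which parses source text without importing or running it. The sketch below is an illustration of the technique, not the extractor Shipgate ships. The function name `find_function_tools`, the `SDK_MODULES` set, and the sample source are assumptions.

```python
import ast

SDK_MODULES = {"agents", "openai_agents"}  # module names mentioned above


def find_function_tools(source: str) -> list[str]:
    """Return names of functions decorated directly with function_tool."""
    tree = ast.parse(source)  # parses only; the code is never executed

    # Track simple import aliases such as `from agents import function_tool as ft`.
    names = {"function_tool"}
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module in SDK_MODULES:
            for alias in node.names:
                if alias.name == "function_tool" and alias.asname:
                    names.add(alias.asname)

    def is_tool_decorator(decorator: ast.expr) -> bool:
        # Unwrap `@function_tool(...)` to the callable being invoked.
        target = decorator.func if isinstance(decorator, ast.Call) else decorator
        if isinstance(target, ast.Name):
            return target.id in names
        if isinstance(target, ast.Attribute) and isinstance(target.value, ast.Name):
            return target.attr == "function_tool" and target.value.id in SDK_MODULES
        return False

    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and any(is_tool_decorator(d) for d in node.decorator_list)
    ]


sample = '''
from agents import function_tool as ft
import agents

@ft
def search_customer(customer_id: str) -> str:
    ...

@agents.function_tool()
def issue_refund(order_id: str) -> str:
    ...

def build_tool():
    return ft(lambda: "dynamic")  # factory-created: not detected
'''
print(find_function_tools(sample))  # ['search_customer', 'issue_refund']
```

The factory in `build_tool` is invisible to this approach, which is why such tools need an explicit declaration.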
## Google ADK Static Extraction
Google ADK extraction is optional static enrichment. Agents Shipgate detects
Python `Agent` / `LlmAgent` definitions, literal function tools,
`FunctionTool`, `LongRunningFunctionTool`, `OpenAPIToolset`, `McpToolset`,
callbacks, plugins, sub-agents, and Agent Config YAML references where those
values are statically knowable.
The ADK extractor does not import user modules, run `adk`, connect to MCP
servers, fetch OpenAPI specs over the network, call tools, or call models.
Dynamic ADK toolsets produce source warnings and one ADK finding per unresolved
toolset unless explicit local MCP/OpenAPI/tool inventory inputs are provided.
## LangChain And CrewAI Static Extraction
LangChain/LangGraph and CrewAI extraction are optional static enrichment.
Agents Shipgate detects supported Python tool definitions, wrappers, agent
bindings, and local inventory files where those values are statically knowable.
CrewAI `BaseTool` class metadata may use literal strings or Pydantic-style
`Field(default="...")` assignments for `name` and `description`.
The extractors do not import user modules, import framework packages, run
agents, run graphs, run crews, connect to MCP servers, fetch specs over the
network, call tools, call models, or execute framework subprocesses. Dynamic
tool surfaces produce source warnings and framework findings unless explicit
local tool inventory inputs are provided. CrewAI prebuilt `crewai_tools.*Tool()`
references are emitted as low-confidence stubs and warnings; they do not by
themselves produce the dynamic-tools finding.
## n8n Static Extraction
n8n extraction reads only local workflow JSON exports/source-control files and
optional local stubs or evidence artifacts declared under `n8n:`. It does not
call a live n8n instance, run `n8n`, execute workflows, decrypt credentials,
connect to MCP endpoints, execute code nodes, or fetch network resources.
The adapter enumerates AI Agent tool sub-nodes, MCP Client Tool selections,
MCP Server Trigger exposed tools, Call n8n Workflow Tool entrypoints, Custom
Code Tool nodes, HTTP Request Tool nodes, and explicit inventories when those
surfaces are statically visible. Workflow triggers such as Webhook and Chat
Trigger are recorded as ingress evidence, not as tools.
Inactive workflows (`active: false`) are recorded as workflow evidence but are
not normalized as live tool or ingress surfaces; their workflow JSON is still
scanned for secret-like values. Workflow tags, error-workflow settings, and
node execution controls such as retry/continue-on-fail are preserved as
review metadata when present.
Credential names, workflow/node names, code bodies, request bodies, headers,
pinned data, static data, node notes, variable values, execution payloads, and
detected secrets are redacted or omitted from reports. Credential types and
credential IDs may be preserved as local release evidence.
# Concepts
The mental model behind agents-shipgate, in one page.
For the product-level definition of an "agent release gate," see
[`category.md`](category.md). For the agent-facing
walkthrough, see [`AGENTS.md`](../AGENTS.md).
## Tool-use readiness
**Tool-use readiness** is the static check that an agent's tool surface
is ready for promotion. It is *not* "did the tool call succeed" (a
runtime concern) or "did the model pick the right tool" (an eval
concern). It is the question a release reviewer answers at PR time:
> Given the tool surface declared in this PR, do we have explicit
> approval policies, scope coverage, idempotency evidence, and review
> readiness for every action — *before* promotion?
Tool-use readiness has seven dimensions. agents-shipgate produces
findings against each one.
| Dimension | What it asks | Evidence in the manifest |
|---|---|---|
| **Inventory** | What tools can the agent call? | A complete, named list — no wildcards, no "whatever this MCP server returns" |
| **Schema** | What inputs does each tool accept? | Strict JSON schema — `additionalProperties: false`, complete `required`, bounded numeric fields |
| **Auth** | What scopes does each tool need? | Declared per-tool or in `permissions.scopes` — narrower than the service account's actual scopes |
| **Approval** | Who reviews destructive actions before they fire? | `policies.require_approval_for_tools: [...]` for every write/destructive/financial action |
| **Side effects** | What does this tool change in the world? | Risk tags on the tool: `write`, `destructive`, `external_write`, `financial_action`, `customer_communication` |
| **Idempotency** | Can it be retried safely? | Idempotency key in the schema, documented retry policy, or explicit "do not retry" |
| **Blast radius** | If this tool fires unexpectedly, how bad is it? | Owner declared, prohibited actions enumerated, scope of resources bounded |
## Tool surface
The **tool surface** is the set of named, schema-defined actions an agent can
invoke at runtime. It is declared via:
- Model Context Protocol (MCP) exports
- OpenAPI specs
- Framework-specific code (OpenAI Agents SDK Python, Google ADK, LangChain/LangGraph, CrewAI)
- API-specific artifacts (Anthropic Messages API `tools.json`, OpenAI
  API function schemas)
The tool surface is a **release artifact** in the same sense as a
service deployment's binary or an API contract: it's a checked-in,
diff-able statement of what the agent can do, and it should be reviewed
on every PR.
## Manifest-first
agents-shipgate is **manifest-first**: the canonical claim about an
agent's surface lives in a single `shipgate.yaml` checked into the
repo. Every tool source the manifest references is reviewed at scan
time. There is one place to look for "what does this agent ship with."
This is intentional. Implicit configurations (e.g. "use whatever the
MCP registry returns") fail the inventory dimension above. The manifest
is what makes the release gate reviewable.
## Static vs dynamic
agents-shipgate is **static**. It does not run the agent, invoke the
model, call MCP servers, or make any network calls by default. Every
finding is derived from the artifact diff alone.
Static analysis covers the release-readiness slice. Dynamic concerns —
behavior under unusual inputs, runtime tool routing, latency,
hallucination — belong in evals, observability, and runtime guardrails.
agents-shipgate is additive to those, not a replacement.
## Where this fits in the wider stack
| Guard | When it runs | What it catches |
|---|---|---|
| Tests | CI on every PR | Code paths in the agent's *code* |
| Evals | On a schedule or per release | Model behavior on curated inputs |
| **agents-shipgate** | CI on every PR | Tool surface, scopes, policies, prompt/surface alignment |
| Runtime guardrails / gateway | At call time | Per-call policy enforcement |
| Observability | Runtime | What actually happened in production |
Each catches something the others can't. Removing any of them is a
regression.
## Related reading
- [`category.md`](category.md) — the product-level "what is an agent release gate"
- [`checks.md`](checks.md) — every check the scanner runs
- [`manifest-v0.1.md`](manifest-v0.1.md) — full manifest schema
- [`trust-model.md`](trust-model.md) — local-only guarantees and disclosure process
- [`glossary.md`](glossary.md) — category vocabulary
# Autofix policy
Which Agents Shipgate findings are safe to apply automatically, which
need human review, and how the per-finding metadata in `report.json`
maps to `apply-patches --confidence` flag semantics.
> **Audience.** AI coding agents driving the canonical 4-call flow
> (see [`agent-recipes.md`](agent-recipes.md)) and CI integrators
> deciding what to gate on.
---
## The four classes
Every active finding falls into one of four classes. The class is
encoded by the `autofix_safe` and `requires_human_review` fields on
each Finding, plus the `kind` and `confidence` fields on each
attached Patch.
| Class | Finding fields | Patch shape | v0.7 examples |
|---|---|---|---|
| **Safe auto-fix** | `autofix_safe: true`, `requires_human_review: false` | All patches non-manual AND high confidence | The 3 stale-manifest removals (`SHIP-MANIFEST-STALE-{SUPPRESSION,POLICY,RISK-OVERRIDE}`) when the match is unique |
| **Medium-confidence config fix** | `autofix_safe: false`, `requires_human_review: true`, `suggested_patch_kind: append_pointer/set_pointer` | Non-manual patch but at `medium` confidence | `SHIP-AUTH-SCOPE-COVERAGE-MISSING` scope appends |
| **Manual source/policy fix** | `autofix_safe: false`, `requires_human_review: true`, `suggested_patch_kind: manual` | `ManualPatch` with curated `instructions` | All other ~30 active checks (documentation, schema bounds, owner gaps, ADK/LangChain/CrewAI metadata, …) |
| **Never auto-fix** | `autofix_safe: false`, `requires_human_review: true`, `suggested_patch_kind: manual` | `ManualPatch` with explicit anti-pattern language | `SHIP-API-TRACE-{APPROVAL,CONFIRMATION}-MISSING` (flipping the trace patches the *evidence*, not the agent's runtime gate) |
Class four is a deliberate subset of class three: the finding fields
are identical, and the distinction is behavioral. An agent must NEVER
attempt to "auto-fix" a trace finding by editing the trace recording,
even if the user asks. The `ManualPatch.instructions` for these checks
spell out the anti-pattern in prose so that an operator reading the
report gets the same warning.
---
## Catalog vs. Finding (the dual-source contract)
Two sources describe per-check remediation policy, and they answer
different questions:
| Source | Endpoint | What it answers |
|---|---|---|
| **CheckMetadata** | `agents-shipgate list-checks --json`, `agents-shipgate explain --json`, `docs/checks.json` | What an agent should *assume* when it has only the catalog and no scan output. Conservative across the board. |
| **Finding** | `agents-shipgate-reports/report.json` (per-finding) | What this *specific* instance produced. Can be more permissive than the catalog when the generator emitted clean high-confidence patches. |
**Catalog `autofix_safe` and `requires_human_review` describe the
worst-case per-check outcome.** A check whose generator USUALLY emits
a safe non-manual patch but falls back to `ManualPatch` in edge
cases (e.g. ambiguous duplicate matches in the stale-manifest
generators) keeps the safe-closed defaults at the catalog level. The
per-Finding fields tell the truth for that instance.
`suggested_patch_kind` at the catalog level is **informational** —
it documents the kind the generator *targets* when conditions are
clean, not what the report carries. An agent that sees
`suggested_patch_kind: "remove_pointer"` in `list-checks --json`
should still consult `Finding.patches` (or the per-Finding
`suggested_patch_kind`) to know whether this particular instance
actually produced one.
When in doubt, **trust the per-Finding fields over the catalog**
for any specific finding. The catalog is for static planning
("which check IDs *might* yield safe fixes"); the report is for
acting on a specific scan.
---
## Strict derivation rule
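When a scan runs with `--suggest-patches`, every active finding
carries a `patches` list (possibly empty; see "Three patch states"
below). When the list is non-empty, `autofix_safe`,
`requires_human_review`, and `suggested_patch_kind` are derived from
those patches, starting with this rule: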
```text
autofix_safe = True iff EVERY patch is non-manual AND has confidence == "high"
```
That is: a single `ManualPatch` mixed in, or a single `medium`/`low`
confidence patch mixed in, drops the entire finding to safe-closed.
The earlier "at least one safe patch wins" rule was unsafe — it
would have marked a `[high_remove, manual]` combination
auto-fixable while a ManualPatch still required review.
`suggested_patch_kind` is the kind of the **first non-manual patch**
even when ManualPatches are also present. (If ALL patches are
manual: `"manual"`. If the patches list is empty: `"none"`.)
`requires_human_review` is always the inverse of `autofix_safe`.
`docs_url` always comes from `CheckMetadata.docs_url`. Patches
don't carry per-instance documentation URLs.
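
The derivation described above can be sketched in a few lines of Python. This is a minimal illustration, not the scanner's implementation: the `Patch` dataclass and `derive_fields` function are hypothetical names, and the sketch assumes every patch exposes a `kind` and a `confidence` string.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Patch:
    """Hypothetical stand-in for a report patch (assumed shape)."""
    kind: str        # "set_pointer", "append_pointer", "remove_pointer", or "manual"
    confidence: str  # "high", "medium", or "low"


def derive_fields(patches: List[Patch]) -> dict:
    """Derive the per-Finding fields from a patches list."""
    # Strict rule: EVERY patch must be non-manual AND high confidence.
    # An empty list is never auto-fixable.
    autofix_safe = bool(patches) and all(
        p.kind != "manual" and p.confidence == "high" for p in patches
    )

    non_manual = [p for p in patches if p.kind != "manual"]
    if not patches:
        kind = "none"              # empty patches list
    elif non_manual:
        kind = non_manual[0].kind  # first non-manual patch wins
    else:
        kind = "manual"            # all patches are manual

    return {
        "autofix_safe": autofix_safe,
        "requires_human_review": not autofix_safe,
        "suggested_patch_kind": kind,
    }
```

Under these assumptions, a `[high remove_pointer, manual]` pair yields `autofix_safe: false` while still reporting `suggested_patch_kind: "remove_pointer"`, which is why agents should check both fields.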
### Three patch states
| `Finding.patches` | Source of derived fields |
|---|---|
| `None` (scan ran without `--suggest-patches`) | CheckMetadata, with safe-closed fallback for unknown check IDs |
| `[]` (scan ran WITH `--suggest-patches` but generator emitted nothing) | Safe-closed shape, `suggested_patch_kind: "none"`. Does NOT fall back to catalog — the report carries no patches, so reporting a catalog-level kind would mislead. |
| Non-empty | Strict derivation rule above |
### Unknown check IDs (policy packs and third-party plugins)
A finding whose `check_id` isn't in the loaded catalog (for example, a
policy pack rule, or a third-party plugin check emitted while plugins
are disabled) gets the safe-closed fallback when patches are absent:
```text
autofix_safe: false
requires_human_review: true
suggested_patch_kind: "manual"
docs_url: null
```
The fallback only applies when patches are absent. A high-confidence
non-manual patch from a policy pack still derives correctly.
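
As a rough sketch of how the `None` and `[]` patch states and the unknown-ID fallback fit together, consider the following. The function name `fields_without_patches` and the dict-shaped `catalog` are hypothetical. Taking `docs_url` from the catalog in the `[]` state is this sketch's reading of "`docs_url` always comes from `CheckMetadata.docs_url`", not a confirmed behavior.

```python
from typing import Optional

# Safe-closed fallback shape, as listed above.
SAFE_CLOSED = {
    "autofix_safe": False,
    "requires_human_review": True,
    "suggested_patch_kind": "manual",
    "docs_url": None,
}


def fields_without_patches(check_id: str, patches: Optional[list], catalog: dict) -> dict:
    """Resolve per-Finding fields when patches are None or empty."""
    if patches:
        # Non-empty lists are handled by the strict derivation rule instead.
        raise ValueError("non-empty patches: use the strict derivation rule")

    entry = catalog.get(check_id)  # None for policy-pack or plugin check IDs

    if patches is None:
        # Scan ran without --suggest-patches: catalog values, or safe-closed
        # fallback when the check ID is unknown.
        if entry is None:
            return dict(SAFE_CLOSED)
        return {key: entry.get(key, default) for key, default in SAFE_CLOSED.items()}

    # patches == []: safe-closed shape with kind "none"; the catalog-level
    # kind is deliberately NOT used here.
    return {
        **SAFE_CLOSED,
        "suggested_patch_kind": "none",
        "docs_url": entry.get("docs_url") if entry else None,  # assumption
    }
```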
---
## How `apply-patches --confidence` filters
`apply-patches` reads the report, filters patches by `--confidence`
and `--kinds`, and applies the survivors. Default flags:
```bash
agents-shipgate apply-patches \
  --from agents-shipgate-reports/report.json \
  --confidence high \
  --kinds set_pointer,append_pointer,remove_pointer \
  --apply
```
| Flag | Default | What it accepts |
|---|---|---|
| `--confidence` | `high` | Minimum patch confidence. Patches below this are skipped. |
| `--kinds` | `set_pointer,append_pointer,remove_pointer` | Patch kinds to include. ManualPatch is filtered out unconditionally — even with `--kinds manual`. |
| `--apply` | (off) | Without this, dry-run only. Always preview before mutating. |
So in v0.7 with the default flags:
- The 3 stale-manifest removals (when unambiguous) auto-apply.
- `SHIP-AUTH-SCOPE-COVERAGE-MISSING` scope appends are **skipped**
(medium confidence). Pass `--confidence medium` to opt in — but
read the appended scopes before merging, since adding scopes can
encode policy choices.
- Trace approval/confirmation findings are **never** applied —
ManualPatch is filtered out.
- Everything else with a ManualPatch is **never** applied.
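
The filtering behavior listed above can be approximated with a small Python sketch. It is illustrative only: `select_patches` is a hypothetical helper, patches are assumed to be dicts with `kind` and `confidence` keys, and the ordering `low < medium < high` is inferred from "minimum patch confidence" rather than taken from the CLI source.

```python
# Assumed ordering of confidence levels, lowest to highest.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}
DEFAULT_KINDS = ("set_pointer", "append_pointer", "remove_pointer")


def select_patches(patches, min_confidence="high", kinds=DEFAULT_KINDS):
    """Return the patches that would survive the confidence and kind filters."""
    threshold = CONFIDENCE_RANK[min_confidence]
    selected = []
    for patch in patches:
        if patch["kind"] == "manual":
            continue  # dropped unconditionally, even if "manual" is requested
        if patch["kind"] not in kinds:
            continue  # kind not requested
        if CONFIDENCE_RANK[patch.get("confidence", "low")] < threshold:
            continue  # below the minimum confidence
        selected.append(patch)
    return selected
```

With the defaults, a medium-confidence `append_pointer` patch is skipped; calling `select_patches(patches, min_confidence="medium")` is the sketch's equivalent of opting in with `--confidence medium`.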
`apply-patches` enforces a **containment check**: every patch's
`target_file` must resolve under `report.manifest_dir`. A target that
resolves outside that directory aborts the run with exit code 5,
before any SHA verification.
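
A containment check of this kind is commonly written with `pathlib`. The sketch below shows one way to do it and is not the tool's actual code: `is_contained` is a hypothetical name, and it assumes a relative `target_file` is interpreted relative to the manifest directory.

```python
from pathlib import Path


def is_contained(target_file: str, manifest_dir: str) -> bool:
    """True if target_file resolves to a path strictly under manifest_dir."""
    root = Path(manifest_dir).resolve()
    # Joining with an absolute target_file discards root, so absolute paths
    # are checked as-is; ".." segments are collapsed by resolve().
    target = (root / target_file).resolve()
    return root in target.parents
```

Resolving both paths before comparing is what defeats `../` traversal and absolute-path targets; a plain string-prefix comparison would not.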
---
## Decision tree for agents
When walking `findings[]` from a `--suggest-patches` report:
```text
for finding in active_findings:
    if finding.suggested_patch_kind == "manual":
        # Manual source/policy fix or never-auto-fix.
        # Read finding.patches[0].instructions and surface to user.
        # Do NOT attempt to auto-edit, especially for trace findings.
        surface_to_user(finding)
        continue

    if finding.suggested_patch_kind == "none":
        # Scan ran with --suggest-patches but the generator emitted
        # nothing for this finding (empty patches list; see "Three
        # patch states" above). There's nothing to apply via
        # apply-patches at any confidence level. Surface for human
        # triage instead.
        surface_to_user(finding)
        continue

    if finding.autofix_safe is True:
        # Safe to include in the next `apply-patches --confidence high`.
        plan_to_apply(finding)
        continue

    # Medium-confidence non-manual patch (e.g. scope coverage).
    # Surface as "review and run apply-patches --confidence medium"
    # but do not auto-apply on the high-confidence path.
    surface_for_medium_review(finding)
```
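
For agents that want an executable form of the walk above, here is a hedged Python sketch that buckets findings already loaded from `report.json` as dicts. The helper names `bucket_finding` and `plan` are hypothetical, and how to select *active* findings from the report is left out because it depends on the report schema.

```python
def bucket_finding(finding: dict) -> str:
    """Map one active finding to the action named in the decision tree."""
    kind = finding.get("suggested_patch_kind")
    if kind in ("manual", "none"):
        # Manual fix, never-auto-fix, or nothing to apply: human triage.
        return "surface_to_user"
    if finding.get("autofix_safe") is True:
        return "plan_to_apply"
    # Non-manual patch below high confidence (e.g. scope coverage).
    return "surface_for_medium_review"


def plan(findings: list) -> dict:
    """Group check IDs by action, preserving report order."""
    buckets = {
        "surface_to_user": [],
        "plan_to_apply": [],
        "surface_for_medium_review": [],
    }
    for finding in findings:
        buckets[bucket_finding(finding)].append(finding.get("check_id"))
    return buckets
```

Checking `suggested_patch_kind` before `autofix_safe` mirrors the order of the tree: manual and empty cases are ruled out first, so only non-manual patches ever reach the apply plan.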
After running `apply-patches --apply`, re-run `scan` to confirm the
fixed findings are gone. The `run_id` will only change if the
manifest or tool surface actually changed — patches are excluded
from the hash so toggling `--suggest-patches` doesn't shift it.
---
## See also
- [`agent-autofix-boundary.md`](agent-autofix-boundary.md) — the
*behavioral* counterpart to this *mechanical* page. What an agent may
assert in a PR comment or review summary, beyond which patches
`apply-patches` will run.
- [`agent-recipes.md`](agent-recipes.md) — copy-pasteable AI-agent
workflows, including the soft-stop rule for `detect`.
- [`report-reading-for-agents.md`](report-reading-for-agents.md) —
reader's primer for `report.json`.
- [`checks.md`](checks.md) — full check catalog with rationale.
- [`minimal-real-configs.md`](minimal-real-configs.md) — per-framework
minimal manifests to build from.
- [`report-schema.v0.16.json`](report-schema.v0.16.json) — current JSON
Schema for `report.json`.
- [`AGENTS.md`](../AGENTS.md) — top-level agent instructions, install,
trigger table.
- [`STABILITY.md`](../STABILITY.md) — what won't break across `0.x`.