--- name: codex-verify description: "Verify a plan or document using Codex as independent reviewer with PASS/FAIL verdict. Use when asked \"codex 검수\", \"verify this plan\", \"플랜 검수\"." argument-hint: "path/to/document.md [--model SLUG] [--effort LEVEL]" allowed-tools: ["Bash", "Read", "Grep", "Glob", "AskUserQuestion"] --- # Codex Document Verification + Double-Check You are a **translator + executor + double-checker**. The user wants an independent review of a plan or document. Your job is to hand the document to Codex **without ever loading it into your own context**, so your follow-up evaluation is genuinely independent. For code review use `/codex-review`. For research use `/codex-research`. ## Execution Contract **This contract overrides default exploration habits. Read it before Phase 1.** | Phase | Allowed | Forbidden | |-------|---------|-----------| | 1 ANALYZE | `test -f/-s`, `wc -l/-c`, `file`, `echo`, `printf`, `cat "$DOC" >> "$PROMPT_FILE"` (file-redirect, no stdout) | `cat "$DOC"` to stdout, `head`, `tail`, Read, Grep, Glob | | 2 INVOKE | Bash for companion launch via stdin pipe | All source / document reads to stdout | | 3 WAIT | `status --wait` loop (≤6 iterations, ≤24 min) | All reads, manual polling, `ps`/`kill` | | 4 DOUBLE-CHECK | Read the document (now — not before) to verify Codex's findings | n/a | | 5 REPORT + SAVE | Write report file | n/a | **Why the document stays out of context in Phase 1-3:** if you read the document upfront, you form opinions before seeing Codex's. The double-check is then biased — you'll rationalize away valid catches. The blind-payload pattern (`cat "$DOC" >> "$PROMPT_FILE"`) redirects to a file, not stdout, so your context stays clean. Unknown flags silently become task prompt content (`readTaskPrompt :613-619`). Phase 1 is the only safety net. --- ## Phase 1: Analyze + assemble blind payload ### Parse `$ARGUMENTS` **Whitelist for this skill:** `--model `, `--effort ` (skill-level, route through `apply-codex-config.py` — never reach the companion). The document path is another skill input, not a companion flag. Rules: - **A single path** → treat as the document to verify. - **`resume [follow-up]`** → pass `--resume-last` to the companion; the follow-up becomes the new prompt body. - **Multiple paths** → `AskUserQuestion` which one. - **Meta-instructions addressed to YOU** ("한국어로 평가해", "엄격하게") → obey for your own behavior, never include in the prompt. - **No args** → `AskUserQuestion`: "What document should I verify?" - **Unknown flags** (e.g., `--base`, `--write`, `--foo`) → `AskUserQuestion`. verify has no companion flags to forward. `--model`/`--effort` are skill-level and route through `apply-codex-config.py`. ### Resolve the document path ```bash # Input validation only — never load content. # Replace with the path parsed from $ARGUMENTS. test -f "" || { echo "File not found: " >&2; exit 1; } test -s "" || { echo "File is empty: " >&2; exit 1; } echo "DOC_LINES=$(wc -l < "")" # size info, not content ``` ### Assemble the blind payload ```bash set -o pipefail CODEX_COMPANION=$("${CLAUDE_PLUGIN_ROOT}/scripts/resolve-companion.sh") \ || { echo "Official Codex plugin not found — run /codex-setup" >&2; exit 1; } mkdir -p "${CLAUDE_PLUGIN_DATA}/tmp" TS=$(date +%s%N) PROMPT_FILE="${CLAUDE_PLUGIN_DATA}/tmp/verify-prompt-${TS}.txt" JOB_JSON_FILE="${CLAUDE_PLUGIN_DATA}/tmp/verify-job-${TS}.json" echo "PROMPT_FILE=$PROMPT_FILE" echo "JOB_JSON_FILE=$JOB_JSON_FILE" # Header via heredoc — no document content yet cat > "$PROMPT_FILE" <<'EOF' You are a brutally honest technical reviewer. Review the following document for material issues that would cause implementation failure. Focus areas: - Logical gaps and unstated assumptions - Missing error handling or edge cases - Overcomplexity (is there a simpler approach?) - Feasibility risks (what could go wrong?) - Missing dependencies or sequencing issues - Internal contradictions or ambiguous requirements Return a structured verdict: 1. PASS or FAIL (with clear reasons) 2. Blocking issues (P1) — must fix before proceeding 3. Recommendations (P2) — non-blocking improvements Be direct. No compliments. Just the problems. Ground every finding in the document text. Cite specific sections. Do not speculate about issues not evidenced in the document. Review the entire document before finalizing. Check for interactions between sections that may create contradictions. EOF # Append document via file redirect — stdout stays empty, context stays clean. # Use the literal doc path, NOT a shell variable from a prior Bash call. cat "" >> "$PROMPT_FILE" # Close XML printf '\n\n' >> "$PROMPT_FILE" ``` ### Apply model/effort (if either flag was provided) Run after payload assembly, before Phase 2, so the companion sees the new `config.toml`: ```bash python3 "${CLAUDE_PLUGIN_ROOT}/scripts/apply-codex-config.py" \ "" \ "" ``` Relay the `Model: ... | Effort: ...` stdout line verbatim; pass stderr advisories through. **config.toml is global** — the change affects every Codex invocation until changed again. Flag that to the user when values changed. If neither flag was provided, still call with two empty strings so the user sees the current values in the same format. **Before Phase 2, also print the Parsed line:** ``` Parsed: doc="docs/plan.md" (DOC_LINES=247), payload=PROMPT_FILE ``` Order: apply-codex-config.py output first, Parsed line second. Remember the literal `PROMPT_FILE`, `JOB_JSON_FILE`, and `USER_DOC` paths. They are needed in later phases. For edge cases, read `${CLAUDE_PLUGIN_ROOT}/references/companion-usage.md §7` (ANALYZE rules) and `§8` (blind-payload details). --- ## Phase 2: Invoke (Pattern B — stdin pipe to `task --background`) ```bash # NEVER pass a positional arg — readTaskPrompt short-circuits on # positionalPrompt (:619), silently dropping the entire blind payload. cat "" | node "$CODEX_COMPANION" task --background --json \ > "" 2> ".stderr" \ || { echo "task launch failed:" >&2; cat ".stderr" >&2; exit 1; } # Capture jobId (node, not python — avoid host assumptions) JOB_ID=$(node -e 'const fs=require("fs");try{const j=JSON.parse(fs.readFileSync(process.argv[1],"utf8"));if(!j.jobId)throw new Error("no jobId");process.stdout.write(j.jobId);}catch(e){process.stderr.write("JOB_ID parse failed: "+e.message+"\n");process.exit(1);}' "") \ || { echo "raw companion stdout:" >&2; cat "" >&2; exit 1; } echo "JOB_ID=$JOB_ID" ``` Remember the literal `JOB_ID` for Phase 3-4. --- ## Phase 3: Wait (`status --wait` loop) Each call blocks ≤4 min. Re-call on timeout. Cap at **6 iterations** (24 minutes). ```bash # Repeat until status is "completed" or "failed", or cap hit. node "$CODEX_COMPANION" status --wait "" \ --timeout-ms 240000 --json ``` - `status === "completed"` → fetch result - `status === "failed"` → categorize per §6, save failure report - `waitTimedOut === true` with `queued`/`running` → re-call - 6 iterations exhausted → `wait-timeout` (§6). Show JOB_ID, suggest `/codex:status `. Fetch result: ```bash node "$CODEX_COMPANION" result "" --json ``` Full error table: `${CLAUDE_PLUGIN_ROOT}/references/companion-usage.md §6`. --- ## Phase 4: Double-check with verdict **Now you read the document.** Not before. Read `${CLAUDE_PLUGIN_ROOT}/references/evaluation.md` (Peer AI Evaluation + Self-Bias Awareness). **Self-bias warning:** if Claude authored the document (same session), acknowledge it: "Note: I authored this — extra honesty required." Don't rationalize away valid catches. For each of Codex's findings: - **Valid catch** — "Codex caught this. I missed it during planning." Read the cited document section to confirm. - **Already considered** — "I considered this: [reason]." Cite the document section that addresses it. - **False Positive (hallucination)** — Codex cited a document section that does **not exist**, or misread what the section says. Read the cited section to confirm. - **Uncited** — no concrete section reference. Surface as "verification deferred". Never invent citations. ### Produce the verdict ```markdown ## Verification Result: PASS / FAIL ### Blocking Issues (P1 — must fix before proceeding) - [issue]: [why it's blocking] ### Recommendations (P2 — non-blocking) - [suggestion]: [why it would be better] ### False Positives - [finding]: [why it's not a real issue] ### Agreement: (N/M findings) ``` **FAIL** if any P1 issue exists. **PASS** if only P2 or none. --- ## Phase 5: Report + save ```bash mkdir -p "${CLAUDE_PLUGIN_DATA}/reviews" ``` **Success:** save to `${CLAUDE_PLUGIN_DATA}/reviews/verify-.md` with: - The document path - Codex's output verbatim - Per-finding classification with document citations - Final verdict (PASS / FAIL) **Failure:** save to `${CLAUDE_PLUGIN_DATA}/reviews/verify--failed.md` with the §6 error category, stderr, and the document path. Clean up temp files using the literal paths captured in Phase 1: ```bash rm -f "" "" ".stderr" ``` --- ## Gotchas - **Never Read the document before Phase 4.** The blind-payload pattern preserves double-check independence. Reading in Phase 1 defeats the entire purpose of the skill. - **`cat "$USER_DOC" >> "$PROMPT_FILE"`** — file redirect keeps stdout empty. `cat "$USER_DOC"` alone would dump content into Claude's context. The `>> "$PROMPT_FILE"` is load-bearing. - **Never pass a positional argument with Pattern B's stdin pipe.** `readTaskPrompt` short-circuits on `positionalPrompt || readStdinIfPiped()` (`:619`); a positional silently drops the entire blind payload. - **`set -o pipefail` is mandatory.** Without it, a cat-side failure sends 0 bytes and the companion's `prompt-empty` error masks the root cause. - **Temp file paths must come from Phase 1 stdout.** Do not rely on `$PROMPT_FILE` / `$JOB_JSON_FILE` variables in later Bash calls — Bash spawns a fresh shell each call. Re-inject literal absolute paths. - **Claude has bias reviewing its own work.** If the document was authored in this session, be extra honest. Don't rationalize valid catches. For the full shared gotchas list, read `${CLAUDE_PLUGIN_ROOT}/references/companion-usage.md §10`.