# Codex Skill Standards

## Canonical Contract

Source of truth: `docs/contracts/codex-skill-api.md`

## Frontmatter

Codex SKILL.md frontmatter must include `name` and `description`:

```yaml
---
name: skill-name
description: 'When this skill triggers and when it does not.'
---
```

**Prohibited fields** (Claude-internal, ignored by Codex): `skill_api_version`, `context`, `allowed-tools`, `model`, `user-invocable`, `output_contract`

Existing generated bundles may still carry compatibility metadata, but the executable validator only requires the `name` and `description` fields above.

## Tool References

Skills must reference the Codex session agent surface that actually exists in this repo's runtime:

| Codex session surface | Purpose | Usage note |
|-----------------------|---------|------------|
| `spawn_agent` | Create a focused subagent | Use one agent per task, judge, or worker |
| `wait_agent` | Wait for one or more agents | Prefer explicit waits over polling loops |
| `send_input` | Send a short follow-up message | Use only for brief steering or retry prompts |
| `close_agent` | Terminate an agent | Use for stuck or no-longer-needed agents |
| `agent_type` | Label the agent role | Common roles in this repo are `default`, `explorer`, and `worker` |

### Prohibited Tool References

These Claude primitives have **no Codex equivalent** and must not appear:

- `Skill(skill=...)`: Codex uses `$skill-name` invocation, not a Skill tool
- `Agent(subagent_type=...)`: Codex uses agent roles, not subagent_type

### Mapped Forms Also Prohibited

Lowercase-hyphenated forms are equally invalid (`task-create`, `team-create`, `send-message`). The previously mapped `todo_write` and `update_plan` are **not available** as general-purpose tools in Codex sessions (empirically verified via `codex exec`).

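
The prohibited references listed above lend themselves to a simple static scan. The following is a minimal sketch, not the repo's validator: the function name `find_prohibited_refs` is hypothetical, and the pattern list is copied from this section and may be incomplete.

```bash
# Hypothetical helper: print the lines (with line numbers) of a skill file
# that mention a prohibited reference. Returns non-zero when nothing matches.
find_prohibited_refs() {
  file="$1"
  # -n prints line numbers; -E enables extended regex (BSD and GNU grep).
  # \( matches a literal parenthesis in the Claude-style tool calls.
  grep -nE 'Skill\(skill=|Agent\(subagent_type=|task-create|team-create|send-message|todo_write|update_plan' "$file"
}
```

Calling `find_prohibited_refs path/to/SKILL.md` prints nothing and exits non-zero for a clean file, so it can gate a CI step by inverting the exit status.
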
## Skill Discovery Paths

| Scope | Path |
|-------|------|
| Repo | `.agents/skills/` |
| User | `~/.agents/skills/` |
| Admin | `/etc/codex/skills/` |

**Prohibited paths:** `~/.claude/skills/`, `~/.codex/skills/`

## Sub-Agent Patterns

Codex orchestration uses:

| Pattern | Tool | Use Case |
|---------|------|----------|
| Repeated spawn | `spawn_agent` | Many similar tasks, one agent per unit of work |
| Agent roles | `agent_type` | Specialized sub-agents (worker, explorer, monitor) |
| Shell orchestration | `cmd` + `bd` CLI | Issue tracking, wave management |

## Common Issues

| Pattern | Problem | Fix |
|---------|---------|-----|
| `Skill(skill="vibe")` | Claude Skill tool, doesn't exist in Codex | Use `$vibe` invocation syntax |
| `context.window: fork` | Claude frontmatter, ignored | Remove from Codex SKILL.md |
| `~/.claude/skills/` | Wrong path | Use `.agents/skills/` |
| `todo_write(...)` | Not available in Codex sessions | Use `bd` CLI or file-based tracking |

## Testing Codex Skills

### Two-Phase Validation (Recommended)

Use a two-phase approach for comprehensive coverage at minimal cost:

**Phase 1: Static (fast, no API cost):**

- Check frontmatter has only `name` + `description`
- Check for `~/.codex/` paths
- Verify reference files are also clean

**Phase 2: Live (thorough, requires Codex API):**

```bash
# Check if skill loads and is understood
AGENTOPS_INTENT_ECHO_DISABLED=1 codex exec -s read-only -C "$(pwd)" \
  "Read \$skill-name. Verify it loads, check all referenced tools exist. Rate PASS/PARTIAL/FAIL."
```

### DAG-First Traversal

When validating multiple interdependent skills, traverse in dependency order (leaves first). This ensures that when a skill references `$other-skill`, the referenced skill has already been validated. Encode the dependency graph explicitly, because DAGs computed from frontmatter parsing are error-prone.

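
One way to get a leaves-first order from an explicitly encoded graph is the POSIX `tsort` utility. This is only a sketch under assumptions: the skill names (`leaf-skill`, `mid-skill`, `top-skill`) and both function names are placeholders, and the repo's release gate script may encode its chains differently.

```bash
# Hypothetical dependency edges, one "dependency dependent" pair per line.
# These names are placeholders, not the repo's real skill graph.
skill_edges() {
  cat <<'EOF'
leaf-skill mid-skill
mid-skill top-skill
EOF
}

# tsort prints each name after everything it depends on,
# so skills with no dependencies (leaves) come out first.
validation_order() {
  skill_edges | tsort
}

validation_order
```

Keeping the edges in a literal list means a reviewer can read the graph directly instead of trusting a parser.
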
### Prompt Constraint Boundaries

When using LLM judges to evaluate skills, always include explicit constraint boundaries:

- "Read-only sandbox and missing network access are NOT reasons to FAIL; those are test environment limits, not skill defects"
- "Rate the skill's design quality, not whether it can execute in this test environment"

Without these boundaries, judges conflate environment limits with skill defects.

### Shell Compatibility

Scripts that validate Codex skills must work on both macOS (BSD tools) and Linux (GNU tools):

- Use `[[:space:]]` not `\s` in grep patterns (BSD grep doesn't support `\s`)
- Use `awk` instead of BSD-incompatible `sed` compound expressions
- Pre-process multi-line LLM output with `tr -d '\n'` before regex extraction

### Release Gate Script

Full DAG-based validation: `scripts/smoke-test-codex-skills.sh`

```bash
scripts/smoke-test-codex-skills.sh --static-only   # Fast CI check (no API)
scripts/smoke-test-codex-skills.sh --chain 2       # Test one chain
scripts/smoke-test-codex-skills.sh                 # Full 54-skill live test
```

## Checklist

When reviewing Codex skills (`skills-codex/*/SKILL.md`):

- [ ] Frontmatter has only `name` + `description`
- [ ] No Claude primitive names (PascalCase or lowercase-hyphenated)
- [ ] No `~/.codex/` paths
- [ ] No `Skill(skill=...)` tool invocations
- [ ] No `Agent(subagent_type=...)` tool invocations
- [ ] No batch-only spawn primitive references
- [ ] No `context.*` or `metadata.*` frontmatter
- [ ] Reference files (`references/*.md`) also free of Claude primitives
- [ ] Instructions are actionable for a Codex agent with only Codex tools

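
The first checklist item can be approximated with a portable `awk` pass, in line with the shell compatibility notes above. This is a minimal sketch rather than the repo's validator: the function name `extra_frontmatter_keys` is hypothetical, and it assumes frontmatter is the block between the first two `---` lines with top-level keys starting in column 1.

```bash
# Hypothetical helper: print frontmatter keys other than name/description.
# An empty result means the frontmatter carries only the two required fields.
extra_frontmatter_keys() {
  awk '
    /^---[[:space:]]*$/ { fence++; next }        # count the --- delimiters
    fence >= 2 { exit }                          # stop after the closing ---
    fence == 1 && /^[A-Za-z_][A-Za-z0-9_-]*:/ {  # top-level key inside frontmatter
      key = $0
      sub(/:.*/, "", key)                        # keep only the key name
      if (key != "name" && key != "description") print key
    }
  ' "$1"
}
```

For a file whose frontmatter also contains `model:` and `allowed-tools:`, the helper prints those two keys, one per line.
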