---
name: learn-skill
description: |
  Build and evolve domain skills inside Codex using isolated subagents, file-based handoffs, live web research, and persistent memory. Use when the user wants Codex to learn a bounded professional workflow, distill it into a reusable skill package, and iteratively improve it through closed-book evaluation without writing a custom orchestrator.
---

# learn.skill

`learn.skill` is a no-code orchestration skill for Codex. It builds and evolves domain skills through isolated subagents, persistent files, live research bundles, and repeatable evaluation loops.

## Primary goal

Convert a bounded domain goal into a reusable skill package that can:

- operate at general-practitioner level for the chosen domain
- pass structured evaluation without the main agent participating in execution or judging
- retain long-lived memory across runs and across future scenarios
- separate live research evidence from closed-book execution inputs

The current scenarios are:

- `scenarios/public-opinion/` for Chinese public-opinion risk analysis
- `scenarios/finance-rumor/` for listed-company rumor verification

## Non-negotiable rules

1. The main agent is an orchestrator-observer only.
2. The main agent must not answer evaluation cases, generate gold answers, or score cases.
3. All subagents default to `fork_context=false`.
4. Communication happens through files only. Subagents do not directly message each other.
5. Agent depth is capped at two levels: orchestrator -> subagent -> grandchild.
6. `ExecutorAgent` receives only `skill_package/` plus one `case_pack.json`.
7. `JudgeAgent` receives only the current case, the executor result, and the minimum rubric/gold needed for scoring.
8. `final_hidden` failures return aggregate results and failure classes only. Do not expose gold details to repair loops.
9. Live research stores only `URL + metadata + structured short excerpts + access timestamp + tier`; do not persist long page bodies.
10. Every run must append durable records to scenario and global memory before the thread ends.
11. `demo runs` are strictly off the training path unless the user explicitly promotes them into research backlog.

## Repository map

- `memory/`: global persistent memory and evolution logs
- `references/`: orchestration rules, source confidence policy, prompt contracts, schema reference
- `templates/`: canonical starter payloads for contracts and outputs
- `runs/demo-/`: ad hoc demo runs that do not enter training or memory by default
- `scenarios//`: one scenario pack with its own memory, corpus, eval sets, skill package, and run history

## Execution protocol

### 1. Prepare the scenario

For a new learning request:

1. Read `memory/global-memory.md`
2. Read the target scenario's `scenario.yaml` and `memory.md`
3. Read `references/workflow.md`
4. Create or refresh the next run directory under `scenarios//runs//`
5. Write a `manifest.json` that declares:
   - run id
   - scenario
   - stage
   - files visible to each agent
   - hidden-set restrictions
   - whether the run is `bootstrap_seed`, `live_corpus`, or `demo_only`

### 2. Spawn isolated agents

Use these role boundaries:

- `SpecAgent`: outputs `task_contract.json`
- `ResearchAgent`: populates `corpus/` and source index; may spawn grandchildren for source streams
- `DistillAgent`: updates `skill_package/`
- `CaseBuilderAgent`: updates eval case packs and judge bundles
- `ExecutorAgent`: runs closed-book on one case at a time
- `JudgeAgent`: writes score and repair reports
- `RepairAgent`: revises the skill package from allowed failure summaries only

Only `ResearchAgent` may spawn grandchildren in the main learning flow.

### 3. Preserve state aggressively

For each run, write:

- `runs//manifest.json`
- `runs//traces/*.jsonl`
- `runs//outputs/*.json`
- `runs//summaries/*.md`

Update:

- `memory/global-evolution.jsonl`
- `scenarios//evolution.jsonl`
- `memory/global-memory.md` when a rule generalizes across scenarios
- `scenarios//memory.md` when a rule generalizes across runs inside one scenario

### 4. Respect dataset tiers

- `train`: usable for learning and rule shaping
- `dev`: usable for automated iteration
- `canary_hidden`: usable for automated blind checks, but only failure summaries may flow into repair
- `final_hidden`: stage gate only; only aggregate outcomes may flow into repair

If `final_hidden` fails twice in a row, log a process-level defect in `memory/global-evolution.jsonl`.

### 5. Respect live research and demo boundaries

- Live research artifacts belong under `scenarios//corpus/raw-index/` and `scenarios//corpus/research-bundles/`.
- `CaseBuilderAgent` consumes research bundles, not webpages directly.
- `demo runs` belong under `runs/demo-/` and must not modify `corpus/`, `eval/`, `memory.md`, `global-memory.md`, or any evolution log unless the user explicitly requests promotion.

## What to read next

- Read `references/workflow.md` for stage-by-stage behavior.
- Read `references/source-rating.md` before any research run.
- Read `references/prompt-contracts.md` before spawning agents.
- Read `references/schemas.md` before editing contracts, research artifacts, eval files, or memory logs.

## Scenario loading

The root skill does not contain domain rules. Domain behavior lives in scenario packs and scenario-specific `skill_package/` directories.
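
As an illustration of the `manifest.json` described under "Prepare the scenario", the following is a minimal Python sketch of how such a manifest might be assembled and checked against the `ExecutorAgent` visibility rule. The authoritative schema lives in `references/schemas.md`; every key name, the run id, the stage value, the restriction labels, and the JudgeAgent file names below are assumptions made for illustration, as are the helper functions `build_manifest` and `executor_visibility_ok`.

```python
import json

# The three run modes named in the execution protocol.
RUN_MODES = {"bootstrap_seed", "live_corpus", "demo_only"}


def build_manifest(run_id, scenario, stage, visible_files, hidden_restrictions, run_mode):
    """Assemble a manifest dict. All key names are hypothetical, not the real schema."""
    if run_mode not in RUN_MODES:
        raise ValueError(f"unknown run mode: {run_mode}")
    return {
        "run_id": run_id,
        "scenario": scenario,
        "stage": stage,
        "visible_files": visible_files,
        "hidden_set_restrictions": hidden_restrictions,
        "run_mode": run_mode,
    }


def executor_visibility_ok(manifest):
    """Check that ExecutorAgent sees only skill_package/ plus exactly one case_pack.json."""
    files = manifest["visible_files"].get("ExecutorAgent", [])
    case_packs = [f for f in files if f.endswith("case_pack.json")]
    others = [f for f in files if not f.endswith("case_pack.json")]
    return len(case_packs) == 1 and others == ["skill_package/"]


manifest = build_manifest(
    run_id="run-0001",          # hypothetical run id
    scenario="public-opinion",
    stage="evaluate",           # hypothetical stage name
    visible_files={
        "ExecutorAgent": ["skill_package/", "case_pack.json"],
        # Hypothetical file names for the judge's minimal inputs.
        "JudgeAgent": ["case_pack.json", "executor_result.json", "judge_bundle.json"],
    },
    hidden_restrictions={
        "canary_hidden": "failure_summaries_only",  # hypothetical labels
        "final_hidden": "aggregate_only",
    },
    run_mode="live_corpus",
)

print(json.dumps(manifest, indent=2))
```

Declaring visibility in the manifest, rather than leaving it implicit in prompts, gives the orchestrator something it can check mechanically before spawning an agent, which fits the rule that communication happens through files only.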