--- name: table-schema description: | Define evidence-first table schemas for a survey: what each table must answer, row unit, columns, and which evidence-pack fields are required to fill it. **Trigger**: table schema, schema-first tables, table design, 表格 schema, 先 schema 后填充. **Use when**: you want survey tables that are verifiable and fillable before LaTeX (typically Stage C4, after evidence packs exist). **Skip if**: `outline/table_schema.md` already exists and is refined (covers both index tables and Appendix tables; no placeholders; evidence mapping is explicit). **Network**: none. **Guardrail**: no invented facts; schema must be checkable and map each column to an evidence source. --- # Table Schema (two layers: index vs Appendix) Tables are not decorations; they are compression. A common failure mode in this pipeline: the first table that gets generated looks like an internal index. So we separate tables into two layers: 1) `outline/tables_index.md` (internal) - purpose: coverage/debugging + fast evidence scan - allowed to be more exhaustive - should NOT be inserted into the paper 2) `outline/tables_appendix.md` (reader-facing) - purpose: publishable survey tables (clean layout + high information density) - can be inserted into the final PDF as Appendix This skill designs both layers before filling. ## Default mode: semantic (LLM-first) Treat this as a design task, not running a script. If a column cannot be filled from existing evidence packs without guessing, the schema is wrong. ## Roles (use explicitly) ### Table Designer (reader lens) Mission: choose tables that answer reader questions, not pipeline questions. Do: - make each table answer one question - keep tables small enough to fill (two good tables beat one impossible mega-table) Avoid: - internal/log-style tables inside the paper - column labels that only make sense inside this repo ### Evidence Steward (fillability) Mission: refuse schemas that require invented facts. Do: - map every column to concrete upstream fields - reject columns that would force long paragraph cells Avoid: - TODO columns - placeholders ("TBD", "...", "(placeholder)") ## Workflow (explicit inputs) - Use `GOAL.md` to keep the reader question and scope stable. - Use `outline/outline.yml` to align table row units with the paper structure. - Use `outline/subsection_briefs.jsonl` to ground table dimensions/axes in the approved structure. - Use `outline/evidence_drafts.jsonl` to ensure every planned column is fillable without guessing. ## Inputs - `outline/outline.yml` - `outline/subsection_briefs.jsonl` - `outline/evidence_drafts.jsonl` - Optional: `GOAL.md` ## Output - `outline/table_schema.md` ## Non-negotiables (schema contract) - Minimum definitions: - Index tables: >=2 - Appendix tables: >=2 - Every table definition must include: - the question it answers - the row unit (H3 / benchmark / work / failure mode) - columns (with cell style constraints) - evidence mapping (which upstream fields fill which columns) - Cell style: short phrases; avoid paragraph cells. - Paper voice: Appendix table captions/columns must be publishable (no pipeline jargon). ## Recommended defaults (arxiv-survey family) ### Index tables (for `table-filler` -> `outline/tables_index.md`) I1) Subsection map (axes + representative works) - Row unit: H3 - Columns: subsection; axes; representative works - Evidence sources: `subsection_briefs.axes` + citations from `evidence_drafts` I2) Concrete anchors (benchmarks / numbers / caveats) - Row unit: H3 - Columns: subsection; anchor facts; representative works - Evidence sources: `anchor_sheet.anchors` ### Appendix tables (for `appendix-table-writer` -> `outline/tables_appendix.md`) A1) Method/architecture map (representative works) - Row unit: work/system line - Columns: work; core idea; loop + interface assumptions; key refs - Evidence sources: `evidence_drafts` (comparisons + definitions) + `anchor_sheet` A2) Evaluation protocol / benchmark map - Row unit: benchmark or evaluation setting (fallback: protocol dimension) - Columns: benchmark/setting; task+metric; key protocol constraints; key refs - Evidence sources: `anchor_sheet` + `evidence_drafts.evaluation_protocol` ## Positive / negative examples Good (publishable question + fillable columns): - Question: "Which benchmarks anchor evaluation in this area, and what task/metric/constraints do they imply?" - Columns: Benchmark; Task+metric; Protocol constraints; Key refs - Evidence mapping: `evidence_drafts.evaluation_protocol` + `anchor_sheet` Bad (internal/pipeline voice): - Question: "What is evidence readiness + verification needs?" - Columns: evidence levels, missing fields, TODO checklist If you want internal diagnostics, put them in an audit report, not in reader-facing tables. ## Script (optional bootstrap) ### Quick Start - `python .codex/skills/table-schema/scripts/run.py --help` - `python .codex/skills/table-schema/scripts/run.py --workspace workspaces/` ### All Options - `--workspace ` (required) - `--unit-id ` (optional; used only for runner bookkeeping) - `--inputs ` (optional; override inputs) - `--outputs ` (optional; defaults to `outline/table_schema.md`) - `--checkpoint ` (optional; ignored by the bootstrapper) ### Examples - Bootstrap a two-layer schema (index + Appendix): `python .codex/skills/table-schema/scripts/run.py --workspace workspaces/` - Write to a custom schema path (rare): `python .codex/skills/table-schema/scripts/run.py --workspace workspaces/ --outputs outline/table_schema.md` Notes: - Use the script as a starting point, then refine the schema as a paper artifact. - If a column cannot be filled from evidence packs without guessing, the schema is wrong.