---
name: extraction-form
description: |
  Extract study data into a structured table (`papers/extraction_table.csv`) using the protocol’s extraction schema.
  **Trigger**: extraction form, extraction table, data extraction, 信息提取, 提取表.
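  **Use when**: the systematic review moves from screening into extraction (C3) and the included papers need to be recorded field by field in a CSV to support later synthesis.
  **Skip if**: `papers/screening_log.csv` does not exist yet, or the protocol is not locked.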
  **Network**: none.
  **Guardrail**: fill fields strictly according to the schema; do not write narrative synthesis at this stage (that is the job of `synthesis-writer`).
---

# Extraction Form (systematic review)

Goal: create a consistent, analysis-ready extraction table that is directly grounded in the protocol.

## Inputs

Required:
- `papers/screening_log.csv`
- `output/PROTOCOL.md`

Optional:
- `papers/paper_notes.jsonl` (if you already have structured notes)

## Outputs

- `papers/extraction_table.csv`

## Workflow

1. Determine the included set
   - From `papers/screening_log.csv`, collect all rows with `decision=include`.

2. Build/confirm the schema
   - Use the extraction schema defined in `output/PROTOCOL.md`.
   - If the protocol does not define fields yet, stop and update `output/PROTOCOL.md` first.

3. Populate `papers/extraction_table.csv`
   - One row per included paper.
   - If `papers/paper_notes.jsonl` exists, use it as a structured source for values/provenance (but keep the table schema governed by `output/PROTOCOL.md`).
   - Always include provenance columns:
     - `paper_id`, `title`, `year`, `url`
   - For each protocol-defined field:
     - fill concrete values (units explicit)
     - mark unknowns the same way in every row (recommended: leave the cell empty and explain why in `notes`)

4. Keep it auditable
   - If a value is inferred (not directly stated in the paper), say so in the `notes` column.
   - Do not write synthesis; only extraction.

5. Quick QA
   - Ensure 1:1 coverage: every included paper has exactly one extraction row, and no row belongs to a paper that was not included.
   - Spot-check a few rows against the paper text/notes.
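
Steps 1 and 3 above can be sketched as follows. This is a minimal illustration, not a required implementation. It assumes the screening log carries `paper_id`, `title`, `year`, and `url` alongside `decision`, and that the protocol-defined field names have already been read out of `output/PROTOCOL.md` by hand. The function names and the example fields (`sample_size`, `outcome_measure`) are hypothetical.

```python
import csv
import io

# Provenance columns named in the workflow above.
PROVENANCE = ["paper_id", "title", "year", "url"]


def included_rows(screening_file):
    """Return screening-log rows whose decision is 'include'."""
    reader = csv.DictReader(screening_file)
    return [
        row for row in reader
        if (row.get("decision") or "").strip().lower() == "include"
    ]


def write_skeleton(included, protocol_fields, out_file):
    """Write one row per included paper; protocol fields start empty."""
    columns = PROVENANCE + [f for f in protocol_fields if f not in PROVENANCE]
    if "notes" not in columns:
        columns.append("notes")
    writer = csv.DictWriter(out_file, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in included:
        record = {col: "" for col in columns}
        for col in PROVENANCE:
            # Assumption: the screening log has these columns; missing ones stay empty.
            record[col] = row.get(col) or ""
        writer.writerow(record)
    return columns


if __name__ == "__main__":
    # Illustrative in-memory data only; real runs would open the CSV files instead.
    log = io.StringIO(
        "paper_id,title,year,url,decision\n"
        "P1,Study A,2021,https://example.org/a,include\n"
        "P2,Study B,2020,https://example.org/b,exclude\n"
    )
    out = io.StringIO()
    write_skeleton(included_rows(log), ["sample_size", "outcome_measure"], out)
    print(out.getvalue())
```

Starting from an empty skeleton keeps the column set governed by the protocol: values are filled in afterwards, and a field that cannot be filled stays visibly empty.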

## Definition of Done

- [ ] `papers/extraction_table.csv` exists.
- [ ] Every included paper from `papers/screening_log.csv` has exactly one extraction row.
- [ ] Column meanings match `output/PROTOCOL.md` (no ad-hoc columns without updating the protocol).
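
The second and third checklist items can be verified mechanically. The sketch below is one possible check, assuming the extraction table has a `paper_id` column and that the list of protocol-approved column names is supplied by the caller. `check_extraction` and its parameters are hypothetical names. It only compares column names, so whether a column's meaning matches the protocol still needs a human read.

```python
import csv
from collections import Counter


def check_extraction(included_ids, table_file, allowed_columns):
    """Return a list of problems; an empty list means the checks pass."""
    reader = csv.DictReader(table_file)
    header = reader.fieldnames or []
    # Count how many extraction rows each paper_id has.
    counts = Counter((row.get("paper_id") or "").strip() for row in reader)

    problems = []
    included = set(included_ids)
    for pid in sorted(included):
        n = counts.get(pid, 0)
        if n == 0:
            problems.append(f"missing row for {pid}")
        elif n > 1:
            problems.append(f"{n} rows for {pid}")
    # Rows whose paper_id is not in the included set.
    for pid in sorted(set(counts) - included):
        problems.append(f"row for non-included paper {pid!r}")
    # Columns that the protocol does not define.
    for col in header:
        if col not in allowed_columns:
            problems.append(f"ad-hoc column {col!r} not in protocol")
    return problems
```

For example, `check_extraction(ids, open("papers/extraction_table.csv", newline="", encoding="utf-8"), columns)` would return an empty list when all checks pass, where `ids` and `columns` come from the screening log and the protocol respectively.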

## Troubleshooting

### Issue: the protocol does not specify extraction fields

**Fix**:
- Update `output/PROTOCOL.md` (extraction schema section) and re-run extraction.

### Issue: extraction table mixes narrative text with fields

**Fix**:
- Move narrative into a `notes` column and keep the rest as atomic values (numbers/enums/short strings).
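
A rough heuristic can help find cells that likely hold narrative text. The sketch below flags any cell outside a few free-text columns that is long or contains a sentence break. The thresholds, the exempt column list, and the function name are assumptions chosen for illustration, not part of the protocol, so flagged cells are candidates for review rather than confirmed errors.

```python
import csv


def find_narrative_cells(table_file, exempt=("notes", "title", "url"), max_chars=80):
    """Return (paper_id, column, value) for cells that look like prose."""
    flagged = []
    for row in csv.DictReader(table_file):
        for col, value in row.items():
            # Skip free-text columns, empty cells, and overflow fields
            # (DictReader stores extra fields under the key None).
            if col is None or col in exempt or not value:
                continue
            text = value.strip()
            # Heuristic: long text or a sentence break suggests narrative.
            if len(text) > max_chars or ". " in text:
                flagged.append((row.get("paper_id", ""), col, text))
    return flagged
```

Abbreviations such as "approx. 5 mg" will trigger false positives under this rule, which is acceptable for a spot-check aid but not for automatic rewriting.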